







Reference article: Have you ever renamed an .xlsx file to .zip and unpacked it to see how Excel actually stores its data?


Streaming version of XSSFWorkbook implementing the "BigGridDemo" strategy. This allows to write very large files without running out of memory as only a configurable portion of the rows are kept in memory at any one time. You can provide a template workbook which is used as basis for the written data. See poi.apache.org/spreadsheet... for details.

Please note that there are still things that still may consume a large amount of memory based on which features you are using, e.g. merged regions, comments, ... are still only stored in memory and thus may require a lot of memory if used extensively.

SXSSFWorkbook defaults to using inline strings instead of a shared strings table. This is very efficient, since no document content needs to be kept in memory, but is also known to produce documents that are incompatible with some clients. With shared strings enabled all unique strings in the document has to be kept in memory. Depending on your document content this could use a lot more resources than with shared strings disabled. Carefully review your memory budget and compatibility needs before deciding whether to enable shared strings or not.

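In short: SXSSFWorkbook is the streaming counterpart of XSSFWorkbook. It can write very large files without exhausting memory because only a configurable window of rows stays in memory at any time, and it can take a template workbook as the basis for the written data. By default it writes inline strings instead of using a shared strings table. That is cheap, since no document content has to be held in memory, but it is known to produce files that some clients cannot read. With shared strings enabled, every unique string in the document has to stay in memory, which may cost far more resources depending on the content. Check both your memory budget and your compatibility needs before deciding whether to enable shared strings.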

```java
    /**
     * Constructs an workbook from an existing workbook.
     * <p>
     * When a new node is created via {@link SXSSFSheet#createRow} and the total number
     * of unflushed records would exceed the specified value, then the
     * row with the lowest index value is flushed and cannot be accessed
     * via {@link SXSSFSheet#getRow} anymore.
     * </p>
     * <p>
     * A value of <code>-1</code> indicates unlimited access. In this case all
     * records that have not been flushed by a call to <code>flush()</code> are available
     * for random access.
     * </p>
     * <p>
     * A value of <code>0</code> is not allowed because it would flush any newly created row
     * without having a chance to specify any cells.
     * </p>
     *
     * @param workbook  the template workbook
     * @param rowAccessWindowSize the number of rows that are kept in memory until flushed out, see above.
     * @param compressTmpFiles whether to use gzip compression for temporary files
     * @param useSharedStringsTable whether to use a shared strings table
     */
    public SXSSFWorkbook(XSSFWorkbook workbook, int rowAccessWindowSize, boolean compressTmpFiles, boolean useSharedStringsTable) {
        setRandomAccessWindowSize(rowAccessWindowSize);
        setCompressTempFiles(compressTmpFiles);
        if (workbook == null) {
            // No template: start from an empty XSSFWorkbook.
            _wb = new XSSFWorkbook();
            _sharedStringSource = useSharedStringsTable ? _wb.getSharedStringSource() : null;
        } else {
            // Template given: reuse it and wrap each existing sheet.
            _wb = workbook;
            _sharedStringSource = useSharedStringsTable ? _wb.getSharedStringSource() : null;
            for ( Sheet sheet : _wb ) {
                createAndRegisterSXSSFSheet( (XSSFSheet)sheet );
            }
        }
    }
```

```java
public SXSSFWorkbook(
    XSSFWorkbook workbook,
    int rowAccessWindowSize,
    boolean compressTmpFiles,
    boolean useSharedStringsTable
)
```


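· workbook - the easy one: the XSSFWorkbook object that the Excel operations actually work on (the template workbook).

· rowAccessWindowSize - the number of rows that are kept in memory until flushed out. Once more rows than this are held, the rows with the lowest index are written to disk.

· compressTmpFiles - whether to use gzip compression for temporary files, i.e. whether the generated temporary files are compressed.

· useSharedStringsTable - whether to use a shared strings table. (This parameter turned out to be the real root of the problem I ran into, and it rarely gets any attention in practice. Searching Baidu for SXSSFWorkbook turns up very little about it either; almost everyone focuses on rowAccessWindowSize.)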
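The window behaviour described for rowAccessWindowSize can be modelled in a few lines of plain Java. This is only a toy model under my own assumptions: `RowWindowSketch` is a hypothetical class, rows are plain strings, and "flushing" merely records the row index instead of writing to a temporary file. It is not how POI implements SXSSFSheet, but it shows why a flushed row can no longer be read back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a row access window. Hypothetical class, not POI code. */
final class RowWindowSketch {
    private final int windowSize;                                  // -1 means unlimited
    private final TreeMap<Integer, String> inMemory = new TreeMap<>();
    private final List<Integer> flushed = new ArrayList<>();      // stands in for the temp file

    RowWindowSketch(int windowSize) {
        // Mirrors the documented rule: 0 is not allowed, -1 means "keep everything".
        if (windowSize == 0 || windowSize < -1) {
            throw new IllegalArgumentException("windowSize must be -1 or greater than 0");
        }
        this.windowSize = windowSize;
    }

    /** Adds a row, then evicts lowest-index rows while the window is exceeded. */
    void createRow(int rowIndex, String content) {
        inMemory.put(rowIndex, content);
        while (windowSize > 0 && inMemory.size() > windowSize) {
            flushed.add(inMemory.pollFirstEntry().getKey());
        }
    }

    /** Returns the row content, or null once the row has been flushed. */
    String getRow(int rowIndex) {
        return inMemory.get(rowIndex);
    }

    /** Indexes of rows that have left memory, in flush order. */
    List<Integer> flushedRows() {
        return flushed;
    }
}
```

With a window of 2, creating rows 0, 1 and 2 evicts row 0: `getRow(0)` then returns null while rows 1 and 2 are still readable.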

```java
_sharedStringSource = useSharedStringsTable ? _wb.getSharedStringSource() : null;
```

Looking at the root constructor, useSharedStringsTable=true simply means that the SharedStringsTable object is fetched from the workbook. So what exactly is this SharedStringsTable?

```java
    /**
     * shared string table - a cache of strings in this workbook
     */
    protected final SharedStringsTable _sharedStringSource;
```


Table of strings shared across all sheets in a workbook.

A workbook may contain thousands of cells containing string (non-numeric) data. Furthermore this data is very likely to be repeated across many rows or columns. The goal of implementing a single string table that is shared across the workbook is to improve performance in opening and saving the file by only reading and writing the repetitive information once.

Consider for example a workbook summarizing information for cities within various countries. There may be a column for the name of the country, a column for the name of each city in that country, and a column containing the data for each city. In this case the country name is repetitive, being duplicated in many cells. In many cases the repetition is extensive, and a tremendous savings is realized by making use of a shared string table when saving the workbook. When displaying text in the spreadsheet, the cell table will just contain an index into the string table as the value of a cell, instead of the full string.

The shared string table contains all the necessary information for displaying the string: the text, formatting properties, and phonetic properties (for East Asian languages).
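The idea of "cells hold an index, the table holds the text" is easy to sketch in plain Java. The class below is a hypothetical toy, not POI's SharedStringsTable: it stores plain text only (no formatting or phonetic properties), and the method names are my own choice for illustration. It also makes the memory trade-off visible, since every unique string stays in the map for as long as the table lives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy shared string table. Hypothetical class, not POI code. */
final class SharedStringsSketch {
    private final Map<String, Integer> indexByString = new HashMap<>();
    private final List<String> strings = new ArrayList<>();
    private int count;                       // total references, duplicates included

    /** Registers one cell value and returns the index the cell would store. */
    int addEntry(String value) {
        count++;
        Integer existing = indexByString.get(value);
        if (existing != null) {
            return existing;                 // repeated text: reuse the existing slot
        }
        int index = strings.size();
        strings.add(value);
        indexByString.put(value, index);
        return index;
    }

    /** Resolves a cell's stored index back to its text. */
    String getItemAt(int index) {
        return strings.get(index);
    }

    int getCount() {
        return count;
    }

    int getUniqueCount() {
        return strings.size();
    }
}
```

In the country and city example, adding "France", "Paris", "France" yields the indexes 0, 1, 0: three references, but only two strings are stored.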





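When useSharedStringsTable=true, however, the unpacked contents of the generated Excel file differ from the standard file structure (forgive me for being lazy here; I'll leave it to you to look at the details yourself). A file exported this way still displays its content normally, and once it has been opened and saved again, the unpacked archive matches the standard structure once more.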
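Comparing two unpacked files by hand gets tedious, so here is a minimal JDK-only sketch for peeking inside an .xlsx without renaming it. It lists the parts in the archive and counts how many cells in one worksheet part are written as inline strings (`t="inlineStr"`) rather than as indexes into the shared table (`t="s"`). `XlsxInspector` and its methods are hypothetical helpers of mine, not POI API, and the worksheet part name you pass in (for example `xl/worksheets/sheet1.xml`) is an assumption about how the file at hand is laid out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper for inspecting an .xlsx archive. Not part of POI. */
final class XlsxInspector {
    // Assumes the producer writes the attribute with double quotes.
    private static final String INLINE_MARKER = "t=\"inlineStr\"";

    private XlsxInspector() {}

    /** Returns the names of all parts in the archive, sorted alphabetically. */
    static List<String> listEntries(Path xlsx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Counts inline-string cells in one worksheet part; 0 if the part is missing. */
    static int countInlineStringCells(Path xlsx, String sheetEntry) throws IOException {
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            ZipEntry entry = zip.getEntry(sheetEntry);
            if (entry == null) {
                return 0;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Reads the whole part into memory: fine for a quick look at a
                // small file, not meant for very large sheets.
                String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                int count = 0;
                int from = 0;
                while ((from = xml.indexOf(INLINE_MARKER, from)) >= 0) {
                    count++;
                    from += INLINE_MARKER.length();
                }
                return count;
            }
        }
    }
}
```

Running both methods on one file exported with useSharedStringsTable=true, one exported with false, and a copy re-saved from Excel should make the structural difference easy to see.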




