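Excel file parsing - SAX startRow, cell, endRow execution order

Sure! Let me explain in detail the order in which the three methods startRow, endRow, and cell are called when parsing Excel with SAX, and what each one does.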

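SAX event-driven mechanism

1. Basic concepts

SAX (Simple API for XML) is an event-driven way of parsing XML. As the parser reads the XML file from start to finish, it fires a callback for each element start, element end, and run of text it encounters, without loading the whole document into memory.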
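
To make the event-driven idea concrete, here is a minimal sketch that uses the SAX parser bundled with the JDK (not POI) and records every callback in the order it fires. The class name SaxEventDemo and the "start:"/"text:"/"end:" labels are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs SAX callbacks in the order the parser fires them.
class SaxEventDemo {

    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                log.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Ignore whitespace-only text between elements.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<row><c><v>25</v></c></row>").forEach(System.out::println);
    }
}
```

For the input in main, the log is start:row, start:c, start:v, text:25, end:v, end:c, end:row. The handler never sees the document as a tree, only this stream of events.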

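2. Excel XML structure

The sheet parts of an Excel (.xlsx) file are XML:

<?xml version="1.0" encoding="UTF-8"?>
<worksheet>
    <sheetData>
        <row r="1">                              <!-- row 1 starts -->
            <c r="A1" t="s">                    <!-- cell A1 starts; t="s" marks a shared string -->
                <v>0</v>                        <!-- index into the shared strings table, e.g. "Zhang San" -->
            </c>                                <!-- cell A1 ends -->
            <c r="B1">                          <!-- cell B1 starts -->
                <v>25</v>                       <!-- cell value (a number, stored directly) -->
            </c>                                <!-- cell B1 ends -->
            <c r="C1" t="s">                    <!-- cell C1 starts -->
                <v>1</v>                        <!-- shared string index, e.g. "test@qq.com" -->
            </c>                                <!-- cell C1 ends -->
        </row>                                  <!-- row 1 ends -->
        <row r="2">                              <!-- row 2 starts -->
            <c r="A2" t="s">                    <!-- cell A2 starts -->
                <v>2</v>                        <!-- shared string index, e.g. "Li Si" -->
            </c>
            <c r="B2">                          <!-- cell B2 starts -->
                <v>30</v>
            </c>
        </row>                                  <!-- row 2 ends -->
    </sheetData>
</worksheet>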
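
The t="s" attribute in the XML above means the <v> text is not the value itself but an index into the workbook's shared strings table. The following is a rough sketch of that lookup. CellValueResolver and its resolve method are hypothetical names, and a real handler such as POI's also deals with number formats, inline strings, booleans, and errors, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns the raw <v> text of a cell into its string value.
class CellValueResolver {

    /**
     * @param typeAttr      the cell's "t" attribute, or null when absent
     * @param rawValue      the text inside <v>
     * @param sharedStrings the workbook's shared strings, in file order
     */
    static String resolve(String typeAttr, String rawValue, List<String> sharedStrings) {
        if ("s".equals(typeAttr)) {
            // Shared string: <v> holds a zero-based index into the table.
            return sharedStrings.get(Integer.parseInt(rawValue));
        }
        // Anything else is returned unchanged in this simplified sketch.
        return rawValue;
    }

    public static void main(String[] args) {
        List<String> shared = Arrays.asList("Zhang San", "test@qq.com", "Li Si");
        System.out.println(resolve("s", "0", shared));   // shared string lookup
        System.out.println(resolve(null, "25", shared)); // plain number text
    }
}
```

This lookup is why the formattedValue passed to cell already contains the text rather than the index.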

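Method call order

Complete call flow

Read the sheet XML
    ↓
Encounter <row r="1">
    ↓
Call startRow(0)  ← rowNum is zero-based (r minus 1)
    ↓
Encounter <c r="A1"> and read its <v>
    ↓
Call cell("A1", "Zhang San", null)  ← fired once the cell's value has been read
    ↓
Encounter </c>
    ↓
Encounter <c r="B1"> and read its <v>
    ↓
Call cell("B1", "25", null)
    ↓
Encounter </c>
    ↓
Encounter <c r="C1"> and read its <v>
    ↓
Call cell("C1", "test@qq.com", null)
    ↓
Encounter </c>
    ↓
Encounter </row>
    ↓
Call endRow(0)
    ↓
Encounter <row r="2">
    ↓
Call startRow(1)
    ↓
Encounter <c r="A2"> and read its <v>
    ↓
Call cell("A2", "Li Si", null)
    ↓
Encounter </c>
    ↓
Encounter <c r="B2"> and read its <v>
    ↓
Call cell("B2", "30", null)
    ↓
Encounter </c>
    ↓
Encounter </row>
    ↓
Call endRow(1)
    ↓
... continue with the next row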
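
The flow above can be reproduced without POI. The sketch below is a simplified stand-in for what a sheet handler does, not POI's actual code: it uses the JDK SAX parser, assumes every <row> and <c> carries an r attribute, and passes raw <v> text through without shared-string or format handling. RowCellDispatchDemo is a made-up name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: maps raw SAX events to startRow / cell / endRow style calls.
class RowCellDispatchDemo {

    static List<String> calls(String sheetXml) {
        List<String> calls = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int rowNum;
            private String cellRef;
            private boolean inValue;
            private final StringBuilder value = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("row".equals(qName)) {
                    rowNum = Integer.parseInt(attrs.getValue("r")) - 1; // 1-based r to 0-based rowNum
                    calls.add("startRow(" + rowNum + ")");
                } else if ("c".equals(qName)) {
                    cellRef = attrs.getValue("r");                      // remember the reference only
                } else if ("v".equals(qName)) {
                    value.setLength(0);
                    inValue = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inValue) {
                    value.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    inValue = false;
                    calls.add("cell(" + cellRef + ", " + value + ")"); // value complete: report the cell
                } else if ("row".equals(qName)) {
                    calls.add("endRow(" + rowNum + ")");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(sheetXml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
        return calls;
    }

    public static void main(String[] args) {
        String xml = "<sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>Name</v></c><c r=\"B1\"/></row>"
                + "<row r=\"3\"><c r=\"A3\"><v>25</v></c></row>"
                + "</sheetData>";
        calls(xml).forEach(System.out::println);
    }
}
```

With the input in main, the recorded calls are startRow(0), cell(A1, Name), endRow(0), startRow(2), cell(A3, 25), endRow(2). The valueless B1 produces no cell call, and the row missing from the XML produces no callbacks at all.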

Timeline example

// Suppose the Excel file contains the following data:
// A1: Name, B1: Age, C1: Email
// A2: Zhang San, B2: 25, C2: test@qq.com
// A3: Li Si, B3: 30, C3: test2@qq.com

// Execution order:
t1: startRow(0)           // row 1 starts
t2: cell("A1", "Name", null)
t3: cell("B1", "Age", null)
t4: cell("C1", "Email", null)
t5: endRow(0)             // row 1 ends

t6: startRow(1)           // row 2 starts
t7: cell("A2", "Zhang San", null)
t8: cell("B2", "25", null)
t9: cell("C2", "test@qq.com", null)
t10: endRow(1)            // row 2 ends

t11: startRow(2)          // row 3 starts
t12: cell("A3", "Li Si", null)
t13: cell("B3", "30", null)
t14: cell("C3", "test2@qq.com", null)
t15: endRow(2)            // row 3 ends

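Method details

1. startRow(int rowNum)

Purpose:

Called when the parser encounters a <row> start tag
Signals that a new row begins
rowNum is the row number (zero-based)

Typical usage:

@Override
public void startRow(int rowNum) {
    System.out.println("Start processing row " + (rowNum + 1));
    
    // Initialize the data structure for the current row
    currentRowData = new LinkedHashMap<>();
    currentRowNum = rowNum;
    
    // Check whether to stop collecting (the other callbacks must check this flag)
    if (rowNum >= maxRowsToProcess) {
        shouldStop = true;
    }
}

Parameters:

rowNum: the row number, zero-based
- Excel row 1 → rowNum = 0
- Excel row 2 → rowNum = 1
- ...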

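2. cell(String cellReference, String formattedValue, XSSFComment comment)

Purpose:

Called once for each cell that has a value, after the parser has read the value inside the <c> element
Delivers the data of one cell
May be called multiple times between startRow and endRow; cells without a value typically produce no call

Typical usage:

@Override
public void cell(String cellReference, String formattedValue, XSSFComment comment) {
    // cellReference: the cell position, such as "A1", "B2", "C10"
    // formattedValue: the formatted value (POI has already resolved shared strings, number formats, etc.)
    // comment: the cell comment (if any)
    
    System.out.println("Cell " + cellReference + ": " + formattedValue);
    
    // Extract the column letters
    String columnName = parseColumnName(cellReference);  // "A1" -> "A"
    
    // Store the value in the current row
    currentRowData.put(columnName, formattedValue);
}

Parameters:

cellReference: the cell reference, such as "A1", "B2", "C10"
formattedValue: the formatted value
- String: the string itself
- Number: the number rendered as a string using the cell's number format (such as "100", "3.14")
- Date: the formatted date string
- Boolean: "TRUE" or "FALSE"
comment: the cell comment (null if there is none)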
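
Since a row may contain gaps (cells without a value generally produce no cell callback), the position of a value is safer to derive from cellReference than from a running counter. Below is a small sketch of splitting a reference into column letters and a zero-based column index. CellRefs is a hypothetical helper written for this explanation; POI also ships org.apache.poi.ss.util.CellReference, which covers the same ground.

```java
import java.util.Locale;

// Hypothetical helper for working with references such as "A1" or "AB12".
class CellRefs {

    /** Returns the leading letters of the reference: "AB12" -> "AB". */
    static String columnLetters(String cellReference) {
        int i = 0;
        while (i < cellReference.length() && Character.isLetter(cellReference.charAt(i))) {
            i++;
        }
        return cellReference.substring(0, i);
    }

    /** Returns the zero-based column index: "A1" -> 0, "Z9" -> 25, "AA10" -> 26. */
    static int columnIndex(String cellReference) {
        int index = 0;
        // Column letters are a base-26 number with digits A=1 .. Z=26.
        for (char ch : columnLetters(cellReference).toUpperCase(Locale.ROOT).toCharArray()) {
            index = index * 26 + (ch - 'A' + 1);
        }
        return index - 1;
    }

    public static void main(String[] args) {
        System.out.println(columnLetters("AB12")); // AB
        System.out.println(columnIndex("A1"));     // 0
        System.out.println(columnIndex("AA10"));   // 26
    }
}
```

Keying the row map by the column index or letters, as the examples in this answer do, keeps values aligned even when a cell in the middle of the row is missing.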

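3. endRow(int rowNum)

Purpose:

Called when the parser encounters a </row> end tag
Signals that the row has ended
Usually the place to process the whole row

Typical usage:

@Override
public void endRow(int rowNum) {
    System.out.println("Row " + (rowNum + 1) + " ends");
    
    // Check whether this is the header row
    if (rowNum == headerRowNum) {
        parseHeader(currentRowData);
    } else {
        // Process a data row
        processDataRow(rowNum, currentRowData);
    }
    
    // Clear the current row's data
    currentRowData.clear();
}

Parameters:

rowNum: the row number (the same value that was passed to startRow)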

Complete examples

Example 1: Collect all data

// Imports needed by the enclosing file
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

/**
 * Handler that collects all data in the sheet
 */
private static class DataCollectingHandler implements XSSFSheetXMLHandler.SheetContentsHandler {
    private List<Map<String, String>> allData = new ArrayList<>();
    private Map<String, String> currentRow = new LinkedHashMap<>();
    private int currentRowNum = 0;
    
    @Override
    public void startRow(int rowNum) {
        currentRowNum = rowNum;
        currentRow = new LinkedHashMap<>();
        System.out.println("=== Start row " + (rowNum + 1) + " ===");
    }
    
    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        String columnName = parseColumnName(cellReference);
        System.out.println("  Cell " + cellReference + ": " + formattedValue);
        currentRow.put(columnName, formattedValue);
    }
    
    @Override
    public void endRow(int rowNum) {
        System.out.println("=== End row " + (rowNum + 1) + " ===");
        if (!currentRow.isEmpty()) {
            allData.add(new LinkedHashMap<>(currentRow));
        }
        currentRow.clear();
    }
    
    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Ignore page headers and footers
    }
    
    // "A1" -> "A": strip the digits to keep only the column letters
    private String parseColumnName(String cellReference) {
        if (cellReference == null || cellReference.isEmpty()) {
            return "";
        }
        return cellReference.replaceAll("[0-9]", "");
    }
    
    public List<Map<String, String>> getAllData() {
        return allData;
    }
}

Output:

=== Start row 1 ===
  Cell A1: Name
  Cell B1: Age
  Cell C1: Email
=== End row 1 ===
=== Start row 2 ===
  Cell A2: Zhang San
  Cell B2: 25
  Cell C2: test@qq.com
=== End row 2 ===
=== Start row 3 ===
  Cell A3: Li Si
  Cell B3: 30
  Cell C3: test2@qq.com
=== End row 3 ===

Example 2: Skip the header and process only data rows

// Imports needed by the enclosing file
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

/**
 * Handler that skips the header and processes only data rows
 */
private static class SkipHeaderHandler implements XSSFSheetXMLHandler.SheetContentsHandler {
    private List<Map<String, String>> dataRows = new ArrayList<>();
    private Map<String, String> currentRow = new LinkedHashMap<>();
    private Map<String, String> headerMapping = new LinkedHashMap<>();  // column letter -> header name
    private int headerRowNum = 0;  // the header is in row 0
    private int currentRowNum = 0;
    
    @Override
    public void startRow(int rowNum) {
        currentRowNum = rowNum;
        currentRow = new LinkedHashMap<>();
    }
    
    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        String columnName = parseColumnName(cellReference);
        currentRow.put(columnName, formattedValue);
    }
    
    @Override
    public void endRow(int rowNum) {
        if (rowNum == headerRowNum) {
            // Header row: build the column name mapping
            System.out.println("Parsed header: " + currentRow);
            headerMapping.putAll(currentRow);
        } else if (rowNum > headerRowNum) {
            // Data row: apply the header mapping
            Map<String, String> rowData = mapRowWithHeader(currentRow, headerMapping);
            dataRows.add(rowData);
            System.out.println("Data row: " + rowData);
        }
        currentRow.clear();
    }
    
    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Ignore
    }
    
    private String parseColumnName(String cellReference) {
        if (cellReference == null || cellReference.isEmpty()) {
            return "";
        }
        return cellReference.replaceAll("[0-9]", "");
    }
    
    private Map<String, String> mapRowWithHeader(Map<String, String> row, Map<String, String> header) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : header.entrySet()) {
            String columnLetter = entry.getKey();    // "A", "B", "C"
            String columnName = entry.getValue();     // "Name", "Age", "Email"
            String value = row.get(columnLetter);    // "Zhang San", "25", "test@qq.com"; null if the cell is missing
            result.put(columnName, value);
        }
        return result;
    }
    
    public List<Map<String, String>> getDataRows() {
        return dataRows;
    }
}
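
One detail of the header mapping is worth spelling out: when a data row lacks a cell for some header column, row.get returns null for it. The standalone sketch below shows a variant that substitutes a caller-chosen placeholder instead. HeaderMappingDemo and the missingValue parameter are assumptions made for illustration, not part of the handler above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone version of the header mapping step.
class HeaderMappingDemo {

    /**
     * Re-keys a row from column letters to header names.
     * Columns absent from the row get missingValue instead of null.
     */
    static Map<String, String> mapRowWithHeader(Map<String, String> row,
                                                Map<String, String> header,
                                                String missingValue) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : header.entrySet()) {
            String columnLetter = entry.getKey();   // e.g. "B"
            String columnName = entry.getValue();   // e.g. "Age"
            result.put(columnName, row.getOrDefault(columnLetter, missingValue));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> header = new LinkedHashMap<>();
        header.put("A", "Name");
        header.put("B", "Age");
        header.put("C", "Email");

        // Column B is missing, as happens when the cell has no value.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("A", "Zhang San");
        row.put("C", "test@qq.com");

        System.out.println(mapRowWithHeader(row, header, ""));
    }
}
```

Iterating over the header rather than the row keeps the output columns in header order and guarantees every header name appears in each result, whichever placeholder is chosen.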

Example 3: Limit the number of rows processed

// Imports needed by the enclosing file
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

/**
 * Handler that limits the number of rows processed.
 * Note: it only ignores callbacks after the limit; the parser still reads the rest of the sheet.
 */
private static class LimitRowsHandler implements XSSFSheetXMLHandler.SheetContentsHandler {
    private final int maxRows;
    private int currentRowNum = 0;
    private List<Map<String, String>> data = new ArrayList<>();
    private Map<String, String> currentRow = new LinkedHashMap<>();
    private boolean shouldStop = false;
    
    public LimitRowsHandler(int maxRows) {
        this.maxRows = maxRows;
    }
    
    @Override
    public void startRow(int rowNum) {
        if (shouldStop) {
            return;  // return early, nothing to do
        }
        
        currentRowNum = rowNum;
        currentRow = new LinkedHashMap<>();
        
        // Check whether the maximum row count has been reached
        if (rowNum >= maxRows) {
            shouldStop = true;
            System.out.println("Reached the maximum of " + maxRows + " rows, ignoring the rest");
        }
    }
    
    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        if (shouldStop) {
            return;  // return early, nothing to do
        }
        
        String columnName = parseColumnName(cellReference);
        currentRow.put(columnName, formattedValue);
    }
    
    @Override
    public void endRow(int rowNum) {
        if (shouldStop) {
            return;  // return early, nothing to do
        }
        
        if (!currentRow.isEmpty()) {
            data.add(new LinkedHashMap<>(currentRow));
        }
        currentRow.clear();
    }
    
    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Ignore
    }
    
    private String parseColumnName(String cellReference) {
        if (cellReference == null || cellReference.isEmpty()) {
            return "";
        }
        return cellReference.replaceAll("[0-9]", "");
    }
    
    public List<Map<String, String>> getData() {
        return data;
    }
}
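
A flag like shouldStop makes the handler ignore later callbacks, but the parser keeps reading to the end of the sheet. If the goal is to stop reading altogether, one common SAX technique is to throw an exception from a callback and catch it around the parse call. The sketch below shows the idea with the JDK parser only. StopParsingDemo and StopParsing are made-up names, and whether this fits a given POI setup is an assumption to verify there.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: abort a SAX parse once enough rows have been seen.
class StopParsingDemo {

    /** Unchecked marker exception used purely as a stop signal. */
    static final class StopParsing extends RuntimeException {
        StopParsing() {
            super("row limit reached");
        }
    }

    /** Counts <row> elements but stops the parse after maxRows of them. */
    static int countRowsUpTo(String sheetXml, int maxRows) {
        final int[] rows = {0};
        final boolean[] stopped = {false};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("row".equals(qName)) {
                    if (rows[0] >= maxRows) {
                        stopped[0] = true;
                        throw new StopParsing(); // unwinds out of parse()
                    }
                    rows[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(sheetXml)), handler);
        } catch (Exception e) {
            // Only swallow the exception when it is our own stop signal.
            if (!stopped[0]) {
                throw new IllegalStateException("could not parse sheet XML", e);
            }
        }
        return rows[0];
    }

    public static void main(String[] args) {
        String xml = "<sheetData><row/><row/><row/><row/><row/></sheetData>";
        System.out.println(countRowsUpTo(xml, 2));
        System.out.println(countRowsUpTo(xml, 10));
    }
}
```

The trade-off is that an exception is being used for control flow, so the catch block has to tell the stop signal apart from real parse errors, which is what the stopped flag does here.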

Example 4: Count cells

// Imports needed by the enclosing file
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

/**
 * Handler that counts rows and cells
 */
private static class CellCountHandler implements XSSFSheetXMLHandler.SheetContentsHandler {
    private int totalRows = 0;
    private int totalCells = 0;
    private int currentRowCells = 0;
    
    @Override
    public void startRow(int rowNum) {
        currentRowCells = 0;
        System.out.println("Row " + (rowNum + 1) + " starts");
    }
    
    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        currentRowCells++;
        totalCells++;
        System.out.println("  Cell " + cellReference + ": " + formattedValue);
    }
    
    @Override
    public void endRow(int rowNum) {
        totalRows++;
        System.out.println("Row " + (rowNum + 1) + " ends, " + currentRowCells + " cells in this row");
        System.out.println("Total: " + totalRows + " rows, " + totalCells + " cells");
    }
    
    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Ignore
    }
    
    public int getTotalRows() {
        return totalRows;
    }
    
    public int getTotalCells() {
        return totalCells;
    }
}

Method call sequence diagram

Excel file contents:
+-----------+-------+-------+
| A1        | B1    | C1    |
| Name      | Age   | Email |
+-----------+-------+-------+
| A2        | B2    | C2    |
| Zhang San | 25    | test  |
+-----------+-------+-------+

Parsing flow:

startRow(0)  ─────────────────────────────┐
    ↓                                     │
cell("A1", "Name", null)                  │
    ↓                                     │
cell("B1", "Age", null)                   │
    ↓                                     │
cell("C1", "Email", null)                 │
    ↓                                     │
endRow(0)    ─────────────────────────────┤
    ↓                                     │
startRow(1)  ─────────────────────────────┤
    ↓                                     │
cell("A2", "Zhang San", null)             │
    ↓                                     │
cell("B2", "25", null)                    │
    ↓                                     │
cell("C2", "test", null)                  │
    ↓                                     │
endRow(1)    ─────────────────────────────┘

Key points to watch

1. Row numbers start at 0

startRow(0)  // Excel row 1
startRow(1)  // Excel row 2
startRow(2)  // Excel row 3

2. startRow and endRow receive the same rowNum

startRow(0)  // row 1 starts
...
endRow(0)    // row 1 ends

startRow(1)  // row 2 starts
...
endRow(1)    // row 2 ends

3. cell is called between startRow and endRow

startRow(0)           // row 1 starts
    ↓
cell("A1", "...")     // 1st cell
cell("B1", "...")     // 2nd cell
cell("C1", "...")     // 3rd cell
    ↓
endRow(0)             // row 1 ends

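4. Empty rows trigger startRow and endRow only if the XML contains a <row> element for them

A row that exists in the file but has no values (for example, a row that only carries formatting) still produces both calls, with no cell call in between. A row that was never written to the file has no <row> element, so it produces no callbacks at all and rowNum simply jumps ahead.

startRow(2)           // row 3 (present in the XML but empty)
    ↓
// no cell calls
    ↓
endRow(2)             // row 3 ends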
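
Rows that are absent from the sheet XML produce no callbacks, so a handler that needs to know about blank rows has to infer them from jumps in rowNum. A minimal sketch of that bookkeeping follows. RowGapTracker is a hypothetical helper, meant to be called from a handler's startRow.

```java
// Hypothetical helper: reports how many rows were skipped before each startRow.
class RowGapTracker {
    private int lastRowNum = -1; // no row seen yet

    /**
     * Call with the rowNum passed to startRow.
     * Returns the number of rows missing between the previous row and this one.
     */
    int onStartRow(int rowNum) {
        int skipped = rowNum - lastRowNum - 1;
        lastRowNum = rowNum;
        return skipped;
    }

    public static void main(String[] args) {
        RowGapTracker tracker = new RowGapTracker();
        System.out.println(tracker.onStartRow(0)); // 0
        System.out.println(tracker.onStartRow(1)); // 0
        System.out.println(tracker.onStartRow(4)); // 2 (rows 2 and 3 are missing)
    }
}
```

A handler could use the returned count to emit placeholder rows, or simply to log that the sheet has gaps.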

5. cell may be called many times within one row

startRow(0)           // row 1
    ↓
cell("A1", "...")     // 1st call
cell("B1", "...")     // 2nd call
cell("C1", "...")     // 3rd call
cell("D1", "...")     // 4th call
...
cell("Z1", "...")     // 26th call
    ↓
endRow(0)             // row 1 ends

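Summary

Method	When it is called	Purpose	Typical usage
startRow	On a `<row>` start tag	A new row begins	Initialize the current row's data structure
cell	After a cell's value inside `<c>` has been read	Delivers cell data	Collect the cell value
endRow	On a `</row>` end tag	The row ends	Process the whole row and save the result

Key takeaways:

✅ The order startRow → cell (zero or more times) → endRow is fixed
✅ Row numbers start at 0
✅ startRow and endRow receive the same rowNum
✅ cell may be called multiple times between startRow and endRow
✅ An empty row that exists in the XML triggers startRow and endRow with no cell calls; a row missing from the XML triggers nothing

I hope this detailed explanation helps you understand how SAX parsing works!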