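Excel AI Converter: Automatically Converting Excel Table Formats with an LLM

Excel AI Converter: Automatic Table Format Conversion with DeepSeek

Upload a source spreadsheet and a reference template, and the AI handles column mapping, data filling, and formula rewriting. No more manual copy and paste.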

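What problem does this tool solve

You have a data sheet that needs to be converted into another department's report format. The two sheets hold much the same data, but the column names differ, the column order differs, and some columns have to be calculated by hand.

Doing this by hand means opening two Excel windows, with the source data on the left and the target template on the right, then copying and pasting column by column: "姓名" (name) goes to "Name", "手机号" (mobile number) to "Phone", "入职日期" (start date) to "Date of Hire". You also have to adjust the template's formulas for every row and fix the formatting at the end.

Converting a 200-row sheet by hand takes about half an hour, and it is error-prone: a column gets skipped, a formula's row number is left unchanged, or the formatting ends up misaligned.

Excel AI Converter is built to solve this problem. Upload the source spreadsheet and the reference template; the AI identifies how the columns correspond and performs the conversion, and the generated file inherits the template's formatting and formulas directly.

GitHub repository: excel-ai-converter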

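Features

AI column mapping: there is no need to specify by hand which source column corresponds to which target column. The AI analyzes the meaning of the two sets of column headers and builds the mapping automatically. "姓名" → "Name", "手机号" → "Phone": headers with the same meaning are matched.

AI data generation: some columns in the target template have no corresponding data in the source sheet. For example, the template requires "Department" but the source sheet only has "Employee ID". The AI infers a plausible value from context instead of leaving the cell blank.

Automatic formula rewriting: formulas in the reference template (for example =SUM(C2:D2)) are adapted to every data row automatically, with no manual row-number edits.

Format preservation: the output file is created as a copy of the reference template, so it inherits header styles, column widths, merged cells, and other formatting.

Drag-and-drop upload and data preview: files can be dropped onto the page, and the table contents are previewed immediately after upload.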

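Usage

Upload source spreadsheet → Upload reference template → Click Convert → Download result

Steps:

Set the DeepSeek API Key in the interface
Upload the source spreadsheet (the Excel file containing the data to convert)
Upload the reference spreadsheet (an Excel template in the desired output format)
Click "开始转换" (Start conversion) and wait for the AI to finish the analysis and conversion
Download the converted Excel file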

Installation and startup

git clone https://github.com/xiaodangjia105/excel-ai-converter.git
cd excel-ai-converter
pip install -r excel_converter/requirements.txt
cd excel_converter
python -m flask run --host 127.0.0.1 --port 5000

Open http://127.0.0.1:5000 in a browser to use the tool.

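Tech stack

Component	Technology
Backend	Python + Flask
Frontend	HTML + CSS + JavaScript
Excel processing	openpyxl (a Python library for reading and writing Excel files)
AI capability	DeepSeek API (compatible with the OpenAI SDK)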

Architecture

[Architecture diagram: the browser frontend talks to the Flask API (upload / convert / download); the backend modules are excel_parser, conversion_engine, and ai_service; ai_service reaches the large language model through the DeepSeek API.]

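The four core modules each have a single responsibility:

app.py: the Flask web service. It defines four API endpoints: /api/configure (set the API Key), /api/upload (upload files), /api/convert (run the conversion), and /api/download (download the result). The Flask session stores the paths of the user's uploaded files and the API Key.
excel_parser.py: Excel file parsing. It uses openpyxl to read a file's headers, data rows, formulas, column widths, merged cells, and style information.
conversion_engine.py: the conversion engine. It drives the whole conversion: it calls excel_parser to parse both files, calls ai_service for column mapping, fills in data according to the mapping, rewrites formulas, and writes the output file.
ai_service.py: the AI service. It wraps calls to the DeepSeek API and provides two core capabilities: column mapping analysis and generation of missing data.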

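Core implementation

AI column mapping

This is the tool's central capability. The analyze_column_mapping function in ai_service.py sends the source and target column headers to DeepSeek and asks it to decide which columns have the same meaning.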

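Key points of the prompt design:

# {src} and {tgt} are placeholders; the doubled braces are literal JSON braces
# (str.format-style escaping).
prompt = """You are an expert in converting data table formats. Analyze the two sets of column headers below
and decide which source columns can be mapped to which target columns.

Source column headers: {src}
Target column headers: {tgt}

Return strict JSON:
{{
  "mappings": [
    {{"source_col": "source column name", "target_col": "target column name", "confidence": 0.95}}
  ],
  "unmapped_targets": ["target column name that cannot be mapped"]
}}

Rules:
1. Only map columns with the same or highly similar meaning
2. confidence ranges from 0 to 1: 0.95 for an exact match, 0.7 for a similar one
3. Put target columns with no match in the source into unmapped_targets"""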
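
As a rough illustration of how such a template could be filled in, the sketch below uses str.format with the headers serialized by json.dumps. The shortened template and the build_mapping_prompt helper are hypothetical and not taken from the project; only the {src}/{tgt} placeholders and the doubled braces mirror the prompt above.

```python
import json

# Shortened, hypothetical stand-in for the full prompt template.
PROMPT_TEMPLATE = """Source column headers: {src}
Target column headers: {tgt}

Return strict JSON:
{{"mappings": [], "unmapped_targets": []}}"""


def build_mapping_prompt(source_headers, target_headers):
    """Fill the template; ensure_ascii=False keeps non-ASCII headers readable."""
    return PROMPT_TEMPLATE.format(
        src=json.dumps(source_headers, ensure_ascii=False),
        tgt=json.dumps(target_headers, ensure_ascii=False),
    )


prompt_text = build_mapping_prompt(["姓名", "手机号"], ["Name", "Phone"])
print(prompt_text)
```

Serializing the headers as JSON lists keeps header text that contains commas or quotes unambiguous, and the doubled braces come out as single braces in the final prompt.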

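A few design considerations:

Asking for a confidence score: this makes it easy to filter out low-confidence mappings later. Exactly matching columns get 0.95; columns that are semantically similar but not identical get 0.7.
Listing unmapped columns separately: these columns are filled either with AI-generated data or with formulas, so they take a different processing path from the mapped columns.
Tolerant JSON parsing: the JSON returned by the AI is not always clean. Sometimes it is wrapped in a Markdown json code fence, and sometimes it is preceded by a paragraph of explanation. The _parse_json_response function extracts it in several stages: first it tries to parse the whole response, then it tries to extract from a code fence, and finally it tries the text between { and }.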
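
The staged extraction described in the last item can be sketched roughly as follows. This is not the project's _parse_json_response; the function name parse_json_response, the regular expression, and the choice to return None on failure are all assumptions made for illustration.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker, built here to avoid a literal fence


def parse_json_response(text):
    """Try progressively looser ways of pulling a JSON value out of text."""
    # Stage 1: the whole response is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Stage 2: JSON wrapped in a code fence, with or without the "json" tag.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    fenced = re.search(pattern, text, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass

    # Stage 3: everything between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass

    return None  # assumption: the caller decides how to fall back
```

Ordering the stages from strictest to loosest means a clean response is never altered, and the brace search only runs when nothing better is available.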

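Fallback strategy: if the AI call fails (network timeout, API rate limiting, and so on), the tool falls back to local fuzzy matching, which matches column names as strings by containment. It is less accurate than the AI, but at least the conversion as a whole does not fail.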

def _fallback_mapping(source_headers, target_headers):
    """Match headers locally by equality or substring containment."""
    mappings = []
    unmapped = []
    used = set()  # source columns already assigned to a target
    for th in target_headers:
        matched = False
        for sh in source_headers:
            if sh in used:
                continue
            # Equal names, or one non-empty name contained in the other
            if th == sh or (th and sh and (th in sh or sh in th)):
                mappings.append({"source_col": sh, "target_col": th, "confidence": 0.5})
                used.add(sh)
                matched = True
                break
        if not matched:
            unmapped.append(th)
    return {"mappings": mappings, "unmapped_targets": unmapped}
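
Both the AI path and the fallback return the same structure, so the step that consumes it can be written once. The sketch below shows one way a mapping result could be applied to source rows while dropping low-confidence entries. The apply_mapping helper and the 0.5 default threshold are assumptions for illustration, not the project's conversion_engine code.

```python
def apply_mapping(mapping_result, source_headers, source_rows, target_headers,
                  min_confidence=0.5):
    """Build target rows from source rows using a column-mapping result."""
    src_index = {h: i for i, h in enumerate(source_headers)}

    # target column -> source column index, keeping only confident mappings
    col_for_target = {}
    for m in mapping_result["mappings"]:
        if m["confidence"] >= min_confidence and m["source_col"] in src_index:
            col_for_target[m["target_col"]] = src_index[m["source_col"]]

    target_rows = []
    for row in source_rows:
        # Unmapped target columns get an empty string for now
        target_rows.append([
            row[col_for_target[t]] if t in col_for_target else ""
            for t in target_headers
        ])
    return target_rows


mapping = {
    "mappings": [
        {"source_col": "姓名", "target_col": "Name", "confidence": 0.95},
        {"source_col": "备注", "target_col": "Phone", "confidence": 0.3},
    ],
    "unmapped_targets": ["Department"],
}
converted = apply_mapping(
    mapping,
    source_headers=["姓名", "备注"],
    source_rows=[["张三", "n/a"]],
    target_headers=["Name", "Phone", "Department"],
)
```

With a threshold of 0.5 the fallback's fixed confidence of 0.5 still passes, while the doubtful 0.3 mapping in the example is ignored and its target cell stays empty.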

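AI data generation

For columns with no corresponding data in the source sheet (unmapped_targets), the AI is called to generate values from context.

Batching is the key: the API is called once per batch of 10 rows. If too many rows are sent at once, the prompt becomes very long, the quality of the AI's output drops, and the request is more likely to time out.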

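BATCH_SIZE = 10
all_results = []  # generated values for every row, in source order
# Send the source rows to the API in batches of BATCH_SIZE
for i in range(0, len(source_data_rows), BATCH_SIZE):
    batch = source_data_rows[i:i + BATCH_SIZE]
    results = _generate_batch(target_headers, batch, mapping_info, unmapped_cols)
    all_results.extend(results)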

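The prompt passes the existing mappings, the columns to generate, and the current batch of source data to the AI together, so that it can infer values from context. For example, if the source sheet has "Employee ID = 1001" and "Name = Zhang San" and the target sheet needs "Department", the AI infers a plausible department value from the employee ID and name.

A generation failure does not block the whole process: the cells are simply filled with empty strings, so the conversion can still finish.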
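
A minimal sketch of batching combined with the empty-string fallback might look like this. The generate_missing_values function, the injected generate_batch callable, and the row-count check are assumptions for illustration; the real project calls the DeepSeek API at that point.

```python
def generate_missing_values(rows, unmapped_cols, generate_batch, batch_size=10):
    """Generate values for unmapped columns batch by batch.

    generate_batch is any callable taking (batch, unmapped_cols) and returning
    one dict per row; here it stands in for the real API call.
    """
    all_results = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        try:
            results = generate_batch(batch, unmapped_cols)
            if len(results) != len(batch):
                raise ValueError("row count mismatch")
        except Exception:
            # Fallback: empty strings keep every row aligned with its source row
            results = [{col: "" for col in unmapped_cols} for _ in batch]
        all_results.extend(results)
    return all_results


def flaky_generator(batch, cols):
    """Fake generator that fails for any batch starting after row 10."""
    if batch[0]["id"] > 10:
        raise TimeoutError("simulated API timeout")
    return [{col: "Sales" for col in cols} for _ in batch]


rows = [{"id": n} for n in range(1, 16)]  # 15 rows: one batch of 10, one of 5
values = generate_missing_values(rows, ["Department"], flaky_generator)
```

Because the fallback is applied per batch, a single timeout only blanks the rows in that batch and leaves earlier results untouched.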

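Formula rewriting

The reference template may contain formulas such as =SUM(C2:D2). After conversion the number of data rows changes, so the row numbers in the formulas have to change with it.

The core idea: extract each formula from the reference template and replace the current row number with the placeholder {row} to produce a formula template. When each data row is written, {row} is replaced with the actual row number.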

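import re

def _formula_to_template(formula, source_row):
    """=SUM(A2:B2) with source_row=2 → =SUM(A{row}:B{row})"""
    def replace_ref(match):
        col_part = match.group(1)  # "$A" or "A"
        row_part = match.group(2)  # "$2" or "2"
        # A "$" pins the row (as in A$2), so such references are left unchanged
        if row_part.startswith("$"):
            return match.group(0)
        if int(row_part) == source_row:
            return f"{col_part}{{row}}"
        return match.group(0)

    # Optional "$" + column letters, then optional "$" + row digits
    pattern = r"(\$?[A-Z]+)(\$?\d+)"
    return re.sub(pattern, replace_ref, formula)

The pattern matches every reference style used in Excel formulas: A2, $A$2, $A2, and A$2. Only relative-row references to the current row are rewritten; references whose row is pinned with $ and references to other rows are left untouched. Because the match is purely textual, a function name that ends in digits, such as LOG10, also fits the pattern, although it is only rewritten when those digits equal the source row.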
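
The second half of the idea, expanding a {row} template for each output row, can be sketched as below. The fill_formula helper and the sample template are hypothetical; the source only states that {row} is replaced with the actual row number.

```python
def fill_formula(template, row):
    """Replace the {row} placeholder with a concrete row number."""
    # str.replace rather than str.format: Excel array constants such as
    # {1,2,3} contain braces that str.format would try to interpret.
    return template.replace("{row}", str(row))


template = "=SUM(C{row}:D{row})*$F$1"
formulas = [fill_formula(template, r) for r in range(2, 5)]
for f in formulas:
    print(f)
```

Plain string replacement leaves every other character of the formula alone, which matters for formulas that contain literal braces.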

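Excel parsing

excel_parser.py opens Excel files with openpyxl in data_only=False mode. The mode matters: data_only=True reads only the cached calculated values, so formulas are lost; data_only=False keeps the formula strings but cannot read the calculated values. False is used here because formula rewriting is a core feature.

The parser returns the header list, the data rows (value, formula, and type for every cell in each row), column widths, merged cell ranges, and header styles. This information supports the later data filling and format preservation.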
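
To make the data_only trade-off concrete, the sketch below builds a tiny workbook in memory and reads it back in both modes. It assumes openpyxl is installed; the parse_sheet helper and the dictionary keys are hypothetical and do not reflect the actual return format of excel_parser.py.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a small workbook in memory so the example needs no file on disk.
wb = Workbook()
ws = wb.active
ws.append(["Qty", "Price", "Total"])
ws.append([2, 5, "=A2*B2"])
ws.column_dimensions["A"].width = 20
ws.merge_cells("A4:B4")
buf = BytesIO()
wb.save(buf)
data = buf.getvalue()


def parse_sheet(raw_bytes, data_only):
    """Read headers, one formula cell, a column width, and merged ranges."""
    sheet = load_workbook(BytesIO(raw_bytes), data_only=data_only).active
    return {
        "headers": [cell.value for cell in sheet[1]],
        "total_cell": sheet["C2"].value,
        "is_formula": sheet["C2"].data_type == "f",
        "col_a_width": sheet.column_dimensions["A"].width,
        "merged": [str(rng) for rng in sheet.merged_cells.ranges],
    }


with_formulas = parse_sheet(data, data_only=False)
values_only = parse_sheet(data, data_only=True)
print(with_formulas["total_cell"], values_only["total_cell"])
```

The file here was written by openpyxl, which never evaluates formulas, so there is no cached result and the data_only=True read yields None for the Total cell. A file last saved by Excel would carry a cached value instead.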

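Project structure

excel-ai-converter/
├── excel_converter/
│   ├── app.py                 # Flask web service (4 API endpoints)
│   ├── excel_parser.py        # Excel file parsing (headers, data, formulas, styles)
│   ├── conversion_engine.py   # Conversion engine (column mapping, data filling, formula rewriting)
│   ├── ai_service.py          # AI service (column mapping and data generation via the DeepSeek API)
│   ├── requirements.txt       # Python dependencies
│   ├── templates/index.html   # Frontend page
│   ├── static/app.js          # Frontend interaction logic
│   ├── static/style.css       # Styles
│   ├── uploads/               # Uploaded file storage
│   └── output/                # Conversion result storage
└── README.md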

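Summary and next steps

The core problem Excel AI Converter solves is conversion between two Excel sheets that differ in format but hold related data. The AI is responsible for understanding the semantic relationships between columns; the code is responsible for moving data and handling formulas.

Current limitations:

Only a single sheet is supported; multi-sheet workbooks have to be split by hand
API calls add latency, so converting large data sets (several thousand rows or more) is slow
The quality of AI-generated values for missing columns depends on how rich the context is; with too little data they may be inaccurate

Next steps:

Multi-sheet support
Batch file conversion
Support for local models (Ollama), removing the dependence on a cloud API

Project: github.com/xiaodangjia105/excel-ai-converter

If you find it useful, a Star would be appreciated.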
