[MinerU] API Service and Router Service

Guide to the MinerU API Service and Router Load Balancing

1. API Service Architecture Overview

MinerU provides the mineru-api command, which starts a FastAPI service with two processing modes, synchronous and asynchronous:

Client (CLI / HTTP / SDK)
    ↓
┌─────────────────────────────────────────────────────┐
│                 mineru-api (FastAPI)                │
│                                                     │
│  POST /file_parse        ← sync (waits for result)  │
│  POST /tasks             ← async (returns at once)  │
│  GET  /tasks/{id}        ← poll task status         │
│  GET  /tasks/{id}/result ← fetch the result         │
│  GET  /health            ← health check             │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │  AsyncTaskManager                             │  │
│  │  - task queue (asyncio.Queue)                 │  │
│  │  - concurrency semaphore (Semaphore=3)        │  │
│  │  - dispatcher loop                            │  │
│  │  - automatic cleanup (24-hour retention)      │  │
│  └───────────────────────────────────────────────┘  │
│                          ↓                          │
│  ┌───────────────────────────────────────────────┐  │
│  │  Document parsing engine                      │  │
│  │  pipeline / hybrid / vlm                      │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘

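2. Synchronous vs. Asynchronous Mode

2.1 Synchronous mode `POST /file_parse`

Behavior: the request blocks until parsing finishes, and the result comes back in the same response.

Client → POST /file_parse (upload file + parameters)
            ↓
         Server parses the document (request blocks)
            ↓
         Parsing finishes
            ↓
Client ← result returned (JSON by default, a ZIP archive when response_format_zip=true)

Suitable for:

- Quick parsing of a single small file
- Simple script or tool integrations
- Workloads that do not need to handle several requests at once

Limitations:

- The request holds the connection open the whole time, so large files may time out
- Concurrency is capped by MINERU_API_MAX_CONCURRENT_REQUESTS (default 3)
- If 3 tasks are already being processed, a new request waits on the semaphore until a slot frees up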

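Example calls:

# curl: call the synchronous endpoint (bash)
curl -X POST http://localhost:8000/file_parse \
  -F "files=@input.pdf" \
  -F "backend=hybrid-auto-engine" \
  -F "parse_method=auto" \
  -F "formula_enable=true" \
  -F "table_enable=true" \
  -F "response_format_zip=true" \
  -o result.zip

# Python: the same call with requests
import requests

with open("input.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/file_parse",
        files={"files": f},  # the form field is named "files" (see section 13.2)
        data={
            "backend": "hybrid-auto-engine",
            "parse_method": "auto",
            "formula_enable": "true",
            "table_enable": "true",
            "response_format_zip": "true",  # without this the response is JSON
        },
    )
resp.raise_for_status()  # stop here on an HTTP error instead of saving an error body

with open("result.zip", "wb") as out:
    out.write(resp.content)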

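2.2 Asynchronous mode `POST /tasks`

Behavior: the request returns a task_id immediately; the client then polls for status and fetches the result.

Client → POST /tasks (upload file + parameters)
            ↓
Client ← immediate response { task_id, status: "pending", queued_ahead: 2 }
            ↓
Client → GET /tasks/{task_id} (poll status)
            ↓
Client ← { status: "processing", queued_ahead: 0 }
            ↓
Client → GET /tasks/{task_id} (keep polling)
            ↓
Client ← { status: "completed" }
            ↓
Client → GET /tasks/{task_id}/result (fetch result)
            ↓
Client ← ZIP result archive

Suitable for:

- Large files or large batches
- Submitting many tasks at once
- Web application or backend service integrations
- Interfaces that need to show a task's position in the queue

Task lifecycle:

pending → queued → processing → completed
                       ↓
                    failed

Task states:

| State | Meaning |
| --- | --- |
| `pending` | Submitted, waiting to enter the queue |
| `queued` | In the queue, waiting to be processed |
| `processing` | Being parsed |
| `completed` | Parsing finished, result ready for download |
| `failed` | Parsing failed |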

Example call:

import sys
import time

import requests

API_URL = "http://localhost:8000"

# 1. Submit the asynchronous task
with open("input.pdf", "rb") as f:
    resp = requests.post(
        f"{API_URL}/tasks",
        files={"files": f},  # the form field is named "files" (see section 13.2)
        data={
            "backend": "hybrid-auto-engine",
            "parse_method": "auto",
        },
    )
resp.raise_for_status()

task_info = resp.json()
task_id = task_info["task_id"]
print(f"Task submitted: {task_id}, queued ahead: {task_info['queued_ahead']}")

# 2. Poll until the task finishes
while True:
    status_resp = requests.get(f"{API_URL}/tasks/{task_id}")
    status_data = status_resp.json()

    print(f"Status: {status_data['status']}, queued ahead: {status_data.get('queued_ahead', 0)}")

    if status_data["status"] == "completed":
        break
    elif status_data["status"] == "failed":
        print(f"Task failed: {status_data.get('error')}")
        sys.exit(1)

    time.sleep(2)  # poll every 2 seconds

# 3. Download the result
result_resp = requests.get(f"{API_URL}/tasks/{task_id}/result")
with open("result.zip", "wb") as f:
    f.write(result_resp.content)
print("Result downloaded")
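A polling loop like the one above runs forever if a task never reaches a final state. The following is a minimal sketch of a bounded wait with exponential backoff. The helper name `wait_for_task` and its parameters are hypothetical and not part of MinerU; the status fetcher is injected so the logic can be exercised without a running server.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    fetch_status: Callable[[], Dict],
    timeout_seconds: float = 600.0,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll fetch_status() until the task completes, fails, or the timeout expires."""
    deadline = clock() + timeout_seconds
    delay = initial_delay
    while True:
        status = fetch_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"task failed: {status.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"task not finished after {timeout_seconds} seconds")
        sleep(min(delay, max_delay))
        delay = min(delay * 2, max_delay)  # back off: 1 s, 2 s, 4 s, ... capped at max_delay


# Demo with canned statuses instead of a live server.
canned = iter([{"status": "pending"}, {"status": "processing"}, {"status": "completed"}])
final = wait_for_task(lambda: next(canned), sleep=lambda seconds: None)
print(final["status"])
```

Against a real service, the fetcher could be `lambda: requests.get(f"{API_URL}/tasks/{task_id}").json()`, reusing the names from the example above.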

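2.3 Synchronous vs. asynchronous at a glance

| Aspect | Synchronous `/file_parse` | Asynchronous `/tasks` |
| --- | --- | --- |
| Response | Blocks until parsing finishes | Returns a task_id immediately |
| File size | Small files, quick jobs | Files of any size |
| Concurrency | Limited by the semaphore | Tasks wait in a queue |
| Progress | None | Queue position and status can be queried |
| Timeout risk | Large files may time out | No long-lived request that can time out |
| Getting the result | Returned directly | Downloaded after polling |
| Recommended for | Single files, quick checks | Production use, batch processing |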

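3. Concurrency Limits and Task Management

3.1 Concurrency control

The API service caps concurrency with an asyncio.Semaphore (simplified excerpt):

import asyncio

# One permit per concurrent request (MINERU_API_MAX_CONCURRENT_REQUESTS)
_request_semaphore = asyncio.Semaphore(max_concurrent_requests)

# Each task acquires a permit before it runs
async with _request_semaphore:
    await self._run_task(task)

Defaults:

- Linux/Windows: 3 concurrent requests
- macOS: 1 concurrent request (because of macOS multiprocessing limitations)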

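3.2 Task queue

Requests beyond the concurrency limit wait in the queue:

Request 1 → [processing]  ← semaphore permit 1
Request 2 → [processing]  ← semaphore permit 2
Request 3 → [processing]  ← semaphore permit 3
Request 4 → [queued]      ← waiting for a permit to be released
Request 5 → [queued]      ← waiting for a permit to be released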
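The queueing behavior can be reproduced with a small standalone simulation. This is only a sketch of the semaphore pattern, not MinerU's task manager: `simulate` and `handle` are hypothetical names, and `asyncio.sleep` stands in for document parsing.

```python
import asyncio


async def simulate(num_requests: int, max_concurrent: int, work_seconds: float = 0.01) -> int:
    """Run num_requests fake tasks and return the peak number running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    running = 0
    peak = 0

    async def handle(request_id: int) -> None:
        nonlocal running, peak
        async with semaphore:  # waits here while all permits are taken
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(work_seconds)  # stand-in for parsing a document
            running -= 1

    await asyncio.gather(*(handle(i) for i in range(num_requests)))
    return peak


peak = asyncio.run(simulate(num_requests=5, max_concurrent=3))
print(f"peak concurrency with 5 requests and 3 permits: {peak}")
```

With 5 requests and 3 permits, requests 4 and 5 only start once an earlier request releases its permit, so the peak never exceeds 3.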

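3.3 Automatic task cleanup

| Parameter | Default | Meaning |
| --- | --- | --- |
| `MINERU_API_TASK_RETENTION_SECONDS` | 86400 (24 hours) | How long finished tasks are kept |
| `MINERU_API_TASK_CLEANUP_INTERVAL_SECONDS` | 300 (5 minutes) | How often the cleanup loop runs |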
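How the two settings interact can be illustrated with a retention sweep. This is a hedged sketch rather than MinerU's implementation: `TaskRecord` and `purge_expired` are hypothetical names, and treating failed tasks like completed ones is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskRecord:
    task_id: str
    status: str
    completed_at: Optional[float] = None  # Unix timestamp, set when the task finishes


def purge_expired(
    tasks: Dict[str, TaskRecord],
    retention_seconds: float,
    now: Optional[float] = None,
) -> List[str]:
    """Delete finished tasks older than retention_seconds and return their ids."""
    now = time.time() if now is None else now
    expired = [
        task_id
        for task_id, task in tasks.items()
        if task.status in ("completed", "failed")  # assumption: failed tasks expire too
        and task.completed_at is not None
        and now - task.completed_at > retention_seconds
    ]
    for task_id in expired:
        del tasks[task_id]
    return expired


# A service would call purge_expired once per cleanup interval (300 s by default).
store = {
    "old": TaskRecord("old", "completed", completed_at=0.0),
    "fresh": TaskRecord("fresh", "completed", completed_at=90_000.0),
    "running": TaskRecord("running", "processing"),
}
print(purge_expired(store, retention_seconds=86_400, now=100_000.0))
```

Only the task that finished more than 86400 seconds before `now` is removed; unfinished tasks are never touched.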

3.4 Related environment variables

# Concurrency control
MINERU_API_MAX_CONCURRENT_REQUESTS=3     # maximum concurrent requests

# Task lifecycle
MINERU_API_TASK_RETENTION_SECONDS=86400  # task retention time (seconds)
MINERU_API_TASK_CLEANUP_INTERVAL_SECONDS=300  # cleanup interval (seconds)

# Processing
MINERU_PROCESSING_WINDOW_SIZE=64         # maximum pages per batch in pipeline mode

# Service configuration
MINERU_API_OUTPUT_ROOT=./output          # output directory
MINERU_API_ENABLE_FASTAPI_DOCS=1         # enable the Swagger UI at /docs
MINERU_API_DISABLE_ACCESS_LOG=0          # disable the access log
MINERU_LOCAL_API_STARTUP_TIMEOUT_SECONDS=300  # API startup timeout (seconds)

# Security (for public deployments)
MINERU_API_PUBLIC_BIND_EXPOSED=0         # set when binding to 0.0.0.0
MINERU_API_ALLOW_PUBLIC_HTTP_CLIENT=0    # allow the http-client backend on a public bind

3.5 Health check

# Check the service status
curl http://localhost:8000/health

Example response:

{
  "status": "healthy",
  "version": "3.1.0",
  "protocol_version": 1,
  "queued_tasks": 2,
  "processing_tasks": 3,
  "completed_tasks": 15,
  "failed_tasks": 1,
  "max_concurrent_requests": 3,
  "processing_window_size": 64,
  "task_retention_seconds": 86400,
  "task_cleanup_interval_seconds": 300
}

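4. Router Service (Load Balancing)

4.1 What the Router is

mineru-router is a reverse proxy and load balancer that distributes requests across multiple mineru-api backend instances (the worker ports below are illustrative; they must differ from the Router's own port):

                      ┌─ mineru-api (GPU 0, :8003)  ← processing: 2, queued: 0
                      │
Client → router  ─────┼─ mineru-api (GPU 1, :8004)  ← processing: 1, queued: 1  ✓ selected
          :8002       │
                      └─ mineru-api (GPU 2, :8005)  ← processing: 3, queued: 2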

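4.2 Load-balancing algorithm

The Router uses a least-loaded algorithm:

1. Keep only the healthy backend instances
2. Compute a load score for each instance:
   score = (queued + processing + pending_assignments) / max_concurrent
3. Sort by score and pick the instance with the lowest score
4. On equal scores, prefer the instance with fewer pending_assignments, then a local instance (source=local)
5. If instances are still tied, pick one at random

Load score formula:

score = (queued tasks + processing tasks + pending assignments) / maximum concurrency

Example:

| Instance | Queued | Processing | Pending | Max concurrency | score | Selected? |
| --- | --- | --- | --- | --- | --- | --- |
| GPU-0 | 0 | 2 | 0 | 3 | 0.67 | |
| GPU-1 | 1 | 1 | 0 | 3 | 0.67 | ✓ (tied with GPU-0; both are local, so the tie is broken at random) |
| GPU-2 | 2 | 3 | 1 | 3 | 2.00 | |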
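The scores in the example table can be recomputed with a few lines. This is an illustrative sketch; the `WorkerLoad` class is a hypothetical stand-in for whatever structure the Router keeps per backend.

```python
from dataclasses import dataclass


@dataclass
class WorkerLoad:
    name: str
    queued: int
    processing: int
    pending_assignments: int
    max_concurrent: int

    def score(self) -> float:
        """Load score: outstanding work relative to the worker's capacity."""
        return (self.queued + self.processing + self.pending_assignments) / self.max_concurrent


workers = [
    WorkerLoad("GPU-0", queued=0, processing=2, pending_assignments=0, max_concurrent=3),
    WorkerLoad("GPU-1", queued=1, processing=1, pending_assignments=0, max_concurrent=3),
    WorkerLoad("GPU-2", queued=2, processing=3, pending_assignments=1, max_concurrent=3),
]

for worker in workers:
    print(f"{worker.name}: {worker.score():.2f}")

lowest = min(worker.score() for worker in workers)
tied = [worker.name for worker in workers if worker.score() == lowest]
print("lowest-score candidates:", tied)
```

GPU-0 and GPU-1 both carry two outstanding tasks, so they share the lowest score and the choice between them falls to the tie-break rules.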

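4.3 Health monitoring

The Router checks the health of every backend every 2 seconds:

# Health-check interval (seconds)
MINERU_ROUTER_WORKER_REFRESH_INTERVAL_SECONDS = 2

# After 5 consecutive failures a worker is marked unhealthy; local workers are restarted automatically
WORKER_HEALTH_FAILURE_RESTART_THRESHOLD = 5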

4.4 Two Router deployment modes

Mode A: automatically managed local Workers (recommended)

The Router starts and manages mineru-api workers on the local GPUs:

# Detect all GPUs and start one worker per GPU
mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto

# Use GPU 0 and GPU 1 only
mineru-router --host 0.0.0.0 --port 8002 --local-gpus "0,1"

# CPU only (no GPU)
mineru-router --host 0.0.0.0 --port 8002 --local-gpus none

The Router manages the full Worker lifecycle:

- Startup: create and start one worker per GPU in the list
- Runtime: run health checks and restart workers that fail
- Shutdown: stop all workers gracefully

Mode B: upstream API proxy

The Router connects to existing remote mineru-api instances:

# Start no local workers; proxy to remote APIs
mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus none \
  --upstream-url http://192.168.1.101:8000 \
  --upstream-url http://192.168.1.102:8000 \
  --upstream-url http://192.168.1.103:8000

Hybrid mode

Manage local Workers and proxy remote APIs at the same time:


mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus "0,1" \
  --upstream-url http://192.168.1.101:8000

4.5 Router environment variables

# Worker management
MINERU_ROUTER_LOCAL_GPUS=auto            # GPU assignment: "auto" / "none" / "0,1,2"
MINERU_ROUTER_WORKER_HOST=127.0.0.1      # address the workers listen on
MINERU_ROUTER_ENABLE_VLM_PRELOAD=0       # preload the VLM model at startup
MINERU_ROUTER_WORKER_ARGS_JSON='{"key":"val"}'  # extra CLI arguments for the workers

# Upstream APIs
MINERU_ROUTER_UPSTREAM_URLS_JSON='["http://api1:8000","http://api2:8000"]'

# Health checks
MINERU_ROUTER_WORKER_REFRESH_INTERVAL_SECONDS=2  # check interval (seconds)

4.6 Router health check

curl http://localhost:8002/health

The response includes the detailed status of every worker:

{
  "status": "healthy",
  "servers": [
    {
      "server_id": "local-gpu-0",
      "base_url": "http://127.0.0.1:8003",
      "source": "local",
      "healthy": true,
      "queued_tasks": 1,
      "processing_tasks": 2,
      "max_concurrent_requests": 3
    },
    {
      "server_id": "local-gpu-1",
      "base_url": "http://127.0.0.1:8002",
      "source": "local",
      "healthy": true,
      "queued_tasks": 0,
      "processing_tasks": 1,
      "max_concurrent_requests": 3
    },
    {
      "server_id": "remote-192.168.1.101:8000",
      "base_url": "http://192.168.1.101:8000",
      "source": "upstream",
      "healthy": true,
      "queued_tasks": 0,
      "processing_tasks": 0,
      "max_concurrent_requests": 3
    }
  ]
}

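5. Deployment Examples

5.1 Single machine, single GPU

# Start the API service
mineru-api --host 0.0.0.0 --port 8000

# Call it (response_format_zip=true makes the response a ZIP archive)
curl -X POST http://localhost:8000/file_parse -F "files=@doc.pdf" -F "response_format_zip=true" -o result.zip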

5.2 Single machine, multiple GPUs (managed by the Router)

# The Router starts one API worker on each GPU
mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto

# Clients only need to talk to the Router
curl -X POST http://localhost:8002/file_parse -F "files=@doc.pdf" -F "response_format_zip=true" -o result.zip

Architecture (worker ports are illustrative):

Client → Router(:8002)
            ├─ Worker-0 (:8003, GPU 0, concurrency=3)
            ├─ Worker-1 (:8004, GPU 1, concurrency=3)
            └─ Worker-2 (:8005, GPU 2, concurrency=3)
                                 total concurrency = 9

5.3 Multi-machine cluster

Machine A (192.168.1.101):

mineru-api --host 0.0.0.0 --port 8000

Machine B (192.168.1.102):

mineru-api --host 0.0.0.0 --port 8000

Machine C (Router entry point):

mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus none \
  --upstream-url http://192.168.1.101:8000 \
  --upstream-url http://192.168.1.102:8000

Architecture:

                      ┌─ Machine A (:8000, GPU, concurrency=3)
Client → Router (C) ──┤
                      └─ Machine B (:8000, GPU, concurrency=3)
                                         total concurrency = 6

5.4 Docker Compose

# docker-compose.yaml (excerpt)
services:
  # Single API service
  mineru-api:
    image: mineru:latest
    profiles: ["api"]
    ports:
      - "8000:8000"
    environment:
      MINERU_MODEL_SOURCE: local
    command: mineru-api --host 0.0.0.0 --port 8000

  # Router (manages local Workers automatically)
  mineru-router:
    image: mineru:latest
    profiles: ["router"]
    ports:
      - "8002:8002"
    environment:
      MINERU_MODEL_SOURCE: local
    command: >
      mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto

# Start the API
docker-compose --profile api up -d

# Start the Router
docker-compose --profile router up -d

5.5 OpenAI-compatible service

# Start the OpenAI-compatible API (bash)
mineru-openai-server --host 0.0.0.0 --port 30000

# Call it from Python (compatible with the OpenAI SDK)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")
response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Parse this document"}],
    extra_body={"file_url": "http://your-file-server/doc.pdf"},
)

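6. API Integration in the CLI

6.1 Automatically started local API

When --api-url is not given, the CLI starts a temporary API service on its own:

# The CLI starts a local API and shuts it down once parsing is done
mineru -p input.pdf -o output/

# What happens:
# 1. Start a temporary mineru-api (localhost:random_port)
# 2. Wait for the API to become ready (up to 300 seconds)
# 3. Submit the task through the API
# 4. Wait for completion and download the result
# 5. Shut down the temporary API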

6.2 Using a remote API

# Connect to an existing API service
mineru -p input.pdf -o output/ --api-url http://192.168.1.100:8000

# Connect to a Router
mineru -p input.pdf -o output/ --api-url http://192.168.1.100:8002

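6.3 How the CLI dispatches tasks

| Backend | How tasks are split | Notes |
| --- | --- | --- |
| `pipeline` | In batches of `processing_window_size` pages | One task per 64 pages, which makes good use of batch processing |
| `hybrid-auto-engine` | One task per file | The VLM processes whole files |
| `vlm-auto-engine` | One task per file | The VLM processes whole files |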

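7. Performance Tuning

7.1 Single machine

# Raise concurrency (for high-end GPUs)
MINERU_API_MAX_CONCURRENT_REQUESTS=6

# Shrink the window size (for GPUs with little memory)
MINERU_PROCESSING_WINDOW_SIZE=32

# Lower the batch ratio (for 8 GB of GPU memory)
MINERU_HYBRID_BATCH_RATIO=2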

7.2 Cluster

# Each API instance
MINERU_API_MAX_CONCURRENT_REQUESTS=3   # adjust to the GPU memory available

# Router
--local-gpus auto                       # use all local GPUs
--upstream-url http://...               # add more backends to scale throughput

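7.3 Throughput estimate

Assume each API instance runs 3 tasks concurrently and a task takes 30 seconds on average:

| Deployment | API instances | Total concurrency | Theoretical throughput (documents/minute) |
| --- | --- | --- | --- |
| Single API | 1 | 3 | 6 |
| Router + 2 GPUs | 2 | 6 | 12 |
| Router + 4 GPUs | 4 | 12 | 24 |
| Router + 2 machines | 2 | 6 | 12 |
| Router + 4 machines | 4 | 12 | 24 |

Actual throughput depends on document complexity, GPU model, and the backend chosen.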
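The table follows from one formula: total concurrency times the number of tasks each slot finishes per minute. A small sketch, with a hypothetical helper name, makes the arithmetic explicit:

```python
def docs_per_minute(instances: int, concurrency_per_instance: int, avg_task_seconds: float) -> float:
    """Theoretical throughput: every concurrency slot finishes 60 / avg_task_seconds tasks per minute."""
    total_concurrency = instances * concurrency_per_instance
    return total_concurrency * 60 / avg_task_seconds


for label, instances in [("Single API", 1), ("Router + 2 GPUs", 2), ("Router + 4 GPUs", 4)]:
    print(f"{label}: {docs_per_minute(instances, 3, 30):.0f} documents/minute")
```

This is an upper bound: it assumes every slot stays busy and that concurrent tasks on one GPU do not slow each other down.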

8. Router Load Balancing in Detail

8.1 Router core architecture

                     ┌─────────────────────────────────────────┐
                     │         Router (mineru-router)          │
                     │                                         │
Client requests ───→ │  ┌───────────────────────────────────┐  │
  POST /tasks        │  │  WorkerPool (load balancer)       │  │
  POST /file_parse   │  │                                   │  │
                     │  │  ① Keep healthy workers           │  │
                     │  │  ② Compute load scores            │  │
                     │  │  ③ Pick the lowest-score worker   │  │
                     │  │  ④ Forward the request            │  │
                     │  │  ⑤ Track the result and return it │  │
                     │  └───────────────────────────────────┘  │
                     │                                         │
                     │  ┌───────────────────────────────────┐  │
                     │  │  Health monitor (every 2 s)       │  │
                     │  │  GET /health on each worker       │  │
                     │  │  5 failures in a row → restart    │  │
                     │  └───────────────────────────────────┘  │
                     │                                         │
                     └───────────┬───────────────┬─────────────┘
                                 │               │
                     ┌───────────▼───┐   ┌───────▼───────┐
                     │ Worker-0      │   │ Worker-1      │
                     │ GPU 0 :8003   │   │ GPU 1 :8004   │
                     │ concurrency=3 │   │ concurrency=3 │
                     └───────────────┘   └───────────────┘

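8.2 The load-balancing algorithm in detail

The Router uses a least-loaded algorithm, which works as follows:

Step 1: compute the load score

score = (queued_tasks + processing_tasks + pending_assignments) / max_concurrent_requests

| Field | Meaning |
| --- | --- |
| `queued_tasks` | Number of tasks waiting in the queue |
| `processing_tasks` | Number of tasks being processed |
| `pending_assignments` | Number of tasks assigned to the Worker that have not arrived yet (this one is key) |
| `max_concurrent_requests` | The Worker's maximum concurrency |

Why pending_assignments matters: it stops the Router from sending several tasks in quick succession to the same Worker that merely looks idle. The slot is reserved while the task is still being submitted.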
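The effect of this reservation can be shown with a short simulation. It is a sketch under stated assumptions, not the Router's code: the `Worker` class and `assign_burst` are hypothetical, and the burst is assumed to arrive before any health refresh updates `queued_tasks` or `processing_tasks`.

```python
import random
from dataclasses import dataclass


@dataclass
class Worker:
    server_id: str
    queued_tasks: int = 0
    processing_tasks: int = 0
    pending_assignments: int = 0
    max_concurrent_requests: int = 3

    def score(self, count_pending: bool = True) -> float:
        pending = self.pending_assignments if count_pending else 0
        return (self.queued_tasks + self.processing_tasks + pending) / self.max_concurrent_requests


def assign_burst(workers, num_tasks, count_pending, rng):
    """Assign num_tasks back to back, before the health counters are refreshed."""
    placements = {w.server_id: 0 for w in workers}
    for _ in range(num_tasks):
        candidates = list(workers)
        rng.shuffle(candidates)  # random order among equal scores
        candidates.sort(key=lambda w: w.score(count_pending))
        chosen = candidates[0]
        chosen.pending_assignments += 1  # reserve the slot immediately
        placements[chosen.server_id] += 1
    return placements


def make_workers():
    # Two busy workers and one idle worker.
    return [Worker("gpu-0", processing_tasks=2), Worker("gpu-1", processing_tasks=2), Worker("gpu-2")]


rng = random.Random(0)
print("ignoring pending:", assign_burst(make_workers(), 4, count_pending=False, rng=rng))
print("counting pending:", assign_burst(make_workers(), 4, count_pending=True, rng=rng))
```

When pending assignments are ignored, all four tasks land on the idle worker because its score never changes during the burst. When they are counted, its score rises with each assignment and the later tasks spill over to the other workers.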

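Step 2: selection

import random

# Selection algorithm (simplified; `servers` is the Router's list of known Workers)
async def acquire_submission_server(excluded_server_ids=None):
    excluded = set(excluded_server_ids or [])

    # 1. Keep only healthy Workers that are not excluded
    candidates = [s for s in servers if s.healthy and s.id not in excluded]
    if not candidates:
        return None  # no healthy Worker available

    # 2. Shuffle, so that Workers with identical sort keys are chosen fairly
    random.shuffle(candidates)

    # 3. Sort by load score, then by pending_assignments,
    #    then local Workers before remote ones (the sort is stable)
    candidates.sort(key=lambda s: (
        s.score(),                   # primary key: load score
        s.pending_assignments,       # secondary key: pending assignments
        0 if s.source == "local" else 1,  # local first
    ))

    # 4. Take the first one (lowest load)
    selected = candidates[0]
    selected.pending_assignments += 1  # reserve a slot
    return selected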

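Step 3: a worked example

Assume 3 Workers, each with max_concurrent_requests set to 3:

| Worker | queued | processing | pending | score | Selected? |
| --- | --- | --- | --- | --- | --- |
| GPU-0 | 1 | 2 | 0 | (1+2+0)/3 = 1.00 | |
| GPU-1 | 0 | 1 | 0 | (0+1+0)/3 = 0.33 | ✓ lowest score |
| GPU-2 | 2 | 3 | 1 | (2+3+1)/3 = 2.00 | |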

8.3 Worker health monitoring and automatic recovery

                  Health monitor loop (every 2 s)
                           │
                    ┌──────▼──────┐
                    │ GET /health │ ←── each Worker
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              ↓            ↓            ↓
          success       failure      failure
        healthy=True   failures+1   failures >= 5
        reset counter  log error         │
                                         ↓
                                 ┌───────────────┐
                                 │ Local Worker? │
                                 └───────┬───────┘
                                  Yes ↙     ↘ No
                          ┌────────────┐   mark unhealthy
                          │ auto       │
                          │ restart()  │
                          └────────────┘

- Health-check interval: 2 seconds (MINERU_ROUTER_WORKER_REFRESH_INTERVAL_SECONDS)
- Automatic restart threshold: 5 consecutive failures (WORKER_HEALTH_FAILURE_RESTART_THRESHOLD)
- Only local Workers are restarted automatically; remote Workers are only marked unhealthy
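The decision flow above can be written as a small state machine. This is a sketch of the described behavior, not the Router's implementation: `WorkerHealth` and `record` are hypothetical names, and resetting the failure counter after a restart is an assumption.

```python
from dataclasses import dataclass

WORKER_HEALTH_FAILURE_RESTART_THRESHOLD = 5  # consecutive failures before action is taken


@dataclass
class WorkerHealth:
    server_id: str
    source: str  # "local" or "upstream"
    healthy: bool = True
    consecutive_failures: int = 0
    restarts: int = 0

    def record(self, check_ok: bool) -> str:
        """Record one health-check outcome and return the action taken."""
        if check_ok:
            self.healthy = True
            self.consecutive_failures = 0
            return "ok"
        self.consecutive_failures += 1
        if self.consecutive_failures < WORKER_HEALTH_FAILURE_RESTART_THRESHOLD:
            return "failure-recorded"
        self.healthy = False
        if self.source == "local":
            self.restarts += 1             # stand-in for actually restarting the process
            self.consecutive_failures = 0  # assumption: counting starts over after a restart
            return "restart"
        return "marked-unhealthy"


local = WorkerHealth("local-gpu-0", "local")
remote = WorkerHealth("remote-192.168.1.101:8000", "upstream")
print([local.record(False) for _ in range(5)])
print([remote.record(False) for _ in range(5)])
```

A remote Worker that starts answering again is marked healthy by the next successful check, since a success always resets the counter.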

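8.4 How the Router forwards requests

The Router is transparent to clients: it exposes the same endpoints as mineru-api.

Client → POST /tasks (submit a task)
    ↓
Router receives the request
    ↓
acquire_submission_server() picks the best Worker
    ↓
Router forwards the request to that Worker's POST /tasks
    ↓
Router obtains the task_id
    ↓
Router returns the task_id to the client (which does not know there are several Workers behind it)

Subsequent polling:
Client → GET /tasks/{task_id}
    ↓
Router queries all Workers to find the one that holds this task_id
    ↓
Router forwards the request and returns the status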

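9. Multi-GPU, Multi-Service Scheduling in Practice

9.1 Scenario 1: a single server with 4 GPUs

Goal: one machine with four RTX 4090 cards, maximum throughput.

# One command; the Router manages 4 Workers automatically
mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto

The Router will automatically:

- Detect the 4 GPUs (0, 1, 2, 3)
- Start one mineru-api Worker per GPU
- Bind each Worker to its own port (illustrated here as :8003, :8004, :8005, :8006; the Router itself listens on :8002)
- Keep monitoring their health

Architecture:

Client → Router(:8002)
            ├─ Worker-0 (:8003, GPU 0, concurrency=3) ← chosen whenever its score is lowest
            ├─ Worker-1 (:8004, GPU 1, concurrency=3)
            ├─ Worker-2 (:8005, GPU 2, concurrency=3)
            └─ Worker-3 (:8006, GPU 3, concurrency=3)
                                  total concurrency = 12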

Calling from a client:

# The CLI connects straight to the Router
mineru -p /data/docs/ -o /data/output/ --api-url http://localhost:8002

# Or call over HTTP
curl -X POST http://localhost:8002/tasks \
  -F "files=@big_doc.pdf" \
  -F "backend=hybrid-auto-engine"

Restricting the Router to specific GPUs:

# Use GPU 0 and GPU 2 only
mineru-router --host 0.0.0.0 --port 8002 --local-gpus "0,2"

9.2 Scenario 2: multi-machine cluster

Goal: 3 servers, 2 GPUs on each parsing server, deployed as a cluster.

Server A (192.168.1.101, 2 GPUs):

mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus auto \
  --port-range-start 9000

Server B (192.168.1.102, 2 GPUs):

mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus auto \
  --port-range-start 9000

Server C (entry Router, does no parsing itself):

mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus none \
  --upstream-url http://192.168.1.101:8002 \
  --upstream-url http://192.168.1.102:8002

Architecture:
                        ┌─ Server A Router ─┬─ Worker(:9000, GPU0, concurrency=3)
                        │                   └─ Worker(:9001, GPU1, concurrency=3)
Client → entry Router ──┤
                        │                   ┌─ Worker(:9000, GPU0, concurrency=3)
                        └─ Server B Router ─┤
                                            └─ Worker(:9001, GPU1, concurrency=3)
                                                             total concurrency = 12

9.3 Scenario 3: hybrid deployment (local GPUs + remote APIs)

mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus "0,1" \
  --upstream-url http://192.168.1.100:8000 \
  --upstream-url http://192.168.1.200:8000

Client → Router(:8002)
            ├─ local-gpu-0 (:8003, local GPU 0)
            ├─ local-gpu-1 (:8004, local GPU 1)
            ├─ remote-192.168.1.100 (:8000, remote API)
            └─ remote-192.168.1.200 (:8000, remote API)

Priority rule: when load scores are equal, local Workers are preferred over remote ones.

9.4 Tuning concurrency

# Concurrency per Worker (adjust to GPU memory)
MINERU_API_MAX_CONCURRENT_REQUESTS=3   # default 3, suits 16 GB of GPU memory
MINERU_API_MAX_CONCURRENT_REQUESTS=6   # worth trying with 32 GB
MINERU_API_MAX_CONCURRENT_REQUESTS=1   # lower it for 8 GB

# Workers × concurrency = total cluster concurrency
# Example: 4 Workers × 3 = 12 tasks processed at the same time

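10. Handling a Very Large 150-Page Document

10.1 The problem

A user uploads a 150-page PDF. The challenges:

| Challenge | Description |
| --- | --- |
| GPU memory pressure | Loading the images of all 150 pages at once would run out of memory (OOM) |
| Processing time | Single-threaded processing may take tens of minutes |
| Blocked concurrency | One concurrency slot stays occupied for a long time |
| Merging results | Results from several segments have to be combined |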

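10.2 MinerU's built-in batching

MinerU already ships with a scheme for very large documents, so users do not have to split files by hand:

150-page PDF
    ↓
CLI/API splits it into windows automatically
    ↓
    processing_window_size = 64 (default)
    ↓
    Batch 1: pages 1-64
    Batch 2: pages 65-128
    Batch 3: pages 129-150
    ↓
Each batch is processed independently (and batched again internally)
    ↓
Intermediate results are streamed to disk as each batch finishes
    ↓
Output is merged once all batches are done

Key logic in the source:

# Pipeline backend: splits automatically into batches of 64 pages
# Documents are sorted by page count in descending order (large documents get their own tasks first)
# Several small documents can be merged into one batch (best-fit bin packing)

# VLM/Hybrid backends: each file is processed on its own
# Internally it is still processed window by window, using processing_window_size
# Each window is streamed to disk as soon as it is finished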
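The window boundaries shown for the 150-page example follow from simple arithmetic. The helper below is a hypothetical illustration of that split, not MinerU's own function:

```python
from typing import List, Tuple


def split_into_windows(total_pages: int, window_size: int = 64) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges of at most window_size pages."""
    if total_pages < 0 or window_size <= 0:
        raise ValueError("total_pages must be >= 0 and window_size must be > 0")
    return [
        (start, min(start + window_size - 1, total_pages))
        for start in range(1, total_pages + 1, window_size)
    ]


print(split_into_windows(150))      # default window of 64 pages
print(split_into_windows(150, 32))  # smaller window, more batches
```

A 150-page document yields three windows at the default size and five windows at a size of 32, with the last window holding the remaining 22 pages in both cases.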

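10.3 How each backend handles it

Pipeline backend

150-page PDF
    ↓
Split into batches of processing_window_size=64
    ↓
Task 1: pages 1-64    → Worker-GPU0 (each Worker runs model inference concurrently inside)
Task 2: pages 65-128  → Worker-GPU1
Task 3: pages 129-150 → Worker-GPU2
    ↓
The three tasks can run in parallel (if several Workers are available)
    ↓
Results are merged

The multi-GPU speedup is substantial: the 150 pages become 3 tasks, which run fully in parallel on a 3-GPU Router.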

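Hybrid/VLM backends

150-page PDF
    ↓
Submitted as a single task
    ↓
Split internally into windows of processing_window_size=64
    ↓
Window 1: pages 1-64    → VLM inference
    ↓ streamed to disk when done
Window 2: pages 65-128  → VLM inference
    ↓ streamed to disk when done
Window 3: pages 129-150 → VLM inference
    ↓ streamed to disk when done
    ↓
Output is merged

Note: with a VLM backend, a single 150-page document occupies one Worker only and cannot be parallelized across Workers. Streaming each window to disk keeps memory use bounded, which avoids the OOM caused by loading the whole document at once.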

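10.4 Hands-on

Option A: single machine, single GPU (simplest)

# Just run it; no extra configuration needed
# The built-in windowing takes care of the large document
mineru -p big_doc_150pages.pdf -o output/ -b hybrid-auto-engine

# If GPU memory is short, lower the batch ratio
export MINERU_HYBRID_BATCH_RATIO=2
mineru -p big_doc_150pages.pdf -o output/

# Use the Pipeline backend (lower GPU memory requirements)
mineru -p big_doc_150pages.pdf -o output/ -b pipeline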

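Option B: single machine, multiple GPUs (recommended)

# Start the Router (manages the GPUs automatically)
mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto

# Use the Pipeline backend (tasks are split and run in parallel across GPUs)
mineru -p big_doc_150pages.pdf -o output/ \
  --api-url http://localhost:8002 \
  -b pipeline

# With Pipeline, the 150 pages become 3 tasks:
#   Task 1: 64 pages → GPU 0
#   Task 2: 64 pages → GPU 1
#   Task 3: 22 pages → GPU 2
# All three tasks run in parallel.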

Option C: adjust the processing window size

# Smaller window → more parallel tasks → faster completion (needs more Workers)
export MINERU_PROCESSING_WINDOW_SIZE=32

# ceil(150 / 32) = 5 tasks
# On a 4-GPU Router:
#   Task 1: 32 pages → GPU 0
#   Task 2: 32 pages → GPU 1
#   Task 3: 32 pages → GPU 2
#   Task 4: 32 pages → GPU 3
#   Task 5: 22 pages → waits for a free GPU

Option D: asynchronous processing through the API

import sys
import time

import requests

API_URL = "http://localhost:8002"  # Router address

# Submit the asynchronous task
with open("big_doc_150pages.pdf", "rb") as f:
    resp = requests.post(
        f"{API_URL}/tasks",
        files={"files": f},  # the form field is named "files" (see section 13.2)
        data={"backend": "pipeline"},  # Pipeline can run in parallel across GPUs
    )
resp.raise_for_status()

task_id = resp.json()["task_id"]
print(f"Task submitted: {task_id}")

# Poll until done (each poll is a short request, so long processing does not hold a connection open)
while True:
    status = requests.get(f"{API_URL}/tasks/{task_id}").json()
    print(f"Status: {status['status']}")

    if status["status"] == "completed":
        break
    elif status["status"] == "failed":
        print(f"Failed: {status.get('error')}")
        sys.exit(1)

    time.sleep(5)

# Download the result
result = requests.get(f"{API_URL}/tasks/{task_id}/result")
with open("result.zip", "wb") as f:
    f.write(result.content)

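10.5 Performance comparison

Assume an average of 1 second per page (Pipeline), a 4-GPU Router, and one task running on each GPU at a time:

| Setup | processing_window_size | Tasks | Parallelism | Estimated total time |
| --- | --- | --- | --- | --- |
| Single API, Pipeline | 64 | 3 | 1 | ~150 s |
| 4-GPU Router, Pipeline | 64 | 3 | 3 | ~64 s (64/64/22 in parallel, bounded by the longest task) |
| 4-GPU Router, Pipeline | 32 | 5 | 4 | ~54 s (four 32-page tasks in parallel, then the 22-page task) |
| Single API, Hybrid | 64 | 1 | 1 | ~150 s (the VLM processes windows serially) |
| 4-GPU Router, Hybrid | 64 | 1 | 1 | ~150 s (a single document cannot be split across GPUs) |

Conclusions:

- Pipeline with a multi-GPU Router is the fastest way to process a large document, because the work can be spread across GPUs
- Hybrid/VLM gives higher accuracy, but a single large document cannot be parallelized across GPUs; its strength is accuracy, not speed
- Reducing processing_window_size increases parallelism, but it only pays off when enough Workers are available to run the extra tasks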
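Estimates like these can be sanity-checked with a simple scheduling model. The sketch below assumes each GPU works on one task at a time and that processing time is proportional to page count; it ignores per-Worker concurrency and scheduling overhead, and `estimate_makespan` is a hypothetical helper.

```python
import heapq
from typing import List


def estimate_makespan(task_pages: List[int], num_gpus: int, seconds_per_page: float = 1.0) -> float:
    """Greedy list scheduling: each task goes to the GPU that becomes free first."""
    free_at = [0.0] * num_gpus  # time at which each GPU becomes free
    heapq.heapify(free_at)
    for pages in task_pages:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + pages * seconds_per_page)
    return max(free_at)


scenarios = {
    "window 64, 1 GPU": ([64, 64, 22], 1),
    "window 64, 4 GPUs": ([64, 64, 22], 4),
    "window 32, 4 GPUs": ([32, 32, 32, 32, 22], 4),
}
for label, (pages, gpus) in scenarios.items():
    print(f"{label}: ~{estimate_makespan(pages, gpus):.0f} s")
```

In this model the total time is set by the GPU that finishes last, so a fifth task that has to wait for a free GPU adds its full duration to the end of the run.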

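10.6 Best practices for very large documents

# 1. Start a multi-GPU Router
mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto

# 2. Shrink the window to increase parallelism
export MINERU_PROCESSING_WINDOW_SIZE=32

# 3. Use the Pipeline backend (supports parallel tasks)
mineru -p huge_doc.pdf -o output/ \
  --api-url http://localhost:8002 \
  -b pipeline

# 4. If accuracy matters more than speed, use Hybrid
mineru -p huge_doc.pdf -o output/ \
  --api-url http://localhost:8002 \
  -b hybrid-auto-engine
# Note: with Hybrid a single document cannot be split across GPUs, but streaming to disk keeps memory use bounded

Memory safety: however large the document, MinerU processes it window by window and never loads all pages into memory at once, so document size alone does not cause an OOM.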

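11. mineru-api Command Reference

11.1 Start command

mineru-api [options]

| Option | Default | Description |
| --- | --- | --- |
| `--host` | `127.0.0.1` | Address the service listens on |
| `--port` | `8000` | Port the service listens on |
| `--reload` | off | Development mode, reloads automatically on code changes |
| `--allow-public-http-client` | off | Allow the http-client backend when bound to 0.0.0.0 (SSRF risk) |
| `--enable-vlm-preload` | `false` | Preload the VLM model at startup (faster first request, slower startup) |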

11.2 Startup examples

# Minimal start (local access only)
mineru-api

# Open to other hosts
mineru-api --host 0.0.0.0 --port 8000

# Preload the model at startup (suits production)
mineru-api --host 0.0.0.0 --port 8000 --enable-vlm-preload true

# Development mode
mineru-api --host 0.0.0.0 --port 8000 --reload

11.3 Related environment variables

# Concurrency control
MINERU_API_MAX_CONCURRENT_REQUESTS=3    # maximum concurrent requests (default 3, 1 on macOS)

# Task management
MINERU_API_TASK_RETENTION_SECONDS=86400     # task retention time (default 24 hours)
MINERU_API_TASK_CLEANUP_INTERVAL_SECONDS=300 # cleanup interval (default 5 minutes)

# Processing
MINERU_PROCESSING_WINDOW_SIZE=64        # maximum pages per Pipeline batch
MINERU_API_OUTPUT_ROOT=./output         # output directory

# Feature switches
MINERU_API_ENABLE_FASTAPI_DOCS=1        # enable the Swagger UI (/docs)
MINERU_API_DISABLE_ACCESS_LOG=0         # disable the access log

# Security
MINERU_API_PUBLIC_BIND_EXPOSED=0        # set when binding to 0.0.0.0
MINERU_API_ALLOW_PUBLIC_HTTP_CLIENT=0   # allow the http-client backend on a public bind

# Startup
MINERU_API_ENABLE_VLM_PRELOAD=0         # preload the VLM
MINERU_API_SHUTDOWN_ON_STDIN_EOF=0      # shut down on stdin EOF (for containers)

# Logging
MINERU_LOG_LEVEL=INFO                   # log level: DEBUG/INFO/WARNING/ERROR

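12. mineru-router Command Reference

12.1 Start command

mineru-router [options]

| Option | Default | Description |
| --- | --- | --- |
| `--host` | `127.0.0.1` | Address the Router listens on |
| `--port` | `8002` | Port the Router listens on |
| `--local-gpus` | `auto` | Local GPU Workers: `auto` (detect automatically) / `none` (start none) / `0,1,2` (explicit list) |
| `--upstream-url` | none | Remote API address; may be given several times |
| `--worker-host` | `127.0.0.1` | Address the local Workers listen on |
| `--enable-vlm-preload` | `false` | Preload the VLM model in the Workers |
| `--allow-public-http-client` | off | Allow the http-client backend |
| `--reload` | off | Development mode |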

12.2 Startup examples

# 1. Detect all GPUs and start one Worker per GPU
mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto

# 2. Use GPU 0 and GPU 1 only
mineru-router --host 0.0.0.0 --port 8002 --local-gpus "0,1"

# 3. Start no local Workers; proxy to remote APIs
mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus none \
  --upstream-url http://192.168.1.101:8000 \
  --upstream-url http://192.168.1.102:8000

# 4. Hybrid mode: local GPUs + a remote API
mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus "0,1" \
  --upstream-url http://192.168.1.200:8000

# 5. CPU only (no GPU)
mineru-router --host 0.0.0.0 --port 8002 --local-gpus none

# 6. Preload the VLM (lower latency on the first request)
mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus auto \
  --enable-vlm-preload true

12.3 Related environment variables

# Worker management
MINERU_ROUTER_LOCAL_GPUS=auto               # same as --local-gpus
MINERU_ROUTER_WORKER_HOST=127.0.0.1          # same as --worker-host
MINERU_ROUTER_ENABLE_VLM_PRELOAD=0           # same as --enable-vlm-preload
MINERU_ROUTER_WORKER_ARGS_JSON='[]'          # extra CLI arguments for the Workers

# Upstream APIs
MINERU_ROUTER_UPSTREAM_URLS_JSON='[]'        # same as --upstream-url (JSON array)

# Health monitoring
MINERU_ROUTER_WORKER_REFRESH_INTERVAL_SECONDS=2  # health-check interval (seconds)

# Security
MINERU_ROUTER_PUBLIC_BIND_EXPOSED=0
MINERU_ROUTER_ALLOW_PUBLIC_HTTP_CLIENT=0

# GPU visibility (standard CUDA variable)
CUDA_VISIBLE_DEVICES=0,1,2,3                 # restrict the visible GPUs

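13. Complete API Reference

13.1 Endpoint overview

| Method | Path | Description | Supported by Router |
| --- | --- | --- | --- |
| `GET` | `/health` | Health check | ✓ |
| `POST` | `/file_parse` | Synchronous parsing (waits for completion) | ✓ |
| `POST` | `/tasks` | Asynchronous submission (returns a task_id immediately) | ✓ |
| `GET` | `/tasks/{task_id}` | Query task status | ✓ |
| `GET` | `/tasks/{task_id}/result` | Fetch the task result | ✓ |

The Router exposes the same endpoints as the API, so clients do not need to know whether they are talking to an API instance or a Router.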

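13.2 POST /file_parse: synchronous parsing

Blocks until parsing finishes and returns the result directly.

Request parameters (multipart/form-data)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `files` | File[] | required | PDF, image, DOCX, PPTX, or XLSX files |
| `backend` | string | `hybrid-auto-engine` | Parsing backend |
| `parse_method` | string | `auto` | Parsing method: auto/txt/ocr |
| `lang_list` | string[] | `["ch"]` | OCR languages (ch/en/korean/japan, etc.) |
| `formula_enable` | boolean | `true` | Enable formula recognition |
| `table_enable` | boolean | `true` | Enable table recognition |
| `start_page_id` | integer | `0` | First page (0-based) |
| `end_page_id` | integer | `99999` | Last page |
| `return_md` | boolean | `true` | Return Markdown |
| `return_middle_json` | boolean | `false` | Return the intermediate JSON |
| `return_content_list` | boolean | `false` | Return the content list JSON |
| `return_model_output` | boolean | `false` | Return the model output JSON |
| `return_images` | boolean | `false` | Return the extracted images |
| `return_original_file` | boolean | `false` | Include the original file in the ZIP |
| `response_format_zip` | boolean | `false` | Return the result as a ZIP file |
| `server_url` | string | none | Address of a remote OpenAI-compatible service (for the http-client backend) |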

Example calls

# Minimal synchronous parse
curl -X POST http://localhost:8000/file_parse \
  -F "files=@doc.pdf" \
  -o result.json

# Return a ZIP archive (Markdown + images)
curl -X POST http://localhost:8000/file_parse \
  -F "files=@doc.pdf" \
  -F "response_format_zip=true" \
  -F "return_images=true" \
  -F "return_original_file=true" \
  -o result.zip

# Use the Pipeline backend
curl -X POST http://localhost:8000/file_parse \
  -F "files=@doc.pdf" \
  -F "backend=pipeline" \
  -o result.json

# Parse pages 3-10 only (page ids are 0-based)
curl -X POST http://localhost:8000/file_parse \
  -F "files=@doc.pdf" \
  -F "start_page_id=2" \
  -F "end_page_id=9" \
  -o result.json

# Call through the Router (which forwards to the best Worker)
curl -X POST http://localhost:8002/file_parse \
  -F "files=@doc.pdf" \
  -F "backend=hybrid-auto-engine" \
  -o result.json

Response format

JSON mode (response_format_zip=false):

{
  "task_id": "uuid",
  "status": "completed",
  "backend": "hybrid-auto-engine",
  "file_names": ["doc.pdf"],
  "version": "3.1.0",
  "results": {
    "doc.pdf": {
      "md_content": "# Document title\n\nBody text...",
      "middle_json": "{...}",
      "content_list": "[...]",
      "model_output": "{...}",
      "images": {
        "image_0.jpg": "data:image/jpeg;base64,..."
      }
    }
  }
}
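In JSON mode the `images` map carries each image as a base64 data URI, as the sample response shows. Below is a minimal sketch for turning those entries back into raw bytes; the helper names are hypothetical, and the code assumes every entry follows the `data:<mime>;base64,<payload>` shape from the sample.

```python
import base64
from typing import Dict, Tuple


def decode_data_uri(data_uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def extract_images(response_json: dict) -> Dict[str, bytes]:
    """Collect every image in a JSON-mode response, keyed by 'file_name/image_name'."""
    images = {}
    for file_name, result in response_json.get("results", {}).items():
        for image_name, data_uri in (result.get("images") or {}).items():
            _, raw = decode_data_uri(data_uri)
            images[f"{file_name}/{image_name}"] = raw
    return images


# Demo with a tiny fabricated payload instead of a real server response.
sample = {
    "results": {
        "doc.pdf": {
            "images": {
                "image_0.jpg": "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode("ascii")
            }
        }
    }
}
decoded = extract_images(sample)
print({name: len(raw) for name, raw in decoded.items()})
```

Each decoded value can be written to disk with `open(path, "wb")`; for many or large images the ZIP mode avoids the base64 overhead.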

ZIP mode (response_format_zip=true):

The response is a ZIP file download, with these response headers:

X-MinerU-Task-Id: the task ID
X-MinerU-Task-Status: the task status
Content-Disposition: attachment; filename="{task_id}.zip"

13.3 POST /tasks: asynchronous submission

Returns a task_id immediately; the result is obtained by polling.

Request parameters

Identical to /file_parse.

Example calls

# Submit an asynchronous task
curl -X POST http://localhost:8000/tasks \
  -F "files=@doc.pdf" \
  -F "backend=hybrid-auto-engine" \
  -F "return_md=true"

# Submit through the Router
curl -X POST http://localhost:8002/tasks \
  -F "files=@doc.pdf" \
  -F "backend=pipeline"

Response (202 Accepted)

{
  "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "pending",
  "backend": "hybrid-auto-engine",
  "file_names": ["doc.pdf"],
  "created_at": "2026-05-03T10:00:00Z",
  "started_at": null,
  "completed_at": null,
  "error": null,
  "status_url": "http://localhost:8000/tasks/a1b2c3d4-...",
  "result_url": "http://localhost:8000/tasks/a1b2c3d4-.../result",
  "queued_ahead": 2,
  "message": "Task submitted successfully"
}

13.4 GET /tasks/{task_id}: query task status

curl http://localhost:8000/tasks/{task_id}

# Query through the Router
curl http://localhost:8002/tasks/{task_id}

Response

{
  "task_id": "uuid",
  "status": "processing",
  "backend": "hybrid-auto-engine",
  "file_names": ["doc.pdf"],
  "created_at": "2026-05-03T10:00:00Z",
  "started_at": "2026-05-03T10:00:01Z",
  "completed_at": null,
  "error": null,
  "status_url": "http://localhost:8000/tasks/uuid",
  "result_url": "http://localhost:8000/tasks/uuid/result",
  "queued_ahead": 0
}

Status values: pending → processing → completed / failed

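13.5 GET /tasks/{task_id}/result: fetch the task result

curl http://localhost:8000/tasks/{task_id}/result -o result.zip

Response codes

| Status code | Meaning |
| --- | --- |
| `200` | Task finished, result returned |
| `202` | Task not finished yet, current status returned |
| `404` | Task does not exist |
| `409` | Task failed |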

13.6 GET /health: health check

# API health check
curl http://localhost:8000/health

# Router health check (includes the status of every Worker)
curl http://localhost:8002/health

API response

{
  "status": "healthy",
  "version": "3.1.0",
  "protocol_version": 1,
  "queued_tasks": 2,
  "processing_tasks": 3,
  "completed_tasks": 50,
  "failed_tasks": 1,
  "max_concurrent_requests": 3,
  "processing_window_size": 64,
  "task_retention_seconds": 86400,
  "task_cleanup_interval_seconds": 300
}

Router response (additionally contains the Worker list)

{
  "status": "healthy",
  "version": "3.1.0",
  "servers": [
    {
      "server_id": "local-gpu-0",
      "base_url": "http://127.0.0.1:9000",
      "source": "local",
      "healthy": true,
      "queued_tasks": 1,
      "processing_tasks": 2,
      "completed_tasks": 15,
      "failed_tasks": 0,
      "max_concurrent_requests": 3
    },
    {
      "server_id": "local-gpu-1",
      "base_url": "http://127.0.0.1:9001",
      "source": "local",
      "healthy": true,
      "queued_tasks": 0,
      "processing_tasks": 1,
      "completed_tasks": 20,
      "failed_tasks": 0,
      "max_concurrent_requests": 3
    }
  ]
}

14. Complete Usage Examples (Python)

14.1 Synchronous parsing

import requests

API_URL = "http://localhost:8000"  # or a Router: http://localhost:8002

def parse_sync(file_path, backend="hybrid-auto-engine"):
    """Synchronous parsing: wait for completion and receive the result directly."""
    with open(file_path, "rb") as f:
        resp = requests.post(
            f"{API_URL}/file_parse",
            files={"files": f},
            data={
                "backend": backend,
                "return_md": "true",
                "return_images": "true",
                "response_format_zip": "true",
            },
        )

    if resp.status_code == 200:
        task_id = resp.headers.get("X-MinerU-Task-Id")
        with open(f"result_{task_id}.zip", "wb") as out:
            out.write(resp.content)
        print(f"Result saved: result_{task_id}.zip")
    else:
        print(f"Parsing failed: {resp.status_code} {resp.text}")

# Usage
parse_sync("input.pdf")
parse_sync("input.docx", backend="pipeline")
parse_sync("input.pptx")

14.2 Asynchronous parsing (polling)

import requests
import time

API_URL = "http://localhost:8002"  # Router address

def parse_async(file_path, backend="hybrid-auto-engine"):
    """Asynchronous parsing: submit, then poll until the task finishes."""
    # 1. Submit the task
    with open(file_path, "rb") as f:
        resp = requests.post(
            f"{API_URL}/tasks",
            files={"files": f},
            data={"backend": backend, "return_md": "true"},
        )
    resp.raise_for_status()

    data = resp.json()
    task_id = data["task_id"]
    print(f"Task submitted: {task_id}, queued ahead: {data['queued_ahead']}")

    # 2. Poll until the task finishes
    while True:
        status_resp = requests.get(f"{API_URL}/tasks/{task_id}")
        status = status_resp.json()

        if status["status"] == "completed":
            break
        elif status["status"] == "failed":
            print(f"Task failed: {status.get('error')}")
            return None

        print(f"Status: {status['status']}, queued ahead: {status.get('queued_ahead', 0)}")
        time.sleep(3)

    # 3. Download the result
    result_resp = requests.get(f"{API_URL}/tasks/{task_id}/result")
    output_file = f"result_{task_id}.zip"
    with open(output_file, "wb") as f:
        f.write(result_resp.content)
    print(f"Result saved: {output_file}")
    return output_file

# Usage
parse_async("big_doc.pdf", backend="pipeline")

14.3 Batch parsing

import os
import requests
import time
from concurrent.futures import ThreadPoolExecutor

API_URL = "http://localhost:8002"  # Router address

def parse_single(file_path):
    """Submit an asynchronous task for a single file."""
    with open(file_path, "rb") as f:
        resp = requests.post(
            f"{API_URL}/tasks",
            files={"files": f},
            data={"backend": "pipeline", "return_md": "true"},
        )
    resp.raise_for_status()
    return resp.json()["task_id"], file_path

def wait_and_download(task_id, file_path):
    """Wait for the task and download its result."""
    while True:
        status = requests.get(f"{API_URL}/tasks/{task_id}").json()
        if status["status"] == "completed":
            break
        elif status["status"] == "failed":
            print(f"Failed: {file_path} - {status.get('error')}")
            return
        time.sleep(3)

    result = requests.get(f"{API_URL}/tasks/{task_id}/result")
    # Swap the extension for .zip; this also works for non-PDF inputs
    # and never overwrites the input file
    filename = os.path.splitext(file_path)[0] + ".zip"
    with open(filename, "wb") as f:
        f.write(result.content)
    print(f"Done: {file_path}")

def batch_parse(file_list):
    """Parse several files as a batch."""
    # Submit all tasks
    tasks = [parse_single(f) for f in file_list]
    print(f"Submitted {len(tasks)} tasks")

    # Wait and download concurrently; list() surfaces exceptions raised in the threads
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        list(pool.map(lambda t: wait_and_download(*t), tasks))

# Usage
batch_parse(["doc1.pdf", "doc2.pdf", "doc3.pdf", "presentation.pptx"])

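14.4 Interactive testing with the Swagger UI

After starting the API or the Router, open in a browser:

http://localhost:8000/docs    # API Swagger UI
http://localhost:8002/docs    # Router Swagger UI

All endpoints can be tested directly on the page, without writing any code.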
