Self-Hosted Deployment of RAGFlow's Pretrained Models

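Deploying the det.onnx model from the RAGFlow codebase (typically an object-detection or document-structure model, such as a layout-analysis model) to Volcengine relies on the ONNX Runtime inference framework together with Volcengine's compute resources and services. The steps are as follows.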

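I. Confirm model characteristics and dependencies

det.onnx is a pretrained model in ONNX format. Before deploying it, confirm the following:

Function: in RAGFlow, det.onnx is a detection model used during document parsing, giving the later RAG stages structured input. It is treated here as a layout-analysis model that finds text blocks, tables, and figures; confirm this against your RAGFlow version, since the DeepDoc module loads det.onnx as its OCR text detector and keeps layout analysis in a separate model.
Dependencies: onnxruntime (or onnxruntime-gpu, recommended for GPU acceleration), OpenCV for image processing, and Python 3.8 or later.
Input and output: the input is image data, for example a tensor of shape (1, 3, H, W); the output is the coordinates and classes of the detected regions.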
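Because the input and output layout drives every later step, it can help to read it from the model instead of assuming it. The following is a minimal sketch: `describe_io` and `dynamic_axes` are hypothetical helper names, and the stand-in session only imitates the `get_inputs()` / `get_outputs()` interface of `onnxruntime.InferenceSession` so the example runs without the model file.

```python
from types import SimpleNamespace


def describe_io(session):
    """Return name, shape and element type for every model input and output.

    `session` is expected to behave like onnxruntime.InferenceSession: it must
    offer get_inputs() and get_outputs(), whose items carry .name, .shape, .type.
    """
    def summarize(nodes):
        return [{"name": n.name, "shape": list(n.shape), "type": n.type} for n in nodes]

    return {
        "inputs": summarize(session.get_inputs()),
        "outputs": summarize(session.get_outputs()),
    }


def dynamic_axes(shape):
    """Indices of dimensions that are not fixed integers (symbolic names or None)."""
    return [i for i, dim in enumerate(shape) if not isinstance(dim, int)]


# With the real model (assumes onnxruntime is installed and det.onnx is present):
#   import onnxruntime as ort
#   session = ort.InferenceSession("det.onnx", providers=["CPUExecutionProvider"])
#   info = describe_io(session)

# Stand-in session with made-up names and shapes, for illustration only.
fake_session = SimpleNamespace(
    get_inputs=lambda: [
        SimpleNamespace(name="x", shape=["N", 3, "H", "W"], type="tensor(float)")
    ],
    get_outputs=lambda: [
        SimpleNamespace(name="y", shape=["N", 1, "H", "W"], type="tensor(float)")
    ],
)
info = describe_io(fake_session)
print(info["inputs"][0])
print("dynamic axes:", dynamic_axes(info["inputs"][0]["shape"]))
```

If the height and width turn out to be dynamic, the service has to choose its own resize target, and the number of outputs shows whether a boxes/scores/classes post-processing scheme applies at all.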

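II. Deployment steps on Volcengine

1. Choose compute resources

Pick a Volcengine instance that fits the model's inference load:

Recommended instance: a GPU compute ECS instance (for example ecs.gni3cg.5xlarge with one NVIDIA A10 GPU, which has 24 GB of GPU memory), which is ample for a lightweight ONNX model. Confirm the exact specification in the Volcengine instance catalog.
System image: Ubuntu 20.04, which makes it easy to install the GPU driver and the dependencies.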

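2. Configure the environment

After logging in to the instance, install the base dependencies and ONNX Runtime:

bash

# Update the system
sudo apt update && sudo apt install -y python3-pip python3-dev libgl1-mesa-glx

# Install ONNX Runtime (GPU build; it must match the installed CUDA and cuDNN)
# nvidia-smi shows the highest CUDA version the driver supports; assume 11.7 here
pip3 install onnxruntime-gpu==1.14.1  # 1.14.1 targets CUDA 11.x and also needs cuDNN 8

# Install the other dependencies (image processing, API service)
pip3 install opencv-python fastapi uvicorn numpy pillow

For CPU inference, install the non-GPU package instead: pip3 install onnxruntime==1.14.1.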
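Whether the GPU build is usable depends on what ONNX Runtime finds at run time, so it is worth checking before starting the service. The sketch below assumes a hypothetical helper, `pick_providers`, that keeps only the execution providers the installation reports; the commented lines show where the real `onnxruntime` calls would go.

```python
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers this installation offers, in preference order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} is available; got {list(available)}")
    return chosen


# Real usage (assumes onnxruntime or onnxruntime-gpu is installed):
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("det.onnx", providers=providers)
#   print(session.get_providers())  # providers the session actually ended up with

# Illustrative inputs that mimic a GPU install and a CPU-only install.
print(pick_providers(["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]))
print(pick_providers(["CPUExecutionProvider"]))
```

If the session reports only the CPU provider on a GPU instance, a CUDA or cuDNN mismatch is a likely cause.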

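3. Upload the model and code

Getting the model: extract det.onnx from the models/ directory of the RAGFlow repository, or fetch it as follows:

bash

# Clone the RAGFlow repository to obtain the model
git clone https://github.com/infiniflow/ragflow.git
cp ragflow/models/det.onnx ./  # assumes the model lives at this path; adjust if your checkout stores it elsewhere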

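Model storage: if the model is large, upload it to Volcengine object storage (TOS) first, then download it to the instance with the tosutil tool:

bash

# Install the TOS client
wget https://tos-tools.tos-cn-beijing.volces.com/tosutil/v1.7.2/linux-amd64/tosutil
chmod +x tosutil
# Download the model from TOS (AK/SK must be configured first)
./tosutil cp tos://your-bucket/det.onnx ./det.onnx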
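After copying the model through object storage, one optional safeguard is to compare its checksum with the one computed on the source machine. This is a sketch using only the standard library; `sha256_of` and `verify_model` are hypothetical helper names, and the temporary file stands in for det.onnx.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large model never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_hex):
    """True when the file's SHA-256 matches the digest recorded before upload."""
    return sha256_of(path) == expected_hex.lower()


# Demo with a throwaway file standing in for the downloaded det.onnx.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"not a real model")
    demo_path = tmp.name

expected = hashlib.sha256(b"not a real model").hexdigest()
print(verify_model(demo_path, expected))  # prints True
os.remove(demo_path)
```

In practice the expected digest would be recorded where the model was first obtained and checked on the instance before the service loads the file.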

4. Write the inference service

Wrap the det.onnx model with FastAPI to expose an HTTP endpoint for external callers. Create det_server.py:

python

import io

import cv2
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()

# Load the ONNX model
model_path = "./det.onnx"
# Prefer the GPU and fall back to the CPU; for CPU-only inference pass ["CPUExecutionProvider"]
session = ort.InferenceSession(
    model_path,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# Read the model's input and output metadata
input_name = session.get_inputs()[0].name
input_shape = session.get_inputs()[0].shape  # e.g. (1, 3, 640, 640); dynamic axes appear as strings or None
output_names = [out.name for out in session.get_outputs()]

# Fall back to 640 (an assumed default) when the model declares a dynamic height or width
input_h = input_shape[2] if isinstance(input_shape[2], int) else 640
input_w = input_shape[3] if isinstance(input_shape[3], int) else 640

def preprocess(image: Image.Image) -> np.ndarray:
    """Preprocess the image: resize, normalize, etc. (must match training)."""
    # Force three channels so RGBA or grayscale uploads do not break the conversion
    img = cv2.cvtColor(np.array(image.convert("RGB")), cv2.COLOR_RGB2BGR)
    img = cv2.resize(img, (input_w, input_h))  # cv2.resize takes (width, height)
    img = img.transpose(2, 0, 1)  # HWC -> CHW
    img = img.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    img = np.expand_dims(img, axis=0)  # add the batch dimension
    return img

def postprocess(outputs: list) -> list:
    """Parse the outputs into boxes and classes (adjust to the model's real output format)."""
    # Example only: assumes the outputs are [boxes, scores, classes]
    # reshape keeps a single detection iterable, which squeeze() would flatten
    boxes = np.asarray(outputs[0]).reshape(-1, 4).tolist()
    scores = np.asarray(outputs[1]).reshape(-1).tolist()
    classes = np.asarray(outputs[2]).reshape(-1).tolist()
    # Drop low-confidence results
    results = []
    for box, score, cls in zip(boxes, scores, classes):
        if score > 0.5:
            results.append({
                "box": box,  # [x1, y1, x2, y2]
                "score": score,
                "class": int(cls)
            })
    return results

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    """Accept an image file and return the detection results."""
    # Read the image
    image = Image.open(io.BytesIO(await file.read()))
    # Preprocess
    input_data = preprocess(image)
    # Run inference
    outputs = session.run(output_names, {input_name: input_data})
    # Postprocess
    results = postprocess(outputs)
    return {"results": results}

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)  # bind to all interfaces on port 8000

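Note: preprocess and postprocess must be adapted to the preprocessing that det.onnx was actually trained with (normalization parameters, input size, and so on) and to the outputs the model actually produces; the boxes/scores/classes layout above is only an example.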
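One adjustment that is easy to overlook: the service resizes every image to the model's input size, so any boxes it returns are in resized-image pixels. A minimal sketch of mapping them back follows, assuming a plain resize without padding and boxes given as [x1, y1, x2, y2] in model-input pixels; `scale_box` is a hypothetical helper, not part of RAGFlow.

```python
def scale_box(box, model_size, original_size):
    """Map [x1, y1, x2, y2] from model-input pixels back to original-image pixels.

    model_size and original_size are (width, height) tuples. Assumes the image
    was stretched to the model size with no padding or letterboxing.
    """
    model_w, model_h = model_size
    orig_w, orig_h = original_size
    scale_x = orig_w / model_w
    scale_y = orig_h / model_h

    def clamp(value, upper):
        # Keep coordinates inside the original image
        return max(0.0, min(float(upper), value))

    x1, y1, x2, y2 = box
    return [
        clamp(x1 * scale_x, orig_w),
        clamp(y1 * scale_y, orig_h),
        clamp(x2 * scale_x, orig_w),
        clamp(y2 * scale_y, orig_h),
    ]


# A box found on a 640x640 input, mapped back to a 1280x960 page image.
print(scale_box([100, 200, 300, 400], (640, 640), (1280, 960)))
# prints [200.0, 300.0, 600.0, 600.0]
```

To use it, the endpoint would record `image.size` before preprocessing and apply the mapping to each box in postprocess. A model that was trained with letterbox padding would need the padding offsets removed first.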

5. Start the service and configure networking

Start the service:

bash

# Start the service in the background (logs go to det.log)
nohup python3 det_server.py > det.log 2>&1 &

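Verify the service:

Run curl http://localhost:8000/health on the instance; a response of {"status":"healthy"} means the service started successfully.
Open network access:
1. In the Volcengine ECS console, find the security group attached to the instance and add an inbound rule:
  - Port: 8000
  - Source: 0.0.0.0/0 (or, preferably, a specific IP range).
2. For public access, bind an Elastic IP (EIP) to the instance.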
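Model loading can take a few seconds, so a single curl right after startup may fail even though the service is fine. The health check from the verification step can be scripted with retries; the sketch below uses only the standard library, and `wait_until_healthy` with its injectable `fetch` and `sleep` parameters is a hypothetical helper written for this example.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=3.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(url, attempts=10, delay=1.0, fetch=fetch_json, sleep=time.sleep):
    """Poll the health endpoint; return the attempt number that succeeded, else None."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url).get("status") == "healthy":
                return attempt
        except (OSError, ValueError):
            pass  # connection refused, timeout, HTTP error, or a non-JSON reply
        if attempt < attempts:
            sleep(delay)
    return None


# Real usage on the instance:
#   wait_until_healthy("http://localhost:8000/health")

# Demo with a stub that fails once and then reports healthy.
replies = iter([ConnectionRefusedError("not up yet"), {"status": "healthy"}])


def stub_fetch(url):
    reply = next(replies)
    if isinstance(reply, Exception):
        raise reply
    return reply


print(wait_until_healthy("http://localhost:8000/health", fetch=stub_fetch, sleep=lambda s: None))
# prints 2
```

The same function works against the public address once the security group rule and the EIP are in place.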

6. Test the model service

Call the endpoint through the public IP:

bash

# Send an image file to the detection endpoint
curl -X POST "http://[EIP]:8000/detect" -F "file=@test_image.jpg"

Example response (in this example, class 0 denotes a text block and class 1 a table):

json

{
  "results": [
    {"box": [100, 200, 300, 400], "score": 0.92, "class": 0},
    {"box": [500, 300, 700, 500], "score": 0.88, "class": 1}
  ]
}

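III. Optimization and extension

Containerized deployment:

Package the service as a Docker image, push it to the Volcengine Container Registry (CR), and deploy it through the Volcengine Kubernetes Engine (VKE), which supports elastic scaling. requirements.txt is assumed to list the Python packages installed in step 2.

dockerfile

# Dockerfile
FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04
# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y python3-pip libgl1-mesa-glx
WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY det.onnx det_server.py ./
EXPOSE 8000
CMD ["python3", "det_server.py"]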

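Performance optimization:
- Enable ONNX Runtime's CUDAExecutionProvider to accelerate inference;
- Process images in batches (change the endpoint to accept several images per request);
- Tune the input size (shrink it as far as accuracy allows to gain speed).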
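The batching idea above can be sketched as a small helper that groups uploaded images into fixed-size batches. `batched` is a hypothetical name, and feeding a batch to the model assumes that det.onnx accepts a batch dimension larger than 1, which should be checked against the model's input shape first.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Inside the service, each batch could be stacked and run in one call
# (assumes the model's first axis is dynamic):
#   for batch in batched(images, 8):
#       inputs = np.concatenate([preprocess(img) for img in batch], axis=0)
#       outputs = session.run(output_names, {input_name: inputs})

print(list(batched(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2)))
# prints [['a.jpg', 'b.jpg'], ['c.jpg', 'd.jpg'], ['e.jpg']]
```

With batched inference the post-processing also has to split the outputs per image, which the single-image postprocess shown earlier does not do.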
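Monitoring and operations:
- Integrate the Volcengine log service (TLS) to collect inference logs;
- Use Cloud Monitor to track GPU utilization, memory usage, and similar metrics, and set up alerts.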
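Alongside the cloud-side metrics, GPU utilization and memory can also be sampled on the instance itself and written to the service log. The sketch below parses the CSV output of `nvidia-smi`; `parse_gpu_stats` and `read_gpu_stats` are hypothetical helpers, and the sample line is made up for illustration.

```python
import subprocess

# Query GPU utilization (%) and memory (MiB) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(text):
    """Turn nvidia-smi CSV lines into one dict per GPU."""
    stats = []
    for line in text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats


def read_gpu_stats():
    """Run nvidia-smi on the instance and parse its output (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


# Made-up sample line, in the format the query above produces.
sample = "37, 2048, 23028\n"
print(parse_gpu_stats(sample))
```

The parser assumes numeric fields; a GPU that reports a field as unsupported would need extra handling.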

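IV. Notes

Version compatibility: make sure the onnxruntime-gpu version matches the CUDA version on the instance (see the ONNX Runtime documentation).
Model input format: the input size and channel order (RGB or BGR) used in preprocessing must match what det.onnx expects exactly; otherwise the inference results will be wrong.
Access control: in production, configure authentication (for example an API key) through the Volcengine API Gateway to restrict access to the endpoint.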
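Gateway authentication can be complemented by a check inside the service itself. This is a minimal sketch rather than the gateway configuration: `is_authorized`, the `x-api-key` header name, and the `DET_API_KEY` environment variable are assumptions made for this example.

```python
import hmac


def is_authorized(headers, expected_key, header_name="x-api-key"):
    """Check the API key header with a constant-time comparison."""
    if not expected_key:
        return False  # refuse everything when no key is configured
    supplied = ""
    for name, value in headers.items():
        if name.lower() == header_name:
            supplied = value
            break
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


# Possible wiring into det_server.py (assumes FastAPI, as used by the service):
#   import os
#   from fastapi import Depends, Header, HTTPException
#
#   def require_key(x_api_key: str = Header(default="")):
#       if not is_authorized({"x-api-key": x_api_key}, os.environ.get("DET_API_KEY", "")):
#           raise HTTPException(status_code=401, detail="invalid API key")
#
#   @app.post("/detect", dependencies=[Depends(require_key)])

print(is_authorized({"X-API-Key": "s3cret"}, "s3cret"))
print(is_authorized({"X-API-Key": "wrong"}, "s3cret"))
```

Using `hmac.compare_digest` avoids leaking how much of the key matched through response timing.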

With these steps, det.onnx can be deployed as a stable service on Volcengine and provide backend inference for RAGFlow's document-parsing pipeline.