FastChat Battle Evaluation (Arena)

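A detailed introduction to the FastChat project and how to use it. The goal is to quickly deploy an internal system for Elo-rated model battles, similar to LMSYS Chatbot Arena.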

🎯 1. Introduction to FastChat

📌 Repository:

👉 https://github.com/lm-sys/FastChat
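🧠 Core features:
- Serves multiple models at the same time;
- Provides a Web UI for chatting or for battle evaluation (Arena);
- Supports anonymous A/B voting plus Elo ranking;
- Works with OpenAI, LLaMA, ChatGLM, Mistral, and other models.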

🧩 2. Architecture overview

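┌──────────────┐
│ User browser │
└──────┬───────┘
       │
       ▼
┌────────────────────────────┐
│ FastChat Web UI (frontend) │ ← voting, chat interface
└──────┬─────────────────────┘
       │
┌──────▼─────┐     ┌──────────────┐
│ Controller │────▶│ Model Worker │ ← one worker per model
└────────────┘     └──────────────┘

The arrows are simplified. The controller keeps a registry of workers. The web server asks it which worker serves a given model, then sends generation requests to that worker directly.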
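The following minimal stdlib sketch shows the two controller lookups a client makes before it talks to a worker, which makes the request flow concrete. Some details are assumed from FastChat's controller and should be checked against your installed version:

- the endpoint paths `/list_models` and `/get_worker_address`;
- their JSON replies, `{"models": [...]}` and `{"address": "..."}`, where an empty address means that no worker serves the model.

The helper names and the injectable `post` parameter are illustrative only.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional

# A function that POSTs a JSON payload to a URL and returns the decoded JSON reply.
PostFn = Callable[[str, Dict], Dict]


def http_post_json(url: str, payload: Dict) -> Dict:
    """POST `payload` as JSON to `url` and decode the JSON response (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_models(controller_url: str, post: PostFn = http_post_json) -> List[str]:
    """Ask the controller which model names its registered workers serve (assumed endpoint)."""
    return post(controller_url + "/list_models", {})["models"]


def get_worker_address(controller_url: str, model: str,
                       post: PostFn = http_post_json) -> Optional[str]:
    """Ask the controller which worker serves `model`; None when no worker has it."""
    address = post(controller_url + "/get_worker_address", {"model": model})["address"]
    return address or None
```

For example, `list_models("http://localhost:21001")` lists the served model names. For an end-to-end check against a real deployment, FastChat also ships `python3 -m fastchat.serve.test_message --model-name vicuna-7b-v1.5`.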

🧪 3. Installation and running (local deployment example)

✅ 1. Install dependencies (a virtual environment is recommended)

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

# Install FastChat with the model-worker and web-UI extras
pip install -e ".[model_worker,webui]"

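If you only use API models such as OpenAI's, no local model weights or GPUs are needed. You only need the API configuration (endpoint URL and key).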

✅ 2. Start the components

(1) Start the controller (it listens on port 21001 by default):

python3 -m fastchat.serve.controller

(2) Start a model_worker (one per model)

Example with an open model from HuggingFace (Vicuna, as in FastChat's README):

python3 -m fastchat.serve.model_worker \
    --model-path lmsys/vicuna-7b-v1.5 \
    --model-names vicuna-7b-v1.5 \
    --port 21002 \
    --worker-address http://localhost:21002 \
    --controller-address http://localhost:21001

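Local models such as Llama, ChatGLM, and Mistral are supported as well. --model-path takes a HuggingFace repo ID or a local directory; see docs/model_support.md in the repo for the model families FastChat recognizes. If several workers run on the same machine, give each one its own --port and a matching --worker-address.

API models such as GPT-4 do not run in a model_worker. Instead, register them with the web server through a JSON endpoint file passed with --register-api-endpoint-file.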
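Launching several workers is mostly port bookkeeping. The sketch below assumes one machine and consecutive ports starting at 21002. It builds one model_worker command per model, using only the flags shown above. The function name `worker_commands` and the example model IDs are illustrative.

```python
import shlex


def worker_commands(models, controller="http://localhost:21001", first_port=21002, host="localhost"):
    """Build one model_worker launch command per model, each on its own port.

    `models` maps the served model name to a HuggingFace repo ID or local path.
    """
    commands = []
    for offset, (name, path) in enumerate(models.items()):
        port = first_port + offset
        args = [
            "python3", "-m", "fastchat.serve.model_worker",
            "--model-path", path,
            "--model-names", name,
            "--port", str(port),
            "--worker-address", f"http://{host}:{port}",
            "--controller-address", controller,
        ]
        # shlex.join quotes any argument that contains spaces or shell metacharacters.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for cmd in worker_commands({
        "vicuna-7b-v1.5": "lmsys/vicuna-7b-v1.5",
        "mistral-7b-instruct": "mistralai/Mistral-7B-Instruct-v0.2",
    }):
        print(cmd)
```

On a multi-GPU machine, you would typically prefix each command with `CUDA_VISIBLE_DEVICES=<gpu id>` so that each worker gets its own GPU.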

(3) Start the Web UI (with Arena)

python3 -m fastchat.serve.gradio_web_server_multi

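The Arena requires the multi-model server, gradio_web_server_multi; the plain gradio_web_server only offers single-model chat. By default the UI is served at http://localhost:7860 and includes:
- an anonymous Arena (battle) tab;
- a side-by-side Arena tab where you choose both models yourself;
- a direct Chat tab.

Add --share only if you want Gradio to create a temporary public link. Leave it off for an intranet-only deployment.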
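API models are described in the endpoint file that is passed to the web server with --register-api-endpoint-file. The sketch below writes such a file for OpenAI models. The field names (`model_name`, `api_type`, `api_base`, `api_key`, `anony_only`) are assumed from FastChat's documentation for API-based models, so compare them with docs/model_support.md in your checkout. `write_api_endpoints` is a hypothetical helper.

```python
import json
import os


def write_api_endpoints(path, models, api_base="https://api.openai.com/v1"):
    """Write an endpoint file mapping each API model name to its connection settings.

    The field names are assumptions based on FastChat's docs for API-based models.
    """
    endpoints = {}
    for name in models:
        endpoints[name] = {
            "model_name": name,
            "api_type": "openai",
            "api_base": api_base,
            # Read the key from the environment instead of hard-coding it in the script.
            "api_key": os.environ.get("OPENAI_API_KEY", ""),
            "anony_only": False,
        }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(endpoints, f, indent=2)
    return endpoints
```

To use it:

1. Run `write_api_endpoints("api_endpoints.json", ["gpt-4"])`.
2. Start the server with `python3 -m fastchat.serve.gradio_web_server_multi --register-api-endpoint-file api_endpoints.json`.

The generated file contains the key, so keep it out of version control.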

🕹️ 4. Using the Elo battle feature

✅ 1. Open the Arena page

Open this URL in the browser:

http://localhost:7860/?arena=true

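You will see an anonymous A/B battle interface:
- you type a prompt of your own;
- two randomly chosen models answer it anonymously as Model A and Model B;
- you vote "👈 A is better", "👉 B is better", "🤝 Tie" or "👎 Both are bad", and the model names are then revealed.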

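Votes do not update Elo scores live. Each vote is appended to a log file, and the Elo ranking is computed offline from those logs.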
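For reference, here is the standard Elo update for one battle between models A and B. The notation is:

- $R_A$ and $R_B$ are the current ratings;
- $S_A = 1$ if A wins, $0$ if B wins and $0.5$ for a tie;
- $K$ sets how far a single vote moves the ratings, and its value is a tuning choice.

```latex
\begin{aligned}
E_A &= \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = 1 - E_A, \\
R_A' &= R_A + K\,(S_A - E_A), \qquad R_B' = R_B + K\,(S_B - E_B), \qquad S_B = 1 - S_A.
\end{aligned}
```

For example, take $R_A = R_B = 1000$ and $K = 4$, and suppose A wins. Then $E_A = 0.5$, so $R_A' = 1000 + 4(1 - 0.5) = 1002$ and $R_B' = 998$.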

📊 5. How to view Elo results

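Votes are appended as JSON lines to one log file per day. The files live in the directory given by the LOGDIR environment variable, which defaults to the current working directory:

$LOGDIR/YYYY-MM-DD-conv.json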

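These files mix chat turns and vote events. FastChat ships scripts under fastchat/serve/monitor/, such as clean_battle_data.py, that clean the raw logs into battle records. Each cleaned record looks roughly like this (simplified; real records carry more fields, such as the two conversations and a timestamp):

{
  "model_a": "gpt-4",
  "model_b": "claude-3",
  "winner": "model_a"
}

Here winner is one of model_a, model_b, tie, or tie (bothbad).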
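If you prefer not to run FastChat's cleaning scripts, the sketch below is a rough equivalent for the vote lines. It assumes each raw vote entry carries:

- a `type` of `leftvote`, `rightvote`, `tievote` or `bothbad_vote`;
- a two-element `models` list;
- a two-element `states` list whose entries may contain `model_name`. This matters in anonymous battles, where the displayed names are hidden at vote time.

These field names are assumptions based on FastChat's arena logging code, so verify them against your own log files. FastChat's clean_battle_data.py applies more filtering than this sketch does.

```python
import json

# Assumed mapping from arena vote types to the winner labels of cleaned battle records.
VOTE_TO_WINNER = {
    "leftvote": "model_a",
    "rightvote": "model_b",
    "tievote": "tie",
    "bothbad_vote": "tie (bothbad)",
}


def battles_from_conv_log(lines):
    """Turn raw conv-log JSON lines into simplified battle records, skipping non-vote lines."""
    battles = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        winner = VOTE_TO_WINNER.get(row.get("type"))
        if winner is None:
            continue  # chat turns and other events are not votes
        states = row.get("states") or [{}, {}]
        models = row.get("models") or ["", ""]
        if len(states) < 2 or len(models) < 2:
            continue  # not a two-model battle
        # Prefer the name stored in each state (anonymous battles hide the displayed names).
        names = [states[i].get("model_name") or models[i] for i in range(2)]
        if not all(names):
            continue  # cannot tell which models fought
        battles.append({"model_a": names[0], "model_b": names[1], "winner": winner})
    return battles
```

The function accepts any iterable of lines, so you can pass an open file. For example, pass one of the `$LOGDIR/YYYY-MM-DD-conv.json` files opened with `encoding="utf-8"`.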

You can then analyze how the Elo scores evolve with a script or in Excel.

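For example, you could use a small Python script. Here analyze_elo.py and its --log_path flag are hypothetical; they are not part of FastChat.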

python analyze_elo.py --log_path battles.json
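Below is a minimal sketch of such a script. It assumes battle records with `model_a`, `model_b` and a `winner` of `"model_a"`, `"model_b"` or a tie label, stored either as a JSON array or as JSON lines. The script replays the battles in order and applies the standard Elo update after each one. The defaults (K=4, scale 400, base 10, initial rating 1000) are assumptions, not values from this guide.

```python
import argparse
import json
from collections import defaultdict


def load_battles(path):
    """Load battle records from a JSON array file or a JSON-lines file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Multiple top-level objects: treat the file as JSON lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def compute_elo(battles, k=4.0, scale=400.0, base=10.0, init=1000.0):
    """Replay battles in order and apply the standard Elo update after each one."""
    ratings = defaultdict(lambda: init)
    for b in battles:
        a, m = b["model_a"], b["model_b"]
        ra, rb = ratings[a], ratings[m]
        expected_a = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        if b["winner"] == "model_a":
            score_a = 1.0
        elif b["winner"] == "model_b":
            score_a = 0.0
        else:  # "tie" and "tie (bothbad)" both count as a draw here
            score_a = 0.5
        ratings[a] = ra + k * (score_a - expected_a)
        ratings[m] = rb + k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)


def main():
    parser = argparse.ArgumentParser(description="Compute Elo ratings from arena battle records.")
    parser.add_argument("--log_path", help="JSON or JSON-lines file of battle records")
    args = parser.parse_args()
    if args.log_path is None:
        parser.print_help()
        return
    ratings = compute_elo(load_battles(args.log_path))
    ranked = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (model, rating) in enumerate(ranked, start=1):
        print(f"{rank:>2}. {model:<30} {rating:7.1f}")


if __name__ == "__main__":
    main()
```

Sequential Elo depends on the order of the battles, so the ratings can drift while there are still few votes. Two approaches give steadier rankings:

- replay the battles in several shuffled orders and average the results;
- fit a Bradley-Terry model on all battles at once.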

🧱 6. Custom prompts and test questions (batch)

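To evaluate models on your own batch of test questions, use the MT-Bench tooling in fastchat/llm_judge:

1. Put the questions in data/<bench_name>/question.jsonl under that directory.
2. Generate each model's answers from there: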

cd fastchat/llm_judge
python gen_model_answer.py \
    --model-path lmsys/vicuna-7b-v1.5 \
    --model-id vicuna-7b-v1.5 \
    --bench-name my_bench
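The question file uses plain JSON lines in the MT-Bench layout. Each object is one question, with `question_id`, `category` and a list of `turns` (one entry per user turn). Below is a minimal sketch that produces the file from your own list, assuming you run it from fastchat/llm_judge. `write_questions` and the bench name are placeholders.

```python
import json
from pathlib import Path


def write_questions(bench_dir, questions):
    """Write (category, turns) pairs as MT-Bench style JSON lines to bench_dir/question.jsonl."""
    path = Path(bench_dir) / "question.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for qid, (category, turns) in enumerate(questions, start=1):
            record = {"question_id": qid, "category": category, "turns": list(turns)}
            # ensure_ascii=False keeps non-English questions readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

For example: `write_questions("data/my_bench", [("writing", ["Draft a short announcement for an internal tool launch."]), ("reasoning", ["A train travels at 60 km/h. How long does it take to cover 150 km?", "How does the answer change at 75 km/h?"])])`.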

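Answers are written to data/<bench_name>/model_answer/<model-id>.jsonl. API models use gen_api_answer.py instead.

The Arena UI only collects votes on prompts typed in live and does not load these answer files. To compare the answers in batch, you have two options:
- run gen_judgment.py in the same directory, where an LLM such as GPT-4 acts as judge (for example in pairwise mode);
- browse the answers side by side with qa_browser.py.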

✅ 7. Suitable scenarios

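Scenario	Suitable?
Comparing the capabilities of several LLMs within the company	✅
Head-to-head evaluation to find the best fine-tuned model	✅
Evaluating conversational UX and prompt optimizations	✅
Testing models against business preferences (safety, reasonableness)	✅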

🛠️ 8. Notes and recommendations

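Item	Recommendation
Number of model workers	Run each model in its own worker so that models do not interfere with each other
Vote data sampling	Keep prompts diverse so the ranking does not overfit to a narrow set of prompts
Voter management	Deploy on the intranet and add a login (the web server's --gradio-auth-path option enables basic username/password login), or tag voters afterwards
Data analysis	Compile the Elo leaderboard regularly, e.g. with Excel or pandas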
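For the periodic summary, a per-model win/tie/loss table is a useful companion to the Elo numbers and opens directly in Excel. Below is a stdlib-only sketch. It assumes battle records with `model_a`, `model_b` and a `winner` of `"model_a"`, `"model_b"` or a tie label. `win_rate_table` and `write_csv` are hypothetical helper names.

```python
import csv
from collections import Counter


def win_rate_table(battles):
    """Count battles, wins, ties and losses per model; any non-win label counts as a tie."""
    games, wins, ties = Counter(), Counter(), Counter()
    for b in battles:
        a, m, w = b["model_a"], b["model_b"], b["winner"]
        games[a] += 1
        games[m] += 1
        if w == "model_a":
            wins[a] += 1
        elif w == "model_b":
            wins[m] += 1
        else:  # "tie" or "tie (bothbad)"
            ties[a] += 1
            ties[m] += 1
    rows = []
    for model, n in games.items():
        rows.append({
            "model": model,
            "battles": n,
            "wins": wins[model],
            "ties": ties[model],
            "losses": n - wins[model] - ties[model],
            "win_rate": round(wins[model] / n, 3),
        })
    return sorted(rows, key=lambda r: r["win_rate"], reverse=True)


def write_csv(rows, path):
    """Save the table as CSV so it can be opened directly in Excel."""
    fields = ["model", "battles", "wins", "ties", "losses", "win_rate"]
    # utf-8-sig writes a BOM so Excel detects UTF-8 (helps with non-ASCII model names).
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Usage: `write_csv(win_rate_table(battles), "leaderboard.csv")`. With pandas, the same rows can be loaded via `pd.DataFrame(rows)`.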