使用Rag 命中用户feedback提升triage agent 准确率

简述

使用 RAG（Retrieval-Augmented Generation），提升 Triage Agent 对用户反馈的处理准确率。这个方案的背景源于当前系统服务多个租户，各租户在业务场景、问题描述方式、术语使用习惯等方面存在极大差异，导致通用模型难以在所有租户场景下保持一致的分类准确率。同时，部分租户出于数据安全、成本或维护复杂性的考虑，明确表示不接受通过机器学习模型训练或 LoRA 微调等需要长期迭代和算力投入的方式进行个性化适配。

在此约束下，项目选择以大语言模型（LLM）的基座能力为核心，通过 API 调用快速实现 Triage Agent 的功能落地，确保在最短时间内为用户提供可用的智能分诊服务。现阶段的目标并非追求绝对精准，而是先让 AI 基于其通用理解能力对用户反馈进行初步判断，形成可运行的闭环。

在此基础上，引入 RAG 机制作为关键增强手段：将历史已标注的用户反馈（包括问题描述、分类结果、人工修正记录等）构建成向量知识库。当新反馈进入系统时，RAG 首先检索与之语义相似的历史案例，并将这些高相关性上下文注入到 LLM 的提示词中，引导其参考过往经验进行推理与分类。这种方式无需修改模型参数，即可实现"类记忆"效果，显著提升模型在特定租户语境下的判断准确性与一致性。

随着用户反馈数据的持续积累，系统将进一步构建"反馈闭环"------通过分析用户对 AI 分类结果的修正行为，动态优化检索库中的标注质量与向量表示，并探索轻量化的模型演进路径（如定向 prompt 优化、小样本学习或增量式微调），逐步推动 LLM 朝着更贴合实际业务需求的方向持续演进。当前阶段的 RAG 方案，既是快速交付的务实选择，也为后续智能化升级奠定了可扩展的数据与架构基础。

原型方案设计

为了让方案能尽可能快的方式呈现，我选择使用了 MaxKB 作为核心框架，充分利用其开箱即用的可视化知识库管理、RAG 流程支持和轻量级部署优势，显著缩短了从开发到上线的周期。MaxKB 支持灵活的文档导入、切片策略配置和检索测试功能，使得在多租户环境下能快速为每个租户独立配置专属知识空间，满足数据隔离与个性化分诊的需求。

向量数据库方面，选择使用 PostgreSQL 的扩展 pgvector，主要基于以下几点考虑：

技术栈统一 ：项目已使用 PostgreSQL 作为主业务数据库，引入 pgvector 无需额外维护独立的向量数据库（如 Milvus 或 Pinecone），降低运维复杂度；
数据一致性保障：用户反馈、工单记录与向量索引可共存于同一事务体系，确保数据同步的实时性与可靠性；
成本可控：避免引入新组件带来的资源开销和学习成本，适合现阶段快速验证与中小规模部署；
兼容性强 ：MaxKB 支持自定义向量数据库对接，通过简单配置即可集成 pgvector，实现无缝替换。

详细设计

将feedback的ticket ， triage result 结构为纯文本可以轻易chrunk化的结构

import json

读取JSON文件

with open('../sample/Billing_Codes.json', 'r') as f:
data = json.load(f)

转换格式

output_lines = []
billing_codes = set() # 用于存储唯一的billing codes

修改:为每个item添加编号

for index, item in enumerate(data, 1):
title = item['title']
description = item['description']
billing_code = item['fields_value_name']
复制代码
```
 # 收集唯一的billing codes
 billing_codes.add(billing_code)
 
 # 创建格式化文本，包含编号
 formatted_text = f"{index}. text: {title}  {description}, billing codes: {billing_code}"
 output_lines.append(formatted_text)
```
用###分割子项目

output_text = "\n######\n".join(output_lines)

输出结果

print(output_text)

输出billing code类别总数

print(f"\n总共有 {len(billing_codes)} 类别的billing code")

print('billing_codes', "\n".join(billing_codes))

with open('./Billing_Codes_output.txt', 'w') as f:
f.write(output_text)
f.write(f"\n\n总共有 {len(billing_codes)} 类别的billing code")

每个ticket 单独trunk一个片段，为了让嵌入模型能尽可能处理多的token, 我选择使用之前利用的longformer-4096-base模型, 配置文件修改

复制代码

# 模型路径 如果EMBEDDING_MODEL_NAME是绝对路径则无效,反之则会从https://huggingface.co/下载模型到当前目录
EMBEDDING_MODEL_PATH: /opt/maxkb/model/
# 模型名称 如果模型名称是绝对路径 则会加载目录下的模型,如果是模型名称,则会在https://huggingface.co/下载模型 模型的下载位置为EMBEDDING_MODEL_PATH
EMBEDDING_MODEL_NAME: /opt/maxkb/model/longformer-base-4096

调试过程里发现虽然配置了

但是分段嵌入一直失败，调试代码发现embedding 的调用方式走了web访问的方式

需要在tools.py里修改下面方法，如果provider携带使用local字符，use_local设置为true

复制代码

def get_model_(provider, model_type, model_name, credential, model_id, use_local=False, **kwargs):
    """
    获取模型实例
    @param provider:   供应商
    @param model_type: 模型类型
    @param model_name: 模型名称
    @param credential: 认证信息
    @param model_id:   模型id
    @param use_local:  是否调用本地模型 只适用于本地供应商
    @return: 模型实例
    """
    if 'local' in provider:
        use_local = True
    model = get_provider(provider).get_model(model_type,
                                             model_name,
                                             json.loads(rsa_long_decrypt(credential)),
                                             model_id=model_id,
                                             use_local=use_local,
                                             streaming=True, **kwargs)
    return model

2）先检索ticket相似度，返回相似度高的

maxkb使用/hit_test接口走命中测试

复制代码

        def hit_test(self):
            self.is_valid()
            vector = VectorStore.get_embedding_vector()
            exclude_document_id_list = [
                str(
                    document.id
                ) for document in QuerySet(Document).filter(knowledge_id=self.data.get('knowledge_id'), is_active=False)
            ]
            model = get_embedding_model_by_knowledge_id(self.data.get('knowledge_id'))
            # 向量库检索
            hit_list = vector.hit_test(
                self.data.get('query_text'),
                [self.data.get('knowledge_id')],
                exclude_document_id_list,
                self.data.get('top_number'),
                self.data.get('similarity'),
                SearchMode(self.data.get('search_mode')),
                model
            )
            hit_dict = reduce(lambda x, y: {**x, **y}, [{hit.get('paragraph_id'): hit} for hit in hit_list], {})
            p_list = list_paragraph([h.get('paragraph_id') for h in hit_list])
            return [
                {
                    **p,
                    'similarity': hit_dict.get(p.get('id')).get('similarity'),
                    'comprehensive_score': hit_dict.get(p.get('id')).get('comprehensive_score')
                } for p in p_list
            ]

在maxkb页面进行命中测试

3）将相似度高的ticket作为提示词中的参考用例子给llm进行推理

这里使用之前预测率偏低的billing code做试验

1）每个ticke content + billing codes 结果做trunk，通过嵌入模型编码，制作知识库

2）flow 工具集成了知识库+ llm, 问题先在知识库上做命中，然后传入llm作为推理依据，这就是一般说的谜底在谜面上

提示词修改为下面，其中{data}就是文本相似命中的chrunk

复制代码

Task: Select Billing Code.
Candidates:
Fixed Price Small Project Labour (Pre-paid)
Remote Urgent Support
Non-Billable Sales
Onsite Normal Support
Onsite Urgent Support
Outside Contract - Urgent Support
Non-Billable Support
Contract Labour - GST
Remote Normal Support
Development Support
Outside Contract 7pm-7am - Normal Support
Outside Contract - Normal Support

Input ticket content:
{question}

Experience:
Find the best matching triage result from the following experience
{data}

Output Json Format:
Generate a raw JSON response containing the predicted classification value for the given ticket text. Do not include any Markdown code block markers (e.g., ```json), comments, or additional formatting. Output only the JSON object:
1 classification_value Assign full Billing Code name
2 confidence_value Estimate confidence score, [0~100], number based on clarity of text and dimension alignment
3 ref_ticket Assign the billing code experience ticket
4 ref_result Assign the billing code pointed to by the reference triage experience

3）ouput格式输出

测试

使用maxkb的接口进行测试

复制代码

HEADERS = {
    'accept': '*/*',
    'Authorization': 'Bearer application-fe728b15cbd1949fb6c555f2ac821da1',
    'Content-Type': 'application/json',
    'X-CSRFTOKEN': 'My7YR2pbk6XVNnpTjOqHv2V49ke11CAJY9BdeQbj0DOljUwWNSFJNjIS9DByQj5V'
}


def call_api(message: str, retries: int = 3, delay: int = 10) -> dict:
    """
    调用分类 API with retry logic
    """
    payload = {
        "message": message,
        "stream": False,
        "re_chat": True
    }
    for attempt in range(retries):
        try:
            response = requests.post(API_URL, headers=HEADERS, json=payload)
            response.raise_for_status()
            return response.json()
        except RequestException as e:
            print(f"API 调用失败 (尝试 {attempt + 1}/{retries}): {e}")
            if attempt < retries - 1:
                sleep(delay)
                continue
            return {}


def extract_classification(data: dict) -> str:
    """
    从 API 返回中提取 classification_value
    """
    try:
        if 'data' not in data or 'content' not in data['data']:
            print(f"Invalid response structure: {data}")
            return ""
        content = data['data']['content']
        print(f"Raw content: {content}")  # Log the raw content
        parsed_content = json.loads(content)
        classification = parsed_content.get('classification_value', '').strip()
        return classification
    except json.JSONDecodeError as e:
        print(f"JSON 解析错误: {content}, 错误: {e}")
        return ""
    except Exception as e:
        print(f"从 {content} 里解析 classification_value 失败: {e}")
        return ""


def main():
    # 读取 JSON 文件
    with open(FILE_PATH, 'r', encoding='utf-8') as f:
        records: List[Dict] = json.load(f)

    total = len(records)
    correct = 0
    results = []

    print(f"开始处理 {total} 条记录...\n")

    for i, record in enumerate(records):
        title = record.get('title', '').strip()
        description = record.get('description', '').strip()
        expected = record.get('fields_value_name', '').strip()

        # 构造 message
        message = f"{title} {description}".replace('\n', ' ').replace('\r', ' ').replace('\t', ' ')

        # 调用 API
        print(f"[{i+1}/{total}] 正在处理:\n {message}，\nexpected: {expected}")
        api_response = call_api(message)

        # 修改:即使API调用失败也继续处理，而不是跳过
        if not api_response:
            print(f"  API调用失败，记录预测为空")
            predicted = ""
        else:
            predicted = extract_classification(api_response)

        # 比较
        is_correct = predicted == expected
        if is_correct:
            correct += 1

        # 保存结果
        results.append({
            "title": title,
            "description": description,
            "expected": expected,
            "predicted": predicted,
            "is_correct": is_correct
        })

        print(f"  预测: '{predicted}' -> {'✅ 正确' if is_correct else '❌ 错误'}\n")

    # 输出统计结果
    accuracy = correct / total if total > 0 else 0
    print(f"✅ 处理完成！")
    print(f"总数: {total}")
    print(f"正确: {correct}")
    print(f"准确率: {accuracy:.2%}")

    # 可选：保存结果到文件
    with open('evaluation_results.json', 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

    print(f"详细结果已保存到 evaluation_results.json")


if __name__ == '__main__':
    main()

    # message = "My problem isn't listed or I have feedback If you couldn't find your problem listed under a category or you would like to submit feedback for the support portal    Details  -----------------------------------------  Requesting an IT Service: Service Requests are for things that aren't broken but IT need to help with, like changing mailbox access or setting up a computer. Request an IT Service here  Would you like to provide feedback or report a different problem?: Different problem  Briefly describe the problem that is occurring: Requesting admin access to be able to install triconvey，"
    # api_response = call_api(message)
    # predicted = extract_classification(api_response)

下面是试验结果，结果好的有点让人兴奋了

开始处理 43 条记录...
$1/43\] 正在处理: Remote Urgent Support 预测: 'Remote Urgent Support' -\> ✅ 正确 \[2/43\] 正在处理: Remote Urgent Support 预测: 'Remote Urgent Support' -\> ✅ 正确 \[3/43\] 正在处理: Remote Urgent Support 预测: 'Remote Urgent Support' -\> ✅ 正确 \[4/43\] 正在处理: Remote Urgent Support 预测: 'Remote Urgent Support' -\> ✅ 正确 \[5/43\] 正在处理: Non-Billable Sales 预测: 'Non-Billable Sales' -\> ✅ 正确 \[6/43\] 正在处理: Non-Billable Sales 预测: 'Non-Billable Sales' -\> ✅ 正确 \[7/43\] 正在处理: Non-Billable Sales 预测: 'Non-Billable Sales' -\> ✅ 正确 \[8/43\] 正在处理: Non-Billable Sales 预测: 'Non-Billable Sales' -\> ✅ 正确 \[9/43\] 正在处理: Non-Billable Sales 预测: 'Non-Billable Sales' -\> ✅ 正确 \[10/43\] 正在处理: Non-Billable Support 预测: 'Non-Billable Support' -\> ✅ 正确 \[11/43\] 正在处理: Non-Billable Support 预测: 'Non-Billable Support' -\> ✅ 正确 \[12/43\] 正在处理: Non-Billable Support 预测: 'Non-Billable Support' -\> ✅ 正确 \[13/43\] 正在处理: Non-Billable Support 预测: 'Non-Billable Support' -\> ✅ 正确 \[14/43\] 正在处理: Non-Billable Support 预测: 'Non-Billable Support' -\> ✅ 正确 \[15/43\] 正在处理: Onsite Urgent Support 预测: 'Onsite Urgent Support' -\> ✅ 正确 \[16/43\] 正在处理: Onsite Normal Support 预测: 'Onsite Normal Support' -\> ✅ 正确 \[17/43\] 正在处理: Onsite Normal Support 预测: 'Onsite Normal Support' -\> ✅ 正确 \[18/43\] 正在处理: Onsite Normal Support 预测: 'Onsite Normal Support' -\> ✅ 正确 \[19/43\] 正在处理: Onsite Normal Support 预测: 'Onsite Normal Support' -\> ✅ 正确 \[20/43\] 正在处理: Remote Normal Support 预测: 'Remote Normal Support' -\> ✅ 正确 \[21/43\] 正在处理: Remote Normal Support 预测: 'Remote Normal Support' -\> ✅ 正确 \[22/43\] 正在处理: Remote Normal Support 预测: 'Remote Normal Support' -\> ✅ 正确 \[23/43\] 正在处理: Remote Normal Support 预测: 'Remote Normal Support' -\> ✅ 正确 \[24/43\] 正在处理: Outside Contract 7pm-7am - Normal Support 预测: 'Outside Contract 7pm-7am - Normal Support' -\> ✅ 正确 \[25/43\] 正在处理: Outside Contract 7pm-7am - Normal Support 预测: 'Outside Contract 7pm-7am - Normal Support' -\> ✅ 正确 \[26/43\] 正在处理: Outside Contract 7pm-7am - Normal Support 预测: 'Outside Contract 7pm-7am - Normal Support' -\> ✅ 正确 \[27/43\] 正在处理: Outside Contract 7pm-7am - Normal Support 预测: 'Outside Contract 7pm-7am - Normal Support' -\> ✅ 正确 \[28/43\] 正在处理: Outside Contract 7pm-7am - Normal Support 预测: 'Outside Contract 7pm-7am - Normal Support' -\> ✅ 正确 \[29/43\] 正在处理: Outside Contract - Normal Support 预测: 'Outside Contract - Normal Support' -\> ✅ 正确 \[30/43\] 正在处理: Outside Contract - Normal Support 预测: 'Outside Contract - Normal Support' -\> ✅ 正确 \[31/43\] 正在处理: Outside Contract - Normal Support 预测: 'Outside Contract - Normal Support' -\> ✅ 正确 \[32/43\] 正在处理: Outside Contract - Normal Support 预测: 'Outside Contract - Normal Support' -\> ✅ 正确 \[33/43\] 正在处理: Outside Contract - Normal Support 预测: 'Outside Contract - Normal Support' -\> ✅ 正确 \[34/43\] 正在处理: Outside Contract - Urgent Support 预测: 'Outside Contract - Urgent Support' -\> ✅ 正确 \[35/43\] 正在处理: Outside Contract - Urgent Support 预测: 'Outside Contract - Urgent Support' -\> ✅ 正确 \[36/43\] 正在处理: Development Support 预测: 'Development Support' -\> ✅ 正确 \[37/43\] 正在处理: Development Support 预测: 'Development Support' -\> ✅ 正确 \[38/43\] 正在处理: Fixed Price Small Project Labour (Pre-paid) 预测: 'Fixed Price Small Project Labour (Pre-paid)' -\> ✅ 正确 \[39/43\] 正在处理: Fixed Price Small Project Labour (Pre-paid) 预测: 'Fixed Price Small Project Labour (Pre-paid)' -\> ✅ 正确 \[40/43\] 正在处理: Fixed Price Small Project Labour (Pre-paid) 预测: 'Fixed Price Small Project Labour (Pre-paid)' -\> ✅ 正确 \[41/43\] 正在处理: Fixed Price Small Project Labour (Pre-paid) 预测: 'Fixed Price Small Project Labour (Pre-paid)' -\> ✅ 正确 \[42/43\] 正在处理: Fixed Price Small Project Labour (Pre-paid) 预测: 'Fixed Price Small Project Labour (Pre-paid)' -\> ✅ 正确 \[43/43\] 正在处理: Contract Labour - GST 预测: 'Contract Labour - GST' -\> ✅ 正确 ✅ 处理完成！总数: 43 正确: 43 准确率: 100%$

为了让命中有更好的健壮性，还是采用了向量检索+多路召回+重拍的方式来做最后的有效命中

复制代码

 Tickets (256-4000 tokens)

↓

[Solution ] Directly encode using bge-large-en-v1.5 (no segmentation required)

↓

Search for the top 10 similar historical tickets in pgvector (initial screening)

↓

(Optional) Use bge-reranker-large for rerank scoring

↓

Submit to LLM for classification decision

BAAI/bge-large-en-v1.5（向量检索模型）和 BAAI/bge-reranker-large（重排序模型）都是由BAAI 开发，且都用于语义匹配任务，但它们在设计目标、架构原理、使用方式和性能特点上存在本质区别。

维度	`bge-large-en-v1.5`(Embedding)	`bge-reranker-large`(Reranker)
任务类型	句子嵌入（Sentence Embedding）	语义匹配打分（Cross-Encoder）
输出形式	生成句向量（768维）	输出一个相似度分数（0~1）
输入方式	单句编码	句子对联合编码
速度	快（可预计算）	慢（需实时计算）
精度	高	更高（SOTA级）
用途	初筛（Retrieval）	精排（Reranking）
是否可预计算	✅ 是	❌ 否

bge-large-en-v1.5：向量检索模型（Dense Retriever）

基于 BERT 架构（Encoder-only）
使用 双塔结构（Siamese Network） 在训练时对齐两个句子的向量空间
目标：让语义相近的句子在向量空间中"距离近"（余弦相似度高）
两个句子是独立编码的（双塔）
向量可以预先计算并存入向量数据库（如 pgvector、FAISS）

✅ 优点

速度快：支持大规模初筛（百万级向量检索）
可缓存：历史工单向量可预计算
适合 RAG 初筛

❌ 缺点

无法捕捉句子间的细粒度交互（如指代、逻辑关系）
对表达差异大的句子区分能力有限

bge-reranker-large：重排序模型（Cross-Encoder）

基于 BERT 架构 ，但使用 交叉编码（Cross-Attention）
输入是 一对句子，拼接后一起送入模型
输出是一个标量分数，表示这对句子的语义相似度
两个句子在模型内部进行深度交互（每一层都有 attention）
不生成句向量，只输出一个匹配分数

$CLS\] 句子A \[SEP\] 句子B \[SEP\] → BERT → \[Score$

优点

精度极高：能理解句子间的上下文和逻辑
在 MTEB、MS-MARCO 等榜单上 SOTA
能识别"看似无关实则相关"的句子对

❌ 缺点

速度慢：无法预计算，必须实时推理
不能用于大规模检索（只能对 Top-K 候选重排）
资源消耗高

CrossEncoder 是一种用于语义匹配、文本蕴含、重排序（Reranking） 的深度学习模型结构，与 SentenceTransformer（双塔模型）不同，它采用 交叉编码（Cross-Attention） 的方式，将两个文本（如问题和答案、查询和文档）拼接后一起输入模型，从而实现更精细的交互和更高的匹配精度。

CrossEncoder 的核心思想

❌ 不是分别编码两个句子 → 计算向量相似度

✅ 而是把两个句子拼在一起 → 用 BERT 等模型做联合推理 → 输出一个匹配分数

典型用途：

重排序（Reranker）：对检索出的 Top-K 候选进行精细打分
问答匹配
文本蕴含（Entailment）
同义句判断

计算方式详解

输入拼接（Input Concatenation）

给定两个文本：

query: "What is the capital of France?"
document: "Paris is the capital city of France."

模型输入为：

CLS\] What is the capital of France? \[SEP\] Paris is the capital city of France. \[SEP

[CLS]：用于最终分类的特殊 token
[SEP]：分隔两个句子

模型内部处理（BERT / Transformer）

这个拼接后的序列被送入 BERT 类模型中，每一层都进行 自注意力（Self-Attention），允许两个句子的每个 token 相互关注。

例如：

"capital" 会关注 "capital" 和 "Paris"
"France" 会关注 "France" 两次

这种深度交互使得模型能理解：

指代关系
同义替换
逻辑一致性

输出层（Scoring）

取 [CLS] 位置的最终隐藏状态 h_cls ∈ R^768
通过一个全连接层（Feed-Forward Layer）映射为一个标量分数：

score=sigmoid(W⋅hcls+b)
输出范围通常是 [0, 1]，表示"相关性得分"

与双塔模型（Bi-Encoder）对比

特性	CrossEncoder（交叉编码）	Bi-Encoder（双塔）
输入方式	拼接两个句子	分别编码
交互程度	全面交互（每层 attention）	仅在最后计算相似度
是否可预计算	❌ 否（必须实时推理）	✅ 是（向量可缓存）
速度	慢（O(n) 对比 n 个文档）	快（ANN 检索）
精度	✅ 高（SOTA）	一般
适用场景	重排序（Top-K 精排）	初筛（大规模检索）

RAG / Triage Agent 工单匹配

用户反馈："I can't connect to the company VPN"

目标：从 10,000 条历史工单中找出最相似的案例

复制代码

用户反馈："I can't connect to the company VPN"

目标：从 10,000 条历史工单中找出最相似的案例

───────────────────────────────────────────────
方案 A：只用 bge-large-en-v1.5
  1. 将 10,000 条历史工单预编码为向量，存入 pgvector
  2. 将新反馈编码为向量
  3. 在 pgvector 中检索 Top-50 最相似工单
  ✅ 快，但可能漏掉语义相近但词不同的案例

───────────────────────────────────────────────
方案 B：bge-large-en-v1.5 + bge-reranker-large（推荐）
  1. 用 bge-large-en-v1.5 初筛出 Top-50
  2. 用 bge-reranker-large 对这 50 个候选与新反馈**逐个打分**
  3. 按 reranker 分数重新排序，取 Top-5
  ✅ 准确率显著提升，尤其对 paraphrase、术语差异敏感

流程图如下所示

总结

关于 使用 RAG（检索增强生成）还是 LoRA 微调 的系统性经验总结，结合了技术原理、适用场景、优缺点和实际落地建议，帮助你在不同项目中做出最优选择。

核心对比：RAG vs LoRA 微调

维度	RAG（Retrieval-Augmented Generation）	LoRA 微调（Low-Rank Adaptation）
本质	外挂知识库 + 提示工程	修改模型参数以适应任务
是否修改模型	❌ 不修改	✅ 修改部分权重
知识来源	外部文档（向量数据库）	训练数据内化到模型中
更新成本	⭐ 极低（增删文档即可）	⭐⭐⭐ 高（需重新训练）
推理延迟	⭐⭐ 稍高（检索+生成）	⭐ 低（直接生成）
可解释性	✅ 高（能溯源）	❌ 低（黑盒）
数据依赖	原始文本（无需标注）	标注数据（QA对、指令）
适合场景	动态知识、多租户、快速上线	固定风格、固定任务、高一致性要求

✅ 二、RAG 的经验总结

优势（什么时候用 RAG？）

✅ 知识更新快：只需更新知识库，无需重新训练；
✅ 无需标注数据：直接使用原始文档（PDF、工单、FAQ）；
✅ 可溯源：能返回"答案来自哪篇文档"，适合审计、客服；
✅ 适合多租户：每个租户有自己的知识库，隔离方便；
✅ 快速上线：几小时内就能构建一个可用的问答系统；
✅ 支持长文本：通过分块+向量检索处理百万字文档。

劣势与挑战

⚠️ 检索不准 → 答案错误：如果检索不到相关文档，LLM 容易"幻觉"；
⚠️ 分块策略敏感：切得太碎丢失上下文，太长影响召回；
⚠️ 推理延迟较高：增加一次向量检索；
⚠️ 对复杂推理支持弱：依赖 LLM 自身能力。

实战经验

经验	说明
🔹 用`bge-large-en-v1.5`替代`text2vec`	英文语义匹配效果更好
🔹 检索后加`bge-reranker`重排	提升 Top-K 准确率 10%~20%
🔹 多路召回：RAG + BM25 + 规则	提高召回率
🔹 设置`max_tokens`防止上下文溢出	合理控制 prompt 长度
🔹 监控"无检索结果"场景	避免 LLM 自由发挥

✅ 三、LoRA 微调的经验总结

优势（什么时候用 LoRA？）

✅ 任务定制性强：让模型学会特定话术、格式、风格；
✅ 推理速度快：无额外检索开销，响应更快；
✅ 减少幻觉：知识内化，不易胡说；
✅ 轻量级：只训练低秩矩阵，参数量小（MB级）；
✅ 部署简单：合并后仍是单个模型文件。

劣势与挑战

⚠️ 训练成本高：需要 GPU（A100/V100）、训练时间长；
⚠️ 更新困难：知识变更需重新收集数据 + 重新训练；
⚠️ 过拟合风险：数据少时容易记住样本，泛化差；
⚠️ 不可解释：不知道答案是"学来的"还是"编的"；
⚠️ 数据要求高：需要高质量标注数据（如 500+ QA 对）。

实战经验

经验	说明
🔹 数据质量 > 数据数量	100 条高质量数据胜过 1000 条噪声数据
🔹 使用`QLoRA`降低显存需求	用 48GB GPU 微调 7B 模型
🔹 用`bge`评估生成结果	计算生成答案与标准答案的语义相似度
🔹 定期增量训练	随着反馈积累，持续优化模型
🔹 避免灾难性遗忘	保留原始任务数据一起训练

🧩 四、如何选择？决策树

复制代码

你的知识会频繁更新吗？
├── 是 → 用 RAG ✅
└── 否
    └── 你有高质量标注数据（>200条）吗？
        ├── 是 → 可考虑 LoRA
        └── 否 → 用 RAG ✅

你希望答案可溯源吗？
├── 是 → 用 RAG ✅
└── 否 → 可考虑 LoRA

你对延迟敏感吗？（要求 <500ms）
├── 是 → 考虑 LoRA（更快）
└── 否 → RAG 也可接受

你想让模型学会"说话风格"吗？
├── 是 → LoRA 更擅长
└── 否 → RAG 足够

🌟 五、高级方案：RAG + LoRA 混合

方案	说明
✅RAG + LoRA	用 LoRA 让模型学会"怎么答"，用 RAG 提供"答什么"
示例	LoRA 学习客服话术风格，RAG 提供产品知识

这是当前最前沿的做法，兼顾准确性 与可控性。

✅ 六、推荐组合策略

场景	推荐方案
初创项目、知识多变	✅ RAG（快速验证）
成熟产品、风格统一	✅ LoRA 微调 + RAG 检索
缺少标注数据	✅ RAG
有标注数据 + 稳定知识	✅ LoRA
高合规要求（金融、医疗）	✅ RAG（可溯源）
多租户 SaaS 产品	✅ RAG（按租户隔离知识库）

✅ 七、总结：一句话经验

🔹 用 RAG 解决"知道什么" ------ 动态知识、可解释、易维护；

🔹 用 LoRA 解决"怎么说" ------ 风格一致、响应快、定制化强；

🔹 最强组合：RAG 提供知识，LoRA 控制风格，实现"既准又像人"。

如果你的项目是：

快速上线 → 选 RAG
追求极致体验 → 选 LoRA + RAG
数据少、知识变 → 只用 RAG
数据多、风格重 → 只用 LoRA

使用Rag 命中用户feedback提升triage agent 准确率

简述

原型方案设计

详细设计

读取JSON文件

转换格式

修改:为每个item添加编号

用###分割子项目

输出结果

输出billing code类别总数

测试

CrossEncoder 的核心思想

RAG / Triage Agent 工单匹配

总结