Chapter 7: RAG (Retrieval-Augmented Generation) Architecture Design and Implementation
Preface
Hi everyone, I'm Jixiaoyu (鲫小鱼), a front-end engineer who doesn't write front-end code. I love sharing knowledge beyond the front end and helping UI slicers escape the slicing grind. Follow me on WeChat: 《鲫小鱼不正经》. Likes, bookmarks, and follows are all appreciated!
🎯 Learning Objectives
- Master the three-stage RAG structure: Retrieve → Augment/Fuse → Generate
- Build the data pipeline: document loading, cleaning, chunking, embedding, index construction, incremental updates
- Learn retrieval strategies: TopK, MMR, time-aware, user-aware, and hybrid search
- Build an answer chain that is citable, traceable, and low-hallucination, with enforced source citations and confidence scores
- Orchestrate complex RAG workflows with Runnable/LangGraph, with observability callbacks and caching
- Expose /api/ingest and /api/rag endpoints in Next.js, with streaming answers and mobile-friendly UI
- Hands-on project: a time-sensitive RAG Q&A platform for news and announcements (with incremental ingestion and source-authority weighting)
📖 RAG Overview and Overall Architecture
7.1 Why RAG
- Pure generation (a generative LLM alone) hallucinates easily and cannot access private knowledge or fresh data
- Pure retrieval (keyword/semantic search) returns information fragments without synthesis or fluent expression
- RAG combines both: first retrieve relevant chunks, then generate an answer grounded in those chunks, with sources
7.2 Reference Architecture (Conceptual)
```text
User question → preprocessing / answerability check → retrieval layer (vector / keyword / hybrid) → candidate fusion / dedup / rerank
→ generation layer (citation-aware prompt) → review & guardrails (citation validation / safety filters) → output (structured + citations)
  ↘ logging / metrics / caching / replay (LangSmith or self-hosted)
```
7.3 Data and State
- Document domain: source, publishedAt, author/department, type (spec, faq, news)
- Chunking strategy: combine chunkSize/overlap with semantic boundaries; tune via evaluation
- Vector index: support incremental updates and soft deletion; isolate business domains with multiple collections
- Session state: short-term message window + long-term summary / fact cards (see Chapter 3)
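To keep loaders, retrieval filters, and the ranker in agreement on field names, the metadata above can be pinned down in one shared type. This is an illustrative sketch; the field names mirror the ones used by the filtering and ranking code later in this chapter.

```typescript
// Hypothetical shared metadata schema for chunks; adjust to your own corpus.
export type DocType = "spec" | "faq" | "news";

export interface ChunkMeta {
  source: string;        // origin file path or URL, used for citations
  publishedAt?: number;  // epoch ms, used by time-aware filters and time decay
  dept?: string;         // owning department, used by user-aware filtering
  type: DocType;
  tags?: string[];       // interest tags consumed by personalization boosts
  chunkIndex: number;    // position of the chunk within its parent document
  archived?: boolean;    // soft-delete flag (see 7.7)
}

// Example of a fully populated record.
export const exampleChunkMeta: ChunkMeta = {
  source: "docs/policy-2024.md",
  publishedAt: Date.parse("2024-06-01"),
  dept: "hr",
  type: "news",
  tags: ["policy"],
  chunkIndex: 0,
};
```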
🧱 Index Construction: Load / Clean / Chunk / Embed / Store
7.4 Loaders and Cleaners
```typescript
// File: src/ch07/loaders.ts
import fs from "node:fs/promises";
import path from "node:path";

export type RawDoc = { id: string; text: string; meta: Record<string, any> };

// Load every .md file under `dir` as a raw document with basic metadata.
export async function loadMarkdownDir(dir: string, tag = "default"): Promise<RawDoc[]> {
  const files = await fs.readdir(dir);
  const docs: RawDoc[] = [];
  for (const f of files) {
    if (!f.endsWith(".md")) continue;
    const full = path.join(dir, f);
    const text = await fs.readFile(full, "utf8");
    docs.push({ id: f, text, meta: { source: full, type: "md", tag } });
  }
  return docs;
}

// Normalize line endings, collapse blank runs, and squeeze whitespace.
export function clean(text: string): string {
  return text
    .replace(/\r/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .replace(/[\t\u00A0]+/g, " ")
    .trim();
}
```
7.5 Chunking (Splitter) and Normalized Metadata
```typescript
// File: src/ch07/splitter.ts
import { clean } from "./loaders";

export type Chunk = { id: string; text: string; meta: Record<string, any> };

// Fixed-size sliding-window split with overlap between adjacent chunks.
export function split(text: string, size = 900, overlap = 120): string[] {
  const out: string[] = [];
  let i = 0;
  while (i < text.length) {
    out.push(text.slice(i, i + size));
    i += size - overlap;
  }
  return out;
}

// Clean each document, split it, and attach normalized chunk metadata.
export function toChunks(docs: { id: string; text: string; meta: any }[], size = 900, overlap = 120) {
  const arr: Chunk[] = [];
  for (const d of docs) {
    const parts = split(clean(d.text), size, overlap);
    parts.forEach((p, idx) => arr.push({ id: `${d.id}#${idx}`, text: p, meta: { ...d.meta, chunkIndex: idx } }));
  }
  return arr;
}
```
7.6 Embedding and Storage (Chroma Example)
```typescript
// File: src/ch07/indexer-chroma.ts
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";
import * as dotenv from "dotenv";
import { loadMarkdownDir } from "./loaders";
import { toChunks } from "./splitter";

dotenv.config();

// Build a fresh collection from a directory of markdown documents.
export async function buildCollection(dir = "./docs", name = "news") {
  const raw = await loadMarkdownDir(dir, name);
  const chunks = toChunks(raw, 900, 120);
  const texts = chunks.map(c => c.text);
  const metas = chunks.map(c => ({ ...c.meta, id: c.id }));
  return Chroma.fromTexts(texts, metas, new OpenAIEmbeddings({ model: "text-embedding-3-small" }), {
    collectionName: `rag_${name}`,
  });
}

// Open an existing collection without re-embedding.
export async function openCollection(name = "news") {
  return Chroma.fromExistingCollection(new OpenAIEmbeddings({ model: "text-embedding-3-small" }), {
    collectionName: `rag_${name}`,
  });
}
```
7.7 Incremental Updates and Soft Deletion (Approach)
- Version or fingerprint (hash) each document; re-chunk and re-embed only the changed entries
- Keep old vectors but mark them `archived=true`; merge offline or clean up periodically
- Make updates asynchronous and rate-limit writes; show an "index updating" hint in the UI
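The fingerprint idea above can be sketched as follows. `fingerprint` and `diffDocs` are hypothetical helper names; a real pipeline would persist the fingerprint map alongside the vector store and re-ingest only the `changed` ids.

```typescript
import { createHash } from "node:crypto";

// Content fingerprint: same text always hashes to the same hex digest.
export function fingerprint(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Compare incoming docs against stored fingerprints (docId -> digest),
// so only changed documents are re-chunked and re-embedded.
export function diffDocs(
  prev: Map<string, string>,
  docs: { id: string; text: string }[],
): { changed: string[]; unchanged: string[] } {
  const changed: string[] = [];
  const unchanged: string[] = [];
  for (const d of docs) {
    (prev.get(d.id) === fingerprint(d.text) ? unchanged : changed).push(d.id);
  }
  return { changed, unchanged };
}
```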
🔍 Retrieval Strategies and Fusion
7.8 Basic TopK + MMR Deduplication
```typescript
// File: src/ch07/retrievers.ts
import { openCollection } from "./indexer-chroma";

export type Hit = { text: string; score?: number; meta?: any };

// Returns a TopK retriever bound to a collection. Chroma reports
// distance scores, so lower means more similar.
export async function topK(name: string) {
  const store = await openCollection(name);
  return async (q: string, k = 6, filter?: any): Promise<Hit[]> => {
    const docs = await store.similaritySearchWithScore(q, k, filter);
    return docs.map(([d, s]) => ({ text: d.pageContent, score: s, meta: d.metadata }));
  };
}

// Simplified "MMR": deduplicates by source + chunkIndex rather than
// performing true embedding-based maximal marginal relevance.
export function mmr(hits: Hit[], limit = 5) {
  const out: Hit[] = [];
  const seen = new Set<string>();
  for (const h of hits) {
    const key = `${h.meta?.source}#${h.meta?.chunkIndex}`;
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(h);
    if (out.length >= limit) break;
  }
  return out;
}
```
7.9 Time-aware Retrieval
```typescript
// File: src/ch07/time-aware.ts
import { topK, mmr } from "./retrievers";

type TimeRange = { from?: number; to?: number };

// Over-fetch (k=12), filter by publishedAt, then dedup down to 6.
// The filter is built conditionally so an open-ended range does not
// produce undefined $gte/$lte values.
export async function timeAwareRetriever(name: string, range?: TimeRange) {
  const retr = await topK(name);
  return async (q: string) => {
    const cond: any = {};
    if (range?.from != null) cond.$gte = range.from;
    if (range?.to != null) cond.$lte = range.to;
    const filter = Object.keys(cond).length ? { publishedAt: cond } : undefined;
    const hits = await retr(q, 12, filter);
    return mmr(hits, 6);
  };
}
```
7.10 User-aware Retrieval
```typescript
// File: src/ch07/user-aware.ts
import { topK, mmr } from "./retrievers";

export async function userAwareRetriever(name: string, user: { id: string; dept?: string; tags?: string[] }) {
  const retr = await topK(name);
  return async (q: string) => {
    const filter: any = {};
    if (user.dept) filter.dept = user.dept; // prefer documents from the user's department
    const hits = await retr(q, 10, Object.keys(filter).length ? filter : undefined);
    // Simple boost (illustrative): chunks matching the user's interest tags get a
    // small distance discount. Scores are distances, so lower sorts first.
    const boost = (h: any) => ((user.tags || []).some(t => (h.meta?.tags || []).includes(t)) ? 0.05 : 0);
    return mmr(hits.sort((a, b) => (a.score! - boost(a)) - (b.score! - boost(b))), 6);
  };
}
```
7.11 Hybrid Search (Keyword + Vector) and Reranking
```typescript
// File: src/ch07/hybrid.ts
import { topK } from "./retrievers";

// Naive keyword search: count case-insensitive occurrences of the query.
// Regex metacharacters in the query are escaped before building the pattern.
function keywordSearch(q: string, corpus: { id: string; text: string }[], k = 5) {
  const escaped = q.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return corpus
    .map(d => ({ d, s: (d.text.toLowerCase().match(new RegExp(escaped, "g")) || []).length }))
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map(x => ({ text: x.d.text, meta: { id: x.d.id, source: "kw" } }));
}

export async function hybrid(name: string) {
  const retr = await topK(name);
  return async (q: string) => {
    // The keyword corpus here is hard-coded for demonstration only.
    const [vecHits, kwHits] = await Promise.all([
      retr(q, 8),
      Promise.resolve(keywordSearch(q, [{ id: "k1", text: "RAG combines retrieval and generation" }, { id: "k2", text: "MMR improves diversity" }], 4)),
    ]);
    // Weighted merge: invert vector distance so a higher weight means better.
    const items = [
      ...vecHits.map(h => ({ ...h, weight: 0.7 * (1 - (h.score ?? 0)) })),
      ...kwHits.map(h => ({ ...h, weight: 0.3 })),
    ];
    return items.sort((a, b) => b.weight - a.weight).slice(0, 6).map(({ weight, ...rest }) => rest);
  };
}
```
🧩 Fusion and Generation: Reducing Hallucination, Enforcing Citations
7.12 Structured Answer Template with Citation Constraints
```typescript
// File: src/ch07/prompt.ts
import { ChatPromptTemplate } from "@langchain/core/prompts";

// Note: literal braces in the JSON schema are doubled ({{ }}) so the
// template does not mistake them for input variables.
export const answerPrompt = ChatPromptTemplate.fromMessages([
  ["system", `You are a rigorous enterprise knowledge assistant. Answer ONLY from the "candidate chunks"; if the information is insufficient, answer "I don't know".
Output strict JSON:
{{
  "answer": string,
  "citations": [{{"source": string, "chunkId": string}}],
  "confidence": number // 0~1
}}`],
  ["human", `Question: {question}\nCandidate chunks:\n{chunks}\nOutput JSON:`],
]);
```
7.13 Generation Chain (RunnableSequence)
```typescript
// File: src/ch07/answer-chain.ts
import { RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI } from "@langchain/openai";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { answerPrompt } from "./prompt";

export function buildAnswerChain() {
  const llm = new ChatOpenAI({ temperature: 0 }); // deterministic output for structured JSON
  const parser = new JsonOutputParser<{ answer: string; citations: { source: string; chunkId: string }[]; confidence: number }>();
  return RunnableSequence.from([answerPrompt, llm, parser]);
}
```
7.14 Guardrails: Citation Validation and Coarse Factuality Checks
```typescript
// File: src/ch07/guards.ts
// Reject answers that come back without at least one well-formed citation.
export function validateCitations(res: { answer: string; citations: any[] }) {
  const ok = Array.isArray(res.citations) && res.citations.length > 0 && res.citations.every(c => c.source);
  if (!ok) throw new Error("Missing or malformed citations");
  return res;
}

// Fail fast when retrieval produced no context at all.
export function shortCircuitIfEmpty(hits: { text: string }[]) {
  if (!hits || hits.length === 0) throw new Error("NO_CONTEXT");
}
```
7.15 End-to-End RAG Runnable Chain
```typescript
// File: src/ch07/rag-pipeline.ts
import { RunnableSequence, RunnableLambda } from "@langchain/core/runnables";
import { buildAnswerChain } from "./answer-chain";
import { hybrid } from "./hybrid";
import { validateCitations, shortCircuitIfEmpty } from "./guards";

export async function buildRagPipeline(name = "news") {
  const retrieve = await hybrid(name);
  // RunnableLambda.from wraps a plain function as a Runnable step.
  const toChunkList = RunnableLambda.from(async (q: string) => {
    const hits = await retrieve(q);
    shortCircuitIfEmpty(hits);
    const chunks = hits
      .map((h, i) => `#${i} [${h.meta?.source || "vec"}] ${String(h.text).replace(/\n/g, " ").slice(0, 400)}...`)
      .join("\n");
    return { question: q, chunks };
  });
  const answer = buildAnswerChain();
  const check = RunnableLambda.from((res: any) => validateCitations(res));
  return RunnableSequence.from([toChunkList, answer, check]);
}
```
🧠 LangGraph: Orchestrating the RAG Workflow as a State Graph
7.16 Graph Nodes: Retrieve → Fuse → Answer → Guard → Write Back
```typescript
// File: src/ch07/graph.ts (illustrative sketch — check the API of your installed langgraph version)
import { StateGraph, START, END } from "@langchain/langgraph";
import { buildRagPipeline } from "./rag-pipeline";

type S = { q: string; result?: any; error?: string };

export async function buildGraph() {
  const graph = new StateGraph<S>({ channels: { q: { value: "" }, result: { value: {} }, error: { value: "" } } } as any);
  graph.addNode("rag", async (s: S) => {
    try {
      const pipeline = await buildRagPipeline("news");
      const out = await pipeline.invoke(s.q);
      return { result: out };
    } catch (e: any) {
      return { error: e.message };
    }
  });
  graph.addEdge(START, "rag"); // START/END are the built-in entry/exit markers
  graph.addEdge("rag", END);
  return graph.compile();
}
```
🌐 Next.js Endpoints: /api/ingest and /api/rag (SSE Streaming)
7.17 Document Ingestion Endpoint
```typescript
// File: src/app/api/ingest/route.ts
import { NextRequest } from "next/server";
import { buildCollection } from "@/src/ch07/indexer-chroma";

// Must run on the Node.js runtime: the loaders use node:fs, which is
// unavailable on the Edge runtime.
export const runtime = "nodejs";

export async function POST(req: NextRequest) {
  const { dir, name } = await req.json();
  try {
    await buildCollection(dir || "./docs", name || "news");
    return Response.json({ ok: true });
  } catch (e: any) {
    return Response.json({ ok: false, message: e.message }, { status: 500 });
  }
}
```
7.18 RAG Q&A (SSE)
```typescript
// File: src/app/api/rag/route.ts
import { NextRequest } from "next/server";
import { buildRagPipeline } from "@/src/ch07/rag-pipeline";

// Node.js runtime: the pipeline depends on Node-only modules (fs, the Chroma client).
export const runtime = "nodejs";

export async function POST(req: NextRequest) {
  const { q } = await req.json();
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        const pipe = await buildRagPipeline("news");
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: "start" })}\n\n`));
        const out = await pipe.invoke(q);
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: "result", data: out })}\n\n`));
      } catch (e: any) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: "error", message: e.message })}\n\n`));
      } finally {
        controller.close();
      }
    },
  });
  return new Response(stream, { headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" } });
}
7.19 Front-end Consumption (Mobile-friendly)
```tsx
// File: src/app/rag/page.tsx
"use client";
import { useRef, useState } from "react";

export default function RAGPage() {
  const [q, setQ] = useState("");
  const [answer, setAnswer] = useState<any>(null);
  const [err, setErr] = useState("");
  const abortRef = useRef<AbortController | null>(null);

  // EventSource only supports GET, so we read the POST response's
  // SSE stream directly with a reader instead.
  const ask = async () => {
    setAnswer(null); setErr("");
    const ctrl = new AbortController();
    abortRef.current = ctrl;
    const res = await fetch("/api/rag", { method: "POST", body: JSON.stringify({ q }), signal: ctrl.signal });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    let buf = "";
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buf += decoder.decode(value, { stream: true });
      const events = buf.split("\n\n");
      buf = events.pop() || ""; // keep any incomplete event in the buffer
      for (const ev of events) {
        if (!ev.startsWith("data: ")) continue;
        const data = JSON.parse(ev.slice(6));
        if (data.type === "result") setAnswer(data.data);
        if (data.type === "error") setErr(data.message);
      }
    }
  };

  const stop = () => abortRef.current?.abort();

  return (
    <main className="max-w-screen-sm mx-auto p-4">
      <div className="flex gap-2">
        <input className="flex-1 border rounded px-3 py-2" value={q} onChange={e => setQ(e.target.value)} placeholder="Ask: e.g. When does the new company policy take effect?" />
        <button className="px-4 py-2 bg-blue-600 text-white rounded" onClick={ask}>Ask</button>
        <button className="px-4 py-2 bg-gray-600 text-white rounded" onClick={stop}>Stop</button>
      </div>
      {err && <p className="text-red-600 mt-2">{err}</p>}
      {answer && (
        <section className="mt-4 space-y-2">
          <h2 className="font-semibold">Answer</h2>
          <pre className="whitespace-pre-wrap break-words">{JSON.stringify(answer, null, 2)}</pre>
        </section>
      )}
    </main>
  );
}
```
🚀 Hands-on: A Time-sensitive News/Announcement RAG Platform
7.20 Requirements and Constraints
- Multiple sources: official-site announcements, earnings summaries, internal notices (per department)
- Strong recency: recent content gets higher weight; stale content is down-weighted or filtered out
- Traceability: every answer must carry citations, including the source link and publication time
- Mobile first: card-based lists; tap to expand citations and original links
7.21 Ranking Strategy
- Combined weight: `score = α·semantic_similarity + β·time_decay + γ·source_authority + δ·personal_preference`
- Time decay: `decay = exp(-λ · days)`, so recent items score higher
- Source authority: official site > industry association > social media
7.22 Key Code (Ranking and Merging)
```typescript
// File: src/ch07/ranker.ts
export function rank(hits: { text: string; meta: any; score?: number }[], now = Date.now()) {
  // Exponential time decay: 1.0 for undated docs, approaching 0 as age grows.
  function decay(ts?: number) {
    if (!ts) return 1;
    const days = (now - ts) / (24 * 3600 * 1000);
    const lambda = 0.05;
    return Math.exp(-lambda * days);
  }
  // Source authority by simple substring match on the source string.
  function authority(src?: string) {
    if (!src) return 0.8;
    if (src.includes("official")) return 1.0;
    if (src.includes("association")) return 0.9;
    return 0.8;
  }
  return hits
    .map(h => ({
      ...h,
      // Invert distance so higher is better, then blend in decay and authority.
      final: 0.6 * (1 - (h.score ?? 0)) + 0.25 * decay(h.meta?.publishedAt) + 0.15 * authority(h.meta?.source || ""),
    }))
    .sort((a, b) => b.final - a.final);
}
```
7.23 UI Details
- List view: title, source, publication time, similarity/confidence
- Detail panel: cited chunks, original link, key-sentence highlighting
- Empty results: suggest rephrasing the question or choosing a time range
⚙️ Engineering and Optimization
7.24 Caching and Concurrency
- Input fingerprint (hash) → short-lived retrieval-result cache, to shave QPS peaks
- LLM output cache: share answers to identical questions for a short window; cache citations and confidence alongside for direct serving
- Batch embeddings: merge requests via `embeddings.embedDocuments`
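A minimal sketch of the input-hash cache described above. `TtlCache` is an illustrative in-memory implementation; a production setup would typically use Redis or similar so the cache survives restarts and is shared across instances.

```typescript
// In-memory TTL cache keyed by an input fingerprint (e.g. a hash of the
// normalized question). Entries expire ttlMs after being written.
export class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const e = this.store.get(key);
    if (!e) return undefined;
    if (Date.now() > e.expires) {
      this.store.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return e.value;
  }

  set(key: string, value: V) {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```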
7.25 Cost and Latency
- Chunking and TopK: smaller chunks + a sensible TopK reduce context tokens
- Rerank degradation: skip reranking under high load and fall back to MMR only
- Embedding model choice: `text-embedding-3-small` is cost-effective; move to a larger model only when higher precision is needed
7.26 Hallucination and Safety
- Enforce citations and an "I don't know" path; missing citations trigger an error or retry
- Sensitive-word and compliance checks (keywords / regex / embedding classifier), with refusal and guidance
- Log "unanswerable" questions as input for corpus building and FAQ gap-filling
7.27 Observability
- Callback reporting: retrieval count, TopK hit rate, whether rerank ran, final token usage
- Logging: a RunId threaded through the chain; per-stage latency monitoring; error classification (NO_CONTEXT, INVALID_CITATIONS, TIMEOUT)
- Replay: store the question, retrieved chunks, final answer, and citations for review
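A framework-agnostic sketch of the per-stage reporting above. A real setup would use LangChain callbacks or LangSmith; `instrument` is a hypothetical helper that only shows the shape of the data worth collecting.

```typescript
// One log record per pipeline stage, tied together by a shared runId.
export type StageLog = { runId: string; stage: string; ms: number; error?: string };

// Wrap any async stage with timing and error capture; `sink` receives
// the log record (console, file, metrics backend, ...).
export function instrument<I, O>(
  stage: string,
  fn: (input: I) => Promise<O>,
  sink: (log: StageLog) => void,
) {
  return async (input: I, runId: string): Promise<O> => {
    const t0 = Date.now();
    try {
      const out = await fn(input);
      sink({ runId, stage, ms: Date.now() - t0 });
      return out;
    } catch (e: any) {
      sink({ runId, stage, ms: Date.now() - t0, error: e.message });
      throw e; // re-throw so upstream error handling still fires
    }
  };
}
```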
🧪 Evaluation and Continuous Improvement
- Golden set: real questions + reference answers + required citation lists
- Offline evaluation: Recall@K, MRR, citation accuracy, faithfulness
- Online evaluation: satisfaction, human-takeover rate, follow-up rate, average latency, cost
- A/B tests: chunkSize/overlap, TopK, MMR, hybrid weights, rerank on/off, model comparison
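Recall@K and MRR from the offline list above can be computed directly once you have the retrieved chunk ids and a relevant-id set per query (a sketch; `recallAtK` and `mrr` are illustrative helpers).

```typescript
// Recall@K: fraction of the relevant ids found in the top-k retrieved ids.
export function recallAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hit = retrieved.slice(0, k).filter(id => relevant.has(id)).length;
  return hit / relevant.size;
}

// MRR: mean over queries of 1 / rank of the first relevant hit (0 if none).
export function mrr(runs: { retrieved: string[]; relevant: Set<string> }[]): number {
  if (runs.length === 0) return 0;
  let sum = 0;
  for (const r of runs) {
    const rank = r.retrieved.findIndex(id => r.relevant.has(id));
    if (rank >= 0) sum += 1 / (rank + 1);
  }
  return sum / runs.length;
}
```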
📚 Further Reading
- LangChain.js docs (RAG): https://js.langchain.com/
- LangGraph: https://langchain-ai.github.io/langgraph/
- Chroma: https://docs.trychroma.com/
- Reranking references (Cohere / Jina / ColBERT, etc.)
✅ Chapter Summary
- Built the complete RAG flow: index → retrieve → fuse → generate → guard → observe
- Implemented time-aware, user-aware, and hybrid retrieval with weighted ranking, reducing hallucination while keeping answers traceable
- Orchestrated the workflow with Runnable and LangGraph, exposing APIs and a front end in Next.js
- Closed the evaluation-and-iteration loop toward a production-ready RAG system
🎯 Next Chapter Preview
In the next chapter, "Agent System Development", we will:
- Build agents with tool calling, plan decomposition, and multi-turn reasoning
- Integrate retrieval tools and a code-execution environment for stronger task completion
- Implement multi-agent collaboration and conflict resolution with LangGraph
Thanks for reading! Follow me on WeChat: 《鲫小鱼不正经》. Likes, bookmarks, and follows are all appreciated!