The Complete LangChain.js Development Handbook (Part 7): RAG (Retrieval-Augmented Generation) Architecture Design and Implementation

Chapter 7: RAG (Retrieval-Augmented Generation) Architecture Design and Implementation

Preface

Hi everyone, I'm Jixiaoyu (鲫小鱼), a front-end engineer who rarely writes front-end code and enjoys sharing knowledge beyond the front end, helping UI slicers escape the slicing treadmill. Follow me on WeChat: 《鲫小鱼不正经》. Likes, bookmarks, and follows are all appreciated!

🎯 Learning Objectives for This Chapter

  • Fully master the three stages of RAG: Retrieve → Augment/Fuse → Generate
  • Build the data pipeline: document loading, cleaning, splitting, embedding, index construction, incremental updates
  • Master retrieval strategies: TopK, MMR, time-aware, user-aware (personalized), and hybrid search
  • Build an answer chain that is citable, traceable, and low-hallucination, with enforced source citations and confidence scores
  • Orchestrate complex RAG workflows with Runnable/LangGraph, including observability callbacks and caching
  • Expose /api/ingest and /api/rag endpoints in Next.js, with streaming answers and mobile-friendly UI
  • Hands-on project: a time-sensitive RAG Q&A platform for news and announcements (with incremental ingestion and source authority)

📖 RAG Overview and Overall Architecture

7.1 Why RAG

  • Pure generation (an LLM alone) hallucinates easily and cannot access private knowledge or the latest data
  • Pure retrieval (keyword/semantic search) returns fragments of information, without synthesis or fluent expression
  • RAG combines both: first retrieve relevant chunks, then generate an answer with verifiable sources based on those chunks

7.2 Reference Architecture (Conceptual Diagram)

markdown
User question → preprocessing / answerability check → retrieval layer (vector / keyword / hybrid) → candidate fusion / de-dup / reranking
  → generation layer (citation-aware prompt) → review & guardrails (citation validation / safety filters) → output (structured + citations)
                              ↘ logs / metrics / cache / replay (LangSmith / self-hosted)

7.3 Data and State

  • Document domain: source, publishedAt, author/department, type (spec, faq, news); see the type sketch after this list
  • Splitting strategy: combine chunkSize/overlap with semantic boundaries, tuned through evaluation
  • Vector index: supports incremental updates and soft deletes; use separate collections to isolate business domains
  • Conversation state: short-term message window + long-term summaries/fact cards (see Chapter 3)
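
The retrievers and ranker later in this chapter filter and sort on metadata fields like the ones above. A minimal sketch of a chunk metadata type, assuming illustrative field names rather than a fixed schema:

typescript
// File: src/ch07/meta.ts (illustrative sketch; field names are assumptions, not a fixed schema)
export type ChunkMeta = {
  source: string;         // origin of the document (URL or file path)
  publishedAt?: number;   // publish timestamp in ms, used by time-aware retrieval and ranking
  author?: string;
  dept?: string;          // department, used by the user-aware retriever
  type?: "spec" | "faq" | "news";
  tags?: string[];        // interest tags used for personalization boosts
  chunkIndex: number;     // position of the chunk within its parent document
  archived?: boolean;     // soft-delete flag for incremental updates
};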

🧱 Index Construction: Load / Clean / Split / Embed / Store

7.4 Loading (Loader) and Cleaning (Cleaner)

typescript
// File: src/ch07/loaders.ts
import fs from "node:fs/promises";
import path from "node:path";

export type RawDoc = { id: string; text: string; meta: Record<string, any> };

export async function loadMarkdownDir(dir: string, tag = "default"): Promise<RawDoc[]> {
  const files = await fs.readdir(dir);
  const docs: RawDoc[] = [];
  for (const f of files) {
    if (!f.endsWith(".md")) continue;
    const full = path.join(dir, f);
    const text = await fs.readFile(full, "utf8");
    docs.push({ id: f, text, meta: { source: full, type: "md", tag } });
  }
  return docs;
}

export function clean(text: string): string {
  return text
    .replace(/\r\n?/g, "\n") // normalize CRLF/CR line endings to LF
    .replace(/\n{3,}/g, "\n\n")
    .replace(/[\t\u00A0]+/g, " ")
    .trim();
}

7.5 Splitting (Splitter) and Normalized Metadata

typescript
// File: src/ch07/splitter.ts
export type Chunk = { id: string; text: string; meta: Record<string, any> };

export function split(text: string, size = 900, overlap = 120): string[] {
  const out: string[] = [];
  let i = 0;
  while (i < text.length) {
    const s = text.slice(i, i + size);
    out.push(s);
    i += Math.max(1, size - overlap); // guard against a non-advancing window when overlap >= size
  }
  return out;
}

export function toChunks(docs: { id: string; text: string; meta: any }[], size = 900, overlap = 120) {
  const arr: Chunk[] = [];
  for (const d of docs) {
    const t = clean(d.text);
    const parts = split(t, size, overlap);
    parts.forEach((p, idx) => arr.push({ id: `${d.id}#${idx}`, text: p, meta: { ...d.meta, chunkIndex: idx } }));
  }
  return arr;
}

// clean() is duplicated here for brevity; in practice import it from loaders.ts
function clean(t: string) { return t.replace(/\r\n?/g, "\n").replace(/\n{3,}/g, "\n\n").replace(/[\t\u00A0]+/g, " ").trim(); }

7.6 Embedding and Storage (Using Chroma as an Example)

typescript
// File: src/ch07/indexer-chroma.ts
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";
import { loadMarkdownDir } from "./loaders";
import { toChunks } from "./splitter";
import * as dotenv from "dotenv"; dotenv.config();

export async function buildCollection(dir = "./docs", name = "news") {
  const raw = await loadMarkdownDir(dir, name);
  const chunks = toChunks(raw, 900, 120);
  const texts = chunks.map(c => c.text);
  const metas = chunks.map(c => ({ ...c.meta, id: c.id }));
  const db = await Chroma.fromTexts(texts, metas, new OpenAIEmbeddings({ model: "text-embedding-3-small" }), {
    collectionName: `rag_${name}`,
  });
  return db;
}

export async function openCollection(name = "news") {
  return Chroma.fromExistingCollection(new OpenAIEmbeddings({ model: "text-embedding-3-small" }), { collectionName: `rag_${name}` });
}

7.7 Incremental Updates and Soft Deletes (Approach)

  • Version each document or fingerprint its content (hash), and only re-split and re-embed the entries that changed (see the sketch below)
  • Keep old vectors but tag them archived=true; merge or clean them up offline on a schedule
  • Make updates asynchronous and rate-limit writes; show an "index is updating" hint in the frontend
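
A minimal sketch of hash-based change detection, assuming a simple JSON manifest file for stored fingerprints (the manifest path and helper names are illustrative, not part of the chapter's code):

typescript
// File: src/ch07/incremental.ts (illustrative sketch)
import { createHash } from "node:crypto";
import fs from "node:fs/promises";
import type { RawDoc } from "./loaders";

const MANIFEST = "./.rag-manifest.json"; // hypothetical location for stored fingerprints

function fingerprint(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Returns only the docs whose content hash changed since the last run,
// and persists the new manifest so the next run can diff against it.
export async function diffChangedDocs(docs: RawDoc[]): Promise<RawDoc[]> {
  let previous: Record<string, string> = {};
  try { previous = JSON.parse(await fs.readFile(MANIFEST, "utf8")); } catch { /* first run */ }
  const changed = docs.filter(d => previous[d.id] !== fingerprint(d.text));
  const next = Object.fromEntries(docs.map(d => [d.id, fingerprint(d.text)]));
  await fs.writeFile(MANIFEST, JSON.stringify(next, null, 2));
  return changed;
}

Only the changed docs then go through toChunks and re-embedding; superseded chunks can be flagged archived=true in their metadata instead of being deleted immediately.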

🔍 Retrieval Strategies and Fusion

7.8 Basic TopK + MMR-style De-duplication

typescript
// File: src/ch07/retrievers.ts
import { openCollection } from "./indexer-chroma";

export type Hit = { text: string; score?: number; meta?: any };

export async function topK(name: string) {
  const store = await openCollection(name);
  return async (q: string, k = 6, filter?: any): Promise<Hit[]> => {
    const docs = await store.similaritySearchWithScore(q, k, filter);
    return docs.map(([d, s]) => ({ text: d.pageContent, score: s, meta: d.metadata }));
  };
}

// Simplified stand-in for MMR: de-duplicates by source + chunk index.
// True MMR would also penalize semantic similarity between already-selected hits.
export function mmr(hits: Hit[], limit = 5) {
  const out: Hit[] = []; const seen = new Set<string>();
  for (const h of hits) {
    const key = `${h.meta?.source}#${h.meta?.chunkIndex}`;
    if (seen.has(key)) continue; seen.add(key); out.push(h);
    if (out.length >= limit) break;
  }
  return out;
}

7.9 Time-aware Retrieval

typescript
// File: src/ch07/time-aware.ts
import { topK, mmr } from "./retrievers";

type TimeRange = { from?: number; to?: number };

export async function timeAwareRetriever(name: string, range?: TimeRange) {
  const retr = await topK(name);
  return async (q: string) => {
    // Build the metadata filter only from the bounds actually provided.
    // Chroma's where-clause expects one operator per condition, so a full range is expressed with $and.
    const conds: any[] = [];
    if (range?.from != null) conds.push({ publishedAt: { $gte: range.from } });
    if (range?.to != null) conds.push({ publishedAt: { $lte: range.to } });
    const filter = conds.length === 0 ? undefined : conds.length === 1 ? conds[0] : { $and: conds };
    const hits = await retr(q, 12, filter);
    return mmr(hits, 6);
  };
}

7.10 Personalized Retrieval (User-aware)

typescript
// File: src/ch07/user-aware.ts
import { topK, mmr } from "./retrievers";

export async function userAwareRetriever(name: string, user: { id: string; dept?: string; tags?: string[] }) {
  const retr = await topK(name);
  return async (q: string) => {
    const filter: any = {};
    if (user.dept) filter.dept = user.dept; // prefer documents from the user's own department
    const hits = await retr(q, 10, Object.keys(filter).length ? filter : undefined);
    // Simple weighting (illustrative): hits matching the user's interest tags get a small boost.
    // Scores are distances (lower is better), so subtracting the boost sorts boosted hits earlier.
    const boost = (h: any) => (user.tags || []).some(t => (h.meta?.tags || []).includes(t)) ? 0.05 : 0;
    return mmr(hits.sort((a,b)=> (a.score! - boost(a)) - (b.score! - boost(b))), 6);
  };
}

7.11 Hybrid Search (Keyword + Vector) with Reranking

typescript
// File: src/ch07/hybrid.ts
import { topK } from "./retrievers";

// Naive keyword scoring over an in-memory corpus: counts raw regex matches of the query string.
// (A real system would use an inverted index / BM25 and escape regex metacharacters in the query.)
function keywordSearch(q: string, corpus: { id: string; text: string }[], k = 5) {
  const lower = q.toLowerCase();
  return corpus.map(d => ({ d, s: (d.text.toLowerCase().match(new RegExp(lower, "g")) || []).length }))
    .sort((a,b)=> b.s - a.s).slice(0,k).map(x => ({ text: x.d.text, meta: { id: x.d.id, source: "kw" } }));
}

export async function hybrid(name: string) {
  const retr = await topK(name);
  return async (q: string) => {
    const [vecHits, kwHits] = await Promise.all([
      retr(q, 8),
      // Demo keyword corpus; replace with your own documents or a proper keyword index.
      Promise.resolve(keywordSearch(q, [{ id: "k1", text: "RAG combines retrieval and generation" }, { id: "k2", text: "MMR improves diversity" }], 4)),
    ]);
    const items = [
      // Vector scores are distances (lower is better), so invert before weighting.
      ...vecHits.map(h => ({ ...h, weight: 0.7 * (1 - (h.score ?? 0)) })),
      ...kwHits.map(h => ({ ...h, weight: 0.3 })),
    ];
    return items.sort((a,b)=> b.weight - a.weight).slice(0,6).map(({ weight, ...rest })=> rest);
  };
}

🧩 Fusion and Generation: Reducing Hallucinations, Enforcing Citations

7.12 Structured Answer Template with Citation Constraints

typescript
// File: src/ch07/prompt.ts
import { ChatPromptTemplate } from "@langchain/core/prompts";

// Literal braces in the JSON schema are doubled ({{ }}) so the prompt template does not treat them as variables.
export const answerPrompt = ChatPromptTemplate.fromMessages([
  ["system", `You are a rigorous enterprise knowledge assistant. Answer ONLY based on the "candidate chunks"; if the information is insufficient, answer "I don't know".
Output strict JSON:
{{
  "answer": string,
  "citations": [{{"source": string, "chunkId": string}}],
  "confidence": number // 0~1
}}`],
  ["human", `Question: {question}\nCandidate chunks:\n{chunks}\nOutput the JSON:`],
]);

7.13 The Generation Chain (RunnableSequence)

typescript
// File: src/ch07/answer-chain.ts
import { RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI } from "@langchain/openai";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { answerPrompt } from "./prompt";

export function buildAnswerChain() {
  const llm = new ChatOpenAI({ temperature: 0 });
  const parser = new JsonOutputParser<{ answer: string; citations: {source:string; chunkId:string}[]; confidence:number }>();
  return RunnableSequence.from([
    answerPrompt,
    llm,
    parser,
  ]);
}

7.14 Guardrails: Citation Validation and Coarse Factuality Checks

typescript
// File: src/ch07/guards.ts
export function validateCitations(res: { answer: string; citations: any[] }) {
  const ok = Array.isArray(res.citations) && res.citations.length > 0 && res.citations.every(c => c.source);
  if (!ok) throw new Error("Missing citations or malformed citation format");
  return res;
}

export function shortCircuitIfEmpty(hits: { text: string }[]) {
  if (!hits || hits.length === 0) throw new Error("NO_CONTEXT");
}

7.15 The End-to-End RAG Runnable Chain

typescript
// File: src/ch07/rag-pipeline.ts
import { RunnableSequence, RunnableLambda } from "@langchain/core/runnables";
import { buildAnswerChain } from "./answer-chain";
import { hybrid } from "./hybrid";
import { validateCitations, shortCircuitIfEmpty } from "./guards";

export async function buildRagPipeline(name = "news") {
  const retrieve = await hybrid(name);
  const toChunkList = RunnableLambda.from(async (q: string) => {
    const hits = await retrieve(q);
    shortCircuitIfEmpty(hits);
    const chunks = hits.map((h, i) => `#${i} [${h.meta?.source || "vec"}] ${String(h.text).replace(/\n/g," ").slice(0,400)}...`).join("\n");
    return { question: q, chunks };
  });

  const answer = buildAnswerChain();
  const check = RunnableLambda.from((res: any) => validateCitations(res));

  return RunnableSequence.from([
    toChunkList,
    answer,
    check,
  ]);
}

🧠 LangGraph: Orchestrating the RAG Workflow as a State Graph

7.16 Graph Nodes: Retrieve → Fuse → Answer → Guard → Write Back

typescript
// File: src/ch07/graph.ts (pseudo-example; adapt to your installed LangGraph version)
import { StateGraph, START, END } from "@langchain/langgraph";
import { buildRagPipeline } from "./rag-pipeline";

type S = { q: string; result?: any; error?: string };

export async function buildGraph() {
  const graph = new StateGraph<S>({ channels: { q: { value: "" }, result: { value: {} }, error: { value: "" } } });
  graph.addNode("rag", async (s) => {
    try {
      const pipeline = await buildRagPipeline("news");
      const out = await pipeline.invoke(s.q);
      return { result: out };
    } catch (e: any) {
      return { error: e.message };
    }
  });
  graph.addEdge("start", "rag");
  graph.addEdge("rag", "end");
  return graph.compile();
}

🌐 Next.js Endpoints: /api/ingest and /api/rag (SSE Streaming)

7.17 The Document Ingestion Endpoint

typescript
// File: src/app/api/ingest/route.ts
import { NextRequest } from "next/server";
import { buildCollection } from "@/src/ch07/indexer-chroma";

export const runtime = "nodejs"; // the loaders use node:fs, which is not available on the Edge runtime

export async function POST(req: NextRequest) {
  const { dir, name } = await req.json();
  try {
    await buildCollection(dir || "./docs", name || "news");
    return Response.json({ ok: true });
  } catch (e: any) {
    return Response.json({ ok: false, message: e.message }, { status: 500 });
  }
}

7.18 RAG Question Answering (SSE)

typescript
// File: src/app/api/rag/route.ts
import { NextRequest } from "next/server";
import { buildRagPipeline } from "@/src/ch07/rag-pipeline";

export const runtime = "nodejs"; // the pipeline transitively imports node:fs via loaders.ts, which the Edge runtime does not provide

export async function POST(req: NextRequest) {
  const { q } = await req.json();
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        const pipe = await buildRagPipeline("news");
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: "start" })}\n\n`));
        const out = await pipe.invoke(q);
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: "result", data: out })}\n\n`));
      } catch (e: any) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ type: "error", message: e.message })}\n\n`));
      } finally { controller.close(); }
    }
  });
  return new Response(stream, { headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" } });
}

7.19 Frontend Consumption (Mobile-friendly)

tsx
// File: src/app/rag/page.tsx
"use client";
import { useRef, useState } from "react";

export default function RAGPage() {
  const [q, setQ] = useState("");
  const [answer, setAnswer] = useState<any>(null);
  const [err, setErr] = useState("");
  const abortRef = useRef<AbortController | null>(null);

  const ask = async () => {
    setAnswer(null); setErr("");
    const controller = new AbortController();
    abortRef.current = controller;
    // EventSource only issues GET requests, so it cannot consume this POST route;
    // read the SSE stream returned by fetch directly instead.
    const res = await fetch("/api/rag", { method: "POST", body: JSON.stringify({ q }), signal: controller.signal });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    let buf = "";
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buf += decoder.decode(value, { stream: true });
        const events = buf.split("\n\n");
        buf = events.pop() || "";
        for (const ev of events) {
          const line = ev.split("\n").find(l => l.startsWith("data: "));
          if (!line) continue;
          const data = JSON.parse(line.slice(6));
          if (data.type === "result") setAnswer(data.data);
          if (data.type === "error") setErr(data.message);
        }
      }
    } catch { /* request aborted via the Stop button */ }
  };

  const stop = () => abortRef.current?.abort();

  return (
    <main className="max-w-screen-sm mx-auto p-4">
      <div className="flex gap-2">
        <input className="flex-1 border rounded px-3 py-2" value={q} onChange={e=>setQ(e.target.value)} placeholder="Ask a question, e.g. when does the new company policy take effect?" />
        <button className="px-4 py-2 bg-blue-600 text-white rounded" onClick={ask}>Ask</button>
        <button className="px-4 py-2 bg-gray-600 text-white rounded" onClick={stop}>Stop</button>
      </div>
      {err && <p className="text-red-600 mt-2">{err}</p>}
      {answer && (
        <section className="mt-4 space-y-2">
          <h2 className="font-semibold">Answer</h2>
          <pre className="whitespace-pre-wrap break-words">{JSON.stringify(answer, null, 2)}</pre>
        </section>
      )}
    </main>
  );
}

🚀 Hands-on Project: A Time-sensitive News/Announcement RAG Platform

7.20 Requirements and Constraints

  • Multiple data sources: official site announcements, earnings summaries, internal notices (scoped by department)
  • Strong recency: recent content gets higher weight; stale content is down-weighted or filtered out
  • Traceability: every answer must carry citations, including the source link and publication time
  • Mobile first: card-based lists; tap to expand citations and links to the original document

7.21 Ranking Strategy

  • Combined weight: score = α·semantic similarity + β·time decay + γ·source authority + δ·personalization preference
  • Time decay: decay = exp(-λ · days), so recent items score higher
  • Source authority: official site > industry association > social media

7.22 Key Code (Ranking and Merging)

typescript
// File: src/ch07/ranker.ts
export function rank(hits: { text: string; meta: any; score?: number }[], now = Date.now()) {
  // Exponential time decay from 7.21: decay = exp(-λ · days since publication)
  function decay(ts?: number) {
    if (!ts) return 1; const days = (now - ts) / (24*3600*1000); const λ = 0.05; return Math.exp(-λ*days);
  }
  function authority(src?: string) {
    if (!src) return 0.8; if (src.includes("official")) return 1.0; if (src.includes("association")) return 0.9; return 0.8;
  }
  return hits.map(h => ({
    ...h,
    // Weights map to α/β/γ from 7.21; the personalization term δ is handled by the user-aware retriever.
    final: 0.6*(1 - (h.score ?? 0)) + 0.25*decay(h.meta?.publishedAt) + 0.15*authority(h.meta?.source || "")
  })).sort((a,b)=> b.final - a.final);
}

7.23 UI Details

  • List view: title, source, publication time, similarity/confidence
  • Detail panel: cited chunks, a link to the original, and key-sentence highlighting
  • Empty results: guide the user to rephrase the question or pick a different time range

⚙️ Engineering and Optimization

7.24 Caching and Concurrency

  • Fingerprint the input (hash) and cache retrieval results for a short TTL to flatten QPS spikes (see the sketch below)
  • Cache LLM outputs: share answers for identical questions over a short window, caching citations and confidence so they can be returned directly
  • Batch embeddings: merge requests via embeddings.embedDocuments
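
A minimal sketch of a query-fingerprint cache with a short TTL, assuming a plain in-memory Map (the helper names and TTL are illustrative; a real deployment would typically use Redis or another shared store):

typescript
// File: src/ch07/cache.ts (illustrative sketch)
import { createHash } from "node:crypto";

const TTL_MS = 60_000; // illustrative: keep cached results for one minute
const cache = new Map<string, { at: number; value: unknown }>();

function fingerprint(q: string): string {
  return createHash("sha256").update(q.trim().toLowerCase()).digest("hex");
}

// Wraps any async producer (e.g. the hybrid retriever or the whole pipeline) with a short-lived cache.
export async function withCache<T>(q: string, produce: () => Promise<T>): Promise<T> {
  const key = fingerprint(q);
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < TTL_MS) return hit.value as T;
  const value = await produce();
  cache.set(key, { at: Date.now(), value });
  return value;
}

Usage would look like withCache(q, () => retrieve(q)) inside the pipeline, so repeated identical questions skip the vector store entirely during the TTL window.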

7.25 Cost and Latency

  • Chunking and TopK: smaller chunks plus a reasonable TopK reduce context tokens
  • Rerank fallback: skip reranking under heavy load and rely on MMR alone
  • Embedding model choice: text-embedding-3-small is cost-effective; move to a larger model only when higher precision is needed

7.26 Hallucinations and Safety

  • Enforce citations and an "I don't know" path; missing citations fail fast or trigger a retry
  • Sensitive-word and compliance checks (keywords / regex / embedding classifier), with refusals and guidance (a keyword-filter sketch follows this list)
  • Record "unanswerable" questions as input for corpus building and FAQ gap-filling
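
A minimal keyword/regex pre-filter sketch; the blocklist patterns below are illustrative placeholders, and a production system would layer an embedding-based classifier on top:

typescript
// File: src/ch07/safety.ts (illustrative sketch)
// Illustrative blocklist; maintain the real list according to your compliance requirements.
const BLOCKED_PATTERNS: RegExp[] = [/password\s+dump/i, /salary\s+of\s+specific\s+employees/i];

export function checkQuestionSafety(q: string): { ok: boolean; reason?: string } {
  for (const p of BLOCKED_PATTERNS) {
    if (p.test(q)) return { ok: false, reason: "This question touches restricted content; please rephrase." };
  }
  return { ok: true };
}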

7.27 Observability

  • Callback reporting: number of retrieved chunks, TopK hit rate, whether reranking ran, final token usage (see the callback sketch below)
  • Logging: carry the RunId end to end; monitor per-step latency; classify errors (NO_CONTEXT, INVALID_CITATIONS, TIMEOUT)
  • Replay: store the question, the hit chunks, and the final answer with citations for later review
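
A minimal callback-handler sketch built on BaseCallbackHandler from @langchain/core; which fields to log is an assumption here, and the handler is passed in via the invoke options:

typescript
// File: src/ch07/observability.ts (illustrative sketch)
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

export class RagMetricsHandler extends BaseCallbackHandler {
  name = "rag_metrics";

  async handleChainEnd(outputs: Record<string, any>, runId: string) {
    // Replace console.log with your metrics/logging pipeline.
    console.log(JSON.stringify({ event: "chain_end", runId, keys: Object.keys(outputs) }));
  }

  async handleLLMEnd(output: any, runId: string) {
    // OpenAI chat models report token usage under llmOutput.tokenUsage.
    console.log(JSON.stringify({ event: "llm_end", runId, usage: output?.llmOutput?.tokenUsage }));
  }
}

// Usage (illustrative): pipeline.invoke(q, { callbacks: [new RagMetricsHandler()] });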

🧪 Evaluation and Continuous Improvement

  • Golden set: real questions + reference answers + the citations each answer must include
  • Offline evaluation: Recall@K, MRR, citation accuracy, faithfulness (a small metrics sketch follows this list)
  • Online evaluation: satisfaction, human-takeover rate, follow-up rate, average latency, cost
  • A/B tests: chunkSize/overlap, TopK, MMR, hybrid weights, rerank on/off, and model comparisons
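
A minimal sketch of Recall@K and MRR over a golden set, assuming each question is annotated with the chunk IDs it should retrieve (the types are illustrative):

typescript
// File: src/ch07/eval.ts (illustrative sketch)
type GoldenItem = { question: string; relevantChunkIds: string[] };

// retrievedIds[i] holds the chunk IDs returned for golden[i], in rank order.
export function recallAtK(golden: GoldenItem[], retrievedIds: string[][], k: number): number {
  const scores = golden.map((g, i) => {
    const topK = new Set(retrievedIds[i].slice(0, k));
    const hit = g.relevantChunkIds.filter(id => topK.has(id)).length;
    return g.relevantChunkIds.length ? hit / g.relevantChunkIds.length : 0;
  });
  return scores.reduce((a, b) => a + b, 0) / Math.max(1, scores.length);
}

export function mrr(golden: GoldenItem[], retrievedIds: string[][]): number {
  const scores = golden.map((g, i) => {
    const rank = retrievedIds[i].findIndex(id => g.relevantChunkIds.includes(id));
    return rank === -1 ? 0 : 1 / (rank + 1);
  });
  return scores.reduce((a, b) => a + b, 0) / Math.max(1, scores.length);
}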

📚 Further Reading

  • LangChain.js documentation (RAG): https://js.langchain.com/
  • LangGraph: https://langchain-ai.github.io/langgraph/
  • Chroma: https://docs.trychroma.com/
  • Reranking references (Cohere / Jina / ColBERT, etc.)

✅ Chapter Summary

  • Built the complete RAG flow: indexing → retrieval → fusion → generation → guardrails → observability
  • Implemented time-aware, personalized, and hybrid retrieval with weighted ranking, reducing hallucinations while keeping answers traceable
  • Orchestrated the workflow with Runnable and LangGraph, and exposed an API and frontend in Next.js
  • Closed the loop on evaluation and iteration toward a production-ready RAG system

🎯 Next Chapter Preview

In the next chapter, "Building Agent Systems", we will:

  • Build agents capable of tool calling, plan decomposition, and multi-turn reasoning
  • Integrate retrieval tools and a code-execution environment for stronger task completion
  • Use LangGraph to implement multi-agent collaboration and conflict resolution

Thanks for reading! Follow me on WeChat: 《鲫小鱼不正经》. Likes, bookmarks, and follows are much appreciated!

前端·javascript·react native·react.js·架构·前端框架