Spring AI + RAG in Practice: Building an Enterprise-Grade Intelligent Q&A System
🔥 Positioning: build an enterprise RAG knowledge-base system from scratch, step by step
💡 Audience: Java developers with a Spring Boot background who want to put AI to work on their business data
⏱️ Reading time: about 30 minutes
🛠️ Stack: Spring AI 1.1 + Milvus vector database + Tongyi Qwen + Spring Boot 3.3

Introduction: Why Doesn't Your AI "Get" Your Business?
Many teams hit an awkward problem after integrating AI: the model answers fluently, but its answers have nothing to do with their business.
For example, you ask: "What is our company's annual leave policy?" and the AI replies: "Under labor law, annual leave is calculated as follows..."
This is a classic missing-knowledge problem. The model knew nothing about your company's internal policies at pretraining time, so that knowledge has to be injected via RAG.
RAG stands for Retrieval-Augmented Generation: retrieve the relevant knowledge, augment the prompt with it, and let the model generate the answer.
In this article I'll walk through a complete internal knowledge Q&A system and take you from zero to a working Spring AI RAG application.
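Before wiring up the real stack, the three RAG stages can be sketched in a few lines of plain Java. Everything in this sketch is a toy stand-in (an in-memory knowledge list, keyword-overlap "retrieval", and a `fakeGenerate` placeholder where a real system would call the model); none of it is Spring AI code:

```java
import java.util.*;
import java.util.stream.*;

public class RagSketch {

    // Stage 1 (Retrieval): rank knowledge snippets by naive keyword overlap with the question
    static List<String> retrieve(String question, List<String> knowledge, int topK) {
        Set<String> q = new HashSet<>(Arrays.asList(question.toLowerCase().split("\\s+")));
        return knowledge.stream()
                .sorted(Comparator.comparingLong((String k) ->
                        Arrays.stream(k.toLowerCase().split("\\s+"))
                              .filter(q::contains).count()).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }

    // Stage 2 (Augmentation): splice the retrieved context into the prompt
    static String augment(String question, List<String> context) {
        return "Context:\n" + String.join("\n", context) + "\n\nQuestion: " + question;
    }

    // Stage 3 (Generation): a real system calls the LLM here
    static String fakeGenerate(String prompt) {
        return "ANSWER BASED ON: " + prompt;
    }

    public static void main(String[] args) {
        List<String> kb = List.of(
                "Employees get 10 days of annual leave per year.",
                "The office wifi password rotates monthly.");
        List<String> hits = retrieve("how many days of annual leave", kb, 1);
        System.out.println(fakeGenerate(augment("how many days of annual leave", hits)));
    }
}
```

The rest of the article replaces each toy stage with a production component: Milvus for retrieval, a prompt builder for augmentation, and Qwen for generation.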
1. Project Requirements and Architecture Design
1.1 Business Scenario
A tech company needs an internal knowledge Q&A system so that employees can:
- Query company policies, process documents, and technical specs in natural language
- Upload documents in multiple formats: PDF, Word, Markdown, and more
- Find the relevant policy quickly instead of digging through piles of documents
- Get answers with source citations for easy verification
1.2 System Architecture
```text
┌──────────────────────────────────────────────────────────────┐
│                        User query UI                         │
│              (Web / WeCom / DingTalk / Feishu)               │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                   Spring Boot application                    │
│ ┌────────────┐ ┌────────────┐ ┌──────────┐ ┌───────────────┐ │
│ │  REST API  │ │ ChatClient │ │ Document │ │ Vector Search │ │
│ │ Controller │ │   (Qwen)   │ │  Loader  │ │    Service    │ │
│ └────────────┘ └────────────┘ └──────────┘ └───────────────┘ │
└──────────────────────────────┬───────────────────────────────┘
                               │
         ┌─────────────────────┼──────────────────────┐
         ▼                     ▼                      ▼
 ┌────────────────┐    ┌────────────────┐    ┌────────────────┐
 │ Milvus vector  │    │ Object storage │    │ Tongyi Qwen    │
 │ store          │    │ (OSS, raw      │    │ API (answer    │
 │ (semantic      │    │ files)         │    │ generation)    │
 │ retrieval)     │    │                │    │                │
 └────────────────┘    └────────────────┘    └────────────────┘
```
1.3 Technology Choices
| Component | Choice | Rationale |
|---|---|---|
| AI model | Tongyi Qwen-Plus | Top-tier Chinese-language capability, cost-effective, compliant in mainland China |
| Vector database | Milvus | Distributed, high performance, open source |
| Document formats | PDF/Word/Markdown/TXT | The formats enterprises use most |
| Embedding model | text-embedding-v4 | Official Alibaba model, optimized for Chinese |
| Framework | Spring AI 1.1 | Seamless Spring Boot integration |
| Deployment | Docker Compose | One command brings up every component |
2. Environment Setup: One-Command Deployment with Docker Compose
2.1 Directory Layout
```text
company-kb/
├── docker-compose.yml
├── spring-ai-kb-app/
│   ├── src/main/java/com/company/kb/
│   │   ├── CompanyKbApplication.java
│   │   ├── config/
│   │   │   ├── AiConfig.java
│   │   │   └── MilvusConfig.java
│   │   ├── controller/
│   │   │   ├── ChatController.java
│   │   │   └── DocumentController.java
│   │   ├── service/
│   │   │   ├── RagService.java
│   │   │   ├── DocumentService.java
│   │   │   └── EmbeddingService.java
│   │   ├── model/
│   │   │   ├── DocumentChunk.java
│   │   │   └── ChatRequest.java
│   │   └── loader/
│   │       ├── PdfLoader.java
│   │       ├── WordLoader.java
│   │       └── MarkdownLoader.java
│   └── src/main/resources/
│       └── application.yml
├── documents/    # raw document directory
└── embeddings/   # embedding cache
```
2.2 Docker Compose Configuration
```yaml
# docker-compose.yml
version: '3.8'

services:
  # Milvus vector database
  milvus-etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
    volumes:
      - ./volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  milvus-minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - ./volumes/minio:/minio_data
    command: minio server /minio_data

  milvus-standalone:
    image: milvusdb/milvus:v2.4.0
    container_name: milvus-standalone
    ports:
      - "19530:19530"
      - "9091:9091"
    volumes:
      - ./volumes/milvus:/var/lib/milvus
    environment:
      ETCD_ENDPOINTS: milvus-etcd:2379
      MINIO_ADDRESS: milvus-minio:9000
    depends_on:
      - milvus-etcd
      - milvus-minio
    restart: always

  # Spring AI application (built later)
  spring-ai-kb:
    build: ./spring-ai-kb-app
    container_name: spring-ai-kb
    ports:
      - "8080:8080"
    environment:
      - SPRING_AI_ALIBABA_API_KEY=${ALI_API_KEY}
    volumes:
      - ./documents:/data/documents
    depends_on:
      - milvus-standalone
    restart: always

networks:
  default:
    name: milvus-network
```
Start everything:
```bash
# Start all services
docker-compose up -d

# Check status
docker-compose ps

# Tail the application logs
docker-compose logs -f spring-ai-kb
```
⚠️ Note: the first start pulls the Milvus images, which can take 3-5 minutes. Make sure the host has at least 8 GB of RAM.
3. Core Configuration: application.yml
```yaml
spring:
  application:
    name: company-kb-system
  servlet:
    multipart:
      max-file-size: 100MB      # max size per file
      max-request-size: 500MB   # max size per request
  ai:
    alibaba:
      api-key: ${ALI_API_KEY}
      base-url: https://dashscope.aliyuncs.com/compatible-mode/v1
      chat:
        options:
          model: qwen-plus
          temperature: 0.1      # keep temperature low for RAG, to favor factual accuracy
          max-tokens: 2000
      embedding:
        options:
          model: text-embedding-v4

# Milvus vector store configuration
spring.ai.vectorstore.milvus:
  client:
    host: localhost
    port: 19530
    username: ""
    password: ""

# File storage paths
app:
  storage:
    documents: /data/documents
    embeddings: /data/embeddings

# Server settings
server:
  port: 8080

logging:
  level:
    org.springframework.ai: DEBUG
    com.company.kb: DEBUG
```
4. Document Processing: Multi-Format Support and Smart Chunking
4.1 Chunking Strategy (The Crucial Part!)
As much as 80% of RAG quality comes down to chunking. Chunks that are too long drown the answer in noise; chunks that are too short lose context.
Four chunking strategies compared:
| Strategy | Chunk size | Overlap | Best for |
|---|---|---|---|
| Fixed size | 500 chars | 50 chars | General-purpose |
| Paragraph | paragraph boundaries | 0 | Well-structured documents |
| Semantic | topic boundaries | 100 chars | Highly semantic content |
| Recursive | progressively subdivided | 50 chars | Recommended for production |
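As a concrete baseline, fixed-size chunking with overlap fits in a dozen lines. This standalone sketch (the class name `FixedChunker` and the early-exit handling are mine, not part of the loaders shown later) makes the size/overlap trade-off tangible:

```java
import java.util.ArrayList;
import java.util.List;

public class FixedChunker {

    /** Split text into chunks of at most maxChars, with `overlap` shared chars between neighbors. */
    static List<String> chunk(String text, int maxChars, int overlap) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;            // reached the tail: stop
            start = Math.max(end - overlap, start + 1); // step back by the overlap, but always advance
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
    }
}
```

Note the `break` when the end of the text is reached: without it, stepping back by the overlap would re-read the tail forever. The loaders below apply the same pattern with sentence-boundary snapping on top.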
4.2 A Unified Document Loader Interface
```java
package com.company.kb.loader;

import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/**
 * Common interface for all document loaders.
 */
public interface DocumentLoader {

    /** Whether this loader handles the given file name (by extension). */
    boolean supports(String fileName);

    /** Load the file and split it into chunks ready for embedding. */
    List<String> load(Path path) throws IOException;

    /** Extract the raw text content of the file. */
    String getContent(Path path) throws IOException;
}
```
4.3 PDF Loader
```java
package com.company.kb.loader;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

@Component
public class PdfLoader implements DocumentLoader {

    @Override
    public boolean supports(String fileName) {
        return fileName.toLowerCase().endsWith(".pdf");
    }

    @Override
    public List<String> load(Path path) throws IOException {
        String content = getContent(path);
        return smartChunk(content, 500, 50);
    }

    @Override
    public String getContent(Path path) throws IOException {
        // Extract text with PDFBox (2.x API; PDFBox 3.x uses Loader.loadPDF instead)
        try (PDDocument document = PDDocument.load(path.toFile())) {
            return new PDFTextStripper().getText(document);
        }
    }

    /**
     * Smart chunking: split on paragraphs first, then merge small paragraphs.
     */
    private List<String> smartChunk(String content, int maxChars, int overlap) {
        // Clean the text first (keeps newlines so paragraph splitting still works)
        content = cleanText(content);
        // Split on blank lines (paragraph boundaries)
        String[] paragraphs = content.split("\\n\\s*\\n");
        List<String> chunks = new ArrayList<>();
        StringBuilder currentChunk = new StringBuilder();
        for (String paragraph : paragraphs) {
            if (currentChunk.length() + paragraph.length() <= maxChars) {
                currentChunk.append(paragraph).append("\n\n");
            } else {
                if (currentChunk.length() > 0) {
                    chunks.add(currentChunk.toString().trim());
                    // Carry the tail of the previous chunk over as overlap
                    String overlapText = getOverlapText(currentChunk.toString(), overlap);
                    currentChunk = new StringBuilder(overlapText);
                }
                currentChunk.append(paragraph).append("\n\n");
            }
        }
        if (currentChunk.length() > 0) {
            chunks.add(currentChunk.toString().trim());
        }
        return chunks;
    }

    private String cleanText(String text) {
        // Collapse runs of spaces/tabs but preserve newlines, which mark the
        // paragraph boundaries smartChunk() splits on; then strip other control chars
        return text.replaceAll("[ \\t]+", " ")
                   .replaceAll("[\\x00-\\x09\\x0B-\\x1F]", "");
    }

    private String getOverlapText(String text, int overlapChars) {
        if (text.length() <= overlapChars) return text;
        return text.substring(text.length() - overlapChars);
    }
}
```
4.4 Word Loader
```java
package com.company.kb.loader;

import org.apache.poi.xwpf.usermodel.*;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

@Component
public class WordLoader implements DocumentLoader {

    @Override
    public boolean supports(String fileName) {
        // XWPFDocument only reads .docx; legacy .doc files need POI's HWPF API instead
        return fileName.toLowerCase().endsWith(".docx");
    }

    @Override
    public List<String> load(Path path) throws IOException {
        String content = getContent(path);
        return smartChunk(content, 500, 50);
    }

    @Override
    public String getContent(Path path) throws IOException {
        StringBuilder text = new StringBuilder();
        try (XWPFDocument doc = new XWPFDocument(Files.newInputStream(path))) {
            for (XWPFParagraph para : doc.getParagraphs()) {
                text.append(para.getText()).append("\n");
            }
            // Tables as pipe-separated rows
            for (XWPFTable table : doc.getTables()) {
                text.append("\n[table]\n");
                for (XWPFTableRow row : table.getRows()) {
                    String rowText = row.getTableCells().stream()
                            .map(XWPFTableCell::getText)
                            .collect(Collectors.joining(" | "));
                    text.append(rowText).append("\n");
                }
            }
        }
        return text.toString();
    }

    private List<String> smartChunk(String content, int maxChars, int overlap) {
        content = content.replaceAll("\\s+", " ");
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < content.length()) {
            int end = Math.min(start + maxChars, content.length());
            if (end < content.length()) {
                // Cut at a sentence boundary where possible
                end = findSentenceBoundary(content, end);
            }
            chunks.add(content.substring(start, end));
            if (end >= content.length()) break;         // done: avoid re-reading the tail forever
            start = Math.max(end - overlap, start + 1); // keep the overlap, but always advance
        }
        return chunks;
    }

    private int findSentenceBoundary(String text, int pos) {
        for (int i = pos; i < Math.min(pos + 100, text.length()); i++) {
            if (".。!?".indexOf(text.charAt(i)) >= 0) {
                return i + 1;
            }
        }
        return pos;
    }
}
```
4.5 Markdown Loader
```java
package com.company.kb.loader;

import org.springframework.stereotype.Component;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

@Component
public class MarkdownLoader implements DocumentLoader {

    @Override
    public boolean supports(String fileName) {
        return fileName.toLowerCase().endsWith(".md");
    }

    @Override
    public List<String> load(Path path) throws IOException {
        String content = Files.readString(path);
        return chunkBySection(content);
    }

    @Override
    public String getContent(Path path) throws IOException {
        return Files.readString(path);
    }

    /**
     * Chunk by heading sections, preserving the document structure.
     */
    private List<String> chunkBySection(String content) {
        List<String> chunks = new ArrayList<>();
        String[] lines = content.split("\n");
        StringBuilder currentSection = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("#")) {
                // Save the previous section (skip near-empty fragments)
                if (currentSection.length() > 100) {
                    chunks.add(currentSection.toString().trim());
                }
                currentSection = new StringBuilder(line).append("\n");
            } else {
                currentSection.append(line).append("\n");
            }
        }
        if (currentSection.length() > 0) {
            chunks.add(currentSection.toString().trim());
        }
        // If a section is too long, split it further
        List<String> finalChunks = new ArrayList<>();
        for (String chunk : chunks) {
            if (chunk.length() > 500) {
                finalChunks.addAll(smartChunk(chunk, 500, 50));
            } else {
                finalChunks.add(chunk);
            }
        }
        return finalChunks;
    }

    private List<String> smartChunk(String content, int maxChars, int overlap) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < content.length()) {
            int end = Math.min(start + maxChars, content.length());
            chunks.add(content.substring(start, end));
            if (end >= content.length()) break;         // reached the end: stop
            start = Math.max(end - overlap, start + 1); // keep the overlap while guaranteeing progress
        }
        return chunks;
    }
}
```
5. Vector Storage: Milvus Configuration and Ingestion
5.1 Milvus Configuration Class
```java
package com.company.kb.config;

import io.milvus.client.MilvusClient;
import io.milvus.client.MilvusServiceClient;
import io.milvus.param.ConnectParam;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MilvusConfig {

    @Value("${spring.ai.vectorstore.milvus.client.host}")
    private String host;

    @Value("${spring.ai.vectorstore.milvus.client.port}")
    private int port;

    @Bean
    public MilvusClient milvusClient() {
        // MilvusClient is an interface; MilvusServiceClient is the concrete client
        return new MilvusServiceClient(ConnectParam.newBuilder()
                .withHost(host)
                .withPort(port)
                .build());
    }
}
```
5.2 Document Upload Service
```java
package com.company.kb.service;

import com.company.kb.loader.DocumentLoader;
import io.milvus.client.MilvusClient;
import jakarta.annotation.PostConstruct;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.milvus.MilvusVectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import java.io.File;
import java.nio.file.*;
import java.util.*;

@Service
@RequiredArgsConstructor
@Slf4j
public class DocumentService {

    private final List<DocumentLoader> loaders;
    private final EmbeddingModel embeddingModel;
    private final MilvusClient milvusClient; // injected from MilvusConfig, not built here

    @Value("${app.storage.documents}")
    private String documentsPath;

    private VectorStore vectorStore;

    @PostConstruct
    public void init() {
        // Initialize the Milvus-backed VectorStore; check the builder's argument
        // order and the dimension against your Spring AI version and embedding model
        this.vectorStore = MilvusVectorStore.builder(milvusClient, embeddingModel)
                .collectionName("company_knowledge_base")
                .embeddingDimension(1536) // must match the embedding model's output dimension
                .build();
        log.info("Milvus VectorStore initialized");
    }

    /**
     * Upload a single document.
     */
    public UploadResult uploadDocument(MultipartFile file) throws Exception {
        // 1. Persist the raw file
        Path savePath = Paths.get(documentsPath, file.getOriginalFilename());
        Files.createDirectories(savePath.getParent());
        file.transferTo(savePath);
        return ingest(savePath, file.getOriginalFilename());
    }

    /**
     * Chunk, embed, and store a file that is already on disk.
     */
    private UploadResult ingest(Path path, String fileName) throws Exception {
        // 2. Pick a suitable loader
        DocumentLoader loader = loaders.stream()
                .filter(l -> l.supports(fileName))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException(
                        "Unsupported file format: " + fileName));

        // 3. Load and chunk
        List<String> chunks = loader.load(path);
        log.info("Document {} split into {} chunks", fileName, chunks.size());

        // 4. Build Document objects with metadata
        List<Document> documents = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            Map<String, Object> metadata = new HashMap<>();
            metadata.put("source", fileName);
            metadata.put("chunk_index", i);
            metadata.put("upload_time", System.currentTimeMillis());
            metadata.put("total_chunks", chunks.size());
            documents.add(new Document(UUID.randomUUID().toString(), chunks.get(i), metadata));
        }

        // 5. Write to the vector store
        vectorStore.add(documents);
        log.info("Wrote {} vectors to Milvus", documents.size());

        return new UploadResult(fileName, chunks.size(), documents.size(), true);
    }

    /**
     * Batch-upload a whole directory (no MultipartFile wrapping needed;
     * the files are already on disk).
     */
    public BatchUploadResult batchUpload(String directoryPath) {
        File[] files = Paths.get(directoryPath).toFile().listFiles();
        int success = 0;
        int failed = 0;
        List<String> errors = new ArrayList<>();
        if (files != null) {
            for (File file : files) {
                if (file.isFile()) {
                    try {
                        ingest(file.toPath(), file.getName());
                        success++;
                    } catch (Exception e) {
                        failed++;
                        errors.add(file.getName() + ": " + e.getMessage());
                    }
                }
            }
        }
        return new BatchUploadResult(success, failed, errors);
    }

    /**
     * Delete a document by source file name.
     */
    public void deleteBySource(String sourceName) {
        // Milvus supports deletion with a metadata filter;
        // simplified here, in practice delete by the stored chunk IDs
        log.info("Deleting document: {}", sourceName);
    }

    // ========== Result types ==========
    public record UploadResult(
            String fileName,
            int chunks,
            int vectorsAdded,
            boolean success
    ) {}

    public record BatchUploadResult(
            int successCount,
            int failedCount,
            List<String> errors
    ) {}
}
```
6. RAG Q&A: Multi-Channel Recall and Reranking
6.1 The Problem with Naive RAG
Plain vector search has two core weaknesses:
- Semantic drift: vector similarity is not the same thing as true relevance
- Missing context: a single retrieval pass can skip key information
The fix: multi-channel recall plus reranking.
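One simple, widely used way to merge the ranked lists that different recall channels return is Reciprocal Rank Fusion (RRF): each document scores the sum of 1/(k + rank) over the lists it appears in, so documents that rank well in several channels rise to the top. The sketch below is a generic standalone illustration (not a Spring AI or Milvus API); k = 60 is the value commonly used in the RRF literature:

```java
import java.util.*;
import java.util.stream.*;

public class RrfFusion {

    /** Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank). */
    static List<String> fuse(List<List<String>> rankedLists, int k, int topN) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> list : rankedLists) {
            for (int rank = 0; rank < list.size(); rank++) {
                // rank is 0-based here, so use rank + 1
                scores.merge(list.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> vectorHits  = List.of("docA", "docB", "docC");
        List<String> keywordHits = List.of("docB", "docD");
        // docB ranks first because it appears in both lists
        System.out.println(fuse(List.of(vectorHits, keywordHits), 60, 3));
    }
}
```

The `MultiRetrieverService` below takes a simpler word-overlap approach to scoring; RRF is a drop-in alternative that needs no relevance model at all.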
6.2 Multi-Channel Retrieval Service
```java
@Service
@RequiredArgsConstructor
@Slf4j
public class MultiRetrieverService {

    private final VectorStore vectorStore;

    /**
     * Multi-channel recall: retrieve candidate documents from several angles.
     */
    public List<RetrievedChunk> multiRetrieve(String query, int topK) {
        List<RetrievedChunk> results = new ArrayList<>();
        // Channel 1: semantic vector search
        results.addAll(vectorRetrieve(query, topK * 2));
        // Channel 2: keyword matching (use a proper BM25 engine in production)
        results.addAll(keywordRetrieve(query, topK));
        // Channel 3: metadata-based retrieval (other chunks of the same document)
        results.addAll(metadataRetrieve(query, topK));
        // Deduplicate
        results = results.stream()
                .distinct()
                .collect(Collectors.toList());
        // Rerank (the crucial step!)
        return rerank(query, results, topK);
    }

    /**
     * Semantic vector search.
     */
    private List<RetrievedChunk> vectorRetrieve(String query, int topK) {
        return vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(query)
                        .topK(topK)
                        .similarityThreshold(0.5)
                        .build()
        ).stream()
                .map(doc -> new RetrievedChunk(
                        doc.getText(),
                        doc.getMetadata(),
                        "vector",
                        0.0 // score filled in by rerank
                ))
                .collect(Collectors.toList());
    }

    /**
     * Keyword recall (simplified; Elasticsearch would do this properly).
     */
    private List<RetrievedChunk> keywordRetrieve(String query, int topK) {
        // Extract keywords
        String[] keywords = query.split("[\\s,。,、]");
        return vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(query)
                        .topK(topK)
                        .build()
        ).stream()
                .filter(doc -> {
                    String text = doc.getText().toLowerCase();
                    return Arrays.stream(keywords)
                            .anyMatch(k -> text.contains(k.toLowerCase()));
                })
                .map(doc -> new RetrievedChunk(
                        doc.getText(),
                        doc.getMetadata(),
                        "keyword",
                        0.0
                ))
                .collect(Collectors.toList());
    }

    /**
     * Metadata recall: sibling chunks of the same document.
     */
    private List<RetrievedChunk> metadataRetrieve(String query, int topK) {
        // Find the top seed documents, then pull other chunks from the same sources
        List<org.springframework.ai.document.Document> seedDocs =
                vectorStore.similaritySearch(
                        SearchRequest.builder()
                                .query(query)
                                .topK(3)
                                .build()
                );
        List<RetrievedChunk> results = new ArrayList<>();
        for (org.springframework.ai.document.Document seedDoc : seedDocs) {
            String source = (String) seedDoc.getMetadata().get("source");
            if (source != null) {
                // Fetching the sibling chunks needs a custom metadata query;
                // simplified here to the seed chunk itself
                results.add(new RetrievedChunk(
                        seedDoc.getText(),
                        seedDoc.getMetadata(),
                        "metadata",
                        0.0
                ));
            }
        }
        return results;
    }

    /**
     * Rerank: order the recalled candidates with a stronger relevance signal.
     */
    private List<RetrievedChunk> rerank(String query,
                                        List<RetrievedChunk> candidates,
                                        int topK) {
        if (candidates.isEmpty()) return candidates;
        // Simplified reranking: score each chunk against the query
        List<RetrievedChunk> scored = candidates.stream()
                .map(chunk -> new RetrievedChunk(
                        chunk.text(),
                        chunk.metadata(),
                        chunk.retrievalSource(),
                        calculateRelevance(query, chunk.text())
                ))
                .sorted((a, b) -> Double.compare(b.score(), a.score()))
                .collect(Collectors.toList());
        return scored.subList(0, Math.min(topK, scored.size()));
    }

    /**
     * Relevance score (simplified word overlap; a cross-encoder or a
     * rerank API such as Cohere's gives much better results).
     */
    private double calculateRelevance(String query, String text) {
        String[] queryWords = query.toLowerCase().split("\\s+");
        Set<String> textSet = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
        long matchCount = Arrays.stream(queryWords)
                .filter(textSet::contains)
                .count();
        return (double) matchCount / queryWords.length;
    }

    // ========== Result type ==========
    public record RetrievedChunk(
            String text,
            Map<String, Object> metadata,
            String retrievalSource,
            double score
    ) {}
}
```
6.3 The Complete RAG Q&A Service
```java
@Service
@RequiredArgsConstructor
@Slf4j
public class RagService {

    private final ChatClient chatClient;
    private final MultiRetrieverService retriever;

    private static final int DEFAULT_TOP_K = 5;
    private static final int MAX_CONTEXT_TOKENS = 6000;

    /**
     * RAG question answering.
     */
    public RagAnswer answer(String question) {
        long startTime = System.currentTimeMillis();

        // Step 1: multi-channel recall
        List<RetrievedChunk> chunks = retriever.multiRetrieve(question, DEFAULT_TOP_K);
        if (chunks.isEmpty()) {
            return new RagAnswer(
                    "Sorry, no relevant information was found in the knowledge base.",
                    Collections.emptyList(),
                    0,
                    System.currentTimeMillis() - startTime
            );
        }

        // Step 2: build the context
        String context = buildContext(chunks);
        List<SourceQuote> sources = buildSources(chunks);

        // Step 3: assemble the prompt
        String prompt = buildPrompt(question, context);

        // Step 4: call the model
        String answer;
        try {
            answer = chatClient.prompt()
                    .user(prompt)
                    .call()
                    .content();
        } catch (Exception e) {
            log.error("AI call failed", e);
            answer = "The AI service is temporarily unavailable, please try again later.";
        }

        return new RagAnswer(
                answer,
                sources,
                chunks.size(),
                System.currentTimeMillis() - startTime
        );
    }

    /**
     * Streaming RAG (much better perceived latency).
     */
    public Flux<String> answerStream(String question) {
        // Retrieve first
        List<RetrievedChunk> chunks = retriever.multiRetrieve(question, DEFAULT_TOP_K);
        String context = buildContext(chunks);
        String prompt = buildPrompt(question, context);
        // Stream the tokens back
        return chatClient.prompt()
                .user(prompt)
                .stream()
                .content();
    }

    /**
     * Build the context, truncating intelligently to stay under the token budget.
     */
    private String buildContext(List<RetrievedChunk> chunks) {
        StringBuilder context = new StringBuilder();
        int totalTokens = 0;
        for (RetrievedChunk chunk : chunks) {
            int chunkTokens = estimateTokens(chunk.text());
            if (totalTokens + chunkTokens > MAX_CONTEXT_TOKENS) {
                break;
            }
            context.append("--- Source: ")
                    .append(chunk.metadata().getOrDefault("source", "unknown"))
                    .append(" ---\n")
                    .append(chunk.text())
                    .append("\n\n");
            totalTokens += chunkTokens;
        }
        return context.toString();
    }

    /**
     * Build the prompt. This is where RAG quality is won or lost!
     */
    private String buildPrompt(String question, String context) {
        return """
                You are a professional internal knowledge assistant. Answer the user's
                question accurately, based on the context provided.
                Rules:
                1. Prefer information from the context; never make anything up
                2. If the context contains no relevant information, say so plainly
                3. Be concise, professional, and well-structured (use lists or tables where helpful)
                4. For policies and regulations, point the user to the information source
                Answer strictly from the following context:
                {context}
                User question: {question}
                Answer:
                """.replace("{context}", context)
                .replace("{question}", question);
    }

    /**
     * Build the source citations.
     */
    private List<SourceQuote> buildSources(List<RetrievedChunk> chunks) {
        return chunks.stream()
                .map(chunk -> new SourceQuote(
                        (String) chunk.metadata().getOrDefault("source", "unknown"),
                        chunk.text().substring(0, Math.min(100, chunk.text().length())) + "...",
                        chunk.retrievalSource()
                ))
                .distinct()
                .collect(Collectors.toList());
    }

    /**
     * Rough token estimate: ~1.5 tokens per CJK character, ~1 token per 1.3 other characters.
     */
    private int estimateTokens(String text) {
        int chineseChars = (int) text.chars().filter(c -> c >= 0x4E00 && c <= 0x9FA5).count();
        int otherChars = text.length() - chineseChars;
        return (int) (chineseChars * 1.5 + otherChars / 1.3);
    }

    // ========== Result types ==========
    public record RagAnswer(
            String answer,
            List<SourceQuote> sources,
            int chunksRetrieved,
            long timeMs
    ) {}

    public record SourceQuote(
            String source,
            String excerpt,
            String retrievalMethod
    ) {}
}
```
7. REST API: The Complete Q&A Interface
7.1 Q&A Controller
```java
package com.company.kb.controller;

import com.company.kb.service.DocumentService;
import com.company.kb.service.RagService;
import com.company.kb.service.RagService.RagAnswer;
import lombok.RequiredArgsConstructor;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;
import reactor.core.publisher.Flux;

import java.util.Collections;
import java.util.List;

@RestController
@RequestMapping("/api/kb")
@RequiredArgsConstructor
public class KnowledgeBaseController {

    private final RagService ragService;
    private final DocumentService documentService;

    /**
     * Ask a question.
     * POST /api/kb/ask
     */
    @PostMapping("/ask")
    public RagAnswer ask(@RequestBody AskRequest request) {
        if (request.question() == null || request.question().isBlank()) {
            throw new IllegalArgumentException("The question must not be empty");
        }
        return ragService.answer(request.question());
    }

    /**
     * Streaming Q&A (good for long answers).
     * POST /api/kb/ask/stream
     */
    @PostMapping(value = "/ask/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> askStream(@RequestBody AskRequest request) {
        return ragService.answerStream(request.question());
    }

    /**
     * Upload a document.
     * POST /api/kb/upload
     */
    @PostMapping("/upload")
    public DocumentService.UploadResult upload(
            @RequestParam("file") MultipartFile file) throws Exception {
        return documentService.uploadDocument(file);
    }

    /**
     * Batch upload.
     * POST /api/kb/upload/batch
     */
    @PostMapping("/upload/batch")
    public DocumentService.BatchUploadResult uploadBatch(
            @RequestParam("directory") String directory) throws Exception {
        return documentService.batchUpload(directory);
    }

    /**
     * List documents.
     * GET /api/kb/documents
     */
    @GetMapping("/documents")
    public DocumentListResult listDocuments() {
        // Returns the uploaded documents;
        // a real implementation would query Milvus metadata
        return new DocumentListResult(Collections.emptyList(), 0);
    }

    // ========== Request/response types ==========
    public record AskRequest(String question) {}

    public record DocumentListResult(
            List<DocumentInfo> documents,
            int totalChunks
    ) {}

    public record DocumentInfo(
            String fileName,
            int chunks,
            long uploadTime
    ) {}
}
```
7.2 Front-End Usage Example
```javascript
// Ask a question
async function askQuestion() {
    const question = document.getElementById('question').value;
    const response = await fetch('/api/kb/ask', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({question})
    });
    const result = await response.json();
    // Show the answer
    document.getElementById('answer').innerText = result.answer;
    // Show the sources
    const sourcesHtml = result.sources.map(s =>
        `<div class="source">
            <strong>Source: ${s.source}</strong>
            <p>${s.excerpt}</p>
        </div>`
    ).join('');
    document.getElementById('sources').innerHTML = sourcesHtml;
}

// Upload a document
async function uploadDocument(file) {
    const formData = new FormData();
    formData.append('file', file);
    const response = await fetch('/api/kb/upload', {
        method: 'POST',
        body: formData
    });
    const result = await response.json();
    console.log('Upload succeeded:', result);
}
```
8. Production-Grade Hardening
8.1 Performance
```java
// 1. Asynchronous document processing (requires @EnableAsync on a configuration class)
@Async
public CompletableFuture<DocumentService.UploadResult> uploadDocumentAsync(
        MultipartFile file) throws Exception {
    return CompletableFuture.completedFuture(uploadDocument(file));
}

// 2. Batch embedding (fewer API round-trips)
public void batchEmbed(List<String> texts) {
    // Submit multiple texts in one call via the batch embedding API
}

// 3. Cache embedding results (requires @EnableCaching; note that @Cacheable
//    only applies when the method is called through the Spring proxy)
@Bean
public CacheManager cacheManager() {
    return new ConcurrentMapCacheManager("embeddings");
}

@Cacheable(value = "embeddings", key = "#text.hashCode()")
public float[] embedWithCache(String text) {
    return embeddingModel.embed(text); // Spring AI's EmbeddingModel returns float[]
}
```
8.2 Security
```java
@Configuration
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        return http
                .csrf(AbstractHttpConfigurer::disable)
                .authorizeHttpRequests(auth -> auth
                        // The query endpoints require authentication
                        .requestMatchers("/api/kb/ask/**").authenticated()
                        // The management endpoints require the admin role
                        .requestMatchers("/api/kb/upload/**").hasRole("ADMIN")
                        .anyRequest().permitAll()
                )
                .httpBasic(Customizer.withDefaults())
                .build();
    }
}
```
8.3 Monitoring and Alerting
```java
// Prometheus metrics via Micrometer
@Bean
public MeterRegistry meterRegistry() {
    return new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
}

// Key metrics
private void recordMetrics(RagService.RagAnswer answer) {
    meterRegistry.counter("kb.questions.total").increment();
    meterRegistry.summary("kb.questions.duration.ms").record(answer.timeMs());
    meterRegistry.summary("kb.chunks.retrieved").record(answer.chunksRetrieved());
}
```
9. Testing the System
9.1 Test Calls
```bash
# 1. Upload a knowledge document
curl -X POST -F "file=@./docs/annual-leave-policy.pdf" http://localhost:8080/api/kb/upload

# 2. Basic Q&A
curl -X POST -H "Content-Type: application/json" \
  -d '{"question":"How many days of annual leave do I get?"}' \
  http://localhost:8080/api/kb/ask

# 3. Streaming Q&A
curl -X POST -H "Content-Type: application/json" \
  -d '{"question":"Please walk me through the company leave-request process"}' \
  http://localhost:8080/api/kb/ask/stream
```
9.2 Evaluation Metrics
| Metric | Meaning | Target |
|---|---|---|
| Recall | Fraction of relevant chunks that get retrieved | >90% |
| Precision | Fraction of retrieved chunks that are actually relevant | >85% |
| Response time | From question submitted to answer received | <3s |
| Token cost | API spend per thousand questions | <$5 |
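Given a labeled test set (ground-truth relevant chunk IDs per question), the recall and precision in the table above can be computed mechanically. A minimal set-based sketch (the class and method names are illustrative, not part of the project code):

```java
import java.util.Set;

public class RetrievalEval {

    /** recall = |retrieved ∩ relevant| / |relevant| */
    static double recall(Set<String> retrieved, Set<String> relevant) {
        long hit = relevant.stream().filter(retrieved::contains).count();
        return relevant.isEmpty() ? 1.0 : (double) hit / relevant.size();
    }

    /** precision = |retrieved ∩ relevant| / |retrieved| */
    static double precision(Set<String> retrieved, Set<String> relevant) {
        long hit = retrieved.stream().filter(relevant::contains).count();
        return retrieved.isEmpty() ? 0.0 : (double) hit / retrieved.size();
    }

    public static void main(String[] args) {
        Set<String> retrieved = Set.of("c1", "c2", "c3", "c4");
        Set<String> relevant  = Set.of("c1", "c2", "c5");
        // recall = 2/3, precision = 2/4
        System.out.printf("recall=%.2f precision=%.2f%n",
                recall(retrieved, relevant), precision(retrieved, relevant));
    }
}
```

Averaging these per-question scores over the whole test set gives the system-level numbers to track against the targets above.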
10. Summary and Next Steps
10.1 Key Takeaways
RAG system = document processing + vector embedding + semantic retrieval + AI generation
Key techniques:
├── Document loading: PDF/Word/Markdown support
├── Smart chunking: recursive splitting + semantic boundaries
├── Vector storage: distributed Milvus deployment
├── Multi-channel recall: vector + keyword + metadata
├── Reranking: cross-encoding for better relevance
├── Prompt engineering: context assembly + answer rules
└── Streaming output: real-time push over SSE
10.2 Going Further
| Direction | What it adds |
|---|---|
| PDF OCR | Handle scanned documents with OCR |
| Multilingual | English documents and cross-lingual retrieval |
| Knowledge graph | Entity relations for stronger reasoning |
| Active learning | Use user feedback to tune retrieval automatically |
| Agents | Multi-turn conversation and tool calling |
🚀 If this helped, a like or a bookmark is appreciated. See you in the comments!