Spring AI + RAG in Practice: Building an Enterprise-Grade Intelligent Q&A System

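🔥 What this article covers: building an enterprise RAG knowledge base from scratch, step by step

💡 Who it is for: Java developers with Spring Boot experience who want to put AI to work on business data

⏱️ Reading time: about 30 minutes

🛠️ Tech stack: Spring AI 1.1 + Milvus vector database + Tongyi Qianwen (Qwen) + Spring Boot 3.3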


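Preface: why does your AI never "get" your business?

Many teams hit the same awkward problem after adopting AI: the model answers fluently, but the answer has nothing to do with their business.

For example, you ask "What is our company's annual leave policy?" and the AI replies "Under the labor law, annual leave is calculated as follows..."

This is a classic knowledge gap. The model never saw your company's rules during pre-training, so that knowledge has to be supplied at query time through RAG.

RAG (retrieval-augmented generation) = Retrieval + Augmentation + Generation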
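To make the three stages concrete, here is a toy sketch in plain Java. It is only an illustration under loud assumptions: the class `ToyRag`, its methods, and the character-bigram "embedding" are all hypothetical stand-ins, and a real system would call an embedding model and a vector database instead. Generation is left out because it is simply a call to the chat model with the augmented prompt.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of the RAG stages; every name here is hypothetical. */
final class ToyRag {

    /** Stand-in for an embedding: counts of each character bigram in the text. */
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> vector = new HashMap<>();
        String t = text.toLowerCase();
        for (int i = 0; i + 2 <= t.length(); i++) {
            vector.merge(t.substring(i, i + 2), 1, Integer::sum);
        }
        return vector;
    }

    /** Cosine similarity between two sparse count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v * v).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v * v).sum());
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    /** Retrieval: the topK documents most similar to the question. */
    static List<String> retrieve(String question, List<String> docs, int topK) {
        Map<String, Integer> q = embed(question);
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((String d) -> cosine(q, embed(d))).reversed());
        return sorted.subList(0, Math.min(topK, sorted.size()));
    }

    /** Augmentation: put the retrieved text in front of the question. */
    static String augment(String question, List<String> context) {
        return "Context:\n" + String.join("\n", context) + "\n\nQuestion: " + question;
    }
}
```

Calling `ToyRag.retrieve(question, docs, 1)` and passing the result to `ToyRag.augment` yields the prompt that would be sent to the model in the generation step.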

In this article I walk through a complete internal knowledge Q&A system and take you from zero to a working Spring AI RAG application.


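1. Requirements and Architecture

1.1 Business scenario

A technology company needs an internal knowledge Q&A system that lets employees:

  • Query company policies, process documents, and technical standards in natural language
  • Work with documents in several formats, including PDF, Word, and Markdown
  • Find the relevant policy quickly instead of digging through piles of documents
  • See the original passage cited with every answer, so it can be traced back to its source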

1.2 System architecture

┌──────────────────────────────────────────────────────────────────┐
│                       User query interface                       │
│                (Web / WeCom / DingTalk / Feishu)                 │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│                     Spring Boot application                      │
│ ┌────────────┐ ┌─────────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ REST API   │ │ ChatClient  │ │ Document │ │ Vector Search    │ │
│ │ Controller │ │ (Qwen)      │ │ Loader   │ │ Service          │ │
│ └────────────┘ └─────────────┘ └──────────┘ └──────────────────┘ │
└───────────────────────────────┬──────────────────────────────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         ▼                      ▼                      ▼
┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│ Milvus vector DB │   │ Document storage │   │ Qwen API         │
│ (semantic search)│   │ (OSS, raw files) │   │ (generation)     │
└──────────────────┘   └──────────────────┘   └──────────────────┘

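1.3 Technology choices

| Component        | Choice                | Rationale                                                             |
|------------------|-----------------------|-----------------------------------------------------------------------|
| AI model         | Tongyi Qwen-Plus      | Strong Chinese-language quality, cost-effective, compliant in China   |
| Vector database  | Milvus                | Distributed, high-performance, open source                            |
| Document formats | PDF/Word/Markdown/TXT | The most common formats in enterprises                                |
| Embedding model  | text-embedding-v4     | Official Alibaba model, tuned for Chinese                             |
| Framework        | Spring AI 1.1         | Integrates cleanly with Spring Boot                                   |
| Deployment       | Docker Compose        | Brings up every component with one command                            |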

2. Environment Setup: One-Command Deployment with Docker Compose

2.1 Directory layout

company-kb/
├── docker-compose.yml
├── spring-ai-kb-app/
│   ├── src/main/java/com/company/kb/
│   │   ├── CompanyKbApplication.java
│   │   ├── config/
│   │   │   ├── AiConfig.java
│   │   │   └── MilvusConfig.java
│   │   ├── controller/
│   │   │   ├── ChatController.java
│   │   │   └── DocumentController.java
│   │   ├── service/
│   │   │   ├── RagService.java
│   │   │   ├── DocumentService.java
│   │   │   └── EmbeddingService.java
│   │   ├── model/
│   │   │   ├── DocumentChunk.java
│   │   │   └── ChatRequest.java
│   │   └── loader/
│   │       ├── PdfLoader.java
│   │       ├── WordLoader.java
│   │       └── MarkdownLoader.java
│   └── src/main/resources/
│       └── application.yml
├── documents/           # source documents
└── embeddings/         # embedding cache

2.2 Docker Compose configuration

yaml
# docker-compose.yml
version: '3.8'

services:
  # etcd: metadata store used by Milvus
  milvus-etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
    volumes:
      - ./volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  # MinIO: object storage used by Milvus
  milvus-minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - ./volumes/minio:/minio_data
    command: minio server /minio_data

  # Milvus vector database
  milvus-standalone:
    image: milvusdb/milvus:v2.4.0
    container_name: milvus-standalone
    # Start Milvus in standalone mode, as the official Compose file does
    command: ["milvus", "run", "standalone"]
    ports:
      - "19530:19530"
      - "9091:9091"
    volumes:
      - ./volumes/milvus:/var/lib/milvus
    environment:
      ETCD_ENDPOINTS: milvus-etcd:2379
      MINIO_ADDRESS: milvus-minio:9000
    depends_on:
      - milvus-etcd
      - milvus-minio
    restart: always

  # Spring AI application (built later in this article)
  spring-ai-kb:
    build: ./spring-ai-kb-app
    container_name: spring-ai-kb
    ports:
      - "8080:8080"
    environment:
      - SPRING_AI_ALIBABA_API_KEY=${ALI_API_KEY}
      # Inside the Compose network Milvus is reached by service name, not localhost
      - SPRING_AI_VECTORSTORE_MILVUS_CLIENT_HOST=milvus-standalone
    volumes:
      - ./documents:/data/documents
    depends_on:
      - milvus-standalone
    restart: always

networks:
  default:
    name: milvus-network

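Startup commands:

bash
# Start all services
docker-compose up -d

# Check status
docker-compose ps

# Follow the application logs
docker-compose logs -f spring-ai-kb

⚠️ Note: the first start has to pull the Milvus images, which takes about 3-5 minutes. Make sure the server has at least 8 GB of memory.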
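Because `depends_on` only orders container start and does not wait for Milvus to accept connections, it can help to poll the port before ingesting documents. The sketch below is a hypothetical helper (`PortWaiter` is not part of the project), and it only checks that the TCP port is open, which is a weaker condition than Milvus being fully ready.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: polls a TCP port until it accepts connections. */
final class PortWaiter {

    /**
     * Tries to connect up to `attempts` times, sleeping `delayMillis` between failures.
     * Returns true as soon as one connection succeeds, false if all attempts fail.
     */
    static boolean waitForPort(String host, int port, int attempts, long delayMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000); // 1 s connect timeout
                return true;
            } catch (IOException e) {
                Thread.sleep(delayMillis);
            }
        }
        return false;
    }
}
```

For the setup above, a call such as `PortWaiter.waitForPort("localhost", 19530, 60, 5000)` would wait up to roughly five minutes for the Milvus port.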


3. Core Configuration: application.yml

yaml
spring:
  application:
    name: company-kb-system
  servlet:
    multipart:
      max-file-size: 100MB      # maximum size of a single file
      max-request-size: 500MB   # maximum size of a single request

  ai:
    alibaba:
      api-key: ${ALI_API_KEY}
      base-url: https://dashscope.aliyuncs.com/compatible-mode/v1
      chat:
        options:
          model: qwen-plus
          temperature: 0.1       # keep the temperature low for RAG so answers stay close to the retrieved text
          max-tokens: 2000
      embedding:
        options:
          model: text-embedding-v4

    # Milvus vector database
    vectorstore:
      milvus:
        client:
          host: localhost        # use the Compose service name (milvus-standalone) when running in Docker
          port: 19530
          username: ""
          password: ""

# File storage paths
app:
  storage:
    documents: /data/documents
    embeddings: /data/embeddings

# Server
server:
  port: 8080

logging:
  level:
    org.springframework.ai: DEBUG
    com.company.kb: DEBUG

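4. Document Processing: Multi-Format Support and Smart Chunking

4.1 Chunking strategy (the core of the system)

Chunk quality drives most of a RAG system's answer quality (the author's rule of thumb is about 80%). Chunks that are too long carry noise into the prompt; chunks that are too short lack context.

Comparison of four chunking strategies

| Strategy     | Chunk size           | Overlap        | Best for                     |
|--------------|----------------------|----------------|------------------------------|
| Fixed length | 500 characters       | 50 characters  | General use                  |
| Paragraph    | Paragraph boundaries | 0              | Clearly structured documents |
| Semantic     | Topic boundaries     | 100 characters | Semantically dense content   |
| Recursive    | Split level by level | 50 characters  | Recommended for production   |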
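The recursive strategy in the last row can be sketched as follows. This is a minimal illustration rather than the loaders' actual implementation: `RecursiveChunker` and its separator list are assumptions, and overlap is left out to keep the recursion easy to follow. The idea is to try the coarsest separator first and only fall back to a finer one for pieces that are still too long.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of recursive chunking (no overlap). */
final class RecursiveChunker {

    // Tried in order: paragraph break, line break, CJK full stop, Latin sentence end, space.
    private static final String[] SEPARATORS = {"\n\n", "\n", "。", ". ", " "};

    static List<String> split(String text, int maxChars) {
        List<String> out = new ArrayList<>();
        split(text, maxChars, 0, out);
        return out;
    }

    private static void split(String text, int maxChars, int level, List<String> out) {
        if (text.isBlank()) return;
        if (text.length() <= maxChars) {          // small enough: emit as one chunk
            out.add(text.strip());
            return;
        }
        if (level >= SEPARATORS.length) {         // no separator left: hard cut
            for (int i = 0; i < text.length(); i += maxChars) {
                out.add(text.substring(i, Math.min(i + maxChars, text.length())));
            }
            return;
        }
        // Merge pieces greedily; anything still too long is split at the next level.
        StringBuilder current = new StringBuilder();
        for (String piece : splitKeepingSeparator(text, SEPARATORS[level])) {
            if (current.length() > 0 && current.length() + piece.length() > maxChars) {
                split(current.toString(), maxChars, level + 1, out);
                current.setLength(0);
            }
            current.append(piece);
        }
        split(current.toString(), maxChars, level + 1, out);
    }

    /** Splits on the separator but keeps it attached to the preceding piece. */
    private static List<String> splitKeepingSeparator(String text, String sep) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(sep, start)) >= 0) {
            pieces.add(text.substring(start, idx + sep.length()));
            start = idx + sep.length();
        }
        if (start < text.length()) pieces.add(text.substring(start));
        return pieces;
    }
}
```

With `RecursiveChunker.split(text, 500)`, paragraphs that fit are kept whole, and only oversized paragraphs are broken down by line, then by sentence, then by word.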

4.2 A common document loader interface

java
package com.company.kb.loader;

import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/**
 * Common interface for all document loaders.
 */
public interface DocumentLoader {
    /** Whether this loader handles the given file name (decided by extension). */
    boolean supports(String fileName);

    /** Extracts the text and returns it split into chunks. */
    List<String> load(Path path) throws IOException;

    /** Extracts the full text without chunking. */
    String getContent(Path path) throws IOException;
}

4.3 PDF loader

java
package com.company.kb.loader;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

@Component
public class PdfLoader implements DocumentLoader {

    @Override
    public boolean supports(String fileName) {
        return fileName.toLowerCase().endsWith(".pdf");
    }

    @Override
    public List<String> load(Path path) throws IOException {
        String content = getContent(path);
        return smartChunk(content, 500, 50);
    }

    @Override
    public String getContent(Path path) throws IOException {
        // Extract text with PDFTextStripper (PDFRenderer renders images, not text).
        // This is the PDFBox 2.x API; on 3.x open the file with org.apache.pdfbox.Loader.loadPDF(path.toFile()).
        try (PDDocument document = PDDocument.load(path.toFile())) {
            return new PDFTextStripper().getText(document);
        }
    }

    /**
     * Smart chunking: split on paragraphs first, then merge small paragraphs.
     * A chunk can reach maxChars plus the overlap carried over from the previous chunk.
     */
    private List<String> smartChunk(String content, int maxChars, int overlap) {
        // Clean the text first (line breaks are preserved so paragraphs can still be found)
        content = cleanText(content);

        // Split on blank lines
        String[] paragraphs = content.split("\\n\\s*\\n");

        List<String> chunks = new ArrayList<>();
        StringBuilder currentChunk = new StringBuilder();

        for (String paragraph : paragraphs) {
            // PDF text often has no blank lines, so an oversized paragraph is cut into maxChars pieces
            for (int i = 0; i < paragraph.length(); i += maxChars) {
                String piece = paragraph.substring(i, Math.min(i + maxChars, paragraph.length()));
                if (currentChunk.length() > 0 && currentChunk.length() + piece.length() > maxChars) {
                    String finished = currentChunk.toString().trim();
                    chunks.add(finished);
                    // Carry the tail of the finished chunk over as overlap
                    currentChunk = new StringBuilder(getOverlapText(finished, overlap));
                }
                currentChunk.append(piece).append("\n\n");
            }
        }

        if (!currentChunk.toString().isBlank()) {
            chunks.add(currentChunk.toString().trim());
        }

        return chunks;
    }

    private String cleanText(String text) {
        return text.replace("\r\n", "\n")
                   .replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "")  // drop control characters, keep tab and line breaks
                   .replaceAll("[ \\t\\r]+", " ");                         // collapse horizontal whitespace only
    }

    private String getOverlapText(String text, int overlapChars) {
        if (text.length() <= overlapChars) return text;
        return text.substring(text.length() - overlapChars);
    }
}

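4.4 Word loader

java
package com.company.kb.loader;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

@Component
public class WordLoader implements DocumentLoader {

    @Override
    public boolean supports(String fileName) {
        // XWPFDocument reads only the .docx format; legacy .doc files would need HWPFDocument
        return fileName.toLowerCase().endsWith(".docx");
    }

    @Override
    public List<String> load(Path path) throws IOException {
        String content = getContent(path);
        return smartChunk(content, 500, 50);
    }

    @Override
    public String getContent(Path path) throws IOException {
        StringBuilder text = new StringBuilder();
        try (XWPFDocument doc = new XWPFDocument(Files.newInputStream(path))) {
            for (XWPFParagraph para : doc.getParagraphs()) {
                text.append(para.getText()).append("\n");
            }
            // Tables: one line per row, cells separated by " | "
            for (XWPFTable table : doc.getTables()) {
                text.append("\n[Table]\n");
                for (XWPFTableRow row : table.getRows()) {
                    String rowText = row.getTableCells().stream()
                            .map(XWPFTableCell::getText)
                            .collect(Collectors.joining(" | "));
                    text.append(rowText).append("\n");
                }
            }
        }
        return text.toString();
    }

    /** Fixed-size windows that end on a sentence boundary where possible; assumes maxChars > overlap. */
    private List<String> smartChunk(String content, int maxChars, int overlap) {
        content = content.replaceAll("\\s+", " ");
        List<String> chunks = new ArrayList<>();

        int start = 0;
        while (start < content.length()) {
            int end = Math.min(start + maxChars, content.length());
            if (end < content.length()) {
                // Cut at a sentence boundary
                end = findSentenceBoundary(content, end);
            }
            chunks.add(content.substring(start, end));
            if (end >= content.length()) {
                break;  // last chunk written; stepping back by the overlap here would loop forever
            }
            start = end - overlap;
        }
        return chunks;
    }

    /** Looks up to 100 characters ahead for the end of a sentence. */
    private int findSentenceBoundary(String text, int pos) {
        for (int i = pos; i < Math.min(pos + 100, text.length()); i++) {
            if (".。!?!?".indexOf(text.charAt(i)) >= 0) {
                return i + 1;
            }
        }
        return pos;
    }
}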

4.5 Markdown loader

java
package com.company.kb.loader;

import org.springframework.stereotype.Component;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

@Component
public class MarkdownLoader implements DocumentLoader {

    @Override
    public boolean supports(String fileName) {
        return fileName.toLowerCase().endsWith(".md");
    }

    @Override
    public List<String> load(Path path) throws IOException {
        return chunkBySection(getContent(path));
    }

    @Override
    public String getContent(Path path) throws IOException {
        return Files.readString(path);
    }

    /**
     * Chunk by heading so each chunk keeps its place in the document structure.
     */
    private List<String> chunkBySection(String content) {
        List<String> chunks = new ArrayList<>();
        StringBuilder currentSection = new StringBuilder();

        for (String line : content.split("\n")) {
            if (line.startsWith("#")) {
                // A new heading closes the previous section; short sections are kept, not dropped
                if (!currentSection.toString().isBlank()) {
                    chunks.add(currentSection.toString().trim());
                }
                currentSection = new StringBuilder(line).append("\n");
            } else {
                currentSection.append(line).append("\n");
            }
        }

        if (!currentSection.toString().isBlank()) {
            chunks.add(currentSection.toString().trim());
        }

        // Sections that are still too long are cut into fixed-size windows
        List<String> finalChunks = new ArrayList<>();
        for (String chunk : chunks) {
            if (chunk.length() > 500) {
                finalChunks.addAll(smartChunk(chunk, 500, 50));
            } else {
                finalChunks.add(chunk);
            }
        }

        return finalChunks;
    }

    /** Fixed-size windows with overlap; assumes maxChars > overlap. */
    private List<String> smartChunk(String content, int maxChars, int overlap) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < content.length()) {
            int end = Math.min(start + maxChars, content.length());
            chunks.add(content.substring(start, end));
            if (end >= content.length()) {
                break;  // last window written; stepping back by the overlap here would loop forever
            }
            start = end - overlap;
        }
        return chunks;
    }
}

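5. Vector Storage: Milvus Configuration and Data Ingestion

5.1 Milvus configuration class

java
package com.company.kb.config;

import io.milvus.client.MilvusServiceClient;
import io.milvus.param.ConnectParam;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.milvus.MilvusVectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MilvusConfig {

    @Value("${spring.ai.vectorstore.milvus.client.host}")
    private String host;

    @Value("${spring.ai.vectorstore.milvus.client.port}")
    private int port;

    @Bean
    public MilvusServiceClient milvusClient() {
        // MilvusClient is an interface; MilvusServiceClient is the concrete client
        return new MilvusServiceClient(ConnectParam.newBuilder()
                .withHost(host)
                .withPort(port)
                .build());
    }

    @Bean
    public VectorStore vectorStore(MilvusServiceClient milvusClient, EmbeddingModel embeddingModel) {
        return MilvusVectorStore.builder(milvusClient, embeddingModel)
                .collectionName("company_knowledge_base")
                // Must equal the embedding model's output size; text-embedding-v4 defaults to 1024
                .embeddingDimension(1024)
                // Create the collection on startup if it does not exist yet
                .initializeSchema(true)
                .build();
    }
}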

5.2 Document upload service

java
package com.company.kb.service;

import com.company.kb.loader.DocumentLoader;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Stream;

@Service
@RequiredArgsConstructor
@Slf4j
public class DocumentService {

    private final List<DocumentLoader> loaders;
    // Injected bean: defined in a @Configuration class or supplied by Spring AI's Milvus auto-configuration
    private final VectorStore vectorStore;

    @Value("${app.storage.documents}")
    private String documentsPath;

    /**
     * Upload a single document.
     */
    public UploadResult uploadDocument(MultipartFile file) throws IOException {
        // 1. Keep only the last path segment so a name like "../../x.pdf" cannot escape the storage directory
        String original = file.getOriginalFilename();
        if (original == null || original.isBlank()) {
            throw new IllegalArgumentException("File name is missing");
        }
        String fileName = Paths.get(original).getFileName().toString();
        findLoader(fileName);  // reject unsupported formats before writing anything to disk

        // 2. Save the original file
        Path savePath = Paths.get(documentsPath).resolve(fileName);
        Files.createDirectories(savePath.getParent());
        file.transferTo(savePath);

        return ingest(savePath);
    }

    /**
     * Chunk a file that is already on disk and write the chunks to the vector store.
     */
    private UploadResult ingest(Path path) throws IOException {
        String fileName = path.getFileName().toString();
        DocumentLoader loader = findLoader(fileName);

        // 3. Load and chunk
        List<String> chunks = loader.load(path);
        log.info("Document {} split into {} chunks", fileName, chunks.size());

        // 4. Build Document objects with metadata
        long uploadTime = System.currentTimeMillis();
        List<Document> documents = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            Map<String, Object> metadata = new HashMap<>();
            metadata.put("source", fileName);
            metadata.put("chunk_index", i);
            metadata.put("upload_time", uploadTime);
            metadata.put("total_chunks", chunks.size());

            documents.add(new Document(UUID.randomUUID().toString(), chunks.get(i), metadata));
        }

        // 5. Write to the vector store
        if (!documents.isEmpty()) {
            vectorStore.add(documents);
        }
        log.info("Wrote {} vectors to Milvus", documents.size());

        return new UploadResult(fileName, chunks.size(), documents.size(), true);
    }

    private DocumentLoader findLoader(String fileName) {
        return loaders.stream()
                .filter(l -> l.supports(fileName))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException(
                        "Unsupported file format: " + fileName));
    }

    /**
     * Ingest every regular file in a server-side directory (files are read in place, not copied).
     */
    public BatchUploadResult batchUpload(String directoryPath) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.list(Paths.get(directoryPath))) {
            files = paths.filter(Files::isRegularFile).toList();
        }

        int success = 0;
        int failed = 0;
        List<String> errors = new ArrayList<>();
        for (Path file : files) {
            try {
                ingest(file);
                success++;
            } catch (Exception e) {
                failed++;
                errors.add(file.getFileName() + ": " + e.getMessage());
            }
        }

        return new BatchUploadResult(success, failed, errors);
    }

    /**
     * Delete a document by file name.
     */
    public void deleteBySource(String sourceName) {
        // Remove every chunk whose "source" metadata equals the file name
        vectorStore.delete(new FilterExpressionBuilder().eq("source", sourceName).build());
        log.info("Deleted document: {}", sourceName);
    }

    // ========== Nested types ==========
    public record UploadResult(
            String fileName,
            int chunks,
            int vectorsAdded,
            boolean success
    ) {}

    public record BatchUploadResult(
            int successCount,
            int failedCount,
            List<String> errors
    ) {}
}

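6. RAG Question Answering: Multi-Channel Recall and Reranking

6.1 Problems with plain RAG

Plain vector retrieval has two core problems:

  1. Semantic drift: similar vectors do not guarantee real relevance
  2. Missing context: a single retrieval pass can miss key information

Solution: multi-channel recall + reranking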
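One common way to merge results from several recall channels is reciprocal rank fusion (RRF). It is named here as an alternative technique, not as what this article's service does, and the class `RankFusion` and its chunk ids are hypothetical. Each channel contributes 1 / (k + rank) for every chunk it returns, so chunks that rank well in more than one channel rise to the top.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of reciprocal rank fusion over ranked lists of chunk ids. */
final class RankFusion {

    /**
     * score(id) = sum over all lists of 1 / (k + rank), with rank starting at 1.
     * Returns the topK ids by descending score.
     */
    static List<String> fuse(List<List<String>> rankedLists, int k, int topK) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return ids.subList(0, Math.min(topK, ids.size()));
    }
}
```

A typical call is `RankFusion.fuse(List.of(vectorIds, keywordIds), 60, 5)`; the constant 60 is a conventional default for k, not a tuned value.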

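6.2 Multi-channel recall service

java
package com.company.kb.service;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.stereotype.Service;

import java.util.*;
import java.util.stream.Collectors;

@Service
@RequiredArgsConstructor
@Slf4j
public class MultiRetrieverService {

    private final VectorStore vectorStore;

    /**
     * Multi-channel recall: retrieve candidate chunks from several angles.
     */
    public List<RetrievedChunk> multiRetrieve(String query, int topK) {
        List<RetrievedChunk> results = new ArrayList<>();

        // Channel 1: vector (semantic) retrieval
        results.addAll(vectorRetrieve(query, topK * 2));

        // Channel 2: keyword filter (simplified; a real keyword channel would use BM25 or Elasticsearch)
        results.addAll(keywordRetrieve(query, topK));

        // Channel 3: metadata-filtered retrieval (other chunks of the same document)
        results.addAll(metadataRetrieve(query, topK));

        // Deduplicate by chunk text: the same chunk can come back from several channels,
        // and record equality would treat those copies as different because retrievalSource differs
        Map<String, RetrievedChunk> unique = new LinkedHashMap<>();
        for (RetrievedChunk chunk : results) {
            unique.putIfAbsent(chunk.text(), chunk);
        }

        // Rerank (the key step)
        return rerank(query, new ArrayList<>(unique.values()), topK);
    }

    /**
     * Vector (semantic) retrieval.
     */
    private List<RetrievedChunk> vectorRetrieve(String query, int topK) {
        return vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(query)
                        .topK(topK)
                        .similarityThreshold(0.5)
                        .build()
        ).stream()
                .map(doc -> toChunk(doc, "vector"))
                .collect(Collectors.toList());
    }

    /**
     * Keyword recall (simplified: vector results filtered by keyword match).
     */
    private List<RetrievedChunk> keywordRetrieve(String query, int topK) {
        // Extract keywords; blank tokens are dropped because every text contains the empty string
        List<String> keywords = Arrays.stream(query.split("[\\s,。,、]"))
                .filter(k -> !k.isBlank())
                .map(String::toLowerCase)
                .toList();
        if (keywords.isEmpty()) return List.of();

        return vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(query)
                        .topK(topK)
                        .build()
        ).stream()
                .filter(doc -> {
                    String text = doc.getText().toLowerCase();
                    return keywords.stream().anyMatch(text::contains);
                })
                .map(doc -> toChunk(doc, "keyword"))
                .collect(Collectors.toList());
    }

    /**
     * Metadata recall: more chunks from the documents that match best.
     */
    private List<RetrievedChunk> metadataRetrieve(String query, int topK) {
        // Find the top few chunks, then search again inside each of their source documents
        List<Document> seedDocs = vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(query)
                        .topK(3)
                        .build()
        );

        List<RetrievedChunk> results = new ArrayList<>();
        Set<Object> seenSources = new HashSet<>();
        for (Document seedDoc : seedDocs) {
            Object source = seedDoc.getMetadata().get("source");
            if (source == null || !seenSources.add(source)) continue;

            vectorStore.similaritySearch(
                    SearchRequest.builder()
                            .query(query)
                            .topK(topK)
                            .filterExpression(new FilterExpressionBuilder().eq("source", source).build())
                            .build()
            ).forEach(doc -> results.add(toChunk(doc, "metadata")));
        }
        return results;
    }

    private RetrievedChunk toChunk(Document doc, String channel) {
        // The score is filled in by rerank
        return new RetrievedChunk(doc.getText(), doc.getMetadata(), channel, 0.0);
    }

    /**
     * Rerank: rescore the recalled candidates and keep the best topK.
     */
    private List<RetrievedChunk> rerank(String query,
                                         List<RetrievedChunk> candidates,
                                         int topK) {
        if (candidates.isEmpty()) return candidates;

        List<RetrievedChunk> scored = candidates.stream()
                .map(chunk -> new RetrievedChunk(
                        chunk.text(),
                        chunk.metadata(),
                        chunk.retrievalSource(),
                        calculateRelevance(query, chunk.text())
                ))
                .sorted((a, b) -> Double.compare(b.score(), a.score()))
                .collect(Collectors.toList());

        return scored.subList(0, Math.min(topK, scored.size()));
    }

    /**
     * Relevance score (simplified; in practice use a dedicated rerank model or API such as Cohere Rerank).
     * Character bigrams are compared instead of whitespace-separated words because Chinese text
     * has no spaces, so word splitting would score almost every chunk 0.
     */
    private double calculateRelevance(String query, String text) {
        Set<String> queryGrams = bigrams(query);
        if (queryGrams.isEmpty()) return 0.0;
        Set<String> textGrams = bigrams(text);
        long matchCount = queryGrams.stream().filter(textGrams::contains).count();
        return (double) matchCount / queryGrams.size();
    }

    private Set<String> bigrams(String s) {
        String normalized = s.toLowerCase().replaceAll("\\s+", "");
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= normalized.length(); i++) {
            grams.add(normalized.substring(i, i + 2));
        }
        return grams;
    }

    // ========== Nested types ==========
    public record RetrievedChunk(
            String text,
            Map<String, Object> metadata,
            String retrievalSource,
            double score
    ) {}
}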

6.3 The complete RAG Q&A service

java
package com.company.kb.service;

import com.company.kb.service.MultiRetrieverService.RetrievedChunk;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

@Service
@RequiredArgsConstructor
@Slf4j
public class RagService {

    private final ChatClient chatClient;
    private final MultiRetrieverService retriever;

    private static final int DEFAULT_TOP_K = 5;
    private static final int MAX_CONTEXT_TOKENS = 6000;
    private static final String NOT_FOUND = "Sorry, no relevant information was found in the knowledge base.";

    /**
     * RAG question answering.
     */
    public RagAnswer answer(String question) {
        long startTime = System.currentTimeMillis();

        // Step 1: multi-channel recall
        List<RetrievedChunk> chunks = retriever.multiRetrieve(question, DEFAULT_TOP_K);

        if (chunks.isEmpty()) {
            return new RagAnswer(
                    NOT_FOUND,
                    Collections.emptyList(),
                    0,
                    System.currentTimeMillis() - startTime
            );
        }

        // Step 2: build the context
        String context = buildContext(chunks);
        List<SourceQuote> sources = buildSources(chunks);

        // Step 3: build the prompt
        String prompt = buildPrompt(question, context);

        // Step 4: call the model
        String answer;
        try {
            answer = chatClient.prompt()
                    .user(prompt)
                    .call()
                    .content();
        } catch (Exception e) {
            log.error("AI call failed", e);
            answer = "The AI service is temporarily unavailable. Please try again later.";
        }

        return new RagAnswer(
                answer,
                sources,
                chunks.size(),
                System.currentTimeMillis() - startTime
        );
    }

    /**
     * Streaming RAG (better perceived latency).
     */
    public Flux<String> answerStream(String question) {
        // Recall first
        List<RetrievedChunk> chunks = retriever.multiRetrieve(question, DEFAULT_TOP_K);
        if (chunks.isEmpty()) {
            // Same behavior as answer(): do not call the model without any context
            return Flux.just(NOT_FOUND);
        }
        String prompt = buildPrompt(question, buildContext(chunks));

        // Stream the output
        return chatClient.prompt()
                .user(prompt)
                .stream()
                .content();
    }

    /**
     * Build the context, stopping before the token budget is exceeded.
     */
    private String buildContext(List<RetrievedChunk> chunks) {
        StringBuilder context = new StringBuilder();
        int totalTokens = 0;

        for (RetrievedChunk chunk : chunks) {
            int chunkTokens = estimateTokens(chunk.text());
            if (totalTokens + chunkTokens > MAX_CONTEXT_TOKENS) {
                break;
            }
            context.append("--- Source: ")
                   .append(chunk.metadata().getOrDefault("source", "unknown"))
                   .append(" ---\n")
                   .append(chunk.text())
                   .append("\n\n");
            totalTokens += chunkTokens;
        }

        return context.toString();
    }

    /**
     * Build the prompt (the core of answer quality).
     * formatted() fills both slots in one pass, so placeholder-like text inside the
     * context or the question is never substituted a second time.
     */
    private String buildPrompt(String question, String context) {
        return """
            You are a professional assistant for internal company knowledge. Answer the user's question accurately, based on the context provided.

            Rules:
            1. Answer from the context first and do not make anything up
            2. If the context does not contain the answer, tell the user "No relevant information found"
            3. Keep the answer concise, professional, and well organized, using lists or tables where they help
            4. For policies, rules, and other important information, point to the source

            Answer strictly from the following context:

            %s

            User question: %s

            Answer:
            """.formatted(context, question);
    }

    /**
     * Build the source citations.
     */
    private List<SourceQuote> buildSources(List<RetrievedChunk> chunks) {
        return chunks.stream()
                .map(chunk -> new SourceQuote(
                        String.valueOf(chunk.metadata().getOrDefault("source", "unknown")),
                        chunk.text().substring(0, Math.min(100, chunk.text().length())) + "...",
                        chunk.retrievalSource()
                ))
                .distinct()
                .collect(Collectors.toList());
    }

    /**
     * Rough token estimate: about 1.5 tokens per Chinese character and about 1 token per 4 other characters.
     */
    private int estimateTokens(String text) {
        int chineseChars = (int) text.chars().filter(c -> c >= 0x4E00 && c <= 0x9FFF).count();
        int otherChars = text.length() - chineseChars;
        return (int) Math.ceil(chineseChars * 1.5 + otherChars / 4.0);
    }

    // ========== Nested types ==========
    public record RagAnswer(
            String answer,
            List<SourceQuote> sources,
            int chunksRetrieved,
            long timeMs
    ) {}

    public record SourceQuote(
            String source,
            String excerpt,
            String retrievalMethod
    ) {}
}
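The context-building step can also be written so that one oversized chunk does not end the packing early. The sketch below is a variant under stated assumptions, not the service's own code: `ContextPacker` is a hypothetical name, and the token estimate (about 1.5 tokens per CJK character, about 1 token per 4 other characters) is a rough heuristic rather than the model's real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pack ranked chunks into a token budget. */
final class ContextPacker {

    /** Rough estimate (an assumption): ~1.5 tokens per CJK character, ~1 token per 4 other characters. */
    static int estimateTokens(String text) {
        long cjk = text.chars().filter(c -> c >= 0x4E00 && c <= 0x9FFF).count();
        long other = text.length() - cjk;
        return (int) Math.ceil(cjk * 1.5 + other / 4.0);
    }

    /** Keeps chunks in ranked order and skips any chunk that would exceed the budget. */
    static List<String> pack(List<String> rankedChunks, int maxTokens) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String chunk : rankedChunks) {
            int cost = estimateTokens(chunk);
            if (used + cost > maxTokens) {
                continue;  // too large for the remaining budget; a later, smaller chunk may still fit
            }
            kept.add(chunk);
            used += cost;
        }
        return kept;
    }
}
```

Skipping instead of stopping trades strict rank order for better use of the budget; whether that helps depends on how much the ranking is trusted.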

7. REST API: The Complete Q&A Endpoints

7.1 Q&A controller

java
package com.company.kb.controller;

import com.company.kb.service.DocumentService;
import com.company.kb.service.RagService;
import com.company.kb.service.RagService.RagAnswer;
import lombok.RequiredArgsConstructor;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;
import org.springframework.web.server.ResponseStatusException;
import reactor.core.publisher.Flux;

import java.util.Collections;
import java.util.List;

@RestController
@RequestMapping("/api/kb")
@RequiredArgsConstructor
public class KnowledgeBaseController {

    private final RagService ragService;
    private final DocumentService documentService;

    /**
     * Q&A endpoint
     * POST /api/kb/ask
     */
    @PostMapping("/ask")
    public RagAnswer ask(@RequestBody AskRequest request) {
        requireQuestion(request);
        return ragService.answer(request.question());
    }

    /**
     * Streaming Q&A (suited to long answers)
     * POST /api/kb/ask/stream
     */
    @PostMapping(value = "/ask/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> askStream(@RequestBody AskRequest request) {
        requireQuestion(request);
        return ragService.answerStream(request.question());
    }

    /**
     * Upload a document
     * POST /api/kb/upload
     */
    @PostMapping("/upload")
    public DocumentService.UploadResult upload(
            @RequestParam("file") MultipartFile file) throws Exception {
        return documentService.uploadDocument(file);
    }

    /**
     * Batch upload from a server-side directory
     * POST /api/kb/upload/batch
     */
    @PostMapping("/upload/batch")
    public DocumentService.BatchUploadResult uploadBatch(
            @RequestParam("directory") String directory) throws Exception {
        return documentService.batchUpload(directory);
    }

    /**
     * Document list
     * GET /api/kb/documents
     */
    @GetMapping("/documents")
    public DocumentListResult listDocuments() {
        // Returns the uploaded documents.
        // A real implementation has to query the metadata stored in Milvus.
        return new DocumentListResult(Collections.emptyList(), 0);
    }

    /** Responds with 400 Bad Request instead of a 500 when the question is missing. */
    private void requireQuestion(AskRequest request) {
        if (request.question() == null || request.question().isBlank()) {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "Question must not be empty");
        }
    }

    // ========== Nested types ==========
    public record AskRequest(String question) {}

    public record DocumentListResult(
            List<DocumentInfo> documents,
            int totalChunks
    ) {}

    public record DocumentInfo(
            String fileName,
            int chunks,
            long uploadTime
    ) {}
}

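7.2 Calling the API from the front end

javascript
// Ask a question
async function askQuestion() {
    const question = document.getElementById('question').value;

    const response = await fetch('/api/kb/ask', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({question})
    });
    if (!response.ok) {
        throw new Error(`Request failed: ${response.status}`);
    }

    const result = await response.json();

    // Show the answer
    document.getElementById('answer').innerText = result.answer;

    // Show the sources. Nodes are built with textContent so that text taken
    // from uploaded documents is never parsed as HTML.
    const container = document.getElementById('sources');
    container.replaceChildren();
    for (const s of result.sources) {
        const div = document.createElement('div');
        div.className = 'source';
        const title = document.createElement('strong');
        title.textContent = `Source: ${s.source}`;
        const excerpt = document.createElement('p');
        excerpt.textContent = s.excerpt;
        div.append(title, excerpt);
        container.appendChild(div);
    }
}

// Upload a document
async function uploadDocument(file) {
    const formData = new FormData();
    formData.append('file', file);

    const response = await fetch('/api/kb/upload', {
        method: 'POST',
        body: formData
    });
    if (!response.ok) {
        throw new Error(`Upload failed: ${response.status}`);
    }

    const result = await response.json();
    console.log('Upload succeeded:', result);
}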

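8. Production-Grade Optimization

8.1 Performance

java
// 1. Asynchronous document processing (requires @EnableAsync on a configuration class).
// The multipart temp file may be deleted once the request finishes, so for large uploads
// copy the file to disk first and hand the saved path to the async method.
@Async
public CompletableFuture<DocumentService.UploadResult> uploadDocumentAsync(
        MultipartFile file) {
    try {
        return CompletableFuture.completedFuture(uploadDocument(file));
    } catch (Exception e) {
        // uploadDocument throws a checked exception; report it through the future
        return CompletableFuture.failedFuture(e);
    }
}

// 2. Batch embedding (fewer API calls)
public List<float[]> batchEmbed(List<String> texts) {
    // One call for many texts instead of one call per text
    return embeddingModel.embed(texts);
}

// 3. Cache embedding results (requires @EnableCaching)
@Bean
public CacheManager cacheManager() {
    return new ConcurrentMapCacheManager("embeddings");
}

// The text itself is the key: hashCode() values can collide and return the wrong vector
@Cacheable(value = "embeddings", key = "#text")
public float[] embedWithCache(String text) {
    return embeddingModel.embed(text);
}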
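The caching idea can also be shown without Spring's cache abstraction. The following is a minimal sketch under assumptions: `EmbeddingCache` is a hypothetical class, the embedder is passed in as a plain function standing in for the real embedding call, and entries are keyed by a SHA-256 digest of the text so that long chunks do not become long map keys. A size cap with least-recently-used eviction keeps memory bounded, which the unbounded in-memory cache above does not do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: bounded LRU cache in front of an embedding function. */
final class EmbeddingCache {

    private final int capacity;
    private final Function<String, float[]> embedder;
    private final Map<String, float[]> cache;

    EmbeddingCache(int capacity, Function<String, float[]> embedder) {
        this.capacity = capacity;
        this.embedder = embedder;
        // accessOrder = true makes iteration order least-recently-used first
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > EmbeddingCache.this.capacity;
            }
        };
    }

    /** Returns the cached vector, computing and storing it on a miss. */
    synchronized float[] embed(String text) {
        String key = sha256(text);
        float[] vector = cache.get(key);
        if (vector == null) {
            vector = embedder.apply(text);
            cache.put(key, vector);
        }
        return vector;
    }

    synchronized int size() {
        return cache.size();
    }

    /** Hex-encoded SHA-256 digest of the UTF-8 bytes of the text. */
    static String sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

In the application this would wrap the real model, for example `new EmbeddingCache(10_000, embeddingModel::embed)`, assuming the model's single-text method returns `float[]`.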

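8.2 Security

java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configurers.AbstractHttpConfigurer;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        return http
                // CSRF protection is turned off here because clients authenticate per request with HTTP Basic
                .csrf(AbstractHttpConfigurer::disable)
                .authorizeHttpRequests(auth -> auth
                        // Q&A endpoints require authentication
                        .requestMatchers("/api/kb/ask/**").authenticated()
                        // Management endpoints require the ADMIN role
                        .requestMatchers("/api/kb/upload/**").hasRole("ADMIN")
                        // Every other API endpoint (such as the document list) also requires authentication
                        .requestMatchers("/api/kb/**").authenticated()
                        // Static pages stay public
                        .anyRequest().permitAll()
                )
                .httpBasic(Customizer.withDefaults())
                .build();
    }
}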

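8.3 Monitoring and alerting

java
// Micrometer metrics exported to Prometheus.
// With spring-boot-starter-actuator and micrometer-registry-prometheus on the classpath,
// Spring Boot auto-configures the registry, so it only needs to be injected.
private final MeterRegistry meterRegistry;

// Key metrics
private void recordMetrics(RagAnswer answer) {
    meterRegistry.counter("kb.questions.total").increment();
    // The extra arguments of timer(...) and summary(...) are tag strings, not values,
    // so the measurements are passed to record(...)
    meterRegistry.timer("kb.questions.duration")
            .record(Duration.ofMillis(answer.timeMs()));
    meterRegistry.summary("kb.chunks.retrieved")
            .record(answer.chunksRetrieved());
}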

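9. Testing the Results

9.1 Test cases

bash
# 1. Upload a knowledge document
curl -X POST -F "file=@./docs/annual-leave-policy.pdf" http://localhost:8080/api/kb/upload

# 2. Basic Q&A
curl -X POST -H "Content-Type: application/json" \
     -d '{"question":"How many days of annual leave do I have?"}' \
     http://localhost:8080/api/kb/ask

# 3. Streaming Q&A (-N turns off curl's output buffering so events appear as they arrive)
curl -N -X POST -H "Content-Type: application/json" \
     -d '{"question":"Please describe the company leave request process in detail"}' \
     http://localhost:8080/api/kb/ask/stream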

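9.2 Evaluation metrics

| Metric        | Description                                           | Target |
|---------------|-------------------------------------------------------|--------|
| Recall        | Share of relevant documents that are retrieved        | > 90%  |
| Precision     | Share of retrieved documents that are truly relevant  | > 85%  |
| Response time | From the user's question to the answer                | < 3 s  |
| Token cost    | API spend per 1,000 questions                         | < $5   |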
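Recall and precision from the table can be computed from a small hand-labelled set. The sketch below is only an illustration: `RetrievalMetrics` and the chunk ids are hypothetical, and it assumes each test question comes with a set of chunk ids that a person has marked as relevant.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: retrieval recall and precision for one question. */
final class RetrievalMetrics {

    /** Share of the relevant ids that appear among the retrieved ids. */
    static double recall(List<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        return (double) hits(retrieved, relevant) / relevant.size();
    }

    /** Share of the distinct retrieved ids that are relevant. */
    static double precision(List<String> retrieved, Set<String> relevant) {
        Set<String> distinct = new HashSet<>(retrieved);
        if (distinct.isEmpty()) return 0.0;
        return (double) hits(retrieved, relevant) / distinct.size();
    }

    /** Number of distinct retrieved ids that are in the relevant set. */
    private static long hits(List<String> retrieved, Set<String> relevant) {
        return new HashSet<>(retrieved).stream().filter(relevant::contains).count();
    }
}
```

Averaging these two numbers over all test questions gives figures that can be compared against the targets in the table.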

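10. Summary and Next Steps

10.1 Key takeaways

RAG system = document processing + vector embedding + semantic retrieval + AI generation

Key techniques:
├── Document loading: PDF / Word / Markdown support
├── Smart chunking: recursive splitting + semantic boundaries
├── Vector storage: Milvus (standalone here; it also supports distributed deployment)
├── Multi-channel recall: vector + keyword + metadata
├── Reranking: rescoring candidates to improve relevance (a cross-encoder in production)
├── Prompt engineering: context building + answer rules
└── Streaming output: real-time delivery over SSE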

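10.2 Where to go next

| Direction       | What it involves                                                      |
|-----------------|-----------------------------------------------------------------------|
| PDF OCR         | Use OCR to process scanned documents                                  |
| Multilingual    | Support English documents and cross-language retrieval                |
| Knowledge graph | Model entities and relations to strengthen reasoning                  |
| Active learning | Use user feedback to improve retrieval automatically                  |
| Agent           | Support multi-turn conversations and tool calls                       |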
