Table of Contents
- LLM Learning Path for Java Developers
- Stage 1: Foundations and First Practice (1-2 weeks)
  - 1.1 Core Concepts in Depth
    - Transformer Architecture
    - Pre-training and Fine-tuning
    - Core Prompt Engineering Techniques
    - Tokens and Embeddings
  - 1.2 First API Call in Practice
  - 1.3 Stage Checklist
- Stage 2: Enterprise API Application Development (2-3 weeks)
- Stage 3: Vector Databases and RAG Architecture (2 weeks)
  - 3.1 Vector Embeddings in Depth
  - 3.2 Vector Database Integration
  - 3.3 Complete RAG Implementation
  - 3.4 Advanced Optimization Techniques
    - 1. Query Expansion
    - 2. Rerank
    - 3. Context Compression
  - 3.5 Hands-on Project: Enterprise Knowledge-Base Q&A System
  - 3.6 Stage Checklist
LLM Learning Path for Java Developers
Stage 1: Foundations and First Practice (1-2 weeks)
1.1 Core Concepts in Depth
Transformer Architecture
- Self-attention
  - Query, Key, and Value matrix computation
  - Multi-head attention
  - Positional encoding
- Encoder-decoder structure
  - BERT (encoder-only): understanding tasks
  - GPT (decoder-only): generation tasks
  - T5 (full encoder-decoder): general-purpose tasks
Learning resources:
- Paper: "Attention Is All You Need"
- Video: 3Blue1Brown's visual explanation of Transformers
- Code: hand-write a simplified attention mechanism (Python/Java)
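To make the "hand-write attention" exercise concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Java. The toy 2-dimensional inputs and the class name are illustrative only; it omits the learned W_Q/W_K/W_V projections, masking, and batching of a real implementation.

```java
import java.util.Arrays;

// Minimal scaled dot-product attention: attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
public class AttentionDemo {

    // Softmax over one row of scores (max subtracted for numerical stability).
    public static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double sum = 0.0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    // q, k, v: [seqLen][dim]; returns [seqLen][dim].
    public static double[][] attention(double[][] q, double[][] k, double[][] v) {
        int seqLen = q.length, dk = q[0].length;
        double[][] out = new double[seqLen][v[0].length];
        for (int i = 0; i < seqLen; i++) {
            double[] scores = new double[seqLen];
            for (int j = 0; j < seqLen; j++) {
                double dot = 0.0;
                for (int d = 0; d < dk; d++) dot += q[i][d] * k[j][d];
                scores[j] = dot / Math.sqrt(dk); // scale by sqrt(d_k)
            }
            double[] weights = softmax(scores);  // attention weights sum to 1
            for (int j = 0; j < seqLen; j++)
                for (int d = 0; d < v[0].length; d++)
                    out[i][d] += weights[j] * v[j][d];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = { {1, 0}, {0, 1}, {1, 1} }; // 3 tokens, dim 2
        // Self-attention: Q = K = V; each output row is a weighted mix of all value rows.
        System.out.println(Arrays.deepToString(attention(x, x, x)));
    }
}
```

Each output row is a convex combination of the value rows, weighted by how strongly that token's query matches every key.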
Pre-training and Fine-tuning
java
// Understanding the pre-training and fine-tuning pipeline
public class ModelUnderstanding {
    /*
     * Pre-training stage:
     * 1. Large-scale unlabeled data (e.g. CommonCrawl, Wikipedia)
     * 2. Self-supervised objectives (e.g. MLM, NSP, CLM)
     * 3. Learns general-purpose language representations
     *
     * Fine-tuning stage:
     * 1. Domain-specific data (e.g. customer-support dialogues, legal documents)
     * 2. Supervised objectives
     * 3. Adapts the model to a concrete business scenario
     */
}
Core Prompt Engineering Techniques
1. Zero-shot prompting
java
String prompt = """
    Convert the following Java code to Python:
    public int add(int a, int b) {
        return a + b;
    }
    """;
2. Few-shot prompting
java
String fewShotPrompt = """
    Task: extract the key information from the text.
    Example 1:
    Input: Order 12345, customer Zhang San, amount 500 yuan
    Output: {"orderId": "12345", "customer": "Zhang San", "amount": 500}
    Example 2:
    Input: User Li Si bought product A for 200 yuan
    Output: {"orderId": null, "customer": "Li Si", "amount": 200}
    Now process: Wang Wu paid 1000 yuan in order 67890
    Output:
    """;
3. Chain-of-Thought
java
String cotPrompt = """
    Question: A class has 30 students, 60% of whom are girls, and 50% of the girls wear glasses.
    How many girls wear glasses?
    Let's think step by step:
    1. First compute the number of girls
    2. Then compute the number of girls who wear glasses
    3. Give the final answer
    """;
Tokens and Embeddings
java
// Understanding tokenization with a tiktoken-compatible Java library (e.g. jtokkit)
public class TokenDemo {
    public static void main(String[] args) {
        String text = "Hello, 世界!";
        // 1. Tokenization
        //    "Hello" -> [15496]
        //    ","     -> [11]
        //    "世界"  -> [99489, 228]
        // 2. Embedding
        //    each token -> a 1536-dimensional vector (with text-embedding-ada-002)
        // 3. Context-window limits
        //    GPT-3.5: 4096 tokens
        //    GPT-4: 8192/32768 tokens
    }
}
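Context-window limits matter in practice: a long conversation must be trimmed before each request. A common tactic is to drop the oldest messages until the history fits a token budget. The sketch below uses a crude 4-characters-per-token heuristic instead of a real tokenizer, and the class name is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Trim chat history to a token budget, keeping the most recent messages.
public class ContextWindowDemo {

    // Rough estimate: ~4 characters per token for English text (heuristic, not exact).
    public static int estimateTokens(String text) {
        return Math.max(1, text.length() / 4);
    }

    // Walk from newest to oldest; stop once the budget is exhausted.
    public static List<String> trimToBudget(List<String> history, int maxTokens) {
        Deque<String> kept = new ArrayDeque<>();
        int used = 0;
        for (int i = history.size() - 1; i >= 0; i--) {
            int cost = estimateTokens(history.get(i));
            if (used + cost > maxTokens) break; // budget exhausted: drop older messages
            kept.addFirst(history.get(i));      // addFirst preserves chronological order
            used += cost;
        }
        return new ArrayList<>(kept);
    }
}
```

In production you would count tokens with the model's actual tokenizer (e.g. jtokkit, used later in this guide) and reserve room for the system prompt and the model's reply.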
1.2 First API Call in Practice
Project structure
llm-demo/
├── pom.xml
├── src/main/java/
│   └── com/example/llm/
│       ├── LlmApplication.java
│       ├── config/OpenAiConfig.java
│       ├── service/ChatService.java
│       └── controller/ChatController.java
└── src/main/resources/
    └── application.yml
Maven dependency configuration
xml
<!-- pom.xml -->
<dependencies>
    <!-- Spring Boot -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
        <version>3.2.0</version>
    </dependency>
    <!-- OpenAI Java SDK -->
    <dependency>
        <groupId>com.theokanning.openai-gpt3-java</groupId>
        <artifactId>service</artifactId>
        <version>0.18.2</version>
    </dependency>
    <!-- Or a domestic SDK (Alibaba DashScope / Tongyi Qianwen) -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>dashscope-sdk-java</artifactId>
        <version>2.12.0</version>
    </dependency>
    <!-- Lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <scope>provided</scope>
    </dependency>
</dependencies>
Complete code implementation
java
// 1. Configuration
@Configuration
public class OpenAiConfig {

    @Value("${openai.api.key}")
    private String apiKey;

    @Bean
    public OpenAiService openAiService() {
        return new OpenAiService(apiKey, Duration.ofSeconds(60));
    }
}
// 2. Service layer
@Service
@Slf4j
public class ChatService {

    @Autowired
    private OpenAiService openAiService;

    /**
     * Basic chat
     */
    public String chat(String userMessage) {
        ChatCompletionRequest request = ChatCompletionRequest.builder()
                .model("gpt-3.5-turbo")
                .messages(List.of(
                        new ChatMessage("system", "You are a professional Java programming assistant"),
                        new ChatMessage("user", userMessage)
                ))
                .temperature(0.7)
                .maxTokens(1000)
                .build();
        ChatCompletionResult result = openAiService.createChatCompletion(request);
        return result.getChoices().get(0).getMessage().getContent();
    }

    /**
     * Streaming response
     */
    public void chatStream(String userMessage, Consumer<String> callback) {
        ChatCompletionRequest request = ChatCompletionRequest.builder()
                .model("gpt-3.5-turbo")
                .messages(List.of(new ChatMessage("user", userMessage)))
                .stream(true)
                .build();
        openAiService.streamChatCompletion(request)
                .doOnError(error -> log.error("Stream error", error))
                .blockingForEach(chunk -> {
                    String content = chunk.getChoices().get(0).getMessage().getContent();
                    if (content != null) {
                        callback.accept(content);
                    }
                });
    }

    /**
     * Context management (multi-turn conversation)
     */
    public String chatWithContext(List<ChatMessage> history, String newMessage) {
        history.add(new ChatMessage("user", newMessage));
        ChatCompletionRequest request = ChatCompletionRequest.builder()
                .model("gpt-3.5-turbo")
                .messages(history)
                .build();
        ChatCompletionResult result = openAiService.createChatCompletion(request);
        String assistantReply = result.getChoices().get(0).getMessage().getContent();
        // Append the assistant's reply to the history
        history.add(new ChatMessage("assistant", assistantReply));
        return assistantReply;
    }
}
// 3. Controller
@RestController
@RequestMapping("/api/chat")
public class ChatController {

    @Autowired
    private ChatService chatService;

    // Session history store (use Redis in production)
    private final Map<String, List<ChatMessage>> sessions = new ConcurrentHashMap<>();

    @PostMapping("/simple")
    public ResponseEntity<String> simpleChat(@RequestBody ChatRequest request) {
        String response = chatService.chat(request.getMessage());
        return ResponseEntity.ok(response);
    }

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public SseEmitter streamChat(@RequestParam String message) {
        SseEmitter emitter = new SseEmitter();
        CompletableFuture.runAsync(() -> {
            try {
                chatService.chatStream(message, chunk -> {
                    try {
                        emitter.send(SseEmitter.event().data(chunk));
                    } catch (IOException e) {
                        emitter.completeWithError(e);
                    }
                });
                emitter.complete();
            } catch (Exception e) {
                emitter.completeWithError(e);
            }
        });
        return emitter;
    }

    @PostMapping("/context")
    public ResponseEntity<ChatResponse> contextChat(
            @RequestParam String sessionId,
            @RequestBody ChatRequest request) {
        List<ChatMessage> history = sessions.computeIfAbsent(
                sessionId, k -> new ArrayList<>()
        );
        String response = chatService.chatWithContext(history, request.getMessage());
        return ResponseEntity.ok(new ChatResponse(response, history.size()));
    }
}
Configuration file
yaml
# application.yml
spring:
  application:
    name: llm-demo
openai:
  api:
    key: ${OPENAI_API_KEY}  # read from an environment variable
    # Or, for Tongyi Qianwen:
    # base-url: https://dashscope.aliyuncs.com/api/v1
server:
  port: 8080
1.3 Stage Checklist
- Read the Transformer paper and write attention-mechanism pseudocode by hand
- Complete all three OpenAI API call modes (synchronous, streaming, contextual)
- Compare the responses of GPT-3.5 and Tongyi Qianwen
- Write 10 prompt templates for different scenarios (code generation, text classification, information extraction, etc.)
- Implement token counting and cost estimation
Stage 2: Enterprise API Application Development (2-3 weeks)
2.1 Multi-Model Adapter Layer Design
Unified interface abstraction
java
// 1. A unified LLM interface
public interface LlmService {
    String chat(String message);
    void streamChat(String message, Consumer<String> callback);
    List<Float> embedding(String text);
}

// 2. OpenAI implementation
@Service("openai")
public class OpenAiServiceImpl implements LlmService {

    @Autowired
    private OpenAiService openAiService;

    @Override
    public String chat(String message) {
        // OpenAI-specific implementation (see ChatService in Stage 1)
    }
    // streamChat and embedding omitted for brevity
}

// 3. Tongyi Qianwen (Qwen) implementation
@Service("qwen")
public class QwenServiceImpl implements LlmService {

    @Autowired
    private Generation generation;

    @Override
    public String chat(String message) {
        GenerationParam param = GenerationParam.builder()
                .model("qwen-turbo")
                .prompt(message)
                .build();
        GenerationResult result = generation.call(param);
        return result.getOutput().getText();
    }
}

// 4. Switching providers via the strategy pattern
@Service
public class LlmStrategyService {

    // Spring injects all LlmService beans, keyed by bean name ("openai", "qwen", ...)
    @Autowired
    private Map<String, LlmService> llmServices;

    public String chat(String provider, String message) {
        LlmService service = llmServices.get(provider);
        if (service == null) {
            throw new IllegalArgumentException("Unsupported provider: " + provider);
        }
        return service.chat(message);
    }
}
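Stripped of Spring, the strategy pattern above boils down to a registry keyed by provider name. This dependency-free sketch shows the same dispatch logic; the class name and the use of `UnaryOperator<String>` to stand in for an LLM call are illustrative only.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Provider dispatch without Spring: a name -> service map plus a lookup with a clear failure mode.
public class ProviderDispatchDemo {

    private final Map<String, UnaryOperator<String>> providers;

    public ProviderDispatchDemo(Map<String, UnaryOperator<String>> providers) {
        this.providers = providers;
    }

    public String chat(String provider, String message) {
        UnaryOperator<String> service = providers.get(provider);
        if (service == null) {
            // Fail fast on unknown providers rather than silently falling back
            throw new IllegalArgumentException("Unsupported provider: " + provider);
        }
        return service.apply(message);
    }
}
```

Spring's `Map<String, LlmService>` injection builds exactly this map for you from the `@Service("...")` bean names, which is why the bean names double as provider identifiers.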
2.2 Hands-on Project 1: Intelligent Customer-Service System
System architecture
Frontend (Vue3) <--WebSocket--> Backend (Spring Boot)
                                       |
                       +---------------+---------------+
                       |               |               |
                    LLM API   Knowledge-base API  Ticket system
                       |               |               |
                    OpenAI       Elasticsearch       MySQL
Core feature implementation
1. Intent recognition
java
@Service
public class IntentRecognitionService {

    @Autowired
    private LlmService llmService;

    public Intent recognizeIntent(String userMessage) {
        String prompt = String.format("""
                Analyze the intent of the following customer inquiry and choose one of these categories:
                - Refund request
                - Shipping inquiry
                - Product question
                - Complaint or suggestion
                - Other
                Customer message: %s
                Return JSON only: {"intent": "xxx", "confidence": 0.95}
                """, userMessage);
        String response = llmService.chat(prompt);
        return parseIntent(response);
    }
}
2. Retrieval-augmented answers from the knowledge base
java
@Service
public class CustomerServiceBot {

    @Autowired
    private ElasticsearchClient esClient;
    @Autowired
    private LlmService llmService;

    public String answer(String question) {
        // Step 1: retrieve relevant knowledge from Elasticsearch
        List<KnowledgeDoc> docs = searchKnowledge(question);
        // Step 2: build the augmented prompt
        String context = docs.stream()
                .map(KnowledgeDoc::getContent)
                .collect(Collectors.joining("\n\n"));
        String prompt = String.format("""
                You are a professional customer-service assistant. Answer the user's question
                based on the knowledge-base content below.
                Knowledge base:
                %s
                User question: %s
                Requirements:
                1. If the knowledge base contains a clear answer, quote it directly
                2. If there is no relevant information, politely refer the user to a human agent
                3. Keep a professional, friendly tone
                """, context, question);
        return llmService.chat(prompt);
    }

    private List<KnowledgeDoc> searchKnowledge(String query) {
        // Elasticsearch BM25 retrieval
        SearchRequest request = SearchRequest.of(s -> s
                .index("knowledge_base")
                .query(q -> q
                        .match(m -> m
                                .field("content")
                                .query(query)
                        )
                )
                .size(3)
        );
        // Execute the search and map the hits to KnowledgeDoc
        // ...
    }
}
3. Real-time communication over WebSocket
java
@Configuration
@EnableWebSocket
public class WebSocketConfig implements WebSocketConfigurer {

    @Override
    public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
        registry.addHandler(new ChatWebSocketHandler(), "/ws/chat")
                .setAllowedOrigins("*");
    }
}

@Component
@Slf4j
public class ChatWebSocketHandler extends TextWebSocketHandler {

    @Autowired
    private CustomerServiceBot bot;

    @Override
    protected void handleTextMessage(WebSocketSession session, TextMessage message) {
        String userMessage = message.getPayload();
        // Stream the answer back chunk by chunk
        // (streamAnswer: the streaming counterpart of answer(...), omitted above)
        bot.streamAnswer(userMessage, chunk -> {
            try {
                session.sendMessage(new TextMessage(chunk));
            } catch (IOException e) {
                log.error("Failed to send message", e);
            }
        });
    }
}
4. Session management
java
@Service
public class SessionManager {

    @Autowired
    private RedisTemplate<String, Object> redisTemplate;

    private static final String SESSION_PREFIX = "chat:session:";
    private static final Duration SESSION_TTL = Duration.ofHours(1);

    public void saveMessage(String sessionId, ChatMessage message) {
        String key = SESSION_PREFIX + sessionId;
        redisTemplate.opsForList().rightPush(key, message);
        redisTemplate.expire(key, SESSION_TTL);
    }

    public List<ChatMessage> getHistory(String sessionId) {
        String key = SESSION_PREFIX + sessionId;
        List<Object> objects = redisTemplate.opsForList().range(key, 0, -1);
        return objects.stream()
                .map(obj -> (ChatMessage) obj)
                .collect(Collectors.toList());
    }

    public void clearSession(String sessionId) {
        redisTemplate.delete(SESSION_PREFIX + sessionId);
    }
}
2.3 Hands-on Project 2: Code Generation Tool
java
@Service
public class CodeGeneratorService {

    @Autowired
    private LlmService llmService;

    /**
     * Generate complete CRUD code from a specification
     */
    public GeneratedCode generateCrud(CrudRequest request) {
        String prompt = String.format("""
                Generate complete Spring Boot CRUD code for the following database table:
                Table name: %s
                Fields: %s
                Generate:
                1. An Entity class (with JPA annotations)
                2. A Repository interface
                3. A Service class
                4. A Controller class (RESTful API)
                Coding conventions:
                - Use Lombok annotations
                - Follow the Alibaba Java Coding Guidelines
                - Add necessary comments
                - Include parameter validation
                """, request.getTableName(), request.getFields());
        String response = llmService.chat(prompt);
        return parseGeneratedCode(response);
    }

    /**
     * Unit-test generation
     */
    public String generateUnitTest(String sourceCode) {
        String prompt = String.format("""
                Generate JUnit 5 unit tests for the following Java method:
                ```java
                %s
                ```
                Requirements:
                1. Cover both normal and edge cases
                2. Mock dependencies with Mockito
                3. Use AssertJ assertions
                4. Add explanatory comments to the tests
                """, sourceCode);
        return llmService.chat(prompt);
    }

    /**
     * Code review
     */
    public CodeReviewResult reviewCode(String code) {
        String prompt = String.format("""
                Review the following Java code, focusing on:
                1. Potential null-pointer issues
                2. Performance optimization opportunities
                3. Security risks
                4. Coding-convention violations
                ```java
                %s
                ```
                Return JSON:
                {
                  "issues": [
                    {"level": "error|warning|info", "line": 10, "message": "..."},
                    ...
                  ],
                  "suggestions": ["..."],
                  "score": 85
                }
                """, code);
        String response = llmService.chat(prompt);
        return JSON.parseObject(response, CodeReviewResult.class);
    }
}
2.4 Cost and Performance Optimization
1. Request caching
java
@Service
public class CachedLlmService implements LlmService {

    @Autowired
    @Qualifier("openai") // disambiguate: several LlmService beans exist
    private LlmService delegateService;
    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    @Override
    public String chat(String message) {
        String cacheKey = "llm:cache:" + DigestUtils.md5Hex(message);
        // Try the cache first
        String cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            return cached;
        }
        // Call the underlying service
        String response = delegateService.chat(message);
        // Cache the result for 24 hours
        redisTemplate.opsForValue().set(cacheKey, response, Duration.ofHours(24));
        return response;
    }
}
2. Rate limiting
java
@Aspect
@Component
public class RateLimitAspect {

    @Autowired
    private RedisTemplate<String, Long> redisTemplate;

    @Around("@annotation(rateLimit)")
    public Object around(ProceedingJoinPoint point, RateLimit rateLimit) throws Throwable {
        String key = "rate:limit:" + getClientId();
        Long count = redisTemplate.opsForValue().increment(key);
        if (count == 1) {
            // First request in this window: start the 1-minute TTL
            redisTemplate.expire(key, Duration.ofMinutes(1));
        }
        if (count > rateLimit.maxRequests()) {
            throw new RateLimitException("Too many requests, please try again later");
        }
        return point.proceed();
    }
}
// Usage
@RateLimit(maxRequests = 60) // at most 60 requests per minute
public String chat(String message) {
    // ...
}
3. Token control
java
@Service
public class TokenOptimizationService {

    /**
     * Truncate long text to a token budget
     */
    public String truncateText(String text, int maxTokens) {
        Encoding encoding = Encodings.newDefaultEncodingRegistry().getEncoding(EncodingType.CL100K_BASE);
        List<Integer> tokens = encoding.encode(text);
        if (tokens.size() <= maxTokens) {
            return text;
        }
        // Keep the first N tokens and decode them back to text
        List<Integer> truncated = tokens.subList(0, maxTokens);
        return encoding.decode(truncated);
    }

    /**
     * Cost estimation
     */
    public BigDecimal estimateCost(String prompt, String model) {
        int tokenCount = countTokens(prompt); // e.g. encoding.encode(prompt).size()
        // Illustrative prices (USD per 1K tokens); check the provider's current pricing
        Map<String, BigDecimal> prices = Map.of(
                "gpt-3.5-turbo", new BigDecimal("0.002"),
                "gpt-4", new BigDecimal("0.03"),
                "qwen-turbo", new BigDecimal("0.0008")
        );
        BigDecimal pricePer1k = prices.get(model);
        if (pricePer1k == null) {
            throw new IllegalArgumentException("Unknown model: " + model);
        }
        BigDecimal pricePerToken = pricePer1k.divide(new BigDecimal("1000"), 6, RoundingMode.HALF_UP);
        return pricePerToken.multiply(new BigDecimal(tokenCount));
    }
}
2.5 Stage Checklist
- Implement adapters for at least 3 LLM providers (OpenAI, Qwen, ChatGLM)
- Complete the customer-service system's intent recognition and knowledge-base retrieval
- Implement real-time conversation over WebSocket
- Build the code-generation tool (Entity, CRUD, unit tests)
- Integrate Redis caching and rate limiting
- Write a performance-test report (latency, token consumption, cost analysis)
Stage 3: Vector Databases and RAG Architecture (2 weeks)
3.1 Vector Embeddings in Depth
How embedding models work
java
/**
 * How vector embeddings work:
 *
 * Text: "An apple is a fruit"  -> embedding model -> [0.12, -0.34, 0.56, ...] (1536-D)
 * Text: "A banana is a fruit"  -> embedding model -> [0.15, -0.31, 0.58, ...] (1536-D)
 *
 * Cosine similarity: cos_sim = 0.92 (semantically similar)
 *
 * Use cases:
 * - Semantic search: find documents semantically similar to a query
 * - Chatbots: match the most relevant knowledge
 * - Recommendation: find similar products/articles
 */
@Service
public class EmbeddingService {

    @Autowired
    private OpenAiService openAiService;

    /**
     * Embed a single text
     */
    public List<Float> embed(String text) {
        EmbeddingRequest request = EmbeddingRequest.builder()
                .model("text-embedding-ada-002") // 1536 dimensions, $0.0001/1K tokens
                .input(List.of(text))
                .build();
        EmbeddingResult result = openAiService.createEmbeddings(request);
        return result.getData().get(0).getEmbedding();
    }

    /**
     * Batch embedding (better throughput)
     */
    public List<List<Float>> batchEmbed(List<String> texts) {
        EmbeddingRequest request = EmbeddingRequest.builder()
                .model("text-embedding-ada-002")
                .input(texts)
                .build();
        EmbeddingResult result = openAiService.createEmbeddings(request);
        return result.getData().stream()
                .map(Embedding::getEmbedding)
                .collect(Collectors.toList());
    }

    /**
     * Cosine similarity
     */
    public double cosineSimilarity(List<Float> vec1, List<Float> vec2) {
        double dotProduct = 0.0;
        double norm1 = 0.0;
        double norm2 = 0.0;
        for (int i = 0; i < vec1.size(); i++) {
            dotProduct += vec1.get(i) * vec2.get(i);
            norm1 += Math.pow(vec1.get(i), 2);
            norm2 += Math.pow(vec2.get(i), 2);
        }
        return dotProduct / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }
}
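A quick numeric sanity check of the cosine formula above, on toy 3-D vectors (real embeddings are 1536-D, but the math is identical). The standalone class is for illustration only.

```java
// cosine(a, b) = (a · b) / (|a| |b|); ranges from -1 (opposite) to 1 (same direction).
public class CosineDemo {
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Parallel vectors (same direction, different magnitude) -> 1.0
        System.out.println(cosine(new double[]{1, 2, 3}, new double[]{2, 4, 6}));
        // Orthogonal vectors -> 0.0
        System.out.println(cosine(new double[]{1, 0, 0}, new double[]{0, 1, 0}));
    }
}
```

Note that cosine similarity ignores magnitude: [1, 2, 3] and [2, 4, 6] score a perfect 1.0, which is exactly why it suits embedding comparison.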
3.2 Vector Database Integration
Milvus hands-on example
java
// 1. Maven dependency
/*
<dependency>
    <groupId>io.milvus</groupId>
    <artifactId>milvus-sdk-java</artifactId>
    <version>2.3.4</version>
</dependency>
*/

// 2. Milvus configuration
@Configuration
public class MilvusConfig {

    @Bean
    public MilvusServiceClient milvusClient() {
        return new MilvusServiceClient(
                ConnectParam.newBuilder()
                        .withHost("localhost")
                        .withPort(19530)
                        .build()
        );
    }
}
// 3. Vector store service
@Service
public class VectorStoreService {

    @Autowired
    private MilvusServiceClient milvusClient;
    @Autowired
    private EmbeddingService embeddingService;

    private static final String COLLECTION_NAME = "knowledge_base";

    /**
     * Create the collection (analogous to creating a table).
     * In production, check hasCollection first so this stays idempotent.
     */
    @PostConstruct
    public void createCollection() {
        FieldType id = FieldType.newBuilder()
                .withName("id")
                .withDataType(DataType.Int64)
                .withPrimaryKey(true)
                .withAutoID(true)
                .build();
        FieldType text = FieldType.newBuilder()
                .withName("text")
                .withDataType(DataType.VarChar)
                .withMaxLength(65535)
                .build();
        FieldType vector = FieldType.newBuilder()
                .withName("embedding")
                .withDataType(DataType.FloatVector)
                .withDimension(1536) // dimension of text-embedding-ada-002
                .build();
        CreateCollectionParam param = CreateCollectionParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .addFieldType(id)
                .addFieldType(text)
                .addFieldType(vector)
                .build();
        milvusClient.createCollection(param);

        // Create an index to speed up retrieval
        CreateIndexParam indexParam = CreateIndexParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .withFieldName("embedding")
                .withIndexType(IndexType.IVF_FLAT) // or HNSW
                .withMetricType(MetricType.COSINE) // cosine similarity
                .withExtraParam("{\"nlist\": 128}")
                .build();
        milvusClient.createIndex(indexParam);
    }
    /**
     * Insert a single document
     */
    public void insertDocument(String text) {
        // 1. Embed the text
        List<Float> embedding = embeddingService.embed(text);
        // 2. Build the insert payload
        List<InsertParam.Field> fields = Arrays.asList(
                new InsertParam.Field("text", List.of(text)),
                new InsertParam.Field("embedding", List.of(embedding))
        );
        InsertParam param = InsertParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .withFields(fields)
                .build();
        milvusClient.insert(param);
    }

    /**
     * Batch insert (much more efficient)
     */
    public void batchInsert(List<String> texts) {
        List<List<Float>> embeddings = embeddingService.batchEmbed(texts);
        List<InsertParam.Field> fields = Arrays.asList(
                new InsertParam.Field("text", texts),
                new InsertParam.Field("embedding", embeddings)
        );
        InsertParam param = InsertParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .withFields(fields)
                .build();
        milvusClient.insert(param);
    }
    /**
     * Semantic search
     */
    public List<SearchResult> search(String query, int topK) {
        // 1. Embed the query
        List<Float> queryVector = embeddingService.embed(query);
        // 2. Vector search
        SearchParam param = SearchParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .withMetricType(MetricType.COSINE)
                .withTopK(topK)
                .withVectors(List.of(queryVector))
                .withVectorFieldName("embedding")
                .withOutFields(List.of("text")) // return the original text
                .build();
        // 3. Parse the hits (shown schematically; the real SDK wraps the response
        //    in R<SearchResults>, typically unpacked with SearchResultsWrapper)
        SearchResults results = milvusClient.search(param);
        return results.getResults().get(0).stream()
                .map(hit -> new SearchResult(
                        hit.getFieldValues().get("text").toString(),
                        hit.getScore() // similarity score
                ))
                .collect(Collectors.toList());
    }
}

@Data
@AllArgsConstructor
class SearchResult {
    private String text;
    private float score;
}
3.3 Complete RAG Implementation
Document-processing pipeline
java
@Service
@Slf4j
public class DocumentProcessor {

    @Autowired
    private VectorStoreService vectorStore;

    /**
     * Document-splitting strategies
     */
    public List<String> splitDocument(String document) {
        // Option 1: fixed-length splitting (simple but structure-blind)
        return fixedLengthSplit(document, 500, 50); // 500-char chunks, 50-char overlap
        // Option 2: semantic splitting (by paragraph/sentence)
        // return semanticSplit(document);
    }

    private List<String> fixedLengthSplit(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            start += (chunkSize - overlap);
        }
        return chunks;
    }

    /**
     * Document ingestion (end-to-end)
     */
    public void ingestDocument(String filePath) throws IOException {
        // 1. Read the document
        String content = Files.readString(Paths.get(filePath));
        // 2. Split it into chunks
        List<String> chunks = splitDocument(content);
        // 3. Batch-insert into the vector store
        vectorStore.batchInsert(chunks);
        log.info("Document ingested: {} chunks", chunks.size());
    }
}
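The overlap in fixed-length splitting is easy to misread, so here is a self-contained copy of the splitter on a tiny input to make the stride visible. The window advances by (chunkSize - overlap), so each adjacent pair of chunks shares exactly `overlap` characters; the guard against overlap >= chunkSize (an infinite loop otherwise) is an addition not in the service above.

```java
import java.util.ArrayList;
import java.util.List;

// Fixed-length splitting with overlap, on a small scale for inspection.
public class ChunkSplitDemo {

    public static List<String> split(String text, int chunkSize, int overlap) {
        if (overlap >= chunkSize) throw new IllegalArgumentException("overlap must be < chunkSize");
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;   // last chunk reached; avoid a duplicate tail
            start += chunkSize - overlap;       // advance by the stride, not the full chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        // split("abcdefghij", 4, 2) -> [abcd, cdef, efgh, ghij]
        // each adjacent pair shares 2 characters ("cd", "ef", "gh")
        System.out.println(split("abcdefghij", 4, 2));
    }
}
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing facts that straddle boundaries.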
RAG Q&A system
java
@Service
public class RagService {

    @Autowired
    private VectorStoreService vectorStore;
    @Autowired
    private LlmService llmService;

    /**
     * RAG question answering (core flow)
     */
    public RagResponse answer(String question) {
        // Step 1: Retrieval
        List<SearchResult> searchResults = vectorStore.search(question, 3);
        // Step 2: filter out low-scoring hits
        List<String> relevantDocs = searchResults.stream()
                .filter(r -> r.getScore() > 0.7) // similarity threshold
                .map(SearchResult::getText)
                .collect(Collectors.toList());
        if (relevantDocs.isEmpty()) {
            return new RagResponse(
                    "Sorry, I could not find relevant information in the knowledge base.",
                    Collections.emptyList(),
                    0.0
            );
        }
        // Step 3: Augmentation (build the enriched prompt)
        String context = String.join("\n\n---\n\n", relevantDocs);
        String prompt = buildRagPrompt(question, context);
        // Step 4: Generation
        String answer = llmService.chat(prompt);
        return new RagResponse(
                answer,
                searchResults,
                searchResults.get(0).getScore()
        );
    }

    private String buildRagPrompt(String question, String context) {
        return String.format("""
                You are a professional knowledge assistant. Answer the user's question based
                on the knowledge-base content below.
                Knowledge base:
                %s
                User question: %s
                Requirements:
                1. Use only information from the knowledge base; do not make things up
                2. If the knowledge base has no relevant information, say so explicitly
                3. Be concise but include the necessary details
                4. Cite the source of the information where applicable
                """, context, question);
    }

    /**
     * Hybrid retrieval (vector + keyword)
     */
    public RagResponse hybridAnswer(String question) {
        // 1. Vector search
        List<SearchResult> vectorResults = vectorStore.search(question, 5);
        // 2. BM25 keyword search (via Elasticsearch; implementation omitted)
        List<SearchResult> keywordResults = keywordSearch(question, 5);
        // 3. Fuse the result lists (RRF - Reciprocal Rank Fusion)
        List<SearchResult> mergedResults = mergeResults(vectorResults, keywordResults);
        // 4. Generate the answer
        String context = mergedResults.stream()
                .limit(3)
                .map(SearchResult::getText)
                .collect(Collectors.joining("\n\n---\n\n"));
        String answer = llmService.chat(buildRagPrompt(question, context));
        return new RagResponse(answer, mergedResults, mergedResults.get(0).getScore());
    }
}

@Data
@AllArgsConstructor
class RagResponse {
    private String answer;
    private List<SearchResult> sources;
    private double confidence;
}
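hybridAnswer above calls mergeResults(...) without showing it. One common choice is Reciprocal Rank Fusion: score(d) = Σ over lists of 1/(k + rank_d), with k = 60 by convention. This sketch fuses ranked lists of document ids to keep it self-contained; adapting it to the SearchResult type above is mechanical, and the class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Reciprocal Rank Fusion: documents ranked highly in several lists bubble to the top,
// without needing the lists' raw scores to be comparable.
public class RrfDemo {

    public static List<String> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> list : rankedLists) {
            for (int rank = 0; rank < list.size(); rank++) {
                // Ranks are 1-based in the RRF formula, hence rank + 1.
                scores.merge(list.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort(Comparator.comparingDouble((String d) -> scores.get(d)).reversed());
        return fused;
    }

    public static void main(String[] args) {
        // Vector list ranks A first; keyword list ranks B first. B appears near the top
        // of both lists, so RRF places it first overall.
        System.out.println(fuse(List.of(List.of("A", "B", "C"), List.of("B", "C", "A")), 60));
    }
}
```

RRF's appeal is exactly that it only uses ranks: BM25 scores and cosine similarities live on incompatible scales, so averaging them directly would be meaningless, while ranks always are comparable.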
3.4 Advanced Optimization Techniques
1. Query Expansion
java
@Service
public class QueryExpansionService {

    @Autowired
    private LlmService llmService;

    /**
     * Generate several paraphrased queries
     */
    public List<String> expandQuery(String originalQuery) {
        String prompt = String.format("""
                Generate 3 questions that are semantically similar to, but worded
                differently from, the following query:
                Original query: %s
                Requirements:
                1. Preserve the original intent
                2. Use different phrasings
                3. One question per line
                """, originalQuery);
        String response = llmService.chat(prompt);
        return Arrays.asList(response.split("\n"));
    }
}
2. Rerank
java
@Service
public class RerankService {

    @Autowired
    private LlmService llmService;

    /**
     * Rerank retrieval results with an LLM
     */
    public List<SearchResult> rerank(String query, List<SearchResult> results) {
        String prompt = String.format("""
                Given a query and a list of documents, rate each document's relevance (1-10):
                Query: %s
                Documents:
                %s
                Return JSON: [{"index": 0, "score": 9.5}, ...]
                """,
                query,
                formatDocuments(results)
        );
        String response = llmService.chat(prompt);
        // formatDocuments / parseRankScores: (de)serialization helpers, omitted
        List<RankScore> scores = parseRankScores(response);
        // Re-order by the new scores
        return scores.stream()
                .sorted(Comparator.comparing(RankScore::getScore).reversed())
                .map(rs -> results.get(rs.getIndex()))
                .collect(Collectors.toList());
    }
}
3. Context Compression
java
@Service
public class ContextCompressionService {

    @Autowired
    private LlmService llmService;

    /**
     * Extract only the parts of a document relevant to the question
     */
    public String compressContext(String question, String longContext) {
        String prompt = String.format("""
                Extract the key information relevant to the question from the long document below:
                Question: %s
                Document:
                %s
                Return only the relevant key sentences; drop everything else.
                """, question, longContext);
        return llmService.chat(prompt);
    }
}
3.5 Hands-on Project: Enterprise Knowledge-Base Q&A System
Complete architecture
java
@RestController
@RequestMapping("/api/knowledge")
public class KnowledgeController {

    @Autowired
    private DocumentProcessor documentProcessor;
    @Autowired
    private RagService ragService;
    @Autowired
    private VectorStoreService vectorStore;
    @Autowired
    private LlmService llmService;

    /**
     * Upload a document
     */
    @PostMapping("/upload")
    public ResponseEntity<String> uploadDocument(@RequestParam("file") MultipartFile file) {
        try {
            // 1. Save the file (saveFile: persistence helper, omitted)
            String filePath = saveFile(file);
            // 2. Process and ingest it
            documentProcessor.ingestDocument(filePath);
            return ResponseEntity.ok("Document ingested successfully");
        } catch (Exception e) {
            return ResponseEntity.status(500).body("Processing failed: " + e.getMessage());
        }
    }

    /**
     * Q&A endpoint
     */
    @PostMapping("/ask")
    public ResponseEntity<RagResponse> ask(@RequestBody QuestionRequest request) {
        RagResponse response = ragService.answer(request.getQuestion());
        return ResponseEntity.ok(response);
    }

    /**
     * Streaming Q&A
     */
    @GetMapping(value = "/ask-stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public SseEmitter askStream(@RequestParam String question) {
        SseEmitter emitter = new SseEmitter();
        CompletableFuture.runAsync(() -> {
            try {
                // 1. Retrieve relevant documents
                List<SearchResult> docs = vectorStore.search(question, 3);
                emitter.send(SseEmitter.event().name("sources").data(docs));
                // 2. Stream the generated answer
                // (buildRagPrompt must be made public on RagService for this to compile)
                String context = docs.stream()
                        .map(SearchResult::getText)
                        .collect(Collectors.joining("\n\n"));
                llmService.streamChat(
                        ragService.buildRagPrompt(question, context),
                        chunk -> {
                            try {
                                emitter.send(SseEmitter.event().name("answer").data(chunk));
                            } catch (IOException e) {
                                emitter.completeWithError(e);
                            }
                        }
                );
                emitter.complete();
            } catch (Exception e) {
                emitter.completeWithError(e);
            }
        });
        return emitter;
    }
}
3.6 Stage Checklist
- Understand vector embeddings and hand-write the cosine-similarity computation
- Set up a local Milvus environment (Docker)
- Implement the full document-splitting, embedding, and ingestion pipeline
- Build the RAG Q&A system (retrieval + generation)
- Compare plain keyword retrieval, vector retrieval, and hybrid retrieval
- Implement query expansion and rerank optimization
- Complete the enterprise knowledge-base project (document upload, Q&A, source citations)