工作流程
大语言模型 重排序器 Milvus Elasticsearch 应用程序 用户 大语言模型 重排序器 Milvus Elasticsearch 应用程序 用户 1. 数据索引阶段 2. 用户查询与检索阶段 par 并行检索 3. 融合与重排序 4. 生成回答 文档切块(Chunking) 调用 Embedding 模型 存储文本块与嵌入向量 存储文本块与嵌入向量 输入问题(Query) 调用 Embedding 模型 关键词检索 返回相关文本块 (BM25) 向量检索 返回相关文本块 (语义) 融合结果 (RRF/加权) 调用 Reranker 模型 返回重排序后的Top-K结果 构建提示词(Prompt) + 上下文 生成最终回答 返回回答
ollam+llm
[安装参考](https://blog.csdn.net/wenwang3000/article/details/145705858)这里需要2种类型的
milvus安装
参考:https://milvus.io/docs/zh 线上环境采用集群版本,这里只做演示配 配置轻量版本 Milvus Lite
shell
#安装 pymilvus 客户端
pip install pymilvus
#安装 milvus 服务端包
pip install milvus
#启动 Milvus 本地服务
milvus-server --data D:\logs\milvus_data

milvus客户端工具
参考:https://github.com/zilliztech/attu/blob/main/README_CN.md
collection
elasticsearch安装
[集群配置参考](https://blog.csdn.net/wenwang3000/article/details/99820920),这里受环境限制配置单节点
配置
shell
# 配置项
cluster.name: es8
node.name: node1
path.data: D:\DataBase\elasticsearch-8.19.14\data
path.logs: D:\DataBase\elasticsearch-8.19.14\logs
network.host: 127.0.0.1
http.port: 9200
discovery.type: single-node
xpack.security.enabled: false
http.cors.enabled: true
http.cors.allow-origin: "*"
ik分词器
java
#安装 ik分词器 插件
cd D:\DataBase\elasticsearch-8.19.14\plugins
.\elasticsearch-plugin install https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-8.19.14.zip

启动服务
java
# 启动服务 不带任何参数) 打开浏览器,访问 http://localhost:9200
elasticsearch.bat(不带任何参数)

elasticsearch客户端工具
轻量、现代化的Web UI;可作为浏览器插件一键安装,极客风,支持REST API调试 https://github.com/cars10/elasticvue
浏览器插件版本:
展示效果

核心代码(java版本)
jar依赖
java
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.5.9</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.spring.cloud.admin</groupId>
<artifactId>com-spring-ai</artifactId>
<version>1.0</version>
<name>com-spring-ai</name>
<description>admin</description>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<java.version>17</java.version>
<spring-ai.version>1.0.0</spring-ai.version>
<elasticsearch-java.version>8.12.0</elasticsearch-java.version>
<milvus-sdk-java.version>2.6.18</milvus-sdk-java.version>
<commons-lang3.version>3.7</commons-lang3.version>
<alibaba-fastjson.version>1.2.58</alibaba-fastjson.version>
<org-projectlombok.version>1.18.44</org-projectlombok.version>
<jackson.version>2.15.3</jackson.version>
<langchain4j.version>1.13.1</langchain4j.version>
</properties>
<dependencies>
<!-- LangChain4j core + OpenAI -->
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j</artifactId>
<version>${langchain4j.version}</version>
</dependency>
<!-- LangChain4j Ollama 集成 -->
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-ollama</artifactId>
<version>${langchain4j.version}</version>
</dependency>
<!-- 文档解析器(支持 PDF/Word/HTML 等) -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-tika-document-reader</artifactId>
<exclusions>
<exclusion>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Apache Commons Compress - 修复 Tika ZIP 处理问题 -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.26.0</version>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>
<!-- Spring AI -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<!--<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-vector-store-milvus</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>-->
<!-- Elasticsearch Java Client -->
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>${elasticsearch-java.version}</version>
</dependency>
<!-- Milvus Java SDK -->
<dependency>
<groupId>io.milvus</groupId>
<artifactId>milvus-sdk-java</artifactId>
<version>${milvus-sdk-java.version}</version>
</dependency>
<!-- Jackson for JSON -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>${jackson.version}</version>
</dependency>
<!--common-->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>${commons-lang3.version}</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>${alibaba-fastjson.version}</version>
</dependency>
<!--lombok-->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
<version>${org-projectlombok.version}</version>
</dependency>
<!--web-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
yml配置
java
server:
port: 8080
servlet:
# 项目contextPath
context-path: /
spring:
application:
name: ai
#ai
ai:
ollama:
base-url: http://localhost:11434
chat:
options:
model: qwen3.5:2b
temperature: 0.7
embedding:
options:
model: qwen3-embedding:0.6b
autoconfigure:
exclude:
- org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchDataAutoConfiguration
- org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchRepositoriesAutoConfiguration
# Elasticsearch Configuration
elasticsearch:
host: localhost
port: 9200
scheme: http
username:
password:
index:
name: documents
# Milvus Configuration
milvus:
host: localhost
port: 19530
db-name: test
collection:
name: test_ai
index:
type: IVF_FLAT
metric-type: IP
embedding-dimension: 1024
search.top: 2
logging:
level:
com.example: DEBUG
服务入口
java
package com.controller;
import com.ai.rag.bean.AskRequest;
import com.ai.rag.bean.DocumentChunk;
import com.ai.rag.bean.FIleInfo;
import com.ai.rag.service.DocumentChunkService;
import com.ai.rag.service.RagQueryService;
import com.alibaba.fastjson.JSONObject;
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.*;
/**
┌─────────────────────────────────────────────────────────────────────┐
│ Spring Boot Application │
├─────────────────────────────────────────────────────────────────────┤
│ API Layer │
├─────────────────────────────────────────────────────────────────────┤
│ DocumentService │
│ (双写入库核心调度层) │
├─────────────────────────────────────────────────────────────────────┤
│ EmbeddingService │
│ (Ollama Chunk + 向量转换) │
├───────────────────────────────────┬─────────────────────────────────┤
│ ElasticsearchService │ MilvusService │
│ (全文检索 + 文本存储) │ (向量检索 + 向量存储) │
├───────────────────────────────────┴─────────────────────────────────┤
│ VectorStore │
│ (双写入库事务/补偿机制) │
└─────────────────────────────────────────────────────────────────────┘
*/
@Slf4j
@RestController
@RequestMapping("/api/rag")
public class RagController {
private final DocumentChunkService documentChunkService;
private final RagQueryService ragQueryService;
public RagController(DocumentChunkService documentChunkService,
RagQueryService ragQueryService) {
this.documentChunkService = documentChunkService;
this.ragQueryService = ragQueryService;
}
/**
* 全文检索接口
*/
@GetMapping("/search/text")
public ResponseEntity<List<DocumentChunk>> textSearch(@RequestParam String query,
@RequestParam(defaultValue = "10") int limit) {
return ResponseEntity.ok(ragQueryService.getSearchResults(query, limit));
}
/**
* 混合检索接口
*/
@GetMapping("/search/hybrid/{topK}")
public ResponseEntity<List<DocumentChunk>> hybridSearch(
@RequestParam String query,
@PathVariable(required = false) Integer topK) {
int finalTopK = (topK != null) ? topK : 10;
return ResponseEntity.ok(ragQueryService.hybridSearch(query, topK));
}
/**
* RAG 问答接口
*/
@PostMapping("/ask")
public ResponseEntity<Map<String, String>> askQuestion(@RequestBody AskRequest request) {
log.info("askQuestion-入参{}", JSONObject.toJSONString(request));
String answer = ragQueryService.askQuestion(request.getQuestion());
Map<String, String> response = new HashMap<>();
response.put("question", request.getQuestion());
response.put("answer", answer);
return ResponseEntity.ok(response);
}
@RequestMapping("/tikaDocument")
public String tikaDocument(@RequestBody FIleInfo fIleInfo) {
log.info("tikaDocument-入参{}", JSONObject.toJSONString(fIleInfo));
/**
* 基于Apache Tika 技术实现,支持自动检测和解析超过 1000 种文件格式。
*可处理 PDF、DOC/DOCX、PPT/PPTX、XLS/XLSX、HTML、EPUB、ZIP(内含文档)等主流办公与结构化文档格式
* 统一文本提取:将异构文档转换为标准化的纯文本,便于后续 AI 处理(如 RAG、知识库构建、文档分类等)
* 元数据保留:自动提取作者、创建日期、MIME 类型等结构化信息
*Spring 生态集成:作为 DocumentReader 接口的实现,可无缝接入 Spring AI 的文档处理流水线
*/
TikaDocumentReader reader = new TikaDocumentReader(fIleInfo.getUrl());
/**
* * PagePdfDocumentReader - 按页面读取PDF文档,适合需要分页处理的场景
* * ParagraphPdfDocumentReader - 按段落读取PDF文档,保持文档的逻辑结构
*/
//PagePdfDocumentReader reader = new PagePdfDocumentReader(fIleInfo.getUrl());
/**
* Spring AI 提供的按 Token 精确切分长文本的工具,核心用于 RAG 场景,把超长文档切分成符合大模型上下文限制的小块,同时保证语义连贯
* 精确按 Token 计数,底层集成专业分词器,安全适配模型要求
* int defaultChunkSize = 800, // 每块最大 Token 数
* int chunkOverlap = 160, // 块间重叠 Token 数(默认 20%)
* int minChunkSizeChars = 350, // 每块最小字符数
* int minChunkLengthToEmbed = 5, // 可嵌入的最小长度
* int maxNumChunks = 10000, // 最大分块数
* boolean keepSeparator = true // 保留分隔符(换行/句号)
*/
TokenTextSplitter splitter = new TokenTextSplitter();
List<Document> documents = splitter.apply(reader.read());
log.info("tikaDocument-分割结果:{}", documents.size());
List<DocumentChunk> chunks = new ArrayList<>();
for (int i = 0; i < documents.size(); i++) {
DocumentChunk chunk = DocumentChunk.builder()
.id(documents.get(i).getId())
.documentId(fIleInfo.getDocumentId())
.content(documents.get(i).getText())
.title(fIleInfo.getTitle())
.author(fIleInfo.getAuthor())
.category(fIleInfo.getCategory())
.chunkIndex(i)
.createdAt(new Date())
.updatedAt(new Date())
.totalChunks(documents.size())
.build();
chunks.add(chunk);
}
if (chunks.size() >0){
documentChunkService.batchDualWriteDocumentChunks(chunks);
}
return "ok";
}
}
基础服务
milvus服务
java
package com.ai.rag.service;
import com.ai.rag.bean.DocumentChunk;
import com.ai.rag.constant.FiledEnums;
import io.milvus.client.MilvusServiceClient;
import io.milvus.grpc.DataType;
import io.milvus.grpc.MutationResult;
import io.milvus.grpc.SearchResults;
import io.milvus.param.*;
import io.milvus.param.collection.*;
import io.milvus.param.dml.DeleteParam;
import io.milvus.param.dml.InsertParam;
import io.milvus.param.dml.SearchParam;
import io.milvus.param.index.CreateIndexParam;
import io.milvus.response.QueryResultsWrapper;
import io.milvus.response.SearchResultsWrapper;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import jakarta.annotation.PostConstruct;
import java.util.*;
import java.util.stream.Collectors;
@Slf4j
@Service
public class MilvusService {
@Value("${milvus.host}")
private String milvusHost;
@Value("${milvus.port}")
private int milvusPort;
@Value("${milvus.db-name}")
private String dbName;
@Value("${milvus.collection.name}")
private String collectionName;
@Value("${milvus.index.type}")
private String indexType;
@Value("${milvus.index.metric-type}")
private String metricType;
@Value("${milvus.index.embedding-dimension}")
private Integer embeddingDim;
private MilvusServiceClient milvusClient;
@PostConstruct
public void init() {
ConnectParam connectParam = ConnectParam.newBuilder()
.withHost(milvusHost)
.withPort(milvusPort)
.withDatabaseName(dbName)
.build();
milvusClient = new MilvusServiceClient(connectParam);
log.info("Connected to Milvus at {}:{}", milvusHost, milvusPort);
// 创建集合(如果不存在)
createCollectionIfNotExists();
}
private void createCollectionIfNotExists() {
// 检查集合是否存在
R<Boolean> hasCollectionRes = milvusClient.hasCollection(
HasCollectionParam.newBuilder()
.withCollectionName(collectionName)
.build()
);
if (hasCollectionRes.getData()) {
log.info("Milvus collection already exists: {}", collectionName);
return;
}
// 定义字段结构
List<FieldType> fieldTypes = Arrays.asList(
FieldType.newBuilder()
.withName(FiledEnums.id.getCode())
.withDataType(DataType.VarChar)
.withMaxLength(255)
.withPrimaryKey(true)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.documentId.getCode())
.withDataType(DataType.VarChar)
.withMaxLength(255)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.content.getCode())
.withDataType(DataType.VarChar)
.withMaxLength(65535)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.title.getCode())
.withDataType(DataType.VarChar)
.withMaxLength(255)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.author.getCode())
.withDataType(DataType.VarChar)
.withMaxLength(255)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.category.getCode())
.withDataType(DataType.VarChar)
.withMaxLength(255)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.chunkIndex.getCode())
.withDataType(DataType.Int32)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.totalChunks.getCode())
.withDataType(DataType.Int32)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.createdAt.getCode())
.withDataType(DataType.Int64)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.updatedAt.getCode())
.withDataType(DataType.Int64)
.build(),
// 2.6+版本才支持
/* FieldType.newBuilder()
.withName(FiledEnums.createdAt.getCode())
.withDataType(DataType.Timestamptz)
.build(),
FieldType.newBuilder()
.withName(FiledEnums.updatedAt.getCode())
.withDataType(DataType.Timestamptz)
.build(),*/
FieldType.newBuilder()
.withName(FiledEnums.embedding.getCode())
.withDataType(DataType.FloatVector)
.withDimension(embeddingDim)
.build()
);
CreateCollectionParam createParam = CreateCollectionParam.newBuilder()
.withCollectionName(collectionName)
.withDescription("测试db双写")
.withFieldTypes(fieldTypes)
.build();
R<RpcStatus> res = milvusClient.createCollection(createParam);
if (res.getStatus() == R.Status.Success.getCode()) {
log.info("Created Milvus collection: {}", collectionName);
createIndex();
} else {
log.error("Failed to create Milvus collection: {}", res.getMessage());
}
}
/**
* 使用 IVF_FLAT 索引类型时,必须在 extraParam 中提供 nlist 参数,该参数控制了向量空间划分的簇数量。
*/
private void createIndex() {
// 定义索引参数
CreateIndexParam indexParam = CreateIndexParam.newBuilder()
.withCollectionName(collectionName)
.withFieldName(FiledEnums.embedding.getCode())
.withIndexType(IndexType.valueOf(indexType))
.withMetricType(MetricType.valueOf(metricType))
.withExtraParam("{\"nlist\":" + embeddingDim + "}")
.build();
R<RpcStatus> res = milvusClient.createIndex(indexParam);
if (res.getStatus() == R.Status.Success.getCode()) {
log.info("Created index on collection: {} with type {}", collectionName, indexType);
// 加载集合到内存
LoadCollectionParam loadParam = LoadCollectionParam.newBuilder()
.withCollectionName(collectionName)
.build();
milvusClient.loadCollection(loadParam);
log.info("Loaded collection: {} into memory", collectionName);
} else {
log.error("Failed to create index: {}", res.getMessage());
}
}
/**
* 批量插入文档块
*/
public boolean batchInsertDocumentChunks(List<DocumentChunk> chunks) {
if (chunks.isEmpty()) return true;
try {
List<InsertParam.Field> fields = new ArrayList<>();
fields.add(new InsertParam.Field(FiledEnums.id.getCode(), chunks.stream().map(DocumentChunk::getId)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.documentId.getCode(), chunks.stream().map(DocumentChunk::getDocumentId)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.content.getCode(), chunks.stream().map(DocumentChunk::getContent)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.category.getCode(), chunks.stream().map(DocumentChunk::getCategory)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.title.getCode(), chunks.stream().map(DocumentChunk::getTitle)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.author.getCode(), chunks.stream().map(DocumentChunk::getAuthor)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.chunkIndex.getCode(), chunks.stream().map(DocumentChunk::getChunkIndex)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.totalChunks.getCode(), chunks.stream().map(DocumentChunk::getTotalChunks)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.embedding.getCode(), chunks.stream().map(DocumentChunk::getEmbedding)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.createdAt.getCode(), chunks.stream()
.map(chunk -> chunk.getCreatedAt() != null ? chunk.getCreatedAt().getTime() : System.currentTimeMillis())
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.updatedAt.getCode(), chunks.stream()
.map(chunk -> chunk.getUpdatedAt() != null ? chunk.getUpdatedAt().getTime() : System.currentTimeMillis())
.collect(Collectors.toList())));
// 2.6+版本才支持
/* fields.add(new InsertParam.Field(FiledEnums.createdAt.getCode(), chunks.stream().map(DocumentChunk::getCreatedAt)
.collect(Collectors.toList())));
fields.add(new InsertParam.Field(FiledEnums.updatedAt.getCode(), chunks.stream().map(DocumentChunk::getUpdatedAt)
.collect(Collectors.toList())));*/
InsertParam insertParam = InsertParam.newBuilder()
.withCollectionName(collectionName)
.withFields(fields)
.build();
R<MutationResult> res = milvusClient.insert(insertParam);
if (res.getStatus() == R.Status.Success.getCode()) {
log.info("Batch inserted {} chunks to Milvus", chunks.size());
return true;
} else {
log.error("Failed to batch insert to Milvus: {}", res.getMessage());
return false;
}
} catch (Exception e) {
log.error("Exception during Milvus batch insert", e);
return false;
}
}
/**
* 根据ID删除文档块
*/
public boolean deleteDocumentChunk(String id) {
try {
String expr = String.format("id == \"%s\"", id);
DeleteParam deleteParam = DeleteParam.newBuilder()
.withCollectionName(collectionName)
.withExpr(expr)
.build();
R<MutationResult> res = milvusClient.delete(deleteParam);
if (res.getStatus() == R.Status.Success.getCode()) {
log.debug("Document deleted from Milvus: {}", id);
return true;
} else {
log.error("Failed to delete from Milvus: {}", res.getMessage());
return false;
}
} catch (Exception e) {
log.error("Exception during Milvus delete", e);
return false;
}
}
/**
* 向量相似度搜索
*/
public List<DocumentChunk> searchByVector(List<Float> queryVector, int topK) {
try {
List<String> outputFields = Arrays.asList("id", "documentId", "content",
"title", "category", "chunkIndex");
SearchParam searchParam = SearchParam.newBuilder()
.withCollectionName(collectionName)
.withVectorFieldName(FiledEnums.embedding.getCode())
.withVectors(Arrays.asList(queryVector))
.withTopK(topK)
.withMetricType(MetricType.valueOf(metricType))
// nlist = 1024(索引时划分的簇数) nprobe = 10(搜索时检查的簇数) 比例 = 10/1024 ≈ 1%
// 只搜索约 1% 的向量空间 如果发现搜索结果不够准确,可以适当增加 nprobe:
.withParams("{\"nprobe\":10}")
/** 做权限过滤 .withExpr("")
* // 1. 等于
* "author == \"admin\""
* // 2. 不等于
* "author != \"guest\""
* // 3. 大于/小于(数值类型)
* "createdAt > 1700000000000"
* // 4. IN 操作
* "category in [\"tech\", \"science\"]"
* // 5. NOT IN
* "category not in [\"private\", \"confidential\"]"
* // 6. AND/OR 组合
* "(author == \"admin\" || category == \"public\") && createdAt > 1700000000000"
* // 7. LIKE 模糊匹配(VarChar 字段)
* "title like \"AI%\""
* // 8. 范围查询
* "createdAt >= 1700000000000 && createdAt <= 1710000000000"
*/
.withOutFields(outputFields)
.build();
R<SearchResults> res = milvusClient.search(searchParam);
if (res.getStatus() == R.Status.Success.getCode()) {
SearchResultsWrapper wrapper = new SearchResultsWrapper(res.getData().getResults());
List<DocumentChunk> results = new ArrayList<>();
for (QueryResultsWrapper.RowRecord record : wrapper.getRowRecords()) {
DocumentChunk chunk = DocumentChunk.builder()
.id((String)record.get("id"))
.documentId((String) record.get("documentId"))
.content((String) record.get("content"))
.title((String) record.get("title"))
.category((String) record.get("category"))
.chunkIndex((Integer) record.get("chunkIndex"))
.build();
results.add(chunk);
}
return results;
} else {
log.error("Failed to search in Milvus: {}", res.getMessage());
return new ArrayList<>();
}
} catch (Exception e) {
log.error("Exception during Milvus search", e);
return new ArrayList<>();
}
}
}
es服务
java
package com.ai.rag.service;
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.mapping.Property;
import co.elastic.clients.elasticsearch.core.*;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.elasticsearch.indices.CreateIndexRequest;
import co.elastic.clients.elasticsearch.indices.ExistsRequest;
import co.elastic.clients.elasticsearch.indices.IndexSettings;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import com.ai.rag.bean.DocumentChunk;
import com.ai.rag.constant.FiledEnums;
import jakarta.annotation.PostConstruct;
import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
@Slf4j
@Service
public class ElasticsearchService {
@Value("${elasticsearch.host}")
private String esHost;
@Value("${elasticsearch.port}")
private int esPort;
@Value("${elasticsearch.scheme}")
private String esScheme;
@Value("${elasticsearch.username}")
private String esUsername;
@Value("${elasticsearch.password}")
private String esPassword;
@Value("${elasticsearch.index.name}")
private String indexName;
private ElasticsearchClient esClient;
@Value("${milvus.index.embedding-dimension}")
private Integer embeddingDim;
@PostConstruct
public void init() throws IOException {
RestClient restClient = buildRestClient();
ElasticsearchTransport transport = new RestClientTransport(
restClient, new JacksonJsonpMapper()
);
this.esClient = new ElasticsearchClient(transport);
// 创建索引(如果不存在)
createIndexIfNotExists();
}
private RestClient buildRestClient() {
CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
if (esUsername != null && !esUsername.isEmpty()) {
credentialsProvider.setCredentials(
AuthScope.ANY,
new UsernamePasswordCredentials(esUsername, esPassword)
);
}
return RestClient.builder(
new HttpHost(esHost, esPort, esScheme)
).setHttpClientConfigCallback(httpClientBuilder ->
httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
).build();
}
/**
* Mapping 理解为索引的"蓝图"或"表结构",它定义了每个字段的数据类型、是否被索引、如何被分析等关键属性
* @throws IOException
*/
private void createIndexIfNotExists() throws IOException {
ExistsRequest existsRequest = ExistsRequest.of(e -> e.index(indexName));
boolean exists = esClient.indices().exists(existsRequest).value();
if (!exists) {
CreateIndexRequest createIndexRequest = CreateIndexRequest.of(i -> i
.index(indexName)
.settings(IndexSettings.of(s -> s
.numberOfShards("1")
.numberOfReplicas("1")
))
.mappings(m -> m
.properties(FiledEnums.id.getCode(), Property.of(p -> p.keyword(k -> k)))
.properties(FiledEnums.documentId.getCode(), Property.of(p -> p.keyword(k -> k)))
.properties(FiledEnums.content.getCode(), Property.of(p -> p.text(t -> t
.analyzer("ik_max_word") //最细粒度分词(索引时使用)
.searchAnalyzer("ik_smart") //智能分词(搜索时使用)
)))
.properties(FiledEnums.title.getCode(), Property.of(p -> p.text(t -> t
.analyzer("ik_max_word") //最细粒度分词(索引时使用)
.searchAnalyzer("ik_smart") //智能分词(搜索时使用)
)))
.properties(FiledEnums.author.getCode(), Property.of(p -> p.text(t -> t
.analyzer("ik_max_word") //最细粒度分词(索引时使用)
.searchAnalyzer("ik_smart") //智能分词(搜索时使用)
)))
.properties(FiledEnums.category.getCode(), Property.of(p -> p.text(t -> t)))
.properties(FiledEnums.embedding.getCode(), Property.of(p -> p.denseVector(t -> t.dims(embeddingDim))))
.properties(FiledEnums.chunkIndex.getCode(), Property.of(p -> p.integer(t -> t)))
.properties(FiledEnums.totalChunks.getCode(), Property.of(p -> p.integer(t -> t)))
.properties(FiledEnums.createdAt.getCode(), Property.of(p -> p.date(t -> t)))
.properties(FiledEnums.updatedAt.getCode(), Property.of(p -> p.date(t -> t)))
)
);
esClient.indices().create(createIndexRequest);
log.info("Created Elasticsearch index: {}", indexName);
}
}
/**
* 批量插入文档块
*/
public boolean batchInsertDocumentChunks(List<DocumentChunk> chunks) {
try {
var bulkRequest = BulkRequest.of(b -> {
b.index(indexName);
for (DocumentChunk chunk : chunks) {
b.operations(op -> op
.index(idx -> idx
.id(chunk.getId())
.document(chunk)
)
);
}
return b;
});
BulkResponse response = esClient.bulk(bulkRequest);
if (response.errors()) {
log.error("Bulk insert to ES had errors");
return false;
}
log.info("Batch inserted {} chunks to Elasticsearch", chunks.size());
return true;
} catch (IOException e) {
log.error("Failed to batch insert document chunks to Elasticsearch", e);
return false;
}
}
/**
* 全文搜索
*/
public List<DocumentChunk> search(String query, int limit) {
try {
SearchResponse<DocumentChunk> response = esClient.search(s -> s
.index(indexName)
.query(q -> q
.match(m -> m
.field("content")
.query(query)
)
)
.size(limit),
DocumentChunk.class
);
List<DocumentChunk> results = new ArrayList<>();
for (Hit<DocumentChunk> hit : response.hits().hits()) {
if (hit.source() != null) {
results.add(hit.source());
}
}
return results;
} catch (IOException e) {
log.error("Failed to search in Elasticsearch", e);
return new ArrayList<>();
}
}
}
向量化服务
java
package com.ai.rag.service;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
@Slf4j
@Service
public class OllamaEmbeddingService {
@Value("${spring.ai.ollama.base-url}")
private String ollamaBaseUrl;
@Value("${spring.ai.ollama.embedding.options.model}")
private String embeddingModel;
private final HttpClient httpClient;
private final ObjectMapper objectMapper;
public OllamaEmbeddingService() {
this.httpClient = HttpClient.newHttpClient();
this.objectMapper = new ObjectMapper();
}
/**
* 生成单个文本的 Embedding 向量
*/
public List<Float> generateEmbedding(String text) {
try {
String requestBody = String.format(
"{\"model\":\"%s\",\"input\":\"%s\"}",
embeddingModel,
escapeJson(text)
);
// log.info("Ollama embedding request: {}", requestBody);
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(ollamaBaseUrl + "/api/embed"))
.header("Content-Type", "application/json")
.POST(HttpRequest.BodyPublishers.ofString(requestBody))
.build();
HttpResponse<String> response = httpClient.send(request,
HttpResponse.BodyHandlers.ofString());
// log.info("Ollama embedding response: {}", response.body());
if (response.statusCode() == 200) {
JsonNode jsonNode = objectMapper.readTree(response.body());
JsonNode embeddings = jsonNode.get("embeddings");
if (embeddings != null && embeddings.isArray() && embeddings.size() > 0) {
List<Float> embedding = new ArrayList<>();
for (JsonNode value : embeddings.get(0)) {
embedding.add(value.floatValue());
}
return embedding;
}
} else {
log.error("Ollama embedding failed with status: {}", response.statusCode());
}
} catch (IOException | InterruptedException e) {
log.error("Failed to generate embedding", e);
Thread.currentThread().interrupt();
}
return null;
}
/**
* 批量生成 Embedding 向量
*/
public List<List<Float>> generateEmbeddingsBatch(List<String> texts) {
List<List<Float>> embeddings = new ArrayList<>();
for (String text : texts) {
List<Float> embedding = generateEmbedding(text);
if (embedding != null) {
embeddings.add(embedding);
} else {
log.error("Failed to generate embedding for text: {}", text.substring(0,
Math.min(50, text.length())));
}
}
return embeddings;
}
private String escapeJson(String text) {
return text.replace("\\", "\\\\")
.replace("\"", "\\\"")
.replace("\n", "\\n")
.replace("\r", "\\r");
}
}
数据双写
java
package com.ai.rag.service;
import com.ai.rag.bean.DocumentChunk;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
@Slf4j
@Service
public class DocumentChunkService {
private final ElasticsearchService elasticsearchService;
private final MilvusService milvusService;
private final OllamaEmbeddingService ollamaEmbeddingService;
private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
public DocumentChunkService(ElasticsearchService elasticsearchService,
MilvusService milvusService,
OllamaEmbeddingService ollamaEmbeddingService) {
this.elasticsearchService = elasticsearchService;
this.milvusService = milvusService;
this.ollamaEmbeddingService = ollamaEmbeddingService;
}
/**
* 批量文档块双写入库
*/
public boolean batchDualWriteDocumentChunks(List<DocumentChunk> chunks) {
if (chunks == null || chunks.isEmpty()) {
return true;
}
// Step 1: 批量生成 Embedding(优化性能)
List<String> texts = chunks.stream().map(DocumentChunk::getContent)
.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
List<List<Float>> embeddings = ollamaEmbeddingService.generateEmbeddingsBatch(texts);
if (embeddings.size() != chunks.size()) {
log.error("批量生成Embedding失败, expected {} but got {}", chunks.size(), embeddings.size());
return false;
}
for (int i = 0; i < chunks.size(); i++) {
chunks.get(i).setEmbedding(embeddings.get(i));
}
// Step 2: 批量写入 Elasticsearch
boolean esSuccess = elasticsearchService.batchInsertDocumentChunks(chunks);
if (!esSuccess) {
log.error("写入es失败 需完善补偿机制");
// TODO
return false;
}
// Step 3: 批量写入 Milvus
boolean milvusSuccess = milvusService.batchInsertDocumentChunks(chunks);
if (!milvusSuccess) {
log.error("写入milvus失败 需完善补偿机制");
// TODO
return false;
}
return true;
}
}
查询服务
java
package com.ai.rag.service;
import com.ai.rag.bean.DocumentChunk;
import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.*;
@Slf4j
@Service
public class RagQueryService {
private final ElasticsearchService elasticsearchService;
private final MilvusService milvusService;
private final OllamaEmbeddingService ollamaEmbeddingService;
@Value("${spring.ai.ollama.base-url}")
private String ollamaBaseUrl;
@Value("${spring.ai.ollama.chat.options.model}")
private String chatModel;
@Value("${spring.ai.ollama.chat.options.temperature}")
private String temperature;
@Value("${search.top}")
private Integer searchTop;
private final HttpClient httpClient;
// 请在 RagQueryService 中添加以下方法
public List<DocumentChunk> getSearchResults(String query, int limit) {
return elasticsearchService.search(query, limit);
}
public RagQueryService(ElasticsearchService elasticsearchService,
MilvusService milvusService,
OllamaEmbeddingService ollamaEmbeddingService) {
this.elasticsearchService = elasticsearchService;
this.milvusService = milvusService;
this.ollamaEmbeddingService = ollamaEmbeddingService;
this.httpClient = HttpClient.newHttpClient();
}
/**
* 混合检索:向量检索 + 全文检索,并进行结果融合
*/
public List<DocumentChunk> hybridSearch(String query, int topK) {
// Step 1: 向量检索 - 将查询转为向量后检索
List<Float> queryVector = ollamaEmbeddingService.generateEmbedding(query);
log.info("hybridSearch-将查询转为向量:{}", JSONObject.toJSONString(queryVector));
// log.info("hybridSearch-将查询转为向量后检索:{}", JSONObject.toJSONString(queryVector));
List<DocumentChunk> vectorResults = new ArrayList<>();
if (queryVector != null) {
log.info("hybridSearch-从向量库检索开始......");
vectorResults = milvusService.searchByVector(queryVector, topK);
log.info("hybridSearch-从向量库检索返回 {} 条", vectorResults.size());
}
// Step 2: 全文检索
log.info("hybridSearch-从es检索开始......");
List<DocumentChunk> textResults = elasticsearchService.search(query, topK);
log.info("hybridSearch-从es检索返回 {} 条", textResults.size());
// Step 3: 结果融合(RRF 算法去重融合)
return fusionResults(vectorResults, textResults, topK);
}
/**
* 基于 RRF(Reciprocal Rank Fusion)的结果融合
*/
private List<DocumentChunk> fusionResults(List<DocumentChunk> vectorResults,
List<DocumentChunk> textResults, int topK) {
Map<String, Double> scoreMap = new HashMap<>();
Map<String, DocumentChunk> chunkMap = new HashMap<>();
// 计算向量检索的 RRF 分数
for (int i = 0; i < vectorResults.size(); i++) {
DocumentChunk chunk = vectorResults.get(i);
String id = chunk.getId();
double rrfScore = 1.0 / (i + 60); // RRF with k=60
scoreMap.put(id, scoreMap.getOrDefault(id, 0.0) + rrfScore);
chunkMap.putIfAbsent(id, chunk);
}
// 计算文本检索的 RRF 分数
for (int i = 0; i < textResults.size(); i++) {
DocumentChunk chunk = textResults.get(i);
String id = chunk.getId();
double rrfScore = 1.0 / (i + 60);
scoreMap.put(id, scoreMap.getOrDefault(id, 0.0) + rrfScore);
chunkMap.putIfAbsent(id, chunk);
}
// 按 RRF 分数排序并取 topK
List<Map.Entry<String, Double>> sorted = new ArrayList<>(scoreMap.entrySet());
sorted.sort((a, b) -> b.getValue().compareTo(a.getValue()));
List<DocumentChunk> results = new ArrayList<>();
for (int i = 0; i < Math.min(topK, sorted.size()); i++) {
results.add(chunkMap.get(sorted.get(i).getKey()));
}
//log.info("fusionResults-结果融合结果 {}", JSONObject.toJSONString(results));
log.info("hybridSearch-RRF计计算后返回总条数: {}", results.size());
return results;
}
/**
* 完整的 RAG 问答
*/
public String askQuestion(String question) {
// Step 1: 混合检索获取相关文档块
List<DocumentChunk> relevantChunks = hybridSearch(question, searchTop);
if (relevantChunks.isEmpty()) {
return generateFallbackAnswer(question);
}
// Step 2: 构建 Prompt
String context = buildContextFromChunks(relevantChunks);
String prompt = buildPrompt(question, context);
// Step 3: 调用 Ollama 生成回答
return generateAnswer(prompt);
}
private String buildContextFromChunks(List<DocumentChunk> chunks) {
StringBuilder sb = new StringBuilder();
int rank = 1;
for (DocumentChunk chunk : chunks) {
sb.append("【参考文档").append(rank++).append("】\n");
sb.append(chunk.getContent()).append("\n\n");
}
return sb.toString();
}
private String buildPrompt(String question, String context) {
return String.format(
"你是一个专业的智能问答助手。请基于以下提供的参考资料内容,回答用户的问题。\n\n" +
"注意事项:\n" +
"1. 仅使用参考资料中的信息进行回答,不要编造或添加参考资料中不包含的内容\n" +
"2. 如果参考资料中不包含相关信息,请直接告知用户\"根据现有资料无法回答此问题\"\n" +
"3. 回答应当准确、简洁、有条理\n" +
"4. 确保回答与用户的问题相关\n\n" +
"=== 参考资料 ===\n%s\n" +
"=== 用户问题 ===\n%s\n\n" +
"=== 回答 ===\n",
context, question
);
}
private String generateAnswer(String prompt) {
try {
String requestBody = String.format(
"{\"model\":\"%s\",\"prompt\":\"%s\",\"stream\":false,\"temperature\":"+temperature+"}",
chatModel, escapeJson(prompt)
);
log.info("请求大模型开始:{}");
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(ollamaBaseUrl + "/api/generate"))
.header("Content-Type", "application/json")
.POST(HttpRequest.BodyPublishers.ofString(requestBody))
.build();
HttpResponse<String> response = httpClient.send(request,
HttpResponse.BodyHandlers.ofString());
log.info("请求大模型响应: {}", response.body());
if (response.statusCode() == 200) {
JSONObject jsonResponse = JSONObject.parseObject(response.body());
String responseBody = jsonResponse.getString("response");
log.info("请求大模型检索结果:{}", responseBody);
return responseBody;
} else {
log.error("请求大模型失败: {}", response.statusCode());
return "对不起,生成回答时出现了错误。";
}
} catch (IOException | InterruptedException e) {
log.error("请求大模型异常", e);
Thread.currentThread().interrupt();
return "对不起,生成回答时出现了错误。";
}
}
private String generateFallbackAnswer(String question) {
String prompt = String.format(
"用户提出了问题:\"%s\"\n" +
"但当前知识库中未检索到相关的资料。请友好地告知用户当前无法回答该问题," +
"并建议用户提供更多信息或尝试其他问题。\n\n" +
"请以自然、有帮助的态度进行回复。",
question
);
return generateAnswer(prompt);
}
private String escapeJson(String text) {
return text.replace("\\", "\\\\")
.replace("\"", "\\\"")
.replace("\n", "\\n")
.replace("\r", "\\r");
}
}



