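Preface
RAG (Retrieval-Augmented Generation) is one of the hottest directions in AI applications right now. Yet many people get stuck at the "get it running locally" step: the API is unstable, the vector database is fiddly to configure, or the code architecture is a mess.
Today I'm sharing a RAG knowledge-base project that I got running locally. The tech stack:
- Backend: Spring Boot 3 + MyBatis Plus + Qdrant vector database
- Frontend: Vue3 + Element Plus
- LLM: LM Studio (local qwen3-vl-4b)
- Embedding: local text-embedding-bge-base-zh-v1.5
1. Project architecture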
text
┌─────────────────────────────────────────────────────┐
│                   Frontend (Vue3)                   │
│      Document upload / Chat UI / RAG settings       │
└──────────────────┬──────────────────────────────────┘
                   │ HTTP / SSE
┌──────────────────▼──────────────────────────────────┐
│               Backend (Spring Boot 3)               │
│   ┌──────────────┴──────────────┐                   │
│   │  ChatController             │                   │
│   │  /api/chat/ask (Q&A)        │                   │
│   └──────────────┬──────────────┘                   │
│                  │                                  │
│   ┌──────────────▼──────────────┐                   │
│   │  ChatService                │                   │
│   │  1. Embed the question      │                   │
│   │  2. Qdrant similarity search│                   │
│   │  3. Build the prompt        │                   │
│   │  4. Call the LM Studio LLM  │                   │
│   └──────────────┬──────────────┘                   │
│                  │                                  │
│   ┌──────────────▼──────────────┐                   │
│   │  VectorizationService       │                   │
│   │  Chunk → Embed →            │                   │
│   │  Store in Qdrant            │                   │
│   └─────────────────────────────┘                   │
└─────────────────────────────────────────────────────┘
         │                               │
         ▼                               ▼
┌─────────────────┐         ┌─────────────────────────┐
│     Qdrant      │         │        LM Studio        │
│(vector database)│         │ (local LLM + Embedding) │
│ localhost:6333  │         │     localhost:1234      │
└─────────────────┘         └─────────────────────────┘
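The four ChatService steps in the diagram can be sketched as a small plain-Java pipeline. This is only an illustration under stated assumptions, not the project's code: RagPipeline and its three function parameters are hypothetical stand-ins for the embedding call, the Qdrant search, and the LM Studio chat call.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of the retrieve-then-generate flow; every collaborator is injected. */
final class RagPipeline {
    private final Function<String, float[]> embedder;                   // step 1: question -> vector
    private final BiFunction<float[], Integer, List<String>> retriever; // step 2: vector, topK -> chunks
    private final Function<String, String> llm;                         // step 4: prompt -> answer

    RagPipeline(Function<String, float[]> embedder,
                BiFunction<float[], Integer, List<String>> retriever,
                Function<String, String> llm) {
        this.embedder = embedder;
        this.retriever = retriever;
        this.llm = llm;
    }

    String ask(String question, int topK) {
        float[] queryVector = embedder.apply(question);
        List<String> chunks = retriever.apply(queryVector, topK);
        if (chunks.isEmpty()) {
            return "No relevant content found."; // skip the LLM call when nothing was retrieved
        }
        // Step 3: build the prompt from the retrieved chunks and the question.
        String prompt = "Context:\n" + String.join("\n\n", chunks) + "\n\nQuestion: " + question;
        return llm.apply(prompt);
    }

    public static void main(String[] args) {
        // Fake collaborators so the flow can be run without Qdrant or LM Studio.
        RagPipeline pipeline = new RagPipeline(
                q -> new float[] {q.length()},
                (vector, k) -> List.of("Chunk A"),
                prompt -> "echo: " + prompt);
        System.out.println(pipeline.ask("What is RAG?", 1));
    }
}
```

Keeping the three collaborators behind function types makes it easy to test the flow with fakes before wiring in the real services.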
2. Core implementation
2.1 Dependencies (pom.xml)
xml
<dependencies>
    <!-- Spring Boot -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- MySQL + MyBatis Plus -->
    <dependency>
        <groupId>com.baomidou</groupId>
        <artifactId>mybatis-plus-spring-boot3-starter</artifactId>
        <version>3.5.5</version>
    </dependency>
    <dependency>
        <groupId>com.mysql</groupId>
        <artifactId>mysql-connector-j</artifactId>
    </dependency>
    <!-- Qdrant vector database client (published as io.qdrant:client) -->
    <dependency>
        <groupId>io.qdrant</groupId>
        <artifactId>client</artifactId>
        <version>1.7.0</version>
    </dependency>
    <!-- HTTP client (for calling LM Studio) -->
    <dependency>
        <groupId>com.squareup.okhttp3</groupId>
        <artifactId>okhttp</artifactId>
        <version>4.12.0</version>
    </dependency>
    <!-- JSON handling -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson2</artifactId>
        <version>2.0.43</version>
    </dependency>
    <!-- Document parsing -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>3.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.2.5</version>
    </dependency>
    <!-- WebClient (for SSE streaming responses) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
</dependencies>
2.2 Configuration (application.yml)
yaml
server:
  port: 8080
spring:
  datasource:
    url: jdbc:mysql://localhost:3306/doc_qa?useUnicode=true&characterEncoding=utf8
    username: root
    password: your_password
# LM Studio (LLM + Embedding)
lmstudio:
  base-url: http://localhost:1234
  api-key: sk-lm-xxxxx
  chat-model: qwen/qwen3-vl-4b
  embeddings-model: text-embedding-bge-base-zh-v1.5
# Qdrant vector database
qdrant:
  host: localhost
  port: 6334  # gRPC port, which the Qdrant Java client uses; 6333 is the REST port
  collection-name: documents
# RAG settings
rag:
  top-k: 5
  similarity-threshold: 0.3
  chunk-size: 1000
  chunk-overlap: 100
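To make chunk-size and chunk-overlap concrete, here is a minimal sketch of a fixed-window chunker. It is a deliberately simplified illustration, not the sentence-aware splitter this project uses; WindowChunker is a hypothetical name, and sizes are counted in Java chars. It also shows why the overlap must stay smaller than the chunk size: otherwise the window would never advance.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-window chunker illustrating chunk-size and chunk-overlap. */
final class WindowChunker {
    static List<String> chunk(String text, int chunkSize, int chunkOverlap) {
        if (chunkSize < 1 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= chunkOverlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - chunkOverlap; // how far the window moves each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Each chunk repeats the last character of the previous one.
        System.out.println(chunk("abcdefghij", 4, 1));
    }
}
```

With chunk-size 1000 and chunk-overlap 100 as configured above, each window would advance by 900 characters.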
2.3 Vectorization service (core)
java
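import static io.qdrant.client.PointIdFactory.id;
import static io.qdrant.client.ValueFactory.value;
import static io.qdrant.client.VectorsFactory.vectors;
import static io.qdrant.client.WithPayloadSelectorFactory.enable;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONArray;
import io.qdrant.client.QdrantClient;
import io.qdrant.client.QdrantGrpcClient;
import io.qdrant.client.grpc.Points.PointStruct;
import io.qdrant.client.grpc.Points.ScoredPoint;
import io.qdrant.client.grpc.Points.SearchPoints;
import java.io.IOException;
import java.util.*;
import java.util.concurrent.TimeUnit;
import okhttp3.*;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class VectorizationService {
    /** A retrieved chunk; documentName carries the stored document_id. Bean-style getters keep the call sites in ChatService and ChatController valid. */
    public record SearchResult(String id, String content, String documentName, float score) {
        public String getContent() { return content; }
        public String getDocumentName() { return documentName; }
        public float getScore() { return score; }
    }

    private static final MediaType JSON_TYPE = MediaType.parse("application/json");
    private final String collectionName;
    private final String lmstudioUrl;
    private final String apiKey;
    private final String embeddingModel;
    private final int chunkSize;
    private final int chunkOverlap;
    private final float similarityThreshold;
    private final QdrantClient qdrantClient;
    private final OkHttpClient httpClient;

    // Constructor injection: @Value fields are still unset while a constructor runs,
    // so the settings arrive as parameters instead of being hard-coded.
    public VectorizationService(
            @Value("${qdrant.host}") String qdrantHost,
            @Value("${qdrant.port}") int qdrantPort,
            @Value("${qdrant.collection-name}") String collectionName,
            @Value("${lmstudio.base-url}") String lmstudioUrl,
            @Value("${lmstudio.api-key}") String apiKey,
            @Value("${lmstudio.embeddings-model}") String embeddingModel,
            @Value("${rag.chunk-size}") int chunkSize,
            @Value("${rag.chunk-overlap}") int chunkOverlap,
            @Value("${rag.similarity-threshold}") float similarityThreshold) {
        this.collectionName = collectionName;
        this.lmstudioUrl = lmstudioUrl;
        this.apiKey = apiKey;
        this.embeddingModel = embeddingModel;
        this.chunkSize = chunkSize;
        this.chunkOverlap = chunkOverlap;
        this.similarityThreshold = similarityThreshold;
        // The Java client speaks gRPC, so qdrant.port must be the gRPC port (6334 by default).
        this.qdrantClient = new QdrantClient(
                QdrantGrpcClient.newBuilder(qdrantHost, qdrantPort, false).build());
        this.httpClient = new OkHttpClient.Builder()
                .connectTimeout(30, TimeUnit.SECONDS)
                .readTimeout(60, TimeUnit.SECONDS)
                .build();
    }

    /** Embeds one text through LM Studio's /v1/embeddings endpoint. */
    public float[] embedText(String text) throws IOException {
        Map<String, Object> body = new HashMap<>();
        body.put("model", embeddingModel);
        body.put("input", Collections.singletonList(text));
        Request request = new Request.Builder()
                .url(lmstudioUrl + "/v1/embeddings")
                .addHeader("Authorization", "Bearer " + apiKey)
                .post(RequestBody.create(JSON.toJSONString(body), JSON_TYPE))
                .build();
        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful()) {
                throw new IOException("Embedding request failed: HTTP " + response.code());
            }
            JSONArray embedding = JSON.parseObject(response.body().string())
                    .getJSONArray("data").getJSONObject(0).getJSONArray("embedding");
            float[] vector = new float[embedding.size()];
            for (int i = 0; i < vector.length; i++) {
                vector[i] = ((Number) embedding.get(i)).floatValue();
            }
            return vector;
        }
    }

    /** Sentence-aware chunking: splits after sentence-ending punctuation or a newline so that no sentence is cut in half. */
    public List<String> chunkText(String text) {
        List<String> chunks = new ArrayList<>();
        // The lookbehind keeps each delimiter attached to the sentence it ends.
        String[] sentences = text.split("(?<=[。!?.!?\n])");
        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            // A single sentence longer than chunkSize still becomes one oversized chunk.
            if (current.length() > 0 && current.length() + sentence.length() > chunkSize) {
                String full = current.toString();
                chunks.add(full.trim());
                // Seed the next chunk with the last chunkOverlap characters of this one.
                current = new StringBuilder(
                        full.length() > chunkOverlap ? full.substring(full.length() - chunkOverlap) : full);
            }
            current.append(sentence);
        }
        if (!current.toString().isBlank()) {
            chunks.add(current.toString().trim());
        }
        return chunks;
    }

    /** Embeds every chunk and stores it in Qdrant together with its payload. */
    public void store(String documentId, List<String> chunks) throws Exception {
        List<PointStruct> points = new ArrayList<>();
        for (int idx = 0; idx < chunks.size(); idx++) {
            String chunk = chunks.get(idx);
            points.add(PointStruct.newBuilder()
                    .setId(id(UUID.randomUUID()))
                    .setVectors(vectors(embedText(chunk)))
                    .putAllPayload(Map.of(
                            "document_id", value(documentId),
                            "chunk_index", value(idx),
                            "content", value(chunk)))
                    .build());
        }
        // upsertAsync returns a future; get() blocks until Qdrant acknowledges the write.
        qdrantClient.upsertAsync(collectionName, points).get();
    }

    /** Similarity search: embeds the query and returns the top-K chunks above the score threshold. */
    public List<SearchResult> search(String query, int topK) throws Exception {
        List<Float> queryVector = new ArrayList<>();
        for (float f : embedText(query)) {
            queryVector.add(f);
        }
        List<ScoredPoint> hits = qdrantClient.searchAsync(
                SearchPoints.newBuilder()
                        .setCollectionName(collectionName)
                        .addAllVector(queryVector)
                        .setLimit(topK)
                        .setWithPayload(enable(true))
                        .setScoreThreshold(similarityThreshold)
                        .build()).get();
        List<SearchResult> results = new ArrayList<>();
        for (ScoredPoint hit : hits) {
            results.add(new SearchResult(
                    hit.getId().getUuid(),
                    hit.getPayloadMap().get("content").getStringValue(),
                    hit.getPayloadMap().get("document_id").getStringValue(),
                    hit.getScore()));
        }
        return results;
    }
}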
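To show what the top-K search with a score threshold computes, here is a minimal in-memory sketch. It assumes the collection is configured for cosine similarity (the article does not show how the collection is created), and InMemorySearch and Hit are hypothetical names. Qdrant does this with an index rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical brute-force version of "top-K above a threshold", assuming cosine similarity. */
final class InMemorySearch {
    record Hit(int index, double score) {}

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // treat an all-zero vector as unrelated to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Hit> search(float[] query, List<float[]> vectors, int topK, double threshold) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            double score = cosine(query, vectors.get(i));
            if (score >= threshold) {
                hits.add(new Hit(i, score)); // keep only candidates at or above the threshold
            }
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed()); // best match first
        return hits.subList(0, Math.min(topK, hits.size()));
    }
}
```

A threshold of 0.3, as in rag.similarity-threshold, drops weakly related chunks before they reach the prompt; the right value depends on the embedding model and is worth tuning on real questions.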
2.4 Chat service (the full RAG pipeline)
java
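import com.alibaba.fastjson2.JSON;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import okhttp3.*;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class ChatService {
    private static final MediaType JSON_TYPE = MediaType.parse("application/json");
    private static final String SYSTEM_PROMPT = """
            You are a professional knowledge-base Q&A assistant. Answer the user's question
            accurately and concisely, based on the context provided.
            If the context contains nothing relevant, say "Sorry, nothing relevant was found in the knowledge base."
            Answer in Chinese.
            """;
    private final VectorizationService vectorizationService;
    private final String lmstudioUrl;
    private final String apiKey;
    private final String chatModel;
    private final OkHttpClient httpClient;

    public ChatService(
            VectorizationService vectorizationService,
            @Value("${lmstudio.base-url}") String lmstudioUrl,
            @Value("${lmstudio.api-key}") String apiKey,
            @Value("${lmstudio.chat-model}") String chatModel) {
        this.vectorizationService = vectorizationService;
        this.lmstudioUrl = lmstudioUrl;
        this.apiKey = apiKey;
        this.chatModel = chatModel;
        this.httpClient = new OkHttpClient.Builder()
                .readTimeout(120, TimeUnit.SECONDS)
                .build();
    }

    /** Full RAG pipeline: retrieve, build the prompt, generate with the LLM. */
    public String answer(String question, int topK) throws Exception {
        // Step 1: retrieve relevant chunks
        List<VectorizationService.SearchResult> docs = searchRelevantDocuments(question, topK);
        if (docs.isEmpty()) {
            return "Sorry, nothing relevant to your question was found in the knowledge base.";
        }
        // Step 2: build the context
        StringBuilder context = new StringBuilder();
        for (VectorizationService.SearchResult doc : docs) {
            context.append("【").append(doc.getDocumentName()).append("】\n");
            context.append(doc.getContent()).append("\n\n");
        }
        // Step 3: call the LLM
        return generateAnswer(question, context.toString());
    }

    /** Retrieval step on its own, so the controller can also return the sources. */
    public List<VectorizationService.SearchResult> searchRelevantDocuments(String question, int topK)
            throws Exception {
        return vectorizationService.search(question, topK);
    }

    /** Calls the LM Studio chat completions endpoint with the retrieved context. */
    public String generateAnswer(String question, String context) throws IOException {
        String userPrompt = String.format("""
                Reference information:
                %s
                User question: %s
                """, context, question);
        Map<String, Object> body = new HashMap<>();
        body.put("model", chatModel);
        // The instructions go in the system message, the context and question in the user message.
        body.put("messages", List.of(
                Map.of("role", "system", "content", SYSTEM_PROMPT),
                Map.of("role", "user", "content", userPrompt)));
        body.put("max_tokens", 1000);
        body.put("temperature", 0.7);
        Request request = new Request.Builder()
                .url(lmstudioUrl + "/v1/chat/completions")
                .addHeader("Authorization", "Bearer " + apiKey)
                .post(RequestBody.create(JSON.toJSONString(body), JSON_TYPE))
                .build();
        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful()) {
                throw new IOException("Chat request failed: HTTP " + response.code());
            }
            return JSON.parseObject(response.body().string())
                    .getJSONArray("choices").getJSONObject(0)
                    .getJSONObject("message").getString("content");
        }
    }
}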
2.5 Controller
java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/chat")
public class ChatController {
    private final ChatService chatService;
    @Value("${rag.top-k}")
    private int topK;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @PostMapping("/ask")
    public ApiResponse<Map<String, Object>> ask(@RequestBody Map<String, String> request) {
        try {
            String question = request.get("question");
            if (question == null || question.trim().isEmpty()) {
                return ApiResponse.error("The question must not be empty");
            }
            // Retrieve the relevant chunks
            List<VectorizationService.SearchResult> results =
                    chatService.searchRelevantDocuments(question, topK);
            // Build the source list and the prompt context in one pass
            List<Map<String, Object>> sources = new ArrayList<>();
            StringBuilder context = new StringBuilder();
            for (VectorizationService.SearchResult result : results) {
                context.append("【").append(result.getDocumentName()).append("】\n");
                context.append(result.getContent()).append("\n\n");
                sources.add(Map.of(
                        "documentName", result.getDocumentName(),
                        "content", result.getContent(),
                        "score", result.getScore()));
            }
            // Generate the answer
            String answer = chatService.generateAnswer(question, context.toString());
            return ApiResponse.success(Map.of(
                    "question", question,
                    "answer", answer,
                    "sources", sources,
                    "contextCount", sources.size()));
        } catch (Exception e) {
            return ApiResponse.error("Q&A failed: " + e.getMessage());
        }
    }
}
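The controller returns an ApiResponse, which the article does not show. A minimal sketch of what such a wrapper could look like follows; the field names (success, message, data) are assumptions made for illustration, not the project's actual class.

```java
/** Hypothetical response envelope; the field names are assumptions, not the project's real class. */
record ApiResponse<T>(boolean success, String message, T data) {

    /** Wraps a successful result. */
    static <T> ApiResponse<T> success(T data) {
        return new ApiResponse<>(true, "ok", data);
    }

    /** Wraps a failure; data stays null so clients can branch on the success flag. */
    static <T> ApiResponse<T> error(String message) {
        return new ApiResponse<>(false, message, null);
    }

    public static void main(String[] args) {
        ApiResponse<String> ok = ApiResponse.success("hello");
        ApiResponse<String> failed = ApiResponse.error("The question must not be empty");
        System.out.println(ok);
        System.out.println(failed);
    }
}
```

A record keeps the envelope immutable, and the generic factory methods let the compiler infer T from the controller's return type.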
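3. Local environment setup
3.1 Install LM Studio
Download: lmstudio.ai/
After installing:
- Search for and download the qwen/qwen3-vl-4b model (4B parameters, about 4 GB of memory)
- Download the text-embedding-bge-base-zh-v1.5 embedding model
- Start the Local Server; the API token is generated automatically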
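3.2 Install Qdrant
bash
# Docker (the command uses bash syntax; on Windows run it in WSL or Git Bash)
# 6333 is the REST port, 6334 the gRPC port
docker run -d -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
# Or download the standalone build
# https://github.com/qdrant/qdrant/releases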
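3.3 Startup order
bash
# 1. MySQL (make sure it is installed and running)
# 2. Qdrant (skip this if the container from 3.2 is already running)
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
# 3. LM Studio (after launching, click "Local Server")
# 4. Backend
cd backend && mvn spring-boot:run
# 5. Frontend
cd frontend && npm install && npm run dev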
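4. Demo
After uploading a company document, I tested a few questions:
Q: What is the company's core business?
A: The company's core business consists of three segments: artificial intelligence, big data analytics, and cloud computing services.
Q: How many employees does the company have?
A: The company has more than 5,000 employees, 60% of whom are R&D staff.
The source cards show the retrieved document chunks together with their similarity scores.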
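5. Pitfalls I ran into
❌ Pitfall 1: LM Studio API token mismatch
LM Studio has two sets of APIs:
- Local Server API (/v1/chat/completions): requires a Bearer token
- Developer API (/api/v1/chat): may not require a token
Fix: when using the Local Server, the request must carry this header: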
http
Authorization: Bearer sk-lm-xxxxx
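As a quick way to check the header in isolation, the sketch below builds the same request with the JDK's built-in java.net.http API instead of the OkHttp client used in the article. LmStudioRequests is a hypothetical helper name, and the request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds a chat request that carries the Bearer token. */
final class LmStudioRequests {
    static HttpRequest chatRequest(String baseUrl, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/chat/completions"))
                .header("Authorization", "Bearer " + apiKey) // the header from the fix above
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = chatRequest("http://localhost:1234", "sk-lm-xxxxx", "{}");
        // Print the headers to confirm what would go over the wire.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Building requests in one helper keeps the token handling in a single place, so a mismatch shows up once instead of in every call site.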
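❌ Pitfall 2: embedding dimension mismatch
If the dimension the Qdrant collection was created with differs from the dimension the embedding model outputs, storing vectors fails.
Fix: run one test embedding first to find the actual dimension:
bash
curl http://localhost:1234/v1/embeddings \
  -H "Authorization: Bearer $LM_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"text-embedding-bge-base-zh-v1.5","input":["测试"]}'
# The length of the returned embedding array is the vector dimension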
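The same check can be made in code before anything is written to Qdrant, so a mismatch fails early with a readable message. This is a minimal sketch; DimensionGuard is a hypothetical helper, and the expected dimension must come from your own collection settings.

```java
/** Hypothetical guard: fail fast when an embedding does not match the collection's dimension. */
final class DimensionGuard {
    static float[] requireDimension(float[] vector, int expected) {
        if (vector.length != expected) {
            throw new IllegalStateException(
                    "Embedding has " + vector.length + " dimensions, but the collection expects " + expected);
        }
        return vector; // returned unchanged so the call can be chained
    }

    public static void main(String[] args) {
        float[] embedding = new float[768]; // stand-in for a real embedding
        System.out.println(requireDimension(embedding, 768).length);
    }
}
```

The 768 in the example is the size bge-base models normally produce, but treat it as an assumption and confirm it with the curl call above.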
❌ Pitfall 3: garbled text when parsing PDFs
Apache PDFBox produced garbled text when parsing Chinese PDFs; the fix was to specify a font.
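Conclusion
Once this architecture is running, you can:
- Switch to Ollama with a model such as Gemma for faster inference
- Connect an online API such as SiliconFlow or DeepSeek (no local GPU needed)
- Add SSE streaming responses for a smoother experience
- Use Tencent Cloud COS or Alibaba Cloud OSS for file storage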
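For the SSE streaming item, the core of the client side is reading the data: lines of the event stream. The sketch below assumes the stream follows the OpenAI-style convention of ending with a data: [DONE] line; SseLines is a hypothetical helper, and parsing the JSON payloads is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts the data payloads from a server-sent-events body. */
final class SseLines {
    static List<String> dataPayloads(String body) {
        List<String> payloads = new ArrayList<>();
        for (String line : body.split("\r?\n")) {
            if (!line.startsWith("data:")) {
                continue; // skip blank separators, comments, and other SSE fields
            }
            String payload = line.substring("data:".length()).strip();
            if (payload.equals("[DONE]")) {
                break; // assumed OpenAI-style end-of-stream marker
            }
            payloads.add(payload);
        }
        return payloads;
    }

    public static void main(String[] args) {
        String body = "data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n";
        System.out.println(dataPayloads(body));
    }
}
```

In a real stream the lines arrive incrementally, so the same per-line logic would sit inside a reader loop that forwards each payload to the frontend as it comes in.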
If you have any questions, feel free to discuss them in the comments!