[Hands-On] Running a RAG Knowledge Base Locally with Spring Boot + Vue3 + LM Studio

Preface

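RAG (retrieval-augmented generation) is one of the hottest directions in AI applications right now. But many people get stuck at the "get it running locally" step: the API is unstable, the vector database is complicated to configure, or the code architecture is a mess.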

Today I'm sharing a RAG knowledge base project that I got running locally. The tech stack:

Backend: Spring Boot 3 + MyBatis Plus + Qdrant vector database
Frontend: Vue3 + Element Plus
LLM: LM Studio (local qwen3-vl-4b)
Embedding: local text-embedding-bge-base-zh-v1.5

1. Project Architecture


┌─────────────────────────────────────────────────────┐
│                   Frontend (Vue3)                   │
│    Document upload / Chat UI / RAG configuration    │
└──────────────────┬──────────────────────────────────┘
                   │ HTTP / SSE
┌──────────────────▼──────────────────────────────────┐
│               Backend (Spring Boot 3)               │
│  ┌─────────────────────────────┐                    │
│  │       ChatController        │                    │
│  │     /api/chat/ask (Q&A)     │                    │
│  └──────────────┬──────────────┘                    │
│                 │                                   │
│  ┌──────────────▼──────────────┐                    │
│  │         ChatService         │                    │
│  │  1. Embed the question      │                    │
│  │  2. Qdrant similarity search│                    │
│  │  3. Build the prompt        │                    │
│  │  4. Call the LM Studio LLM  │                    │
│  └──────────────┬──────────────┘                    │
│                 │                                   │
│  ┌──────────────▼──────────────┐                    │
│  │    VectorizationService     │                    │
│  │  Chunk → Embedding →        │                    │
│  │  Qdrant storage             │                    │
│  └─────────────────────────────┘                    │
└─────────────────────────────────────────────────────┘
         │                            │
         ▼                            ▼
┌─────────────────┐         ┌─────────────────────────┐
│  Qdrant         │         │  LM Studio              │
│  (vector DB)    │         │  (local LLM + Embedding)│
│  localhost:6333 │         │  localhost:1234         │
└─────────────────┘         └─────────────────────────┘
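
The request flow in the diagram can be sketched in plain Java. This is a minimal sketch, not the project's code: the `Embedder`, `Retriever`, and `Llm` interfaces and the `RagPipelineSketch` class are hypothetical names introduced here only to show how the four ChatService steps chain together, with in-memory stubs standing in for LM Studio and Qdrant.

```java
import java.util.List;

/** Hypothetical sketch of the four ChatService steps; stubs replace LM Studio and Qdrant. */
final class RagPipelineSketch {

    interface Embedder { float[] embed(String text); }
    interface Retriever { List<String> search(float[] vector, int topK); }
    interface Llm { String complete(String prompt); }

    private final Embedder embedder;
    private final Retriever retriever;
    private final Llm llm;

    RagPipelineSketch(Embedder embedder, Retriever retriever, Llm llm) {
        this.embedder = embedder;
        this.retriever = retriever;
        this.llm = llm;
    }

    String ask(String question, int topK) {
        // 1. Embed the question
        float[] queryVector = embedder.embed(question);
        // 2. Similarity search
        List<String> chunks = retriever.search(queryVector, topK);
        if (chunks.isEmpty()) {
            return "No relevant content found in the knowledge base.";
        }
        // 3. Build the prompt from the retrieved chunks
        String prompt = "Context:\n" + String.join("\n\n", chunks)
                + "\n\nQuestion: " + question;
        // 4. Call the LLM
        return llm.complete(prompt);
    }

    public static void main(String[] args) {
        RagPipelineSketch pipeline = new RagPipelineSketch(
                text -> new float[] {text.length()},              // fake embedding
                (vector, topK) -> List.of("Qdrant listens on port 6333."),
                prompt -> "echo: " + prompt);                     // fake LLM
        System.out.println(pipeline.ask("Which port does Qdrant use?", 5));
    }
}
```

Keeping the three collaborators behind small interfaces is one way to unit-test the flow without a running model or database.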

2. Core Code Implementation

2.1 Dependencies (pom.xml)


<dependencies>
    <!-- Spring Boot -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- MySQL + MyBatis Plus -->
    <dependency>
        <groupId>com.baomidou</groupId>
        <artifactId>mybatis-plus-spring-boot3-starter</artifactId>
        <version>3.5.5</version>
    </dependency>
    <dependency>
        <groupId>com.mysql</groupId>
        <artifactId>mysql-connector-j</artifactId>
    </dependency>

    <!-- Qdrant vector database client (the official Java client is published as io.qdrant:client) -->
    <dependency>
        <groupId>io.qdrant</groupId>
        <artifactId>client</artifactId>
        <version>1.7.0</version>
    </dependency>

    <!-- HTTP client (for calling LM Studio) -->
    <dependency>
        <groupId>com.squareup.okhttp3</groupId>
        <artifactId>okhttp</artifactId>
        <version>4.12.0</version>
    </dependency>

    <!-- JSON processing -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson2</artifactId>
        <version>2.0.43</version>
    </dependency>

    <!-- Document parsing -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>3.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.2.5</version>
    </dependency>

    <!-- WebClient (for SSE streaming responses) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
</dependencies>

2.2 Configuring application.yml


server:
  port: 8080

spring:
  datasource:
    url: jdbc:mysql://localhost:3306/doc_qa?useUnicode=true&characterEncoding=utf8
    username: root
    password: your_password

# LM Studio (LLM + Embedding)
lmstudio:
  base-url: http://localhost:1234
  api-key: sk-lm-xxxxx
  chat-model: qwen/qwen3-vl-4b
  embeddings-model: text-embedding-bge-base-zh-v1.5

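# Qdrant vector database
# The official Java client speaks gRPC, which listens on 6334 by default (6333 is the REST port)
qdrant:
  host: localhost
  port: 6334
  collection-name: documents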

# RAG settings
rag:
  top-k: 5
  similarity-threshold: 0.3
  chunk-size: 1000
  chunk-overlap: 100
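
To make `top-k` and `similarity-threshold` concrete, here is a brute-force sketch of what a vector search does conceptually: score every stored vector against the query, drop scores below the threshold, and keep the k best. `TopKSketch` and its `Scored` record are hypothetical names, and the sketch assumes cosine similarity as the metric; Qdrant uses an index rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical linear-scan illustration of top-k search with a score threshold. */
final class TopKSketch {

    record Scored(int index, double score) {}

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Scored> search(float[] query, List<float[]> stored, int topK, double threshold) {
        List<Scored> hits = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            double score = cosine(query, stored.get(i));
            if (score >= threshold) {          // similarity-threshold filter
                hits.add(new Scored(i, score));
            }
        }
        hits.sort(Comparator.comparingDouble(Scored::score).reversed());
        return hits.subList(0, Math.min(topK, hits.size()));   // top-k cut
    }

    public static void main(String[] args) {
        List<float[]> stored = List.of(
                new float[] {1f, 0f}, new float[] {0.8f, 0.6f}, new float[] {0f, 1f});
        System.out.println(search(new float[] {1f, 0f}, stored, 2, 0.3));
    }
}
```

With a threshold of 0.3, the orthogonal third vector (similarity 0) is filtered out before the top-k cut is applied.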

2.3 Vector Service (Core)


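// Qdrant classes used below: io.qdrant.client.QdrantClient, io.qdrant.client.QdrantGrpcClient
@Service
public class VectorizationService {

    @Value("${lmstudio.base-url}")
    private String lmstudioUrl;

    @Value("${lmstudio.api-key}")
    private String lmApiKey;

    @Value("${lmstudio.embeddings-model}")
    private String embeddingModel;

    @Value("${qdrant.collection-name}")
    private String collectionName;

    @Value("${rag.chunk-size}")
    private int chunkSize;

    @Value("${rag.chunk-overlap}")
    private int chunkOverlap;

    private final QdrantClient qdrantClient;
    private final OkHttpClient httpClient;

    // Host and port are constructor parameters because @Value fields
    // are not yet populated while the constructor runs.
    public VectorizationService(@Value("${qdrant.host}") String qdrantHost,
                                @Value("${qdrant.port}") int qdrantPort) {
        // The Java client uses gRPC, so qdrant.port must be the gRPC port (6334 by default)
        this.qdrantClient = new QdrantClient(
                QdrantGrpcClient.newBuilder(qdrantHost, qdrantPort, false).build());
        this.httpClient = new OkHttpClient.Builder()
                .connectTimeout(30, TimeUnit.SECONDS)
                .readTimeout(60, TimeUnit.SECONDS)
                .build();
    }

    private String getLmApiKey() {
        return lmApiKey;
    }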

    /**
     * Embed a piece of text via LM Studio's /v1/embeddings endpoint
     */
    public float[] embedText(String text) throws IOException {
        String url = lmstudioUrl + "/v1/embeddings";

        Map<String, Object> body = new HashMap<>();
        body.put("model", embeddingModel);
        body.put("input", Collections.singletonList(text));

        String json = JSON.toJSONString(body);
        Request request = new Request.Builder()
                .url(url)
                .addHeader("Authorization", "Bearer " + getLmApiKey())
                .addHeader("Content-Type", "application/json")
                .post(RequestBody.create(json, MediaType.parse("application/json")))
                .build();

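        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful() || response.body() == null) {
                throw new IOException("Embedding request failed: HTTP " + response.code());
            }
            String responseBody = response.body().string();
            Map<String, Object> result = JSON.parseObject(responseBody);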

            List<?> data = (List<?>) result.get("data");
            Map<?, ?> embeddingData = (Map<?, ?>) data.get(0);
            List<?> embedding = (List<?>) embeddingData.get("embedding");

            float[] vector = new float[embedding.size()];
            for (int i = 0; i < embedding.size(); i++) {
                vector[i] = ((Number) embedding.get(i)).floatValue();
            }
            return vector;
        }
    }

    /**
     * Sentence-aware chunking (splits at sentence boundaries to avoid cutting mid-sentence)
     */
    public List<String> chunkText(String text) {
        List<String> chunks = new ArrayList<>();
        // Split after sentence-ending punctuation or a newline (the lookbehind keeps the delimiter)
        String[] sentences = text.split("(?<=[。！？.!?\n])");

        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            if (current.length() + sentence.length() > chunkSize) {
                if (current.length() > 0) {
                    chunks.add(current.toString().trim());
                    // Carry the last chunkOverlap characters over into the next chunk
                    String tail = current.toString();
                    current = new StringBuilder(
                            tail.length() > chunkOverlap
                                    ? tail.substring(tail.length() - chunkOverlap)
                                    : tail
                    );
                }
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString().trim());
        }
        return chunks;
    }

    /**
     * Embed each chunk and store it in Qdrant.
     * PointIdFactory, VectorsFactory and ValueFactory live in io.qdrant.client;
     * PointStruct lives in io.qdrant.client.grpc.Points.
     */
    public void store(String documentId, List<String> chunks) throws Exception {
        List<PointStruct> points = new ArrayList<>();

        int idx = 0;
        for (String chunk : chunks) {
            float[] vector = embedText(chunk);

            PointStruct point = PointStruct.newBuilder()
                    .setId(PointIdFactory.id(UUID.randomUUID()))
                    .setVectors(VectorsFactory.vectors(vector))
                    .putAllPayload(Map.of(
                            "document_id", ValueFactory.value(documentId),
                            "chunk_index", ValueFactory.value(idx),
                            "content", ValueFactory.value(chunk)
                    ))
                    .build();
            points.add(point);
            idx++;
        }

        // upsertAsync returns a future; get() blocks until Qdrant acknowledges the write
        qdrantClient.upsertAsync(collectionName, points).get();
    }

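    /**
     * Similarity search.
     * SearchPoints and ScoredPoint live in io.qdrant.client.grpc.Points;
     * WithPayloadSelectorFactory lives in io.qdrant.client.
     */
    public List<SearchResult> search(String query, int topK) throws Exception {
        float[] queryVector = embedText(query);

        SearchPoints.Builder request = SearchPoints.newBuilder()
                .setCollectionName(collectionName)
                .setLimit(topK)
                .setWithPayload(WithPayloadSelectorFactory.enable(true))
                .setScoreThreshold(0.3f); // same value as rag.similarity-threshold
        for (float v : queryVector) {
            request.addVector(v);
        }

        List<ScoredPoint> results = qdrantClient.searchAsync(request.build()).get();

        List<SearchResult> list = new ArrayList<>();
        for (ScoredPoint point : results) {
            var payload = point.getPayloadMap();
            list.add(new SearchResult(
                    point.getId().getUuid(),
                    payload.get("content").getStringValue(),
                    payload.get("document_id").getStringValue(),
                    point.getScore()
            ));
        }
        return list;
    }

    /** One retrieved chunk together with its similarity score. */
    public static class SearchResult {
        private final String id;
        private final String content;
        private final String documentId;
        private final float score;

        public SearchResult(String id, String content, String documentId, float score) {
            this.id = id;
            this.content = content;
            this.documentId = documentId;
            this.score = score;
        }

        public String getId() { return id; }
        public String getContent() { return content; }
        public String getDocumentId() { return documentId; }
        public float getScore() { return score; }
    }
}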
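
To see what the chunking logic produces, here is a standalone sketch of the same algorithm with the sizes passed as parameters. `ChunkDemo` is a hypothetical class name, and the tiny `chunkSize` and `chunkOverlap` values are chosen only to keep the output readable.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone copy of the chunking logic for experimentation (hypothetical class). */
final class ChunkDemo {

    static List<String> chunkText(String text, int chunkSize, int chunkOverlap) {
        List<String> chunks = new ArrayList<>();
        // Split after sentence-ending punctuation or a newline, keeping the delimiter
        String[] sentences = text.split("(?<=[。！？.!?\n])");

        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            if (current.length() + sentence.length() > chunkSize && current.length() > 0) {
                chunks.add(current.toString().trim());
                // Start the next chunk with the last chunkOverlap characters
                String tail = current.toString();
                current = new StringBuilder(tail.length() > chunkOverlap
                        ? tail.substring(tail.length() - chunkOverlap)
                        : tail);
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString().trim());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints [AAAA.BBBB., BB.CCCC.]
        System.out.println(chunkText("AAAA.BBBB.CCCC.", 10, 3));
    }
}
```

Note that the overlap is taken by character count, so the second chunk starts with "BB." rather than with a whole sentence. A single sentence longer than `chunkSize` is also kept intact, so chunks can exceed the configured size.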

2.4 Chat Service (Full RAG Pipeline)


@Service
public class ChatService {

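    @Value("${lmstudio.base-url}")
    private String lmstudioUrl;

    @Value("${lmstudio.api-key}")
    private String lmApiKey;

    @Value("${lmstudio.chat-model}")
    private String chatModel;

    private final VectorizationService vectorizationService;
    private final OkHttpClient httpClient;

    private static final String SYSTEM_PROMPT = """
            You are a professional knowledge-base Q&A assistant. Based on the provided context,
            answer the user's question accurately and concisely.
            If the context contains no relevant information, reply "Sorry, no relevant content was found in the knowledge base."
            Please answer in Chinese.
            """;

    public ChatService(VectorizationService vectorizationService) {
        this.vectorizationService = vectorizationService;
        this.httpClient = new OkHttpClient.Builder()
                .readTimeout(120, TimeUnit.SECONDS)
                .build();
    }

    private String getLmApiKey() {
        return lmApiKey;
    }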

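    /**
     * Step 1: retrieve the chunks most similar to the question
     */
    public List<VectorizationService.SearchResult> searchRelevantDocuments(
            String question, int topK) throws Exception {
        return vectorizationService.search(question, topK);
    }

    /**
     * Steps 2-3: build the prompt from the retrieved context and call the LLM
     */
    public String generateAnswer(String question, String context) throws IOException {
        String prompt = String.format("""
                %s

                Reference information:
                %s

                User question: %s
                """,
                SYSTEM_PROMPT,
                context,
                question
        );
        return callLLM(prompt);
    }

    /**
     * Full RAG pipeline: retrieve → build prompt → generate with the LLM
     */
    public String answer(String question, int topK) throws Exception {
        List<VectorizationService.SearchResult> docs = searchRelevantDocuments(question, topK);

        if (docs.isEmpty()) {
            return "Sorry, no content related to your question was found in the knowledge base.";
        }

        // Label each chunk with the ID of the document it came from
        StringBuilder context = new StringBuilder();
        for (VectorizationService.SearchResult doc : docs) {
            context.append("[").append(doc.getDocumentId()).append("]\n");
            context.append(doc.getContent()).append("\n\n");
        }

        return generateAnswer(question, context.toString());
    }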

    /**
     * Call the LM Studio LLM
     */
    private String callLLM(String prompt) throws IOException {
        String url = lmstudioUrl + "/v1/chat/completions";

        Map<String, Object> body = new HashMap<>();
        body.put("model", chatModel);
        body.put("messages", List.of(
                Map.of("role", "user", "content", prompt)
        ));
        body.put("max_tokens", 1000);
        body.put("temperature", 0.7);

        Request request = new Request.Builder()
                .url(url)
                .addHeader("Authorization", "Bearer " + getLmApiKey())
                .addHeader("Content-Type", "application/json")
                .post(RequestBody.create(
                        JSON.toJSONString(body),
                        MediaType.parse("application/json")
                ))
                .build();

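        try (Response response = httpClient.newCall(request).execute()) {
            if (!response.isSuccessful() || response.body() == null) {
                throw new IOException("Chat request failed: HTTP " + response.code());
            }
            String responseBody = response.body().string();
            Map<String, Object> result = JSON.parseObject(responseBody);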

            List<?> choices = (List<?>) result.get("choices");
            Map<?, ?> choice = (Map<?, ?>) choices.get(0);
            Map<?, ?> message = (Map<?, ?>) choice.get("message");
            return (String) message.get("content");
        }
    }
}
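
The service above sends the system prompt and the question together in a single `user` message. OpenAI-compatible chat endpoints also accept a separate `system` message, which keeps instructions apart from user input. The following dependency-free sketch shows what such a request body could look like; `ChatBodySketch` and its hand-rolled `escape` helper are hypothetical, and in the actual project `JSON.toJSONString` from fastjson2 would do the serialization.

```java
/** Hypothetical sketch: a chat request body with separate system and user messages. */
final class ChatBodySketch {

    /** Minimal JSON string escaping for characters that commonly occur in prompts. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }

    static String build(String model, String systemPrompt, String userPrompt) {
        return "{\"model\":\"" + escape(model) + "\","
                + "\"messages\":["
                + "{\"role\":\"system\",\"content\":\"" + escape(systemPrompt) + "\"},"
                + "{\"role\":\"user\",\"content\":\"" + escape(userPrompt) + "\"}],"
                + "\"max_tokens\":1000,\"temperature\":0.7}";
    }

    public static void main(String[] args) {
        System.out.println(build("qwen/qwen3-vl-4b",
                "Answer only from the provided context.",
                "Reference information:\n...\n\nUser question: ..."));
    }
}
```

For factual Q&A over retrieved context, a lower `temperature` than 0.7 may give more consistent answers; the value here simply mirrors the one used above.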

2.5 Controller


@RestController
@RequestMapping("/api/chat")
public class ChatController {

    private final ChatService chatService;

    @Value("${rag.top-k}")
    private int topK;

    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @PostMapping("/ask")
    public ApiResponse<Map<String, Object>> ask(@RequestBody Map<String, String> request) {
        try {
            String question = request.get("question");
            if (question == null || question.trim().isEmpty()) {
                return ApiResponse.error("Question must not be empty");
            }

            // Retrieve the relevant chunks
            List<VectorizationService.SearchResult> results =
                    chatService.searchRelevantDocuments(question, topK);

            // Build the source list for the UI and the context for the prompt
            List<Map<String, Object>> sources = new ArrayList<>();
            StringBuilder context = new StringBuilder();

            for (VectorizationService.SearchResult result : results) {
                context.append("[").append(result.getDocumentId()).append("]\n");
                context.append(result.getContent()).append("\n\n");

                sources.add(Map.of(
                        "documentId", result.getDocumentId(),
                        "content", result.getContent(),
                        "score", result.getScore()
                ));
            }

            // Generate the answer
            String answer = chatService.generateAnswer(question, context.toString());

            return ApiResponse.success(Map.of(
                    "question", question,
                    "answer", answer,
                    "sources", sources,
                    "contextCount", sources.size()
            ));
        } catch (Exception e) {
            return ApiResponse.error("Q&A failed: " + e.getMessage());
        }
    }
}
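
Once the backend is running, the endpoint can be exercised without the frontend. The sketch below uses the JDK's built-in `java.net.http` client; `AskRequestSketch` is a hypothetical class, and the exact shape of the JSON that comes back depends on the project's `ApiResponse` wrapper, which is not shown in this article.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client for POST /api/chat/ask. */
final class AskRequestSketch {

    static HttpRequest build(String baseUrl, String questionJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/chat/ask"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(questionJson, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires the backend from this article to be listening on port 8080
        HttpRequest request = build("http://localhost:8080",
                "{\"question\":\"What is the company's core business?\"}");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```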

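3. Local Environment Setup

3.1 Installing LM Studio

Download: lmstudio.ai/

After downloading:

Search for and download the qwen/qwen3-vl-4b model (4B parameters, roughly 4 GB of memory)
Download the text-embedding-bge-base-zh-v1.5 embedding model
Start the local server (Local Server); an API token is generated automatically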

3.2 Installing Qdrant


# Docker (bash syntax; on Windows, run it from WSL or Git Bash)
docker run -d -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

# Or download the standalone release
# https://github.com/qdrant/qdrant/releases

3.3 Service Startup Order


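# 1. MySQL (make sure it is installed and running)
# 2. Qdrant (6333 = REST, 6334 = gRPC, which the Java client uses)
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant

# 3. LM Studio (after launching, click "Local Server")
# 4. Backend
cd backend && mvn spring-boot:run

# 5. Frontend
cd frontend && npm install && npm run dev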
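
Because the services depend on each other, it can help to confirm that each port is reachable before debugging the application itself. This is a small sketch using only the JDK; `PortCheck` is a hypothetical helper, and the ports are the ones used in this article's configuration (3306 for MySQL, 6333 and 6334 for Qdrant, 1234 for LM Studio, 8080 for the backend).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
final class PortCheck {

    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;   // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> services = new LinkedHashMap<>();
        services.put("MySQL", 3306);
        services.put("Qdrant REST", 6333);
        services.put("Qdrant gRPC", 6334);
        services.put("LM Studio", 1234);
        services.put("Backend", 8080);
        services.forEach((name, port) -> System.out.println(
                name + " (" + port + "): " + (isOpen("localhost", port, 1000) ? "up" : "DOWN")));
    }
}
```

An open port only shows that something is listening; it does not prove that the right model is loaded in LM Studio or that the Qdrant collection exists.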

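4. Demo

After uploading a company document, I asked a few test questions:

Q: What is the company's core business?

A: The company's core business consists of three segments: artificial intelligence, big data analytics, and cloud computing services.

Q: How many employees does the company have?

A: The company has more than 5,000 employees, 60% of whom work in R&D.

The source cards show the retrieved document chunks and their similarity scores.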

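5. Key Pitfalls

❌ Pitfall 1: Inconsistent LM Studio API token requirements

LM Studio has two sets of APIs:

Local Server API (/v1/chat/completions): requires a Bearer token
Developer API (/api/v1/chat): may work without a token

Fix: when using the Local Server, the request must include this header: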


Authorization: Bearer sk-lm-xxxxx

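❌ Pitfall 2: Embedding vector dimension mismatch

If the dimension the Qdrant collection was created with differs from the dimension the embedding model outputs, storing vectors fails.

Fix: run one test embedding first to find the actual dimension: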


curl http://localhost:1234/v1/embeddings \
  -H "Authorization: Bearer $LM_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"text-embedding-bge-base-zh-v1.5","input":["测试"]}'
# The length of the returned embedding array is the vector dimension
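
Once the dimension is known, the collection can be created to match it. The sketch below builds the request for Qdrant's REST endpoint `PUT /collections/{name}`. `CreateCollectionSketch` is a hypothetical class; the value 768 is what I would expect from a bge-base model and should be confirmed with the curl call above; and `Cosine` as the distance metric is an assumption, not something this article's project specifies.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch: create a Qdrant collection whose size matches the embedding model. */
final class CreateCollectionSketch {

    static String body(int dimension) {
        // "size" must equal the length of the embedding array returned by the model
        return "{\"vectors\":{\"size\":" + dimension + ",\"distance\":\"Cosine\"}}";
    }

    static HttpRequest request(String qdrantUrl, String collection, int dimension) {
        return HttpRequest.newBuilder(URI.create(qdrantUrl + "/collections/" + collection))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body(dimension)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires Qdrant's REST port (6333) to be reachable
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                request("http://localhost:6333", "documents", 768),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

If the embedding model is swapped later, the collection has to be recreated with the new size and the documents re-embedded, since vectors from different models are not comparable.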

❌ Pitfall 3: Garbled text from PDF parsing

Apache PDFBox can produce garbled text when parsing Chinese PDFs; the font needs to be specified.
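
Garbled text that reaches the embedding step pollutes the index without raising any error, so it may be worth checking extracted text before chunking it. The following is a rough heuristic sketch, not part of PDFBox: `ExtractionCheck` is a hypothetical helper that measures the share of replacement characters, private-use code points, and non-whitespace control characters, and any cut-off value (for example 5%) would be an assumption to tune on your own documents.

```java
/** Hypothetical heuristic for spotting garbled text extraction. */
final class ExtractionCheck {

    /** Returns the fraction of code points that look like extraction debris (0.0 to 1.0). */
    static double suspiciousRatio(String text) {
        if (text.isEmpty()) {
            return 1.0;   // treat empty extraction as fully suspicious
        }
        long bad = text.codePoints().filter(cp ->
                cp == 0xFFFD                                              // replacement character
                || Character.getType(cp) == Character.PRIVATE_USE         // unmapped glyphs
                || (Character.isISOControl(cp) && !Character.isWhitespace(cp))
        ).count();
        return (double) bad / text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        System.out.println(suspiciousRatio("公司简介"));          // clean Chinese text
        System.out.println(suspiciousRatio("\uFFFD\uFFFDab"));   // half replacement characters
    }
}
```

This check does not catch every kind of garbling; text mapped to the wrong but valid characters will pass it.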

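Conclusion

Once this architecture is running, you can:

Switch to Ollama (for example with Gemma), which may give faster inference
Connect to hosted APIs such as SiliconFlow or DeepSeek (no local GPU required)
Add SSE streaming responses for a smoother experience
Use Tencent Cloud COS or Alibaba Cloud OSS for file storage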
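
For the SSE item, OpenAI-compatible endpoints stream the answer as `data:` lines when the request sets `"stream": true`, and end the stream with `data: [DONE]`. The sketch below shows only the line-handling part; `SseLinesSketch` is a hypothetical class, and each returned payload would still need to be parsed as JSON (for instance with fastjson2) to read the text delta.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: extract JSON payloads from a server-sent event stream. */
final class SseLinesSketch {

    static List<String> payloads(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("data:")) {
                continue;                      // skip blank separators and ":" comments
            }
            String data = line.substring("data:".length()).trim();
            if (data.equals("[DONE]")) {
                break;                         // end-of-stream marker
            }
            out.add(data);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(payloads(List.of(
                "data: {\"n\":1}", "", "data: {\"n\":2}", "data: [DONE]")));
    }
}
```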

If you have any questions, feel free to discuss them in the comments!