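GraphRAG vs. Traditional Vector RAG: A Hands-On Comparison with Spring AI

Traditional vector RAG finds answers quickly but is weak at multi-hop reasoning. GraphRAG can untangle the relationships between entities, but building the graph is expensive.

Which one should you use, and when? This article walks through the trade-offs.

1. Where traditional vector RAG falls short

Consider an example: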

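Document 1: Jack Ma, founder of Alibaba, born in Hangzhou in 1964.
Document 2: Pony Ma, founder of Tencent, born in Shantou in 1971.
User question: What is the relationship between Jack Ma and Pony Ma?

Traditional vector RAG will: 1) convert the question into a vector; 2) retrieve similar documents (it finds documents 1 and 2); 3) pass them to the LLM.

The LLM answers: "Both are Chinese internet entrepreneurs."

That is not a relationship; it is a list of attributes. The real relationship is competition and overlapping businesses. That information is scattered across dozens of documents, and traditional vector RAG fails to retrieve it.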

2. How the two approaches work

2.1 Traditional vector RAG pipeline

Documents → Embedding → Vector database
                              ↓
Question  → Embedding → Similarity search → Top-K documents → LLM

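The core idea is semantic similarity: a passage that lies close to the question in vector space is assumed to be a good answer.

Pros: fast and cheap
Cons: weak multi-hop reasoning, poor global understanding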
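
To make "close in vector space" concrete, here is a minimal sketch of cosine-similarity top-K retrieval in plain Java. The class name, the three-dimensional toy vectors, and the document ids are illustrative assumptions; a real vector store such as Milvus uses approximate indexes rather than this brute-force scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class CosineTopKSketch {

    /** Cosine similarity of two equal-length vectors; 0.0 if either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k stored vectors most similar to the query, best match first. */
    static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        List<String> ids = new ArrayList<>(store.keySet());
        // Negate the score so that the highest similarity sorts first
        ids.sort(Comparator.comparingDouble((String id) -> -cosine(store.get(id), query)));
        return ids.subList(0, Math.min(k, ids.size()));
    }

    public static void main(String[] args) {
        // Toy embeddings (hypothetical values, not produced by a real model)
        Map<String, double[]> store = Map.of(
            "doc1", new double[] {0.9, 0.1, 0.0},
            "doc2", new double[] {0.8, 0.2, 0.1},
            "doc3", new double[] {0.0, 0.1, 0.9});
        System.out.println(topK(store, new double[] {1.0, 0.0, 0.0}, 2)); // prints [doc1, doc2]
    }
}
```

Each document is scored independently against the question, which is why a fact that only emerges from combining several documents never gets ranked as a whole.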

2.2 GraphRAG pipeline

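Documents → Entity extraction → Relation extraction → Knowledge graph (graph database)
                                                            ↓
Question  → Entity recognition → Subgraph retrieval → Relevant subgraph → LLM reasoning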

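The core idea is an entity-relationship graph: graph traversal finds paths of the form "A → relation → B".

Pros: strong multi-hop reasoning, good global understanding
Cons: slow to build, expensive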
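
The traversal idea can be sketched without a graph database. The following is a minimal in-memory breadth-first search, assuming a hypothetical `GraphPathSketch` class and three hand-written edges based on the example above; it is an illustration of multi-hop path finding, not how Neo4j executes `shortestPath`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GraphPathSketch {

    /** One hop: the relation label and the node it leads to. */
    record Edge(String relation, String target) {}

    private final Map<String, List<Edge>> adjacency = new HashMap<>();

    /** Stores the edge in both directions so that traversal ignores direction. */
    void addEdge(String a, String relation, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(new Edge(relation, b));
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(new Edge(relation, a));
    }

    /** Breadth-first search; returns the shortest path as text, or null if none exists within maxHops. */
    String findPath(String from, String to, int maxHops) {
        Map<String, String> pathTo = new HashMap<>();  // node -> textual path from the start node
        Map<String, Integer> hops = new HashMap<>();   // node -> number of hops from the start node
        Deque<String> queue = new ArrayDeque<>();
        pathTo.put(from, from);
        hops.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(to)) {
                return pathTo.get(node);
            }
            if (hops.get(node) == maxHops) {
                continue;  // hop budget used up; do not expand further
            }
            for (Edge e : adjacency.getOrDefault(node, List.of())) {
                if (pathTo.containsKey(e.target())) {
                    continue;  // already visited
                }
                pathTo.put(e.target(), pathTo.get(node) + " -[" + e.relation() + "]- " + e.target());
                hops.put(e.target(), hops.get(node) + 1);
                queue.add(e.target());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        GraphPathSketch g = new GraphPathSketch();
        g.addEdge("Jack Ma", "FOUNDED", "Alibaba");
        g.addEdge("Pony Ma", "FOUNDED", "Tencent");
        g.addEdge("Alibaba", "COMPETES_WITH", "Tencent");
        // prints: Jack Ma -[FOUNDED]- Alibaba -[COMPETES_WITH]- Tencent -[FOUNDED]- Pony Ma
        System.out.println(g.findPath("Jack Ma", "Pony Ma", 3));
    }
}
```

The three-hop path is exactly the kind of answer that no single document contains, which is what the graph buys you.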

3. Spring AI implementation: traditional vector RAG

3.1 Dependencies and configuration

<!-- Spring AI + Milvus (version managed by spring-ai-bom; no need to specify it here) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-milvus</artifactId>
</dependency>

# application.yml
spring:
  ai:
    vectorstore:
      milvus:
        client:
          host: localhost
          port: 19530
        collectionName: documents
        embeddingDimension: 1536  # must match the output dimension of the embedding model

3.2 Indexing documents

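import java.util.List;
import java.util.Map;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class DocIndexService {

    private final VectorStore vectorStore;

    public DocIndexService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Embed a document and store it
    public void index(String content, String docId) {
        Document doc = new Document(
            content,
            Map.of("docId", docId, "time", System.currentTimeMillis())
        );
        // VectorStore.add calls the EmbeddingModel internally to generate the vector
        // Source: Spring AI official documentation
        vectorStore.add(List.of(doc));
    }
}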

3.3 Similarity search

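import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class DocSearchService {

    private final VectorStore vectorStore;

    public DocSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Retrieve the top-K most relevant documents
    public List<Document> search(String query, int topK) {
        // Build the SearchRequest (source: Spring AI official documentation)
        SearchRequest request = SearchRequest.builder()
            .query(query)
            .topK(topK)
            .build();

        return vectorStore.similaritySearch(request);
    }

    // Retrieval with a similarity threshold
    public List<Document> searchWithThreshold(String query, int topK, double threshold) {
        SearchRequest request = SearchRequest.builder()
            .query(query)
            .topK(topK)
            .similarityThreshold(threshold)  // drop results below this similarity
            .build();

        return vectorStore.similaritySearch(request);
    }
}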

3.4 The complete RAG flow

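import java.util.List;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class RAGService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public RAGService(
            ChatClient.Builder builder,
            VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        this.chatClient = builder
            .defaultSystem("Answer based on the provided context; if it is insufficient, say you cannot answer")
            .build();
    }

    // Complete RAG: retrieval + generation
    public String ask(String question) {
        // 1. Retrieve relevant documents
        List<Document> docs = vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(question)
                .topK(3)
                .build()
        );

        // 2. Join the retrieved text into one context string
        String context = docs.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n---\n"));

        // 3. Have the LLM generate the answer
        return chatClient.prompt()
            .user("Context:\n" + context + "\n\nQuestion: " + question)
            .call()
            .content();
    }
}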

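The code above runs as written, provided Milvus is running and an embedding model and a chat model are configured.

4. Spring AI implementation: GraphRAG

Spring AI has no native GraphRAG support, so Neo4j has to be integrated separately.

4.1 Technology choices

Graph database: Neo4j (well supported by Spring Data Neo4j)
Entity extraction: an external LLM-based service (e.g. LightRAG)
Query language: Cypher

4.2 Spring Data Neo4j configuration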

<!-- Spring Data Neo4j (version managed by spring-boot-starter-parent) -->
<!-- Source: https://spring.io/projects/spring-data-neo4j -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-neo4j</artifactId>
</dependency>

spring:
  neo4j:
    uri: bolt://localhost:7687
    authentication:
      username: neo4j
      password: your_password

4.3 Defining entity nodes

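import java.util.ArrayList;
import java.util.List;

import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.core.schema.Relationship;
import org.springframework.data.neo4j.core.schema.RelationshipId;
import org.springframework.data.neo4j.core.schema.RelationshipProperties;
import org.springframework.data.neo4j.core.schema.TargetNode;

// Entity node (person, company, location, etc.)
@Node
public class Entity {
    @Id
    private String name;
    private String type;  // PERSON, COMPANY, LOCATION
    private String description;

    @Relationship(type = "RELATED", direction = Relationship.Direction.OUTGOING)
    private List<Relation> relations = new ArrayList<>();

    public Entity(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

// Relationship properties class (must use @RelationshipProperties, not @Node)
// Source: Spring Data Neo4j official documentation
@RelationshipProperties
public class Relation {
    // Internal id generated by the database; the application does not set it
    @RelationshipId
    private Long id;
    private String type;  // FOUNDED, COMPETES_WITH, etc.
    @TargetNode
    private Entity target;
    private double confidence;

    public Relation(String type, Entity target) {
        this.type = type;
        this.target = target;
    }
}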

4.4 Querying the graph

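import java.util.List;
import java.util.Map;

import org.neo4j.driver.types.Path;
import org.springframework.data.neo4j.core.Neo4jClient;
import org.springframework.data.neo4j.core.Neo4jTemplate;
import org.springframework.stereotype.Service;

@Service
public class GraphService {

    private final Neo4jClient neo4jClient;
    private final Neo4jTemplate template;

    public GraphService(Neo4jClient neo4jClient, Neo4jTemplate template) {
        this.neo4jClient = neo4jClient;
        this.template = template;
    }

    // Find the relationship path between two entities
    // Cypher syntax source: https://neo4j.com/docs/cypher-manual/current/
    public String findPath(String e1, String e2, int maxDepth) {
        // The hop limit cannot be a Cypher parameter, so it is formatted in (an int, so injection-safe)
        String cypher = """
            MATCH path = shortestPath(
                (a:Entity {name: $e1})-[:RELATED*..%d]-(b:Entity {name: $e2})
            )
            RETURN path
            """.formatted(maxDepth);

        // A path value cannot be read as a String directly, so map it to text explicitly
        return neo4jClient.query(cypher)
            .bind(e1).to("e1")
            .bind(e2).to("e2")
            .fetchAs(String.class)
            .mappedBy((typeSystem, record) -> describe(record.get("path").asPath()))
            .one()
            .orElse("No relationship path found");
    }

    // Render a path as "A -[TYPE]- B -[TYPE]- C" from the name and type properties
    private static String describe(Path path) {
        StringBuilder sb = new StringBuilder(path.start().get("name").asString());
        for (Path.Segment segment : path) {
            sb.append(" -[").append(segment.relationship().get("type").asString("RELATED"))
              .append("]- ").append(segment.end().get("name").asString());
        }
        return sb.toString();
    }

    // Find all entities directly related to the given entity
    public List<Entity> findRelated(String name) {
        String cypher = """
            MATCH (e:Entity {name: $name})-[:RELATED]-(related)
            RETURN related
            """;
        return template.findAll(cypher, Map.of("name", name), Entity.class);
    }
}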

4.5 The GraphRAG query flow

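import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class GraphRAGService {

    // Simplified pattern: expects "... between A and B", optionally followed by punctuation
    private static final Pattern ENTITY_PAIR = Pattern.compile(
        "between\\s+(.+?)\\s+and\\s+(.+?)[?.!]*$", Pattern.CASE_INSENSITIVE);

    private final ChatClient chatClient;
    private final GraphService graphService;

    public GraphRAGService(
            ChatClient.Builder builder,
            GraphService graphService) {
        this.graphService = graphService;
        this.chatClient = builder
            .defaultSystem("Answer the question based on the knowledge-graph relationships provided")
            .build();
    }

    public String ask(String question) {
        // Identify entities (simplified: assumes the form "the relationship between A and B")
        String[] entities = extractEntities(question);
        if (entities.length < 2) {
            return "Please name two entities, e.g. 'the relationship between Jack Ma and Pony Ma'";
        }

        // Look up the relationship path in the graph
        String pathResult = graphService.findPath(entities[0], entities[1], 3);

        // Let the LLM reason over the path
        return chatClient.prompt()
            .user("Graph relationships:\n" + pathResult + "\n\nQuestion: " + question)
            .call()
            .content();
    }

    private String[] extractEntities(String question) {
        Matcher m = ENTITY_PAIR.matcher(question.trim());
        if (m.find()) {
            return new String[] { m.group(1).trim(), m.group(2).trim() };
        }
        return new String[0];
    }
}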

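Note: the GraphRAG code only works together with a running Neo4j instance and an entity extraction service that populates the graph.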
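
The entity recognition step in the query flow is deliberately naive. One slightly more robust option, sketched here as an assumption rather than as part of the service above, is to match the question against the entity names already stored in the graph. The `EntityLookupSketch` class and its `findEntities` method are hypothetical; in practice the list of known names would come from a query against the `Entity` nodes.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class EntityLookupSketch {

    /** Returns the known entity names that occur in the question, in order of first appearance. */
    static List<String> findEntities(String question, Collection<String> knownNames) {
        String q = question.toLowerCase(Locale.ROOT);
        return knownNames.stream()
            .filter(name -> q.contains(name.toLowerCase(Locale.ROOT)))
            .sorted(Comparator.comparingInt((String name) -> q.indexOf(name.toLowerCase(Locale.ROOT))))
            .toList();
    }

    public static void main(String[] args) {
        List<String> known = List.of("Pony Ma", "Alibaba", "Jack Ma");
        // prints [Jack Ma, Pony Ma]
        System.out.println(findEntities("How are Jack Ma and Pony Ma related?", known));
    }
}
```

Plain substring matching still misses aliases and misspellings, so it only reduces the dependence on a fixed question template.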

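5. When to use which

5.1 Scenario comparison

Scenario	Traditional vector RAG	GraphRAG	Reason
FAQ answering	✅ Recommended	❌ Overkill	Answers are direct; no reasoning needed
Document retrieval	✅ Recommended	❌ Too costly	The goal is finding documents, not relationships
Knowledge Q&A	❌ Poor results	✅ Recommended	Requires reasoning over entity relationships
Complex analysis	❌ Limited	✅ Recommended	Strong need for multi-hop reasoning
Real-time data	✅ Fast	❌ Slow	Building the graph takes time

5.2 Cost comparison (figures attributed to the LightRAG paper)

Dimension	Traditional vector RAG	Microsoft GraphRAG	LightRAG
Indexing 1M documents	~$50	~$700	~$0.05
Query latency	<100ms	1-3 s	0.5-1 s
Simple Q&A accuracy	85%	88%	87%
Multi-hop reasoning accuracy	42%	78%	72%

Source: arXiv:2410.05779 (accessed 2026-05-26)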

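5.3 Choosing an approach

Simple Q&A / document retrieval → traditional vector RAG
Reasoning over entity relationships → GraphRAG
Frequently updated data → traditional vector RAG
Mixed scenarios → dual-path retrieval + fusion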
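
For the mixed case, the article does not say how the two result lists should be fused. One common choice is reciprocal rank fusion (RRF), sketched below as an assumption: each result scores 1 / (k + rank) in every list it appears in, and the scores are summed. The class name, the document ids, and the constant k = 60 are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RankFusionSketch {

    /** Reciprocal rank fusion: sums 1 / (k + rank) over all rankings, with rank starting at 1. */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        // Highest fused score first; the sort is stable, so ties keep first-seen order
        ids.sort(Comparator.comparingDouble((String id) -> -scores.get(id)));
        return ids;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("d1", "d2", "d3");  // from similarity search
        List<String> graphHits = List.of("d3", "d4");         // from subgraph retrieval
        // d3 appears in both lists, so it rises to the top
        System.out.println(fuse(List.of(vectorHits, graphHits), 60));
    }
}
```

RRF only needs ranks, not comparable scores, which is convenient here because vector similarity and graph path length are not on the same scale.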

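6. Pitfalls

Pitfall 1: Underestimating the cost of building GraphRAG

Microsoft's own warning: "GraphRAG indexing can be an expensive operation" (source: github.com/microsoft/graphrag).

Building the graph for 10,000 documents takes roughly 2-4 hours
Entity extraction depends on a high-quality LLM

Recommendation: run a small pilot and confirm the results before indexing everything.

Pitfall 2: Agonizing over the graph database

Neo4j: best Spring Data support, but a single-instance deployment limits data volume
NebulaGraph: good distributed performance, weak Spring ecosystem support
HugeGraph: open-sourced by Baidu; a good fit where domestic (Chinese) technology is required

For Java projects, start with Neo4j.

Pitfall 3: Unstable entity extraction quality

A low-quality LLM produces wrong entities and wrong relations.

Mitigations: use a high-quality LLM, spot-check the graph by hand, and set a confidence threshold.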
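
The confidence threshold can be applied before anything is written to the graph. The sketch below assumes the extraction service returns a confidence score with each triple; the `Triple` record, the sample triples, and the 0.8 cutoff are hypothetical.

```java
import java.util.List;

class TripleFilterSketch {

    /** An extracted (source, relation, target) triple with the extractor's confidence score. */
    record Triple(String source, String relation, String target, double confidence) {}

    /** Keeps only triples whose confidence is at least minConfidence. */
    static List<Triple> keepConfident(List<Triple> triples, double minConfidence) {
        return triples.stream()
            .filter(t -> t.confidence() >= minConfidence)
            .toList();
    }

    public static void main(String[] args) {
        List<Triple> extracted = List.of(
            new Triple("Jack Ma", "FOUNDED", "Alibaba", 0.97),
            new Triple("Jack Ma", "FOUNDED", "Tencent", 0.31));  // a made-up extraction error
        // Only the first triple passes the 0.8 cutoff
        keepConfident(extracted, 0.8).forEach(System.out::println);
    }
}
```

Triples that fall below the cutoff are good candidates for the manual spot check rather than for silent deletion.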

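Pitfall 4: Confusing relationships with attributes

Don't use GraphRAG to look up attributes: it costs more and does not improve the results.

7. Next steps

Get traditional vector RAG working first (natively supported by Spring AI, low cost)
Assess whether you have complex reasoning scenarios; if so, bring in GraphRAG
Start small and monitor the quality of the graph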
