langgraph基于ReACT集成langchain定义的功能

之前探索了langchain基于llm判断知识图谱能否回答问题，以及生成、检查并修正cypher查询。

https://blog.csdn.net/liliang199/article/details/155074227

https://blog.csdn.net/liliang199/article/details/155093219

这里进一步探索，langgraph如何将这些功能粘合起来，组成一个可自主运行的agent系统。

首先基于langchain定义知识图谱、大模型、问题到图谱查询的转化、检查、验证、执行，然后使用langgraph，以ReACT的形式将这些功能集成到一个agent中。

1 Action状态定义

首先定义知识图谱、大模型，以及action状态数据结构，后者用于粘合langchain定义各功能模块。

1.1 知识图谱

测试数据来自blog-datasets，连接如下

https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv

复制代码

from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph()

# Import movie information

movies_query = """
LOAD CSV WITH HEADERS FROM 
'http://172.26.70.16:9000/tomasonjo.blog-datasets/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') | 
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') | 
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') | 
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""

graph.query(movies_query)

enhanced_graph = Neo4jGraph(enhanced_schema=True)

1.2 llm定义

大模型应用deepseek-r1，具备强大的处理能力。

复制代码

import os
os.environ['OPENAI_API_KEY'] = "sk-xxxx"
os.environ['OPENAI_BASE_URL'] = "http://localhost:3000/v1"

from langchain_neo4j import GraphCypherQAChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="deepseek-r1", temperature=0)

1.3 状态转换

定义langgraph节点的输入、输出、状态转换的基本数据结构，匹配neo4j知识图谱的处理逻辑。

复制代码

from operator import add
from typing import Annotated, List

from typing_extensions import TypedDict


class InputState(TypedDict):
    question: str


class OverallState(TypedDict):
    question: str
    next_action: str
    cypher_statement: str
    cypher_errors: List[str]
    database_records: List[dict]
    steps: Annotated[List[str], add]


class OutputState(TypedDict):
    answer: str
    steps: List[str]
    cypher_statement: str

2 langchain定义Action函数

基于之前探索内容，这里罗列langchain定义的generate_cypher、validate_cypher、correct_cypher、execute_cypher、generate_final_answer等多个模块，涵盖问题判断、cypher生成、检查、重写、执行和结果生成等多个环节。

细节内容参考。

https://blog.csdn.net/liliang199/article/details/155074227

https://blog.csdn.net/liliang199/article/details/155093219

2.1 guardrail

定义守卫节点，名称为guardrail

具体为通过prompt，让llm充当判断neo4j是否可以回答问题的守卫。

因为数据库与movie有关，这里让llm输出movie和end来指示是否能回答问题。

复制代码

from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

guardrails_system = """
As an intelligent assistant, your primary objective is to decide whether a given question is related to movies or not. 
If the question is related to movies, output "movie". Otherwise, output "end".
To make this decision, assess the content of the question and determine if it refers to any movie, actor, director, film industry, 
or related topics. Provide only the specified output: "movie" or "end".
"""
guardrails_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            guardrails_system,
        ),
        (
            "human",
            ("{question}"),
        ),
    ]
)

class GuardrailsOutput(BaseModel):
    decision: Literal["movie", "end"] = Field(
        description="Decision on whether the question is related to movies"
    )

guardrails_chain = guardrails_prompt | llm.with_structured_output(GuardrailsOutput)

def guardrails(state: InputState) -> OverallState:
    """
    Decides if the question is related to movies or not.
    """
    guardrails_output = guardrails_chain.invoke({"question": state.get("question")})
    database_records = None
    if guardrails_output.decision == "end":
        database_records = "This questions is not about movies or their cast. Therefore I cannot answer this question."
    return {
        "next_action": guardrails_output.decision,
        "database_records": database_records,
        "steps": ["guardrail"],
    }

2.2 generate_cypher

定义Cypher生成节点，名称为generate_cypher。

复制代码

from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_neo4j import Neo4jVector
from langchain_ollama import OllamaEmbeddings


examples = [
    {
        "question": "How many artists are there?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
    },
    {
        "question": "Which actors played in the movie Casino?",
        "query": "MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name",
    },
    {
        "question": "How many movies has Tom Hanks acted in?",
        "query": "MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
    },
    {
        "question": "List all the genres of the movie Schindler's List",
        "query": "MATCH (m:Movie {title: 'Schindler's List'})-[:IN_GENRE]->(g:Genre) RETURN g.name",
    },
    {
        "question": "Which actors have worked in movies from both the comedy and action genres?",
        "query": "MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name",
    },
    {
        "question": "Which directors have made movies with at least three different actors named 'John'?",
        "query": "MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name",
    },
    {
        "question": "Identify movies where directors also played a role in the film.",
        "query": "MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name",
    },
    {
        "question": "Find the actor with the highest number of movies in the database.",
        "query": "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1",
    },
]

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples, OllamaEmbeddings(base_url="http://localhost:11434", model="bge-m3"), Neo4jVector, k=5, input_keys=["question"]
)


from langchain_core.output_parsers import StrOutputParser

text2cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            (
                "Given an input question, convert it to a Cypher query. No pre-amble."
                "Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!"
            ),
        ),
        (
            "human",
            (
                """You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.
Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!
Here is the schema information
{schema}

Below are a number of examples of questions and their corresponding Cypher queries.

{fewshot_examples}

User input: {question}
Cypher query:"""
            ),
        ),
    ]
)

text2cypher_chain = text2cypher_prompt | llm | StrOutputParser()


def generate_cypher(state: OverallState) -> OverallState:
    """
    Generates a cypher statement based on the provided schema and user input
    """
    NL = "\n"
    fewshot_examples = (NL * 2).join(
        [
            f"Question: {el['question']}{NL}Cypher:{el['query']}"
            for el in example_selector.select_examples(
                {"question": state.get("question")}
            )
        ]
    )
    generated_cypher = text2cypher_chain.invoke(
        {
            "question": state.get("question"),
            "fewshot_examples": fewshot_examples,
            "schema": enhanced_graph.schema,
        }
    )
    return {"cypher_statement": generated_cypher, "steps": ["generate_cypher"]}

2.3 validate_cypher

定义cypher检查节点，名称为validate_cypher

复制代码

from typing import List, Optional

validate_cypher_system = """
You are a Cypher expert reviewing a statement written by a junior developer.
"""

validate_cypher_user = """You must check the following:
* Are there any syntax errors in the Cypher statement?
* Are there any missing or undefined variables in the Cypher statement?
* Are any node labels missing from the schema?
* Are any relationship types missing from the schema?
* Are any of the properties not included in the schema?
* Does the Cypher statement include enough information to answer the question?

Examples of good errors:
* Label (:Foo) does not exist, did you mean (:Bar)?
* Property bar does not exist for label Foo, did you mean baz?
* Relationship FOO does not exist, did you mean FOO_BAR?

Schema:
{schema}

The question is:
{question}

The Cypher statement is:
{cypher}

Make sure you don't make any mistakes!"""

validate_cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            validate_cypher_system,
        ),
        (
            "human",
            (validate_cypher_user),
        ),
    ]
)


class Property(BaseModel):
    """
    Represents a filter condition based on a specific node property in a graph in a Cypher statement.
    """

    node_label: str = Field(
        description="The label of the node to which this property belongs."
    )
    property_key: str = Field(description="The key of the property being filtered.")
    property_value: str = Field(
        description="The value that the property is being matched against."
    )


class ValidateCypherOutput(BaseModel):
    """
    Represents the validation result of a Cypher query's output,
    including any errors and applied filters.
    """

    errors: Optional[List[str]] = Field(
        description="A list of syntax or semantical errors in the Cypher statement. Always explain the discrepancy between schema and Cypher statement"
    )
    filters: Optional[List[Property]] = Field(
        description="A list of property-based filters applied in the Cypher statement."
    )


validate_cypher_chain = validate_cypher_prompt | llm.with_structured_output(
    ValidateCypherOutput
)


from langchain_neo4j.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema

# Cypher query corrector is experimental
corrector_schema = [
    Schema(el["start"], el["type"], el["end"])
    for el in enhanced_graph.structured_schema.get("relationships")
]
cypher_query_corrector = CypherQueryCorrector(corrector_schema)


from neo4j.exceptions import CypherSyntaxError


def validate_cypher(state: OverallState) -> OverallState:
    """
    Validates the Cypher statements and maps any property values to the database.
    """
    errors = []
    mapping_errors = []
    # Check for syntax errors
    try:
        enhanced_graph.query(f"EXPLAIN {state.get('cypher_statement')}")
    except CypherSyntaxError as e:
        errors.append(e.message)
    # Experimental feature for correcting relationship directions
    corrected_cypher = cypher_query_corrector(state.get("cypher_statement"))
    if not corrected_cypher:
        errors.append("The generated Cypher statement doesn't fit the graph schema")
    if not corrected_cypher == state.get("cypher_statement"):
        print("Relationship direction was corrected")
    # Use LLM to find additional potential errors and get the mapping for values
    llm_output = validate_cypher_chain.invoke(
        {
            "question": state.get("question"),
            "schema": enhanced_graph.schema,
            "cypher": state.get("cypher_statement"),
        }
    )
    if llm_output.errors:
        errors.extend(llm_output.errors)
    if llm_output.filters:
        for filter in llm_output.filters:
            # Do mapping only for string values
            if (
                not [
                    prop
                    for prop in enhanced_graph.structured_schema["node_props"][
                        filter.node_label
                    ]
                    if prop["property"] == filter.property_key
                ][0]["type"]
                == "STRING"
            ):
                continue
            mapping = enhanced_graph.query(
                f"MATCH (n:{filter.node_label}) WHERE toLower(n.`{filter.property_key}`) = toLower($value) RETURN 'yes' LIMIT 1",
                {"value": filter.property_value},
            )
            if not mapping:
                print(
                    f"Missing value mapping for {filter.node_label} on property {filter.property_key} with value {filter.property_value}"
                )
                mapping_errors.append(
                    f"Missing value mapping for {filter.node_label} on property {filter.property_key} with value {filter.property_value}"
                )
    if mapping_errors:
        next_action = "end"
    elif errors:
        next_action = "correct_cypher"
    else:
        next_action = "execute_cypher"

    return {
        "next_action": next_action,
        "cypher_statement": corrected_cypher,
        "cypher_errors": errors,
        "steps": ["validate_cypher"],
    }

2.4 correct_cypher

定义cypher重写节点，名称为correct_cypher

复制代码

correct_cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            (
                "You are a Cypher expert reviewing a statement written by a junior developer. "
                "You need to correct the Cypher statement based on the provided errors. No pre-amble."
                "Do not wrap the response in any backticks or anything else. Respond with a Cypher statement only!"
            ),
        ),
        (
            "human",
            (
                """Check for invalid syntax or semantics and return a corrected Cypher statement.

Schema:
{schema}

Note: Do not include any explanations or apologies in your responses.
Do not wrap the response in any backticks or anything else.
Respond with a Cypher statement only!

Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.

The question is:
{question}

The Cypher statement is:
{cypher}

The errors are:
{errors}

Corrected Cypher statement: """
            ),
        ),
    ]
)

correct_cypher_chain = correct_cypher_prompt | llm | StrOutputParser()

def correct_cypher(state: OverallState) -> OverallState:
    """
    Correct the Cypher statement based on the provided errors.
    """
    corrected_cypher = correct_cypher_chain.invoke(
        {
            "question": state.get("question"),
            "errors": state.get("cypher_errors"),
            "cypher": state.get("cypher_statement"),
            "schema": enhanced_graph.schema,
        }
    )

    return {
        "next_action": "validate_cypher",
        "cypher_statement": corrected_cypher,
        "steps": ["correct_cypher"],
    }

2.5 execute_cypher

定义cypher执行节点，名称为execute_cypher

复制代码

no_results = "I couldn't find any relevant information in the database"


def execute_cypher(state: OverallState) -> OverallState:
    """
    Executes the given Cypher statement.
    """

    records = enhanced_graph.query(state.get("cypher_statement"))
    return {
        "database_records": records if records else no_results,
        "next_action": "end",
        "steps": ["execute_cypher"],
    }

2.5 generate_final_answer

定义结果生成节点，名称为generate_final_answer，是这个处理过程的最后一步。

复制代码

generate_final_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant",
        ),
        (
            "human",
            (
                """Use the following results retrieved from a database to provide
a succinct, definitive answer to the user's question.

Respond as if you are answering the question directly.

Results: {results}
Question: {question}"""
            ),
        ),
    ]
)

generate_final_chain = generate_final_prompt | llm | StrOutputParser()

final_answer = generate_final_chain.invoke(
        {"question": question, "results": records}
    )

print(f"final_answer: {final_answer}")

def generate_final_answer(state: OverallState) -> OutputState:
    """
    Decides if the question is related to movies.
    """
    final_answer = generate_final_chain.invoke(
        {"question": state.get("question"), "results": state.get("database_records")}
    )
    return {"answer": final_answer, "steps": ["generate_final_answer"]}

3 langgraph集成状态函数

langchain基于ReACT机制集成langchain定义的多种模块，将其组合成一个网状图结构，类似于目前各种agent定义的图。

3.1 状态转移定义

首先是状态转移，用"end"表示结束运行，所以对于涉及到分叉的功能模块，比如guardrail、validate_cyper，档碰到"end"时都需要直接跳转到最后一步generate_final_answer。

对于guardrail，如果next_action是end，直接跳到generate_final_answer结束，否则跳到"generate_cypher"，表示下一步是生成cypher。

对于validate_cypher，如果next_action是end，跳到generate_final_answer；否则按具体情况分别跳到correct_cypher或execute_cypher。

代码示例如下。

复制代码

def guardrails_condition(
    state: OverallState,
) -> Literal["generate_cypher", "generate_final_answer"]:
    if state.get("next_action") == "end":
        return "generate_final_answer"
    elif state.get("next_action") == "movie":
        return "generate_cypher"


def validate_cypher_condition(
    state: OverallState,
) -> Literal["generate_final_answer", "correct_cypher", "execute_cypher"]:
    if state.get("next_action") == "end":
        return "generate_final_answer"
    elif state.get("next_action") == "correct_cypher":
        return "correct_cypher"
    elif state.get("next_action") == "execute_cypher":
        return "execute_cypher"

3.2 langgraph集成

这里通过langgraph定义的状态图，将上述langchain定义的功能集成到一起。

1）节点创建

首先是状态节点创建，针对langchain定义的处理函数，以node的形式加入状态图，分别为guardrails、generate_cypher、validate_cypher、correct_cypher、execute_cypher、generate_final_answer，节点名称分别对应处理函数返回的steps字段。

2）状态转移边

状态转移边分为两种情况。

对于可以直接转移的状态，比如StART到guardrails的转移、execute_cypher到generate_final_answer、correct_cypher到validate_cypher、generate_final_answer到END，则直接添加转移边。

对于存在状态分叉的情况，比如guardrail、validate_cypher，则把3.1定义的条件状态转移函数，以add_conditional_edges的方式添加到状态图中。

代码示例如下所示。

复制代码

from IPython.display import Image, display
from langgraph.graph import END, START, StateGraph

langgraph = StateGraph(OverallState, input=InputState, output=OutputState)
langgraph.add_node(guardrails)
langgraph.add_node(generate_cypher)
langgraph.add_node(validate_cypher)
langgraph.add_node(correct_cypher)
langgraph.add_node(execute_cypher)
langgraph.add_node(generate_final_answer)

langgraph.add_edge(START, "guardrails")
langgraph.add_conditional_edges(
    "guardrails",
    guardrails_condition,
)
langgraph.add_edge("generate_cypher", "validate_cypher")
langgraph.add_conditional_edges(
    "validate_cypher",
    validate_cypher_condition,
)
langgraph.add_edge("execute_cypher", "generate_final_answer")
langgraph.add_edge("correct_cypher", "validate_cypher")
langgraph.add_edge("generate_final_answer", END)

langgraph = langgraph.compile()

# View
display(Image(langgraph.get_graph().draw_mermaid_png()))

这是编译后的状态图

3.3 问答运行测试

首先，问一个与图谱中存储的电影不相关的问题"what's the weather in Spain?"

复制代码

langgraph.invoke({"question": "What's the weather in Spain?"})

输出如下，steps显示状态直接从guardrail转到generate_final_answer，说明这个问题在guardrail环节就被拦截，因为与电影话题不相关。

{'answer': 'The information provided does not relate to movies or their cast, so I cannot answer this question.',

'steps': $'guardrail', 'generate_final_answer'$ }

然后，问一些与电影相关的问题，比如"请列举电影类型"

复制代码

# langgraph.invoke({"question": "What was the cast of the Casino?"})

# question = "What was the cast of the Casino?"
question = "请列举电影类型？"

langgraph.invoke({"question": question})

输出如下，可见langgraph正常回复了问题，而且steps字段显示，回答过程经过guardrail、generate_cypher、validate_cypher、generate_final_answer等过程。

因为问题与电影话题相关，所以直接进入知识库回答流程，并没有在guardrail环节被拦截。

{'answer': '电影类型多种多样，常见的包括：动作、喜剧、剧情、科幻、恐怖、爱情、冒险、犯罪、奇幻、悬疑、战争、音乐、纪录片、动画、西部、惊悚、家庭、历史、传记、体育、超级英雄、灾难、黑色电影等。部分类型还可细分（如心理恐怖、太空科幻）或融合（如浪漫喜剧、科幻惊悚）。不同文化背景也可能衍生独特类型（如武侠片、宝莱坞歌舞片）。',

'steps': ['guardrail',

'generate_cypher',

'validate_cypher',

'generate_final_answer'],

'cypher_statement': 'MATCH (g:Genre) RETURN g.name'}

reference

langchain如何判断neo4j知识图谱是否能回答问题

https://blog.csdn.net/liliang199/article/details/155074227

langchain如何检查llm生成cypher的正确性并修正错误

https://blog.csdn.net/liliang199/article/details/155093219

如何结合langchain、neo4j实现关联检索问答-续

https://blog.csdn.net/liliang199/article/details/153731597