[LangChain v1.2 Beginner Tutorial Series] [Part 4] Structured Output: Getting a Predictable Structure Back from the Agent

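Series contents

[LangChain v1.2 Beginner Tutorial Series] [Part 1] Getting Started: Run Your First AI Agent from Scratch
[LangChain v1.2 Beginner Tutorial Series] [Part 2] Message Types and Prompt Engineering
[LangChain v1.2 Beginner Tutorial Series] [Part 3] Developing Tools: Connecting the Agent to the Outside World
[LangChain v1.2 Beginner Tutorial Series] [Part 4] Structured Output: Getting a Predictable Structure Back from the Agent
[LangChain v1.2 Beginner Tutorial Series] [Part 5] Memory Management: Letting the Agent Remember the Conversation
[LangChain v1.2 Beginner Tutorial Series] [Part 6] Streaming Output: No More "Think It All Through Before Speaking"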

Table of contents

Series contents
Preface
I. The response_format parameter
II. Choosing a strategy
- 1. Automatic strategy selection
- 2. Explicit ToolStrategy
- 3. Explicit ProviderStrategy
III. Defining the schema: four supported data structures
- 1. Pydantic model
- 2. Python dataclass (the lightweight option)
- 3. TypedDict
- 4. JSON Schema
IV. Error handling and retries
- 1. Default behavior: automatic retry
- 2. Custom error message
- 3. Handling only specific exceptions
- 4. Custom error-handling function
Summary

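Preface

When building real AI applications, we often need the Agent to return data that a program can parse directly. Suppose we need to extract key fields such as name, time, and place from a reply. In natural language, the Agent might answer "The user is called Zhang San and the time is tomorrow" on one call and "The user's name is Zhang San; the appointment is tomorrow" on the next. Regular expressions and string slicing cannot cover every phrasing, so parsing errors are almost inevitable. Structured output exists to solve this problem: it makes the Agent return data in a standard form such as JSON or a Pydantic model. This article walks through the core ways to get structured output in LangChain v1.2.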
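
To make the fragility concrete, here is a minimal sketch using only the standard library. The pattern and the helper `extract_name` are hypothetical and written for this illustration; the two replies are the phrasings from the paragraph above.

```python
import re
from typing import Optional

# A pattern written against the first phrasing only (hypothetical example)
NAME_PATTERN = re.compile(r"The user is called (?P<name>[A-Za-z ]+?) and")

reply_1 = "The user is called Zhang San and the time is tomorrow"
reply_2 = "The user's name is Zhang San; the appointment is tomorrow"


def extract_name(reply: str) -> Optional[str]:
    """Return the captured name, or None when the pattern does not match."""
    match = NAME_PATTERN.search(reply)
    return match.group("name") if match else None


print(extract_name(reply_1))  # Zhang San
print(extract_name(reply_2))  # None
```

The second reply carries the same information, yet the parser silently returns nothing. Each new phrasing would need another pattern.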

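I. The response_format parameter

In LangChain v1.2, structured output is controlled by the response_format parameter of create_agent(). LangChain handles the underlying details and stores the structured result under the structured_response key of the Agent's final state.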

python

from dataclasses import dataclass

from langchain.agents import create_agent

@dataclass
class User:
    id: str
    name: str

# `llm` is assumed to be a chat model instance created beforehand
agent = create_agent(
    model=llm,  # the model
    response_format=User,  # output schema, defined above
)
result = agent.invoke({"messages": [{"role": "user", "content": "..."}]})
structured_data = result["structured_response"]  # the structured data

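II. Choosing a strategy

Structured output can be produced by two strategies, plus an automatic mode that picks between them:

| Strategy | When to use | Underlying mechanism |
| --- | --- | --- |
| ProviderStrategy | Models with native structured output, such as OpenAI, Claude, and Gemini | Calls the provider's native API (most reliable) |
| ToolStrategy | Almost any model that supports tool calling | Emulates structured output through a tool call (very broad compatibility) |
| Automatic | When you are unsure what the model supports | LangChain decides between ProviderStrategy and ToolStrategy |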

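1. Automatic strategy selection

The simplest approach is to pass the schema type directly and let LangChain choose the strategy. The example in "I. The response_format parameter" above works this way.

python

agent = create_agent(
    model=llm,  # the model
    response_format=YourSchema,  # output schema, defined beforehand
)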
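
The automatic choice can be pictured as a simple preference order. The sketch below is only an illustration of that idea, not LangChain's actual implementation: the function `choose_strategy` and its two capability flags are hypothetical, and it returns strategy names as plain strings.

```python
def choose_strategy(supports_native_output: bool, supports_tool_calling: bool) -> str:
    """Pick a strategy name from two hypothetical capability flags."""
    if supports_native_output:
        return "ProviderStrategy"  # prefer the provider's native API
    if supports_tool_calling:
        return "ToolStrategy"  # fall back to emulation through a tool call
    raise ValueError("model supports neither native structured output nor tool calling")


print(choose_strategy(True, True))   # ProviderStrategy
print(choose_strategy(False, True))  # ToolStrategy
```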

2. Explicit ToolStrategy

Configure ToolStrategy explicitly when you need fine-grained control over error handling or tool messages.

python

from langchain.agents.structured_output import ToolStrategy

agent = create_agent(
    model=llm,  # the model
    response_format=ToolStrategy(
        schema=YourSchema,
        handle_errors=True,  # automatic retry (on by default)
    ),
)

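3. Explicit ProviderStrategy

Configure ProviderStrategy explicitly when you know the model supports native structured output and you want the most reliable result.

python

from langchain.agents.structured_output import ProviderStrategy

agent = create_agent(
    model=llm,  # the model
    response_format=ProviderStrategy(
        schema=YourSchema,
        strict=True,  # enable strict mode; defaults to None, so confirm the model supports it first
    ),
)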

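III. Defining the schema: four supported data structures

LangChain supports several ways to define a schema. Pydantic (the most complete) and dataclass (lightweight) are the recommended ones.

1. Pydantic model

Pydantic models were introduced in the previous article on Tools. They support full field validation, field descriptions, required/optional control, and value constraints, and they have the best compatibility. The return value is a Pydantic object instance, which makes this the first choice for production.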

python

from typing import Optional

from langchain.agents import create_agent
from langchain.messages import HumanMessage
from pydantic import BaseModel, Field

# Define the schema
class Movie(BaseModel):
    """Movie information"""
    year: str = Field(description="Release year")
    actor: str = Field(description="Lead actor")
    director: str = Field(description="Director")
    rating: Optional[int] = Field(
        default=None,  # optional field, defaults to None
        description="Rating (1-10)",
        ge=1,  # minimum value
        le=10,  # maximum value
    )


# Create the agent
agent = create_agent(
    model=llm,
    system_prompt="You are a helpful assistant. Answer the user's question.",
    response_format=Movie,
)


# Invoke the agent
response = agent.invoke({"messages": [HumanMessage("Titanic")]})

structured_response = response["structured_response"]  # a Movie (Pydantic) instance
print(structured_response)  # output: year='1997' actor='Leonardo DiCaprio' director='James Cameron' rating=9
print(structured_response.year)  # release year, output: 1997
print(structured_response.actor)  # lead actor, output: Leonardo DiCaprio
print(structured_response.director)  # director, output: James Cameron
print(structured_response.rating)  # rating, output: 9

2. Python dataclass (the lightweight option)

A lightweight choice for simple cases. The return value is an instance of the dataclass.

python

from dataclasses import dataclass
from typing import Optional

from langchain.agents import create_agent
from langchain.messages import HumanMessage

@dataclass
class SimpleContact:
    name: str
    phone: str
    email: Optional[str] = None  # optional, defaults to None


# Create the agent
agent = create_agent(
    model=llm,
    system_prompt="You are a helpful assistant. Answer the user's question.",
    response_format=SimpleContact,
)


# Invoke the agent
response = agent.invoke(
    {"messages": [HumanMessage("Zhang San, phone 15555555555, email 123456@qq.com")]}
)

structured_response = response["structured_response"]  # a SimpleContact instance

print(structured_response.name)  # output: Zhang San
print(structured_response.phone)  # output: 15555555555
print(structured_response.email)  # output: 123456@qq.com
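
Because the result is an ordinary dataclass instance, it can be handed to downstream code with standard-library tools alone. A minimal sketch, assuming a contact shaped like the one above; the instance here is built by hand as a stand-in for response["structured_response"]:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SimpleContact:
    name: str
    phone: str
    email: Optional[str] = None


# Stand-in for response["structured_response"], built by hand for illustration
contact = SimpleContact(name="Zhang San", phone="15555555555", email="123456@qq.com")

as_dict = asdict(contact)  # plain dict, e.g. for a database layer
as_json = json.dumps(as_dict, ensure_ascii=False)  # JSON string, e.g. for an HTTP response
print(as_json)
```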

3. TypedDict

Suited to integrating with an existing type system. The return value is a dict.

python

from typing import TypedDict

from langchain.agents import create_agent
from langchain.messages import HumanMessage

class SimpleContact(TypedDict):
    name: str
    phone: str
    email: str


# Create the agent
agent = create_agent(
    model=llm,
    system_prompt="You are a helpful assistant. Answer the user's question.",
    response_format=SimpleContact,
)


# Invoke the agent
response = agent.invoke(
    {"messages": [HumanMessage("Zhang San, phone 15555555555, email 123456@qq.com")]}
)

structured_response = response["structured_response"]  # a dict

print(structured_response["name"])  # output: Zhang San
print(structured_response["phone"])  # output: 15555555555
print(structured_response["email"])  # output: 123456@qq.com
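
One consequence of "the return value is a dict" is worth seeing directly: a TypedDict adds type hints for static checkers but no runtime validation of its own. The sketch below uses only the standard library, with a hand-built dict standing in for response["structured_response"].

```python
from typing import TypedDict


class SimpleContact(TypedDict):
    name: str
    phone: str
    email: str


# Stand-in for response["structured_response"], built by hand for illustration
contact: SimpleContact = {"name": "Zhang San", "phone": "15555555555", "email": "123456@qq.com"}

print(type(contact) is dict)  # True
print(sorted(SimpleContact.__required_keys__))  # ['email', 'name', 'phone']

# Python itself does not reject an incomplete dict; compare the keys by hand if needed
incomplete = {"name": "Zhang San"}
missing = sorted(SimpleContact.__required_keys__ - set(incomplete))
print(missing)  # ['email', 'phone']
```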

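4. JSON Schema

A format that works across languages and frameworks, suited to cases where a JSON Schema specification already exists or the output must be exchanged with other systems. The return value is a dict.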

python

from langchain.agents import create_agent
from langchain.messages import HumanMessage

# Standard JSON Schema definition
simple_contact_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "The user's name"},
        "phone": {"type": "string", "description": "The user's mobile phone number"},
        "email": {"type": "string", "description": "The user's email address"},
    },
    "required": ["name", "phone"],
    "additionalProperties": False,
}


# Create the agent
agent = create_agent(
    model=llm,
    system_prompt="You are a helpful assistant. Answer the user's question.",
    response_format=simple_contact_schema,
)


# Invoke the agent
response = agent.invoke(
    {"messages": [HumanMessage("Zhang San, phone 15555555555, email 123456@qq.com")]}
)

structured_response = response["structured_response"]  # a dict

print(structured_response["name"])  # output: Zhang San
print(structured_response["phone"])  # output: 15555555555
print(structured_response["email"])  # output: 123456@qq.com
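
To show what the "required" and "additionalProperties" entries of such a schema mean for the returned dict, here is a hand-rolled check using only the standard library. It is a sketch of two rules only, not a full JSON Schema validator, and the helper `check_keys` is hypothetical.

```python
simple_contact_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "phone": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["name", "phone"],
    "additionalProperties": False,
}


def check_keys(data: dict, schema: dict) -> list:
    """Return a list of problems with the keys of `data`; an empty list means OK."""
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required key: {key}")
    if schema.get("additionalProperties") is False:
        for key in data:
            if key not in schema["properties"]:
                problems.append(f"unexpected key: {key}")
    return problems


print(check_keys({"name": "Zhang San", "phone": "15555555555"}, simple_contact_schema))  # []
print(check_keys({"name": "Zhang San", "age": 30}, simple_contact_schema))
# ['missing required key: phone', 'unexpected key: age']
```

Note that "email" is not listed under "required", so a result without it is still valid; reading it with structured_response.get("email") avoids a KeyError in that case.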

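IV. Error handling and retries

Structured output from a model can go wrong: malformed output, missing fields, or out-of-range values. LangChain v1.2 provides a retry mechanism that feeds the error back to the model.

1. Default behavior: automatic retry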

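python

from typing import Optional

from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langchain.messages import HumanMessage
from pydantic import BaseModel, Field

class Movie(BaseModel):
    """Movie information"""
    year: str = Field(description="Release year")
    actor: str = Field(description="Lead actor")
    director: str = Field(description="Director")
    rating: Optional[int] = Field(
        default=None,  # optional field, defaults to None
        description="Rating (1-10)",
        ge=1,  # minimum value
        le=10,  # maximum value
    )

agent = create_agent(
    model=llm,
    response_format=ToolStrategy(Movie),  # handle_errors=True is the default
)

# Invoke the agent
response = agent.invoke({"messages": [HumanMessage("""
Extract the movie information:
Title: Test Movie
Year: 2024
Lead actor: Test Actor
Director: Test Director
Rating: 15 (this is a special rating system that allows values above 10)
Important: extract the rating exactly as given and do not change it, even if it exceeds 10!
""")]})

# If the model returns rating=15 (out of range), LangChain automatically:
# 1. Catches the validation error
# 2. Sends an error message back to the model (as a ToolMessage)
# 3. Asks the model to generate the output again
# 4. Returns the structured data once it passes validation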
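
The four steps listed above can be sketched without any LangChain code. Everything in this block is a stand-in chosen for illustration: a dataclass with a `__post_init__` check replaces the Pydantic constraints, `fake_model` is a hypothetical model that corrects itself once it sees an error message, and `run_with_retry` is not the library's real loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Movie:
    year: str
    rating: Optional[int] = None

    def __post_init__(self) -> None:
        # Stand-in for the ge=1 / le=10 constraints on the Pydantic field
        if self.rating is not None and not 1 <= self.rating <= 10:
            raise ValueError(f"rating must be between 1 and 10, got {self.rating}")


def run_with_retry(model: Callable[[List[str]], dict], max_attempts: int = 3) -> Movie:
    """Call `model`, feeding each validation error back as a new message."""
    messages: List[str] = ["Extract the movie information."]
    for _ in range(max_attempts):
        raw = model(messages)
        try:
            return Movie(**raw)  # step 4: return once validation passes
        except ValueError as exc:  # step 1: catch the validation error
            # step 2: send the error back; step 3: the loop asks the model again
            messages.append(f"Error: {exc}. Please fix it and try again.")
    raise RuntimeError("no valid output after retries")


def fake_model(messages: List[str]) -> dict:
    """Hypothetical model: out of range on the first call, corrected after feedback."""
    saw_error = any(m.startswith("Error:") for m in messages)
    return {"year": "2024", "rating": 10 if saw_error else 15}


movie = run_with_retry(fake_model)
print(movie)  # Movie(year='2024', rating=10)
```

The attempt limit matters: a model that keeps returning an invalid value would otherwise loop forever, so the sketch gives up after max_attempts.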

2. Custom error message

python

agent = create_agent(
    model=llm,
    response_format=ToolStrategy(
        schema=Movie,
        handle_errors="The rating must be between 1 and 10. Please check and provide it again.",
    ),
)

3. Handling only specific exceptions

python

from pydantic import ValidationError

agent = create_agent(
    model=llm,
    response_format=ToolStrategy(
        schema=Movie,
        handle_errors=ValidationError,  # retry only on ValidationError; other exceptions are raised
    ),
)

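4. Custom error-handling function

python

from langchain.agents.structured_output import (
    MultipleStructuredOutputsError,
    StructuredOutputValidationError,
    ToolStrategy,
)

# Custom error-handling function
def custom_error_handler(error: Exception) -> str:
    """Return the message sent back to the model for a given error."""
    if isinstance(error, StructuredOutputValidationError):
        return "Validation failed: the rating must be between 1 and 10. Please fix it and try again."
    elif isinstance(error, MultipleStructuredOutputsError):
        return "Error: multiple results were returned. Please return only the most relevant one."
    else:
        return f"Unknown error: {error}"


# Create the agent with the custom error-handling function
agent = create_agent(
    model=llm,
    response_format=ToolStrategy(
        schema=Movie,
        handle_errors=custom_error_handler,  # pass the function itself
    ),
)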
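
The forms of handle_errors covered in this section (a bool, a string, an exception type, a function) can be summarized as one dispatch. The sketch below is an illustration only and not LangChain's implementation: `resolve` is a hypothetical helper, the default message text "Error: ..." is made up, and the False branch assumes that retries can be switched off.

```python
from typing import Callable, Tuple, Type, Union

HandleErrors = Union[bool, str, Type[Exception], Callable[[Exception], str]]


def resolve(handle_errors: HandleErrors, error: Exception) -> Tuple[bool, str]:
    """Return (should_retry, message_for_model) for one error."""
    if handle_errors is False:
        return False, ""  # never retry (assumed behavior)
    if handle_errors is True:
        return True, f"Error: {error}"  # retry with a default message
    if isinstance(handle_errors, str):
        return True, handle_errors  # retry with a fixed message
    if isinstance(handle_errors, type):
        if isinstance(error, handle_errors):
            return True, f"Error: {error}"  # retry only for this exception type
        return False, ""
    return True, handle_errors(error)  # retry with the function's message


err = ValueError("rating out of range")
print(resolve(True, err))  # (True, 'Error: rating out of range')
print(resolve("Use 1-10.", err))  # (True, 'Use 1-10.')
print(resolve(KeyError, err))  # (False, '')
print(resolve(lambda e: f"custom: {e}", err))  # (True, 'custom: rating out of range')
```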

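Summary

Structured output is a cornerstone of running Agents in production: it addresses the unpredictability of free-form model output, so a program can consume the Agent's result directly and reliably. The next article covers the Agent's memory module, which gives the Agent awareness of multi-turn conversation context, in other words a "memory".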