Structured Output in LangChain

Structured output in LangChain means getting a large language model (LLM) to return data in a predefined format, such as a JSON object, rather than a block of free-form text.

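The benefits are clear. The returned data is structured and type-safe, so your program can parse and use it directly. You don't have to write elaborate prompts and hope the model happens to follow them.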

LangChain offers two main ways to get structured output, each suited to different scenarios.

Method 1: `with_structured_output` (recommended)

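This is the mainstream approach LangChain has recommended since version 0.2, and it is what most projects use today. It integrates directly with the model provider's API. You can define the schema as a Pydantic model, a JSON Schema, or a TypedDict.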

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# 1. Define the expected output format
class Joke(BaseModel):
    """Structured output for a joke"""
    setup: str = Field(description="The setup or question of the joke")
    punchline: str = Field(description="The punchline or answer of the joke")
    rating: float = Field(description="How funny the joke is, on a scale of 1 to 10")

# 2. Initialize the model (reads OPENAI_API_KEY from the environment) and wrap it
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(Joke)

# 3. Call the model and get a structured result directly
result: Joke = structured_llm.invoke("Tell me a joke about programmers")
print(f"Setup: {result.setup}")
print(f"Punchline: {result.punchline}")
print(f"Rating: {result.rating}")
```
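As mentioned above, `with_structured_output` also accepts a TypedDict or a JSON Schema instead of a Pydantic model. The sketch below expresses the same `Joke` schema both ways. It is a minimal sketch under a few assumptions. The `Annotated[type, default, description]` convention and the top-level `title`/`description` keys follow the pattern in LangChain's documentation as I understand it. `JokeDict`, `JOKE_JSON_SCHEMA`, and `build_structured_llm` are hypothetical names. The model call is wrapped in a function that is not executed here, because it needs `langchain-openai` and an API key. With either of these schema styles, the result comes back as a plain `dict` rather than a Pydantic instance.

```python
from typing import Annotated, TypedDict, get_type_hints


# Option A: TypedDict. In Annotated[type, default, description], "..." marks a
# required field with no default; the last item is the field description.
class JokeDict(TypedDict):
    """Structured output for a joke"""

    setup: Annotated[str, ..., "The setup or question of the joke"]
    punchline: Annotated[str, ..., "The punchline or answer of the joke"]
    rating: Annotated[float, ..., "How funny the joke is, on a scale of 1 to 10"]


# Option B: a plain JSON Schema dict. The top-level "title" (used as the
# schema name) and "description" follow the pattern in LangChain's docs.
JOKE_JSON_SCHEMA = {
    "title": "Joke",
    "description": "Structured output for a joke",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup or question of the joke"},
        "punchline": {"type": "string", "description": "The punchline or answer of the joke"},
        "rating": {"type": "number", "description": "How funny the joke is, on a scale of 1 to 10"},
    },
    "required": ["setup", "punchline", "rating"],
}


def build_structured_llm(schema):
    """Hypothetical helper: wrap a chat model with either schema.

    Not called here: it needs the langchain-openai package and OPENAI_API_KEY.
    """
    from langchain_openai import ChatOpenAI  # imported lazily on purpose

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    return llm.with_structured_output(schema)


if __name__ == "__main__":
    # Print the (default, description) metadata attached to each TypedDict field.
    for name, hint in get_type_hints(JokeDict, include_extras=True).items():
        print(name, hint.__metadata__)
    # With credentials configured, build_structured_llm(JokeDict).invoke(
    #     "Tell me a joke about programmers") should return a plain dict.
```

The TypedDict form avoids a Pydantic dependency in your own code. The JSON Schema form is handy when the schema is loaded from a file or generated elsewhere.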

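Advantages:
- Very concise: a single `.with_structured_output(Joke)` call handles all the configuration. It returns a Pydantic model instance directly, with complete type hints and good IDE support.
- Officially recommended: this is the approach LangChain's official documentation recommends first.
- Flexible: the `method` parameter lets you choose the underlying technique (such as `'function_calling'` or `'json_schema'`) to match what each model supports.
When to use: most scenarios where an LLM needs to return structured data, such as information extraction, classification, and generating configuration files.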
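To make "integrates with the provider's API" a bit more concrete, here is a simplified, standard-library-only sketch of what the `function_calling` route amounts to. First, the schema is turned into a tool definition, assumed here to follow OpenAI's `{"type": "function", "function": {...}}` tools format. Next, the model is asked to call that tool, typically forced through the tool-choice setting. Finally, the JSON string in the returned tool call's `arguments` is parsed back into a typed object. This is an illustration under assumptions, not LangChain's actual implementation. `schema_to_tool` and `parse_tool_arguments` are hypothetical helpers, and a dataclass stands in for the Pydantic model so the snippet runs without dependencies.

```python
import dataclasses
import json
from dataclasses import dataclass, field


@dataclass
class Joke:
    """Structured output for a joke"""

    setup: str = field(metadata={"description": "The setup or question of the joke"})
    punchline: str = field(metadata={"description": "The punchline or answer of the joke"})
    rating: float = field(metadata={"description": "How funny the joke is, on a scale of 1 to 10"})


# Minimal Python -> JSON Schema type mapping (deliberately incomplete).
_JSON_TYPES = {str: "string", float: "number", int: "integer", bool: "boolean"}


def schema_to_tool(cls) -> dict:
    """Hypothetical helper: describe a dataclass as an OpenAI-style tool definition."""
    properties = {
        f.name: {"type": _JSON_TYPES[f.type], "description": f.metadata.get("description", "")}
        for f in dataclasses.fields(cls)
    }
    return {
        "type": "function",
        "function": {
            "name": cls.__name__,
            "description": (cls.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": properties, "required": list(properties)},
        },
    }


def parse_tool_arguments(cls, arguments: str):
    """Hypothetical helper: turn a tool call's JSON `arguments` string into an instance."""
    data = json.loads(arguments)
    missing = [f.name for f in dataclasses.fields(cls) if f.name not in data]
    if missing:
        raise ValueError(f"tool call is missing fields: {missing}")
    # JSON has a single number type, so coerce to float where the schema says float.
    return cls(**{f.name: float(data[f.name]) if f.type is float else data[f.name]
                  for f in dataclasses.fields(cls)})


if __name__ == "__main__":
    print(json.dumps(schema_to_tool(Joke), indent=2, ensure_ascii=False))
    # Made-up arguments, shaped like what a model might put in its tool call.
    args = '{"setup": "Why do programmers prefer dark mode?", "punchline": "Because light attracts bugs.", "rating": 7}'
    print(parse_tool_arguments(Joke, args))
```

With `method="json_schema"`, the idea is broadly similar. The difference, as I understand the OpenAI integration, is that the schema is sent as a response-format constraint rather than as a tool definition.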

Method 2: `PydanticOutputParser`

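This is the classic approach from earlier LangChain versions, and it works through prompt engineering. It does not rely on any special model API. Instead, a carefully designed prompt "guides" the model's output, and a parser then extracts the result.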

```python
# Older releases also exposed this as langchain.output_parsers.PydanticOutputParser
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# 1. Define the format
class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")

# 2. Create the parser and get its format instructions
parser = PydanticOutputParser(pydantic_object=Joke)
format_instructions = parser.get_format_instructions()

# 3. Inject the format instructions into the prompt
prompt = PromptTemplate(
    template="Answer the user's question.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)

# 4. Build and run the chain: prompt -> model -> parser
model = ChatOpenAI(temperature=0)
chain = prompt | model | parser
result: Joke = chain.invoke({"query": "Tell me a joke"})
print(result)
```

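Advantages:
- Highly portable: it does not depend on any model-specific API. Any model that can follow the instructions and produce valid JSON will work.
- Transparent: you can see exactly what instructions are sent to the model, which makes debugging and tuning easier.
When to use:
- When the model you are using does not support native structured output.
- When you need fine-grained control over the prompt, or want to understand how LangChain works under the hood.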
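Here is a rough, library-free illustration of the prompt-engineering route described above. The sketch renders format instructions from a schema and builds a prompt with the same layout as the `PromptTemplate` in the example. It then pulls the JSON object out of the model's free-form reply, tolerating a Markdown code fence around it. This is a simplified stand-in for `PydanticOutputParser`, not its real implementation:
- `format_instructions`, `build_prompt`, and `parse_reply` are hypothetical names.
- The instruction wording is invented for illustration. LangChain's own format instructions embed the model's JSON Schema instead.
- A dataclass replaces the Pydantic model so the snippet runs with the standard library only.

```python
import json
import re
from dataclasses import dataclass, fields

FENCE = "`" * 3  # three backticks, i.e. a Markdown code fence
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


@dataclass
class Joke:
    setup: str
    punchline: str


def format_instructions(cls) -> str:
    """Hypothetical: describe the expected JSON object in plain language."""
    keys = ", ".join(f'"{f.name}" ({f.type.__name__})' for f in fields(cls))
    return f"Reply with one JSON object with exactly these keys: {keys}. Output nothing else."


def build_prompt(query: str, cls) -> str:
    """Same layout as the PromptTemplate above: task, format instructions, query."""
    return f"Answer the user's question.\n{format_instructions(cls)}\n{query}\n"


def parse_reply(text: str, cls):
    """Hypothetical: extract the JSON object from free text and check its keys."""
    # Prefer the body of a Markdown code fence if the model used one.
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text
    # Then take the span from the first "{" to the last "}".
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the model output")
    data = json.loads(candidate[start : end + 1])
    expected = {f.name for f in fields(cls)}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    return cls(**data)


if __name__ == "__main__":
    print(build_prompt("Tell me a joke", Joke))
    # A made-up reply: a chatty preamble plus a fenced JSON object.
    reply = "Sure!\n" + FENCE + 'json\n{"setup": "Why did the developer go broke?", "punchline": "He used up all his cache."}\n' + FENCE
    print(parse_reply(reply, Joke))
```

The extraction step is where this approach is most fragile. If the model drifts from the requested format, parsing fails and the caller has to retry or repair the output.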

Comparing the two approaches

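| Aspect | `with_structured_output` (recommended) | `PydanticOutputParser` |
|---|---|---|
| How it works | Calls the model's native API (e.g. tool calling / JSON mode); the provider implements it under the hood | A prompt designed by LangChain guides the model to emit JSON |
| Underlying mechanism | Native model capability | Prompt engineering |
| Requirements | The model must support specific features (e.g. tool calling) | Works with any LLM that can follow instructions and produce JSON |
| Code conciseness | Very high: a single chained method call | Fairly high, but you build the prompt and attach the parser yourself |
| Official recommendation | First choice | Fallback |
| Flexibility | Limited to the methods the model supports | Very high: the prompt logic is fully customizable |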

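`with_structured_output` is the default choice in newer LangChain versions and relies on the model's own structured-output capability. `PydanticOutputParser` is the traditional approach: at its core, it uses the prompt to tell the model to return a specific format.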
