Precise Control: Prompt Engineering and Structured Output

Table of Contents

1. Prompt Templates (PromptTemplate)
- 1.1 Basic String Templates
- 1.2 Chat Templates (ChatPromptTemplate)
- 1.3 Few-Shot Prompts
  - Scenario 1: Static Examples (for a small, fixed set of examples)
  - Scenario 2: Dynamic Example Selection (Example Selector)
2. Structured Output
- 2.1 Why Use `with_structured_output`?
- 2.2 Basic Usage: Extracting a Pydantic Object
- 2.3 Advanced Usage: Nested Structures and Lists
- 2.4 Error Handling and Fallback Strategies
Summary

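Core problem: LLM output is hard to control; models make things up or break the expected format. This article shows how to manage inputs with prompt templates, and how to use structured output to make the model return JSON that your code can parse.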

1. Prompt Templates (PromptTemplate)

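When building LLM applications, we rarely send hard-coded strings straight to the model. Instead, we use templates to manage the variable parts of a prompt.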

1.1 Basic String Templates

PromptTemplate suits plain text-completion models, or any case where you don't need to distinguish message roles.

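```python
from langchain_core.prompts import PromptTemplate

# 1. Define the template: {variable} marks a placeholder
template = PromptTemplate.from_template(
    "You are a naming expert. Please come up with a {style} name for a company that makes {product}."
)

# 2. Fill in the variables
# .format() only returns a string; it does not call the model
prompt_str = template.format(product="sneakers", style="bold yet stylish")
print(prompt_str)
# Output: You are a naming expert. Please come up with a bold yet stylish name for a company that makes sneakers.

# 3. Use it in an LCEL chain (assumes `model` is an already-initialized chat model or LLM)
# chain = template | model
```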
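
By default, PromptTemplate.from_template treats the string as a Python f-string-style template, so `{product}` and `{style}` become its input variables. The stdlib sketch below shows the idea (it is not LangChain's implementation, and the class name `TinyPromptTemplate` is hypothetical): discover placeholder names with `string.Formatter().parse`, and refuse to format when one is missing.

```python
import string


class TinyPromptTemplate:
    """Hypothetical, simplified stand-in for PromptTemplate (illustration only)."""

    def __init__(self, template: str):
        self.template = template
        # parse() yields (literal_text, field_name, format_spec, conversion) tuples;
        # field_name is None for trailing literal text, so filter it out
        self.input_variables = sorted(
            {name for _, name, _, _ in string.Formatter().parse(template) if name}
        )

    def format(self, **kwargs) -> str:
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"Missing variables: {missing}")
        # Pure string substitution; no model is called here
        return self.template.format(**kwargs)


tpl = TinyPromptTemplate(
    "You are a naming expert. Please come up with a {style} name for a company that makes {product}."
)
print(tpl.input_variables)  # ['product', 'style']
print(tpl.format(product="sneakers", style="bold yet stylish"))
```

Failing loudly on a missing variable is the useful part: a silently empty slot in a prompt is much harder to debug than an exception.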

Tip: Partial Variables

Some variables are fixed (for example the current date or format instructions), so you shouldn't have to pass them on every request.

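```python
import datetime

from langchain_core.prompts import PromptTemplate

template = PromptTemplate(
    template="Today is {date}. What happened on this day in history? Field: {topic}",
    input_variables=["topic"],
    # Pre-fill the date variable. Passing a function instead of today's value
    # means the date is computed on every format() call, rather than being
    # frozen at the moment the template was created.
    partial_variables={"date": lambda: datetime.date.today().isoformat()},
)

print(template.format(topic="technology"))
```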
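
A subtle point about partial variables: a value computed once when the template is built (such as `datetime.date.today()`) stays frozen for the life of the process. LangChain's partial variables also accept zero-argument functions, which avoids this. The stdlib sketch below illustrates the merge step under that assumption; `format_with_partials` is a hypothetical helper, not LangChain code.

```python
import datetime
from typing import Callable, Dict, Union

# A partial value is either a fixed string or a zero-argument function
PartialValue = Union[str, Callable[[], str]]


def format_with_partials(template: str, partials: Dict[str, PartialValue], **kwargs) -> str:
    """Hypothetical helper: merge partial variables with the caller's variables."""
    # Callables are evaluated at format time; plain values are used as-is
    resolved = {k: (v() if callable(v) else v) for k, v in partials.items()}
    # Values passed by the caller take precedence over partials
    return template.format(**{**resolved, **kwargs})


partials = {"date": lambda: datetime.date.today().isoformat()}
text = format_with_partials(
    "Today is {date}. What happened on this day in history? Field: {topic}",
    partials,
    topic="technology",
)
print(text)
```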

1.2 Chat Templates (ChatPromptTemplate)

For chat models, use ChatPromptTemplate to build the message list. It manages the system, human, and AI roles cleanly.

```python
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a professional translation assistant. The target language is {target_language}."),
    # MessagesPlaceholder is a special placeholder, typically used to insert
    # the conversation history (a list of messages)
    MessagesPlaceholder("chat_history"),
    ("human", "{text}"),
])

# Simulated conversation history
history = [
    HumanMessage(content="Hello"),
    AIMessage(content="Hello! How can I help you?"),
]

messages = chat_prompt.format_messages(
    target_language="Japanese",
    chat_history=history,
    text="I love programming",
)
# The result is a list of four messages: System, Human, AI, Human
```
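
To see why the result has exactly four messages, here is a stdlib sketch of how a chat template expands: ordinary entries are formatted one by one, while a placeholder splices in an existing list of messages. This is an illustration, not LangChain's code; messages are modeled as hypothetical `(role, content)` tuples rather than LangChain message objects, and `format_chat` is a hypothetical helper.

```python
from typing import List, Tuple, Union

Message = Tuple[str, str]  # (role, content)
# A template entry is either a (role, template_string) pair or a placeholder name
Entry = Union[Tuple[str, str], str]


def format_chat(entries: List[Entry], **kwargs) -> List[Message]:
    """Hypothetical sketch of how a chat template expands into a message list."""
    messages: List[Message] = []
    for entry in entries:
        if isinstance(entry, str):
            # Placeholder: insert the caller's message list (e.g. chat history) as-is
            messages.extend(kwargs[entry])
        else:
            role, template = entry
            # str.format ignores unused keyword arguments such as chat_history
            messages.append((role, template.format(**kwargs)))
    return messages


entries = [
    ("system", "You are a professional translation assistant. The target language is {target_language}."),
    "chat_history",
    ("human", "{text}"),
]
history = [("human", "Hello"), ("ai", "Hello! How can I help you?")]
messages = format_chat(
    entries, target_language="Japanese", chat_history=history, text="I love programming"
)
for role, content in messages:
    print(f"{role}: {content}")
```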

1.3 Few-Shot Prompts

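Few-shot prompting improves a model's performance quickly by giving it a few examples. It is like teaching a child to solve problems: explaining the rules alone (zero-shot) works less well than walking through a few worked examples (few-shot).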

Scenario 1: Static Examples (for a small, fixed set of examples)

```python
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# Assumes `model` is an already-initialized chat model (e.g. ChatOpenAI)

# 1. Prepare the examples
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
]

# 2. Define how a single example is formatted
example_prompt = ChatPromptTemplate.from_messages(
    [("human", "{input}"), ("ai", "{output}")]
)

# 3. Create the few-shot template
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# 4. Assemble the final prompt
final_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an antonym expert. Output only the antonym, with no explanation."),
    few_shot_prompt,  # insert the examples
    ("human", "{input}"),
])

# Invoke
chain = final_prompt | model
print(chain.invoke({"input": "big"}).content)  # e.g. "small" (actual output depends on the model)
```
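
It helps to see what the model actually receives. With the example prompt above, each example becomes a human/AI message pair placed between the system prompt and the real question, so the model sees a short "conversation" it can imitate. A stdlib sketch of that layout; the helper name `build_few_shot_messages` is hypothetical, and messages are modeled as `(role, content)` tuples instead of LangChain message objects.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


def build_few_shot_messages(
    system: str, examples: List[Dict[str, str]], user_input: str
) -> List[Message]:
    """Hypothetical sketch: system prompt, one human/ai pair per example, then the real question."""
    messages: List[Message] = [("system", system)]
    for ex in examples:
        # Each example is rendered as a fake exchange the model can imitate
        messages.append(("human", ex["input"]))
        messages.append(("ai", ex["output"]))
    messages.append(("human", user_input))
    return messages


examples = [{"input": "happy", "output": "sad"}, {"input": "tall", "output": "short"}]
msgs = build_few_shot_messages(
    "You are an antonym expert. Output only the antonym, with no explanation.",
    examples,
    "big",
)
for role, content in msgs:
    print(f"{role}: {content}")
```

Every example adds two messages, which is exactly where the extra token cost of few-shot prompting comes from.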

Scenario 2: Dynamic Example Selection (Example Selector)

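Suppose you have 100 examples but only room for 3 in each prompt. How do you choose the 3 most relevant ones?

This is what an Example Selector backed by a vector database is for.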

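```python
from langchain_chroma import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotChatMessagePromptTemplate
from langchain_openai import OpenAIEmbeddings

# 1. Suppose this is your large example library
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "sunny", "output": "rainy"},
    {"input": "complex", "output": "simple"},
    # ... assume many more
]

# 2. Use a vector store for semantic retrieval
# The examples are embedded and stored in a local Chroma collection.
# input_keys=["input"] embeds only the "input" field; by default all
# fields of each example (input and output) are embedded together.
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),
    Chroma,
    k=1,  # select only the single most similar example each time
    input_keys=["input"],
)

# 3. Build the dynamic prompt
# (example_prompt is the one defined in Scenario 1)
dynamic_few_shot_prompt = FewShotChatMessagePromptTemplate(
    input_variables=["input"],  # values passed to the example selector
    example_selector=example_selector,  # pass a selector instead of an examples list
    example_prompt=example_prompt,
)

# Test: the input "nice weather" should match the "sunny" example rather than "happy"
prompt = dynamic_few_shot_prompt.format(input="nice weather")
print(prompt)
```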
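
Conceptually, a semantic similarity selector embeds every example, embeds the incoming input, and keeps the k examples with the highest cosine similarity. The stdlib sketch below keeps that ranking logic but swaps the real embedding model and Chroma for a hypothetical, hand-made 4-dimensional lookup table (dimensions loosely meaning mood, size, weather, difficulty), so only the selection step is realistic.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_examples(
    query: str,
    examples: List[Dict[str, str]],
    embed: Callable[[str], Sequence[float]],
    k: int = 1,
) -> List[Dict[str, str]]:
    """Return the k examples whose 'input' is most similar to the query."""
    q = embed(query)
    ranked = sorted(examples, key=lambda ex: cosine(q, embed(ex["input"])), reverse=True)
    return ranked[:k]


# Hypothetical toy "embeddings"; a real system would call an embedding model instead
TOY_VECTORS = {
    "happy": [1.0, 0.0, 0.2, 0.0],
    "tall": [0.0, 1.0, 0.0, 0.0],
    "sunny": [0.3, 0.0, 1.0, 0.0],
    "complex": [0.0, 0.0, 0.0, 1.0],
    "nice weather": [0.4, 0.0, 0.9, 0.0],
}

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "sunny", "output": "rainy"},
    {"input": "complex", "output": "simple"},
]
print(select_examples("nice weather", examples, TOY_VECTORS.__getitem__, k=1))
# [{'input': 'sunny', 'output': 'rainy'}]
```

A vector database does the same ranking with an index, so it stays fast when the example library grows far beyond a handful of entries.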

Note: few-shot examples increase token usage. In production, weigh the number of examples (the k value) against API cost.
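
One simple way to cap that cost is to add examples only until a length budget is used up; LangChain ships a `LengthBasedExampleSelector` built on a similar idea. The sketch below is not that class: it is a stdlib approximation (the helper name `select_within_budget` is hypothetical) that uses word counts as a rough stand-in for tokens, since real token counts depend on the model's tokenizer.

```python
from typing import Dict, List


def select_within_budget(examples: List[Dict[str, str]], max_words: int) -> List[Dict[str, str]]:
    """Hypothetical helper: keep examples in order until a rough word budget runs out."""
    selected: List[Dict[str, str]] = []
    used = 0
    for ex in examples:
        # Word count is only a crude proxy for tokens
        cost = len(ex["input"].split()) + len(ex["output"].split())
        if used + cost > max_words:
            break
        selected.append(ex)
        used += cost
    return selected


examples = [
    {"input": "happy", "output": "sad"},
    {"input": "very tall", "output": "quite short"},
    {"input": "sunny", "output": "rainy"},
]
print(len(select_within_budget(examples, max_words=5)))  # 1
print(len(select_within_budget(examples, max_words=8)))  # 3
```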

2. Structured Output

This is an essential skill for building agents. We need the model to do more than chat: it must produce machine-readable data (usually JSON).

2.1 Why Use `with_structured_output`?

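In earlier LangChain code, the usual approach was an output parser such as JsonOutputParser, which relies on instructions injected into the prompt along the lines of "Return JSON only, with no Markdown." This is fragile: the model often drifts from the requested format.

.with_structured_output() instead uses the provider's native capabilities, such as function calling (tools), JSON mode, or schema-based structured output where the model supports them, to constrain the response to your schema. This is far more reliable than prompt instructions alone, although it can still fail occasionally (see section 2.4).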
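
With tool calling, your schema is sent to the provider as a tool (function) definition whose parameters are a JSON Schema, and the model replies with arguments for that tool. LangChain performs this conversion for Pydantic models itself (see `langchain_core.utils.function_calling.convert_to_openai_tool`). The sketch below is a simplified stdlib version of the idea, using a dataclass as a stand-in for the Pydantic `Joke` model in section 2.2; the output follows the OpenAI `tools` format, other providers wrap it differently, and `to_tool_definition` is a hypothetical helper.

```python
import json
from dataclasses import MISSING, dataclass, field, fields
from typing import Optional, get_type_hints

# Simplified mapping from Python types to JSON Schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Joke:
    """A joke to tell the user."""

    setup: str = field(metadata={"description": "The setup of the joke"})
    punchline: str = field(metadata={"description": "The punchline of the joke"})
    rating: Optional[int] = field(
        default=None, metadata={"description": "How funny the joke is, from 1 to 10"}
    )


def to_tool_definition(cls) -> dict:
    """Hypothetical helper: describe a dataclass as an OpenAI-style tool definition."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for f in fields(cls):
        tp = hints[f.name]
        # Optional[X] is Union[X, None]: keep X (simplification, no nullable marker)
        args = [a for a in getattr(tp, "__args__", ()) if a is not type(None)]
        base = args[0] if args else tp
        properties[f.name] = {
            "type": _JSON_TYPES[base],
            "description": f.metadata.get("description", ""),
        }
        # Fields without a default value are required
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    return {
        "type": "function",
        "function": {
            "name": cls.__name__,
            "description": (cls.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


print(json.dumps(to_tool_definition(Joke), indent=2))
```

This is also why field descriptions matter: they travel to the model inside the schema.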

2.2 Basic Usage: Extracting a Pydantic Object

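```python
from typing import List, Optional

from pydantic import BaseModel, Field

# Assumes `model` is an already-initialized chat model that supports tool calling

# 1. Define the data structure (schema)
# The description fields matter: they are the "instructions" the model reads
class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")

# 2. Bind the schema
structured_llm = model.with_structured_output(Joke)

# 3. Invoke
# invoke now returns an instance of Joke instead of an AIMessage
result = structured_llm.invoke("Tell me a joke about programmers")

print(f"Type: {type(result)}")  # <class '__main__.Joke'> when run as a script
print(f"Punchline: {result.punchline}")
```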
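
When the provider answers with a tool call, the arguments typically arrive as a JSON string, and the parsing step validates them against your schema before handing you an object. The stdlib sketch below mimics that step, with a dataclass standing in for the Pydantic model and hand-written type checks standing in for Pydantic validation; `parse_joke` is a hypothetical helper.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Joke:
    # Stand-in for the Pydantic Joke model (no Pydantic needed for this sketch)
    setup: str
    punchline: str
    rating: Optional[int] = None


def parse_joke(arguments: str) -> Joke:
    """Hypothetical helper: validate tool-call arguments (a JSON string) and build a Joke."""
    data = json.loads(arguments)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name in ("setup", "punchline"):
        if not isinstance(data.get(name), str):
            raise ValueError(f"field {name!r} must be a string")
    rating = data.get("rating")
    # bool is a subclass of int in Python, so reject it explicitly
    if rating is not None and (not isinstance(rating, int) or isinstance(rating, bool)):
        raise ValueError("field 'rating' must be an integer or null")
    return Joke(setup=data["setup"], punchline=data["punchline"], rating=rating)


joke = parse_joke(
    '{"setup": "Why do programmers prefer dark mode?", "punchline": "Because light attracts bugs."}'
)
print(type(joke).__name__, joke.rating)
```

Note that a field description such as "1-10" is only a hint to the model; nothing above (or in a plain Pydantic Field with just a description) enforces that range.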

2.3 Advanced Usage: Nested Structures and Lists

Real-world tasks often require extracting more complex data, such as a reading list.

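```python
from typing import List

from pydantic import BaseModel, Field

# Child object
class Book(BaseModel):
    title: str = Field(description="Book title")
    author: str = Field(description="Author")

# Parent object
class BookList(BaseModel):
    books: List[Book] = Field(description="List of recommended books")
    summary: str = Field(description="Summary of why these books are recommended")

# Bind the schema (assumes `model` is the same chat model as above)
structured_llm = model.with_structured_output(BookList)

query = "Please recommend two science fiction novels and give a short recommendation."
result = structured_llm.invoke(query)

for book in result.books:
    print(f"Title: {book.title} | Author: {book.author}")
print(f"Summary: {result.summary}")
```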
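
Nesting works because the generated JSON Schema nests too: `books` becomes an array whose items are `Book` objects, and parsing rebuilds the object tree level by level. A stdlib sketch of that rebuild, using dataclasses as stand-ins for the Pydantic models; `parse_book_list` is a hypothetical helper, and the JSON payload is made-up sample data, not real model output.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Book:
    title: str
    author: str


@dataclass
class BookList:
    books: List[Book]
    summary: str


def parse_book_list(arguments: str) -> BookList:
    """Hypothetical helper: build nested objects from JSON shaped like the BookList schema."""
    data = json.loads(arguments)
    # Each element of the "books" array becomes its own Book instance
    books = [Book(title=b["title"], author=b["author"]) for b in data["books"]]
    return BookList(books=books, summary=data["summary"])


# Sample payload for illustration only
payload = json.dumps({
    "books": [
        {"title": "The Three-Body Problem", "author": "Liu Cixin"},
        {"title": "Dune", "author": "Frank Herbert"},
    ],
    "summary": "Two widely read science fiction classics.",
})
result = parse_book_list(payload)
for book in result.books:
    print(f"Title: {book.title} | Author: {book.author}")
print(f"Summary: {result.summary}")
```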

2.4 Error Handling and Fallback Strategies

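.with_structured_output() is powerful, but it can still fail occasionally, for example when the model returns JSON that cannot be parsed or does not match the schema. To keep the system robust, pass include_raw=True to get the raw response alongside the parsed result, or fall back to a JsonOutputParser-based chain.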

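```python
# Get both the raw response and the parsed result
structured_llm_raw = model.with_structured_output(Joke, include_raw=True)
result = structured_llm_raw.invoke("Tell me a joke")

# result is a dict with the keys "raw", "parsed", and "parsing_error"
print(result["parsed"])         # the parsed Pydantic object (None if parsing failed)
print(result["raw"])            # the original AIMessage
print(result["parsing_error"])  # the exception raised during parsing, or None
```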

If parsed is None, parsing failed: log the raw response for debugging, or trigger retry logic.
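
The retry-then-fallback logic can be sketched without any model at all. The function below accepts any zero-argument callable that returns a dict shaped like the include_raw=True output (`raw`, `parsed`, `parsing_error`); the names `invoke_with_retry` and `fallback` are hypothetical, and in a real chain `call` would wrap something like `lambda: structured_llm_raw.invoke(...)`. LangChain runnables also offer `.with_retry()` and `.with_fallbacks()`, but those react to raised exceptions, whereas with include_raw=True a parsing failure is returned in the dict rather than raised, so a check like this one is still needed.

```python
from typing import Any, Callable, Dict, Optional

# Shape of the include_raw=True result: {"raw": ..., "parsed": ..., "parsing_error": ...}
StructuredResult = Dict[str, Any]


def invoke_with_retry(
    call: Callable[[], StructuredResult],
    fallback: Callable[[], Any],
    max_attempts: int = 3,
    log: Optional[Callable[[str], None]] = None,
) -> Any:
    """Hypothetical helper: retry a structured call, log raw output on failure, then fall back."""
    for attempt in range(1, max_attempts + 1):
        result = call()
        if result.get("parsing_error") is None and result.get("parsed") is not None:
            return result["parsed"]
        if log is not None:
            # Keep the raw response: it is the only way to see what the model actually said
            log(f"attempt {attempt} failed: {result.get('parsing_error')!r}; raw={result.get('raw')!r}")
    return fallback()


# Usage with a fake sequence of responses: the first fails, the second succeeds
responses = iter([
    {"raw": "not json", "parsed": None, "parsing_error": ValueError("bad json")},
    {"raw": '{"ok": true}', "parsed": {"ok": True}, "parsing_error": None},
])
logs = []
value = invoke_with_retry(lambda: next(responses), fallback=lambda: "fallback", log=logs.append)
print(value, len(logs))
```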

Summary

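- Manage prompts: don't concatenate strings by hand; use PromptTemplate and ChatPromptTemplate to manage variables and roles cleanly.
- Few-shot: when the model doesn't follow your instructions, a few examples (static or dynamically selected) are often the most efficient way to improve results.
- Structured output: Pydantic plus .with_structured_output() is a powerful way to turn unstructured text into structured data; it lets you call an LLM like an API function that returns typed results.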

In the next article, we will give the AI "memory" and "hands" (tool calling), a key step toward building agents.
