(Tutorial) Building Agent Prompts 4. JsonOutputParser

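JsonOutputParser is a tool that lets you specify the JSON schema you want. Its purpose is to have a large language model (LLM) look up information and return the result as JSON that conforms to that schema. To follow the schema accurately and efficiently, the model needs sufficient capacity. For example, the llama-70B model has more capacity than llama-8B and is therefore better suited to complex data.

[Note] JSON (JavaScript Object Notation) is a lightweight data-interchange format for storing and structuring data. It plays an essential role in web development and is widely used for communication between servers and clients. JSON is text based, so it is easy for people to read and for machines to parse and generate.

JSON data is made up of key-value pairs. The "key" is a string, and the "value" can be any of several data types. JSON has two main structures: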

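Object: a set of key-value pairs enclosed in curly braces { }. Each key is joined to its value by a colon (:), and the pairs are separated by commas (,).
Array: an ordered list of values enclosed in square brackets [ ]. The values are separated by commas (,).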


{
  "name": "John Doe",
  "age": 30,
  "is_student": false,
  "skills": ["Java", "Python", "JavaScript"],
  "address": {
    "street": "123 Main St",
    "city": "Anytown"
  }
}
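
To make the two structures concrete, here is a small sketch that parses the example above with Python's standard `json` module. The variable names `raw` and `data` are only illustrative.

```python
import json

# The example document from above, as a JSON string.
raw = """
{
  "name": "John Doe",
  "age": 30,
  "is_student": false,
  "skills": ["Java", "Python", "JavaScript"],
  "address": {"street": "123 Main St", "city": "Anytown"}
}
"""

data = json.loads(raw)

# A JSON object becomes a Python dict; a JSON array becomes a list.
print(type(data).__name__)            # dict
print(type(data["skills"]).__name__)  # list
print(data["address"]["city"])        # Anytown
print(data["is_student"])             # False (JSON false maps to Python False)
```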

Defining the model


from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from dotenv import load_dotenv
import os

load_dotenv()

Qwen2_5_7B_Instruct_llm = ChatOpenAI(
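    temperature=0.1,  # Controls randomness: lower values give more stable, predictable output; higher values give more varied output that may drift from what you expect (range: 0.0 to 2.0)
    model_name="Qwen/Qwen2.5-7B-Instruct",  # Model name as listed on SiliconFlow
    openai_api_key=os.getenv("SILICONFLOW_API_KEY"),  # Read the API key from an environment variable
    openai_api_base="https://api.siliconflow.cn/v1"  # Base URL of the SiliconFlow API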
)

Defining the output schema


class Topic(BaseModel):
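    # Field description: "A brief description of the topic"
    description: str = Field(description="主题的简要描述")
    # Field description: "Keywords in hashtag format (at least 2)"
    hashtags: str = Field(description="关键词的标签格式（至少2个）")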

Setting up the parser with `JsonOutputParser` and injecting its instructions into the prompt template


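# Question: "Please explain the severity of global warming."
question = "请解释全球变暖的严重性。"

# Set up the parser; its format instructions are injected into the prompt template later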
parser = JsonOutputParser(pydantic_object=Topic)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]} the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
{"properties": {"description": {"description": "主题的简要描述", "title": "Description", "type": "string"}, "hashtags": {"description": "关键词的标签格式（至少2个）", "title": "Hashtags", "type": "string"}}, "required": ["description", "hashtags"]}

Building the prompt template


from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        # System message: "You are a friendly AI assistant. Please answer questions concisely."
        ("system", "你是一个友好的AI助手。请简洁回答问题。"),
        ("user", "#Format: {format_instructions}\n\n#Question: {question}"),
    ]
)
prompt = prompt.partial(format_instructions=parser.get_format_instructions())

chain = prompt | Qwen2_5_7B_Instruct_llm | parser
answer = chain.invoke({"question": question})
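
The last step of the chain turns the model's text reply into a Python dict. The sketch below illustrates that idea using only the standard library. It is a simplified stand-in, not LangChain's actual implementation, and `parse_json_reply` and `FENCE` are hypothetical names introduced here. It assumes the model may wrap its JSON in a Markdown code fence, which chat models often do.

```python
import json
import re

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker


def parse_json_reply(text: str) -> dict:
    """Hypothetical helper: extract and parse JSON from a model reply."""
    # Look for an optional fenced block, with or without a "json" label.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    # Fall back to the whole reply when no fence is present.
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


fenced_reply = FENCE + 'json\n{"description": "demo", "hashtags": "#a #b"}\n' + FENCE
print(parse_json_reply(fenced_reply)["hashtags"])  # #a #b
print(parse_json_reply('{"description": "plain"}'))  # {'description': 'plain'}
```

If the reply is not valid JSON, `json.loads` raises `json.JSONDecodeError`, so a real pipeline would want to catch that and retry or report the failure.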

Parsing the output


type(answer)

dict


answer

{'description': '全球变暖导致极端天气事件频发，冰川融化，海平面上升，生态系统受到威胁，农业生产受到影响，人类健康面临风险，经济遭受损失。', 'hashtags': '#全球变暖 #极端天气 #冰川融化 #海平面上升 #生态系统 #农业生产 #人类健康 #经济损失'}
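
Because the result is a plain dict, it can be worth checking that the expected fields are actually present before using them. The following is a minimal sketch using only the standard library. `REQUIRED_FIELDS` and `find_problems` are hypothetical names, and the sample dict is a shortened stand-in for the output above.

```python
# Expected fields and types, mirroring the Topic model defined earlier.
REQUIRED_FIELDS = {"description": str, "hashtags": str}


def find_problems(result: dict) -> list:
    """Hypothetical helper: list missing or wrongly typed fields."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in result:
            problems.append(f"missing field: {name}")
        elif not isinstance(result[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


sample = {"description": "全球变暖导致极端天气事件频发", "hashtags": "#全球变暖 #极端天气"}
print(find_problems(sample))                               # []
print(find_problems({"description": "only a description"}))  # ['missing field: hashtags']
```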

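`JsonOutputParser` without a schema

You can also produce JSON output without using Pydantic. With no schema attached, the parser's format instructions only ask for a JSON object, so the field names have to be spelled out in the question itself.

Proceed as follows: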


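# Write your question
# "Please provide information about global warming. Put the explanation in description and the related keywords in `hashtags`."
question = "请提供关于全球变暖的信息。在description中包含解释说明，在`hashtags`中包含相关关键词。"

# Initialize JsonOutputParser
parser = JsonOutputParser()

# Set up the prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "我是一个友好的AI助手。我会简明扼要地回答问题。"),
        ("user", "#格式: {format_instructions}\n\n#问题: {question}"),
    ]
)

# Inject the format instructions into the prompt
prompt = prompt.partial(format_instructions=parser.get_format_instructions())

# Combine the prompt, the model, and JsonOutputParser into a chain
chain = prompt | Qwen2_5_7B_Instruct_llm | parser

# Run the chain with your question
response = chain.invoke({"question": question})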
print(response)

{'description': '全球变暖是指地球表面平均温度的长期上升趋势，主要由温室气体（如二氧化碳、甲烷）的增加引起。这些气体在大气中形成一层"毯子"，捕获太阳热量，导致地球温度升高。全球变暖导致极端天气事件增多、冰川融化、海平面上升等问题，对生态系统和人类社会造成严重影响。', 'hashtags': ['全球变暖', '气候变化', '温室效应', '环境保护', '可持续发展']}
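
Note that the two runs return `hashtags` in different shapes: a single space-separated string in the first example and a list in the second. If downstream code needs one consistent shape, a small normalization step can help. The sketch below is one possible approach, and `normalize_hashtags` is a hypothetical helper name.

```python
def normalize_hashtags(value) -> list:
    """Hypothetical helper: return hashtags as a list of strings without '#'."""
    # A string such as "#a #b" is split on whitespace; a list is used as-is.
    items = value.split() if isinstance(value, str) else list(value)
    # Drop the leading '#' and skip entries that are empty after stripping it.
    return [item.lstrip("#") for item in items if item.strip("#")]


print(normalize_hashtags("#全球变暖 #极端天气"))     # ['全球变暖', '极端天气']
print(normalize_hashtags(["全球变暖", "气候变化"]))  # ['全球变暖', '气候变化']
```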
