Prompt Engineering in Practice: System Prompt Design, Few-shot, and Chain-of-Thought

Contents

- 1. Rephrase the Question, Get a Completely Different Result
- 2. The Three Elements of a Prompt and How They Are Weighted
  - 2.1 The Three-Element Architecture
  - 2.2 How the Three Elements Differ in Weight
- 3. System Prompt Design Patterns
  - 3.1 A Four-Part System Prompt Template
  - 3.2 Graduated Role Definitions
  - 3.3 Three Levels of Output Format Constraint
- 4. Few-shot Prompting: Guiding Pattern Inference with Examples
  - 4.1 The Core Idea Behind Few-shot
  - 4.2 Strategies for Choosing Examples
  - 4.3 Zero-shot vs Few-shot vs CoT: How the Results Differ
- 5. Chain-of-Thought: Making the Model "Think Step by Step"
  - 5.1 Why CoT Works
  - 5.2 Zero-shot CoT vs Few-shot CoT
- 6. The ReAct Pattern: A Loop of Reasoning and Acting
  - 6.1 The Core ReAct Loop
  - 6.2 Designing the ReAct Prompt
  - 6.3 Parsing ReAct Output with Regular Expressions
- 7. Prompt Templating: Managing Prompts with Jinja2
  - 7.1 The Jinja2 Template Engine
  - 7.2 Prompt Version Management and Unit Tests
- 8. Structured Output Control: 100% Parseable
  - 8.1 JSON Mode + Pydantic Parsing
  - 8.2 Comparing the Reliability of Structured Output Methods
- 9. Prompt A/B Testing and Iteration
  - 9.1 Systematic Prompt Evaluation
  - 9.2 Designing Evaluation Metrics
- 10. In Practice: A Prompt Template Library for Python Code Review
  - 10.1 The Complete Prompt Template
  - 10.2 The Complete Calling Code
  - 10.3 A Typical Output
- 11. Summary

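1. Rephrase the Question, Get a Completely Different Result

The same model, given the same problem, can produce entirely different answers depending on how the question is asked. Tell GPT-4o to simply "review this code" and it may return a vague, essay-style reply. Give it an explicit role ("You are a Python code reviewer with 10 years of experience"), an output format constraint ("Return JSON with the fields severity, line, and message"), and a review checklist ("Check for null references, SQL injection, and resource leaks"), and the output changes from prose into a structured report that can be parsed directly.

Prompt engineering is not about "being nice to the AI"; it is about using structured constraints to reduce the uncertainty of LLM output. A good System Prompt = role definition + output format + constraints + examples. This article builds an engineering-oriented approach to prompt management along eight dimensions: the elements of a prompt, System Prompt design patterns, Few-shot, Chain-of-Thought, ReAct, template management, structured output, and A/B testing.

2. The Three Elements of a Prompt and How They Are Weighted

2.1 The Three-Element Architecture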

[Figure: structure of a complete prompt. System Message (highest weight): role ("You are a senior Python engineer"), output format ("Return JSON"), constraints ("Do not explain, return only the result"). User Message (medium weight): the specific task ("Review this code"), context (code content / document excerpts). Assistant Message (lowest weight): conversation history (few-shot examples), chain-of-thought traces (the model's own reasoning).]

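The System Message generally has the strongest influence on the model: it defines the "global rules" for the whole conversation. The User Message supplies the specific task, while Assistant Messages carry context across turns in a multi-turn conversation or serve as few-shot examples.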

messages = [
    {
        "role": "system",
        "content": "You are a professional Python code reviewer. Return the review strictly in the following JSON format: "
                   '[{"severity": "high|medium|low", "line": <line number>, "message": "<description of the issue>"}]'
    },
    {
        "role": "user",
        "content": "Please review the following code:\n\ndef get_user(user_id):\n    query = f\"SELECT * FROM users WHERE id = {user_id}\"\n    return db.execute(query)"
    }
]

# The System Message defines the "global rules": role + format
# The User Message defines the "specific task": review this code

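2.2 How the Three Elements Differ in Weight

Element	Scope	Strength of influence	When it is used
System Message	Entire conversation	Highest: defines the global rules	Sent with every request
User Message	Current turn	Medium: supplies the specific input	Every user turn
Assistant Message	Previous turns	Low: supplies context and examples	Multi-turn conversations / few-shot

Engineering tip: do not pack a long task description into the System Message. It is better suited to rules and formats, while the specific task belongs in the User Message. As a rule of thumb, once the System Message grows beyond roughly 2000 tokens, the model may start to "forget" some of the constraints in it.

3. System Prompt Design Patterns

3.1 A Four-Part System Prompt Template

In day-to-day engineering practice, a complete System Prompt should contain four parts: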

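SYSTEM_PROMPT_TEMPLATE = """# Role
{role}

# Task
{task}

# Output format
{output_format}

# Constraints
{constraints}
"""

# Example: Python code review
system_prompt = SYSTEM_PROMPT_TEMPLATE.format(
    role="You are a senior Python engineer with 10 years of experience, focused on code quality and security review.",
    task="Review the Python code provided by the user and identify potential bugs, security vulnerabilities, performance problems, and style violations.",
    output_format='''Return the review strictly as a JSON array in the following format, with no other content:
[
  {
    "severity": "high|medium|low",
    "category": "security|bug|performance|style",
    "line": <integer line number>,
    "message": "<description of the issue and a suggested fix>"
  }
]''',
    constraints="""1. Return JSON only; do not wrap it in a Markdown code block
2. If no issues are found, return an empty array []
3. Criteria for severity:
   - high: causes crashes, data loss, or security vulnerabilities
   - medium: may cause undefined behavior or performance problems
   - low: style issues or best-practice suggestions"""
)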

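3.2 Graduated Role Definitions

A role definition is not better simply because it is more detailed; it needs the right level of granularity. An overly specific role can box the model into unnecessary constraints, while an overly vague one gives it no useful guidance.

Granularity	Example	Suitable for
Minimal	"You are an assistant"	General Q&A
Medium	"You are a Python code reviewer"	A specific task
Detailed	"You are a senior Python engineer with 10 years of experience who has worked at Google and Dropbox, focused on code quality and security review. Your reviews are strict but constructive, and every suggestion comes with concrete fix code."	High-quality professional output

Experimental finding: adding "years of experience" and a "well-known company background" to the role can improve output quality, an effect described here as "role anchoring". Role descriptions longer than two sentences show diminishing returns, though; from the third sentence on, quality barely improves.

3.3 Three Levels of Output Format Constraint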

[Figure: three levels of output format constraint. Level 1, natural-language description ("Return the result as JSON"): format compliance ~70%. Level 2, JSON Schema (defines field types and required fields): format compliance ~90%. Level 3, code example (shows a sample of the expected output): format compliance ~95%. Combined with response_format: up to ~99%.]

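Level 1 is the simplest but the least reliable. Level 2 uses a JSON Schema to define the structure precisely. Level 3 guides the model with a concrete output example, which is the core idea behind few-shot prompting.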

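# Level 3: constrain the format with a concrete example (the most reliable of the three)
system_prompt = """You are a data extraction assistant. Extract the entities from the text the user provides and return them in the following format:

Example input: "Zhang San bought an iPhone 15 in May 2024"
Example output:
{
  "person": "Zhang San",
  "date": "2024-05",
  "product": "iPhone 15"
}

Return JSON only, with no other content."""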
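
Level 2 can be illustrated in the same spirit. The following is a minimal sketch, assuming a hand-written JSON Schema for the same extraction task. The names `ENTITY_SCHEMA`, `build_schema_prompt`, and `check_required` are hypothetical, and the checker covers only required keys, unexpected keys, and string types rather than the full JSON Schema specification.

```python
import json

# Hypothetical JSON Schema for the entity-extraction task (an assumption,
# not part of the original article).
ENTITY_SCHEMA = {
    "type": "object",
    "properties": {
        "person": {"type": "string"},
        "date": {"type": "string", "description": "YYYY-MM"},
        "product": {"type": "string"},
    },
    "required": ["person", "date", "product"],
    "additionalProperties": False,
}


def build_schema_prompt(schema: dict) -> str:
    """Embed the schema in a system prompt (a level 2 constraint)."""
    return (
        "You are a data extraction assistant. Return a single JSON object "
        "that conforms to this JSON Schema, with no other content:\n"
        + json.dumps(schema, indent=2, ensure_ascii=False)
    )


def check_required(raw: str, schema: dict) -> list:
    """Return a list of problems; an empty list means the reply passed.

    Deliberately minimal: parseability, required keys, unexpected keys,
    and string types only.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key in schema["required"]:
        if key not in data:
            problems.append(f"missing field: {key}")
    for key, value in data.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"field {key} should be a string")
    return problems


system_prompt = build_schema_prompt(ENTITY_SCHEMA)
good = '{"person": "Zhang San", "date": "2024-05", "product": "iPhone 15"}'
bad = '{"person": "Zhang San", "date": 2024}'
print(check_required(good, ENTITY_SCHEMA))
print(check_required(bad, ENTITY_SCHEMA))
```

Keeping the schema as data rather than prose means the same object can drive both the prompt and the post-hoc check, so the two cannot drift apart.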

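4. Few-shot Prompting: Guiding Pattern Inference with Examples

4.1 The Core Idea Behind Few-shot

The core idea of few-shot prompting is to give the model two or three input-output examples. The model infers the pattern from them and applies the same pattern to new input. This is more reliable than describing the rule in natural language alone, because the model "imitates" rather than "understands".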

few_shot_messages = [
    {
        "role": "system",
        "content": "Classify the user's sentiment as: positive / neutral / negative"
    },
    # Example 1
    {
        "role": "user",
        "content": "This product is fantastic, far better than I expected!"
    },
    {
        "role": "assistant",
        "content": "positive"
    },
    # Example 2
    {
        "role": "user",
        "content": "The parcel arrived and the packaging was intact."
    },
    {
        "role": "assistant",
        "content": "neutral"
    },
    # Example 3
    {
        "role": "user",
        "content": "It took two weeks to ship, and customer service kept brushing me off."
    },
    {
        "role": "assistant",
        "content": "negative"
    },
    # The actual request
    {
        "role": "user",
        "content": "Great value for money, I would recommend it to friends."
    }
]
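
Writing these turns out by hand gets tedious once the example set changes often. The helper below is a small sketch of how the same message list could be assembled from (input, label) pairs; `build_few_shot_messages` is a hypothetical name, and the message dictionaries simply mirror the structure shown above.

```python
from typing import Dict, List, Tuple


def build_few_shot_messages(
    system_prompt: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> List[Dict[str, str]]:
    """Turn (input, label) pairs into alternating user/assistant turns."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, label in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": label})
    # The real request always goes last, as a user turn.
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("This product is fantastic, far better than I expected!", "positive"),
    ("The parcel arrived and the packaging was intact.", "neutral"),
    ("It took two weeks to ship and support kept brushing me off.", "negative"),
]
messages = build_few_shot_messages(
    "Classify the user's sentiment as: positive / neutral / negative",
    examples,
    "Great value for money, I would recommend it to friends.",
)
print(len(messages))  # 1 system + 3 * 2 example turns + 1 query = 8
```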

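4.2 Strategies for Choosing Examples

The quality of the examples matters more than their number. Three carefully chosen examples often beat ten random ones.

Strategy	Description	Suitable for
Diversity	Cover different classes and edge cases	Classification tasks
Representativeness	Pick the most typical samples	General tasks
Boundary cases	Include samples that are easy to confuse	Tasks with high precision requirements
Increasing difficulty	Order from simple to complex	Complex reasoning tasks

Key finding: the order of the examples affects model performance. Put the clearest, least ambiguous examples first so the model can latch onto the right pattern more easily, and place ambiguous or boundary cases later as the "advanced" material.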
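
The diversity and ordering advice above can be sketched in a few lines. This is an illustration under assumptions: the `clarity` score is a hypothetical, hand-assigned number in [0, 1], and `select_examples` is not from any library.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    text: str
    label: str
    clarity: float  # hypothetical score in [0, 1]; 1.0 = completely unambiguous


def select_examples(pool: List[Example], per_label: int = 1) -> List[Example]:
    """Pick the clearest examples for each label, then order clear-to-ambiguous."""
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for ex in pool:
        by_label[ex.label].append(ex)

    chosen: List[Example] = []
    for label_examples in by_label.values():
        # Diversity: every label contributes up to `per_label` examples.
        label_examples.sort(key=lambda ex: ex.clarity, reverse=True)
        chosen.extend(label_examples[:per_label])

    # Ordering: clearest first, borderline cases last.
    chosen.sort(key=lambda ex: ex.clarity, reverse=True)
    return chosen


pool = [
    Example("Absolutely love it!", "positive", 0.95),
    Example("Not bad, I guess.", "positive", 0.40),
    Example("It arrived on Tuesday.", "neutral", 0.90),
    Example("Broke after one day.", "negative", 0.92),
    Example("Could be better.", "negative", 0.45),
]
selected = select_examples(pool, per_label=2)
print([ex.text for ex in selected])
```

With `per_label=2` the borderline samples ("Could be better.", "Not bad, I guess.") are still included, but they land at the end of the list, after the unambiguous ones.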

4.3 Zero-shot vs Few-shot vs CoT: How the Results Differ

[Figure: choosing a technique by task complexity. Simple classification (sentiment / topic): zero-shot is enough, accuracy 85%+. Complex reasoning (math / logic): few-shot + CoT, accuracy improves by 15-30%. Multi-step tasks (agents / tool calls): the ReAct pattern, decomposed into Thought-Action-Observation.]

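5. Chain-of-Thought: Making the Model "Think Step by Step"

5.1 Why CoT Works

The core instruction of Chain-of-Thought (CoT) is simple: add "please think step by step" or "Let's think step by step" to the prompt. This seemingly trivial trick can raise accuracy on mathematical and logical reasoning tasks by 15-30%.

The underlying reason CoT works is that LLM generation is autoregressive: each token is generated conditioned on all the tokens before it. When the model is explicitly asked to "think step by step", it writes out its intermediate reasoning steps, and those steps in turn give the later steps richer context, which reduces the errors caused by skipping steps.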

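# `client` and `Message` are assumed to come from the unified LLM client layer
# covered in the earlier articles of this series.

# Without CoT: ask directly
result = client.chat([
    Message(role="user", content="17 × 24 = ?")
])
# The model may give the answer straight away, with a higher error rate

# With CoT: ask the model to show its reasoning
result = client.chat([
    Message(role="user", content="17 × 24 = ? Work through it step by step and show the reasoning for each step.")
])
# The model will output:
# 17 × 24 = 17 × (20 + 4) = 17×20 + 17×4 = 340 + 68 = 408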

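5.2 Zero-shot CoT vs Few-shot CoT

Approach	Method	Effect	Suitable for
Zero-shot CoT	Append "please think step by step" to the prompt	Moderate improvement	General reasoning tasks
Few-shot CoT	Include detailed reasoning steps in the examples	Largest improvement	Specific kinds of complex reasoning
Self-Consistency	Sample several times and take the majority answer	Further improvement	Tasks whose answers can be verified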

# Few-shot CoT example
few_shot_cot = [
    Message(role="system", content="You are a math assistant. Show your complete reasoning."),
    Message(role="user", content="15 × 13 = ?"),
    Message(role="assistant", content="""Let me work through this step by step:
15 × 13
= 15 × (10 + 3)
= 15 × 10 + 15 × 3
= 150 + 45
= 195
So the answer is 195."""),
    Message(role="user", content="23 × 17 = ?")
]
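
Self-Consistency, the third row of the table, has no example above, so here is a rough sketch of the idea. It assumes the final answer is the last integer in each reply, and it replaces the model with canned replies; `extract_final_number` and `self_consistent_answer` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Callable, List, Optional


def extract_final_number(reasoning: str) -> Optional[str]:
    """Take the last integer in a chain-of-thought reply as its final answer."""
    numbers = re.findall(r"-?\d+", reasoning)
    return numbers[-1] if numbers else None


def self_consistent_answer(
    sample_fn: Callable[[], str], n_samples: int = 5
) -> Optional[str]:
    """Sample several reasoning paths and return the majority answer."""
    answers: List[str] = []
    for _ in range(n_samples):
        answer = extract_final_number(sample_fn())
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Stand-in for a model sampled at temperature > 0: canned replies for 23 × 17.
canned = iter([
    "23 × 17 = 23 × 10 + 23 × 7 = 230 + 161 = 391",
    "23 × 17 = 20 × 17 + 3 × 17 = 340 + 51 = 391",
    "23 × 17 = 23 × 20 - 23 × 3 = 460 - 59 = 401",  # deliberate arithmetic slip
])
print(self_consistent_answer(lambda: next(canned), n_samples=3))  # 391
```

The vote only helps when answers can be compared mechanically, which is why the table restricts this approach to tasks with verifiable answers.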

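6. The ReAct Pattern: A Loop of Reasoning and Acting

6.1 The Core ReAct Loop

ReAct (Reasoning + Acting) is a pattern that lets an LLM interact with external tools. The core idea is that the LLM does not answer directly; instead it cycles through three steps, "think → act → observe", until it has enough information to give a final answer.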

[Figure: the ReAct loop. User question → Thought (analyze the problem) → "Need a tool?" If yes: Action (call the tool) → Observation (get the result) → back to Thought. If no: Final Answer (give the answer).]

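6.2 Designing the ReAct Prompt

REACT_PROMPT = """You are an intelligent assistant with access to the following tools:

Tools:
1. search(query: str) - search engine; returns summaries of relevant web pages
2. calculator(expression: str) - calculator; returns the value of a math expression
3. weather(city: str) - weather lookup; returns the weather for the given city

Think and work in the following format:

Question: the user's question
Thought: analyze the current state and decide on the next action
Action: tool_name[argument]
Observation: the result returned by the tool
...
(repeat Thought-Action-Observation until you have enough information)
Thought: I now have enough information to answer
Final Answer: the reply to the user

Begin!

Question: {question}
Thought:"""

# Usage example
question = "Is it cold enough in Beijing today to wear a down jacket?"
prompt = REACT_PROMPT.format(question=question)

# Example model output:
# Thought: I need to look up the weather in Beijing
# Action: weather["Beijing"]
# Observation: Beijing is sunny today, -2 to 8°C, north wind force 3
# Thought: The temperature is low, so a down jacket is appropriate
# Final Answer: Beijing is cold today at -2 to 8°C; a down jacket is recommended.

6.3 Parsing ReAct Output with Regular Expressions

import re
from typing import Dict

def parse_react_output(output: str) -> Dict:
    """Parse model output that follows the ReAct format."""
    thoughts = re.findall(r'Thought:\s*(.+)', output)
    actions = re.findall(r'Action:\s*(\w+)\[(.+?)\]', output)
    observations = re.findall(r'Observation:\s*(.+)', output)
    # DOTALL lets a final answer span several lines
    final_answer = re.search(r'Final Answer:\s*(.+)', output, re.DOTALL)

    return {
        "thoughts": thoughts,
        "actions": actions,  # [(tool_name, argument), ...]
        "observations": observations,
        "final_answer": final_answer.group(1).strip() if final_answer else None
    }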
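
Parsing is only half of the loop; something still has to call the tool and feed the observation back. The sketch below shows one way this could look, with a scripted stand-in for the model and a fake `weather` tool. `run_react` and its English "Thought / Action / Observation / Final Answer" labels are assumptions made for illustration. The split on "Observation:" approximates a stop sequence, so that any observation the model invents is discarded before the real tool result is appended.

```python
import re
from typing import Callable, Dict

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)


def run_react(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Drive the Thought -> Action -> Observation loop until a final answer."""
    transcript = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        # Drop anything after a model-written "Observation:"; only real
        # tool results may appear there.
        completion = llm(transcript).split("Observation:")[0]
        transcript += completion
        final = FINAL_RE.search(completion)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(completion)
        if action is None:
            return "Stopped: the model produced neither an action nor an answer."
        name, argument = action.group(1), action.group(2).strip("\"' ")
        tool = tools.get(name)
        observation = tool(argument) if tool else f"unknown tool: {name}"
        # Feed the real tool result back and ask for the next thought.
        transcript += f"\nObservation: {observation}\nThought:"
    return "Stopped: step limit reached."


# Scripted stand-in for the model: one tool call, then the final answer.
scripted = iter([
    ' I need the weather in Beijing.\nAction: weather["Beijing"]',
    " It is cold enough for a down jacket.\nFinal Answer: Yes, a down jacket is a good idea.",
])
tools = {"weather": lambda city: f"{city}: sunny, -2 to 8°C, north wind force 3"}
print(run_react(lambda _prompt: next(scripted), tools,
                "Is a down jacket appropriate in Beijing today?"))
```

The `max_steps` cap is there because a model that keeps requesting tools would otherwise loop forever.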

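7. Prompt Templating: Managing Prompts with Jinja2

7.1 The Jinja2 Template Engine

Prompts hard-coded as Python strings are hard to maintain: they are awkward to version, reuse, and test. The Jinja2 template engine separates prompts from code, which makes "prompts as code" practical.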

from jinja2 import Environment, FileSystemLoader

# Load the template
env = Environment(loader=FileSystemLoader("prompts/"))
template = env.get_template("code_review.j2")

# Render the template
prompt = template.render(
    role="Python code reviewer",
    language="Python",
    rules=[
        "Check for SQL injection vulnerabilities",
        "Check for null references",
        "Check for resource leaks"
    ],
    code="def get_user(user_id):\n    ..."
)

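{# prompts/code_review.j2 #}
You are a {{ role }} and a senior {{ language }} engineer with 10 years of experience.

## Review rules
{% for rule in rules %}
- {{ rule }}
{% endfor %}

## Output format
Return the result strictly in the following JSON format:
[
  {
    "severity": "high|medium|low",
    "line": <line number>,
    "message": "<description of the issue>"
  }
]

## Code to review
```{{ language }}
{{ code }}
```

7.2 Prompt Version Management and Unit Tests

import json

import pytest
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("prompts/"))

class TestCodeReviewPrompt:
    """Unit tests for the prompt."""

    def test_prompt_renders_correctly(self):
        template = env.get_template("code_review.j2")
        prompt = template.render(
            role="Python code reviewer",
            language="Python",
            rules=["Check for SQL injection"],
            code="print('hello')"
        )
        assert "Python code reviewer" in prompt
        assert "Check for SQL injection" in prompt
        assert "print('hello')" in prompt

    def test_prompt_output_is_valid_json(self):
        """Check that the model output is valid JSON."""
        # Use a fixed test input
        test_code = "query = f'SELECT * FROM users WHERE id = {user_id}'"
        # Call the model with the prompt under test, then verify that the
        # output parses as JSON. `client` is the LLM client from the earlier
        # articles in this series; the call arguments are elided here.
        result = client.chat(...)
        parsed = json.loads(result.content)
        assert isinstance(parsed, list)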
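
The tests cover rendering and output shape but not version management. One lightweight option, sketched here as an assumption rather than an established practice of this series, is to derive a version tag from the template text itself, so that logs and A/B reports can record exactly which prompt produced an output. `PromptVersion` and `register_prompt` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str  # short content hash, usable in log lines and reports
    text: str


def register_prompt(name: str, template_text: str) -> PromptVersion:
    """Tag a template with a hash of its normalized content."""
    # Normalize line endings and trailing whitespace so that cosmetic edits
    # do not produce a new version.
    normalized = "\n".join(
        line.rstrip() for line in template_text.strip().splitlines()
    )
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
    return PromptVersion(name=name, version=digest, text=normalized)


v1 = register_prompt("code_review", "You are a {{ language }} reviewer.\n")
v1_again = register_prompt("code_review", "You are a {{ language }} reviewer.   \r\n")
v2 = register_prompt("code_review", "You are a strict {{ language }} reviewer.\n")
print(v1.version == v1_again.version, v1.version == v2.version)  # True False
```

A content hash complements Git history rather than replacing it: Git records who changed the template and why, while the hash travels with each model call.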

8. Structured Output Control: 100% Parseable

8.1 JSON Mode + Pydantic Parsing

from typing import List

from pydantic import BaseModel, Field, ValidationError

class CodeIssue(BaseModel):
    severity: str = Field(pattern="^(high|medium|low)$")
    category: str = Field(pattern="^(security|bug|performance|style)$")
    line: int = Field(ge=1)
    message: str = Field(min_length=10)

class CodeReviewResult(BaseModel):
    issues: List[CodeIssue]
    summary: str

# JSON mode + Pydantic parsing
# (`client` is the LLM client from the earlier articles in this series)
result = client.chat(
    messages=[...],
    response_format={"type": "json_object"}
)

# Parse and validate
try:
    review = CodeReviewResult.model_validate_json(result.content)
    print(f"Found {len(review.issues)} issue(s)")
    for issue in review.issues:
        print(f"  [{issue.severity}] line {issue.line}: {issue.message}")
except ValidationError as e:
    # Raised for malformed JSON as well as for schema violations
    print(f"Parsing failed: {e}")
    print(f"Raw output: {result.content}")

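8.2 Comparing the Reliability of Structured Output Methods

Method	Format compliance	Implementation complexity	Suitable for
Natural-language description	~70%	Low	Prototyping
JSON Schema	~85%	Medium	Standard APIs
`response_format=json_object`	~95%	Low	OpenAI-compatible models
`response_format=json_schema`	~99%	Medium	GPT-4o and other models that support Strict mode
Few-shot + examples	~90%	Medium	All models
Pydantic post-validation	100% (with retries)	Medium	Production

Recommended combination for production: response_format={"type": "json_object"} + Pydantic post-validation + a retry on failure (with more detailed format instructions). This combination can bring the parse success rate close to 100%.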
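
The retry step of that combination could look roughly like the sketch below. To stay self-contained it swaps Pydantic for a small hand-written validator, and `call_model` is a hypothetical callable that takes a prompt string and returns the raw reply; neither is the article's actual client API.

```python
import json
from typing import Callable, List, Optional

ALLOWED_SEVERITY = {"high", "medium", "low"}


def validate_review(raw: str) -> List[str]:
    """Return validation errors for a code-review reply; empty means valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("issues"), list):
        return ['expected an object with an "issues" array']
    errors = []
    for i, issue in enumerate(data["issues"]):
        if not isinstance(issue, dict):
            errors.append(f"issues[{i}] is not an object")
            continue
        if issue.get("severity") not in ALLOWED_SEVERITY:
            errors.append(f"issues[{i}].severity must be high, medium, or low")
        line = issue.get("line")
        if not isinstance(line, int) or isinstance(line, bool) or line < 1:
            errors.append(f"issues[{i}].line must be a positive integer")
    return errors


def call_with_retry(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Optional[dict]:
    """Retry with the validation errors appended as extra format guidance."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        errors = validate_review(raw)
        if not errors:
            return json.loads(raw)
        current_prompt = (
            prompt
            + "\n\nYour previous reply was rejected for these reasons:\n- "
            + "\n- ".join(errors)
            + "\nReturn corrected JSON only."
        )
    return None


# Stand-in model: the first reply is chatty prose, the second is valid JSON.
replies = iter([
    "Sure! Here is the review you asked for...",
    '{"issues": [{"severity": "high", "line": 3, "message": "SQL injection"}]}',
])
result = call_with_retry(lambda _prompt: next(replies), "Review this code ...")
print(result["issues"][0]["severity"])  # high
```

Returning `None` after the last attempt, rather than raising, leaves the caller free to decide whether to fall back, alert, or skip the item.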

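9. Prompt A/B Testing and Iteration

9.1 Systematic Prompt Evaluation

Prompt engineering cannot run on "gut feeling"; iteration needs to be driven by quantitative metrics.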

from dataclasses import dataclass
from typing import Callable, Dict, List

# `BaseLLMClient` and `Message` are assumed to come from the unified LLM client
# layer covered in the earlier articles of this series.

@dataclass
class PromptVariant:
    name: str
    system_prompt: str
    user_prompt_template: str

@dataclass
class TestCase:
    name: str
    input_data: str
    expected_output: str
    evaluate_fn: Callable[[str, str], float]  # (actual, expected) -> score

class PromptABTest:
    """A/B testing framework for prompts."""

    def __init__(self, client: BaseLLMClient):
        self.client = client
        self.variants: List[PromptVariant] = []
        self.test_cases: List[TestCase] = []

    def add_variant(self, variant: PromptVariant):
        self.variants.append(variant)

    def add_test_case(self, case: TestCase):
        self.test_cases.append(case)

    def run(self) -> Dict[str, Dict[str, float]]:
        """Evaluate every variant on every test case."""
        results = {}

        for variant in self.variants:
            variant_scores = {}
            for case in self.test_cases:
                messages = [
                    Message(role="system", content=variant.system_prompt),
                    Message(role="user", content=variant.user_prompt_template.format(
                        input_data=case.input_data
                    ))
                ]
                result = self.client.chat(messages)
                score = case.evaluate_fn(result.content, case.expected_output)
                variant_scores[case.name] = score
            results[variant.name] = variant_scores

        return results

    def report(self, results: Dict):
        """Print a comparison report."""
        print("=" * 60)
        print("Prompt A/B Test Report")
        print("=" * 60)

        for variant_name, scores in results.items():
            # Guard against an empty test suite
            avg_score = sum(scores.values()) / len(scores) if scores else 0.0
            print(f"\n{variant_name}:")
            print(f"  Average score: {avg_score:.2f}")
            for case, score in scores.items():
                print(f"  {case}: {score:.2f}")

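9.2 Designing Evaluation Metrics

Metric	How it is computed	Suitable for
Format correctness	Share of outputs that parse into the target format	Structured output tasks
Content accuracy	Degree of match with the reference answer	Classification / extraction tasks
Token efficiency	Output length / task complexity	Cost-sensitive scenarios
Latency	Time to first token / total time	Real-time interaction
Consistency	Variance across repeated calls with the same input	Scenarios with high stability requirements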
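
Three of these metrics are easy to sketch with the standard library. The functions below are illustrative assumptions: `exact_match` follows the `(actual, expected) -> score` shape of `evaluate_fn` from section 9.1, and `consistency` uses the agreement rate with the most common output as a stand-in for variance, since the outputs here are categorical strings.

```python
import json
from collections import Counter
from typing import List


def format_correct_rate(outputs: List[str]) -> float:
    """Fraction of outputs that parse as JSON."""
    if not outputs:
        return 0.0
    ok = 0
    for text in outputs:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(outputs)


def exact_match(actual: str, expected: str) -> float:
    """Content accuracy for classification: 1.0 on a case-insensitive match."""
    return 1.0 if actual.strip().lower() == expected.strip().lower() else 0.0


def consistency(outputs: List[str]) -> float:
    """Share of repeated calls that agree with the most common output."""
    if not outputs:
        return 0.0
    normalized = [text.strip().lower() for text in outputs]
    return Counter(normalized).most_common(1)[0][1] / len(normalized)


print(format_correct_rate(['{"a": 1}', "oops", "[]", '{"b": 2}']))        # 0.75
print(exact_match(" Positive ", "positive"))                              # 1.0
print(consistency(["positive", "Positive ", "negative", "positive"]))     # 0.75
```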

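10. In Practice: A Prompt Template Library for Python Code Review

10.1 The Complete Prompt Template

{# prompts/python_code_review.j2 #}
You are a senior Python engineer with 10 years of experience who has been responsible for code review of core systems at a large internet company.

## Review dimensions
1. Security: SQL injection, XSS, command injection, hard-coded secrets
2. Correctness: null references, out-of-bounds access, resource leaks, concurrency issues
3. Performance: unnecessary loops, N+1 queries, memory leaks
4. Maintainability: overly long functions, magic numbers, missing type annotations

## Output format
Return strictly the following JSON format, without a Markdown code block:
{
  "issues": [
    {
      "severity": "high|medium|low",
      "category": "security|correctness|performance|maintainability",
      "line": 1,
      "message": "Specific description of the issue",
      "suggestion": "Suggested fix with example code"
    }
  ],
  "summary": "Overall assessment (2-3 sentences)"
}

## Code to review
```python
{{ code }}
```

10.2 The Complete Calling Code
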
import json
from jinja2 import Environment, FileSystemLoader

# `client` and `Message` are assumed to come from the unified LLM client layer
# covered in the earlier articles of this series.

# Load the template
env = Environment(loader=FileSystemLoader("prompts/"))
template = env.get_template("python_code_review.j2")

# Code to review
code_to_review = '''
def process_payment(user_id, amount, card_number):
    query = f"UPDATE users SET balance = balance - {amount} WHERE id = {user_id}"
    db.execute(query)
    log.info(f"Processed payment for user {user_id}, card: {card_number}")
    return True
'''

# Render the prompt
prompt = template.render(code=code_to_review)

# Call the model
messages = [
    Message(role="user", content=prompt)  # the whole prompt is sent as the User Message
]

result = client.chat(messages, response_format={"type": "json_object"})

# Parse the result
try:
    review = json.loads(result.content)
    print(f"Found {len(review['issues'])} issue(s)\n")
    for issue in review["issues"]:
        print(f"[{issue['severity'].upper()}] {issue['category']}")
        print(f"  Line {issue['line']}: {issue['message']}")
        print(f"  Suggestion: {issue['suggestion']}\n")
    print(f"Summary: {review['summary']}")
except json.JSONDecodeError:
    print("The model did not return valid JSON. Raw output:")
    print(result.content)

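10.3 A Typical Output

{
  "issues": [
    {
      "severity": "high",
      "category": "security",
      "line": 3,
      "message": "SQL injection: the user-supplied amount and user_id are concatenated directly into the SQL statement",
      "suggestion": "Use a parameterized query: db.execute('UPDATE users SET balance = balance - ? WHERE id = ?', (amount, user_id))"
    },
    {
      "severity": "high",
      "category": "security",
      "line": 5,
      "message": "Sensitive data exposure: the credit card number is written to the log",
      "suggestion": "Do not log the full card number; log only the last 4 digits or use a token instead"
    },
    {
      "severity": "medium",
      "category": "correctness",
      "line": 2,
      "message": "Missing input validation: amount may be negative or zero",
      "suggestion": "Validate the argument: if amount <= 0: raise ValueError('amount must be positive')"
    }
  ],
  "summary": "The code has a serious SQL injection vulnerability and a risk of leaking sensitive data; both need to be fixed immediately. Consider adopting an ORM or parameterized queries and setting up log redaction."
}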
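
To make those suggestions concrete, here is a small runnable sketch of what the fixes could look like. It uses the standard-library `sqlite3` module with an in-memory database purely for illustration; the original `db` object, its driver, and the table layout are not specified in the article, so the `users` table and the helper names `debit_balance` and `mask_card` are assumptions.

```python
import sqlite3


def debit_balance(conn: sqlite3.Connection, user_id: int, amount: float) -> None:
    """Debit `amount` from a user's balance using a parameterized query."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    # Placeholders keep user input out of the SQL text entirely.
    cursor = conn.execute(
        "UPDATE users SET balance = balance - ? WHERE id = ?",
        (amount, user_id),
    )
    if cursor.rowcount != 1:
        raise LookupError(f"no user with id {user_id}")
    conn.commit()


def mask_card(card_number: str) -> str:
    """Keep only the last four digits for logging."""
    hidden = max(len(card_number) - 4, 0)
    return "*" * hidden + card_number[-4:]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO users VALUES (1, 100.0)")
debit_balance(conn, 1, 30.0)
print(conn.execute("SELECT balance FROM users WHERE id = 1").fetchone()[0])  # 70.0
print(mask_card("4111111111111111"))  # ************1111
```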

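11. Summary

Prompt engineering is an "art of constraints": it reduces the uncertainty of LLM output through structured information rather than adding information through more words.

The four-part System Prompt template (role + task + format + constraints) is a dependable structure in engineering practice. Moving the output format constraint from a natural-language description to a JSON Schema and then to a concrete example raises format compliance from about 70% to 95% or more. The key to few-shot prompting is not the number of examples but their quality: diversity, representativeness, and coverage of boundary cases.

The reason Chain-of-Thought can raise accuracy by 15-30% is that, during autoregressive generation, explicit intermediate reasoning steps give later steps richer context. The ReAct pattern extends CoT from "internal thinking" to "external action", letting the LLM interact with tools such as search engines, calculators, and APIs.

Jinja2 templating separates prompts from code and enables versioning, reuse, and unit testing. The A/B testing framework turns prompt iteration from "going by feel" into "going by data".

Earlier articles in this series on designing a unified LLM API call layer, FastAPI engineering, and JSON data handling provide the upstream foundation for this one, covering the full path from prompt construction to model invocation to result parsing.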