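The Evolution of Prompt Engineering

1️⃣ The primitive era: just ask (2020-2021, the GPT-3 era)

The most basic form of prompt engineering is simply asking the question directly:

Q: "What is the capital of France?"

A: "Paris"

This is called zero-shot prompting. If you need better results, add a few examples:

Q: "What is the capital of the UK?" A: "London"

Q: "What is the capital of Japan?" A: "Tokyo"

Q: "What is the capital of France?" A:

This is called few-shot prompting. The core finding of the GPT-3 paper (2020) is that a large model can pick up a task purely from examples in its context, with no fine-tuning. This is widely regarded as the starting point of prompt engineering.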
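To make the mechanics concrete, here is a minimal sketch of how a few-shot prompt like the one above can be assembled. The function name `build_few_shot_prompt` and the exact `Q:` / `A:` layout are assumptions for illustration, not something the GPT-3 paper prescribes.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Concatenate solved (question, answer) pairs, then the unanswered question."""
    lines = [f"Q: {q} A: {a}" for q, a in examples]
    # End with an open "A:" so the most natural continuation is the answer itself.
    lines.append(f"Q: {question} A:")
    return "\n".join(lines)


examples = [
    ("What is the capital of the UK?", "London"),
    ("What is the capital of Japan?", "Tokyo"),
]
print(build_few_shot_prompt(examples, "What is the capital of France?"))
```

With an empty `examples` list the same function yields a zero-shot prompt, which is one way to see that the two styles differ only in how much text precedes the question.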

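2️⃣ The chain-of-thought era: write out the reasoning (January 2022, Wei et al.)

The CoT (Chain-of-Thought) paper found that if the few-shot examples include the reasoning steps as well as the answer, the model imitates them, and accuracy on complex reasoning tasks improves significantly:

Q: "Roger has 5 tennis balls. He buys 2 more cans of tennis balls, each with 3 balls. How many tennis balls does he have now?"

A: "Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11."

Q: "The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have now?"

A:

The model then follows suit and outputs a reasoning chain:

"The cafeteria had 23 apples originally. They used 20, leaving 3. They bought 6 more, so 3 + 6 = 9. The answer is 9."

Key trick: you "act out" the reasoning process in the prompt, and the model learns to "think first, then answer".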
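Because the demonstrations end with a fixed phrase, the final answer can usually be pulled out of the generated chain with a simple pattern. The sketch below assumes the completion ends with "The answer is <number>", as in the examples above; `ANSWER_PATTERN` and `extract_final_answer` are illustrative names, not part of the CoT paper.

```python
import re
from typing import Optional

# Assumption: the model copies the "The answer is N" closing from the examples.
ANSWER_PATTERN = re.compile(r"The answer is\s+(-?\d+(?:\.\d+)?)")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the number after the last 'The answer is', or None if absent."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None


completion = (
    "The cafeteria had 23 apples originally. They used 20, leaving 3. "
    "They bought 6 more, so 3 + 6 = 9. The answer is 9."
)
print(extract_final_answer(completion))  # prints 9
```

Taking the last match rather than the first is a deliberate choice: intermediate numbers in the reasoning are ignored, and only the concluding statement counts.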

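3️⃣ The ReAct era: interleave "reasoning" and "acting" (October 2022, the paper you are reading right now)

ReAct takes CoT's "reasoning" one step further: the model not only thinks in its head but also interacts with the outside world. The shape of the prompt changes from the first pattern to the second:

CoT: question → thought → answer

ReAct: question → thought → action (search) → observation (result) → thought → action → observation → answer

How the humans did it: annotators actually searched Wikipedia, browsed pages, and used Ctrl+F, then wrote down each real step exactly as it happened (what they searched for, what they saw, what they thought), producing 3-6 demonstration trajectories. These trajectories then go into the prompt as few-shot examples.

This is prompt engineering by hand-annotating reasoning + action trajectories.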
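One way to picture a hand-annotated trajectory is as a list of recorded steps that gets rendered into numbered lines of text. The sketch below is an assumption about how one might store it; the `Step` class and `render_trajectory` function are hypothetical, the sample steps are abbreviated for illustration, and only the `Thought N` / `Action N` / `Observation N` labels and the `Search[...]` / `Finish[...]` action syntax follow the ReAct paper's prompt format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    thought: str
    action: str
    observation: str = ""  # left empty for the final Finish step


def render_trajectory(question: str, steps: List[Step]) -> str:
    """Turn recorded steps into the numbered text block used as a few-shot example."""
    lines = [f"Question: {question}"]
    for i, step in enumerate(steps, start=1):
        lines.append(f"Thought {i}: {step.thought}")
        lines.append(f"Action {i}: {step.action}")
        if step.observation:
            lines.append(f"Observation {i}: {step.observation}")
    return "\n".join(lines)


# Abbreviated, illustrative trajectory (not a complete demonstration).
steps = [
    Step("I need to search Colorado orogeny.",
         "Search[Colorado orogeny]",
         "The Colorado orogeny was an episode of mountain building."),
    Step("I now have what I need.", "Finish[placeholder answer]"),
]
print(render_trajectory("An illustrative question?", steps))
```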

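4️⃣ The automation era: let the model produce the prompt itself (roughly 2022 to present)

Related developments around and after ReAct:

A. Self-Consistency (SC): sample the same question multiple times and take the majority vote. It actually predates ReAct (Wang et al., March 2022); the ReAct paper uses it with 21 samples as the CoT-SC baseline you just learned about

B. Auto-CoT: no hand-written reasoning chains; the model generates the demonstration examples automatically

C. Frameworks such as DSPy: turn prompt engineering itself into an optimization problem, automatically searching for the best few-shot examples and instruction wording

D. Function Calling / Tool Use: OpenAI and others turned ReAct-style "calling external tools" into a standard API feature, so hand-written prompt templates are largely unnecessary for this purpose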
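Of the items above, Self-Consistency is the easiest to sketch: collect the final answers from several sampled reasoning chains and keep the most common one. The code below is a minimal illustration; `majority_vote` is a hypothetical helper, and the 21 sampled answers are made up for the demo rather than taken from any experiment.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent answer, ignoring samples with no parseable answer."""
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


# Made-up outcome of sampling the same question 21 times.
sampled = ["9"] * 15 + ["8"] * 4 + [None] * 2
print(majority_vote(sampled))  # prints 9
```

The vote is taken over final answers only, so two chains that reason differently but land on the same number still reinforce each other.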

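📌 Summary

The original prompt engineering approach in the ReAct paper is:

A human performs the task once → records "what I thought, what I did, what I saw" at every step → uses these trajectories as few-shot examples → concatenates them into a prompt → feeds it to PaLM-540B → the model imitates the pattern

It may look "primitive" today, but the ReAct paradigm strongly influenced many later agent frameworks and tool-use features (LangChain, AutoGPT, OpenAI Function Calling, and others). The "web search" feature in the ChatGPT you use today follows essentially the same think → act → observe loop, at industrial scale 📚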

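**Q:** How does few-shot prompting hand the trajectories to the LLM? Are they just fed in directly?

A: Yes, they are concatenated straight into the prompt. An LLM fundamentally consumes only text, so you insert the trajectories as a plain text string.

Let me show this with the actual prompt from the ReAct paper (from Appendix C):

🔍 What it actually looks like

Structure of the ReAct prompt for HotpotQA (Appendix C of the paper)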

Solve a question answering task with interleaving Thought, Action, Observation steps.

Thought can reason about the current situation, and Action can be three types:

(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists.

(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.

(3) Finish[answer], which returns the answer and finishes the task.

Here are some examples.

Then 6 human-annotated trajectories are pasted in directly as examples. Each one looks like this:

Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of that area.

Action 1: Search[Colorado orogeny]

Observation 1: The Colorado orogeny was an episode of mountain building in Colorado and surrounding areas.

Thought 2: It does not mention the eastern sector. So I need to look up the eastern sector.

Action 2: Lookup[eastern sector]

Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.

Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.

Action 3: Search[High Plains]

Observation 3: High Plains refers to one of two distinct land regions.

Thought 4: I need to instead search High Plains (United States).

Action 4: Search[High Plains (United States)]

Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).

Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.

Action 5: Finish[1,800 to 7,000 ft]

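... (5 more example trajectories of the same kind follow) ...

Finally, the real question is appended at the end:

Question: Aside from the Apple Remote, what other device can control the program Apple Remote was originally designed to interact with?

After seeing all of this, the model continues the pattern established above: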

Thought 1: I need to search Apple Remote...

Action 1: Search[Apple Remote]

Observation 1: ... (returned by the external Wikipedia API)

Thought 2: ...

...
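Since the model's output is plain text, the surrounding program has to recognise which action was requested before it can call anything. A minimal parsing sketch follows, assuming the `Name[argument]` action syntax of the ReAct paper's prompt; `ACTION_PATTERN` and `parse_action` are illustrative names.

```python
import re
from typing import Optional, Tuple

# Only the three action types named in the prompt instruction are accepted.
ACTION_PATTERN = re.compile(r"^(Search|Lookup|Finish)\[(.*)\]$")


def parse_action(action_text: str) -> Optional[Tuple[str, str]]:
    """Split 'Search[Apple Remote]' into ('Search', 'Apple Remote')."""
    match = ACTION_PATTERN.match(action_text.strip())
    if match is None:
        return None  # malformed action; the caller decides how to recover
    return match.group(1), match.group(2)


print(parse_action("Search[Apple Remote]"))
print(parse_action("Finish[1,800 to 7,000 ft]"))
```

Returning `None` for anything that does not match keeps the decision about retries or fallbacks in the calling code rather than hiding it in the parser.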

🧩 The essence: a prompt is just "template + fill-in"

┌────────────────────────────────────────────┐
│ Instruction (task description)             │
│ "Solve a QA task with Thought/Action/Obs"  │
├────────────────────────────────────────────┤
│ Example 1 (human-annotated trajectory)     │
│ Thought 1... Action 1... Obs 1... ...      │
├────────────────────────────────────────────┤
│ Example 2                                  │
│ Thought 1... Action 1... Obs 1... ...      │
├────────────────────────────────────────────┤
│ ... (3-6 examples in total)                │
├────────────────────────────────────────────┤
│ The actual question                        │
│ Question: xxx                              │
│                                            │
│ ← the model starts generating Thought 1... │
└────────────────────────────────────────────┘
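The template in the box can be sketched as one string-joining function. This is only an illustration: `build_react_prompt` is a hypothetical name, and separating blocks with blank lines is an assumption about layout rather than the paper's exact formatting.

```python
from typing import List


def build_react_prompt(instruction: str, examples: List[str], question: str) -> str:
    """Instruction first, then each demonstration, then the open question."""
    blocks = [instruction.strip()]
    blocks.extend(example.strip() for example in examples)
    blocks.append(f"Question: {question.strip()}")
    # Trailing newline: the model's continuation starts on a fresh line,
    # which is where "Thought 1: ..." is expected to appear.
    return "\n\n".join(blocks) + "\n"


prompt = build_react_prompt(
    "Solve a question answering task with interleaving Thought, Action, Observation steps.",
    ["Question: ...\nThought 1: ...\nAction 1: ..."],  # placeholder demonstration
    "xxx",
)
print(prompt)
```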

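The Observation at each step is not generated by the LLM. Instead, the Action the model outputs (for example Search[Apple Remote]) is sent to the real Wikipedia API, and the result is appended to the prompt for the next round:

Round 1: prompt → the LLM generates "Thought 1: ... Action 1: Search[Apple Remote]"

→ generation is cut off here, before the model can invent an Observation

→ Search is executed externally → a result comes back

Round 2: previous prompt + "Observation 1: Apple Remote is..."

→ the LLM generates "Thought 2: ... Action 2: ..."

→ executed externally → a result comes back

... repeat until the LLM outputs Finish[answer]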
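The rounds above amount to a short loop. The sketch below is a minimal version under stated assumptions: `react_loop` and `make_scripted_llm` are hypothetical names, the "LLM" is a stand-in that replays canned completions, the tools are stubs rather than a real Wikipedia API, and the step limit of 7 is an arbitrary default.

```python
import re
from typing import Callable, Dict, List, Optional

ACTION_RE = re.compile(r"Action \d+: (Search|Lookup|Finish)\[(.*?)\]\s*$")


def react_loop(
    prompt: str,
    llm: Callable[[str, List[str]], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 7,
) -> Optional[str]:
    """Alternate LLM generation and tool execution until Finish[...] appears."""
    for i in range(1, max_steps + 1):
        # The stop sequence keeps the model from writing its own Observation.
        output = llm(prompt + f"Thought {i}:", [f"\nObservation {i}:"])
        prompt += f"Thought {i}:{output}"
        match = ACTION_RE.search(output)
        if match is None:
            return None  # no parseable action; a real system might retry
        name, argument = match.groups()
        if name == "Finish":
            return argument
        observation = tools[name](argument)  # executed outside the LLM
        prompt += f"\nObservation {i}: {observation}\n"
    return None  # step budget exhausted


def make_scripted_llm(outputs: List[str]) -> Callable[[str, List[str]], str]:
    """Hypothetical stand-in for a real model: replays canned completions."""
    remaining = iter(outputs)

    def llm(prompt: str, stop: List[str]) -> str:
        return next(remaining)

    return llm


scripted = make_scripted_llm([
    " I need to search Apple Remote.\nAction 1: Search[Apple Remote]",
    " I have what I need.\nAction 2: Finish[example answer]",
])
tools = {
    "Search": lambda entity: f"(stub result for {entity})",
    "Lookup": lambda keyword: f"(stub sentence containing {keyword})",
}
print(react_loop("Question: ...\n", scripted, tools))  # prints: example answer
```

Passing the model and the tools in as plain callables keeps the loop itself free of any API details, which makes it easy to swap the scripted stand-in for a real model client.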

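📌 One-sentence summary

It is just text concatenation. There is no magic anywhere in ReAct: hand-write 3-6 "thought → action → observation" demonstration trajectories, paste them verbatim into the prompt text, and the LLM imitates the pattern. Each observation comes from actually executing the action in the external environment and is appended back to the prompt, closing the loop. That was the "agent framework" of 2022: an alternating loop of plain-text prompting and external tool calls 📚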