
OpenAI 最新发布的 gpt-oss 模型引入了全新的 Harmony 格式。与我们之前在 Qwen3 等模型中使用的 ChatML 格式截然不同,Harmony 在结构化推理和工具调用方面采用了全新方法。
项目链接:https://github.com/openai/harmony
如果我们正从事推理模型或构建推理基础设施,理解这两种格式至关重要。本文将深入探讨这两种格式,通过并排对比帮助我们理解它们之间的变化及其重要性!
更多细节可以查看:https://huggingface.co/blog/kuotient/chatml-vs-harmony
什么是 ChatML?📝
ChatML(聊天标记语言)一直是许多开源模型(特别是 Qwen 系列)的首选格式。它本质上是一种受 XML 启发的对话结构化方式,旨在清晰地分隔对话的不同部分。
主要特点: * 使用特殊标记 <|im_start|> 和 <|im_end|> 标记消息边界。
- 基于角色的简单结构(系统、用户、助手)。
- 思考/推理内容封装在 <think>...</think>块中。
- 工具定义和调用使用 XML 风格的标签。
- 经过实践检验,已被广泛应用于多种模型。
可以将 ChatML 看作是"经典"方法------直接、类似 XML,并且注重清晰度。
什么是 Harmony?🎭
Harmony 是 OpenAI 专门为 gpt-oss 模型设计的新型响应格式。它不仅仅是一种提示格式,更是对模型如何构造输出(特别是针对复杂推理和工具使用)的全面重新思考。
主要创新点:
- 多通道架构: 消息可以被标记为 analysis(分析/推理)、commentary(工具前导说明)或final(面向用户)。
- 角色层级: system>developer>user>assistant>tool(用于处理指令冲突)。
- 消息路由: 使用 to=语法将消息定向到不同组件。
- TypeScript 风格的工具定义: 比 JSON 模式更具表达力且更简洁。
- 各通道独立的安全标准: analysis通道不像final通道那样进行安全过滤。
Harmony 将模型的输出视为多线程对话,不同类型的内容通过不同的通道流动。
Harmony 的 TypeScript 风格语法:为何重要?🚀Harmony 最有趣的设计选择之一是使用 TypeScript 风格来定义工具,而非 JSON 模式。这不仅仅是语法糖,其背后有充分的理由。
代码定义的力量
正如近期关于代码智能体(Code Agent)的研究所示,用代码表达动作具有以下几个优点:
- 简洁性: 代码操作比 JSON 紧凑约 30%。
- 并行性: 需要 4 个并行流,每个流有 5 个操作?在 JSON 中,这将是 20 个独立的数据块;在代码中,它只是一个表达式。
- 变量管理: 易于存储和引用结果(例如 rock_image = generate_image("rock"))。
- 可读性: 智能体日志变得更加清晰。
- 大型语言模型(LLM)的流畅性: 模型在训练数据中包含大量代码。
工具定义对比
ChatML(JSON 模式):
            
            
              json
              
              
            
          
          {
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["city"]
    }
  }
}Harmony(TypeScript 风格):
            
            
              typescript
              
              
            
          
          namespace functions {
  // Get weather for a city
  type get_weather = (_: {
    city: string,
    unit?: "celsius" | "fahrenheit", // default: celsius
  })=> any;
}TypeScript 版本不仅更简洁,而且对开发者更友好,也更易于大型语言模型(LLM)处理。
深入示例 🏊♂️
让我们看看这些格式如何在真实场景中处理从简单对话到复杂多步骤推理和工具使用的过程。
1. 基本对话(无思考过程)
最简单的情况------只提问题,不涉及任何推理步骤。
ChatML(Qwen3)
            
            
              text
              
              
            
          
          <|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
<think>
</think>
The capital of France is Paris.<|im_end|>
OpenAI Harmony
            
            
              text
              
              
            
          
          <|start|>user<|message|>What's the capitalof France?<|end|>
<|start|>assistant<|channel|>final<|message|>The capital of France is Paris.<|return|>
观察: 即使禁用思考功能,Qwen3 仍然包含空的 <think></think> 标签作为结构提示。Harmony 则更直接------它直接跳转到 final通道输出面向用户的内容。
2. 包含思考的对话
这里将清晰展现两者在设计理念上的差异。
ChatML(Qwen3)
            
            
              text
              
              
            
          
          <|im_start|>user
If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|im_end|>
<|im_start|>assistant
<think>
Starting with 3 applesAdding 5 more: 3 + 5 = 8
Giving away 2: 8 - 2 = 6
So the final answer is 6 apples.
</think>
You would have 6apples.
Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|im_end|>
OpenAI Harmony
            
            
              text
              
              
            
          
          <|start|>user<|message|>If Ihave 3 apples and buy 5 more, then give away 2, how many do I have?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User starts with 3 apples. Buys 5 more: 3 + 5 = 8. Gives away 2: 8 - 2 = 6. Final count is 6 apples.<|end|>
<|start|>assistant<|channel|>final<|message|>You would have 6 apples.
Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 -2 = 6).<|return|>
不同之处:
- ChatML 将所有内容(包括思考过程)保留在一条消息中。
- Harmony 将推理过程(analysis通道)和答案(final通道)拆分为独立的消息。
- Harmony 的方法使得程序化地从面向用户的内容中过滤掉推理过程变得更容易。
3. 包含思考的多轮对话
此示例揭示了每种格式如何处理对话历史。
ChatML(Qwen3)
            
            
              text<|im_start|>user
              
              
            
          
          user>What's 15% of 80?<|im_end|>
<|im_start|>assistant
15% of 80 is 12.
To calculate:80 × 0.15 = 12<|im_end|>
<|im_start|>user
Now add 25 to that result<|im_end|>
<|im_start|>assistant<think>
The previous result was 12.
Adding 25: 12 + 25 = 37
</think>
Adding 25 to 12 gives us 37.<|im_end|>
重要提示: Qwen3 会删除之前轮次的思考过程,以节省 token!
OpenAI Harmony
            
            
              text
              
              
            
          
          <|start|>user<|message|>What's 15% of 80?<|end|>
<|start|>assistant<|channel|>analysis<|message|>Calculate 15% of 80: 80 × 0.15 = 12<|end|>
<|start|>assistant<|channel|>final<|message|>15% of 80 is 12.<|return|>
<|start|>user<|message|>Now add 25 to that result<|end|>
<|start|>assistant<|channel|>analysis<|message|>Previous result was 12. Adding 25: 12 + 25 = 37<|end|>
<|start|>assistant<|channel|>final<|message|>Adding 25 to 12 gives us 37.<|return|>
历史管理: 两种格式都会修剪历史推理内容,但 Harmony 的通道系统使其更明确且可控。
4. 函数调用
在处理工具方面,这两种格式存在显著差异。
###ChatML(Qwen3)
            
            
              text
              
              
            
          
          <|im_start|>system
# Tools
You may call one or more functions to assist with the user query.
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
</tools>
For each function call, return a json object within <tool_call></tool_call> XML tags.<|im_end|>
<|im_start|>user
What's the weather in Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User wants Tokyo weather. Need to call get_weather function.
</think>
I'll check the current weather in Tokyo for you.
<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 22, "condition": "partly cloudy", "humidity": 65}
</tool_response><|im_end|>
<|im_start|>assistant
The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|im_end|>
OpenAI Harmony
            
            
              text
              
              
            
          
          <|start|>developer<|message|># Tools
## functions
namespace functions {// Get weather for a city
type get_weather = (_: {
  city: string,
  unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll check the current weatherin Tokyo for you.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 22, "condition": "partly cloudy", "humidity":65}<|end|>
<|start|>assistant<|channel|>final<|message|>The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|return|>
主要差异:
- 工具定义: ChatML 使用 JSON 模式,而Harmony 使用 TypeScript 风格的类型。
- 工具响应: ChatML 将其封装为用户消息;Harmony 则使用专门的工具角色。
- 消息路由: Harmony 的 to=语法明确显示了组件之间的流程。
- 通道: Harmony 将工具交互(commentary)与最终答案分离开来。
5. 包含思维链的函数调用
复杂场景更能展现两种架构的差异。
ChatML(Qwen3)
            
            
              text
              
              
            
          
          <|im_start|>system
# Tools
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}
{"type": "function", "function": {"name": "get_flight_status","description": "Check flight status", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string"}}}}}
</tools><|im_end|>
<|im_start|>user
Should I pack an umbrella for my UA902 flight to Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User is asking about packing an umbrella for flight UA902 to Tokyo.I need to:
1. Check the flight status to confirm it's going to Tokyo
2. Check Tokyo's weather to see if rain is expected
Let me start with the flight status.
</think>I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.
<tool_call>
{"name": "get_flight_status", "arguments": {"flight_number": "UA902"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Flight confirmed to Tokyo,departing at 14:30. Now checking weather.
</think>
<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 18, "condition": "rainy", "precipitation": "80%"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Tokyo weather shows rainy conditions with 80% precipitation. Definitely need an umbrella.
</think>
Yes, you should definitely pack an umbrella! YourUA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|im_end|>
OpenAI Harmony
            
            
              text
              
              
            
          
          <|start|>developer<|message|># Tools
## functions
namespace functions {
type get_weather = (_: {city: string}) => any;
type get_flight_status = (_: {flight_number: string}) => any;
} //namespace functions<|end|>
<|start|>user<|message|>Should I pack an umbrella for my UA902 flight to Tokyo?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User asking about packing umbrella for UA902 to Tokyo. Need to: 1) Verify flight destination 2) Check Tokyo weather. Starting with flight status.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_flight_status<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>
<|start|>functions.get_flight_status to=assistant<|channel|>commentary<|message|>{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}<|end|><|start|>assistant<|channel|>analysis<|message|>Flight confirmed to Tokyo, on time at 14:30. Now checking weather conditions.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 18, "condition": "rainy", "precipitation": "80%"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Tokyo has rainy conditions with80% precipitation. Clear recommendation to pack umbrella.<|end|>
<|start|>assistant<|channel|>final<|message|>Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo ison time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|return|>
在复杂场景中:
- ChatML 在同一个对话轮次内,跨工具调用保留了思考过程。* Harmony 通过独立的 analysis消息明确跟踪每个推理步骤。
- Harmony 的 commentary通道可以在工具执行前包含面向用户的"前言"。
- Harmony 中的消息路由(to=)创建了清晰的执行轨迹。
这对生态系统意味着什么?🌍
Harmony 的引入代表着我们对模型输出思考方式的重大演进:
- 基于通道的安全性: 不同通道采用不同的安全标准,这对推理模型而言是颠覆性的。
- 更好的可观察性: 明确的路由和通道使得复杂智能体行为的调试变得更容易。
- 代码优先的工具: TypeScript 语法可能成为工具定义的新标准。
- 结构化推理: 在格式层面而非仅仅约定上,将思考与输出分离。
关键要点 📌
*ChatML 更简单、更成熟,并获得广泛的生态系统支持。
- Harmony 引入了通道和消息路由等强大的新概念。
- 两种格式都处理推理和工具,但在理念上大相径庭。
- Harmony 的 TypeScript 风格定义更简洁,也更便于开发者使用。
- Harmony 中的通道系统实现了对用户可见内容的精细控制。
随着生态系统的发展,理解这两种格式至关重要。虽然我们无法选择模型使用的具体格式(当然,除非我们从基础模型开始训练),但了解它们的工作原理有助于我们构建更好的推理基础设施,并最大限度地发挥这些强大推理模型的作用。