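From Chat Completions to Responses: The Evolution of OpenAI's Agent Interface Design

OpenAI launched the Responses API in 2025, and its official messaging makes clear that the company wants it to replace Chat Completions over time. Before that, Chat Completions had become the mainstream general-purpose interface specification. Because every request carries the full context, the server does not need to manage session state, which makes the interface easier to scale and better suited to high concurrency. That design is a large part of why it succeeded so widely in practice.

The push toward the Responses API is not just a rename. It is an architectural upgrade aimed at agents, multimodality, retrieval augmentation, and similar scenarios, and OpenAI hopes it will set the standard the way Chat Completions did.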

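Design features

● Simpler input: a single input field accepts a plain string (or a list of items), so a one-off prompt no longer needs a messages array.

● More natural context: previous_response_id is supported, so there is no need to splice the conversation history together.

● Upgraded tool calling: custom functions and built-in capabilities such as search and file retrieval are all declared through tools, replacing the legacy functions parameter.

● Multimodal support: native support for text + images, extensible to audio/video in the future.

● Event-stream mechanism: streaming covers not only text output but also tool-call events.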
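To make the input and multimodal points concrete, here is a minimal sketch that builds the request body for the same text-plus-image question in both styles. The helper names and the example URL are invented for illustration. The content-part shapes (text / image_url for Chat Completions, input_text / input_image for Responses) are assumed from OpenAI's published request formats and should be checked against the current API reference.

```python
from typing import Any


def chat_completions_body(model: str, question: str, image_url: str) -> dict[str, Any]:
    """Build a Chat Completions request body for a text + image question."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    # Chat Completions nests the URL inside an object
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def responses_body(model: str, question: str, image_url: str) -> dict[str, Any]:
    """Build the equivalent Responses API request body (assumed shape)."""
    return {
        "model": model,
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": question},
                    # Responses takes the URL directly as a string
                    {"type": "input_image", "image_url": image_url},
                ],
            }
        ],
    }
```

Either dict can be unpacked into the matching SDK call, for example client.responses.create(**responses_body(...)), assuming a configured client.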

# Chat Completions response payload
{
    "id": "chatcmpl-B9MHDbslfkBeAs8l4bebGdFOJ6PeG",
    "object": "chat.completion",
    "created": 1741570283,
    "model": "gpt-4o-2024-08-06",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The image shows a wooden boardwalk path running through a lush green field or meadow. The sky is bright blue with some scattered clouds, giving the scene a serene and peaceful atmosphere. Trees and shrubs are visible in the background.",
                "refusal": null,
                "annotations": []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 1117,
        "completion_tokens": 46,
        "total_tokens": 1163,
        "prompt_tokens_details": {
            "cached_tokens": 0,
            "audio_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0,
            "audio_tokens": 0,
            "accepted_prediction_tokens": 0,
            "rejected_prediction_tokens": 0
        }
    },
    "service_tier": "default",
    "system_fingerprint": "fp_fc9f1d7035"
}

# Responses API response payload
{
    "id": "resp_67ccd3a9da748190baa7f1570fe91ac604becb25c45c1d41",
    "object": "response",
    "created_at": 1741476777,
    "status": "completed",
    "error": null,
    "incomplete_details": null,
    "instructions": null,
    "max_output_tokens": null,
    "model": "gpt-4o-2024-08-06",
    "output": [
        {
            "type": "message",
            "id": "msg_67ccd3acc8d48190a77525dc6de64b4104becb25c45c1d41",
            "status": "completed",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "The image depicts a scenic landscape with a wooden boardwalk or pathway leading through lush, green grass under a blue sky with some clouds. The setting suggests a peaceful natural area, possibly a park or nature reserve. There are trees and shrubs in the background.",
                    "annotations": []
                }
            ]
        }
    ],
    "parallel_tool_calls": true,
    "previous_response_id": null,
    "reasoning": {
        "effort": null,
        "summary": null
    },
    "store": true,
    "temperature": 1,
    "text": {
        "format": {
            "type": "text"
        }
    },
    "tool_choice": "auto",
    "tools": [],
    "top_p": 1,
    "truncation": "disabled",
    "usage": {
        "input_tokens": 328,
        "input_tokens_details": {
            "cached_tokens": 0
        },
        "output_tokens": 52,
        "output_tokens_details": {
            "reasoning_tokens": 0
        },
        "total_tokens": 380
    },
    "user": null,
    "metadata": {}
}

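As the two payloads show, the Responses API returns richer information. Chat Completions is geared toward dialogue generation: the input is messages, and everything the model produces lives in choices, tool calls included, with usage as the only extra for tracking token consumption. The Responses API, by contrast, was built for modern agent applications. Its input is the more general input (text or multimodal content), and its output array holds typed items: output_text content, tool calls, and so on, alongside usage statistics, a status, and IDs. Because the API can keep response state on the server (note "store": true in the payload above), it also allows background execution, which suits long-running agent tasks particularly well.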
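The structural difference is easiest to see when pulling the text out of each payload. The sketch below works on plain dicts shaped like the two JSON examples above; the function names are made up for illustration. It filters Responses output items by type because the array can also hold non-message items such as tool calls.

```python
from typing import Any


def chat_completions_text(payload: dict[str, Any]) -> str:
    """Return the assistant text from a Chat Completions payload."""
    return payload["choices"][0]["message"]["content"] or ""


def responses_text(payload: dict[str, Any]) -> str:
    """Concatenate every output_text part found in a Responses payload.

    Items are filtered by type rather than indexed by position, since
    the output array is not guaranteed to start with a message.
    """
    parts: list[str] = []
    for item in payload.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)
```

The official Python SDK exposes a similar convenience as the output_text property on a response object, so hand-written extraction like this is mainly useful when working with raw JSON.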

Usage comparison

1 Basic conversation

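from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chat Completions
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain what an API is"}]
)
print(resp.choices[0].message.content)

# Responses API
resp = client.responses.create(
    model="gpt-4o-mini",
    input="Explain what an API is"
)
# output_text is an SDK convenience that joins all output_text parts;
# indexing output[0] breaks when the first item is not a message
print(resp.output_text)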

2 Multi-turn conversation

# Chat Completions
messages = [{"role": "user", "content": "Explain what an API is"}]
resp1 = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
messages.append({"role": "assistant", "content": resp1.choices[0].message.content})
messages.append({"role": "user", "content": "Summarize in one sentence"})
resp2 = client.chat.completions.create(model="gpt-4o-mini", messages=messages)

# Responses API
resp1 = client.responses.create(model="gpt-4o-mini", input="Explain what an API is")
resp2 = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=resp1.id,
    input="Summarize in one sentence"
)

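In Chat Completions, multi-turn conversation means manually appending to messages or trimming the context yourself. The Responses API offers two options and stays compatible with the habits formed around Chat Completions:

● Stateless calls: as before, each request carries whatever context you choose to send in input, and nothing depends on server-side state.

● Stateful calls: pass previous_response_id and the platform carries the context forward.

State management can therefore be handed to the platform or kept under your own control, which gives more flexibility.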
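One way to picture what a stateful call implies is a toy in-memory store that links each response to its predecessor and rebuilds the conversation by walking the chain. This is a conceptual sketch only: ToyResponseStore and its methods are hypothetical and say nothing about how OpenAI actually implements storage.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredResponse:
    id: str
    user_input: str
    output_text: str
    previous_response_id: Optional[str]


@dataclass
class ToyResponseStore:
    """In-memory stand-in for server-side response storage (hypothetical)."""

    _responses: dict[str, StoredResponse] = field(default_factory=dict)

    def save(self, user_input: str, output_text: str,
             previous_response_id: Optional[str] = None) -> StoredResponse:
        """Record one turn, optionally chained to an earlier response."""
        if previous_response_id is not None and previous_response_id not in self._responses:
            raise KeyError(f"unknown previous_response_id: {previous_response_id}")
        resp = StoredResponse(
            f"resp_{uuid.uuid4().hex}", user_input, output_text, previous_response_id
        )
        self._responses[resp.id] = resp
        return resp

    def context_for(self, previous_response_id: Optional[str]) -> list[dict[str, str]]:
        """Walk the chain backwards and return messages oldest-first."""
        turns: list[StoredResponse] = []
        current = previous_response_id
        while current is not None:
            resp = self._responses[current]
            turns.append(resp)
            current = resp.previous_response_id
        messages: list[dict[str, str]] = []
        for resp in reversed(turns):
            messages.append({"role": "user", "content": resp.user_input})
            messages.append({"role": "assistant", "content": resp.output_text})
        return messages
```

The list returned by context_for is exactly what a Chat Completions client would have had to assemble by hand, which is the bookkeeping previous_response_id moves to the platform.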

3 Tool calling

# Chat Completions (legacy functions parameter, deprecated in favor of tools)
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    functions=[{
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
    }]
)

# Responses API
resp = client.responses.create(
    model="gpt-4o-mini",
    input="What's the weather in Paris?",
    tools=[{
        "name": "get_weather",
        "type": "function",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}}
        }
    }]
)
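When migrating, the function schema itself carries over unchanged; only the wrapper differs. The hypothetical helper below converts either a legacy functions entry or a Chat Completions tools entry (which nests the schema under a "function" key) into the flat form used in the Responses call above. The flat layout is assumed from the example in this section and OpenAI's published format.

```python
from typing import Any


def to_responses_tool(definition: dict[str, Any]) -> dict[str, Any]:
    """Convert a Chat Completions function definition to a Responses tool.

    Accepts either a legacy `functions` entry
    ({"name": ..., "parameters": ...}) or a Chat Completions `tools`
    entry ({"type": "function", "function": {...}}).
    """
    # Unwrap the nested form; a legacy entry is already the bare spec.
    spec = definition.get("function", definition)
    tool: dict[str, Any] = {
        "type": "function",
        "name": spec["name"],
        "parameters": spec.get("parameters", {"type": "object", "properties": {}}),
    }
    if "description" in spec:
        tool["description"] = spec["description"]
    return tool
```

Both input styles produce the same flat dict, so an existing list of function definitions can be mapped over this helper and passed as tools.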

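The Responses API consolidates all of these capabilities under tools:

● Multiple built-in tools: web search, file search, code interpreter, image generation, and more

● Remote tools: users can connect databases, APIs, SaaS products, and other services through MCP

Chat Completions, by contrast, supports function calling, but users generally have to define the functions themselves, and there is no unified event stream for tracking the calls.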
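Custom functions still execute in your own code under either API; what changes is how the request and result are represented. The sketch below assumes the Responses format in which a requested call appears as an output item of type function_call (with name, a JSON-string arguments field, and call_id) and the result is returned as a function_call_output item. It operates on plain dicts, and get_weather is a stub invented for the example.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical stub; a real version would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Maps tool names declared in `tools` to local Python callables.
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_function_calls(output_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute each function_call item and build function_call_output items."""
    results: list[dict[str, Any]] = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip messages and other item types
        func = REGISTRY[item["name"]]
        args = json.loads(item["arguments"])  # arguments arrive as a JSON string
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],  # ties the result to the request
            "output": json.dumps(func(**args)),
        })
    return results
```

The returned items would then go into the input of a follow-up request, typically together with previous_response_id so the model sees its own call next to the result.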

4 Built-in tools

Web search:

# Chat Completions
# No built-in web search tool

# Responses API
# Built-in web_search_preview tool
resp = client.responses.create(
    model="gpt-4.1",
    tools=[{"type": "web_search_preview"}],
    input="What are the latest AI news today?"
)
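Web search results normally come back with source links attached to the text. The sketch below assumes those links appear as url_citation entries in the annotations list of each output_text part (the same annotations field that is empty in the payload shown earlier); the exact annotation fields are an assumption to verify against the API reference, and the function name is invented.

```python
from typing import Any


def collect_url_citations(payload: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (title, url) pairs from url_citation annotations, in order."""
    citations: list[tuple[str, str]] = []
    for item in payload.get("output", []):
        if item.get("type") != "message":
            continue  # e.g. the search call item itself carries no text
        for part in item.get("content", []):
            for ann in part.get("annotations", []):
                if ann.get("type") == "url_citation":
                    citations.append((ann.get("title", ""), ann["url"]))
    return citations
```

Calling it on a response converted to a dict yields the sources in the order they are cited, which is handy for rendering a reference list under the answer.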

File search:

# Chat Completions
# Requires building your own embedding + vector DB pipeline

# Responses API
# Built-in file_search tool
resp = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_123"],
        "max_num_results": 5
    }],
    input="Summarize the research about transformers"
)

5 Streaming output

# Chat Completions
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)
for chunk in stream:
    # delta.content is None on role-only and final chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Responses API
# Results arrive as a stream of typed events
stream = client.responses.create(
    model="gpt-4o-mini",
    input="Write a poem",
    stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")

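Chat Completions streaming is essentially a token stream. The body of each streamed message is the choices array, every chunk carries a delta, and the final reply is built by concatenating delta.content across chunks.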

// Chat Completions Stream Chunk
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 171787,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "delta": { "role": "assistant", "content": "Hello" }
      ...
    }
  ]
}
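The concatenation step can be sketched on plain dicts shaped like the chunk above. The function name is made up; the skip conditions reflect the fact that some chunks carry no text (for example a final chunk whose delta is empty).

```python
from typing import Any, Iterable


def assemble_chat_stream(chunks: Iterable[dict[str, Any]]) -> str:
    """Concatenate delta.content across chunks for choice index 0."""
    pieces: list[str] = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index") != 0:
                continue
            content = choice.get("delta", {}).get("content")
            if content:  # chunks without text contribute nothing
                pieces.append(content)
    return "".join(pieces)
```

Everything other than text, such as tool-call fragments, would need separate handling keyed on other delta fields, which is where the single-channel design starts to feel cramped.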

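The Responses API streams a sequence of typed events instead. The events cover several kinds of state, such as token deltas, tool-call events, tool-result events, and the reasoning process.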

// Responses API Stream Chunk Example: Reasoning Summary
{
  "type": "response.reasoning_summary_text.delta",
  "item_id": "rs_...",
  "output_index": 0,
  "summary_index": 0,
  "delta": "Thinking about the user's question..."
}
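Because every event names its own type, a consumer can route events instead of probing fields. The sketch below handles the two delta types that appear in this article and simply records the type of anything else; StreamState and consume_events are hypothetical names, and events are modeled as plain dicts.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class StreamState:
    text: str = ""
    reasoning_summary: str = ""
    other_event_types: list[str] = field(default_factory=list)


def consume_events(events: Iterable[dict[str, Any]]) -> StreamState:
    """Route each event by its type and accumulate the text-bearing deltas."""
    state = StreamState()
    for event in events:
        kind = event.get("type", "")
        if kind == "response.output_text.delta":
            state.text += event["delta"]
        elif kind == "response.reasoning_summary_text.delta":
            state.reasoning_summary += event["delta"]
        else:
            # Lifecycle and tool-call events land here; a real consumer
            # would add branches for the types it cares about.
            state.other_event_types.append(kind)
    return state
```

Adding support for another event type is one more branch, and unrecognized events are kept visible instead of being silently dropped.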

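Comparison summary

Feature	Chat Completions	Responses API
Basic call	messages=[...]	input="..." (message lists are also accepted)
Context	Manually assembled messages	previous_response_id
Tool calling	functions	tools (custom functions and built-in tools unified)
Web search	No built-in tool	Built-in
File search	No built-in tool	Built-in
Streaming	Incremental tokens	Event stream (including tool-call events)
Multimodal	Partial support	Native text + image support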

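Some thoughts

OpenAI's integration of MCP into the Responses API can be read as a significant response to the standard Anthropic proposed for agent architectures. The move shows that OpenAI is not content to be only a model provider: it intends to take part in, and shape, how the next generation of agent frameworks evolves, and to secure a key position in a diverse agent ecosystem. The new Responses API introduces stateful management, which clearly sets it apart from the traditionally stateless Chat Completions and makes complex conversation control more flexible.

Many frameworks have already added Responses API compatibility. vLLM v0.10, for example, implements the Responses API endpoint, so it can be called with the OpenAI SDK. However, the Responses API is stateful and ships with built-in tools and vector stores, whereas vLLM is a stateless, high-efficiency inference engine that does not maintain session state, so in practice the two do not behave identically. Chat Completions does not have this mismatch. For the Responses API to gain real industry-wide pull and broad adoption, OpenAI's push alone is not enough; it still needs support from more development frameworks, toolchains, and the third-party ecosystem.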