This article targets the Agents subsection of the Tutorial chapter in the official documentation, and covers how to define and use an Agent.
- Official link: https://microsoft.github.io/autogen/stable/user-guide/agentchat-user-guide/tutorial/agents.html
[Note]: This article is fairly long, but it covers most of the commonly used Agent features, so it is worth reading through to the end.
Agents
AutoGen ships with a number of predefined Agents. Let's open one of them to see how Microsoft implements the class and its docstrings, and which functions we would need to override to define an Agent of our own:
```python
from autogen_agentchat.agents import AssistantAgent

agent = AssistantAgent()  # placeholder for navigation only; a real instance needs at least name and model_client
```
Ctrl-clicking `AssistantAgent` in VS Code jumps straight to the concrete implementation:

To define our own Agent, then, we must override the following member functions and attributes (a minimal sketch follows the list):

- `name`: a unique Agent name;
- `description`: a description of this Agent;
- `on_messages()`: sends messages to the LLM and returns a `Response` object; the docs stress that an Agent should maintain its own state rather than be handed the complete message history on every call;
- `on_messages_stream()`: same functionality as the above, but returns an iterator of `AgentEvent` or `ChatMessage` objects;
- `on_reset()`: resets the Agent to its initial state;
- `run()` and `run_stream()`: convenience methods that call `on_messages()` and `on_messages_stream()` under the hood;
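Here is a minimal sketch of such a custom agent built on `BaseChatAgent` (the `EchoAgent` name and its echo behavior are hypothetical, chosen for illustration; `run()` and `run_stream()` are inherited from the base class):

```python
from typing import Sequence

from autogen_agentchat.agents import BaseChatAgent
from autogen_agentchat.base import Response
from autogen_agentchat.messages import ChatMessage, TextMessage
from autogen_core import CancellationToken


class EchoAgent(BaseChatAgent):
    """A toy agent that simply echoes the last message it receives."""

    def __init__(self, name: str) -> None:
        super().__init__(name, description="An agent that echoes the last message.")

    @property
    def produced_message_types(self) -> Sequence[type[ChatMessage]]:
        # Declare which message types this agent can produce.
        return (TextMessage,)

    async def on_messages(self, messages: Sequence[ChatMessage], cancellation_token: CancellationToken) -> Response:
        # Echo the incoming text back; a real agent would call a model here.
        return Response(chat_message=TextMessage(content=messages[-1].content, source=self.name))

    async def on_reset(self, cancellation_token: CancellationToken) -> None:
        pass  # This toy agent keeps no internal state to clear.
```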
Assistant Agent && Getting Responses
I am merging the Assistant Agent and Getting Responses sections here, because the former shows how to define the agent and the latter shows how to invoke it. They demonstrate two halves of the same action, and neither produces any output if run on its own.
`AssistantAgent` is an Agent that ships with the AutoGen library and the first one introduced in the official docs. Its position is comparable to smolagents' `CodeAgent`, but its role is broader: it is an "assistant" in the literal sense. The Tool in the example is the same concept as a Tool in the smolagents library: we define a tool up front for the Agent to use. This design is a convention of modern AI Agents, meant to guard against two risks:
- the LLM hallucinating dangerous code;
- a maliciously fine-tuned model generating dangerous code deliberately.
The official example does not actually implement the `web_search` tool; it just returns a fixed string. If you want a real implementation, you will need to modify this part by hand:
```python
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient
import os, asyncio

os.environ["OPENAI_API_KEY"] = "your OpenAI API key"

# Define a tool that (nominally) searches the web.
async def web_search(query: str) -> str:
    """Find information on the web"""
    return "AutoGen is a programming framework for building multi-agent applications."

# Create a gpt-4o model client.
model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
    # api_key="YOUR_API_KEY",
)

# Define the Agent.
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
)

# Define the async entry point.
async def assistant_run() -> None:
    response = await agent.on_messages(
        [TextMessage(content="Find information on AutoGen", source="user")],
        cancellation_token=CancellationToken(),
    )
    print(response.inner_messages)
    print(response.chat_message)
    # Extract the model's final text from the response.
    print("Final Content:")
    print(response.chat_message.content)

# Run the coroutine.
asyncio.run(assistant_run())
```
Running it produces:

```bash
$ python demo.py
[ToolCallRequestEvent(source='assistant', models_usage=RequestUsage(prompt_tokens=61, completion_tokens=16), content=[FunctionCall(id='call_i1it9b6ST7U6QNGknXMDpjCD', arguments='{"query":"AutoGen"}', name='web_search')], type='ToolCallRequestEvent'), ToolCallExecutionEvent(source='assistant', models_usage=None, content=[FunctionExecutionResult(content='AutoGen is a programming framework for building multi-agent applications.', call_id='call_i1it9b6ST7U6QNGknXMDpjCD', is_error=False)], type='ToolCallExecutionEvent')]
source='assistant' models_usage=None content='AutoGen is a programming framework for building multi-agent applications.' type='ToolCallSummaryMessage'
Final Content:
AutoGen is a programming framework for building multi-agent applications.
```
You can see that the model called the `web_search` tool and got back the string we hard-coded.
According to the docs, the `on_messages` method returns a `Response` object whose `chat_message` attribute is the model's final response and whose `inner_messages` attribute records the model's intermediate reasoning process. So, if you only want the text part of the model's response, extracting the `chat_message.content` attribute is enough. Beyond that, the docs add two Notes:
[Note 1]: Because the `on_messages` method updates the Agent's internal state, appending each call's messages to the Agent's history, you should avoid calling it repeatedly with the same question, and avoid passing the entire conversation history in as arguments;
If you have ever written an Agent without an open-source framework, you will certainly have hit the situation where the model cannot remember your earlier conversation: to the bare model, every input is a brand-new request. If you want it to remember what you discussed before, you have to format and bundle a window of chat history together with the current question and send the package to the model. The greatest service an Agent performs is doing this step for you, and Microsoft is stressing here that you no longer need to manage the chat history yourself.
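As a framework-free sketch of that bookkeeping, using the plain OpenAI Python SDK (the `ask` helper is hypothetical, for illustration only):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    # Every request must carry the accumulated history, or the model "forgets".
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```

An AutoGen Agent performs exactly this appending inside `on_messages`, which is why re-sending the history yourself would duplicate it.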
[Note 2]: This note explains the difference between the current version and v0.2: by default, the Agent now returns the tool's execution result directly, without requiring a second call.
At the end of this section, the docs mention that `run()` can be used in place of `on_messages()`, but give no example, so I am adding one here; only the `assistant_run()` function in the code above needs to change:
```python
async def assistant_run() -> None:
    response = await agent.run(task="Find information on AutoGen")
    print(response.messages[-1].content)
```
Running it produces:

```bash
$ python demo.py
AutoGen is a programming framework for building multi-agent applications.
```
Multi-Modal Input
`AssistantAgent` can handle multi-modal input via `MultiModalMessage` messages. The official demo uses a `MultiModalMessage` to bundle text and an image together as model input, which is quite a bit more convenient than in smolagents.
```python
from io import BytesIO
import asyncio, os

import PIL.Image
import cv2
import numpy as np
import requests
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import MultiModalMessage
from autogen_core import CancellationToken, Image
from autogen_ext.models.openai import OpenAIChatCompletionClient

os.environ["OPENAI_API_KEY"] = "your OpenAI API key"

# Fetch a random 300x200 image.
pil_image = PIL.Image.open(BytesIO(requests.get("https://picsum.photos/300/200").content))
img = Image(pil_image)
multi_modal_message = MultiModalMessage(content=["Can you describe the content of this image?", img], source="user")

# Save the downloaded image so we can inspect it later.
opencv_img = np.array(pil_image)
opencv_img = cv2.cvtColor(opencv_img, cv2.COLOR_RGB2BGR)
cv2.imwrite("./image.jpg", opencv_img)

# Define a model client that talks to the OpenAI API.
open_ai_model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
)

# Define an Agent and bind the model client to it.
agent = AssistantAgent(
    name="assistant",
    model_client=open_ai_model_client,
)

# Define the async entry point.
async def main():
    response = await agent.on_messages([multi_modal_message], CancellationToken())
    print(response.chat_message.content)

# Run it.
if __name__ == '__main__':
    asyncio.run(main())
```
Running it produces:

```bash
$ python demo.py
This black and white image shows a path leading up a hill, topped with grass and surrounded by a fence on one side. The landscape is hilly with rolling terrain in the background. The sky appears overcast.
```

Streaming Messages
The AutoGen library also provides streaming output, so you can watch the model's entire generation process and, most importantly, confirm that the model is not stuck:
```python
import asyncio, os

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_agentchat.ui import Console
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

os.environ["OPENAI_API_KEY"] = "your OpenAI API key"

# Define a model client that talks to the OpenAI API.
open_ai_model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
)

# Define an Agent and bind the model client to it.
agent = AssistantAgent(
    name="assistant",
    model_client=open_ai_model_client,
)

async def assistant_run_stream() -> None:
    # Option 1: read each message from the stream (as shown in the previous example).
    # async for message in agent.on_messages_stream(
    #     [TextMessage(content="Find information on AutoGen", source="user")],
    #     cancellation_token=CancellationToken(),
    # ):
    #     print(message)

    # Option 2: use Console to print all messages as they appear.
    await Console(
        agent.on_messages_stream(
            [TextMessage(content="Find information on AutoGen", source="user")],
            cancellation_token=CancellationToken(),
        ),
        output_stats=True,  # Enable stats printing.
    )

asyncio.run(assistant_run_stream())
```
Running it produces:

```bash
$ python demo.py
---------- assistant ----------
AutoGen is a tool used in software development to automate the generation of code and other repetitive tasks. It is particularly useful for creating programs that require extensive boilerplate code, such as configuration files, state machines, and command line interface utilities. By automating the generation of such code, AutoGen can save time and reduce errors in the development process.
AutoGen works by using a set of templates and specifications. Developers define what they need in these templates, and AutoGen processes them to create the desired output automatically. This makes it easier to maintain and update code, as changes can be made in one place and automatically applied across all generated parts.
It is especially popular in projects that follow the DRY (Don't Repeat Yourself) principle, which emphasizes the importance of reducing duplication in code. By encapsulating patterns and repetitive code into a template, AutoGen helps ensure consistency and efficiency.
Please note that there are various tools and frameworks that offer similar functionalities to AutoGen, and the specific implementation details might vary. If you are looking for a specific version or type of AutoGen, please provide more details.
TERMINATE
[Prompt tokens: 41, Completion tokens: 224]
---------- Summary ----------
Number of inner messages: 0
Total prompt tokens: 41
Total completion tokens: 224
Duration: 8.38 seconds
```
You may have noticed that the code above never provides the `web_search` tool, yet the Agent still returns model output. That is because you are using the OpenAI API, which makes this equivalent to chatting with gpt-4o directly in the browser.
Using Tools
Large language models are typically limited to generating text, but many complex tasks need external tools that perform specific actions, such as fetching data from an API or a database, or the web search we have been using all along. Modern LLMs can accept a list of available tool schemas (descriptions of the tools and their arguments) and generate tool-call messages. This is known as **Tool Calling** or **Function Calling**. The docs link to two primers on what tool calling is and how it is implemented underneath, worth a read if you are curious:
- OpenAI: https://platform.openai.com/docs/guides/function-calling
- Anthropic: https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview
AutoGen also lets you customize tools: a tool can be a plain Python function or an object inheriting from `autogen_core.tools.BaseTool`. And here comes the most powerful feature: if your tool cannot return strings in a uniform format, the parameter `reflect_on_tool_use=True` makes the Agent summarize the tool's output.
This can cut your workload dramatically, because a custom tool may itself be built on model calls, and since a model's output format can change from call to call, normalizing every potential format by hand would be an explosion of work. A sketch follows.
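Enabling it is a single constructor argument (a sketch reusing the `model_client` and `web_search` from the earlier example):

```python
# With reflect_on_tool_use=True, the agent makes one more model call after the
# tool runs, so the final chat_message is the model's own summary of the raw
# tool output rather than that output verbatim.
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
    reflect_on_tool_use=True,
)
```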
Built-in Tools
AutoGen provides the following built-in tools for `AssistantAgent` to use directly; see the autogen_ext.tools API docs for more details:

- `graphrag`: tools for querying a GraphRAG index.
- `http`: tools for making HTTP requests.
- `langchain`: an adapter for using LangChain tools.
- `mcp`: tools for using Model Context Protocol (MCP) servers.
Function Tool
As in the smolagents library, a function can be turned into a tool for the Agent to use. You still describe what the function does, but AutoGen does not force you to document the inputs and outputs; a side-by-side comparison makes the difference plain:
- Defining a function tool in AutoGen:
```python
async def web_search_func(query: str) -> str:
    """Find information on the web"""
    return "AutoGen is a programming framework for building multi-agent applications."
```
- Defining a function tool in smolagents:
```python
@tool
def web_search_func(query: str) -> str:
    """This tool is used to search web resources about the query.

    Args:
        query: Search keyword or description.

    Returns:
        str: Search result.
    """
    return "AutoGen is a programming framework for building multi-agent applications."
```
The official example looks like this:
```python
from autogen_core.tools import FunctionTool

async def web_search_func(query: str) -> str:
    """Find information on the web"""
    return "AutoGen is a programming framework for building multi-agent applications."

web_search_function_tool = FunctionTool(web_search_func, description="Find information on the web")
print(web_search_function_tool.schema)
```
Running it produces:

```bash
$ python demo.py
{'name': 'web_search_func', 'description': 'Find information on the web', 'parameters': {'type': 'object', 'properties': {'query': {'description': 'query', 'title': 'Query', 'type': 'string'}}, 'required': ['query'], 'additionalProperties': False}, 'strict': False}
```
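To put the explicit `FunctionTool` to work, pass it in the `tools` list just like a bare function (a sketch reusing the earlier `model_client`; `AssistantAgent` wraps plain functions in a `FunctionTool` automatically, so the two forms are equivalent):

```python
# The wrapped tool is passed exactly like a plain function would be.
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search_function_tool],
    system_message="Use tools to solve tasks.",
)
```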
Model Context Protocol Tools
`AssistantAgent` can also use tools served by a Model Context Protocol (MCP) server, obtained through `mcp_server_tools()`. Install these dependencies before running:
```bash
$ pip install mcp json_schema_to_pydantic uv
```
The official example again cannot run as-is; here it is with minor modifications:
```python
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.tools.mcp import StdioServerParams, mcp_server_tools
import os, asyncio

os.environ["OPENAI_API_KEY"] = "your OpenAI API key"

async def main():
    # Launch the fetch MCP server over stdio and wrap its tools for AutoGen.
    fetch_mcp_server = StdioServerParams(command="uvx", args=["mcp-server-fetch"])
    tools = await mcp_server_tools(fetch_mcp_server)

    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    agent = AssistantAgent(name="fetcher", model_client=model_client, tools=tools, reflect_on_tool_use=True)  # type: ignore
    result = await agent.run(task="Summarize the content of https://en.wikipedia.org/wiki/Seattle")
    print(result.messages[-1].content)

asyncio.run(main())
```
Running it automatically downloads a few packages first:
```bash
$ python demo.py
Installed 33 packages in 24ms
Warning: node executable not found, reverting to pure-Python mode. Install Node.js v10 or newer to use Readability.js.
Seattle is the most populous city in the state of Washington and the Pacific Northwest region of North America. As of 2023, it has a population of 755,078, making it the 18th-most populous city in the United States. Seattle is known for its rapid growth, with a significant increase in population from 2010 to 2020. It is located on an isthmus between Puget Sound and Lake Washington and serves as the county seat of King County. Seattle's metropolitan area population is over 4 million, ranking it the 15th-most populous in the U.S.
Seattle's nickname is "The Emerald City," a reference to the lush evergreen forests of the area. It is home to landmarks such as the Space Needle, Mount Rainier, and the Olympic Mountains. The city was founded on November 13, 1851, and named after Chief Seattle. Seattle operates under a mayor--council government system with a land area of 142.07 square miles.
TERMINATE
```
Langchain Tools
AutoGen is also compatible with the LangChain library; it needs the following dependency:
```bash
$ pip install langchain_experimental
```
The official example, modified:
```python
import pandas as pd
from autogen_ext.tools.langchain import LangChainToolAdapter
from langchain_experimental.tools.python.tool import PythonAstREPLTool
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.ui import Console
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
import asyncio, os

os.environ["OPENAI_API_KEY"] = "your OpenAI API key"

async def main():
    # Load the Titanic dataset and expose it to the Python REPL tool as `df`.
    df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
    tool = LangChainToolAdapter(PythonAstREPLTool(locals={"df": df}))
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    agent = AssistantAgent(
        "assistant", tools=[tool], model_client=model_client, system_message="Use the `df` variable to access the dataset."
    )
    await Console(
        agent.on_messages_stream(
            [TextMessage(content="What's the average age of the passengers?", source="user")], CancellationToken()
        ),
        output_stats=True,
    )

asyncio.run(main())
```
Running it produces:

```bash
$ python demo.py
---------- assistant ----------
[FunctionCall(id='call_gnmUHvN6lL1SaotvibU5okVZ', arguments='{"query":"# Calculate the average age of passengers in the dataset\\ndf[\'age\'].mean()"}', name='python_repl_ast')]
[Prompt tokens: 111, Completion tokens: 34]
---------- assistant ----------
[FunctionExecutionResult(content="KeyError: 'age'", call_id='call_gnmUHvN6lL1SaotvibU5okVZ', is_error=False)]
---------- assistant ----------
KeyError: 'age'
---------- Summary ----------
Number of inner messages: 2
Total prompt tokens: 111
Total completion tokens: 34
Duration: 3.54 seconds
```

Note that this run ends in `KeyError: 'age'`: the Titanic CSV capitalizes its column names, so the column is actually `Age` and the model's guess of `df['age']` fails. Asking with the exact column name, or letting the agent retry, yields the real average.
Parallel Tool Calls
Some models support parallel tool calls. By default, if the model client produces multiple tool calls, `AssistantAgent` will call those tools in parallel automatically. If you want to control this behavior, you must constrain it at the level of the model client. Both `OpenAIChatCompletionClient` and `AzureOpenAIChatCompletionClient` support parallel tool calls, and setting the parameter `parallel_tool_calls=False` forbids the model from calling tools in parallel. Since this involves only a single parameter change, I am pasting the official code without running it:
```python
model_client_no_parallel_tool_call = OpenAIChatCompletionClient(
    model="gpt-4o",
    parallel_tool_calls=False,  # type: ignore
)
agent_no_parallel_tool_call = AssistantAgent(
    name="assistant",
    model_client=model_client_no_parallel_tool_call,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
)
```
Structured Output
Structured output lets the model return JSON text that follows a predefined schema, supplied as a Pydantic `BaseModel` class. In effect it restricts the model's output space, allowing the model to return only one or more of the elements you specify.
[Note]: Structured output only works with models that support it, and it also requires the model client to support it. Currently only `OpenAIChatCompletionClient` and `AzureOpenAIChatCompletionClient` do.
The biggest benefit of structured output is that it lets you add chain-of-thought reasoning to the agent's response:
```python
from typing import Literal

from pydantic import BaseModel
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
import asyncio, os

os.environ["OPENAI_API_KEY"] = "your OpenAI API key"

# Define the structured response schema.
class AgentResponse(BaseModel):
    thoughts: str
    response: Literal["happy", "sad", "neutral"]

# Define the model client, binding the response schema.
model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
    response_format=AgentResponse,  # type: ignore
)

agent = AssistantAgent(
    "assistant",
    model_client=model_client,
    system_message="Categorize the input as happy, sad, or neutral following the JSON format.",
)

asyncio.run(Console(agent.run_stream(task="I am happy.")))
```
Running it produces:

```bash
$ python demo.py
---------- user ----------
I am happy.
---------- assistant ----------
{"thoughts":"The user explicitly states they are feeling happy.","response":"happy"}
```
Streaming Tokens
You can stream the tokens produced by the model client by setting `model_client_stream=True`. Most models support streaming output; if your model errors out when you run this, first confirm with the model provider that streaming is supported.
The official example, modified:
```python
import os
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken
from autogen_ext.models.openai import OpenAIChatCompletionClient

os.environ["OPENAI_API_KEY"] = "your OpenAI API key"

model_client = OpenAIChatCompletionClient(model="gpt-4o")

streaming_assistant = AssistantAgent(
    name="assistant",
    model_client=model_client,
    system_message="You are a helpful assistant.",
    model_client_stream=True,  # Enable streaming responses.
)

async def main():
    async for message in streaming_assistant.on_messages_stream(  # type: ignore
        [TextMessage(content="Name two cities in South America", source="user")],
        cancellation_token=CancellationToken(),
    ):
        print(message)

asyncio.run(main())
```
Running it produces:

```bash
$ python demo.py
source='assistant' models_usage=None content='Two' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' cities' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' South' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' America' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' are' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Buenos' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Aires' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=',' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' which' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' is' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' the' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' capital' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' of' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Argentina' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=',' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' and' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Rio' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' de' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Janeiro' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=',' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' which' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' is' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' a' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' major' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' city' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' in' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' Brazil' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='.' type='ModelClientStreamingChunkEvent'
Response(chat_message=TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0), content='Two cities in South America are Buenos Aires, which is the capital of Argentina, and Rio de Janeiro, which is a major city in Brazil.', type='TextMessage'), inner_messages=[])
```
Using Model Context
The `model_context` parameter of `AssistantAgent` accepts a `ChatCompletionContext` object, which lets the Agent control the context it feeds to the model. By default, `AssistantAgent` uses `UnboundedChatCompletionContext`, which sends the complete conversation history to the model.
If you only want to pass the last n messages as context, you can constrain it with `BufferedChatCompletionContext`. Since this, too, is just a parameter change, I won't run it here:
```python
from autogen_core.model_context import BufferedChatCompletionContext

# Create an agent that uses only the last 5 messages in the context to generate responses.
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
    model_context=BufferedChatCompletionContext(buffer_size=5),  # Only use the last 5 messages in the context.
)
```
Congratulations: reaching this point means you patiently read the whole article. Honestly, I am not fond of how this page of the AutoGen docs is laid out; it delivers too much knowledge in one sitting. Splitting it into two parts, tool calling and context limiting, would work much better and be far easier to digest.
All in all, there is still a long road ahead.