OpenManus 原理浅析（二）——4个 Agent 类的实现

前言

上一篇文章中介绍了 OpenManus Agent 的基本运行流程，其中提及"有4个类会参与其中，按照继承关系从子到父分别是：Manus -> ToolCallAgent -> ReActAgent -> BaseAgent。除此之外，还有一些工具类也会参与到流程中"。本文就稍微详细的分析这4个与 Agent 运行流程相关的类的实现。

BaseAgent

BaseAgent 是一个管理代理状态和执行的抽象基类。提供状态管理、内存管理和执行循环等基础功能。BaseAgent 的子类必须实现 step 方法。

Manus.run 就是继承自 BaseAgent，在 run 方法中，通过 while 循环不断调用 step 方法，直到满足退出条件为止。

基类

BaseAgent 继承自 BaseModel 和 ABC。

BaseModel 是 Pydantic 库提供的基类，提供数据验证、字段管理、序列化等功能。

ABC (Abstract Base Class) 是 Python 标准库中的抽象基类，提供接口定义、防止实例化等功能。

核心 run 方法

BaseAgent 的基本工作流程在 run 方法中：

python 复制代码

async def run(self, request: Optional[str] = None) -> str:
    """Execute the agent's main loop asynchronously.

    Args:
        request: Optional initial user request to process.

    Returns:
        A string summarizing the execution results.

    Raises:
        RuntimeError: If the agent is not in IDLE state at start.
    """
    # 如果当前 Agent 不空闲，抛出异常
    if self.state != AgentState.IDLE:
        raise RuntimeError(f"Cannot run agent from state: {self.state}")

    if request:
        # 将用户输入添加到记忆中
        self.update_memory("user", request)

    results: List[str] = []
    async with self.state_context(AgentState.RUNNING):
        # 开启循环，结束条件是 达到最大步骤 或 状态达到FINISHED
        while (
            self.current_step < self.max_steps and self.state != AgentState.FINISHED
        ):
            self.current_step += 1
            logger.info(f"Executing step {self.current_step}/{self.max_steps}")
            step_result = await self.step()

            # Check for stuck state
            # 判断 agent 是否陷入循环对话
            if self.is_stuck():
                self.handle_stuck_state()

            results.append(f"Step {self.current_step}: {step_result}")

        if self.current_step >= self.max_steps:
            self.current_step = 0
            self.state = AgentState.IDLE
            results.append(f"Terminated: Reached max steps ({self.max_steps})")
    await SANDBOX_CLIENT.cleanup()
    return "\n".join(results) if results else "No steps executed"

run 方法调用时，会先判断当前 Agent 是否空闲，并将用户输入添加到记忆中。

run 方法的核心，或者说 Agent 的核心是一个循环：

在每次循环中会调用由子类实现的 step 方法。
在 step 方法返回后判断 Agent 是否陷入循环对话。如果陷入循环对话，则通过 handle_stuck_state 方法设置下一步行动提示词，告知 Agent 考虑新策略并避免重复无效路径。
循环的结束条件是达到最大步骤或状态达到 FINISHED。

关键属性

system_prompt：系统层级的提示词，设置为 None，由子类赋值。
next_step_prompt：下一步行动的提示，设置为 None，由子类赋值。
llm：大模型实例，LLM 类型，详见下一篇文章。
memory：记忆实例，Memory 类型，详见下一篇文章。
state：Agent 状态，AgentState 枚举类型，包括 IDLE（空闲状态）、RUNNING（正在执行任务）、FINISHED（已完成任务）、ERROR（执行过程中发生错误）四种状态。
max_steps：最大步骤，默认为10。
current_step：当前步骤。
duplicate_threshold：重复阈值，当 Agent 产出的消息重复数量超过该值，则认为 Agent 陷入循环对话。默认为2。

关键方法

initialize_agent

对模型进行验证，在创建 BaseAgent 或其子类的实例后执行，这个功能是由 BaseModel 提供的。

python 复制代码

# model_validator 对模型进行验证，after 表示在字段验证之后执行
@model_validator(mode="after")
def initialize_agent(self) -> "BaseAgent":
    """Initialize agent with default settings if not provided."""
    if self.llm is None or not isinstance(self.llm, LLM):
        self.llm = LLM(config_name=self.name.lower())
    if not isinstance(self.memory, Memory):
        self.memory = Memory()
    return self

在 initialize_agent 中会创建 LLM 和 Memory 实例。

state_context

改变 Agent 状态的方法，准确的说，是一个上下文管理器。在进入该上下文时，Agent 状态会被修改为新值，退出上下文时，Agent 状态会被还原。

python 复制代码

@asynccontextmanager
async def state_context(self, new_state: AgentState):
    """Context manager for safe agent state transitions.

    Args:
        new_state: The state to transition to during the context.

    Yields:
        None: Allows execution within the new state.

    Raises:
        ValueError: If the new_state is invalid.
    """
    if not isinstance(new_state, AgentState):
        raise ValueError(f"Invalid state: {new_state}")

    previous_state = self.state
    self.state = new_state
    try:
        yield
    except Exception as e:
        self.state = AgentState.ERROR  # Transition to ERROR on failure
        raise e
    finally:
        # 无论执行成功与否，都自动恢复到原始状态
        self.state = previous_state  # Revert to previous state

state_context 方法被 asynccontextmanager 装饰，即 state_context 方法提供一个上下文管理器，可以被 Python 的 with 语法调用，调用时代码的执行顺序是：

with 语句首先执行 yield 之前的语句；
然后，由 yield 把变量输出出去，执行 with 语句内部的所有语句；
最后执行 yield 之后的语句。

更多关于 Python 的 with 语法及上下文管理器可以参考：

update_memory

更新记忆：

python 复制代码

def update_memory(
    self,
    role: ROLE_TYPE,  # type: ignore 是 user、system、assistant、tool 中的一个
    content: str,
    base64_image: Optional[str] = None,
    **kwargs,
) -> None:
    """Add a message to the agent's memory.

    Args:
        role: The role of the message sender (user, system, assistant, tool).
        content: The message content.
        base64_image: Optional base64 encoded image.
        **kwargs: Additional arguments (e.g., tool_call_id for tool messages).

    Raises:
        ValueError: If the role is unsupported.
    """
    message_map = {
        "user": Message.user_message,
        "system": Message.system_message,
        "assistant": Message.assistant_message,
        "tool": lambda content, **kw: Message.tool_message(content, **kw),
    }

    if role not in message_map:
        raise ValueError(f"Unsupported message role: {role}")

    # Create message with appropriate parameters based on role
    kwargs = {"base64_image": base64_image, **(kwargs if role == "tool" else {})}
    self.memory.add_message(message_map[role](content, **kwargs))

采用策略模式，根据传入的角色不同，创建对应类型的消息，并添加到记忆中。

is_stuck

检测 Agent 是否陷入循环对话：

python 复制代码

def is_stuck(self) -> bool:
    """Check if the agent is stuck in a loop by detecting duplicate content"""
    # 消息数量少于2条，不认为陷入循环
    if len(self.memory.messages) < 2:
        return False

    # 获取最后一条消息，如果内容为空则返回False
    last_message = self.memory.messages[-1]
    if not last_message.content:
        return False

    # Count identical content occurrences
    # 统计相同内容的出现次数，具体做法：
    # 从后往前遍历历史消息，统计与最后一条消息内容相同且角色为"assistant"的消息数量
    # 如果相同内容出现次数超过预设阈值，则判定为陷入循环
    duplicate_count = sum(
        1
        for msg in reversed(self.memory.messages[:-1]) # reversed用于创建反向迭代器，即从最后一个元素开始遍历
        if msg.role == "assistant" and msg.content == last_message.content
    )

    return duplicate_count >= self.duplicate_threshold

判断逻辑为：

如果目前的消息总数少于2条，不认为陷入循环。
获取最后一条消息，如果内容为空，也不认为陷入循环。
统计相同内容的出现次数，具体做法：
1. 从后往前遍历历史消息，统计与最后一条消息内容相同且角色为"assistant"的消息数量。
2. 如果相同内容出现次数超过预设阈值（duplicate_threshold），则判定为陷入循环。

handle_stuck_state

处理智能体的卡顿状态，当检测到重复响应时（is_stuck 方法返回结果为 True），会添加一个提示信息到下一步的提示中，提醒 Agent 考虑新策略并避免重复无效路径：

python 复制代码

def handle_stuck_state(self):
    """Handle stuck state by adding a prompt to change strategy"""
    # 当检测到重复响应时，会添加一个提示信息到下一步的提示中，提醒 Agent 考虑新策略并避免重复无效路径
    stuck_prompt = "\
    Observed duplicate responses. Consider new strategies and avoid repeating ineffective paths already attempted."
    self.next_step_prompt = f"{stuck_prompt}\n{self.next_step_prompt}"
    logger.warning(f"Agent detected stuck state. Added prompt: {stuck_prompt}")

step

由子类实现的抽象方法，每一次 TAO 循环其实就是调用该方法实现：

python 复制代码

 # abstractmethod 用于装饰抽象方法，step 由子类实现
@abstractmethod
async def step(self) -> str:
    """Execute a single step in the agent's workflow.

    Must be implemented by subclasses to define specific behavior.
    """

该方法要求返回字符串。

messages

分别通过 Getter 和 Setter 定义了一个名为 messages 的属性(property)，它提供了对 self.memory.messages 的便捷访问方式：

python 复制代码

# property装饰器用于将类的方法转换为只读属性
@property
def messages(self) -> List[Message]:
    """Retrieve a list of messages from the agent's memory."""
    return self.memory.messages

@messages.setter
def messages(self, value: List[Message]):
    """Set the list of messages in the agent's memory."""
    self.memory.messages = value

ReActAgent

ReActAgent 是 BaseAgent 的子类，是实现推理-行动模式 Agent 基类。

ReActAgent 实现了step 方法，在 step 方法中先后调用 think/act 方法。step 方法会在 BaseAgent 中通过循环不断重复，直到满足退出条件。

think/act 方法是 ReActAgent 新增的两个抽象方法，需要具体的子类来具体实现，分别实现推理和行动两个功能。

ReActAgent 代码很少：

python 复制代码

class ReActAgent(BaseAgent, ABC):
    name: str
    description: Optional[str] = None

    system_prompt: Optional[str] = None
    next_step_prompt: Optional[str] = None

    llm: Optional[LLM] = Field(default_factory=LLM)
    memory: Memory = Field(default_factory=Memory)
    state: AgentState = AgentState.IDLE

    max_steps: int = 10
    current_step: int = 0

    @abstractmethod
    async def think(self) -> bool:
        """Process current state and decide next action"""

    @abstractmethod
    async def act(self) -> str:
        """Execute decided actions"""

    async def step(self) -> str:
        """Execute a single step: think and act."""
        should_act = await self.think()
        if not should_act:
            return "Thinking complete - no action needed"
        return await self.act()

ToolCallAgent

基本功能

ToolCallAgent 是 ReActAgent 的子类。ToolCallAgent 是处理工具调用的 Agent，能够解析 LLM 返回的工具调用并执行。

ToolCallAgent 实现了ReActAgent 中定义的两个抽象方法 think 和 act，从而完成了 ReAct 框架中的推理-行动循环。

think 方法负责分析当前状态并决定下一步行动：

使用 LLM 分析当前状态，这一步会把工具传递给LLM（Function Call）。
由 LLM 选择合适的工具。
根据不同的工具选择模式（none/auto/required）决定是否需要执行 act。

act 方法负责执行工具调用并处理结果：

遍历执行每个工具调用。
收集和处理工具执行结果。
更新内存中的执行历史。

在 act 方法中，还会判断 Agent 是否达到了完成状态。

关键属性

system_prompt：系统层级的提示词，在 BaseAgent 中定义，在 ToolCallAgent 中覆盖为 You are an agent that can execute tool calls。
next_step_prompt：下一步行动的提示，在 BaseAgent 中定义，在 ToolCallAgent 中覆盖为 If you want to stop interaction, use `terminate` tool/function call.。大意为如果你想停止交互，请使用"终止"工具/函数调用。
available_tools：
1. 可使用的工具，ToolCollection 类型。ToolCollection 类中维护了一组工具，并提供 execute 方法调用指定的工具。
2. 工具应该是 BaseTool 的子类，详见下一篇文章。
3. 默认包含的工具有：
  1. CreateChatCompletion：使用指定的格式创建结构化输出。
  2. Terminate：当请求得到满足或助手无法继续执行任务时，终止交互。
tool_choices：工具选择类型，只能是 none、auto、required 之一，默认为 ```auto``。
tool_calls：
1. 工具调用列表，List[ToolCall] 类型。
2. ToolCall 类中只有 id（str）、type（str，默认 function）、function （Function）三个属性。
3. 如果之前有接触过 Function Call 的话，可以看出 ToolCall 其实就是 ChatCompletionMessageToolCall 类型。
special_tool_names：特殊工具名称，默认只有 Terminate。当 Terminate 工具调用后，会认为 Agent 达到了完成状态。
max_steps：最大步骤，在 BaseAgent 中定义，默认为10。在 ToolCallAgent 中修改为 30。

关键方法

think

think 方法向 LLM 询问下一步应该调用的工具，返回是否需要调用工具。

简化后代码如下：

python 复制代码

async def think(self) -> bool:
    """Process current state and decide next actions using tools"""
    
    if self.next_step_prompt:
        # 将 next_step_prompt 转换成 Message 对象，并添加到 messages 中
        user_msg = Message.user_message(self.next_step_prompt)
        self.messages += [user_msg]

    try:
        # 向 LLM 询问是否调用工具
         # ask_tool 返回 result.choices[0].message，所以 response 是 ChatCompletionMessage 类型
        response = await self.llm.ask_tool(
            messages=self.messages,
            system_msgs=(
                [Message.system_message(self.system_prompt)]
                if self.system_prompt
                else None
            ),
            # to_params 函数将 available_tools 转换成 openAI Function call 要求的格式，形如：
            # [
            #   {
            #     "type": "function",
            #     "function": {
            #         "name": self.name,
            #         "description": self.description,
            #         "parameters": self.parameters,
            #     },
            #   },
            # ]
            tools=self.available_tools.to_params(),
            tool_choice=self.tool_choices,
        )
    except ValueError:
        raise
    except Exception as e:
        # 如果是因为超出了 token 限制，返回 False
        if hasattr(e, "__cause__") and isinstance(e.__cause__, TokenLimitExceeded):
            token_limit_error = e.__cause__
            logger.error(
                f"🚨 Token limit error (from RetryError): {token_limit_error}"
            )
            self.memory.add_message(
                Message.assistant_message(
                    f"Maximum token limit reached, cannot continue execution: {str(token_limit_error)}"
                )
            )
            self.state = AgentState.FINISHED
            return False
        raise

    # 提取 tool_calls 结果
    # 正常情况下，tool_calls 形如：
    # [ChatCompletionMessageToolCall(id='call_5UWZsEERn1Iew2QI3OjRPYnG', function=Function(arguments='{"x":1024,"y":10086}', name='add'), type='function')]
    self.tool_calls = tool_calls = (
        response.tool_calls if response and response.tool_calls else []
    )
    content = response.content if response and response.content else ""

    # Log response info
    # ...

    try:
        if response is None:
            raise RuntimeError("No response received from the LLM")

        # 如果是不进行工具选择，但 LLM 又选择了工具，返回 False
        if self.tool_choices == ToolChoice.NONE:
            if tool_calls:
                logger.warning(
                    f"🤔 Hmm, {self.name} tried to use tools when they weren't available!"
                )
            if content:
                self.memory.add_message(Message.assistant_message(content))
                return True
            return False

        # 将 LLM 的返回添加到 memory 中
        assistant_msg = (
            Message.from_tool_calls(content=content, tool_calls=self.tool_calls)
            if self.tool_calls
            else Message.assistant_message(content)
        )
        self.memory.add_message(assistant_msg)

        if self.tool_choices == ToolChoice.REQUIRED and not self.tool_calls:
            return True  # Will be handled in act()

        # For 'auto' mode, continue with content if no commands but content exists
        if self.tool_choices == ToolChoice.AUTO and not self.tool_calls:
            return bool(content)

        return bool(self.tool_calls)
    except Exception as e:
        logger.error(f"🚨 Oops! The {self.name}'s thinking process hit a snag: {e}")
        self.memory.add_message(
            Message.assistant_message(
                f"Error encountered while processing: {str(e)}"
            )
        )
        return False

其实 think 方法就是完成 Function Call，并将 LLM 返回的 tool_calls 保存到 self.tool_calls 中，供 act 方法使用，返回是否需要使用工具。

self.tool_calls 形如：

python 复制代码

[ChatCompletionMessageToolCall(id='call_5UWZsEERn1Iew2QI3OjRPYnG', function=Function(arguments='{"x":1024,"y":10086}', name='add'), type='function')]

act

act 方法遍历 self.tool_calls，并通过 self.execute_tool 方法执行，最终将执行结果保存到记忆中：

python 复制代码

async def act(self) -> str:
    """Execute tool calls and handle their results"""
    if not self.tool_calls:
        if self.tool_choices == ToolChoice.REQUIRED:
            raise ValueError(TOOL_CALL_REQUIRED)

        # Return last message content if no tool calls
        return self.messages[-1].content or "No content or commands to execute"

    results = []
    for command in self.tool_calls:
        # Reset base64_image for each tool call
        self._current_base64_image = None

        # command 其实是 ChatCompletionMessageToolCall 类型
        result = await self.execute_tool(command)

        if self.max_observe:
            result = result[: self.max_observe]

        logger.info(
            f"🎯 Tool '{command.function.name}' completed its mission! Result: {result}"
        )

        # Add tool response to memory
        tool_msg = Message.tool_message(
            content=result,
            tool_call_id=command.id,
            name=command.function.name,
            base64_image=self._current_base64_image,
        )
        self.memory.add_message(tool_msg)
        results.append(result)

    return "\n\n".join(results)

think/act 的交互

ToolCallAgent 中并没有直接调用 think/act 方法的地方，而是在 ReActAgent 的 steps 方法中调用：

python 复制代码

async def step(self) -> str:
    """Execute a single step: think and act."""
    should_act = await self.think()
    if not should_act:
        return "Thinking complete - no action needed"
    return await self.act()

execute_tool

执行工具的具体方法，会捕获执行过程中抛出的异常，简化后代码如下：

python 复制代码

async def execute_tool(self, command: ToolCall) -> str:
    """Execute a single tool call with robust error handling"""
    if not command or not command.function or not command.function.name:
        return "Error: Invalid command format"

    name = command.function.name
    # 如果工具不存在，返回错误
    if name not in self.available_tools.tool_map:
        return f"Error: Unknown tool '{name}'"

    try:
        # 解析参数
        args = json.loads(command.function.arguments or "{}")

        # 执行工具
        logger.info(f"🔧 Activating tool: '{name}'...")
        result = await self.available_tools.execute(name=name, tool_input=args)

        # Handle special tools
        await self._handle_special_tool(name=name, result=result)

        # Check if result is a ToolResult with base64_image
        # ...

        # 格式化显示结果
        observation = (
            f"Observed output of cmd `{name}` executed:\n{str(result)}"
            if result
            else f"Cmd `{name}` completed with no output"
        )

        return observation
    except json.JSONDecodeError:
        error_msg = f"Error parsing arguments for {name}: Invalid JSON format"
        logger.error(
            f"📝 Oops! The arguments for '{name}' don't make sense - invalid JSON, arguments:{command.function.arguments}"
        )
        return f"Error: {error_msg}"
    except Exception as e:
        error_msg = f"⚠️ Tool '{name}' encountered a problem: {str(e)}"
        logger.exception(error_msg)
        return f"Error: {error_msg}"

_handle_special_tool

在执行工具后，会调用 _handle_special_tool 方法判断所调用的工具是否是特殊工具。实际就是判断是否调用了 Terminate 工具，该工具用于终止 Agent 执行。

python 复制代码

async def _handle_special_tool(self, name: str, result: Any, **kwargs):
    """Handle special tool execution and state changes"""
    if not self._is_special_tool(name):
        return

    if self._should_finish_execution(name=name, result=result, **kwargs):
        # Set agent state to finished
        logger.info(f"🏁 Special tool '{name}' has completed the task!")
        self.state = AgentState.FINISHED

@staticmethod
def _should_finish_execution(**kwargs) -> bool:
    """Determine if tool execution should finish the agent"""
    return True

def _is_special_tool(self, name: str) -> bool:
    """Check if tool name is in special tools list"""
    return name.lower() in [n.lower() for n in self.special_tool_names]

Manus

Manus 是最终用户使用的 Agent 类，继承自 ToolCallAgent。在 Manus 中集成了多种工具，包括Python执行、Google搜索、浏览器操作和文件保存。

在 main.py 中，其实就是先调用 Manus.create() 创建并初始化 Agent 实例，然后通过 agent.run(prompt) 方法运行 Agent。

使用

在 main.py 中：

python 复制代码

async def main():
    # 解析命令行参数
    parser = argparse.ArgumentParser(description="Run Manus agent with a prompt")
    parser.add_argument(
        "--prompt", type=str, required=False, help="Input prompt for the agent"
    )
    args = parser.parse_args()

    # 创建并初始化 Manus agent
    agent = await Manus.create()
    try:
        prompt = args.prompt if args.prompt else input("Enter your prompt: ")
        if not prompt.strip():
            logger.warning("Empty prompt provided.")
            return

        logger.warning("Processing your request...")
        # 执行 agent
        await agent.run(prompt)
        logger.info("Request processing completed.")
    except KeyboardInterrupt:
        logger.warning("Operation interrupted.")
    finally:
        # Ensure agent resources are cleaned up before exiting
        await agent.cleanup()

关键属性

system_prompt：
1. 系统层级的提示词，在 BaseAgent 中定义。
2. 在 Manus 中覆盖为：
css 复制代码
```
"You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, web browsing, or human interaction (only for extreme cases), you can handle it all."
"The initial directory is: {directory}"
```
大意为：你是OpenManus，一个全能的人工智能助手，旨在解决用户提出的任何任务。你可以使用各种工具来高效地完成复杂的请求。无论是编程、信息检索、文件处理、网页浏览还是人机交互（仅限极端情况），你都可以处理。初始目录是：｛directory｝

next_step_prompt：

下一步行动的提示，在 BaseAgent 中定义。
在 Manus 中覆盖为：

vbnet 复制代码

Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.

If you want to stop the interaction at any point, use the `terminate` tool/function call.

大意为：

c 复制代码

根据用户需求，主动选择最合适的工具或工具组合。对于复杂的任务，您可以分解问题并使用不同的工具逐步解决。使用每个工具后，清楚地解释执行结果并建议下一步。
如果你想在任何时候停止交互，请使用"terminate"工具/函数调用。

max_steps：最大步骤，在 BaseAgent 中定义，默认为10。在 ToolCallAgent 中修改为 30。但在 Manus 中设置为 20。
mcp_clients：
1. MCP 客户端，MCPClients 类型。
2. MCPClients 支持通过 SSE、stdio 链接服务器，使用的是 MCP Python SDK。
available_tools：
1. 可使用的工具，ToolCollection 类型。ToolCollection 类中维护了一组工具，并提供 execute 方法调用指定的工具。
2. 工具应该是 BaseTool 的子类，详见下一篇文章。
3. 该属性在 ToolCallAgent 中定义，在 Manus 中覆盖为包含如下工具的 ToolCollection：
  1. PythonExecute：一个用于执行具有超时和安全限制的Python代码的工具。
  2. BrowserUseTool：一个强大的浏览器自动化工具，允许通过各种操作与网页进行交互。
  3. StrReplaceEditor：一种用于查看、创建和编辑具有沙盒支持的文件的工具。
  4. AskHuman：一个向人类寻求帮助的工具。
  5. Terminate：当请求得到满足或助手无法继续执行任务时，终止交互。
connected_servers：字典，维护连接MCP服务器的关系。

关键方法

create

create 方法就是初始化 MCP 服务，并返回 Manus 实例：

python 复制代码

@classmethod
async def create(cls, **kwargs) -> "Manus":
    """Factory method to create and properly initialize a Manus instance."""
    instance = cls(**kwargs)
    await instance.initialize_mcp_servers()
    instance._initialized = True
    return instance

初始化 MCP 服务

初始化 MCP 服务涉及 initialize_mcp_servers 和 connect_mcp_server 两个方法，initialize_mcp_servers 中调用 connect_mcp_server，核心用法就是用不同的方式链接 MCP 服务（SSE 或 stdio）：

python 复制代码

async def initialize_mcp_servers(self) -> None:
    """Initialize connections to configured MCP servers."""
    for server_id, server_config in config.mcp_config.servers.items():
        try:
            if server_config.type == "sse":
                if server_config.url:
                    await self.connect_mcp_server(server_config.url, server_id)
                    logger.info(
                        f"Connected to MCP server {server_id} at {server_config.url}"
                    )
            elif server_config.type == "stdio":
                if server_config.command:
                    await self.connect_mcp_server(
                        server_config.command,
                        server_id,
                        use_stdio=True,
                        stdio_args=server_config.args,
                    )
                    logger.info(
                        f"Connected to MCP server {server_id} using command {server_config.command}"
                    )
        except Exception as e:
            logger.error(f"Failed to connect to MCP server {server_id}: {e}")

async def connect_mcp_server(
    self,
    server_url: str,
    server_id: str = "",
    use_stdio: bool = False,
    stdio_args: List[str] = None,
) -> None:
    """Connect to an MCP server and add its tools."""
    # 不同的链接方式
    if use_stdio:
        await self.mcp_clients.connect_stdio(
            server_url, stdio_args or [], server_id
        )
        self.connected_servers[server_id or server_url] = server_url
    else:
        await self.mcp_clients.connect_sse(server_url, server_id)
        self.connected_servers[server_id or server_url] = server_url

    # 当成功连接到一个新的MCP服务器后，从所有 MCP 工具中找出等于 server_id 的
    new_tools = [
        tool for tool in self.mcp_clients.tools if tool.server_id == server_id
    ]
    # 将这些新工具添加到可用工具集合中
    self.available_tools.add_tools(*new_tools)

清理资源

当工作结束时，会释放掉相关资源，主要是断开与 MCP 服务器的链接：

python 复制代码

async def disconnect_mcp_server(self, server_id: str = "") -> None:
    """Disconnect from an MCP server and remove its tools."""
    # mcp_clients.disconnect 方法如果没有传递 server_id，则会断开与所有 MCP 服务器的链接
    await self.mcp_clients.disconnect(server_id)
    if server_id:
        self.connected_servers.pop(server_id, None)
    else:
        self.connected_servers.clear()

    # Rebuild available tools without the disconnected server's tools
    # 重新构建可用工具
    # 先筛选出所有非 MCP 工具
    base_tools = [
        tool
        for tool in self.available_tools.tools
        if not isinstance(tool, MCPClientTool)
    ]
    self.available_tools = ToolCollection(*base_tools)
    # 添加所有没有断开链接的 MCP 服务的工具
    self.available_tools.add_tools(*self.mcp_clients.tools)

async def cleanup(self):
    """Clean up Manus agent resources."""
    if self.browser_context_helper:
        await self.browser_context_helper.cleanup_browser()
    # Disconnect from all MCP servers only if we were initialized
    if self._initialized:
        await self.disconnect_mcp_server()
        self._initialized = False

think

Manus 重写了ToolCallAgent 中的 think 方法，主要改动点是检查最近 3 条消息中是否使用了浏览器工具，如果正在使用浏览器，则更新下一步提示词为浏览器上下文相关的提示。最终还是调用父类的思考方法进行决策：

python 复制代码

async def think(self) -> bool:
    """Process current state and decide next actions with appropriate context."""
    if not self._initialized:
        await self.initialize_mcp_servers()
        self._initialized = True

    original_prompt = self.next_step_prompt
    # 检查最近3条消息中是否使用了浏览器工具
    recent_messages = self.memory.messages[-3:] if self.memory.messages else []
    # any 函数检查是否至少有一条消息满足条件
    browser_in_use = any(
        # 这段代码由于被()包裹，是生成器表达式
        tc.function.name == BrowserUseTool().name
            for msg in recent_messages
                if msg.tool_calls
                    for tc in msg.tool_calls
    )

    if browser_in_use:
        self.next_step_prompt = (
            await self.browser_context_helper.format_next_step_prompt()
        )

    result = await super().think()

    # 恢复原始的下一步提示词
    self.next_step_prompt = original_prompt

    return result