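Analysis of the Sandbox Logic in Trae-Agent

Trae Agent Sandbox: Core Logic

Overview

Sandbox is a Docker container wrapper class in the evaluation module of Trae Agent. It provides an isolated execution environment for patch selection (the Selector Agent). This document analyzes the core logic and implementation of Sandbox in detail.

1. Overall Architecture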


┌──────────────────────────────────────────────────────────────────────┐
│                   Selector Agent (patch selection)                   │
│                      needs to validate patches                       │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                               Sandbox                                │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  1. Start the Docker container (SWE-bench image)               │  │
│  │  2. Copy the tools into the container                          │  │
│  │  3. Check out the code at the given commit                     │  │
│  │  4. Provide an interactive shell session                       │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                    Session (interactive session)                     │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  - Execute commands inside the container                       │  │
│  │  - Apply patches for validation                                │  │
│  │  - Run tests to check correctness                              │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘

2. Core Components

2.1 The Sandbox Class

File: evaluation/patch_selection/trae_selector/sandbox.py


import subprocess
import time

import docker
import pexpect


class Sandbox:
    """
    Docker sandbox environment.
    Provides an isolated code execution environment for the Selector Agent.
    """

    def __init__(
        self,
        namespace: str,      # image namespace (e.g. "swebench")
        name: str,           # image name
        tag: str,            # image tag (e.g. "latest")
        instance: dict,      # instance info (contains instance_id and base_commit)
        tools_path: str,     # path to the tools directory on the host
    ):
        self.namespace = namespace
        self.name = name
        self.tag = tag
        self.client = docker.from_env()  # Docker client
        self.commit_id = instance["base_commit"]  # baseline commit of the code
        self.instance_id = instance["instance_id"]
        self.container = None
        self.shell = None    # pexpect shell session
        self.tools_path = tools_path

2.2 Container Lifecycle Management

Starting the container


def start_container(self):
    """Start the Docker container and initialize the environment."""
    # 1. Build the full image name
    image = f"{self.namespace}/{self.name}:{self.tag}"
    # e.g. "swebench/sweb.eval.x86_64.astropy__astropy-14369:latest"

    # 2. Start the container
    self.container = self.client.containers.run(
        image,
        detach=True,           # run in the background
        tty=True,              # allocate a pseudo-terminal
        stdin_open=True,       # keep stdin open
        privileged=True,       # privileged mode
        volumes={
            "/tmp": {"bind": "/tmp", "mode": "rw"}  # mount the host's /tmp read-write
        },
    )
    print(f"Container {self.container.short_id} started")

    # 3. Copy the tools into the container
    cmd = (
        f"chmod -R 777 {self.tools_path} && "
        f"docker cp {self.tools_path} {self.container.name}:/home/swe-bench/"
    )
    subprocess.run(cmd, check=True, shell=True)

    # 4. Check out the code at the baseline commit
    checkout_res = self.container.exec_run(f"git checkout {self.commit_id}")
    print("checkout: ", checkout_res)
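
The image name and the copy step above can be isolated into small pure helpers. The sketch below is illustrative only: `build_image_ref` and `build_copy_argv` are hypothetical names, not functions from the repository, and the container name is made up. It builds the `docker cp` call as an argument list, which is one possible alternative to the `shell=True` string above when a path may contain spaces or shell metacharacters.

```python
def build_image_ref(namespace: str, name: str, tag: str) -> str:
    """Join the three parts the same way start_container() does."""
    return f"{namespace}/{name}:{tag}"


def build_copy_argv(tools_path: str, container_name: str) -> list[str]:
    """Return an argv list for `docker cp` (hypothetical helper).

    A list passed to subprocess.run without shell=True is not parsed by a shell,
    so the path needs no quoting.
    """
    return ["docker", "cp", tools_path, f"{container_name}:/home/swe-bench/"]


image = build_image_ref("swebench", "sweb.eval.x86_64.astropy__astropy-14369", "latest")
argv = build_copy_argv("/path/to/tools", "my_container")  # container name is an assumption
print(image)
print(argv)
```

The resulting list could be passed as `subprocess.run(argv, check=True)`; the `chmod` step from the original would then be a separate call.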

Stopping the container


def stop_container(self):
    """Stop the container and clean it up."""
    if self.container:
        # Close the shell session
        if self.shell and self.shell.isalive():
            self.shell.close(force=True)
            self.shell = None

        # Stop and remove the container
        self.container.stop()
        self.container.remove()
        print(f"Container {self.container.short_id} stopped and removed")
        self.container = None
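
Because cleanup depends on `stop_container()` being called, one way to make it hard to forget is a context manager. The following is a minimal sketch, not part of the repository: `running_sandbox` is a hypothetical helper, and `FakeSandbox` is a stand-in that records calls so the example runs without Docker.

```python
from contextlib import contextmanager


@contextmanager
def running_sandbox(sandbox):
    """Start the sandbox and always stop it on exit (hypothetical helper)."""
    try:
        # start_container() sits inside the try block so that a failure
        # partway through startup still triggers stop_container().
        sandbox.start_container()
        yield sandbox
    finally:
        sandbox.stop_container()


class FakeSandbox:
    """Stand-in for Sandbox that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def start_container(self):
        self.calls.append("start")

    def stop_container(self):
        self.calls.append("stop")


fake = FakeSandbox()
try:
    with running_sandbox(fake):
        raise RuntimeError("simulated failure inside the sandbox")
except RuntimeError:
    pass
print(fake.calls)
```

This relies on `stop_container()` being a no-op when `self.container` is `None`, which the `if self.container:` guard above provides.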

2.3 Interactive Shell Session

Starting the shell


def start_shell(self):
    """Start an interactive Bash shell."""
    if not self.container:
        raise Exception("Container not started. Call start_container() first.")

    # Close any existing shell
    if self.shell and self.shell.isalive():
        self.shell.close(force=True)

    # Use pexpect to start an interactive shell
    command = f"docker exec -it {self.container.id} /bin/bash"
    self.shell = pexpect.spawn(command, maxread=200000)

    # Wait for the shell prompt
    self.shell.expect([r"\$ ", r"# "], timeout=10)
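
The prompt patterns passed to `expect()` are regular expressions. The sketch below uses the standard `re` module on plain strings to show what they accept. This is a simplification, since pexpect matches against the byte stream, and the sample prompts and hostnames are invented for illustration. It compares the loose startup patterns with the stricter ones used later in `execute()`.

```python
import re

STARTUP_PROMPTS = [r"\$ ", r"# "]                            # from start_shell()
SESSION_PROMPTS = [r"swe-bench@.*:.*\$ ", r"root@.*:.*# "]   # from execute()


def matches_any(patterns, text):
    """Return True if any pattern is found anywhere in text."""
    return any(re.search(p, text) for p in patterns)


# A typical root prompt satisfies the startup patterns.
print(matches_any(STARTUP_PROMPTS, "root@3f2a1b:/testbed# "))
# So does any output that merely contains "# ", such as a comment line.
print(matches_any(STARTUP_PROMPTS, "# just a comment line"))
# The stricter session patterns require the user@host prefix as well.
print(matches_any(SESSION_PROMPTS, "# just a comment line"))
```

The looser patterns are adequate at startup, when only the prompt is expected; during command execution, output containing `$ ` or `# ` would otherwise end the wait early.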

The Session class: command execution interface


def get_session(self):
    """Return a session object that can execute commands."""
    self.start_shell()

    class Session:
        def __init__(self, sandbox):
            self.sandbox = sandbox

        def execute(self, command, timeout=60):
            """Execute a command in the sandbox."""
            try:
                # 1. Send the command
                if command[-1] != "&":
                    # Foreground command: append a short sleep so it has finished
                    # before the prompt returns
                    self.sandbox.shell.sendline(command + " && sleep 0.5")
                else:
                    # Background command (ends with "&"): send it unchanged
                    self.sandbox.shell.sendline(command)

                # 2. Clear pexpect's buffers
                self.sandbox.shell.before = b""
                self.sandbox.shell.after = b""
                self.sandbox.shell.buffer = b""

                time.sleep(2)  # give the command time to run

                # 3. Wait for the prompt
                self.sandbox.shell.expect(
                    [r"swe-bench@.*:.*\$ ", r"root@.*:.*# "],
                    timeout=timeout
                )

                # 4. Collect the output
                output = (
                    self.sandbox.shell.before.decode("utf-8") +
                    self.sandbox.shell.after.decode("utf-8") +
                    self.sandbox.shell.buffer.decode("utf-8")
                )

                # 5. Clean the output: drop the first line (the echoed command)
                #    and the last line (the prompt)
                output_lines = output.split("\r\n")
                if len(output_lines) > 1:
                    output_lines = output_lines[1:-1]
                result_message = "\n".join(output_lines)
                # Strip the "bracketed paste off" terminal escape sequence
                result_message = result_message.replace("\x1b[?2004l\r", "")

                return result_message

            except pexpect.TIMEOUT:
                # Timeout handling
                partial_output = "..."
                return (
                    "### Observation: "
                    f"Error: Command '{command}' timed out after {timeout} seconds. "
                    f"Partial output:\n + {partial_output}"
                )

        def close(self):
            """Close the session."""
            if self.sandbox.shell:
                self.sandbox.shell.sendline("exit")
                self.sandbox.shell.expect(pexpect.EOF)
                self.sandbox.shell.close(force=True)
                self.sandbox.shell = None

    return Session(self)
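
Step 5 of `execute()` is easier to follow in isolation. The sketch below reproduces the same cleanup as a standalone function and applies it to a hand-written sample of raw terminal output. `clean_output` is a hypothetical name, and the sample text (hostname, working directory) is invented for illustration.

```python
def clean_output(raw: str) -> str:
    """Mirror step 5 of execute(): drop the echoed command and the trailing prompt."""
    lines = raw.split("\r\n")
    if len(lines) > 1:
        lines = lines[1:-1]
    text = "\n".join(lines)
    return text.replace("\x1b[?2004l\r", "")


raw = (
    "echo hi && sleep 0.5\r\n"      # the command echoed back by the terminal
    "\x1b[?2004l\rhi\r\n"           # escape sequence followed by the real output
    "swe-bench@3f2a1b:/testbed$ "   # the prompt matched by expect()
)
print(repr(clean_output(raw)))
```

Note that output consisting of a single line with no `\r\n` is returned unchanged, because the `len(lines) > 1` guard skips the slicing.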

3. Usage Flow

3.1 Complete Usage Example


from evaluation.patch_selection.trae_selector.sandbox import Sandbox

# 1. Create the sandbox
sandbox = Sandbox(
    namespace="swebench",
    name="sweb.eval.x86_64.astropy__astropy-14369",
    tag="latest",
    instance={
        "instance_id": "astropy__astropy-14369",
        "base_commit": "abc123def456"
    },
    tools_path="/path/to/tools"
)

# 2. Start the container
try:
    sandbox.start_container()

    # 3. Get the project path
    project_path = sandbox.get_project_path()
    print(f"Project path: {project_path}")

    # 4. Get a session and execute commands
    session = sandbox.get_session()

    # Apply a patch
    output = session.execute("git apply /home/swe-bench/patch.diff")
    print(output)

    # Run the tests
    output = session.execute("pytest test_file.py -v")
    print(output)

    # View a file
    output = session.execute("cat src/main.py")
    print(output)

    # 5. Close the session
    session.close()

finally:
    # 6. Stop the container
    sandbox.stop_container()
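
When several candidate patches are checked in the same container, each one should start from the same baseline. The sketch below shows one possible apply, test, and reset cycle. `try_patch` is a hypothetical helper, not repository code, the 600-second test timeout is an arbitrary choice, and `FakeSession` only records commands so the example runs without Docker.

```python
import shlex


def try_patch(session, patch_path: str, test_command: str) -> dict:
    """Apply one patch, run the tests, then restore the baseline (hypothetical helper)."""
    apply_out = session.execute(f"git apply {shlex.quote(patch_path)}")
    test_out = session.execute(test_command, timeout=600)
    # Restores tracked files only; files newly created by the patch are untracked
    # and would additionally need `git clean`.
    session.execute("git reset --hard HEAD")
    return {"apply": apply_out, "test": test_out}


class FakeSession:
    """Records commands instead of running them."""

    def __init__(self):
        self.commands = []

    def execute(self, command, timeout=60):
        self.commands.append(command)
        return f"ran: {command}"


session = FakeSession()
result = try_patch(session, "/home/swe-bench/patch.diff", "pytest test_file.py -v")
print(session.commands)
```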

4. Integration with the Selector Agent

4.1 Usage in the Selector Agent

File: evaluation/patch_selection/trae_selector/selector_agent.py


class SelectorAgent:
    def __init__(
        self,
        sandbox: Sandbox,
        candidate_list: list[CandidatePatch],
        # ... remaining parameters elided
    ):
        self.sandbox = sandbox
        self.sandbox_session = self.sandbox.get_session()

        # Reset the working tree to the checked-out baseline
        self.sandbox_session.execute("git reset --hard HEAD")

    def run(self):
        """Run patch selection."""
        # The agent executes commands in the sandbox through tool calls,
        # e.g. to inspect code, apply a patch, or run tests.

        # Tool calls are translated into session.execute() calls:
        #   bash tool -> session.execute("ls -la")
        #   str_replace_based_edit_tool -> session.execute("python edit_tool.py ...")
        pass

4.2 Tool Call Execution Flow


import shlex

# Tool call parsing in selector_agent.py
def parse_tool_response(answer: LLMResponse, sandbox_session):
    for tool_call in answer.tool_calls:
        if tool_call.name == "bash":
            # Build the command; shlex.quote keeps the LLM-provided command
            # as a single shell argument
            cmd = (
                "cd /home/swe-bench/tools/ && "
                "/home/swe-bench/py312/bin/python3 execute_bash.py "
                f"--command {shlex.quote(tool_call.arguments['command'])}"
            )

            # Execute it in the sandbox
            output = sandbox_session.execute(cmd)

        elif tool_call.name == "str_replace_based_edit_tool":
            cmd = (
                "cd /home/swe-bench/tools/ && "
                "/home/swe-bench/py312/bin/python3 execute_str_replace_editor.py "
                f"--command {tool_call.arguments['command']} "
                f"--path {tool_call.arguments['path']} "
                # ... (remaining arguments elided)
            )
            output = sandbox_session.execute(cmd)
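
The role of `shlex.quote` in the bash branch can be checked without a container. The sketch below rebuilds the wrapped command line for a sample LLM command and then splits it the way a POSIX shell would, confirming that the whole command arrives as one `--command` argument. `build_bash_tool_command` is a hypothetical name and the sample command is invented; the paths are those shown above.

```python
import shlex

TOOLS_DIR = "/home/swe-bench/tools/"
PYTHON = "/home/swe-bench/py312/bin/python3"


def build_bash_tool_command(command: str) -> str:
    """Wrap an LLM-provided command the way the bash branch above does."""
    return (
        f"cd {TOOLS_DIR} && {PYTHON} execute_bash.py "
        f"--command {shlex.quote(command)}"
    )


llm_command = "echo 'it works' && ls -la"
wrapped = build_bash_tool_command(llm_command)
print(wrapped)

# Splitting with POSIX shell rules recovers the original command as the last argument.
print(shlex.split(wrapped)[-1] == llm_command)
```

Without quoting, the `&&` inside the LLM command would be interpreted by the outer shell rather than passed to `execute_bash.py`.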

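5. Design Features

Feature	Description
Isolation	Each instance runs in its own container, so instances do not affect one another
Reproducibility	The commit is pinned, so the code baseline is consistent
Interactivity	pexpect maintains a persistent shell session
Tool isolation	Tools are copied into the container and run there, independent of the host
Resource cleanup	stop_container() stops and removes the container; it must be called explicitly
Timeout handling	Command execution supports a timeout to prevent waiting indefinitely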

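6. Differences from DockerManager

Aspect	Sandbox	DockerManager
Purpose	Patch selection and validation	Agent tool execution
Location	evaluation module	trae_agent module
Image	SWE-bench images	User-specified image
Persistence	Temporary container, removed after use	Container can be kept running
Interaction	Session.execute()	docker_manager.execute()
Path mapping	/tmp only	Configurable working directory
Tool execution	Through Python proxy scripts	Tools are called directly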

7. Key Code Locations

File	Purpose
`evaluation/patch_selection/trae_selector/sandbox.py`	Sandbox class implementation
`evaluation/patch_selection/trae_selector/selector_agent.py`	Use by the Selector Agent
`trae_agent/agent/docker_manager.py`	The similar Docker manager

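8. Notes

Container privileges: the container runs with privileged=True so that it has sufficient permissions. Privileged mode, together with the read-write /tmp mount, weakens isolation from the host.
Tool path: make sure the tools path is correct and that the tools are executable.
Timeouts: pass a larger timeout for long-running commands; the default in execute() is 60 seconds.
Resource cleanup: always call stop_container() to release resources.
Encoding: decode output with errors="replace" to handle special characters. The success path of execute() as listed in section 2.3 decodes with plain "utf-8", which raises UnicodeDecodeError on invalid bytes.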
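
The encoding note can be illustrated with a short standalone example. The sketch below compares strict UTF-8 decoding with `errors="replace"` on a byte string that contains invalid sequences. `safe_decode` is a hypothetical helper name and the sample bytes are invented.

```python
def safe_decode(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid bytes."""
    return data.decode("utf-8", errors="replace")


raw = b"result: \xff\xfe done"   # \xff and \xfe are not valid UTF-8

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(strict_ok)          # strict decoding fails on this input
print(safe_decode(raw))   # replacement characters appear instead of an exception
```

With `errors="replace"`, a command that prints binary data yields slightly lossy text rather than aborting the whole `execute()` call.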

Last updated: 2026-03-16