PentestGPT V2 Source Code Study: The Memory Subsystem

Contents

- Memory Subsystem in Detail - Context Management Mechanism
  - 1. Memory Subsystem Overview
    - 1.1 System Architecture Diagram
    - 1.2 Core Component Relationships
  - 2. StateStore - Core State Storage
    - 2.1 Database Schema Design
    - 2.2 Entity Model Definitions
    - 2.3 StateStore Core Methods
  - 3. ContextAssembler - Context Assembler
    - 3.1 Core Responsibilities
    - 3.2 The assemble() Method in Detail
    - 3.3 The Four Context Sections in Detail
      - 3.3.1 Attack Path
      - 3.3.2 State Facts
      - 3.3.3 Mode Guidance
      - 3.3.4 Sibling Summaries
  - 4. ContextCompressor - Context Compressor
    - 4.1 Compression Strategy
    - 4.2 Compression Flow
  - 5. Complete Context Management Flow
    - 5.1 Data Flow Diagram
    - 5.2 Practical Usage Example
  - 6. Context Management Best Practices
    - 6.1 Understanding the Context Structure
    - 6.2 Debugging Tips
    - 6.3 Performance Optimization Suggestions
  - 7. Related File Index
  - Appendix: Quick Reference
    - Common StateStore Queries
    - ContextAssembler Output Structure

Memory Subsystem in Detail - Context Management Mechanism

PentestGPT V2 Gitee mirror repository address

1. Memory Subsystem Overview

1.1 System Architecture Diagram

[Architecture diagram: Memory Subsystem, with nodes ContextAssembler, StateStore, ContextCompressor, BranchSummary, Tree Nodes, and SQLite Database.]

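1.2 Core Component Relationships

| Component | File | Responsibility |
| --- | --- | --- |
| ContextAssembler | `context_assembler.py` | Assembles the context for LLM prompts |
| StateStore | `state_store.py` | Persists entity data in SQLite |
| ContextCompressor | `context_compressor.py` | Reduces the length of the context |
| BranchSummary | `branch_summary.py` | Generates branch summaries |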

2. StateStore - Core State Storage

2.1 Database Schema Design

Located at state_store.py:56-145

Five core tables:

┌─────────────────┐    ┌─────────────────┐
│     hosts       │    │   services      │
├─────────────────┤    ├─────────────────┤
│ id (PK)         │◀───│ host_id (FK)    │
│ ip_address      │    │ port            │
│ hostname        │    │ protocol        │
│ os_fingerprint  │    │ service_name    │
│ discovered_at   │    │ version         │
└─────────────────┘    └─────────────────┘

┌─────────────────┐    ┌─────────────────┐
│  credentials    │    │   sessions      │
├─────────────────┤    ├─────────────────┤
│ id (PK)         │    │ id (PK)         │
│ username        │    │ host_id (FK)    │
│ credential_type │    │ session_type    │
│ domain          │    │ privilege_level │
│ valid_for       │────│ credential_id   │
└─────────────────┘    └─────────────────┘

┌───────────────────────────┐
│   vulnerabilities         │
├───────────────────────────┤
│ id (PK)                   │
│ host_id (FK)              │
│ service_id (FK)           │
│ cve_id                    │
│ description               │
│ exploitation_status       │
└───────────────────────────┘
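
As a rough illustration of the diagram, the five tables could be declared with Python's built-in `sqlite3` module as below. The table and column names come from the diagram; the SQL types, the constraints, the `id` column on `services`, and the JSON encoding of `valid_for` are assumptions made for this sketch, not the project's actual DDL.

```python
import sqlite3

# Hypothetical DDL: names follow the diagram, types and constraints are assumed.
SCHEMA = """
CREATE TABLE hosts (
    id TEXT PRIMARY KEY,
    ip_address TEXT NOT NULL,
    hostname TEXT,
    os_fingerprint TEXT,
    discovered_at TEXT
);
CREATE TABLE services (
    id TEXT PRIMARY KEY,
    host_id TEXT NOT NULL REFERENCES hosts(id),
    port INTEGER NOT NULL,
    protocol TEXT DEFAULT 'tcp',
    service_name TEXT,
    version TEXT
);
CREATE TABLE credentials (
    id TEXT PRIMARY KEY,
    username TEXT NOT NULL,
    credential_type TEXT DEFAULT 'password',
    domain TEXT,
    valid_for TEXT  -- assumed: JSON-encoded list of host ids
);
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    host_id TEXT NOT NULL REFERENCES hosts(id),
    session_type TEXT,
    privilege_level TEXT,
    credential_id TEXT REFERENCES credentials(id)
);
CREATE TABLE vulnerabilities (
    id TEXT PRIMARY KEY,
    host_id TEXT NOT NULL REFERENCES hosts(id),
    service_id TEXT REFERENCES services(id),
    cve_id TEXT,
    description TEXT,
    exploitation_status TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> list[str]:
    """Create the five tables and return their names in sorted order."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
tables = create_schema(conn)
```

With foreign keys enabled, inserting a service that points at an unknown host is rejected, which is the relationship the arrows in the diagram express.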

2.2 Entity Model Definitions

Located at models.py

# The excerpt omits its imports; BaseModel is presumably pydantic's.
from datetime import datetime

from pydantic import BaseModel

# HostEntity - host information
class HostEntity(BaseModel):
    id: str
    ip_address: str
    hostname: str | None
    os_fingerprint: str | None
    discovered_at: datetime
    discovery_node_id: str | None

# ServiceEntity - service information
class ServiceEntity(BaseModel):
    id: str
    host_id: str
    port: int
    protocol: str = 'tcp'
    service_name: str | None
    version: str | None
    discovered_at: datetime
    discovery_node_id: str | None

# CredentialEntity - credential information
class CredentialEntity(BaseModel):
    id: str
    username: str
    credential_type: str = 'password'
    credential_value: str = ''
    domain: str | None
    valid_for: list[str]  # host_id values this credential is valid for
    discovered_at: datetime
    discovery_node_id: str | None

2.3 StateStore Core Methods

# excalibur/memory/state_store.py:41-50
class StateStore:
    def __init__(self, db_path: str = ":memory:") -> None:
        """
        Initialize the store.
        - Defaults to an in-memory database (":memory:")
        - Pass a file path to persist data across runs
        """

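Main add and query methods:

| Category | Methods | Description |
| --- | --- | --- |
| Hosts | `add_host()`, `get_host()`, `get_hosts()` | Add and look up hosts |
| Services | `add_service()`, `get_services_for_host()` | Service management |
| Credentials | `add_credential()`, `get_credentials_for_host()` | Credential management |
| Sessions | `add_session()`, `get_active_sessions()` | Session management |
| Vulnerabilities | `add_vulnerability()`, `get_vulnerabilities_for_host()` | Vulnerability management |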

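3. ContextAssembler - Context Assembler

3.1 Core Responsibilities

Located at context_assembler.py:22-88

class ContextAssembler:
    """
    Assembles the context for LLM prompts from the current state.

    Pulls facts from the state store and combines them with the
    structure of the attack tree to produce a multi-part context
    string for the LLM.
    """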

3.2 The assemble() Method in Detail

# excalibur/memory/context_assembler.py:48-88
def assemble(
    self,
    node: AttackNode,       # currently active node
    tree: AttackTree,       # full attack tree
    mode: str,              # exploration mode (bfs/dfs/hybrid)
    tdi_value: float = 0.5, # current TDI value
) -> str:
    """
    Build the context prompt from the current state.

    Returns a context string with four parts:
    1. Attack Path
    2. State Facts
    3. Mode Guidance
    4. Sibling Summaries
    """

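The appendix of this article notes that the four parts are joined with "\n\n". A minimal sketch of that assembly step follows; `assemble_sections` is a hypothetical helper, and dropping empty sections before joining is an assumption rather than confirmed behavior.

```python
from typing import Callable


def assemble_sections(builders: list[Callable[[], str]]) -> str:
    """Call each section builder in order and join non-empty results with blank lines."""
    sections = [build() for build in builders]
    return "\n\n".join(section for section in sections if section)


context = assemble_sections([
    lambda: "## Current Attack Path\n  [ACTIVE] Initial reconnaissance (id=abc123)>>",
    lambda: "## Known State Facts\n### Hosts\n  - 10.10.10.50",
    lambda: "## Exploration Mode: RECONNAISSANCE (TDI=0.45)",
    lambda: "",  # no siblings yet, so this section is dropped (assumed behavior)
])
```

Keeping each section behind its own builder makes it easy to test or log one part of the prompt in isolation.
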
3.3 The Four Context Sections in Detail

3.3.1 Attack Path

# excalibur/memory/context_assembler.py:108-151
def _build_path_context(self, node, tree) -> str:
    """
    Walk from the root node to the current node, showing the status
    of each node along the path.

    Example output:
    ## Current Attack Path
      [ACTIVE] Initial reconnaissance of 10.10.10.50 (id=abc123)
        [COMPLETED] Nmap scan completed (id=def456)
          [ACTIVE] Port 80/tcp HTTP discovered (id=ghi789)>>
    """

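The root-to-node walk can be sketched as follows. `Node` is a hypothetical stand-in for `AttackNode` with a `parent` link, and the sketch assumes that the trailing `>>` in the example output marks the current node; neither detail is confirmed by the excerpt.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for AttackNode."""
    id: str
    description: str
    status: str
    parent: "Node | None" = None


def build_path_context(node: Node) -> str:
    """Render the path from the root down to `node`, one indented line per step."""
    # Follow parent links up to the root, then reverse into root-first order.
    path = []
    cursor = node
    while cursor is not None:
        path.append(cursor)
        cursor = cursor.parent
    path.reverse()

    lines = ["## Current Attack Path"]
    for depth, step in enumerate(path):
        indent = "  " * (depth + 1)
        marker = ">>" if step is node else ""  # assumed: >> flags the current node
        lines.append(
            f"{indent}[{step.status.upper()}] {step.description} (id={step.id}){marker}"
        )
    return "\n".join(lines)


root = Node("abc123", "Initial reconnaissance of 10.10.10.50", "active")
scan = Node("def456", "Nmap scan completed", "completed", parent=root)
port = Node("ghi789", "Port 80/tcp HTTP discovered", "active", parent=scan)
path_text = build_path_context(port)
```

For the three nodes above, `path_text` reproduces the example output in the docstring, with indentation growing by two spaces per level.
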
3.3.2 State Facts

# excalibur/memory/context_assembler.py:153-239
def _build_state_context(self, node) -> str:
    # Raw docstring so the backslash in admin\administrator stays literal
    r"""
    Build the state facts section from the entity store.

    Example output:
    ## Known State Facts
    ### Hosts
      - 10.10.10.50 (target.local) [OS: Linux]
      - 10.10.10.51 [OS: Windows]

    ### Services
      - 80/tcp http 1.3.3 (host=abc123)
      - 445/tcp smb 2.0.0 (host=abc123)

    ### Credentials
      - admin\administrator [password] valid_for=['abc123']

    ### Active Sessions
      - shell on host=abc123 priv=user (id=sess001)

    ### Vulnerabilities
      - [exploited] CVE-2021-44228 Log4j RCE
    """

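The formatting of the Hosts and Services subsections can be sketched as below. The input is plain dicts keyed by the entity field names, and the helper functions are hypothetical; only the output layout is taken from the example above.

```python
def format_host(host: dict) -> str:
    """Render one host line, adding hostname and OS only when known."""
    line = f"  - {host['ip_address']}"
    if host.get("hostname"):
        line += f" ({host['hostname']})"
    if host.get("os_fingerprint"):
        line += f" [OS: {host['os_fingerprint']}]"
    return line


def format_service(svc: dict) -> str:
    """Render one service line as port/protocol, name, version, owning host."""
    parts = [f"{svc['port']}/{svc['protocol']}", svc.get("service_name"), svc.get("version")]
    return "  - " + " ".join(p for p in parts if p) + f" (host={svc['host_id']})"


def build_state_context(hosts: list[dict], services: list[dict]) -> str:
    """Join the non-empty subsections under a single heading."""
    sections = []
    if hosts:
        sections.append("\n".join(["### Hosts", *map(format_host, hosts)]))
    if services:
        sections.append("\n".join(["### Services", *map(format_service, services)]))
    if not sections:
        return ""  # nothing known yet (assumed: the section is omitted)
    return "## Known State Facts\n" + "\n\n".join(sections)


hosts = [
    {"ip_address": "10.10.10.50", "hostname": "target.local", "os_fingerprint": "Linux"},
    {"ip_address": "10.10.10.51", "hostname": None, "os_fingerprint": "Windows"},
]
services = [
    {"host_id": "abc123", "port": 80, "protocol": "tcp", "service_name": "http", "version": "1.3.3"},
]
state_text = build_state_context(hosts, services)
```

Optional fields are skipped rather than printed as `None`, which keeps the prompt compact when a scan has only partial results.
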
3.3.3 Mode Guidance

# excalibur/memory/context_assembler.py:242-278
def _build_mode_context(mode, tdi_value) -> str:
    """
    Build the mode-specific guidance text.

    BFS (breadth-first):
        Breadth-first: enumerate all attack surfaces before going deep...

    DFS (depth-first):
        Depth-first: focus on the most promising vector and exploit it fully...

    Hybrid:
        Hybrid mode (balanced): alternate between breadth and depth based on TDI.
    """

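One simple way to implement this is a lookup table from mode name to guidance text, as sketched below. The guidance strings are copied from the docstring (including its elisions), and the heading format follows the full context example in section 6.1; the function itself and its rejection of unknown modes are assumptions of this sketch.

```python
# Guidance strings as quoted in the article; the "..." is the article's own elision.
GUIDANCE = {
    "bfs": "Breadth-first: enumerate all attack surfaces before going deep...",
    "dfs": "Depth-first: focus on the most promising vector and exploit it fully...",
    "hybrid": "Hybrid mode (balanced): alternate between breadth and depth based on TDI.",
}


def build_mode_context(mode: str, tdi_value: float) -> str:
    """Return a heading with the mode and TDI, followed by the guidance line."""
    key = mode.lower()
    if key not in GUIDANCE:
        # This sketch rejects unknown modes; the real method may behave differently.
        raise ValueError(f"unknown exploration mode: {mode!r}")
    return f"## Exploration Mode: {mode.upper()} (TDI={tdi_value:.2f})\n{GUIDANCE[key]}"


mode_text = build_mode_context("bfs", 0.45)
```

Formatting the TDI to two decimals keeps the heading stable across calls, which helps when diffing prompts during debugging.
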
3.3.4 Sibling Summaries

# excalibur/memory/context_assembler.py:280-321
def _build_sibling_context(self, node, tree) -> str:
    """
    Build compressed summaries of the sibling branches.

    Example output:
    ## Sibling Branch Summaries
    - **def456** [COMPLETED, TDI=0.35]: findings=[port 22/tcp] tools=[nmap]
    - **jkl012** [PENDING, TDI=0.60]: findings=[] tools=[]
    """

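The one-line-per-sibling format can be reproduced with a small formatter. `BranchInfo` is a hypothetical record type invented for this sketch; the real code presumably derives the same fields from the tree and from BranchSummary.

```python
from dataclasses import dataclass, field


@dataclass
class BranchInfo:
    """Hypothetical summary record for one sibling branch."""
    node_id: str
    status: str
    tdi: float
    findings: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)


def build_sibling_context(siblings: list[BranchInfo]) -> str:
    """Render one compact line per sibling branch under a shared heading."""
    if not siblings:
        return ""  # assumed: no heading when there are no siblings
    lines = ["## Sibling Branch Summaries"]
    for sib in siblings:
        lines.append(
            f"- **{sib.node_id}** [{sib.status.upper()}, TDI={sib.tdi:.2f}]: "
            f"findings=[{', '.join(sib.findings)}] tools=[{', '.join(sib.tools)}]"
        )
    return "\n".join(lines)


sibling_text = build_sibling_context([
    BranchInfo("def456", "completed", 0.35, ["port 22/tcp"], ["nmap"]),
    BranchInfo("jkl012", "pending", 0.60),
])
```

Each sibling costs a single line of prompt regardless of how much output its branch produced, which is the point of summarizing rather than quoting it.
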
4. ContextCompressor - Context Compressor

4.1 Compression Strategy

Located at context_compressor.py

# Two compression thresholds:
class ContextCompressor:
    def __init__(self, ideal_threshold=0.6, aggressive_threshold=0.8):
        """
        - ideal_threshold: compression starts once the context reaches 60% of capacity
        - aggressive_threshold: compression is forced once it reaches 80% of capacity
        """

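4.2 Compression Flow

# Compression decision logic. The higher threshold is checked first;
# otherwise the aggressive branch could never be reached.
if context_load > aggressive_threshold:
    # Aggressive compression: a more aggressive summarization strategy
    compress(tree, context_load, aggressive=True)
elif context_load > ideal_threshold:
    # Ideal compression: generate branch summaries to replace the detailed context
    compress(tree, context_load)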
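
The two-threshold decision can be written as a small pure function that is easy to test. This is a sketch under the default thresholds quoted above; `CompressionLevel` and `choose_compression` are hypothetical names. The aggressive threshold has to be tested before the ideal one, because any load above 0.8 is also above 0.6.

```python
from enum import Enum


class CompressionLevel(Enum):
    NONE = "none"
    IDEAL = "ideal"
    AGGRESSIVE = "aggressive"


def choose_compression(
    context_load: float,
    ideal_threshold: float = 0.6,
    aggressive_threshold: float = 0.8,
) -> CompressionLevel:
    """Map a context load in [0, 1] to the compression level to apply."""
    if not 0.0 <= ideal_threshold <= aggressive_threshold:
        raise ValueError("expected 0 <= ideal_threshold <= aggressive_threshold")
    # Test the higher threshold first so the aggressive branch stays reachable.
    if context_load > aggressive_threshold:
        return CompressionLevel.AGGRESSIVE
    if context_load > ideal_threshold:
        return CompressionLevel.IDEAL
    return CompressionLevel.NONE


levels = [choose_compression(load) for load in (0.5, 0.7, 0.9)]
```

With the defaults, loads of 0.5, 0.7, and 0.9 map to no compression, ideal compression, and aggressive compression respectively.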

5. Complete Context Management Flow

5.1 Data Flow Diagram

[Sequence diagram. Participants: Controller, ContextAssembler, StateStore, ContextCompressor. Messages shown: assemble(node, tree, mode, tdi); _build_path_context(); _build_state_context(); get_hosts(); get_services_for_host(); get_credentials(); _build_mode_context(); _build_sibling_context(); context_string; should_compress(context_load); compress(tree, context_load). The diagram also contains an alt block labeled "compression needed".]

5.2 Practical Usage Example

# excalibur/core/controller.py:478-504
async def _egats_loop(self, initial_task: str):
    # ...
    # 1. Select the next node with UCB
    current_node = self._planner.select_next_node(tree)

    # 2. Compute the TDI (a preliminary context is assembled to measure the load)
    context_load = 0.0
    if self._context_assembler:
        ctx = self._context_assembler.assemble(
            current_node, tree, "reconnaissance"
        )
        context_load = self._context_assembler.get_context_load(ctx)

    tdi = self._planner.compute_tdi(current_node, tree, context_load)

    # 3. Select the exploration mode
    mode = self._planner.select_mode(tdi)

    # 4. Assemble the context prompt
    context_prompt = self._context_assembler.assemble(
        current_node, tree, mode, tdi.value
    )

    # 5. Build the query
    query = self._build_egats_query(current_node, mode, context_prompt, tdi.value)

    # 6. Query the LLM
    await self.backend.query(query)
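
Step 2 depends on `get_context_load()`, whose implementation the excerpt does not show. Purely as an illustration of what a load figure between 0 and 1 could mean, the sketch below estimates tokens from character count and divides by a window size. The function name, the four-characters-per-token heuristic, and the 8192-token window are all assumptions and should not be read as the project's method.

```python
def estimate_context_load(
    context: str,
    max_tokens: int = 8192,        # assumed context window size
    chars_per_token: float = 4.0,  # rough heuristic, not a real tokenizer
) -> float:
    """Estimate how full the context window is, as a fraction clamped to [0, 1]."""
    if max_tokens <= 0 or chars_per_token <= 0:
        raise ValueError("max_tokens and chars_per_token must be positive")
    estimated_tokens = len(context) / chars_per_token
    return min(estimated_tokens / max_tokens, 1.0)


half_full = estimate_context_load("x" * 4000, max_tokens=2000)
```

A fraction in this range is what the compressor thresholds of 0.6 and 0.8 would be compared against.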

6. Context Management Best Practices

6.1 Understanding the Context Structure

# A complete context example (simplified):
context = """
## Current Attack Path
  [ACTIVE] Initial reconnaissance of 10.10.10.50 (id=abc123)
    [COMPLETED] Nmap scan completed (id=def456)
      [ACTIVE] Port 80/tcp HTTP discovered (id=ghi789)>>

## Known State Facts
### Hosts
  - 10.10.10.50 (target.local) [OS: Linux]
  - 10.10.10.51 [OS: Windows]

### Services
  - 80/tcp http 1.3.3 (host=abc123)
  - 445/tcp smb 2.0.0 (host=abc123)

### Credentials
  - admin\\administrator [password] valid_for=['abc123']

### Active Sessions
  - shell on host=abc123 priv=user (id=sess001)

## Exploration Mode: RECONNAISSANCE (TDI=0.45)
Breadth-first: enumerate all attack surfaces before going deep...

## Sibling Branch Summaries
- **jkl012** [PENDING, TDI=0.60]: findings=[port 22/tcp] tools=[nmap]
"""

6.2 Debugging Tips

# Add debug output for the assembled context
import logging

logger = logging.getLogger(__name__)

def assemble_debug(self, node, tree, mode, tdi_value=0.5):
    """Assemble the context as usual and log it at DEBUG level."""
    context = self.assemble(node, tree, mode, tdi_value)
    # Lazy %-style arguments skip string formatting when DEBUG is disabled
    logger.debug("=== CONTEXT (length=%d) ===", len(context))
    logger.debug("%s", context)
    logger.debug("=========================================")
    return context

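6.3 Performance Optimization Suggestions

| Scenario | Recommendation |
| --- | --- |
| Many hosts/services | Use `get_services_for_host()` to narrow the query instead of loading all data |
| Long sessions | Enable ContextCompressor and compress the context periodically |
| Limited memory | Use a file-backed StateStore (`db_path="workspace/state.db"`) |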
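
The difference between the default `:memory:` database and a file-backed one can be seen with plain `sqlite3`. The sketch below writes a row, closes the connection, reopens the same path, and counts the rows; `count_hosts_after_reopen` is a hypothetical helper and the one-column `hosts` table is a simplification for the demonstration.

```python
import os
import sqlite3
import tempfile


def count_hosts_after_reopen(db_path: str) -> int:
    """Write one host row, close the connection, reopen db_path, and count rows."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS hosts (ip_address TEXT)")
    conn.execute("INSERT INTO hosts VALUES ('10.10.10.50')")
    conn.commit()
    conn.close()

    reopened = sqlite3.connect(db_path)
    # An in-memory database starts empty on every connect, so recreate the table.
    reopened.execute("CREATE TABLE IF NOT EXISTS hosts (ip_address TEXT)")
    (count,) = reopened.execute("SELECT COUNT(*) FROM hosts").fetchone()
    reopened.close()
    return count


with tempfile.TemporaryDirectory() as workdir:
    on_disk = count_hosts_after_reopen(os.path.join(workdir, "state.db"))
in_memory = count_hosts_after_reopen(":memory:")
```

The file-backed database still holds the row after reopening, while the in-memory one starts empty again, so a file path is also what lets state survive a restart.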

7. Related File Index

| File | Path | Purpose |
| --- | --- | --- |
| ContextAssembler | `excalibur/memory/context_assembler.py` | Context assembler |
| StateStore | `excalibur/memory/state_store.py` | SQLite state store |
| ContextCompressor | `excalibur/memory/context_compressor.py` | Context compressor |
| BranchSummary | `excalibur/memory/branch_summary.py` | Branch summary generation |
| Models | `excalibur/memory/models.py` | Entity data models |

Appendix: Quick Reference

Common StateStore Queries

# Get all hosts
hosts = state_store.get_hosts()

# Get the services of a specific host
services = state_store.get_services_for_host(host_id)

# Get all credentials
creds = state_store.get_credentials()

# Get the credentials valid for a specific host
host_creds = state_store.get_credentials_for_host(host_id)

# Get the active sessions
active_sessions = state_store.get_active_sessions()

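ContextAssembler Output Structure

# Logical structure of the context returned by assemble():
{
    "attack_path": str,       # attack path text
    "state_facts": str,       # state facts text
    "mode_guidance": str,     # mode guidance text
    "sibling_summaries": str  # sibling summaries text
}
# The actual return value is a single string: the four parts joined with "\n\n"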