Alibaba Open-Source AgentScope Multi-Agent Framework Series (17), Chapter 17: Skill System and Runtime Sandbox Integration - AIOps in Practice

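Introduction

This chapter covers how the Skill system integrates with the AgentScope Runtime Sandbox. Using a realistic AIOps scenario, it shows how to build Skills that execute scripts, access the file system, and call system tools.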

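Learning objectives

  • Understand the architecture of the Runtime Sandbox
  • Learn how a Skill is tied to its file system layout
  • Implement Skills that execute scripts
  • Integrate system tools (grep, cat, and so on)
  • Understand how custom tools are added
  • Build an AIOps scenario: fault diagnosis plus report generation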

sandbox: github.com/agentscope-...


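1.1 What - Overall Architecture

Background

Capabilities a Skill needs:
├─ Execute shell scripts (fault diagnosis, log analysis)
├─ Call system tools (grep, awk, cat, etc.)
├─ Read the file system (logs, configuration, data)
├─ Run in a managed, independent environment (isolated, secure, reproducible)
└─ Communicate across applications (Agent Runtime as Server)

Problems with the traditional approach:
❌ Security risk: system commands run directly on the host
❌ Environment pollution: side effects leak into the main application's environment
❌ Hard to manage: files end up scattered across the file system
❌ Hard to extend: integrating a new tool is tedious

Solution: Runtime Sandbox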

┌───────────────────────────────────────────────────────────────┐
│                    AgentScope Application                     │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                 Skill Management System                 │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │  │
│  │  │ Diagnosis    │  │ Report Gen   │  │ Custom Tools │   │  │
│  │  │ Skill        │  │ Skill        │  │ Skill        │   │  │
│  │  └───────┬──────┘  └───────┬──────┘  └───────┬──────┘   │  │
│  │          │                 │                 │          │  │
│  │          └─────────────────┼─────────────────┘          │  │
│  │                            │                            │  │
│  │                  Runtime Client (gRPC)                  │  │
│  └────────────────────────────┬────────────────────────────┘  │
│                               │                               │
│                     ┌─────────▼──────────┐                    │
│                     │  Runtime Server    │                    │
│                     │ (Sandbox Service)  │                    │
│                     └─────────┬──────────┘                    │
│                               │                               │
└───────────────────────────────┼───────────────────────────────┘
                                │
                                │  Network (may span servers)
                                │
        ┌───────────────────────▼────────────────────────┐
        │  Isolated Sandbox Environment                  │
        │  ┌──────────────────────────────────────────┐  │
        │  │ File System (/tmp/skill_workspace)       │  │
        │  │ ├─ /diagnostics/                         │  │
        │  │ │  ├─ scripts/                           │  │
        │  │ │  │  ├─ check_cpu.sh                    │  │
        │  │ │  │  ├─ check_memory.sh                 │  │
        │  │ │  │  └─ analyze_logs.sh                 │  │
        │  │ │  ├─ reports/                           │  │
        │  │ │  │  └─ diagnosis_report.md             │  │
        │  │ │  └─ tools/                             │  │
        │  │ └─ /tools/                               │  │
        │  │    ├─ grep, cat, awk (system tools)      │  │
        │  │    ├─ custom_analyzer (custom tool)      │  │
        │  │    └─ ...                                │  │
        │  └──────────────────────────────────────────┘  │
        │                                                │
        │  Execution Engine                              │
        │  ├─ Script Executor                            │
        │  ├─ Tool Invoker                               │
        │  └─ Resource Monitor                           │
        └────────────────────────────────────────────────┘

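1.2 Why - Why This Integration Is Needed

1. Security isolation

Without a sandbox:
Agent runs a script → commands execute directly on the host → critical paths such as /var or /etc can be deleted ❌

With a sandbox:
Agent runs a script → executes inside the sandbox → isolated file system ✅
                                                 → a failing script does not affect the host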
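
One small piece of that isolation can be sketched in plain Java: refusing any path that resolves outside the sandbox workspace. `WorkspacePathGuard` is a hypothetical helper written for illustration, not an AgentScope class, and it only covers lexical traversal such as `../`; symlinks and real process isolation would still have to be handled by the sandbox itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: confines skill-supplied paths to a sandbox workspace root. */
final class WorkspacePathGuard {
    private final Path root;

    WorkspacePathGuard(String workspaceRoot) {
        // Normalize once so later prefix checks compare canonical forms
        this.root = Paths.get(workspaceRoot).toAbsolutePath().normalize();
    }

    /** Resolves a path under the root and rejects anything that escapes it. */
    Path resolve(String relativePath) {
        Path candidate = root.resolve(relativePath).normalize();
        if (!candidate.startsWith(root)) {
            throw new SecurityException("Path escapes sandbox workspace: " + relativePath);
        }
        return candidate;
    }

    public static void main(String[] args) {
        WorkspacePathGuard guard = new WorkspacePathGuard("/tmp/skill_workspace");
        System.out.println(guard.resolve("skill_diagnosis/scripts/check_cpu.sh"));
        try {
            guard.resolve("../../etc/passwd");
        } catch (SecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

An absolute argument such as `/etc/passwd` is rejected as well, because `Path.resolve` returns an absolute argument unchanged and it then fails the prefix check.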

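2. File system organization

How Skill files end up organized without a sandbox:
- Skill A's script lives at /opt/scripts/a.sh
- Skill B's script lives at /opt/scripts/b.sh
- The report template lives at /var/templates/report.md
→ Hard to manage, hard to version, and prone to conflicts

With a sandbox:
/tmp/skill_workspace/
├─ skill_diagnosis/
│  ├─ scripts/        (diagnostic scripts)
│  ├─ tools/          (diagnostic tools)
│  ├─ cache/          (diagnostic cache)
│  └─ reports/        (diagnostic reports)
├─ skill_reporting/
│  ├─ templates/      (report templates)
│  ├─ tools/          (reporting tools)
│  └─ outputs/        (generated reports)
→ Clear layout, easy to version, and container-friendly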

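3. Tool management

Traditional approach:
- System tools (grep, cat, awk) are called directly on the host
- Custom tools must be installed into system directories
- Updates and rollbacks are difficult

Sandbox approach:
- The sandbox ships with the system tools it needs
- Custom tools are mounted dynamically or built into the sandbox
- Tool versions are bound to the Skill, which makes them easy to manage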

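4. Agent Runtime as Server

Distributed deployment scenario:
┌──────────────┐         ┌──────────────┐
│  Agent App 1 │         │  Agent App 2 │
│   (Service A)│         │   (Service B)│
└───────┬──────┘         └───────┬──────┘
        │                        │
        └────────┬───────────────┘
                 │
         gRPC/HTTP (across the network)
                 │
        ┌────────▼────────┐
        │ Runtime Server  │
        │   (Shared)      │
        └─────────────────┘

Advantages:
✅ Multiple applications share one Runtime Server
✅ Sandbox resources are managed centrally
✅ The file system is managed in one place
✅ Lower cost, since sandbox infrastructure is shared rather than duplicated per application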

1.3 Runtime Sandbox Architecture in Detail

import java.io.File;
import java.util.List;
import java.util.Map;

/**
 * Core building blocks of the Runtime Sandbox (illustrative skeleton)
 */

// 1. Sandbox configuration
public class SandboxConfig {
    private String sandboxId;           // Unique sandbox ID
    private String workspaceRoot;       // Workspace root, e.g. /tmp/skill_workspace
    private long maxMemory;             // Memory limit, e.g. 512 MB
    private long maxCpuTime;            // CPU time limit, e.g. 30 s
    private List<String> allowedTools;  // Tools the sandbox is allowed to run
    private Map<String, String> env;    // Environment variables
}

// 2. File system isolation
public class IsolatedFileSystem {
    private File workspaceRoot;
    private Map<String, SkillFileSystemLayout> skillLayouts;
    
    // Each Skill gets its own directory structure
    public static class SkillFileSystemLayout {
        public File scriptsDir;      // /workspace/skill_name/scripts/
        public File toolsDir;        // /workspace/skill_name/tools/
        public File cacheDir;        // /workspace/skill_name/cache/
        public File outputDir;       // /workspace/skill_name/output/
        public File configDir;       // /workspace/skill_name/config/
    }
}

// 3. Script executor
public class ScriptExecutor {
    public ExecutionResult execute(
        String sandboxId,
        String scriptPath,
        Map<String, String> env,
        long timeout
    ) {
        // Runs the script inside the sandbox and returns
        // exitCode, stdout, stderr, and executionTime.
        throw new UnsupportedOperationException("illustrative skeleton");
    }
}

// 4. Tool registry
public class ToolRegistry {
    private Map<String, ToolDefinition> tools;
    
    // System tools: grep, cat, awk, sed, tail, etc.
    // Custom tools: custom_analyzer, log_parser, etc.
    
    public ToolDefinition registerTool(String name, String path, String version) {
        // Registers the tool with the sandbox
        throw new UnsupportedOperationException("illustrative skeleton");
    }
}

// 5. Resource monitor
public class ResourceMonitor {
    public void monitorExecution(
        String sandboxId,
        long timeout,
        long maxMemory
    ) {
        // Watches CPU, memory, disk, and other resources,
        // and kills the process automatically when a limit is exceeded.
    }
}
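
The timeout half of the executor and monitor contract can be sketched with the JDK alone. The example below is a local stand-in, assuming a POSIX host; `LocalScriptRunner` and its `Result` type are hypothetical names, and it enforces only wall-clock time, not the memory or CPU limits that a real sandbox would apply. Output is redirected to a temporary file so that a chatty script cannot block on a full pipe while the caller is waiting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical local stand-in for a sandboxed script executor. */
final class LocalScriptRunner {

    /** Outcome of one run: exit code, combined stdout/stderr, and a timeout flag. */
    static final class Result {
        final int exitCode;
        final String output;
        final boolean timedOut;

        Result(int exitCode, String output, boolean timedOut) {
            this.exitCode = exitCode;
            this.output = output;
            this.timedOut = timedOut;
        }
    }

    /** Runs the command and kills it if it exceeds timeoutMillis. */
    static Result run(List<String> command, long timeoutMillis) {
        Path log = null;
        try {
            log = Files.createTempFile("sandbox-run-", ".log");
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)      // merge stderr into stdout
                    .redirectOutput(log.toFile())   // avoid pipe-buffer deadlocks
                    .start();
            boolean finished = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!finished) {
                process.destroyForcibly().waitFor();
            }
            String output = new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
            // -1 is used here as a sentinel exit code for timed-out runs
            return new Result(finished ? process.exitValue() : -1, output, !finished);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for script", e);
        } finally {
            if (log != null) {
                try { Files.deleteIfExists(log); } catch (IOException ignored) { }
            }
        }
    }
}
```

For example, `LocalScriptRunner.run(List.of("sh", "-c", "echo ok"), 5000)` returns normally, while `run(List.of("sleep", "5"), 200)` comes back with `timedOut` set.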

2. File System Organization for Skills

2.1 Standard Skill File Structure

/tmp/skill_workspace/
│
├─ skill_aiops_diagnostics/           # AIOps diagnostic Skill
│  ├─ skill_manifest.yaml              # Skill metadata
│  ├─ scripts/                         # Diagnostic scripts
│  │  ├─ cpu_diagnostics.sh           # CPU diagnostics
│  │  ├─ memory_diagnostics.sh         # Memory diagnostics
│  │  ├─ disk_diagnostics.sh           # Disk diagnostics
│  │  ├─ network_diagnostics.sh        # Network diagnostics
│  │  ├─ log_analysis.sh               # Log analysis
│  │  └─ correlation_analysis.py       # Fault correlation analysis
│  │
│  ├─ tools/                           # Tool set
│  │  ├─ system/                       # System tools
│  │  │  ├─ grep                       # Text search
│  │  │  ├─ awk                        # Text processing
│  │  │  ├─ sed                        # Stream editor
│  │  │  └─ tail                       # End of file
│  │  │
│  │  └─ custom/                       # Custom tools
│  │     ├─ log_parser                 # Log parser
│  │     ├─ metric_aggregator          # Metric aggregator
│  │     └─ anomaly_detector            # Anomaly detector
│  │
│  ├─ cache/                           # Cache directory
│  │  ├─ metrics_cache/                # Metric cache
│  │  └─ analysis_cache/               # Analysis cache
│  │
│  ├─ reports/                         # Report output
│  │  ├─ diagnosis_reports/            # Diagnostic reports
│  │  └─ analysis_results/             # Analysis results
│  │
│  └─ config/                          # Configuration files
│     ├─ diagnostic_rules.yaml          # Diagnostic rules
│     ├─ thresholds.yaml                # Alert thresholds
│     └─ tool_config.yaml               # Tool configuration
│
├─ skill_report_generation/             # Report generation Skill
│  ├─ skill_manifest.yaml
│  ├─ templates/                       # Report templates
│  │  ├─ executive_summary.md          # Executive summary template
│  │  ├─ detailed_analysis.md          # Detailed analysis template
│  │  ├─ recommendations.md            # Recommendations template
│  │  └─ styles/
│  │     └─ markdown.css               # Styles
│  │
│  ├─ generators/                      # Report generators
│  │  ├─ pdf_generator.py              # PDF generation
│  │  ├─ html_generator.py             # HTML generation
│  │  └─ markdown_generator.py          # Markdown generation
│  │
│  ├─ outputs/                         # Generated reports
│  │  ├─ diagnosis_report_2025_01_04.pdf
│  │  ├─ diagnosis_report_2025_01_04.html
│  │  └─ diagnosis_report_2025_01_04.md
│  │
│  └─ config/
│     ├─ template_config.yaml           # Template configuration
│     └─ style_config.yaml              # Style configuration
│
└─ shared_tools/                        # Shared tools
   ├─ log_processor                     # Log processing tool
   ├─ data_analyzer                     # Data analysis tool
   └─ report_formatter                  # Report formatting tool

2.2 Skill Manifest Definition

# skill_manifest.yaml - Skill metadata

apiVersion: v1
kind: SkillManifest
metadata:
  name: aiops_diagnostics
  version: "1.0.0"
  description: "AIOps fault diagnosis Skill"
  author: "Platform Team"
  lastModified: "2025-01-04"

spec:
  # Basic Skill information
  skillType: "diagnostic"  # diagnostic / reporting / analysis
  
  # File system requirements
  filesystem:
    workspaceSize: "1GB"
    persistentDirs:
      - cache/
      - config/
    tempDirs:
      - reports/
  
  # Script definitions
  scripts:
    - name: "cpu_diagnostics"
      path: "scripts/cpu_diagnostics.sh"
      language: "bash"
      timeout: "30s"
      inputs:
        - name: "duration"
          type: "integer"
          description: "Diagnostic duration (seconds)"
      outputs:
        - name: "cpu_report"
          type: "json"
          description: "CPU diagnostic result"
    
    - name: "log_analysis"
      path: "scripts/log_analysis.sh"
      language: "bash"
      timeout: "60s"
      inputs:
        - name: "log_file"
          type: "string"
          description: "Path to the log file"
        - name: "keywords"
          type: "array"
          description: "List of keywords"
      outputs:
        - name: "analysis_result"
          type: "json"
          description: "Analysis result"
    
    - name: "correlation_analysis"
      path: "scripts/correlation_analysis.py"
      language: "python"
      timeout: "120s"
      inputs:
        - name: "metrics_data"
          type: "json"
          description: "Metric data"
      outputs:
        - name: "correlation_report"
          type: "json"
          description: "Correlation analysis report"
  
  # Tool dependencies
  tools:
    system:
      - name: "grep"
        version: "*"
        required: true
      - name: "awk"
        version: "*"
        required: true
      - name: "sed"
        version: "*"
        required: false
      - name: "tail"
        version: "*"
        required: true
    
    custom:
      - name: "log_parser"
        path: "tools/custom/log_parser"
        version: "1.0.0"
        required: true
      - name: "metric_aggregator"
        path: "tools/custom/metric_aggregator"
        version: "1.0.0"
        required: true
  
  # Environment variables
  environment:
    JAVA_HOME: "/usr/lib/jvm/java-17"
    LOG_LEVEL: "INFO"
    CACHE_DIR: "${WORKSPACE}/cache"
    REPORT_DIR: "${WORKSPACE}/reports"
  
  # Resource limits
  resources:
    cpu: "2"           # At most 2 cores
    memory: "512Mi"     # At most 512 MiB
    timeout: "300s"     # At most 5 minutes
    diskSpace: "1Gi"    # At most 1 GiB of disk
  
  # Security permissions
  security:
    readPaths:
      - "/var/log/"
      - "/proc/stat"
      - "/proc/meminfo"
    writePaths:
      - "${WORKSPACE}/cache/"
      - "${WORKSPACE}/reports/"
    networkAccess: false  # Network access is not allowed
    executablePaths:
      - "scripts/"
      - "tools/"
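
The manifest expresses limits as strings such as "30s", "512Mi", and "1GB", so whatever loads it has to turn them into numbers before enforcing them. The following is a minimal sketch of that conversion. `ManifestUnits` is a hypothetical helper, and the unit table is an assumption: it treats Ki/Mi/Gi as powers of 1024 and KB/MB/GB as powers of 1000, a convention the manifest itself does not spell out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the quantity strings used in skill_manifest.yaml. */
final class ManifestUnits {
    // An integer followed by an optional alphabetic unit, e.g. "30s" or "512Mi"
    private static final Pattern QUANTITY = Pattern.compile("^(\\d+)\\s*([A-Za-z]*)$");

    private static final Map<String, Long> DURATION_MILLIS = Map.of(
            "ms", 1L, "s", 1_000L, "m", 60_000L, "h", 3_600_000L);

    // Assumption: binary suffixes are powers of 1024, decimal suffixes powers of 1000
    private static final Map<String, Long> BYTE_UNITS = Map.of(
            "", 1L, "B", 1L,
            "Ki", 1L << 10, "Mi", 1L << 20, "Gi", 1L << 30,
            "KB", 1_000L, "MB", 1_000_000L, "GB", 1_000_000_000L);

    static long parseDurationMillis(String value) {
        return parse(value, DURATION_MILLIS, "duration");
    }

    static long parseBytes(String value) {
        return parse(value, BYTE_UNITS, "size");
    }

    private static long parse(String value, Map<String, Long> units, String kind) {
        Matcher m = QUANTITY.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid " + kind + ": " + value);
        }
        Long factor = units.get(m.group(2));
        if (factor == null) {
            throw new IllegalArgumentException("Unknown " + kind + " unit in: " + value);
        }
        // multiplyExact fails loudly instead of silently overflowing
        return Math.multiplyExact(Long.parseLong(m.group(1)), factor);
    }
}
```

Rejecting unknown units instead of guessing keeps a typo such as "512Mb" from silently turning into a different limit.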

3. Implementing the AIOps Fault Diagnosis Skill

3.1 Skill Base Framework

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Base class for AIOps fault diagnosis Skills
 */
public abstract class AIOpsDiagnosticSkill extends AgentSkill {
    
    protected static final Logger logger = LoggerFactory.getLogger(AIOpsDiagnosticSkill.class);
    
    // Runtime Sandbox client
    protected final SandboxClient sandboxClient;
    
    // Skill file system
    protected final SkillFileSystem fileSystem;
    
    // Tool registry
    protected final ToolRegistry toolRegistry;
    
    // Diagnostic rules
    protected final DiagnosticRuleEngine ruleEngine;
    
    public AIOpsDiagnosticSkill(
            String skillName,
            SandboxClient sandboxClient,
            SkillFileSystem fileSystem,
            ToolRegistry toolRegistry) {
        super(skillName);
        this.sandboxClient = sandboxClient;
        this.fileSystem = fileSystem;
        this.toolRegistry = toolRegistry;
        this.ruleEngine = new DiagnosticRuleEngine(this.fileSystem);
    }
    
    /**
     * Run a diagnostic script
     */
    protected ExecutionResult executeScript(
            String scriptName,
            Map<String, String> inputs,
            long timeout) throws Exception {
        
        logger.info("Running diagnostic script: {}", scriptName);
        
        // Look up the script path
        String scriptPath = fileSystem.getScriptPath(scriptName);
        
        // Build the script command
        String command = buildCommand(scriptPath, inputs);
        
        // Execute inside the sandbox
        ExecutionResult result = sandboxClient.executeScript(
            getSandboxId(),
            command,
            timeout,
            getEnvironmentVariables()
        );
        
        logger.info("Script finished: exitCode={}, duration={}ms",
            result.getExitCode(), result.getExecutionTime());
        
        return result;
    }
    
    /**
     * Invoke a registered tool
     */
    protected String invokeTool(
            String toolName,
            List<String> args) throws Exception {
        
        logger.debug("Invoking tool: {} with args: {}", toolName, args);
        
        ToolDefinition tool = toolRegistry.getTool(toolName);
        if (tool == null) {
            throw new ToolNotFoundException(toolName);
        }
        
        // Build the full command.
        // Note: arguments are joined with spaces and are not shell-quoted,
        // so callers must pass trusted or already-quoted values.
        List<String> command = new ArrayList<>();
        command.add(tool.getPath());
        command.addAll(args);
        
        // Execute the command
        ExecutionResult result = sandboxClient.execute(
            getSandboxId(),
            String.join(" ", command),
            30000  // 30-second timeout
        );
        
        if (result.getExitCode() != 0) {
            logger.error("Tool execution failed: {}", result.getStderr());
            throw new ToolExecutionException(toolName, result.getStderr());
        }
        
        return result.getStdout();
    }
    
    /**
     * Read a file
     */
    protected String readFile(String filePath) throws Exception {
        String fullPath = fileSystem.resolvePath(filePath);
        
        // Read the file with the cat tool
        return invokeTool("cat", List.of(fullPath));
    }
    
    /**
     * Analyze a log file
     */
    protected LogAnalysisResult analyzeLog(
            String logFilePath,
            List<String> keywords) throws Exception {
        
        logger.info("Analyzing log file: {}", logFilePath);
        
        // Run the log analysis script
        Map<String, String> inputs = Map.of(
            "log_file", logFilePath,
            "keywords", String.join(",", keywords)
        );
        
        ExecutionResult result = executeScript("log_analysis", inputs, 60000);
        
        // Parse the result
        return parseAnalysisResult(result.getStdout());
    }
    
    protected abstract String buildCommand(String scriptPath, Map<String, String> inputs);
    protected abstract Map<String, String> getEnvironmentVariables();
    protected abstract String getSandboxId();
    protected abstract LogAnalysisResult parseAnalysisResult(String output);
}

/**
 * Execution result
 */
@Data
@Builder
class ExecutionResult {
    private String sandboxId;
    private int exitCode;
    private String stdout;
    private String stderr;
    private long executionTime;
    private Map<String, String> metrics;  // CPU, memory, etc.
}

/**
 * Log analysis result
 */
@Data
@Builder
class LogAnalysisResult {
    private List<LogEntry> errorLines;
    private List<LogEntry> warningLines;
    private Map<String, Integer> keywordCounts;
    private String summary;
    private LocalDateTime analysisTime;
}

@Data
@Builder
class LogEntry {
    private int lineNumber;
    private String timestamp;
    private String level;
    private String message;
    private String context;
}
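
The base class leaves `buildCommand` abstract, and because the resulting string is handed to a shell, the way inputs are quoted matters. Below is one possible sketch, assuming a POSIX shell inside the sandbox. `ShellCommandBuilder` is a hypothetical helper, and the `--name='value'` flag style is an assumption made for illustration; a script that reads positional arguments would need a different layout.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for turning script inputs into a safely quoted command line. */
final class ShellCommandBuilder {

    /** Wraps a value in single quotes, escaping embedded single quotes for POSIX shells. */
    static String quote(String value) {
        return "'" + value.replace("'", "'\\''") + "'";
    }

    /** Builds "<script> --key='value' ..." with keys in sorted order for reproducibility. */
    static String build(String scriptPath, Map<String, String> inputs) {
        StringBuilder command = new StringBuilder(quote(scriptPath));
        for (Map.Entry<String, String> input : new TreeMap<>(inputs).entrySet()) {
            // Input names become flag names, so restrict them to a safe alphabet
            if (!input.getKey().matches("[A-Za-z0-9_]+")) {
                throw new IllegalArgumentException("Invalid input name: " + input.getKey());
            }
            command.append(" --").append(input.getKey())
                   .append('=').append(quote(input.getValue()));
        }
        return command.toString();
    }
}
```

With this in place, a log path containing spaces or a keyword containing `;` reaches the script as a single argument instead of being reinterpreted by the shell.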

3.2 CPU Diagnostic Script

#!/bin/bash
# scripts/cpu_diagnostics.sh - CPU diagnostic script
# Writes a JSON report to $REPORT_DIR and prints it on stdout.

set -e

# NOTE: DURATION is accepted as an argument, but this script takes a single
# snapshot; it does not sample over the requested period.
DURATION=${1:-30}
REPORT_DIR="${REPORT_DIR:-/workspace/reports}"
OUTPUT_FILE="$REPORT_DIR/cpu_diagnostics_$(date +%s).json"
mkdir -p "$REPORT_DIR"

# Progress messages go to stderr so that stdout carries only the JSON report
echo "Starting CPU diagnostics (requested duration: ${DURATION}s)..." >&2

# Build the JSON report
{
    echo "{"
    echo "  \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\","
    echo "  \"diagnostics\": {"
    
    # 1. Number of CPU cores
    CPU_CORES=$(grep -c "^processor" /proc/cpuinfo)
    echo "    \"cpu_cores\": $CPU_CORES,"
    
    # 2. CPU model
    CPU_MODEL=$(grep "^model name" /proc/cpuinfo | head -1 | cut -d: -f2 | xargs)
    echo "    \"cpu_model\": \"$CPU_MODEL\","
    
    # 3. Current CPU usage
    echo "    \"current_usage\": {"
    
    # Read the load averages from /proc/loadavg; the trailing "_" absorbs
    # the remaining fields so that LA15 holds a single number
    IFS=' ' read -r LA1 LA5 LA15 _ < /proc/loadavg
    echo "      \"load_average\": {"
    echo "        \"1m\": $LA1,"
    echo "        \"5m\": $LA5,"
    echo "        \"15m\": $LA15"
    echo "      },"
    
    # Per-core usage. /proc/stat counters are cumulative since boot, so this
    # is the average usage since boot, not an instantaneous reading.
    echo "      \"per_cpu\": ["
    
    FIRST=true
    while IFS= read -r line; do
        if [[ $line == cpu[0-9]* ]]; then
            if [ "$FIRST" = false ]; then echo ","; fi
            FIRST=false
            
            # Parse the CPU counters
            FIELDS=($line)
            USER_T=${FIELDS[1]}
            NICE_T=${FIELDS[2]}
            SYSTEM_T=${FIELDS[3]}
            IDLE_T=${FIELDS[4]}
            
            TOTAL=$((USER_T + NICE_T + SYSTEM_T + IDLE_T))
            USAGE=$((100 * (TOTAL - IDLE_T) / TOTAL))
            
            echo "        {"
            echo "          \"cpu\": \"${FIELDS[0]}\","
            echo "          \"usage_percent\": $USAGE"
            echo "        }" | tr -d '\n'
        fi
    done < /proc/stat
    
    echo ""
    echo "      ]"
    echo "    },"
    
    # 4. Processes with the highest CPU usage
    echo "    \"top_processes\": ["
    
    FIRST=true
    ps aux --sort=-%cpu | tail -n +2 | head -5 | while read -r line; do
        if [ "$FIRST" = false ]; then echo ","; fi
        FIRST=false
        
        # PUSER avoids clobbering the USER environment variable
        IFS=' ' read -r PUSER PID PCPU PMEM VSZ RSS TTY STAT START TIME CMD <<< "$line"
        
        echo "        {"
        echo "          \"pid\": $PID,"
        echo "          \"user\": \"$PUSER\","
        echo "          \"cpu_percent\": $PCPU,"
        echo "          \"memory_percent\": $PMEM,"
        echo "          \"command\": \"${CMD%% *}\""
        echo "        }" | tr -d '\n'
    done
    
    echo ""
    echo "    ]"
    echo "  },"
    
    # 5. Recommendations
    echo "  \"recommendations\": ["
    
    if (( $(echo "$LA1 > $CPU_CORES" | bc -l) )); then
        echo "    \"Warning: the 1-minute load average exceeds the number of CPU cores; the system may be overloaded\","
    fi
    
    if (( $(echo "$LA5 > $CPU_CORES * 0.8" | bc -l) )); then
        echo "    \"Note: the 5-minute load average is high; consider checking background jobs\","
    fi
    
    echo "    \"Monitor CPU usage on a regular basis\""
    echo "  ]"
    echo "}"
    
} > "$OUTPUT_FILE"

# Report where the result was saved, then print it
echo "Diagnostics complete; report saved to: $OUTPUT_FILE" >&2
cat "$OUTPUT_FILE"
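
The per-core figure in the script is derived from counters that accumulate since boot, so it reflects long-run average usage. To estimate usage over a recent interval, a common approach is to read /proc/stat twice and work with the differences. The sketch below shows that calculation on two already-captured lines; `CpuUsage` is a hypothetical helper, and counting iowait as idle time is a choice made here, not something the script above does.

```java
/** Hypothetical helper: CPU usage between two samples of a /proc/stat "cpuN" line. */
final class CpuUsage {

    /** Parses "cpuN user nice system idle iowait irq softirq steal ..." into counters. */
    static long[] parse(String statLine) {
        String[] fields = statLine.trim().split("\\s+");
        if (fields.length < 5) {
            throw new IllegalArgumentException("Not a cpu line: " + statLine);
        }
        // Keep at most the first 8 counters; guest time is already included in user time
        long[] counters = new long[Math.min(fields.length - 1, 8)];
        for (int i = 0; i < counters.length; i++) {
            counters[i] = Long.parseLong(fields[i + 1]);
        }
        return counters;
    }

    /** Returns the busy percentage over the interval between the two samples. */
    static double usagePercent(String before, String after) {
        long[] a = parse(before);
        long[] b = parse(after);
        long totalDelta = 0;
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            totalDelta += b[i] - a[i];
        }
        if (totalDelta <= 0) {
            return 0.0; // no time elapsed between samples
        }
        long idleDelta = idle(b) - idle(a);
        return 100.0 * (totalDelta - idleDelta) / totalDelta;
    }

    /** Idle time is taken as idle plus iowait when the iowait counter is present. */
    private static long idle(long[] counters) {
        return counters[3] + (counters.length > 4 ? counters[4] : 0);
    }
}
```

A caller would capture one line, wait for the sampling interval (for instance the script's duration argument), capture the same line again, and pass both to `usagePercent`.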

3.3 Memory Diagnostic Skill Implementation

import java.io.File;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Memory diagnostic Skill
 */
public class MemoryDiagnosticSkill extends AIOpsDiagnosticSkill {
    
    private static final Logger logger = LoggerFactory.getLogger(MemoryDiagnosticSkill.class);
    
    private final MetricAggregator metricAggregator;
    
    public MemoryDiagnosticSkill(
            SandboxClient sandboxClient,
            SkillFileSystem fileSystem,
            ToolRegistry toolRegistry) {
        super("memory_diagnostic", sandboxClient, fileSystem, toolRegistry);
        this.metricAggregator = new MetricAggregator();
    }
    
    /**
     * Run the memory diagnosis
     */
    public MemoryDiagnosticReport diagnose() throws Exception {
        logger.info("Starting memory diagnosis...");
        
        // Lombok's @Builder names the generated class <Type>Builder
        MemoryDiagnosticReport.MemoryDiagnosticReportBuilder reportBuilder = MemoryDiagnosticReport.builder()
            .timestamp(LocalDateTime.now());
        
        try {
            // Step 1: Collect memory statistics
            MemoryStatistics stats = getMemoryStatistics();
            reportBuilder.statistics(stats);
            logger.info("Memory statistics: total={}MB, used={}MB, available={}MB",
                stats.getTotalMemory() / 1024 / 1024,
                stats.getUsedMemory() / 1024 / 1024,
                stats.getAvailableMemory() / 1024 / 1024);
            
            // Step 2: Analyze memory usage trends
            List<MemoryMetric> trends = analyzeMemoryTrends();
            reportBuilder.trends(trends);
            
            // Step 3: Look for memory leaks
            MemoryLeakDetection leakDetection = detectMemoryLeaks();
            reportBuilder.leakDetection(leakDetection);
            
            // Step 4: Identify the processes using the most memory
            List<ProcessMemoryInfo> topProcesses = getTopMemoryConsumers(5);
            reportBuilder.topProcesses(topProcesses);
            
            // Step 5: Generate recommendations
            List<String> recommendations = generateRecommendations(stats, leakDetection);
            reportBuilder.recommendations(recommendations);
            
            reportBuilder.status("SUCCESS");
            
        } catch (Exception e) {
            logger.error("Memory diagnosis failed", e);
            reportBuilder.status("FAILED").error(e.getMessage());
        }
        
        return reportBuilder.build();
    }
    
    /**
     * Collect memory statistics
     */
    private MemoryStatistics getMemoryStatistics() throws Exception {
        // Read memory information through the sandboxed cat tool
        String meminfo = readFile("/proc/meminfo");
        
        // Parse /proc/meminfo
        Map<String, Long> memStats = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            if (line.trim().isEmpty()) continue;
            
            // Format: MemTotal:        16334732 kB
            String[] parts = line.split(":");
            if (parts.length == 2) {
                String key = parts[0].trim();
                String value = parts[1].trim().split("\\s+")[0];
                memStats.put(key, Long.parseLong(value) * 1024); // kB to bytes
            }
        }
        
        return MemoryStatistics.builder()
            .totalMemory(memStats.getOrDefault("MemTotal", 0L))
            .usedMemory(memStats.getOrDefault("MemTotal", 0L) - 
                       memStats.getOrDefault("MemAvailable", 0L))
            .availableMemory(memStats.getOrDefault("MemAvailable", 0L))
            .buffers(memStats.getOrDefault("Buffers", 0L))
            .cached(memStats.getOrDefault("Cached", 0L))
            .swapTotal(memStats.getOrDefault("SwapTotal", 0L))
            .swapUsed(memStats.getOrDefault("SwapTotal", 0L) - 
                     memStats.getOrDefault("SwapFree", 0L))
            .timestamp(LocalDateTime.now())
            .build();
    }
    
    /**
     * Analyze memory usage trends
     */
    private List<MemoryMetric> analyzeMemoryTrends() throws Exception {
        logger.info("Analyzing memory usage trends...");
        
        // Load historical data from the cache directory
        File cacheDir = fileSystem.getCacheDir();
        List<MemoryMetric> trends = new ArrayList<>();
        
        // Read historical metrics (JSON files), oldest first
        File[] cacheFiles = cacheDir.listFiles((dir, name) -> name.startsWith("memory_") && name.endsWith(".json"));
        
        if (cacheFiles != null) {
            Arrays.sort(cacheFiles, Comparator.comparingLong(File::lastModified));
            
            for (File cacheFile : cacheFiles) {
                MemoryMetric metric = parseMemoryMetric(readFile(cacheFile.getAbsolutePath()));
                trends.add(metric);
            }
        }
        
        // Save the current metric to the cache
        MemoryMetric currentMetric = MemoryMetric.builder()
            .timestamp(LocalDateTime.now())
            .memoryUsagePercent(getCurrentMemoryUsagePercent())
            .build();
        
        String cacheFile = cacheDir.getAbsolutePath() + "/memory_" + System.currentTimeMillis() + ".json";
        saveMetricToCache(cacheFile, currentMetric);
        
        trends.add(currentMetric);
        return trends;
    }
    
    /**
     * Detect memory leaks
     */
    private MemoryLeakDetection detectMemoryLeaks() throws Exception {
        logger.info("Checking for memory leaks...");
        
        // Run the custom memory leak detection tool
        String result = invokeTool("memory_leak_detector", List.of(
            "--duration", "60",
            "--threshold", "80"
        ));
        
        // Parse the result
        return parseLeakDetectionResult(result);
    }
    
    /**
     * Return the processes that use the most memory
     */
    private List<ProcessMemoryInfo> getTopMemoryConsumers(int topN) throws Exception {
        // Run ps sorted by resident memory and parse its output in Java
        // (ps must be registered in the ToolRegistry)
        String psOutput = invokeTool("ps", Arrays.asList(
            "aux", "--sort=-rss"
        ));
        
        List<ProcessMemoryInfo> processes = new ArrayList<>();
        
        String[] lines = psOutput.split("\n");
        // Line 0 is the header row
        for (int i = 1; i < Math.min(lines.length, topN + 1); i++) {
            String[] fields = lines[i].trim().split("\\s+");
            if (fields.length < 11) continue; // skip malformed rows
            
            ProcessMemoryInfo info = ProcessMemoryInfo.builder()
                .pid(Integer.parseInt(fields[1]))
                .user(fields[0])
                .memoryPercent(Double.parseDouble(fields[3]))
                .memoryRss(Long.parseLong(fields[5]) * 1024) // kB to bytes
                .command(fields[10])
                .build();
            
            processes.add(info);
        }
        
        return processes;
    }
    
    /**
     * Generate diagnostic recommendations
     */
    private List<String> generateRecommendations(
            MemoryStatistics stats,
            MemoryLeakDetection leakDetection) {
        
        List<String> recommendations = new ArrayList<>();
        
        double usagePercent = (double) stats.getUsedMemory() / stats.getTotalMemory() * 100;
        
        if (usagePercent > 90) {
            recommendations.add("⚠️  Critical: memory usage is above 90%; the system is at risk of OOM");
            recommendations.add("   Action: free unneeded memory now and consider adding physical memory");
        } else if (usagePercent > 80) {
            recommendations.add("⚠️  Warning: memory usage is above 80%; performance may degrade");
            recommendations.add("   Action: monitor memory usage and identify memory-heavy applications");
        } else if (usagePercent > 70) {
            recommendations.add("📌 Note: memory usage is above 70%; periodic cleanup is recommended");
        }
        
        if (leakDetection.isDetected()) {
            recommendations.add("⚠️  Possible memory leak detected: " + leakDetection.getDescription());
            recommendations.add("   Action: review the logs and heap dumps of the affected applications");
        }
        
        if (stats.getSwapUsed() > 0) {
            double swapPercent = (double) stats.getSwapUsed() / stats.getSwapTotal() * 100;
            if (swapPercent > 50) {
                recommendations.add("⚠️  Warning: swap usage is too high (" + String.format("%.1f", swapPercent) + "%); performance is severely degraded");
                recommendations.add("   Action: check for memory leaks or unreasonable memory consumption");
            }
        }
        
        recommendations.add("💡 Tip: run memory diagnostics regularly to catch problems early");
        
        return recommendations;
    }
    
    @Override
    protected String buildCommand(String scriptPath, Map<String, String> inputs) {
        return scriptPath;  // Run the script as-is; inputs are not passed on
    }
    
    @Override
    protected Map<String, String> getEnvironmentVariables() {
        return Map.of(
            "JAVA_HOME", "/usr/lib/jvm/java-17",
            "LOG_LEVEL", "INFO"
        );
    }
    
    @Override
    protected String getSandboxId() {
        return "aiops_sandbox_memory";
    }
    
    @Override
    protected LogAnalysisResult parseAnalysisResult(String output) {
        // Placeholder: log analysis result parsing is not implemented here
        return null;
    }
    
    // Helper methods
    private double getCurrentMemoryUsagePercent() throws Exception {
        MemoryStatistics stats = getMemoryStatistics();
        return (double) stats.getUsedMemory() / stats.getTotalMemory() * 100;
    }
    
    private MemoryMetric parseMemoryMetric(String json) {
        // Placeholder: parse a memory metric from JSON
        return MemoryMetric.builder().build();
    }
    
    private void saveMetricToCache(String path, MemoryMetric metric) throws Exception {
        // Placeholder: write the metric to the cache
    }
    
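    private MemoryLeakDetection parseLeakDetectionResult(String result) {
        // Naive check: any output containing "leak" counts as a detection,
        // including a message such as "no leak found"
        return MemoryLeakDetection.builder()
            .detected(result.contains("leak"))
            .description(result)
            .build();
    }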
}

// Data models
@Data
@Builder
class MemoryDiagnosticReport {
    private LocalDateTime timestamp;
    private MemoryStatistics statistics;
    private List<MemoryMetric> trends;
    private MemoryLeakDetection leakDetection;
    private List<ProcessMemoryInfo> topProcesses;
    private List<String> recommendations;
    private String status;
    private String error;
}

@Data
@Builder
class MemoryStatistics {
    private long totalMemory;
    private long usedMemory;
    private long availableMemory;
    private long buffers;
    private long cached;
    private long swapTotal;
    private long swapUsed;
    private LocalDateTime timestamp;
}

@Data
@Builder
class MemoryMetric {
    private LocalDateTime timestamp;
    private double memoryUsagePercent;
}

@Data
@Builder
class MemoryLeakDetection {
    private boolean detected;
    private String description;
}

@Data
@Builder
class ProcessMemoryInfo {
    private int pid;
    private String user;
    private double memoryPercent;
    private long memoryRss;
    private String command;
}
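
The skill collects a list of `MemoryMetric` samples but hands leak detection to an external tool. As a complement, the trend data itself can give a rough signal: a steadily rising usage percentage is one common symptom of a leak. The sketch below fits a least-squares line to the samples. `MemoryTrend` and its thresholds are hypothetical, it is not how the `memory_leak_detector` tool works, and it assumes evenly spaced samples.

```java
/** Hypothetical heuristic: flags a sustained upward trend in memory usage samples. */
final class MemoryTrend {

    /** Least-squares slope of usage, in percentage points per sample. */
    static double slope(double[] usagePercent) {
        int n = usagePercent.length;
        if (n < 2) {
            return 0.0; // a trend needs at least two samples
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (double y : usagePercent) {
            meanY += y;
        }
        meanY /= n;

        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            numerator += dx * (usagePercent[i] - meanY);
            denominator += dx * dx;
        }
        return numerator / denominator;
    }

    /** True when there are enough samples and usage grows at least minSlope per sample. */
    static boolean looksLikeLeak(double[] usagePercent, int minSamples, double minSlope) {
        return usagePercent.length >= minSamples && slope(usagePercent) >= minSlope;
    }
}
```

A rising trend alone is not proof of a leak, since caches and legitimate load growth look the same, so a result like this is better treated as a prompt for closer inspection than as a verdict.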

4. Developing and Integrating Custom Tools

4.1 Custom Tool Framework

/**
 * Base class for custom tools
 */
public abstract class CustomTool {
    
    protected final Logger logger = LoggerFactory.getLogger(getClass());
    protected final ToolContext context;
    
    public CustomTool(ToolContext context) {
        this.context = context;
    }
    
    /**
     * Returns the tool's metadata
     */
    public abstract ToolMetadata getMetadata();
    
    /**
     * Executes the tool
     */
    public abstract ToolResult execute(Map<String, String> args) throws Exception;
    
    /**
     * Tool metadata
     */
    @Data
    @Builder
    public static class ToolMetadata {
        private String name;
        private String version;
        private String description;
        private List<ParameterSpec> parameters;
        private String outputFormat;  // json / text / binary
    }
    
    @Data
    @Builder
    public static class ParameterSpec {
        private String name;
        private String type;  // string / int / boolean / array
        private String description;
        private boolean required;
        private String defaultValue;
    }
    
    @Data
    @Builder
    public static class ToolResult {
        private int exitCode;
        private String output;
        private String error;
        private Map<String, Object> metrics;  // Execution metrics
    }
}

/**
 * Tool context
 */
@Data
public class ToolContext {
    private String toolId;
    private String sandboxId;
    private File workspaceRoot;
    private Map<String, String> environment;
    private long timeout;
}

/**
 * Log parsing tool
 */
public class LogParserTool extends CustomTool {
    
    public LogParserTool(ToolContext context) {
        super(context);
    }
    
    @Override
    public ToolMetadata getMetadata() {
        return ToolMetadata.builder()
            .name("log_parser")
            .version("1.0.0")
            .description("High-performance log parsing and analysis tool")
            .parameters(Arrays.asList(
                ParameterSpec.builder()
                    .name("log_file")
                    .type("string")
                    .description("Path to the log file")
                    .required(true)
                    .build(),
                ParameterSpec.builder()
                    .name("pattern")
                    .type("string")
                    .description("Match pattern (regular expression)")
                    .required(false)
                    .build(),
                ParameterSpec.builder()
                    .name("level")
                    .type("string")
                    .description("Log level (ERROR/WARN/INFO)")
                    .required(false)
                    .defaultValue("*")
                    .build(),
                ParameterSpec.builder()
                    .name("max_lines")
                    .type("int")
                    .description("Maximum number of lines to return")
                    .required(false)
                    .defaultValue("1000")
                    .build()
            ))
            .outputFormat("json")
            .build();
    }
    
    @Override
    public ToolResult execute(Map<String, String> args) throws Exception {
        logger.info("Parsing log: file={}", args.get("log_file"));
        
        String logFile = args.get("log_file");
        String pattern = args.getOrDefault("pattern", ".*");
        String level = args.getOrDefault("level", "*");
        int maxLines = Integer.parseInt(args.getOrDefault("max_lines", "1000"));
        
        // 读取日志文件
        List<String> lines = readLogFile(logFile);
        
        // 解析日志
        List<LogEntry> entries = parseLogLines(lines, pattern, level, maxLines);
        
        // 统计信息
        Map<String, Integer> statistics = computeStatistics(entries);
        
        // 生成JSON输出
        String output = generateJsonOutput(entries, statistics);
        
        return ToolResult.builder()
            .exitCode(0)
            .output(output)
            .metrics(Map.of("lines_processed", lines.size(), "entries_matched", entries.size()))
            .build();
    }
    
    private List<String> readLogFile(String filePath) throws Exception {
        Path path = Paths.get(filePath);
        return Files.readAllLines(path);
    }
    
    private List<LogEntry> parseLogLines(
            List<String> lines,
            String pattern,
            String level,
            int maxLines) throws Exception {
        
        List<LogEntry> entries = new ArrayList<>();
        Pattern regex = Pattern.compile(pattern);
        
        for (String line : lines) {
            if (entries.size() >= maxLines) break;
            
            Matcher matcher = regex.matcher(line);
            if (matcher.find()) {
                // Apply the level filter, then parse the line into a LogEntry
                if ("*".equals(level) || line.contains(level)) {
                    entries.add(parseSingleLogEntry(line));
                }
            }
        }
        
        return entries;
    }
    
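    private LogEntry parseSingleLogEntry(String line) {
        // Simplified log parsing.
        // Expected format: [2025-01-04 10:30:45] ERROR [com.example.Service] Message
        // The timestamp is the text inside the first [...]; everything after it
        // (level, logger, message) is kept together as the message.
        // Lines that are empty or do not start with '[' get an empty timestamp
        // instead of triggering a StringIndexOutOfBoundsException.
        
        int close = line.indexOf(']');
        boolean hasTimestamp = line.startsWith("[") && close > 0;
        String timestamp = hasTimestamp ? line.substring(1, close) : "";
        String levelAndMessage = hasTimestamp ? line.substring(close + 1).trim() : line.trim();
        
        return LogEntry.builder()
            .timestamp(timestamp)
            .message(levelAndMessage)
            .build();
    }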
    
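    private Map<String, Integer> computeStatistics(List<LogEntry> entries) {
        Map<String, Integer> stats = new HashMap<>();
        stats.put("total_entries", entries.size());
        
        // Count entries per level and store the result
        // (the original computed the ERROR count but discarded it)
        for (String lvl : Arrays.asList("ERROR", "WARN", "INFO")) {
            long count = entries.stream()
                .filter(e -> e.getMessage().contains(lvl))
                .count();
            stats.put(lvl.toLowerCase() + "_entries", (int) count);
        }
        
        return stats;
    }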
    
    private String generateJsonOutput(List<LogEntry> entries, Map<String, Integer> stats) throws Exception {
        // Serialize the analysis result as JSON; writeValueAsString throws a checked exception
        return new ObjectMapper().writeValueAsString(Map.of(
            "entries", entries,
            "statistics", stats
        ));
    }
}

/**
 * Metric aggregation tool
 */
public class MetricAggregatorTool extends CustomTool {
    
    public MetricAggregatorTool(ToolContext context) {
        super(context);
    }
    
    @Override
    public ToolMetadata getMetadata() {
        return ToolMetadata.builder()
            .name("metric_aggregator")
            .version("1.0.0")
            .description("System metric aggregation and statistics tool")
            .parameters(Arrays.asList(
                ParameterSpec.builder()
                    .name("metric_file")
                    .type("string")
                    .description("Path to the metric file")
                    .required(true)
                    .build(),
                ParameterSpec.builder()
                    .name("aggregate_type")
                    .type("string")
                    .description("Aggregation type (sum/avg/min/max/percentile)")
                    .required(true)
                    .build()
            ))
            .outputFormat("json")
            .build();
    }
    
    @Override
    public ToolResult execute(Map<String, String> args) throws Exception {
        String metricFile = args.get("metric_file");
        String aggregateType = args.get("aggregate_type");
        
        logger.info("Aggregating metrics: file={}, type={}", metricFile, aggregateType);
        
        // Read the metric file
        List<Double> metrics = readMetricFile(metricFile);
        
        // Run the aggregation
        Map<String, Double> result = aggregate(metrics, aggregateType);
        
        String output = new ObjectMapper().writeValueAsString(result);
        
        return ToolResult.builder()
            .exitCode(0)
            .output(output)
            .metrics(Map.of("metrics_count", metrics.size()))
            .build();
    }
    
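    private List<Double> readMetricFile(String filePath) throws Exception {
        // The original leaves this method as a stub that returns an empty list.
        // Assumption for this example: the file holds one numeric value per line;
        // blank lines are skipped.
        List<Double> values = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(filePath))) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                values.add(Double.parseDouble(trimmed));
            }
        }
        return values;
    }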
    
    private Map<String, Double> aggregate(List<Double> metrics, String type) {
        Map<String, Double> result = new HashMap<>();
        
        if ("sum".equals(type)) {
            result.put("sum", metrics.stream().mapToDouble(Double::doubleValue).sum());
        } else if ("avg".equals(type)) {
            result.put("average", metrics.stream().mapToDouble(Double::doubleValue).average().orElse(0));
        } else if ("min".equals(type)) {
            result.put("min", metrics.stream().mapToDouble(Double::doubleValue).min().orElse(0));
        } else if ("max".equals(type)) {
            result.put("max", metrics.stream().mapToDouble(Double::doubleValue).max().orElse(0));
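        } else {
            // "percentile" is listed in the metadata but has no branch in this example,
            // so reject it explicitly instead of returning an empty result
            throw new IllegalArgumentException("Unsupported aggregate type: " + type);
        }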
        
        return result;
    }
}
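
`parseSingleLogEntry` above extracts only the timestamp and leaves the level inside the message text. If the level and logger name are needed as separate fields, a regular expression over the documented format `[timestamp] LEVEL [logger] message` is one option. The sketch below is an illustration only: `LogLineSketch` and `ParsedLine` are hypothetical names, not part of AgentScope, and it assumes every line follows that single format.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits one log line into its four fields. */
final class LogLineSketch {

    /** Plain value holder for the parsed fields. */
    static final class ParsedLine {
        final String timestamp;
        final String level;
        final String logger;
        final String message;

        ParsedLine(String timestamp, String level, String logger, String message) {
            this.timestamp = timestamp;
            this.level = level;
            this.logger = logger;
            this.message = message;
        }
    }

    // Matches: [timestamp] LEVEL [logger] message
    private static final Pattern LINE = Pattern.compile(
        "^\\[([^\\]]+)\\]\\s+(\\w+)\\s+\\[([^\\]]+)\\]\\s*(.*)$");

    /** Returns the parsed fields, or empty if the line does not match the format. */
    static Optional<ParsedLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ParsedLine(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        parse("[2025-01-04 10:30:45] ERROR [com.example.Service] Connection refused")
            .ifPresent(p -> System.out.println(p.level + " from " + p.logger + ": " + p.message));
    }
}
```

With a separate `level` field, a level filter can compare for equality rather than searching the whole line, which avoids matching the word ERROR when it only appears in the message body.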

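4.2 Tool Registration and Management

java
/**
 * Tool registry and manager
 */
public class ToolRegistry {
    
    private static final Logger logger = LoggerFactory.getLogger(ToolRegistry.class);
    
    private final Map<String, ToolDefinition> systemTools = new ConcurrentHashMap<>();
    private final Map<String, CustomTool> customTools = new ConcurrentHashMap<>();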
    
    public ToolRegistry() {
        initializeSystemTools();
    }
    
    /**
     * Initialize the built-in system tools
     */
    private void initializeSystemTools() {
        // Register grep
        registerSystemTool(ToolDefinition.builder()
            .name("grep")
            .path("/bin/grep")
            .type("system")
            .version("2.28")
            .description("Text search tool")
            .build());
        
        // Register cat
        registerSystemTool(ToolDefinition.builder()
            .name("cat")
            .path("/bin/cat")
            .type("system")
            .version("8.32")
            .description("File viewing tool")
            .build());
        
        // Register awk
        registerSystemTool(ToolDefinition.builder()
            .name("awk")
            .path("/usr/bin/awk")
            .type("system")
            .version("5.1.0")
            .description("Text processing tool")
            .build());
        
        // Register sed
        registerSystemTool(ToolDefinition.builder()
            .name("sed")
            .path("/bin/sed")
            .type("system")
            .version("4.7")
            .description("Stream editor")
            .build());
        
        // Register tail
        registerSystemTool(ToolDefinition.builder()
            .name("tail")
            .path("/usr/bin/tail")
            .type("system")
            .version("8.32")
            .description("Shows the end of a file")
            .build());
        
        // Register ps
        registerSystemTool(ToolDefinition.builder()
            .name("ps")
            .path("/bin/ps")
            .type("system")
            .version("3.3.17")
            .description("Process information viewer")
            .build());
    }
    
    /**
     * Register a system tool
     */
    public void registerSystemTool(ToolDefinition toolDef) {
        systemTools.put(toolDef.getName(), toolDef);
        logger.info("Registered system tool: {} ({})", toolDef.getName(), toolDef.getPath());
    }
    
    /**
     * Register a custom tool
     */
    public void registerCustomTool(String name, CustomTool tool) {
        customTools.put(name, tool);
        ToolMetadata metadata = tool.getMetadata();
        logger.info("Registered custom tool: {} v{}", metadata.getName(), metadata.getVersion());
    }
    
    /**
     * Look up a system tool by name
     */
    public ToolDefinition getTool(String name) {
        return systemTools.get(name);
    }
    
    /**
     * Look up a custom tool by name
     */
    public CustomTool getCustomTool(String name) {
        return customTools.get(name);
    }
    
    /**
     * Check whether a tool (system or custom) is registered
     */
    public boolean hasTool(String name) {
        return systemTools.containsKey(name) || customTools.containsKey(name);
    }
    
    /**
     * List the names of all registered tools
     */
    public List<String> listAllTools() {
        List<String> tools = new ArrayList<>();
        tools.addAll(systemTools.keySet());
        tools.addAll(customTools.keySet());
        return tools;
    }
}

@Data
@Builder
class ToolDefinition {
    private String name;
    private String path;
    private String type;  // system / custom
    private String version;
    private String description;
}
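
`hasTool` lets a caller check a name before using it. One way to apply this (an assumption about usage, not something the registry above enforces) is to treat the registered tools as an allow-list when building the command line that is sent to the sandbox. In the sketch below, `AllowListedCommand` and `build` are hypothetical names, and a plain name-to-path map stands in for `ToolRegistry`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds an argv list only for registered tools. */
final class AllowListedCommand {

    /**
     * Resolves the tool name to its registered path and appends the arguments.
     * Arguments are kept as separate list elements, so no shell parsing is involved.
     */
    static List<String> build(Map<String, String> toolPaths, String toolName, List<String> args) {
        String path = toolPaths.get(toolName);
        if (path == null) {
            throw new IllegalArgumentException("Tool not registered: " + toolName);
        }
        List<String> command = new ArrayList<>();
        command.add(path);
        command.addAll(args);
        return command;
    }

    public static void main(String[] args) {
        Map<String, String> tools = Map.of("grep", "/bin/grep", "tail", "/usr/bin/tail");
        System.out.println(build(tools, "grep", List.of("-c", "ERROR", "app.log")));
    }
}
```

The resulting list has the shape that `ProcessBuilder` accepts; whether the sandbox client takes a list or a single string is not shown in this chapter, so that part is left open.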

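5. Report Generation Skill

5.1 Report Template Management

java
/**
 * Report template engine
 */
public class ReportTemplateEngine {
    
    private static final Logger logger = LoggerFactory.getLogger(ReportTemplateEngine.class);
    
    private final SkillFileSystem fileSystem;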
    private final TemplateResolver resolver;
    
    public ReportTemplateEngine(SkillFileSystem fileSystem) {
        this.fileSystem = fileSystem;
        this.resolver = new TemplateResolver();
    }
    
    /**
     * Generate a report
     */
    public String generateReport(
            String templateName,
            Map<String, Object> data,
            ReportFormat format) throws Exception {
        
        logger.info("Generating report: template={}, format={}", templateName, format);
        
        // Load the template
        String templateContent = loadTemplate(templateName);
        
        // Render the template
        String renderedContent = renderTemplate(templateContent, data);
        
        // Convert to the requested format
        return convertFormat(renderedContent, format);
    }
    
    /**
     * Load a template
     */
    private String loadTemplate(String templateName) throws Exception {
        File templateFile = new File(
            fileSystem.getTemplateDir(),
            templateName + ".md"
        );
        
        // Decode explicitly as UTF-8 rather than the platform default charset
        return new String(Files.readAllBytes(templateFile.toPath()), StandardCharsets.UTF_8);
    }
    
    /**
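     * Render a template
     */
    private String renderTemplate(String template, Map<String, Object> data) {
        // A template engine such as FreeMarker or Velocity would normally be used here;
        // this example simplifies rendering to plain string replacement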
        
        String result = template;
        for (Map.Entry<String, Object> entry : data.entrySet()) {
            String placeholder = "${" + entry.getKey() + "}";
            result = result.replace(placeholder, String.valueOf(entry.getValue()));
        }
        
        return result;
    }
    
    /**
     * Convert the report format
     */
    private String convertFormat(String content, ReportFormat format) throws Exception {
        switch (format) {
            case MARKDOWN:
                return content;
            case HTML:
                return convertMarkdownToHtml(content);
            case PDF:
                return convertMarkdownToPdf(content);
            default:
                return content;
        }
    }
    
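    private String convertMarkdownToHtml(String markdown) {
        // Placeholder: a real implementation would convert with the commonmark library;
        // here the Markdown text is only wrapped in a minimal HTML skeleton
        return "<html><body>" + markdown + "</body></html>";
    }
    
    private String convertMarkdownToPdf(String markdown) throws Exception {
        // Placeholder: a real implementation would convert with iText or wkhtmltopdf
        return "PDF content";
    }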
}

public enum ReportFormat {
    MARKDOWN,
    HTML,
    PDF
}
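
`renderTemplate` replaces each `${key}` for which the data map has an entry and silently leaves any other placeholder in the output. A minimal sketch of a stricter variant that fails fast on a missing key is shown below. `PlaceholderRenderer` is a hypothetical name, placeholder names are assumed to contain only letters, digits, and underscores, and `Matcher.appendReplacement` with a `StringBuilder` requires Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: ${key} substitution that rejects unknown keys. */
final class PlaceholderRenderer {

    // Matches ${name} where name consists of letters, digits, or underscores
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    static String render(String template, Map<String, ?> data) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            if (!data.containsKey(key)) {
                throw new IllegalArgumentException("No value for placeholder: " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(data.get(key))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(
            "# ${report_title} (${report_date})",
            Map.of("report_title", "AIOps Report", "report_date", "2025-01-04")));
    }
}
```

Failing on a missing key surfaces template and data mismatches at generation time instead of leaving a raw `${...}` in a saved report. For anything beyond flat substitution (loops, conditionals), a real template engine remains the better fit.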

5.2 Diagnostic Report Generation Skill

java
/**
 * Diagnostic report generation Skill
 */
public class DiagnosticReportGenerationSkill extends AgentSkill {
    
    private static final Logger logger = LoggerFactory.getLogger(DiagnosticReportGenerationSkill.class);
    
    private final ReportTemplateEngine templateEngine;
    private final MemoryDiagnosticSkill memoryDiagnosticSkill;
    private final SkillFileSystem fileSystem;
    
    public DiagnosticReportGenerationSkill(
            ReportTemplateEngine templateEngine,
            MemoryDiagnosticSkill memoryDiagnosticSkill,
            SkillFileSystem fileSystem) {
        super("diagnostic_report_generation");
        this.templateEngine = templateEngine;
        this.memoryDiagnosticSkill = memoryDiagnosticSkill;
        this.fileSystem = fileSystem;
    }
    
    /**
     * Generate the full diagnostic report
     */
    public DiagnosticReport generateFullReport() throws Exception {
        logger.info("Generating diagnostic report...");
        
        // Lombok's @Builder generates a nested class named DiagnosticReportBuilder
        DiagnosticReport.DiagnosticReportBuilder reportBuilder = DiagnosticReport.builder()
            .timestamp(LocalDateTime.now())
            .version("1.0");
        
        try {
            // Step 1: run the diagnosis
            MemoryDiagnosticReport memoryReport = memoryDiagnosticSkill.diagnose();
            reportBuilder.memoryReport(memoryReport);
            
            // Step 2: prepare the report data
            Map<String, Object> reportData = prepareReportData(memoryReport);
            
            // Step 3: generate the Markdown report
            String markdownContent = templateEngine.generateReport(
                "diagnostic_template",
                reportData,
                ReportFormat.MARKDOWN
            );
            reportBuilder.markdownContent(markdownContent);
            
            // Step 4: generate the HTML version
            String htmlContent = templateEngine.generateReport(
                "diagnostic_template",
                reportData,
                ReportFormat.HTML
            );
            reportBuilder.htmlContent(htmlContent);
            
            // Step 5: save the report files
            String reportPath = saveReport(markdownContent, htmlContent);
            reportBuilder.reportPath(reportPath);
            
            reportBuilder.status("SUCCESS");
            
        } catch (Exception e) {
            logger.error("Report generation failed", e);
            reportBuilder.status("FAILED").error(e.getMessage());
        }
        
        return reportBuilder.build();
    }
    
    /**
     * Prepare the data for the report
     */
    private Map<String, Object> prepareReportData(MemoryDiagnosticReport memoryReport) {
        Map<String, Object> data = new HashMap<>();
        
        // Executive summary
        data.put("report_title", "AIOps Diagnostic Report");
        data.put("report_date", LocalDate.now().format(DateTimeFormatter.ISO_DATE));
        data.put("report_time", LocalTime.now().format(DateTimeFormatter.ofPattern("HH:mm:ss")));
        
        // Memory diagnosis results
        MemoryStatistics stats = memoryReport.getStatistics();
        data.put("total_memory", formatBytes(stats.getTotalMemory()));
        data.put("used_memory", formatBytes(stats.getUsedMemory()));
        data.put("memory_usage_percent", String.format("%.1f%%", 
            (double) stats.getUsedMemory() / stats.getTotalMemory() * 100));
        
        // Recommendations
        data.put("recommendations", memoryReport.getRecommendations());
        
        // Process information
        List<Map<String, Object>> processesData = new ArrayList<>();
        for (ProcessMemoryInfo proc : memoryReport.getTopProcesses()) {
            Map<String, Object> procData = new HashMap<>();
            procData.put("pid", proc.getPid());
            procData.put("user", proc.getUser());
            procData.put("memory", formatBytes(proc.getMemoryRss()));
            procData.put("command", proc.getCommand());
            processesData.add(procData);
        }
        data.put("top_processes", processesData);
        
        return data;
    }
    
    /**
     * Save the report files
     */
    private String saveReport(String markdownContent, String htmlContent) throws Exception {
        String timestamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
        String filename = "diagnostic_report_" + timestamp;
        
        File reportsDir = fileSystem.getOutputDir();
        
        // Save the Markdown version
        Files.write(
            new File(reportsDir, filename + ".md").toPath(),
            markdownContent.getBytes(StandardCharsets.UTF_8)
        );
        
        // Save the HTML version
        Files.write(
            new File(reportsDir, filename + ".html").toPath(),
            htmlContent.getBytes(StandardCharsets.UTF_8)
        );
        
        logger.info("Report saved to: {}", reportsDir.getAbsolutePath());
        
        // Returns the base path without an extension; the .md and .html files share it
        return reportsDir.getAbsolutePath() + "/" + filename;
    }
    
    /**
     * Format a byte count as a human-readable size
     */
    private String formatBytes(long bytes) {
        if (bytes <= 0) return "0 B";
        final String[] units = new String[]{"B", "KB", "MB", "GB", "TB"};
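        // Clamp the index so very large values cannot run past the units array
        int digitGroups = Math.min((int) (Math.log10(bytes) / Math.log10(1024)), units.length - 1);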
        return String.format("%.1f %s", bytes / Math.pow(1024, digitGroups), units[digitGroups]);
    }
}

@Data
@Builder
class DiagnosticReport {
    private LocalDateTime timestamp;
    private String version;
    private MemoryDiagnosticReport memoryReport;
    private String markdownContent;
    private String htmlContent;
    private String reportPath;
    private String status;
    private String error;
}
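
`prepareReportData` stores `top_processes` as a list of maps, and `renderTemplate` substitutes values with `String.valueOf`, so a `${top_processes}` placeholder would render as the list's `toString()` form. One option, sketched here under the assumption that the template expects Markdown, is to format the rows as a Markdown table first and put the resulting string into the data map. `MarkdownTableSketch` is a hypothetical name; the keys `pid`, `user`, `memory`, and `command` are the ones `prepareReportData` fills in.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper: renders process rows as a Markdown table. */
final class MarkdownTableSketch {

    static String processTable(List<? extends Map<String, ?>> processes) {
        StringBuilder sb = new StringBuilder();
        sb.append("| PID | User | Memory | Command |\n");
        sb.append("| --- | --- | --- | --- |\n");
        for (Map<String, ?> p : processes) {
            sb.append("| ").append(cell(p.get("pid")))
              .append(" | ").append(cell(p.get("user")))
              .append(" | ").append(cell(p.get("memory")))
              .append(" | ").append(cell(p.get("command")))
              .append(" |\n");
        }
        return sb.toString();
    }

    // Escape '|' so a value cannot break the table layout
    private static String cell(Object value) {
        return String.valueOf(value).replace("|", "\\|");
    }

    public static void main(String[] args) {
        System.out.print(processTable(List.of(
            Map.of("pid", 1234, "user", "app", "memory", "1.5 GB", "command", "java -jar app.jar"))));
    }
}
```

In `prepareReportData`, this would amount to `data.put("top_processes", MarkdownTableSketch.processTable(processesData))`. The same approach applies to `recommendations`, which is also a list.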

6. Integration Example: AIOps Diagnostics and Report Generation

java
/**
 * Integration of AIOps diagnostics and report generation
 */
public class AIOpsDiagnosticSystem {
    
    private static final Logger logger = LoggerFactory.getLogger(AIOpsDiagnosticSystem.class);
    
    private final SandboxClient sandboxClient;
    private final SkillFileSystem fileSystem;
    private final ToolRegistry toolRegistry;
    private final MemoryDiagnosticSkill memoryDiagnosticSkill;
    private final DiagnosticReportGenerationSkill reportGenerationSkill;
    
    public AIOpsDiagnosticSystem(String runtimeServerHost, int runtimeServerPort) throws Exception {
        logger.info("Initializing AIOps diagnostic system...");
        
        // Initialize the Sandbox client
        this.sandboxClient = new SandboxClient(runtimeServerHost, runtimeServerPort);
        
        // Initialize the file system
        this.fileSystem = new SkillFileSystem("/tmp/skill_workspace");
        fileSystem.initializeSkillDirs("aiops_diagnostics", "diagnostic_reporting");
        
        // Initialize the tool registry
        this.toolRegistry = new ToolRegistry();
        registerCustomTools();
        
        // Initialize the diagnostic Skill
        this.memoryDiagnosticSkill = new MemoryDiagnosticSkill(
            sandboxClient, fileSystem, toolRegistry
        );
        
        // Initialize the report generation Skill
        ReportTemplateEngine templateEngine = new ReportTemplateEngine(fileSystem);
        this.reportGenerationSkill = new DiagnosticReportGenerationSkill(
            templateEngine, memoryDiagnosticSkill, fileSystem
        );
        
        logger.info("✓ AIOps diagnostic system initialized");
    }
    
    /**
     * Register the custom tools
     */
    private void registerCustomTools() throws Exception {
        ToolContext context = new ToolContext();
        context.setWorkspaceRoot(fileSystem.getWorkspaceRoot());
        context.setTimeout(30000);
        
        // Register the log parsing tool
        toolRegistry.registerCustomTool("log_parser", new LogParserTool(context));
        
        // Register the metric aggregation tool
        toolRegistry.registerCustomTool("metric_aggregator", new MetricAggregatorTool(context));
        
        logger.info("✓ Custom tools registered");
    }
    
    /**
     * Run the full diagnostics and reporting flow
     */
    public DiagnosticReport runFullDiagnostics() throws Exception {
        logger.info("\n" + "=".repeat(60));
        logger.info("Starting AIOps diagnostics and report generation");
        logger.info("=".repeat(60));
        
        try {
            // Step 1: run the diagnosis
            logger.info("\n[Step 1] Running diagnostics...");
            MemoryDiagnosticReport memoryReport = memoryDiagnosticSkill.diagnose();
            logger.info("✓ Diagnostics complete, status: {}", memoryReport.getStatus());
            
            // Step 2: generate the report
            // Note: generateFullReport() calls diagnose() again internally,
            // so the diagnosis runs twice in this flow
            logger.info("\n[Step 2] Generating report...");
            DiagnosticReport report = reportGenerationSkill.generateFullReport();
            logger.info("✓ Report generated and saved to: {}", report.getReportPath());
            
            // Step 3: print the summary
            logger.info("\n[Step 3] Diagnostic summary");
            logger.info("Total memory: {}", formatBytes(memoryReport.getStatistics().getTotalMemory()));
            logger.info("Used memory: {}", formatBytes(memoryReport.getStatistics().getUsedMemory()));
            // SLF4J placeholders do not accept format specifiers such as {:.1f},
            // so the number is formatted before it is logged
            logger.info("Memory usage: {}%", String.format("%.1f",
                (double) memoryReport.getStatistics().getUsedMemory() / 
                memoryReport.getStatistics().getTotalMemory() * 100));
            
            logger.info("\nRecommendations:");
            for (String recommendation : memoryReport.getRecommendations()) {
                logger.info("  {}", recommendation);
            }
            
            logger.info("\n" + "=".repeat(60));
            logger.info("Diagnostics and report generation succeeded");
            logger.info("=".repeat(60));
            
            return report;
            
        } catch (Exception e) {
            logger.error("Diagnostics failed", e);
            throw e;
        }
    }
    
    private String formatBytes(long bytes) {
        if (bytes <= 0) return "0 B";
        final String[] units = new String[]{"B", "KB", "MB", "GB", "TB"};
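        // Clamp the index so very large values cannot run past the units array
        int digitGroups = Math.min((int) (Math.log10(bytes) / Math.log10(1024)), units.length - 1);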
        return String.format("%.1f %s", bytes / Math.pow(1024, digitGroups), units[digitGroups]);
    }
    
    public static void main(String[] args) throws Exception {
        // Connect to the Runtime Server
        AIOpsDiagnosticSystem system = new AIOpsDiagnosticSystem(
            "localhost",  // Runtime Server host
            50051         // Runtime Server port
        );
        
        // Run the diagnostics
        DiagnosticReport report = system.runFullDiagnostics();
        
        // The report can be processed further here...
    }
}