Alibaba's Open-Source AgentScope Multi-Agent Framework Series (17), Chapter 17: Integrating the Skill System with the Runtime Sandbox - AIOps in Practice

Overview

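This chapter explains how the Skill system integrates with the AgentScope Runtime Sandbox. A real-world AIOps scenario shows how to build Skills that execute scripts, access the file system, and invoke system tools.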

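Learning Objectives

  • Understand the architecture of the Runtime Sandbox
  • Master how Skills are associated with the file system
  • Learn to implement script-execution Skills
  • Master the integration of system tools (grep, cat, etc.)
  • Understand the extension mechanism for custom tools
  • Implement an AIOps scenario: fault diagnosis + report generation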

sandbox: github.com/agentscope-...


1.1 What - Overall Architecture

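Background

Capabilities a Skill needs:
├─ Execute shell scripts (fault diagnosis, log analysis)
├─ Invoke system tools (grep, awk, cat, etc.)
├─ Read the file system (logs, configs, data)
├─ Manage isolated environments (isolation, security, reproducibility)
└─ Communicate across applications (Agent Runtime as Server)

Problems with the traditional approach:
❌ Security risk: system commands run directly on the host
❌ Environment pollution: the main application's environment is affected
❌ Hard to manage: files end up scattered across the file system
❌ Hard to extend: integrating tools is tedious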

Solution: Runtime Sandbox

┌───────────────────────────────────────────────────────────────┐
│                    AgentScope Application                     │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                 Skill Management System                 │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │  │
│  │  │ Diagnosis    │  │ Report Gen   │  │ Custom Tools │   │  │
│  │  │ Skill        │  │ Skill        │  │ Skill        │   │  │
│  │  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │  │
│  │         └─────────────────┼─────────────────┘           │  │
│  │                           │                             │  │
│  │                 Runtime Client (gRPC)                   │  │
│  └───────────────────────────┼─────────────────────────────┘  │
│                              │                                │
│                    ┌─────────▼──────────┐                     │
│                    │  Runtime Server    │                     │
│                    │ (Sandbox Service)  │                     │
│                    └─────────┬──────────┘                     │
│                              │                                │
└──────────────────────────────┼────────────────────────────────┘
                               │
                  Network (may span servers)
                               │
        ┌──────────────────────▼──────────────────────┐
        │  Isolated Sandbox Environment               │
        │  ┌───────────────────────────────────────┐  │
        │  │ File System (/tmp/skill_workspace)    │  │
        │  │ ├─ /diagnostics/                      │  │
        │  │ │  ├─ scripts/                        │  │
        │  │ │  │  ├─ check_cpu.sh                 │  │
        │  │ │  │  ├─ check_memory.sh              │  │
        │  │ │  │  └─ analyze_logs.sh              │  │
        │  │ │  ├─ reports/                        │  │
        │  │ │  │  └─ diagnosis_report.md          │  │
        │  │ │  └─ tools/                          │  │
        │  │ └─ /tools/                            │  │
        │  │    ├─ grep, cat, awk (system tools)   │  │
        │  │    ├─ custom_analyzer (custom tool)   │  │
        │  │    └─ ...                             │  │
        │  └───────────────────────────────────────┘  │
        │                                             │
        │  Execution Engine                           │
        │  ├─ Script Executor                         │
        │  ├─ Tool Invoker                            │
        │  └─ Resource Monitor                        │
        └─────────────────────────────────────────────┘

1.2 Why - Why This Integration Is Needed

1. Security isolation

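Without a sandbox:
Agent runs a script → system commands execute directly → could delete critical files such as /var or /etc ❌

With a sandbox:
Agent runs a script → executes inside the sandbox → isolated file system ✅
                                                  → script errors do not affect the host system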
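In file-system terms, this isolation comes down to one rule: a path requested by a Skill must resolve inside its workspace. A minimal sketch of that check with `java.nio.file` follows. `WorkspaceGuard` is a hypothetical helper, not an AgentScope Runtime API, and it only catches lexical escapes (`../` segments and absolute paths outside the root); resolving symbolic links would additionally require `Path.toRealPath()` on files that exist.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not an AgentScope API): keeps paths requested by a Skill
 * inside the sandbox workspace. Only lexical escapes ("../", absolute paths
 * outside the root) are caught; symlinks would need Path.toRealPath().
 */
class WorkspaceGuard {

    private final Path root;

    WorkspaceGuard(String workspaceRoot) {
        // Normalize once so every comparison uses a clean absolute path
        this.root = Paths.get(workspaceRoot).toAbsolutePath().normalize();
    }

    /** Resolves a relative or absolute path and rejects anything outside the workspace. */
    Path resolve(String requested) {
        Path candidate = root.resolve(requested).normalize();
        // Path.startsWith compares whole name elements, so /tmp/skill_workspace2 does not match
        if (!candidate.startsWith(root)) {
            throw new SecurityException("Path escapes the workspace: " + requested);
        }
        return candidate;
    }

    boolean isAllowed(String requested) {
        try {
            resolve(requested);
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        WorkspaceGuard guard = new WorkspaceGuard("/tmp/skill_workspace");
        System.out.println(guard.isAllowed("skill_diagnosis/scripts/check_cpu.sh")); // true
        System.out.println(guard.isAllowed("../../etc/passwd"));                      // false
        System.out.println(guard.isAllowed("/var/log/syslog"));                       // false
    }
}
```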

2. File system organization

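File organization problems for Skills:
- Skill A's scripts live in /opt/scripts/a.sh
- Skill B's scripts live in /opt/scripts/b.sh
- Report templates live in /var/templates/report.md
→ Hard to manage, hard to version-control, prone to conflicts

Solution (Sandbox):
/tmp/skill_workspace/
├─ skill_diagnosis/
│  ├─ scripts/        (diagnostic scripts)
│  ├─ tools/          (diagnostic tools)
│  ├─ cache/          (diagnostic cache)
│  └─ reports/        (diagnostic reports)
└─ skill_reporting/
   ├─ templates/      (report templates)
   ├─ tools/          (reporting tools)
   └─ outputs/        (generated reports)
→ Clear, easy to version-control, container-friendly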
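Because the per-Skill layout follows a fixed pattern, it can be derived mechanically from the workspace root and the Skill name and then created idempotently, which is what makes it easy to recreate in a fresh container. The sketch below assumes the four subdirectories of the `skill_diagnosis/` example; `SkillLayout` is an illustrative name, not a Runtime Sandbox class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: builds and creates one Skill's directory layout under a workspace root. */
class SkillLayout {

    // Subdirectory names taken from the skill_diagnosis/ example tree
    static final List<String> SUBDIRS = List.of("scripts", "tools", "cache", "reports");

    /** Maps each subdirectory name to its path, e.g. "scripts" -> <root>/<skill>/scripts. */
    static Map<String, Path> layoutFor(Path workspaceRoot, String skillName) {
        Map<String, Path> layout = new LinkedHashMap<>();
        Path skillRoot = workspaceRoot.resolve(skillName);
        for (String dir : SUBDIRS) {
            layout.put(dir, skillRoot.resolve(dir));
        }
        return layout;
    }

    /** Creates the directories; Files.createDirectories does nothing for ones that already exist. */
    static Map<String, Path> create(Path workspaceRoot, String skillName) throws IOException {
        Map<String, Path> layout = layoutFor(workspaceRoot, skillName);
        for (Path dir : layout.values()) {
            Files.createDirectories(dir);
        }
        return layout;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("skill_workspace");
        Map<String, Path> layout = create(root, "skill_diagnosis");
        layout.forEach((name, path) -> System.out.println(name + " -> " + root.relativize(path)));
    }
}
```

Running `create` a second time is harmless, so the same call can be used both to bootstrap a new sandbox and to repair a partially populated one.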

3. Tool management

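Traditional approach:
- System tools (grep, cat, awk) are used directly on the host
- Custom tools must be installed into system directories
- Hard to update and roll back

Sandbox approach:
- The sandbox ships with the necessary system tools
- Custom tools are mounted dynamically or built into the sandbox
- Tool versions are bound to the Skill, which makes them easy to manage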
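Binding tool versions to a Skill implies a pre-flight check: before the Skill runs, every tool it declares as required must be present in the sandbox at an acceptable version, while a missing optional tool only produces a warning. The sketch below assumes a simple in-memory model in which `"*"` means any version; `ToolCheck` and `ToolRequirement` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model: checks a Skill's declared tools against what a sandbox provides. */
class ToolCheck {

    /** One declared dependency: tool name, expected version ("*" = any) and whether it is mandatory. */
    record ToolRequirement(String name, String version, boolean required) {}

    /** Returns the problems found, in declaration order; an empty list means the Skill can start. */
    static List<String> check(List<ToolRequirement> declared, Map<String, String> available) {
        List<String> problems = new ArrayList<>();
        for (ToolRequirement req : declared) {
            String installed = available.get(req.name());
            boolean versionOk = installed != null
                    && ("*".equals(req.version()) || req.version().equals(installed));
            if (!versionOk && req.required()) {
                problems.add("required tool unavailable: " + req.name() + " " + req.version());
            } else if (!versionOk) {
                problems.add("optional tool unavailable: " + req.name());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<ToolRequirement> declared = List.of(
                new ToolRequirement("grep", "*", true),
                new ToolRequirement("sed", "*", false),
                new ToolRequirement("log_parser", "1.0.0", true));
        // Installed tools and versions inside the sandbox (sample data)
        Map<String, String> available = Map.of("grep", "3.11", "log_parser", "0.9.0");
        check(declared, available).forEach(System.out::println);
        // optional tool unavailable: sed
        // required tool unavailable: log_parser 1.0.0
    }
}
```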

4. Agent Runtime as Server

Distributed deployment scenario:
┌──────────────┐         ┌──────────────┐
│  Agent App 1 │         │  Agent App 2 │
│   (Service A)│         │   (Service B)│
└───────┬──────┘         └───────┬──────┘
        │                        │
        └────────┬───────────────┘
                 │
         gRPC/HTTP (across the network)
                 │
        ┌────────▼────────┐
        │ Runtime Server  │
        │   (Shared)      │
        └─────────────────┘

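Benefits:
✅ Multiple applications share one Runtime Server
✅ Sandbox resources are managed centrally
✅ File systems are managed in one place
✅ Lower cost: one shared server instead of one runtime per application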

1.3 Runtime Sandbox Architecture in Detail

import java.io.File;
import java.util.List;
import java.util.Map;

/**
 * Core architecture of the Runtime Sandbox (outline: fields and signatures only)
 */

// 1. Sandbox configuration
public class SandboxConfig {
    private String sandboxId;           // Unique sandbox ID
    private String workspaceRoot;       // Working directory, e.g. /tmp/skill_workspace
    private long maxMemory;             // Memory limit, e.g. 512 MB
    private long maxCpuTime;            // CPU time limit, e.g. 30 s
    private List<String> allowedTools;  // Tools the sandbox is allowed to run
    private Map<String, String> env;    // Environment variables
}

// 2. File system isolation
public class IsolatedFileSystem {
    private File workspaceRoot;
    private Map<String, SkillFileSystemLayout> skillLayouts;

    // Each Skill gets its own directory structure
    public static class SkillFileSystemLayout {
        public File scriptsDir;      // /workspace/skill_name/scripts/
        public File toolsDir;        // /workspace/skill_name/tools/
        public File cacheDir;        // /workspace/skill_name/cache/
        public File outputDir;       // /workspace/skill_name/output/
        public File configDir;       // /workspace/skill_name/config/
    }
}

// 3. Script executor
public class ScriptExecutor {
    public ExecutionResult execute(
        String sandboxId,
        String scriptPath,
        Map<String, String> env,
        long timeout
    ) {
        // Runs the script inside the sandbox and returns
        // exitCode, stdout, stderr and executionTime
        throw new UnsupportedOperationException("outline only");
    }
}

// 4. Tool registry
public class ToolRegistry {
    private Map<String, ToolDefinition> tools;

    // System tools: grep, cat, awk, sed, tail, ...
    // Custom tools: custom_analyzer, log_parser, ...

    public ToolDefinition registerTool(String name, String path, String version) {
        // Registers the tool with the sandbox
        throw new UnsupportedOperationException("outline only");
    }

    public ToolDefinition getTool(String name) {
        // Used by the Skills in section 3; returns null for unknown tools
        return tools.get(name);
    }
}

// 5. Resource monitoring
public class ResourceMonitor {
    public void monitorExecution(
        String sandboxId,
        long timeout,
        long maxMemory
    ) {
        // Watches CPU, memory, disk and other resources,
        // and kills the process automatically when a limit is exceeded
    }
}
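The executor and monitor contracts above (exit code, stdout, stderr, execution time, and killing the process once a limit is exceeded) can be illustrated with the JDK's `ProcessBuilder`. This is a local sketch of the timeout part only, under clear assumptions: it runs on the host with no file-system isolation and no memory limit, and `LocalScriptExecutor` is an illustrative name rather than the Runtime Sandbox implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative local executor: no sandboxing, only timeout enforcement and output capture. */
class LocalScriptExecutor {

    record Result(int exitCode, String stdout, String stderr, long executionTimeMs, boolean timedOut) {}

    static Result execute(List<String> command, long timeoutMs) throws IOException, InterruptedException {
        long start = System.nanoTime();
        Process process = new ProcessBuilder(command).start();

        // Drain both pipes in the background so a chatty process cannot block on a full pipe buffer
        CompletableFuture<String> out = CompletableFuture.supplyAsync(() -> readAll(process.getInputStream()));
        CompletableFuture<String> err = CompletableFuture.supplyAsync(() -> readAll(process.getErrorStream()));

        boolean finished = process.waitFor(timeoutMs, TimeUnit.MILLISECONDS);
        if (!finished) {
            // Same policy as ResourceMonitor: kill the process once the limit is exceeded
            process.destroyForcibly();
            process.waitFor();
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Result(process.exitValue(), out.join(), err.join(), elapsedMs, !finished);
    }

    private static String readAll(InputStream in) {
        try (in) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return "";
        }
    }

    public static void main(String[] args) throws Exception {
        Result ok = execute(List.of("sh", "-c", "echo hello; echo oops >&2; exit 3"), 5_000);
        System.out.println(ok.exitCode() + " " + ok.stdout().trim() + " " + ok.stderr().trim()); // 3 hello oops

        Result slow = execute(List.of("sleep", "10"), 200);
        System.out.println("timed out: " + slow.timedOut()); // timed out: true
    }
}
```

Reading stdout and stderr concurrently is the key design choice: a process that writes more than the OS pipe buffer holds would otherwise stall and only ever end through the timeout.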

2. File System Organization for Skills

2.1 Standard Skill File Structure

/tmp/skill_workspace/
│
├─ skill_aiops_diagnostics/            # AIOps diagnosis Skill
│  ├─ skill_manifest.yaml              # Skill metadata
│  ├─ scripts/                         # Diagnosis scripts
│  │  ├─ cpu_diagnostics.sh            # CPU diagnosis
│  │  ├─ memory_diagnostics.sh         # Memory diagnosis
│  │  ├─ disk_diagnostics.sh           # Disk diagnosis
│  │  ├─ network_diagnostics.sh        # Network diagnosis
│  │  ├─ log_analysis.sh               # Log analysis
│  │  └─ correlation_analysis.py       # Fault correlation analysis
│  │
│  ├─ tools/                           # Tool set
│  │  ├─ system/                       # System tools
│  │  │  ├─ grep                       # Text search
│  │  │  ├─ awk                        # Text processing
│  │  │  ├─ sed                        # Stream editor
│  │  │  └─ tail                       # Show the end of a file
│  │  │
│  │  └─ custom/                       # Custom tools
│  │     ├─ log_parser                 # Log parser
│  │     ├─ metric_aggregator          # Metric aggregator
│  │     └─ anomaly_detector           # Anomaly detector
│  │
│  ├─ cache/                           # Cache directory
│  │  ├─ metrics_cache/                # Metric cache
│  │  └─ analysis_cache/               # Analysis cache
│  │
│  ├─ reports/                         # Report output
│  │  ├─ diagnosis_reports/            # Diagnosis reports
│  │  └─ analysis_results/             # Analysis results
│  │
│  └─ config/                          # Configuration files
│     ├─ diagnostic_rules.yaml         # Diagnosis rules
│     ├─ thresholds.yaml               # Alert thresholds
│     └─ tool_config.yaml              # Tool configuration
│
├─ skill_report_generation/            # Report generation Skill
│  ├─ skill_manifest.yaml
│  ├─ templates/                       # Report templates
│  │  ├─ executive_summary.md          # Executive summary template
│  │  ├─ detailed_analysis.md          # Detailed analysis template
│  │  ├─ recommendations.md            # Recommendations template
│  │  └─ styles/
│  │     └─ markdown.css               # Styles
│  │
│  ├─ generators/                      # Report generators
│  │  ├─ pdf_generator.py              # PDF generation
│  │  ├─ html_generator.py             # HTML generation
│  │  └─ markdown_generator.py         # Markdown generation
│  │
│  ├─ outputs/                         # Generated reports
│  │  ├─ diagnosis_report_2025_01_04.pdf
│  │  ├─ diagnosis_report_2025_01_04.html
│  │  └─ diagnosis_report_2025_01_04.md
│  │
│  └─ config/
│     ├─ template_config.yaml          # Template configuration
│     └─ style_config.yaml             # Style configuration
│
└─ shared_tools/                       # Shared tools
   ├─ log_processor                    # Log processing tool
   ├─ data_analyzer                    # Data analysis tool
   └─ report_formatter                 # Report formatting tool

2.2 Skill Manifest Definition

# skill_manifest.yaml - Skill metadata

apiVersion: v1
kind: SkillManifest
metadata:
  name: aiops_diagnostics
  version: "1.0.0"
  description: "AIOps fault diagnosis Skill"
  author: "Platform Team"
  lastModified: "2025-01-04"

spec:
  # Basic Skill information
  skillType: "diagnostic"  # diagnostic / reporting / analysis
  
  # File system requirements
  filesystem:
    workspaceSize: "1Gi"
    persistentDirs:
      - cache/
      - config/
    tempDirs:
      - reports/
  
  # Script definitions
  scripts:
    - name: "cpu_diagnostics"
      path: "scripts/cpu_diagnostics.sh"
      language: "bash"
      timeout: "30s"
      inputs:
        - name: "duration"
          type: "integer"
          description: "Diagnosis duration (seconds)"
      outputs:
        - name: "cpu_report"
          type: "json"
          description: "CPU diagnosis result"
    
    - name: "log_analysis"
      path: "scripts/log_analysis.sh"
      language: "bash"
      timeout: "60s"
      inputs:
        - name: "log_file"
          type: "string"
          description: "Path to the log file"
        - name: "keywords"
          type: "array"
          description: "List of keywords"
      outputs:
        - name: "analysis_result"
          type: "json"
          description: "Analysis result"
    
    - name: "correlation_analysis"
      path: "scripts/correlation_analysis.py"
      language: "python"
      timeout: "120s"
      inputs:
        - name: "metrics_data"
          type: "json"
          description: "Metric data"
      outputs:
        - name: "correlation_report"
          type: "json"
          description: "Correlation analysis report"
  
  # Tool dependencies
  tools:
    system:
      - name: "grep"
        version: "*"
        required: true
      - name: "awk"
        version: "*"
        required: true
      - name: "sed"
        version: "*"
        required: false
      - name: "tail"
        version: "*"
        required: true
    
    custom:
      - name: "log_parser"
        path: "tools/custom/log_parser"
        version: "1.0.0"
        required: true
      - name: "metric_aggregator"
        path: "tools/custom/metric_aggregator"
        version: "1.0.0"
        required: true
  
  # Environment variables
  environment:
    JAVA_HOME: "/usr/lib/jvm/java-17"
    LOG_LEVEL: "INFO"
    CACHE_DIR: "${WORKSPACE}/cache"
    REPORT_DIR: "${WORKSPACE}/reports"
  
  # Resource limits
  resources:
    cpu: "2"            # At most 2 cores
    memory: "512Mi"     # At most 512 MiB
    timeout: "300s"     # At most 5 minutes
    diskSpace: "1Gi"    # At most 1 GiB of disk
  
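  # Security permissions
  security:
    readPaths:
      - "/var/log/"
      - "/proc/stat"
      - "/proc/meminfo"
      - "/proc/cpuinfo"    # read by cpu_diagnostics.sh
      - "/proc/loadavg"    # read by cpu_diagnostics.sh
    writePaths:
      - "${WORKSPACE}/cache/"
      - "${WORKSPACE}/reports/"
    networkAccess: false  # No network access allowed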
    executablePaths:
      - "scripts/"
      - "tools/"
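The manifest expresses limits as strings such as `30s`, `512Mi` and `1Gi`, and paths through `${WORKSPACE}`, whereas the Java code in section 3 passes timeouts as plain milliseconds (for example `60000`). A manifest loader therefore has to normalize these values. The sketch below is a hypothetical helper (`ManifestValues`), assuming IEC binary size suffixes (`Mi` = 2^20 bytes) and only the `ms`/`s`/`m` duration suffixes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that normalizes manifest strings into values an executor can use. */
class ManifestValues {

    /** "500ms" -> 500, "30s" -> 30000, "5m" -> 300000 (milliseconds). */
    static long parseDurationMillis(String value) {
        String v = value.trim();
        // Check "ms" before "s", since "500ms" also ends with "s"
        if (v.endsWith("ms")) return Long.parseLong(v.substring(0, v.length() - 2));
        if (v.endsWith("s"))  return Long.parseLong(v.substring(0, v.length() - 1)) * 1_000L;
        if (v.endsWith("m"))  return Long.parseLong(v.substring(0, v.length() - 1)) * 60_000L;
        throw new IllegalArgumentException("Unsupported duration: " + value);
    }

    /** "512Mi" -> 536870912, "1Gi" -> 1073741824 (bytes); a bare number is taken as bytes. */
    static long parseBytes(String value) {
        String v = value.trim();
        Map<String, Long> units = Map.of("Ki", 1L << 10, "Mi", 1L << 20, "Gi", 1L << 30);
        for (Map.Entry<String, Long> unit : units.entrySet()) {
            if (v.endsWith(unit.getKey())) {
                return Long.parseLong(v.substring(0, v.length() - 2)) * unit.getValue();
            }
        }
        return Long.parseLong(v);
    }

    /** Replaces ${WORKSPACE} in every environment value with the actual workspace path. */
    static Map<String, String> expandWorkspace(Map<String, String> env, String workspace) {
        Map<String, String> expanded = new LinkedHashMap<>();
        env.forEach((key, val) -> expanded.put(key, val.replace("${WORKSPACE}", workspace)));
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(parseDurationMillis("120s")); // 120000
        System.out.println(parseBytes("512Mi"));         // 536870912

        Map<String, String> env = new LinkedHashMap<>();
        env.put("CACHE_DIR", "${WORKSPACE}/cache");
        env.put("LOG_LEVEL", "INFO");
        System.out.println(expandWorkspace(env, "/tmp/skill_workspace/skill_aiops_diagnostics"));
    }
}
```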

3. Implementing the AIOps Fault Diagnosis Skill

3.1 Skill Base Framework

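import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import lombok.Builder;
import lombok.Data;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Base class for AIOps fault diagnosis Skills
 */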
public abstract class AIOpsDiagnosticSkill extends AgentSkill {
    
    protected static final Logger logger = LoggerFactory.getLogger(AIOpsDiagnosticSkill.class);
    
    // Runtime Sandbox client
    protected final SandboxClient sandboxClient;
    
    // Skill file system
    protected final SkillFileSystem fileSystem;
    
    // Tool registry
    protected final ToolRegistry toolRegistry;
    
    // Diagnosis rules
    protected final DiagnosticRuleEngine ruleEngine;
    
    public AIOpsDiagnosticSkill(
            String skillName,
            SandboxClient sandboxClient,
            SkillFileSystem fileSystem,
            ToolRegistry toolRegistry) {
        super(skillName);
        this.sandboxClient = sandboxClient;
        this.fileSystem = fileSystem;
        this.toolRegistry = toolRegistry;
        this.ruleEngine = new DiagnosticRuleEngine(this.fileSystem);
    }
    
    /**
     * Runs a diagnosis script inside the sandbox
     */
    protected ExecutionResult executeScript(
            String scriptName,
            Map<String, String> inputs,
            long timeout) throws Exception {
        
        logger.info("Running diagnosis script: {}", scriptName);
        
        // Resolve the script path
        String scriptPath = fileSystem.getScriptPath(scriptName);
        
        // Build the script command
        String command = buildCommand(scriptPath, inputs);
        
        // Execute inside the sandbox
        ExecutionResult result = sandboxClient.executeScript(
            getSandboxId(),
            command,
            timeout,
            getEnvironmentVariables()
        );
        
        logger.info("Script finished: exitCode={}, duration={}ms",
            result.getExitCode(), result.getExecutionTime());
        
        return result;
    }
    
    /**
     * Invokes a system tool
     */
    protected String invokeTool(
            String toolName,
            List<String> args) throws Exception {
        
        logger.debug("Invoking tool: {} with args: {}", toolName, args);
        
        ToolDefinition tool = toolRegistry.getTool(toolName);
        if (tool == null) {
            throw new ToolNotFoundException(toolName);
        }
        
        // Build the full command
        List<String> command = new ArrayList<>();
        command.add(tool.getPath());
        command.addAll(args);
        
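        // Execute the command. Joining with spaces assumes that no argument
        // contains whitespace or shell metacharacters.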
        ExecutionResult result = sandboxClient.execute(
            getSandboxId(),
            String.join(" ", command),
            30000  // 30-second timeout
        );
        
        if (result.getExitCode() != 0) {
            logger.error("Tool execution failed: {}", result.getStderr());
            throw new ToolExecutionException(toolName, result.getStderr());
        }
        
        return result.getStdout();
    }
    
    /**
     * Reads a file
     */
    protected String readFile(String filePath) throws Exception {
        String fullPath = fileSystem.resolvePath(filePath);
        
        // Read the file with the cat tool
        return invokeTool("cat", List.of(fullPath));
    }
    
    /**
     * Analyzes a log file
     */
    protected LogAnalysisResult analyzeLog(
            String logFilePath,
            List<String> keywords) throws Exception {
        
        logger.info("Analyzing log file: {}", logFilePath);
        
        // Run the log analysis script
        Map<String, String> inputs = Map.of(
            "log_file", logFilePath,
            "keywords", String.join(",", keywords)
        );
        
        ExecutionResult result = executeScript("log_analysis", inputs, 60000);
        
        // Parse the result
        return parseAnalysisResult(result.getStdout());
    }
    
    protected abstract String buildCommand(String scriptPath, Map<String, String> inputs);
    protected abstract Map<String, String> getEnvironmentVariables();
    protected abstract String getSandboxId();
    protected abstract LogAnalysisResult parseAnalysisResult(String output);
}

/**
 * Execution result
 */
@Data
@Builder
class ExecutionResult {
    private String sandboxId;
    private int exitCode;
    private String stdout;
    private String stderr;
    private long executionTime;
    private Map<String, String> metrics;  // CPU, memory, etc.
}

/**
 * Log analysis result
 */
@Data
@Builder
class LogAnalysisResult {
    private List<LogEntry> errorLines;
    private List<LogEntry> warningLines;
    private Map<String, Integer> keywordCounts;
    private String summary;
    private LocalDateTime analysisTime;
}

@Data
@Builder
class LogEntry {
    private int lineNumber;
    private String timestamp;
    private String level;
    private String message;
    private String context;
}
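`invokeTool` joins the tool path and its arguments with spaces and passes a single string to `sandboxClient.execute`. If the sandbox hands that string to a shell (an assumption; the chapter does not say how the command string is parsed), an argument containing a space or a `;` would be split apart or even run as a second command. A common remedy is POSIX single-quoting of every argument, sketched here as a hypothetical `ShellQuoting` helper.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a POSIX-shell-safe command line from an argument list. */
class ShellQuoting {

    /** Wraps an argument in single quotes; each embedded ' becomes '\'' (close, escaped quote, reopen). */
    static String quote(String arg) {
        return "'" + arg.replace("'", "'\\''") + "'";
    }

    /** Quotes every element so each one reaches the program as exactly one argument. */
    static String join(List<String> command) {
        return command.stream().map(ShellQuoting::quote).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<String> command = List.of("/usr/bin/grep", "-c", "ERROR", "/var/log/my app.log; id");
        System.out.println(join(command));
        // '/usr/bin/grep' '-c' 'ERROR' '/var/log/my app.log; id'
    }
}
```

With quoting in place, `; id` stays part of the file name instead of becoming a second command. If the sandbox API accepts an argument array, passing the list unjoined avoids the issue altogether.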

3.2 CPU Diagnosis Script

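#!/bin/bash
# scripts/cpu_diagnostics.sh - CPU diagnosis script
# $1: sampling window in seconds (keep it below the 30s timeout that
#     skill_manifest.yaml declares for this script)

set -e

DURATION=${1:-5}
REPORT_DIR="${REPORT_DIR:-/workspace/reports}"
mkdir -p "$REPORT_DIR"
OUTPUT_FILE="$REPORT_DIR/cpu_diagnostics_$(date +%s).json"

# Progress messages go to stderr so that stdout carries only the JSON report
echo "Starting CPU diagnosis (${DURATION}s sampling window)..." >&2

# Prints "<name> <total> <idle>" for one /proc/stat cpu line; idle includes iowait.
# Counters missing on older kernels are empty and evaluate to 0 in $(( )).
cpu_times() {
    local name user nice system idle iowait irq softirq steal rest
    read -r name user nice system idle iowait irq softirq steal rest <<< "$1"
    echo "$name $((user + nice + system + idle + iowait + irq + softirq + steal)) $((idle + iowait))"
}

# /proc/stat counters are cumulative since boot, so take two snapshots and
# compute usage from the difference over the sampling window
declare -A TOTAL_BEFORE IDLE_BEFORE
while IFS= read -r line; do
    read -r name total idle <<< "$(cpu_times "$line")"
    TOTAL_BEFORE[$name]=$total
    IDLE_BEFORE[$name]=$idle
done < <(grep '^cpu[0-9]' /proc/stat)

sleep "$DURATION"
STAT_AFTER=$(grep '^cpu[0-9]' /proc/stat)

# Create the JSON report
{
    echo "{"
    echo "  \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\","
    echo "  \"diagnostics\": {"

    # 1. Number of CPU cores
    CPU_CORES=$(grep -c "^processor" /proc/cpuinfo)
    echo "    \"cpu_cores\": $CPU_CORES,"

    # 2. CPU model
    CPU_MODEL=$(grep "^model name" /proc/cpuinfo | head -1 | cut -d: -f2 | xargs)
    echo "    \"cpu_model\": \"$CPU_MODEL\","

    # 3. Current CPU usage
    echo "    \"current_usage\": {"

    # Load averages from /proc/loadavg (it has 5 fields; the last 2 are discarded)
    read -r LA1 LA5 LA15 _ < /proc/loadavg
    echo "      \"load_average\": {"
    echo "        \"1m\": $LA1,"
    echo "        \"5m\": $LA5,"
    echo "        \"15m\": $LA15"
    echo "      },"

    # Per-core usage over the sampling window
    echo "      \"per_cpu\": ["

    FIRST=true
    while IFS= read -r line; do
        read -r name total idle <<< "$(cpu_times "$line")"
        DELTA_TOTAL=$((total - ${TOTAL_BEFORE[$name]:-0}))
        DELTA_IDLE=$((idle - ${IDLE_BEFORE[$name]:-0}))
        USAGE=0
        if [ "$DELTA_TOTAL" -gt 0 ]; then
            USAGE=$((100 * (DELTA_TOTAL - DELTA_IDLE) / DELTA_TOTAL))
        fi

        if [ "$FIRST" = false ]; then echo ","; fi
        FIRST=false

        echo "        {"
        echo "          \"cpu\": \"$name\","
        echo "          \"usage_percent\": $USAGE"
        echo "        }" | tr -d '\n'
    done <<< "$STAT_AFTER"

    echo ""
    echo "      ]"
    echo "    },"

    # 4. Processes with the highest CPU usage
    echo "    \"top_processes\": ["

    FIRST=true
    ps aux --sort=-%cpu | tail -n +2 | head -5 | while read -r line; do
        if [ "$FIRST" = false ]; then echo ","; fi
        FIRST=false

        IFS=' ' read -r P_USER PID PCPU PMEM VSZ RSS TTY STAT START TIME CMD <<< "$line"
        # Keep only the executable name and drop characters that would break the JSON string
        CMD_NAME=$(echo "$CMD" | cut -d' ' -f1 | tr -d '"\\')

        echo "        {"
        echo "          \"pid\": $PID,"
        echo "          \"user\": \"$P_USER\","
        echo "          \"cpu_percent\": $PCPU,"
        echo "          \"memory_percent\": $PMEM,"
        echo "          \"command\": \"$CMD_NAME\""
        echo "        }" | tr -d '\n'
    done

    echo ""
    echo "    ]"
    echo "  },"

    # 5. Recommendations (awk does the floating-point comparison, so bc is not needed)
    echo "  \"recommendations\": ["

    if awk -v la="$LA1" -v cores="$CPU_CORES" 'BEGIN { exit !(la > cores) }'; then
        echo "    \"Warning: the 1-minute load average exceeds the number of CPU cores; the system may be overloaded\","
    fi

    if awk -v la="$LA5" -v cores="$CPU_CORES" 'BEGIN { exit !(la > cores * 0.8) }'; then
        echo "    \"Note: the 5-minute load average is high; check background jobs\","
    fi

    echo "    \"Monitor CPU usage regularly\""
    echo "  ]"
    echo "}"

} > "$OUTPUT_FILE"

# Report the output path on stderr, then print the JSON report on stdout
echo "Diagnosis complete, results saved to: $OUTPUT_FILE" >&2
cat "$OUTPUT_FILE"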
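The counters in `/proc/stat` are cumulative since boot, so a usage figure for a time window has to come from the difference between two snapshots: usage = (Δtotal - Δidle) / Δtotal. The Java sketch below does that arithmetic on two captured `cpuN` lines. `CpuUsage` is an illustrative name, and summing the first eight counters while counting `iowait` as idle time is a convention chosen for this sketch.

```java
/** Illustrative: CPU usage of one core between two /proc/stat snapshots. */
class CpuUsage {

    /** Returns {total, idle} for a line such as "cpu0 4705 356 584 3699 23 0 14 0 0 0". */
    static long[] totals(String statLine) {
        String[] f = statLine.trim().split("\\s+");
        long total = 0;
        // Fields 1..8: user nice system idle iowait irq softirq steal (guest time is already part of user)
        for (int i = 1; i <= 8 && i < f.length; i++) {
            total += Long.parseLong(f[i]);
        }
        long idle = Long.parseLong(f[4]) + (f.length > 5 ? Long.parseLong(f[5]) : 0); // idle + iowait
        return new long[] {total, idle};
    }

    /** Percentage of non-idle time between the two snapshots. */
    static double usagePercent(String before, String after) {
        long[] b = totals(before);
        long[] a = totals(after);
        long deltaTotal = a[0] - b[0];
        long deltaIdle = a[1] - b[1];
        return deltaTotal == 0 ? 0.0 : 100.0 * (deltaTotal - deltaIdle) / deltaTotal;
    }

    public static void main(String[] args) {
        // Sample snapshots: 1000 jiffies elapsed, 750 of them idle
        String before = "cpu0 1000 0 500 8000 100 0 0 0 0 0";
        String after  = "cpu0 1200 0 550 8700 150 0 0 0 0 0";
        System.out.println(usagePercent(before, after)); // 25.0
    }
}
```

Computed from a single snapshot instead, the same formula would report the average load since boot, which hides a spike that started a few seconds ago.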

3.3 Implementing the Memory Diagnosis Skill

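import java.io.File;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import lombok.Builder;
import lombok.Data;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Memory diagnosis Skill
 */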
public class MemoryDiagnosticSkill extends AIOpsDiagnosticSkill {
    
    private static final Logger logger = LoggerFactory.getLogger(MemoryDiagnosticSkill.class);
    
    private final MetricAggregator metricAggregator;
    
    public MemoryDiagnosticSkill(
            SandboxClient sandboxClient,
            SkillFileSystem fileSystem,
            ToolRegistry toolRegistry) {
        super("memory_diagnostic", sandboxClient, fileSystem, toolRegistry);
        this.metricAggregator = new MetricAggregator();
    }
    
    /**
     * Runs the memory diagnosis
     */
    public MemoryDiagnosticReport diagnose() throws Exception {
        logger.info("Starting memory diagnosis...");
        
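        // Lombok names the generated builder class MemoryDiagnosticReport.MemoryDiagnosticReportBuilder
        MemoryDiagnosticReport.MemoryDiagnosticReportBuilder reportBuilder = MemoryDiagnosticReport.builder()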
            .timestamp(LocalDateTime.now());
        
        try {
            // Step 1: Collect memory statistics
            MemoryStatistics stats = getMemoryStatistics();
            reportBuilder.statistics(stats);
            logger.info("Memory statistics: total={}MB, used={}MB, available={}MB",
                stats.getTotalMemory() / 1024 / 1024,
                stats.getUsedMemory() / 1024 / 1024,
                stats.getAvailableMemory() / 1024 / 1024);
            
            // Step 2: Analyze the memory usage trend
            List<MemoryMetric> trends = analyzeMemoryTrends();
            reportBuilder.trends(trends);
            
            // Step 3: Look for memory leaks
            MemoryLeakDetection leakDetection = detectMemoryLeaks();
            reportBuilder.leakDetection(leakDetection);
            
            // Step 4: Find the processes using the most memory
            List<ProcessMemoryInfo> topProcesses = getTopMemoryConsumers(5);
            reportBuilder.topProcesses(topProcesses);
            
            // Step 5: Generate recommendations
            List<String> recommendations = generateRecommendations(stats, leakDetection);
            reportBuilder.recommendations(recommendations);
            
            reportBuilder.status("SUCCESS");
            
        } catch (Exception e) {
            logger.error("Memory diagnosis failed", e);
            reportBuilder.status("FAILED").error(e.getMessage());
        }
        
        return reportBuilder.build();
    }
    
    /**
     * Collects memory statistics
     */
    private MemoryStatistics getMemoryStatistics() throws Exception {
        // Read the memory information through the cat tool
        String meminfo = readFile("/proc/meminfo");
        
        // Parse /proc/meminfo
        Map<String, Long> memStats = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            if (line.trim().isEmpty()) continue;
            
            // Format: MemTotal:        16334732 kB
            String[] parts = line.split(":", 2);
            if (parts.length == 2) {
                String key = parts[0].trim();
                String value = parts[1].trim().split("\\s+")[0];
                memStats.put(key, Long.parseLong(value) * 1024); // kB to bytes (only kB-valued keys are used below)
            }
        }
        
        return MemoryStatistics.builder()
            .totalMemory(memStats.getOrDefault("MemTotal", 0L))
            .usedMemory(memStats.getOrDefault("MemTotal", 0L) - 
                       memStats.getOrDefault("MemAvailable", 0L))
            .availableMemory(memStats.getOrDefault("MemAvailable", 0L))
            .buffers(memStats.getOrDefault("Buffers", 0L))
            .cached(memStats.getOrDefault("Cached", 0L))
            .swapTotal(memStats.getOrDefault("SwapTotal", 0L))
            .swapUsed(memStats.getOrDefault("SwapTotal", 0L) - 
                     memStats.getOrDefault("SwapFree", 0L))
            .timestamp(LocalDateTime.now())
            .build();
    }
    
    /**
     * Analyzes the memory usage trend
     */
    private List<MemoryMetric> analyzeMemoryTrends() throws Exception {
        logger.info("Analyzing memory usage trend...");
        
        // Load historical data from the cache directory
        File cacheDir = fileSystem.getCacheDir();
        List<MemoryMetric> trends = new ArrayList<>();
        
        // Read historical metrics (JSON files)
        File[] cacheFiles = cacheDir.listFiles((dir, name) -> name.startsWith("memory_") && name.endsWith(".json"));
        
        if (cacheFiles != null) {
            Arrays.sort(cacheFiles, Comparator.comparingLong(File::lastModified));
            
            for (File cacheFile : cacheFiles) {
                MemoryMetric metric = parseMemoryMetric(readFile(cacheFile.getAbsolutePath()));
                trends.add(metric);
            }
        }
        
        // Save the current metric to the cache
        MemoryMetric currentMetric = MemoryMetric.builder()
            .timestamp(LocalDateTime.now())
            .memoryUsagePercent(getCurrentMemoryUsagePercent())
            .build();
        
        String cacheFile = cacheDir.getAbsolutePath() + "/memory_" + System.currentTimeMillis() + ".json";
        saveMetricToCache(cacheFile, currentMetric);
        
        trends.add(currentMetric);
        return trends;
    }
    
    /**
     * Detects memory leaks
     */
    private MemoryLeakDetection detectMemoryLeaks() throws Exception {
        logger.info("Detecting memory leaks...");
        
        // Run the custom memory leak detector (it must be registered in the ToolRegistry)
        String result = invokeTool("memory_leak_detector", List.of(
            "--duration", "60",
            "--threshold", "80"
        ));
        
        // Parse the result
        return parseLeakDetectionResult(result);
    }
    
    /**
     * Returns the processes with the highest memory usage
     */
    private List<ProcessMemoryInfo> getTopMemoryConsumers(int topN) throws Exception {
        // Run ps sorted by resident set size (RSS), largest first
        String psOutput = invokeTool("ps", Arrays.asList(
            "aux", "--sort=-rss"
        ));
        
        List<ProcessMemoryInfo> processes = new ArrayList<>();
        
        String[] lines = psOutput.split("\n");
        for (int i = 1; i < Math.min(lines.length, topN + 1); i++) {
            // Limit 11 keeps the full command line (with its arguments) in fields[10]
            String[] fields = lines[i].trim().split("\\s+", 11);
            
            ProcessMemoryInfo info = ProcessMemoryInfo.builder()
                .pid(Integer.parseInt(fields[1]))
                .user(fields[0])
                .memoryPercent(Double.parseDouble(fields[3]))
                .memoryRss(Long.parseLong(fields[5]) * 1024) // ps reports RSS in KiB; convert to bytes
                .command(fields[10])
                .build();
            
            processes.add(info);
        }
        
        return processes;
    }
    
    /**
     * Generates diagnosis recommendations
     */
    private List<String> generateRecommendations(
            MemoryStatistics stats,
            MemoryLeakDetection leakDetection) {
        
        List<String> recommendations = new ArrayList<>();
        
        double usagePercent = (double) stats.getUsedMemory() / stats.getTotalMemory() * 100;
        
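        if (usagePercent > 90) {
            recommendations.add("⚠️  Critical: memory usage is above 90%; the system is at risk of OOM");
            recommendations.add("   Recommendation: free unnecessary memory immediately and consider adding physical memory");
        } else if (usagePercent > 80) {
            recommendations.add("⚠️  Warning: memory usage is above 80%; performance may degrade");
            recommendations.add("   Recommendation: monitor memory usage and identify memory-heavy applications");
        } else if (usagePercent > 70) {
            recommendations.add("📌 Note: memory usage is above 70%; consider cleaning up regularly");
        }
        
        if (leakDetection.isDetected()) {
            recommendations.add("⚠️  Possible memory leak detected: " + leakDetection.getDescription());
            recommendations.add("   Recommendation: analyze the affected application's logs and heap dumps");
        }
        
        if (stats.getSwapUsed() > 0) {
            double swapPercent = (double) stats.getSwapUsed() / stats.getSwapTotal() * 100;
            if (swapPercent > 50) {
                recommendations.add("⚠️  Warning: swap usage is high (" + String.format("%.1f", swapPercent) + "%), severely degrading performance");
                recommendations.add("   Recommendation: check for memory leaks or unreasonable memory consumption");
            }
        }
        
        recommendations.add("💡 Tip: run the memory diagnosis regularly to catch problems early");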
        
        return recommendations;
    }
    
    @Override
    protected String buildCommand(String scriptPath, Map<String, String> inputs) {
        return scriptPath;  // Run the script directly; this Skill passes no inputs
    }
    
    @Override
    protected Map<String, String> getEnvironmentVariables() {
        return Map.of(
            "JAVA_HOME", "/usr/lib/jvm/java-17",
            "LOG_LEVEL", "INFO"
        );
    }
    
    @Override
    protected String getSandboxId() {
        return "aiops_sandbox_memory";
    }
    
    @Override
    protected LogAnalysisResult parseAnalysisResult(String output) {
        // Log analysis output is not parsed by this Skill (placeholder)
        return null;
    }
    
    // Helper methods...
    private double getCurrentMemoryUsagePercent() throws Exception {
        MemoryStatistics stats = getMemoryStatistics();
        return (double) stats.getUsedMemory() / stats.getTotalMemory() * 100;
    }
    
    private MemoryMetric parseMemoryMetric(String json) {
        // Parse a memory metric stored as JSON (placeholder)
        return MemoryMetric.builder().build();
    }
    
    private void saveMetricToCache(String path, MemoryMetric metric) throws Exception {
        // Save the metric to the cache (placeholder)
    }
    
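    // Placeholder heuristic: any output containing "leak" (even "no leak found") counts as a detection
    private MemoryLeakDetection parseLeakDetectionResult(String result) {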
        return MemoryLeakDetection.builder()
            .detected(result.contains("leak"))
            .description(result)
            .build();
    }
}

// Data models
@Data
@Builder
class MemoryDiagnosticReport {
    private LocalDateTime timestamp;
    private MemoryStatistics statistics;
    private List<MemoryMetric> trends;
    private MemoryLeakDetection leakDetection;
    private List<ProcessMemoryInfo> topProcesses;
    private List<String> recommendations;
    private String status;
    private String error;
}

@Data
@Builder
class MemoryStatistics {
    private long totalMemory;
    private long usedMemory;
    private long availableMemory;
    private long buffers;
    private long cached;
    private long swapTotal;
    private long swapUsed;
    private LocalDateTime timestamp;
}

@Data
@Builder
class MemoryMetric {
    private LocalDateTime timestamp;
    private double memoryUsagePercent;
}

@Data
@Builder
class MemoryLeakDetection {
    private boolean detected;
    private String description;
}

@Data
@Builder
class ProcessMemoryInfo {
    private int pid;
    private String user;
    private double memoryPercent;
    private long memoryRss;
    private String command;
}
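`getMemoryStatistics` computes used memory as `MemTotal - MemAvailable` rather than `MemTotal - MemFree`. `MemAvailable` is the kernel's estimate of memory available to new workloads, including reclaimable page cache, so this choice avoids reporting cache as memory pressure. The stand-alone sketch below applies the same rule to a made-up `/proc/meminfo` excerpt so it can be checked without a sandbox; `MemInfo` is an illustrative name, and unlike the Skill it only converts values that carry a `kB` unit.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative: parse /proc/meminfo text and compute usage the way getMemoryStatistics does. */
class MemInfo {

    /** Parses lines like "MemTotal:  16334732 kB" into bytes; values without a unit are kept as-is. */
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> stats = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            String[] parts = line.split(":", 2);
            if (parts.length != 2) continue; // skips blank or malformed lines
            String[] value = parts[1].trim().split("\\s+");
            long number = Long.parseLong(value[0]);
            boolean inKb = value.length > 1 && value[1].equals("kB");
            stats.put(parts[0].trim(), inKb ? number * 1024 : number);
        }
        return stats;
    }

    /** Used memory percentage: (MemTotal - MemAvailable) / MemTotal. */
    static double usagePercent(Map<String, Long> stats) {
        long total = stats.getOrDefault("MemTotal", 0L);
        long available = stats.getOrDefault("MemAvailable", 0L);
        return total == 0 ? 0.0 : 100.0 * (total - available) / total;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "MemTotal:        8000000 kB",
                "MemFree:          500000 kB",
                "MemAvailable:    2000000 kB",
                "HugePages_Total:       0");
        Map<String, Long> stats = parse(sample);
        System.out.println(stats.get("MemTotal")); // 8192000000
        System.out.println(usagePercent(stats));   // 75.0
    }
}
```

With `MemFree` instead, the same sample would report about 94% usage, which would trigger the 90% critical recommendation even though a quarter of memory is still available.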

4. Developing and Integrating Custom Tools

4.1 Custom Tool Framework

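import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import lombok.Builder;
import lombok.Data;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Base class for custom tools
 */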
public abstract class CustomTool {
    
    protected final Logger logger = LoggerFactory.getLogger(getClass());
    protected final ToolContext context;
    
    public CustomTool(ToolContext context) {
        this.context = context;
    }
    
    /**
     * Tool metadata
     */
    public abstract ToolMetadata getMetadata();
    
    /**
     * Executes the tool
     */
    public abstract ToolResult execute(Map<String, String> args) throws Exception;
    
    /**
     * Tool metadata model
     */
    @Data
    @Builder
    public static class ToolMetadata {
        private String name;
        private String version;
        private String description;
        private List<ParameterSpec> parameters;
        private String outputFormat;  // json / text / binary
    }
    
    @Data
    @Builder
    public static class ParameterSpec {
        private String name;
        private String type;  // string / int / boolean / array
        private String description;
        private boolean required;
        private String defaultValue;
    }
    
    @Data
    @Builder
    public static class ToolResult {
        private int exitCode;
        private String output;
        private String error;
        private Map<String, Object> metrics;  // Execution metrics
    }
}

/**
 * Tool context
 */
@Data
public class ToolContext {
    private String toolId;
    private String sandboxId;
    private File workspaceRoot;
    private Map<String, String> environment;
    private long timeout;
}

/**
 * Log parsing tool
 */
public class LogParserTool extends CustomTool {
    
    public LogParserTool(ToolContext context) {
        super(context);
    }
    
    @Override
    public ToolMetadata getMetadata() {
        return ToolMetadata.builder()
            .name("log_parser")
            .version("1.0.0")
            .description("Log parsing and analysis tool")
            .parameters(Arrays.asList(
                ParameterSpec.builder()
                    .name("log_file")
                    .type("string")
                    .description("Path to the log file")
                    .required(true)
                    .build(),
                ParameterSpec.builder()
                    .name("pattern")
                    .type("string")
                    .description("Match pattern (regular expression)")
                    .required(false)
                    .build(),
                ParameterSpec.builder()
                    .name("level")
                    .type("string")
                    .description("Log level (ERROR/WARN/INFO)")
                    .required(false)
                    .defaultValue("*")
                    .build(),
                ParameterSpec.builder()
                    .name("max_lines")
                    .type("int")
                    .description("Maximum number of entries to return")
                    .required(false)
                    .defaultValue("1000")
                    .build()
            ))
            .outputFormat("json")
            .build();
    }
    
    @Override
    public ToolResult execute(Map<String, String> args) throws Exception {
        logger.info("Running log parser: file={}", args.get("log_file"));
        
        String logFile = args.get("log_file");
        String pattern = args.getOrDefault("pattern", ".*");
        String level = args.getOrDefault("level", "*");
        int maxLines = Integer.parseInt(args.getOrDefault("max_lines", "1000"));
        
        // 读取日志文件
        List<String> lines = readLogFile(logFile);
        
        // 解析日志
        List<LogEntry> entries = parseLogLines(lines, pattern, level, maxLines);
        
        // 统计信息
        Map<String, Integer> statistics = computeStatistics(entries);
        
        // 生成JSON输出
        String output = generateJsonOutput(entries, statistics);
        
        return ToolResult.builder()
            .exitCode(0)
            .output(output)
            .metrics(Map.of("lines_processed", lines.size(), "entries_matched", entries.size()))
            .build();
    }
    
    private List<String> readLogFile(String filePath) throws Exception {
        Path path = Paths.get(filePath);
        return Files.readAllLines(path);
    }
    
    private List<LogEntry> parseLogLines(
            List<String> lines,
            String pattern,
            String level,
            int maxLines) {

        List<LogEntry> entries = new ArrayList<>();
        Pattern regex = Pattern.compile(pattern);

        for (String line : lines) {
            if (entries.size() >= maxLines) break;

            // Keep lines that match the pattern and, unless level is "*", contain the level keyword
            if (!regex.matcher(line).find()) continue;
            if (!"*".equals(level) && !line.contains(level)) continue;

            entries.add(parseSingleLogEntry(line));
        }

        return entries;
    }
    
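    private LogEntry parseSingleLogEntry(String line) {
        // Simple parser for lines such as:
        //   [2025-01-04 10:30:45] ERROR [com.example.Service] Message
        // The timestamp is the text inside the first [...]; everything after it
        // (level, source and message) is kept together as the message.
        int close = line.indexOf(']');
        if (!line.startsWith("[") || close < 0) {
            // Not in the expected format: keep the whole line as the message
            return LogEntry.builder().timestamp("").message(line.trim()).build();
        }

        return LogEntry.builder()
            .timestamp(line.substring(1, close))
            .message(line.substring(close + 1).trim())
            .build();
    }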
    
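    private Map<String, Integer> computeStatistics(List<LogEntry> entries) {
        Map<String, Integer> stats = new HashMap<>();
        stats.put("total_entries", entries.size());

        // Count entries per level (the message starts with the level keyword)
        for (String lvl : List.of("ERROR", "WARN", "INFO")) {
            long count = entries.stream()
                .filter(e -> e.getMessage().startsWith(lvl))
                .count();
            stats.put("count_" + lvl, (int) count);
        }

        return stats;
    }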
    
    private String generateJsonOutput(List<LogEntry> entries, Map<String, Integer> stats)
            throws Exception {
        // Serialize the analysis result as JSON (writeValueAsString throws a checked exception)
        return new ObjectMapper().writeValueAsString(Map.of(
            "entries", entries,
            "statistics", stats
        ));
    }
}

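/**
 * Metric aggregation tool
 */
public class MetricAggregatorTool extends CustomTool {

    private static final Logger logger = LoggerFactory.getLogger(MetricAggregatorTool.class);

    public MetricAggregatorTool(ToolContext context) {
        super(context);
    }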
    
    @Override
    public ToolMetadata getMetadata() {
        return ToolMetadata.builder()
            .name("metric_aggregator")
            .version("1.0.0")
            .description("System metric aggregation and statistics tool")
            .parameters(Arrays.asList(
                ParameterSpec.builder()
                    .name("metric_file")
                    .type("string")
                    .description("Path to the metric file")
                    .required(true)
                    .build(),
                ParameterSpec.builder()
                    .name("aggregate_type")
                    .type("string")
                    .description("Aggregation type (sum/avg/min/max/percentile)")
                    .required(true)
                    .build()
            ))
            .outputFormat("json")
            .build();
    }
    
    @Override
    public ToolResult execute(Map<String, String> args) throws Exception {
        String metricFile = args.get("metric_file");
        String aggregateType = args.get("aggregate_type");
        
        logger.info("Aggregating metrics: file={}, type={}", metricFile, aggregateType);

        // Read the metric file
        List<Double> metrics = readMetricFile(metricFile);

        // Run the aggregation
        Map<String, Double> result = aggregate(metrics, aggregateType);
        
        String output = new ObjectMapper().writeValueAsString(result);
        
        return ToolResult.builder()
            .exitCode(0)
            .output(output)
            .metrics(Map.of("metrics_count", metrics.size()))
            .build();
    }
    
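    private List<Double> readMetricFile(String filePath) throws Exception {
        // Assumed file format: one numeric value per line; blank lines and '#' comments are skipped
        List<Double> values = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(filePath))) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            values.add(Double.parseDouble(trimmed));
        }
        return values;
    }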
    
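    private Map<String, Double> aggregate(List<Double> metrics, String type) {
        Map<String, Double> result = new HashMap<>();

        switch (type) {
            case "sum":
                result.put("sum", metrics.stream().mapToDouble(Double::doubleValue).sum());
                break;
            case "avg":
                result.put("average", metrics.stream().mapToDouble(Double::doubleValue).average().orElse(0));
                break;
            case "min":
                result.put("min", metrics.stream().mapToDouble(Double::doubleValue).min().orElse(0));
                break;
            case "max":
                result.put("max", metrics.stream().mapToDouble(Double::doubleValue).max().orElse(0));
                break;
            case "percentile": {
                // Nearest-rank percentiles; reporting p50/p90/p99 is an illustrative choice
                double[] sorted = metrics.stream().mapToDouble(Double::doubleValue).sorted().toArray();
                for (int p : new int[] {50, 90, 99}) {
                    result.put("p" + p, nearestRank(sorted, p));
                }
                break;
            }
            default:
                throw new IllegalArgumentException("Unsupported aggregate_type: " + type);
        }

        return result;
    }

    // Nearest-rank percentile: the value at rank ceil(p/100 * n) in the sorted data (0 for empty input)
    private double nearestRank(double[] sorted, int percentile) {
        if (sorted.length == 0) return 0;
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }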
}
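`parseSingleLogEntry` above only extracts the timestamp and does not separate the level and the logger name from the message. For comparison, the sketch below parses the documented line format (`[timestamp] LEVEL [source] message`) with a single regular expression and counts entries per level. `LogLineParser`, `Entry` and `countByLevel` are hypothetical names used only for illustration, the log format is assumed to be exactly the one in the comment, and the nested record requires Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for lines like: [2025-01-04 10:30:45] ERROR [com.example.Service] Message
class LogLineParser {

    // One parsed line: timestamp, level, source (logger name) and message.
    record Entry(String timestamp, String level, String source, String message) {}

    // Groups: 1 = timestamp, 2 = level, 3 = source, 4 = message (may be empty)
    private static final Pattern LINE = Pattern.compile(
            "^\\[([^\\]]+)\\]\\s+([A-Z]+)\\s+\\[([^\\]]+)\\]\\s*(.*)$");

    // Returns the parsed entry, or null if the line does not follow the format.
    static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new Entry(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    // Counts lines per level in first-seen order; unparseable lines go into "UNKNOWN".
    static Map<String, Integer> countByLevel(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Entry entry = parse(line);
            String level = (entry == null) ? "UNKNOWN" : entry.level();
            counts.merge(level, 1, Integer::sum);
        }
        return counts;
    }
}
```

Lines that do not match are counted under `UNKNOWN` rather than silently dropped, so a sudden rise in that bucket is itself a hint that the log format has changed.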

4.2 Tool Registration and Management

java
/**
 * Tool registry and manager
 */
public class ToolRegistry {

    private static final Logger logger = LoggerFactory.getLogger(ToolRegistry.class);

    private final Map<String, ToolDefinition> systemTools = new ConcurrentHashMap<>();
    private final Map<String, CustomTool> customTools = new ConcurrentHashMap<>();
    
    public ToolRegistry() {
        initializeSystemTools();
    }
    
    /**
     * Register the built-in system tools.
     * The paths and versions below are examples; they differ between distributions,
     * so check them against the actual sandbox image (e.g. with `command -v grep`).
     */
    private void initializeSystemTools() {
        // grep
        registerSystemTool(ToolDefinition.builder()
            .name("grep")
            .path("/bin/grep")
            .type("system")
            .version("2.28")
            .description("Text search")
            .build());

        // cat
        registerSystemTool(ToolDefinition.builder()
            .name("cat")
            .path("/bin/cat")
            .type("system")
            .version("8.32")
            .description("Print file contents")
            .build());

        // awk
        registerSystemTool(ToolDefinition.builder()
            .name("awk")
            .path("/usr/bin/awk")
            .type("system")
            .version("5.1.0")
            .description("Text processing")
            .build());

        // sed
        registerSystemTool(ToolDefinition.builder()
            .name("sed")
            .path("/bin/sed")
            .type("system")
            .version("4.7")
            .description("Stream editor")
            .build());

        // tail
        registerSystemTool(ToolDefinition.builder()
            .name("tail")
            .path("/usr/bin/tail")
            .type("system")
            .version("8.32")
            .description("Show the end of a file")
            .build());

        // ps
        registerSystemTool(ToolDefinition.builder()
            .name("ps")
            .path("/bin/ps")
            .type("system")
            .version("3.3.17")
            .description("Process information")
            .build());
    }
    
    /**
     * Register a system tool
     */
    public void registerSystemTool(ToolDefinition toolDef) {
        systemTools.put(toolDef.getName(), toolDef);
        logger.info("Registered system tool: {} ({})", toolDef.getName(), toolDef.getPath());
    }

    /**
     * Register a custom tool
     */
    public void registerCustomTool(String name, CustomTool tool) {
        customTools.put(name, tool);
        ToolMetadata metadata = tool.getMetadata();
        logger.info("Registered custom tool: {} v{}", metadata.getName(), metadata.getVersion());
    }

    /**
     * Get a system tool definition (custom tools are looked up with getCustomTool)
     */
    public ToolDefinition getTool(String name) {
        return systemTools.get(name);
    }

    /**
     * Get a custom tool
     */
    public CustomTool getCustomTool(String name) {
        return customTools.get(name);
    }

    /**
     * Check whether a system or custom tool with this name is registered
     */
    public boolean hasTool(String name) {
        return systemTools.containsKey(name) || customTools.containsKey(name);
    }

    /**
     * List the names of all registered tools
     */
    public List<String> listAllTools() {
        List<String> tools = new ArrayList<>();
        tools.addAll(systemTools.keySet());
        tools.addAll(customTools.keySet());
        return tools;
    }
}

@Data
@Builder
class ToolDefinition {
    private String name;
    private String path;
    private String type;  // system / custom
    private String version;
    private String description;
}
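`ToolRegistry` records an absolute path for each system tool but leaves open how a Skill should actually invoke it. One option, sketched below under our own assumptions, is to treat the registry as an allowlist: the command line is always built from the registered path, arguments are passed as separate argv entries (no shell, hence no shell injection), and a timeout such as the one in `ToolContext` is enforced with `Process.waitFor(long, TimeUnit)`. `SystemToolRunner` is a hypothetical helper, not part of AgentScope Runtime; in the sandbox architecture described in this chapter the command would most likely be executed on the Runtime Server side rather than through a local `ProcessBuilder`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs only registered system tools, without a shell, under a timeout.
class SystemToolRunner {

    private final Map<String, String> allowedTools; // tool name -> absolute path

    SystemToolRunner(Map<String, String> allowedTools) {
        this.allowedTools = Map.copyOf(allowedTools);
    }

    // Builds argv from the registered path; each argument is its own entry, so no shell parsing happens.
    List<String> buildCommand(String toolName, List<String> args) {
        String path = allowedTools.get(toolName);
        if (path == null) {
            throw new IllegalArgumentException("Tool not registered: " + toolName);
        }
        List<String> command = new ArrayList<>();
        command.add(path);
        command.addAll(args);
        return command;
    }

    // Runs the tool and returns its combined stdout/stderr; kills it after timeoutMillis.
    String run(String toolName, List<String> args, long timeoutMillis)
            throws IOException, InterruptedException {
        Path output = Files.createTempFile("tool-", ".out");
        try {
            Process process = new ProcessBuilder(buildCommand(toolName, args))
                    .redirectErrorStream(true)        // merge stderr into stdout
                    .redirectOutput(output.toFile())  // write to a file so a full pipe cannot block the tool
                    .start();
            if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                process.destroyForcibly();
                throw new IOException(toolName + " timed out after " + timeoutMillis + " ms");
            }
            return Files.readString(output, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(output);
        }
    }
}
```

Output goes to a temporary file rather than being read from the pipe after `waitFor`, because a tool that prints more than the pipe buffer holds (for example `grep` over a large log) would otherwise block and run into the timeout.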

5. Report Generation Skill

5.1 Report Template Management

java
/**
 * Report template engine
 */
public class ReportTemplateEngine {

    private static final Logger logger = LoggerFactory.getLogger(ReportTemplateEngine.class);

    private final SkillFileSystem fileSystem;
    private final TemplateResolver resolver;

    public ReportTemplateEngine(SkillFileSystem fileSystem) {
        this.fileSystem = fileSystem;
        this.resolver = new TemplateResolver();
    }
    
    /**
     * Generate a report
     */
    public String generateReport(
            String templateName,
            Map<String, Object> data,
            ReportFormat format) throws Exception {

        logger.info("Generating report: template={}, format={}", templateName, format);

        // Load the template
        String templateContent = loadTemplate(templateName);

        // Render the template
        String renderedContent = renderTemplate(templateContent, data);

        // Convert to the requested format
        return convertFormat(renderedContent, format);
    }
    
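    /**
     * Load a template from <templateDir>/<templateName>.md
     */
    private String loadTemplate(String templateName) throws Exception {
        File templateFile = new File(
            fileSystem.getTemplateDir(),
            templateName + ".md"
        );

        // Read explicitly as UTF-8 instead of relying on the platform default charset
        return new String(Files.readAllBytes(templateFile.toPath()), StandardCharsets.UTF_8);
    }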
    
    /**
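     * Render the template
     */
    private String renderTemplate(String template, Map<String, Object> data) {
        // A full implementation would use a template engine such as FreeMarker or Velocity;
        // here it is simplified to plain ${key} string replacement.
        // Non-string values are inserted via String.valueOf, so a list renders as "[a, b]".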
        
        String result = template;
        for (Map.Entry<String, Object> entry : data.entrySet()) {
            String placeholder = "${" + entry.getKey() + "}";
            result = result.replace(placeholder, String.valueOf(entry.getValue()));
        }
        
        return result;
    }
    
    /**
     * Convert to the requested output format
     */
    private String convertFormat(String content, ReportFormat format) throws Exception {
        switch (format) {
            case MARKDOWN:
                return content;
            case HTML:
                return convertMarkdownToHtml(content);
            case PDF:
                return convertMarkdownToPdf(content);
            default:
                return content;
        }
    }
    
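    private String convertMarkdownToHtml(String markdown) {
        // Convert with commonmark-java (dependency: org.commonmark:commonmark)
        org.commonmark.parser.Parser parser = org.commonmark.parser.Parser.builder().build();
        org.commonmark.renderer.html.HtmlRenderer renderer =
            org.commonmark.renderer.html.HtmlRenderer.builder().build();
        return "<html><body>" + renderer.render(parser.parse(markdown)) + "</body></html>";
    }

    private String convertMarkdownToPdf(String markdown) throws Exception {
        // Placeholder: PDF output needs a library such as iText or an external tool such as
        // wkhtmltopdf, and PDF is binary, so a real version should return byte[] rather than String.
        throw new UnsupportedOperationException("PDF conversion is not implemented in this example");
    }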
}

public enum ReportFormat {
    MARKDOWN,
    HTML,
    PDF
}
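The simplified `renderTemplate` inserts every value with `String.valueOf`, so a list such as `recommendations` ends up in the report as `[a, b, c]`. A slightly more robust sketch of the same `${key}` substitution is shown below, assuming Markdown bullet lines are the desired rendering for list values. `PlaceholderRenderer` is a hypothetical name, and a real template engine such as FreeMarker remains the better choice for tabular data like `top_processes`.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical ${key} renderer: lists become Markdown bullets, unknown keys are left visible.
class PlaceholderRenderer {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    static String render(String template, Map<String, Object> data) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            // Keep "${key}" as-is when no value is supplied, so missing data is easy to spot
            String value = data.containsKey(key) ? toText(data.get(key)) : m.group();
            // quoteReplacement stops '$' and '\' in the value from being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Lists are rendered one "- item" per line; other values use String.valueOf.
    private static String toText(Object value) {
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder();
            for (Object item : (List<?>) value) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append("- ").append(item);
            }
            return sb.toString();
        }
        return String.valueOf(value);
    }
}
```

`Matcher.quoteReplacement` matters here: without it, a value containing `$` (for example a shell variable quoted in a recommendation) would be interpreted as a group reference, which either throws or substitutes the wrong text.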

5.2 Diagnostic Report Generation Skill

java
/**
 * Skill that generates diagnostic reports
 */
public class DiagnosticReportGenerationSkill extends AgentSkill {
    
    private static final Logger logger = LoggerFactory.getLogger(DiagnosticReportGenerationSkill.class);
    
    private final ReportTemplateEngine templateEngine;
    private final MemoryDiagnosticSkill memoryDiagnosticSkill;
    private final SkillFileSystem fileSystem;
    
    public DiagnosticReportGenerationSkill(
            ReportTemplateEngine templateEngine,
            MemoryDiagnosticSkill memoryDiagnosticSkill,
            SkillFileSystem fileSystem) {
        super("diagnostic_report_generation");
        this.templateEngine = templateEngine;
        this.memoryDiagnosticSkill = memoryDiagnosticSkill;
        this.fileSystem = fileSystem;
    }
    
    /**
     * Generate the complete diagnostic report
     */
    public DiagnosticReport generateFullReport() throws Exception {
        logger.info("Generating diagnostic report...");

        // With Lombok's @Builder the generated builder type is DiagnosticReport.DiagnosticReportBuilder
        DiagnosticReport.DiagnosticReportBuilder reportBuilder = DiagnosticReport.builder()
            .timestamp(LocalDateTime.now())
            .version("1.0");

        try {
            // Step 1: run the diagnosis
            MemoryDiagnosticReport memoryReport = memoryDiagnosticSkill.diagnose();
            reportBuilder.memoryReport(memoryReport);

            // Step 2: prepare the template data
            Map<String, Object> reportData = prepareReportData(memoryReport);

            // Step 3: render the Markdown report
            String markdownContent = templateEngine.generateReport(
                "diagnostic_template",
                reportData,
                ReportFormat.MARKDOWN
            );
            reportBuilder.markdownContent(markdownContent);

            // Step 4: render the HTML version from the same template and data
            String htmlContent = templateEngine.generateReport(
                "diagnostic_template",
                reportData,
                ReportFormat.HTML
            );
            reportBuilder.htmlContent(htmlContent);

            // Step 5: save the report files
            String reportPath = saveReport(markdownContent, htmlContent);
            reportBuilder.reportPath(reportPath);

            reportBuilder.status("SUCCESS");

        } catch (Exception e) {
            // Failures are recorded in the report instead of being thrown to the caller
            logger.error("Report generation failed", e);
            reportBuilder.status("FAILED").error(e.getMessage());
        }

        return reportBuilder.build();
    }
    
    /**
     * Prepare the data for the report template
     */
    private Map<String, Object> prepareReportData(MemoryDiagnosticReport memoryReport) {
        Map<String, Object> data = new HashMap<>();

        // Summary (date and time come from one timestamp so they stay consistent)
        LocalDateTime now = LocalDateTime.now();
        data.put("report_title", "AIOps Diagnostic Report");
        data.put("report_date", now.format(DateTimeFormatter.ISO_DATE));
        data.put("report_time", now.format(DateTimeFormatter.ofPattern("HH:mm:ss")));

        // Memory diagnosis results
        MemoryStatistics stats = memoryReport.getStatistics();
        data.put("total_memory", formatBytes(stats.getTotalMemory()));
        data.put("used_memory", formatBytes(stats.getUsedMemory()));
        data.put("memory_usage_percent", stats.getTotalMemory() > 0
            ? String.format("%.1f%%", (double) stats.getUsedMemory() / stats.getTotalMemory() * 100)
            : "N/A");

        // Recommendations
        data.put("recommendations", memoryReport.getRecommendations());

        // Top processes by memory usage
        List<Map<String, Object>> processesData = new ArrayList<>();
        for (ProcessMemoryInfo proc : memoryReport.getTopProcesses()) {
            Map<String, Object> procData = new HashMap<>();
            procData.put("pid", proc.getPid());
            procData.put("user", proc.getUser());
            procData.put("memory", formatBytes(proc.getMemoryRss()));
            procData.put("command", proc.getCommand());
            processesData.add(procData);
        }
        data.put("top_processes", processesData);

        return data;
    }
    
    /**
     * Save the report files; returns the base path without extension (.md and .html are appended)
     */
    private String saveReport(String markdownContent, String htmlContent) throws Exception {
        String timestamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
        String filename = "diagnostic_report_" + timestamp;

        File reportsDir = fileSystem.getOutputDir();
        // Create the output directory if it does not exist yet
        Files.createDirectories(reportsDir.toPath());

        // Save the Markdown version
        Files.write(
            new File(reportsDir, filename + ".md").toPath(),
            markdownContent.getBytes(StandardCharsets.UTF_8)
        );

        // Save the HTML version
        Files.write(
            new File(reportsDir, filename + ".html").toPath(),
            htmlContent.getBytes(StandardCharsets.UTF_8)
        );

        logger.info("Report saved to: {}", reportsDir.getAbsolutePath());

        return new File(reportsDir, filename).getAbsolutePath();
    }
    
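    /**
     * Format a byte count in binary units (1 KB = 1024 B)
     */
    private String formatBytes(long bytes) {
        if (bytes <= 0) return "0 B";
        final String[] units = {"B", "KB", "MB", "GB", "TB", "PB"};
        // Divide step by step instead of using log10: no floating-point rounding of the
        // unit index, and the index can never run past the end of the units array
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format("%.1f %s", value, units[unit]);
    }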
}

@Data
@Builder
class DiagnosticReport {
    private LocalDateTime timestamp;
    private String version;
    private MemoryDiagnosticReport memoryReport;
    private String markdownContent;
    private String htmlContent;
    private String reportPath;
    private String status;
    private String error;
}
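`formatBytes` appears in both `DiagnosticReportGenerationSkill` and `AIOpsDiagnosticSystem`, which suggests pulling it into one small utility that can be tested on its own. The sketch below divides by 1024 step by step, keeps the unit index inside the array, and adds the usage-percentage calculation guarded against a zero total. `ByteFormat` and `usagePercent` are hypothetical names; the binary units (1 KB = 1024 B) follow the original code.

```java
import java.util.Locale;

// Hypothetical standalone variant of formatBytes() plus a guarded usage percentage.
class ByteFormat {

    private static final String[] UNITS = {"B", "KB", "MB", "GB", "TB", "PB"};

    // Binary units (1 KB = 1024 B), one decimal place.
    static String formatBytes(long bytes) {
        if (bytes <= 0) {
            return "0 B";
        }
        double value = bytes;
        int unit = 0;
        // Divide step by step; stop at the largest unit so the index never leaves the array
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value, UNITS[unit]);
    }

    // Percentage of used/total with one decimal place, or "N/A" when total is not positive.
    static String usagePercent(long used, long total) {
        if (total <= 0) {
            return "N/A";
        }
        return String.format(Locale.ROOT, "%.1f%%", (double) used / total * 100);
    }
}
```

`Locale.ROOT` keeps the decimal separator a `.` regardless of the JVM's default locale, so the report text does not change when the sandbox image uses a different locale.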

6. Integration Example: AIOps Diagnosis and Report Generation

java
/**
 * Integration of AIOps diagnosis and report generation
 */
public class AIOpsDiagnosticSystem {
    
    private static final Logger logger = LoggerFactory.getLogger(AIOpsDiagnosticSystem.class);
    
    private final SandboxClient sandboxClient;
    private final SkillFileSystem fileSystem;
    private final ToolRegistry toolRegistry;
    private final MemoryDiagnosticSkill memoryDiagnosticSkill;
    private final DiagnosticReportGenerationSkill reportGenerationSkill;
    
    public AIOpsDiagnosticSystem(String runtimeServerHost, int runtimeServerPort) throws Exception {
        logger.info("Initializing AIOps diagnostic system...");

        // Initialize the sandbox client for the Runtime Server
        this.sandboxClient = new SandboxClient(runtimeServerHost, runtimeServerPort);

        // Set up the skill workspace
        this.fileSystem = new SkillFileSystem("/tmp/skill_workspace");
        fileSystem.initializeSkillDirs("aiops_diagnostics", "diagnostic_reporting");

        // Tool registry (system tools are registered in its constructor) plus custom tools
        this.toolRegistry = new ToolRegistry();
        registerCustomTools();

        // Diagnostic Skill
        this.memoryDiagnosticSkill = new MemoryDiagnosticSkill(
            sandboxClient, fileSystem, toolRegistry
        );

        // Report generation Skill
        ReportTemplateEngine templateEngine = new ReportTemplateEngine(fileSystem);
        this.reportGenerationSkill = new DiagnosticReportGenerationSkill(
            templateEngine, memoryDiagnosticSkill, fileSystem
        );

        logger.info("✓ AIOps diagnostic system initialized");
    }
    
    /**
     * Register custom tools
     */
    private void registerCustomTools() throws Exception {
        ToolContext context = new ToolContext();
        context.setWorkspaceRoot(fileSystem.getWorkspaceRoot());
        context.setTimeout(30000);  // 30 s, assuming the timeout is in milliseconds

        // Log parsing tool
        toolRegistry.registerCustomTool("log_parser", new LogParserTool(context));

        // Metric aggregation tool
        toolRegistry.registerCustomTool("metric_aggregator", new MetricAggregatorTool(context));

        logger.info("✓ Custom tools registered");
    }
    
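    /**
     * Run the complete diagnosis and reporting flow
     */
    public DiagnosticReport runFullDiagnostics() throws Exception {
        logger.info("\n" + "=".repeat(60));
        logger.info("Starting AIOps diagnosis and report generation");
        logger.info("=".repeat(60));

        try {
            // Step 1: diagnose and generate the report. generateFullReport() already runs the
            // diagnosis, so it is called once and its result is reused instead of diagnosing twice.
            logger.info("\n[Step 1] Running diagnosis and generating report...");
            DiagnosticReport report = reportGenerationSkill.generateFullReport();
            if (!"SUCCESS".equals(report.getStatus())) {
                // generateFullReport() records failures in the report instead of throwing
                throw new IllegalStateException("Report generation failed: " + report.getError());
            }
            MemoryDiagnosticReport memoryReport = report.getMemoryReport();
            logger.info("✓ Diagnosis finished, status: {}", memoryReport.getStatus());
            logger.info("✓ Report saved to: {}", report.getReportPath());

            // Step 2: print the summary
            logger.info("\n[Step 2] Diagnosis summary");
            MemoryStatistics stats = memoryReport.getStatistics();
            logger.info("Total memory: {}", formatBytes(stats.getTotalMemory()));
            logger.info("Used memory: {}", formatBytes(stats.getUsedMemory()));
            // SLF4J only substitutes {} placeholders, so the percentage is formatted up front
            logger.info("Memory usage: {}", String.format("%.1f%%",
                (double) stats.getUsedMemory() / stats.getTotalMemory() * 100));

            logger.info("\nRecommendations:");
            for (String recommendation : memoryReport.getRecommendations()) {
                logger.info("  {}", recommendation);
            }

            logger.info("\n" + "=".repeat(60));
            logger.info("Diagnosis and report generation succeeded");
            logger.info("=".repeat(60));

            return report;

        } catch (Exception e) {
            logger.error("Diagnosis failed", e);
            throw e;
        }
    }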
    
    private String formatBytes(long bytes) {
        if (bytes <= 0) return "0 B";
        final String[] units = {"B", "KB", "MB", "GB", "TB", "PB"};
        // Divide step by step so the unit index never runs past the end of the array
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format("%.1f %s", value, units[unit]);
    }
    
    public static void main(String[] args) throws Exception {
        // Connect to the Runtime Server
        AIOpsDiagnosticSystem system = new AIOpsDiagnosticSystem(
            "localhost",  // Runtime Server host
            50051         // Runtime Server port
        );

        // Run the diagnosis
        DiagnosticReport report = system.runFullDiagnostics();

        // The report can be processed further here...
    }
}