🚀 Day 8: Autonomous Hunting Loop - Building the Agentic Execution Engine

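Today's goal: give the AI real "hands and feet", and equip it with a "stopwatch" and a "ledger"! On Day 7, we got the AI to write a JSON blueprint containing multi-round drill-down queries (SPL). Today we will write a robust Python loop that lets the add-on parse this JSON automatically and call the Splunk search engine directly to run those SPL queries silently in the background. More importantly, to meet the strict monitoring requirements of the future executive dashboard for security posture and cost-effectiveness, we instrument the code to record, for every hypothesis, the evidence hit count of each query round (the basis for qualitative judgment), the execution time (system performance), and its share of token consumption (FinOps cost).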

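💻 Step 1: Open the AOB Code Lab

Before writing the automation engine, we need to find the code editor in the AOB web interface.

Detailed steps:

Open your browser and log in to your Splunk Enterprise web interface.
In the left-hand menu, click the Splunk Add-on Builder (AOB) app icon.
In the project list, open the PEAK-llm-analyzer add-on project we created.
In the top navigation bar, click Configure Data Collection.
Find the PEAK AI Hunter input you created earlier and click the Edit button on its right.
In the edit wizard that opens, click the third tab at the top: Define & Test.
Now focus on the black Code Editor on the right.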

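💻 Step 2: Inject the "Structured Execution Engine" Code

In the code editor on the right, delete the existing collect_events function together with the import section above it, then copy and paste the complete code below.

(⚠️ Geek note: all comments in the following code are written in English, following enterprise conventions, to avoid character-encoding problems across platforms.)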

```python
# encoding = utf-8

import os
import sys
import time
import datetime
import json
import splunklib.client as client
import splunklib.results as results


def validate_input(helper, definition):
    """Keep AOB's validation hook in place (the generated wrapper calls it); Day 8 adds no checks."""
    pass


def execute_ai_spl(helper, service, spl_query):
    """
    Geek Helper Function: Execute SPL generated by AI and return the result rows as a list of dicts
    """
    spl_query = (spl_query or "").strip()

    # Skip empty queries instead of sending a bare "search" command to splunkd
    if not spl_query:
        helper.log_info("[Agentic Engine] SKIPPED: empty SPL query.")
        return []

    # Prepend the 'search' command when the AI omitted it, to prevent syntax errors
    if not spl_query.startswith("search") and not spl_query.startswith("|"):
        spl_query = "search " + spl_query

    # count=0 removes the default limit of 100 returned rows, so hit counts are not truncated
    kwargs_oneshot = {"output_mode": "json", "count": 0}
    helper.log_info(f"[Agentic Engine] Running SPL: {spl_query}")

    try:
        # Execute a oneshot (blocking) search with the Splunk Python SDK
        search_results = service.jobs.oneshot(spl_query, **kwargs_oneshot)
        reader = results.JSONResultsReader(search_results)

        result_data = []
        for result in reader:
            # The reader also yields diagnostic Message objects; keep only result rows (dicts)
            if isinstance(result, dict):
                result_data.append(result)

        helper.log_info(f"[Agentic Engine] SUCCESS: Found {len(result_data)} events.")
        return result_data
    except Exception as e:
        helper.log_error(f"[Agentic Engine] FAILED: SPL execution error: {str(e)}")
        return []


def collect_events(helper, ew):
    """
    Day 8: Dashboard-Ready Autonomous Hunting Loop (Execute Phase)
    """
    helper.log_info("PEAK AI Hunter started Execution cycle.")

    # Start the master stopwatch for the entire execution cycle
    cycle_start_time = time.time()

    try:
        # 1. Securely acquire the Session Key and target index
        session_key = getattr(helper, 'session_key', None)
        if not session_key and hasattr(helper, '_input_definition'):
            session_key = getattr(helper._input_definition, 'metadata', {}).get('session_key')

        if not session_key:
            helper.log_error("Failed to acquire session_key. Authentication failed.")
            return

        # Connects to the local splunkd management port (localhost:8089 by default)
        service = client.Service(token=session_key)
        target_index = helper.get_output_index() or "main"

        # 2. Mock the LLM Response JSON (including Dashboard-ready Token metrics)
        # On Day 9, this string will be replaced by the real API response.text
        # The {target_index} placeholder is filled in by the loop below
        mock_llm_json_string = """
        {
            "analysis": "Rare MySQL 1045 errors detected; possible brute-force attack.",
            "usage": { "total_tokens": 850 },
            "hypotheses": [
                {
                    "hypothesis_id": 1,
                    "ABLE": {
                        "Actor": "External attacker",
                        "Behavior": "Brute-forcing database credentials (T1110)",
                        "Location": "MySQL database server",
                        "Evidence": "Large number of status=1045 logs within a short time window"
                    },
                    "spl_round_1_validation": "search index={target_index} status=1045 | stats count by src_ip | sort -count",
                    "spl_round_2_drilldown": "search index={target_index} src_ip=192.168.1.10 | transaction maxspan=5m"
                }
            ]
        }
        """

        # 3. Parse the JSON Output and extract overall metrics
        helper.log_info("Parsing AI generated JSON hunting plan...")
        ai_hunting_plan = json.loads(mock_llm_json_string)
        total_tokens = ai_hunting_plan.get("usage", {}).get("total_tokens", 0)
        hypotheses = ai_hunting_plan.get("hypotheses", [])

        # Average token cost per hypothesis (identical for every hypothesis, so compute it once)
        token_share = round(total_tokens / len(hypotheses)) if hypotheses else 0

        # List that collects all quantitative evidence for Day 9's Act Phase
        all_hunt_evidence = []

        # 4. The Agentic Loop: Iterate, Execute, and Measure!
        for i, hyp in enumerate(hypotheses):
            # Start the stopwatch for this specific hypothesis
            hyp_start_time = time.time()
            helper.log_info(f"=== Executing Hunt for Hypothesis [{i+1}] ===")

            # Dynamically inject the user-configured target index into the SPL
            spl_r1 = hyp.get("spl_round_1_validation", "").replace("{target_index}", target_index)
            spl_r2 = hyp.get("spl_round_2_drilldown", "").replace("{target_index}", target_index)

            # Execute Round 1 (Global Validation)
            r1_results = execute_ai_spl(helper, service, spl_r1)

            # Execute Round 2 (Deep Drill-down)
            r2_results = execute_ai_spl(helper, service, spl_r2)

            # Calculate the exact duration for this hypothesis execution
            hyp_duration = round(time.time() - hyp_start_time, 2)

            # 5. Assemble the "Dashboard-Ready" Evidence Package
            evidence_package = {
                "hypothesis_id": hyp.get("hypothesis_id", i+1),
                # Fall back to {} so a missing or null ABLE block does not raise an error
                "threat_behavior": (hyp.get("ABLE") or {}).get("Behavior"),
                "round_1_hit_count": len(r1_results),
                "round_2_hit_count": len(r2_results),
                "execution_duration_sec": hyp_duration,
                "token_cost_share": token_share
            }
            all_hunt_evidence.append(evidence_package)
            helper.log_info(f"[Metrics] Hypothesis {evidence_package['hypothesis_id']} completed in {hyp_duration}s. Hits: R1={len(r1_results)}, R2={len(r2_results)}.")

        # Calculate total time taken for the complete cycle
        total_cycle_time = round(time.time() - cycle_start_time, 2)
        helper.log_info(f"Agentic Execution cycle completed in {total_cycle_time}s. Total Tokens: {total_tokens}.")

        # NOTE: Day 9 code (Act Phase) will be appended below this line to use 'all_hunt_evidence'

    except json.JSONDecodeError as e:
        helper.log_error(f"FATAL: AI output is not valid JSON. Detail: {str(e)}")
    except Exception as e:
        helper.log_error(f"Execution error during Agentic Loop: {str(e)}")
```
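execute_ai_spl mixes string handling with a live call to splunkd, so its normalization rules can only be exercised inside Splunk. As a minimal sketch, assuming you want to test those rules offline, the index-placeholder injection and the search-prefix rule can be pulled into a pure function. `prepare_spl` is a hypothetical name for illustration, not part of AOB or the Splunk SDK.

```python
def prepare_spl(raw_spl, target_index):
    """Hypothetical helper: normalize AI-generated SPL the way the Day 8 engine does.

    Returns None for empty input so the caller can skip the query entirely.
    """
    spl = (raw_spl or "").replace("{target_index}", target_index).strip()
    if not spl:
        return None
    # Queries starting with "|" already begin with a generating command (e.g. | tstats);
    # everything else gets an explicit "search" command in front
    if not spl.startswith("search") and not spl.startswith("|"):
        spl = "search " + spl
    return spl


if __name__ == "__main__":
    print(prepare_spl("index={target_index} status=1045", "main"))
    print(prepare_spl("| tstats count where index={target_index}", "main"))
    print(prepare_spl("   ", "main"))
```

With a helper like this, execute_ai_spl would only need to return [] when it receives None and pass everything else straight to service.jobs.oneshot.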

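🧪 Step 3: Deep Dive into the Design (Geek Tips)

The code looks simple, but it contains three design decisions made from an enterprise perspective:

Why do we need mock_llm_json_string?
- On Day 8, our core task is to verify that SPL execution through the Splunk SDK is stable. By hard-coding a known-good JSON string (simulating a successful LLM response), we isolate external variables such as network timeouts, API rate limiting, and malformed model output (format hallucinations). Any failure we see today therefore points at the execution engine rather than at the LLM. In the production code, this string will be replaced by the content returned from the real API call (response.text).
The "two-level stopwatch" design with time.time():
- We measure not only total_cycle_time for the whole task (to judge whether a cycle risks congesting the scheduler) but also hyp_duration for each individual hypothesis. On the future dashboard, this shows at a glance which AI-generated SPL is the slowest and drags down engine performance the most.
The mission of all_hunt_evidence:
- This list turns "fuzzy log lines" into a series of "precise structured dictionaries". Each dictionary holds the hit counts of both rounds (round_1_hit_count and round_2_hit_count) alongside the execution time and token share, and on Day 9 the list becomes the decisive quantitative evidence fed to the LLM for the final threat verdict.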
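The three ideas above (a mocked plan, a two-level stopwatch, and a list of evidence dictionaries) can be exercised without Splunk at all. The sketch below is a simplified, hypothetical refactor of the Day 8 loop: `run_hunt_loop` and `fake_execute` are illustrative names, the executor is passed in as a function so a stub can stand in for execute_ai_spl, and it swaps in time.perf_counter() for time.time() because a monotonic clock is not affected by system clock adjustments while a duration is being measured.

```python
import json
import time

MOCK_PLAN = """
{
  "usage": {"total_tokens": 850},
  "hypotheses": [
    {"hypothesis_id": 1,
     "ABLE": {"Behavior": "Brute-forcing database credentials (T1110)"},
     "spl_round_1_validation": "search index={target_index} status=1045 | stats count by src_ip",
     "spl_round_2_drilldown": "search index={target_index} src_ip=192.168.1.10"}
  ]
}
"""


def run_hunt_loop(plan_json, execute, target_index="main"):
    """Hypothetical refactor of the Day 8 loop with an injectable executor."""
    plan = json.loads(plan_json)
    hypotheses = plan.get("hypotheses", [])
    total_tokens = plan.get("usage", {}).get("total_tokens", 0)
    token_share = round(total_tokens / len(hypotheses)) if hypotheses else 0

    cycle_start = time.perf_counter()  # outer stopwatch: the whole cycle
    evidence = []
    for i, hyp in enumerate(hypotheses):
        hyp_start = time.perf_counter()  # inner stopwatch: one hypothesis
        r1 = execute(hyp.get("spl_round_1_validation", "").replace("{target_index}", target_index))
        r2 = execute(hyp.get("spl_round_2_drilldown", "").replace("{target_index}", target_index))
        evidence.append({
            "hypothesis_id": hyp.get("hypothesis_id", i + 1),
            "threat_behavior": (hyp.get("ABLE") or {}).get("Behavior"),
            "round_1_hit_count": len(r1),
            "round_2_hit_count": len(r2),
            "execution_duration_sec": round(time.perf_counter() - hyp_start, 2),
            "token_cost_share": token_share,
        })
    total_cycle_time = round(time.perf_counter() - cycle_start, 2)
    return evidence, total_cycle_time


def fake_execute(spl):
    """Stub executor: 3 rows for the stats query (round 1), 1 row for anything else (round 2)."""
    if "stats" in spl:
        return [{"src_ip": "10.0.0.%d" % n} for n in range(3)]
    return [{"_raw": "txn"}]


if __name__ == "__main__":
    evidence, cycle_time = run_hunt_loop(MOCK_PLAN, fake_execute)
    print(evidence[0]["round_1_hit_count"], evidence[0]["round_2_hit_count"], evidence[0]["token_cost_share"])
    # prints: 3 1 850
```

Injecting the executor keeps the timing and evidence-packaging logic identical between the mock run and a live run; only the function passed in changes.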

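⚡ Step 4: Smoke Test and Lock In the Code

Now we must verify that the Python code is syntactically correct and that the Splunk SDK can successfully reach the underlying search engine.

Detailed steps:

In the Event Parameters box on the left side of the AOB page, enter any placeholder value (for example, Test-Execute-Loop).
Click the green Test button in the upper-right corner of the page.
Look at the Output panel at the bottom of the page and wait about 2 to 5 seconds.
Key check: if the Output panel ends with a green Done and there is no wall of red Traceback (most recent call last) errors above it, the code has passed the syntax check. Note that collect_events catches runtime exceptions and writes them with helper.log_error, so a failed SPL query will not appear as a traceback here; use the log search in Step 5 to confirm what actually ran.
Very important: after a successful test, click the Save button in the upper-right corner immediately. The code is only written to disk on your system when you click Save.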
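The AOB Test button remains the main check in this workflow. If you also keep a local copy of the script, a quick pre-flight syntax check is possible with Python's built-in compile(), which parses and byte-compiles source without running it, so splunklib does not need to be installed locally. This is a minimal sketch: `check_syntax` is a hypothetical helper, and like the Traceback check above it only catches syntax errors, not runtime failures such as a missing session key.

```python
def check_syntax(source, filename="<aob_input>"):
    """Hypothetical helper: return None if the source parses, else a short error description."""
    try:
        compile(source, filename, "exec")  # parse and byte-compile only; nothing is executed
    except SyntaxError as err:  # IndentationError is a subclass, so it is caught too
        return "line %s: %s" % (err.lineno, err.msg)
    return None


if __name__ == "__main__":
    good = "import splunklib.client as client\n\ndef collect_events(helper, ew):\n    return None\n"
    bad = "def collect_events(helper, ew)\n    return None\n"
    print(check_syntax(good))
    print(check_syntax(bad))
```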

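📊 Step 5: Global Verification - Previewing the Data Behind the Future Dashboard

The code now runs in the background, but we still need to dig into the system's internal logs and use SPL to simulate the future dashboard charts, verifying that our data meets the "dashboard-friendly" standard.

Detailed steps:

Click the Splunk>enterprise logo in the upper-left corner to return to the home page, then open the Search & Reporting app.
Enter the following search in the search box and set the time range to Last 15 minutes: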

```spl
index=_internal "PEAK AI Hunter" OR "[Metrics] Hypothesis"
| rex field=_raw "completed in (?<duration>\d+\.\d+)s\. Hits: R1=(?<r1_hits>\d+), R2=(?<r2_hits>\d+)"
| stats sum(duration) as Total_Execution_Time_Sec, sum(r2_hits) as High_Risk_Evidence_Count
```

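🎉 Ultimate success signal: instead of a pile of messy internal logs, the results table now shows two clear core KPIs: total execution time (Total_Execution_Time_Sec) and high-risk evidence count (High_Risk_Evidence_Count, the sum of round 2 drill-down hits).

If you see these two numbers, congratulations! The security-quantification and performance-monitoring instrumentation you built into the code is working, and the cost data is already being logged too (the "Total Tokens" value in the cycle summary line). This is the foundation for the Day 16 "executive dashboard".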
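To see exactly what the Step 5 search computes, and to cover the token metric that it does not extract, here is a minimal Python sketch that applies the same regular expression to log lines in the formats written by collect_events and sums them the way `stats sum(...)` does. The sample lines and the `summarize` function are illustrative assumptions. In Splunk, a similar extra rex on the "Total Tokens:" summary line would do the same, though the base search would also have to match that line (it contains neither of the two quoted terms).

```python
import re

# Same pattern as the Step 5 rex command, rewritten with Python's (?P<name>...) group syntax
METRICS_RE = re.compile(
    r"completed in (?P<duration>\d+\.\d+)s\. Hits: R1=(?P<r1_hits>\d+), R2=(?P<r2_hits>\d+)"
)
# Extra pattern for the cycle summary line, which the Step 5 search does not parse
TOKENS_RE = re.compile(r"Total Tokens: (?P<tokens>\d+)")


def summarize(lines):
    """Mimic `stats sum(duration) ..., sum(r2_hits) ...` and also sum the logged token totals."""
    total_time = 0.0
    high_risk_hits = 0
    total_tokens = 0
    for line in lines:
        # re.search is unanchored, so any timestamp or level prefix before the message is ignored
        m = METRICS_RE.search(line)
        if m:
            total_time += float(m.group("duration"))
            high_risk_hits += int(m.group("r2_hits"))
        t = TOKENS_RE.search(line)
        if t:
            total_tokens += int(t.group("tokens"))
    return {
        "Total_Execution_Time_Sec": round(total_time, 2),
        "High_Risk_Evidence_Count": high_risk_hits,
        "Total_Tokens": total_tokens,
    }


if __name__ == "__main__":
    sample = [  # illustrative lines shaped like the log messages in collect_events
        "PEAK AI Hunter started Execution cycle.",
        "[Metrics] Hypothesis 1 completed in 1.42s. Hits: R1=3, R2=1.",
        "Agentic Execution cycle completed in 1.5s. Total Tokens: 850.",
    ]
    print(summarize(sample))
```

The cycle summary line also contains "completed in", but the metrics pattern requires the "Hits:" suffix, so only the per-hypothesis lines contribute to the time and hit totals.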