Day10:直面深水区——总结系统痛点与底层架构重塑

🚀 Day 10:直面深水区 ------ 总结系统痛点与底层架构重塑

今日目标 :在经历了前几天的狂欢,系统成功跑通了 AI 闭环后,我们必须立刻脱下"安全分析师"的外套,换上"底层架构师 "的思维。在真实的企业级环境中,海量数据与 API 限制会轻易摧毁我们在 Day 9 构建的"完美结构"。今天,我们将直面大模型开发中最致命的两大痛点:Token 额度熔断JSON 超长嵌套截断。通过引入防弹兜底逻辑与"扁平化革命",我们将完成从"能跑通的玩具脚本"到"坚不可摧的企业级引擎"的蜕变!


🚨 架构师复盘:为什么 Day 9 的代码在实战中必定会崩溃?

在 Day 9,我们把"初始蓝图"、"执行证据"和"最终战报"塞进了一个巨大的 master_payload 字典中,最后一次性写入 Splunk。这种设计在软件工程中被称为"上帝对象(God Object)反模式"。

它会带来两个致命的灾难:

  1. 大模型 Token 额度熔断(Token Storm) : 如果 Day 8 收集到的证据极为庞大,或者 AI 在 Day 9 分析时"长篇大论",生成的 JSON 极易超过 API 的单次上下文输出上限(例如 4000 Tokens)。一旦触顶,API 会硬生生切断输出,导致返回的 JSON 缺失结尾的 } 括号。Python 在执行 json.loads() 时会当场抛出 JSONDecodeError 异常,导致整个 5 分钟的调度任务直接挂掉。
  2. Splunk 嵌套解析崩溃(The Nested JSON Trap): 把数百行、嵌套了四五层的巨型 JSON 单行日志强行塞给 Splunk,不仅会严重拖垮底层全文检索(Text-search)的 I/O 性能,还会导致 Splunk 前端的字段自动提取器(Auto KV Extraction)罢工,让你在大屏上根本无法提取深层字段。

🛠️ 代码优化 1:防弹衣 (Bulletproof Fallback)

优化方案

  1. 在未来接入真实网络请求时,必须在 API 参数中强制指定 max_tokens
  2. 在代码中编写极其强壮的 try...except json.JSONDecodeError 兜底逻辑。核心原则:即便 JSON 断裂,程序也不能死,必须将残缺的原始文本(Raw Text)作为"错误事件"写入日志,保留现场!

🛠️ 代码优化 2:扁平化革命 (The Flattening Revolution)

优化方案 : 引入全局唯一标识符(UUID)。我们在单次调度周期开始时生成一个 session_id,然后彻底打碎那个巨型的 master_payload。 把整个过程拆分为三个独立且扁平的事件 写入 Splunk,它们通过 session_id 像链条一样相互关联:

  1. 第一条日志:event_type="PEAK_Plan"(刚拿到计划就立刻落盘,防止后续崩溃导致计划也丢失)。
  2. 第二条日志:event_type="PEAK_Evidence"(收集完证据立刻落盘)。
  3. 第三条日志:event_type="PEAK_Final_Report"(拿到最终打分后落盘)。

💻 终极实战:Day 10 全量架构重构代码

请打开 Add-on Builder 的 Define & Test 编辑器。清空原有代码,复制并粘贴这套经过终极架构优化的全量代码。

python 复制代码
import os
import sys
import time
import datetime
import json
import uuid
import splunklib.client as client
import splunklib.results as results

def execute_ai_spl(helper, service, spl_query):
    """
    Geek Helper Function: Execute SPL generated by AI and return the raw result data.
    """
    spl_query = spl_query.strip()
    
    # Force the 'search' prefix on AI-generated SPL to prevent syntax errors
    if not spl_query.startswith("search") and not spl_query.startswith("|"):
        spl_query = "search " + spl_query
        
    kwargs_oneshot = {"output_mode": "json"}
    helper.log_info(f"[Agentic Engine] Running SPL: {spl_query}")
    
    try:
        # Execute oneshot synchronous search using Splunk Python SDK
        search_results = service.jobs.oneshot(spl_query, **kwargs_oneshot)
        reader = results.JSONResultsReader(search_results)
        
        result_data = []
        for result in reader:
            if isinstance(result, dict):
                result_data.append(result)
                
        helper.log_info(f"[Agentic Engine] SUCCESS: Found {len(result_data)} events.")
        return result_data
    except Exception as e:
        helper.log_error(f"[Agentic Engine] FAILED: SPL execution error: {str(e)}")
        return []

def collect_events(helper, ew):
    """
    Day 10: Enterprise-Grade Flattened Architecture with Resilience.
    Handles Token Storms, JSON Truncation, and Splunk God-Object Anti-patterns.
    """
    helper.log_info("PEAK AI Hunter started the flattened execution cycle.")
    cycle_start_time = time.time()

    # Generate a unique Session ID to bind all flattened events together
    hunt_session_id = str(uuid.uuid4())
    helper.log_info(f"Initialized new Hunt Session ID: {hunt_session_id}")

    try:
        # ==========================================
        # STEP 1: Secure Configuration Setup
        # ==========================================
        session_key = getattr(helper, 'session_key', None)
        if not session_key and hasattr(helper, '_input_definition'):
            session_key = getattr(helper._input_definition, 'metadata', {}).get('session_key')
            
        if not session_key:
            helper.log_error("Failed to acquire session_key. Authentication failed.")
            return

        service = client.Service(token=session_key)
        target_index = helper.get_output_index() or "main"
        timestamp_now = datetime.datetime.utcnow().isoformat()

        # ==========================================
        # STEP 2: The Prepare Phase Blueprint & Immediate Ingestion
        # ==========================================
        mock_llm_json_string = """
        {
            "analysis": "Discovered rare MySQL 1045 errors indicating potential brute-force activity.",
            "hypotheses": [
                {
                    "hypothesis_id": 1,
                    "ABLE": {
                        "Actor": "External attacker",
                        "Behavior": "Database credential brute-forcing (T1110)",
                        "Location": "MySQL Database Server",
                        "Evidence": "High volume of status=1045 logs in a short timeframe"
                    },
                    "spl_round_1_validation": "search index=main status=1045 | stats count by src_ip | sort -count",
                    "spl_round_2_drilldown": "search index=main src_ip=192.168.1.10 | transaction maxspan=5m"
                }
            ]
        }
        """

        # Parse blueprint
        ai_hunting_plan = json.loads(mock_llm_json_string)
        hypotheses = ai_hunting_plan.get("hypotheses", [])

        # FLATTENING REVOLUTION 1: Write the plan to Splunk IMMEDIATELY
        plan_event = helper.new_event(
            source=helper.get_input_type(),
            index=target_index,
            sourcetype="_json",
            data=json.dumps({
                "session_id": hunt_session_id,
                "event_type": "PEAK_Plan",
                "timestamp": timestamp_now,
                "content": ai_hunting_plan
            }, ensure_ascii=False)
        )
        ew.write_event(plan_event)
        
        # ==========================================
        # STEP 3: The Execute Phase (Agentic Loop)
        # ==========================================
        all_hunt_evidence = [] 
        
        for i, hyp in enumerate(hypotheses):
            hyp_start_time = time.time() 
            helper.log_info(f"=== Executing Hunt for Hypothesis [{i+1}] ===")
            
            spl_r1 = hyp.get("spl_round_1_validation", "").replace("{target_index}", target_index)
            spl_r2 = hyp.get("spl_round_2_drilldown", "").replace("{target_index}", target_index)
            
            r1_results = execute_ai_spl(helper, service, spl_r1)
            r2_results = execute_ai_spl(helper, service, spl_r2)
            
            evidence_package = {
                "hypothesis_id": hyp.get("hypothesis_id", i+1),
                "threat_behavior": hyp['ABLE'].get('Behavior'),
                "round_1_hit_count": len(r1_results),
                "round_2_hit_count": len(r2_results),
                "execution_duration_sec": round(time.time() - hyp_start_time, 2)
            }
            all_hunt_evidence.append(evidence_package)

        # FLATTENING REVOLUTION 2: Write the collected evidence to Splunk
        evidence_event = helper.new_event(
            source=helper.get_input_type(),
            index=target_index,
            sourcetype="_json",
            data=json.dumps({
                "session_id": hunt_session_id,
                "event_type": "PEAK_Evidence",
                "timestamp": timestamp_now,
                "content": all_hunt_evidence
            }, ensure_ascii=False)
        )
        ew.write_event(evidence_event)

        # ==========================================
        # STEP 4: The Act Phase & Bulletproof Fallback
        # ==========================================
        helper.log_info("Triggering LLM API for Final Assessment...")
        
        # Mocking an LLM response (Imagine this was generated via requests.post with max_tokens)
        mock_act_response = """
        {
            "executive_summary": "Confirmed Threat: High volume of credential brute-forcing detected.",
            "threat_qualification": "Confirmed Threat",
            "risk_score": 92
        }
        """
        
        final_report = {}
        # BULLETPROOF FALLBACK: Handle JSON Truncation gracefully
        try:
            final_report = json.loads(mock_act_response.strip())
        except json.JSONDecodeError as e:
            helper.log_error(f"Token Storm / JSON Truncation detected! Preserving raw text. Detail: {str(e)}")
            # Rescue the raw, broken string instead of crashing the pipeline
            final_report = {
                "executive_summary": "SYSTEM ERROR: LLM Token Limit Exceeded or Output Truncated.",
                "threat_qualification": "Unknown",
                "risk_score": -1,
                "raw_unparsed_text": mock_act_response
            }

        # FLATTENING REVOLUTION 3: Write the final report as a standalone event
        report_event = helper.new_event(
            source=helper.get_input_type(),
            index=target_index,
            sourcetype="_json",
            data=json.dumps({
                "session_id": hunt_session_id,
                "event_type": "PEAK_Final_Report",
                "timestamp": timestamp_now,
                "content": final_report
            }, ensure_ascii=False)
        )
        ew.write_event(report_event)
        
        total_cycle_time = round(time.time() - cycle_start_time, 2)
        helper.log_info(f"SUCCESS: Flattened architecture pipeline completed in {total_cycle_time} seconds. Session ID: {hunt_session_id}")

    except Exception as e:
        helper.log_error(f"Critical Failure in Enterprise Architecture Workflow: {str(e)}")

🔍 极客验证:体验扁平化数据的"拼接魔法"

点击 AOB 右上角的 Test 按钮,并且点击 Save 保存代码。

以前你是在一条巨型 JSON 里找数据,现在数据被拆成了三条轻巧的日志。未来到了 Day 16 做大屏时,你要怎么把它们拼回来呢?

请前往 Splunk 的 Search & Reporting,执行这段极具艺术感的企业级查询:

spl 复制代码
index=main sourcetype="_json" event_type="PEAK_Plan" OR event_type="PEAK_Evidence" OR event_type="PEAK_Final_Report"
| stats 
    latest(content.risk_score) as Risk_Score,
    latest(content.executive_summary) as Summary,
    sum(content{}.round_1_hit_count) as Total_R1_Hits,
    sum(content{}.round_2_hit_count) as Total_R2_Hits
  by session_id
| sort - Risk_Score

🎉 架构重塑成功标志: 此时,你会看到 Splunk 以极其丝滑的速度瞬间返回结果!利用强大的 stats by session_id 机制,Splunk 在毫秒间就将这三个独立的日志重新拼接在了一起。

  • 如果大模型崩溃了Risk_Score 会显示为 -1,并且你能看到错误详情,但计划和证据依然安全地保存在硬盘里!
  • 性能飞跃:日志体积大幅减小,极大释放了 Splunk 索引器的内存压力,再也不用担心复杂 JSON 导致系统假死。

大功告成! 你的底层 Python 引擎现在不仅能跑,而且具备了"容错(Resilience)"与"解耦(Decoupling)"的企业级血统。我们随时准备迎接下一个挑战!

相关推荐
MediaTea1 小时前
人工智能通识课:深度学习
人工智能·深度学习
2601_949936961 小时前
2026电商运营个人能力提升计划进阶指南
大数据·人工智能
Surpass-HC1 小时前
添加CLAUDE.md规则
人工智能
Slow菜鸟1 小时前
AI 代码知识图谱 教程(一)| Codegraph(纯代码)
人工智能·知识图谱
薛定猫AI1 小时前
【深度解析】Claude Opus 4.8:高推理强度、Agentic Coding 与长任务工作流实战
人工智能
谁似人间西林客1 小时前
告别“手搓”时代:工艺智能如何解放工程师双手
人工智能
凯丨1 小时前
200 行 Python 训练一个 GPT:Karpathy 的极简主义 AI 教育实验
人工智能·python·gpt
波动几何2 小时前
工作流重构与社会生产关系的再组织——基于AI能力模型和第一性原理的分析框架
人工智能
2501_927283582 小时前
堆垛机立体库:告别人工翻找与货物堆压
大数据·人工智能·低代码·自动化·区块链