Server down again? Analyze Windows event logs automatically with Python and find the root cause in 3 minutes

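It is 3 a.m. and the alert fires: the server is not responding. You SSH in and find several gigabytes of logs, and after paging through them by hand until dawn you still have no cause. What if Python could get you there in about 3 minutes?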

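Why automate log analysis?

The Windows Event Log records almost every significant event on a system:

Log | Contents | Importance
System | System startup/shutdown, driver errors, service state | ⭐⭐⭐⭐⭐
Application | Application crashes, .NET exceptions | ⭐⭐⭐⭐
Security | Logon attempts, permission changes, audit events | ⭐⭐⭐⭐⭐
Setup | System updates, patch installation | ⭐⭐⭐
ForwardedEvents | Events forwarded from other machines | ⭐⭐⭐⭐

Pain points of paging through logs by hand in Event Viewer:

  • The volume is huge; a single blue screen can produce more than ten thousand events
  • Events cannot be correlated across machines
  • There is no smart filtering, so the key information drowns in noise
  • Trend analysis and early warning are not possible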

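Comparing the options: three ways to read the log

# Option 1: win32evtlog (fastest, but requires pywin32)
# Option 2: PowerShell Get-WinEvent (no extra Python packages)
# Option 3: wevtutil on the command line (most widely available, but awkward to parse)

We cover all three, but recommend Option 1 (win32evtlog) for production use.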
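
One way to act on that ordering is to probe at startup for what is actually available and fall back from Option 1 to Option 3. The sketch below is only an illustration: `pick_reader` and the labels it returns are hypothetical names, not part of any library.

```python
import importlib.util
import shutil


def pick_reader():
    """Return the name of the first usable backend, in order of preference."""
    # Option 1: pywin32 is installed if the win32evtlog module can be located
    if importlib.util.find_spec("win32evtlog") is not None:
        return "win32evtlog"
    # Option 2: a PowerShell executable is on PATH
    if shutil.which("powershell") or shutil.which("pwsh"):
        return "powershell"
    # Option 3: the wevtutil command-line tool is on PATH
    if shutil.which("wevtutil"):
        return "wevtutil"
    return None


print(pick_reader())
```

On a machine with none of the three, the function returns `None`, which a caller can treat as "event log reading is not supported here".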

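Option 1: reading directly with win32evtlog (recommended)

import win32evtlog
import win32evtlogutil
import datetime

# EventType values reported by the legacy event log API that win32evtlog wraps
EVENT_LEVELS = {
    0: "INFORMATION",     # EVENTLOG_SUCCESS
    1: "ERROR",           # EVENTLOG_ERROR_TYPE
    2: "WARNING",         # EVENTLOG_WARNING_TYPE
    4: "INFORMATION",     # EVENTLOG_INFORMATION_TYPE
    8: "AUDIT_SUCCESS",   # EVENTLOG_AUDIT_SUCCESS
    16: "AUDIT_FAILURE",  # EVENTLOG_AUDIT_FAILURE
}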

def read_event_log(
    log_name="System",
    server=None,
    count=100,
    level=None,
    event_id=None,
    source=None
):
    """
    Read records from a Windows event log.
    log_name: a classic log such as System / Application / Security
    server: remote computer name (None = local machine)
    count: maximum number of matching records to return
    level: EventType to keep (1=Error, 2=Warning, 4=Information,
           8=Audit Success, 16=Audit Failure)
    event_id / source: optional exact-match filters
    """
    hand = win32evtlog.OpenEventLog(server, log_name)
    # Read sequentially, newest record first
    flags = (
        win32evtlog.EVENTLOG_BACKWARDS_READ
        | win32evtlog.EVENTLOG_SEQUENTIAL_READ
    )

    events = []

    try:
        while len(events) < count:
            events_batch = win32evtlog.ReadEventLog(hand, flags, 0)
            if not events_batch:
                break

            for event in events_batch:
                evt_level = event.EventType  # 1=Error, 2=Warning, 4=Info
                # The low 16 bits of EventID hold the code shown in Event Viewer
                evt_code = event.EventID & 0xFFFF

                # Apply the optional filters
                if level is not None and evt_level != level:
                    continue
                if event_id is not None and evt_code != event_id:
                    continue
                if source and event.SourceName != source:
                    continue

                # On Python 3, pywin32 returns TimeGenerated as a datetime
                # subclass. Normalize it to a naive local-time value so it
                # compares cleanly with datetime.now(); parsing the output of
                # Format() with no pattern is locale-dependent and brittle.
                evt_time = event.TimeGenerated
                if evt_time.tzinfo is not None:
                    evt_time = evt_time.astimezone()
                evt_time = evt_time.replace(tzinfo=None)

                events.append({
                    "time": evt_time,
                    "level": EVENT_LEVELS.get(evt_level, f"UNKNOWN({evt_level})"),
                    "source": event.SourceName,
                    "event_id": evt_code,
                    "computer": event.ComputerName,
                    "message": win32evtlogutil.SafeFormatMessage(event, log_name),
                })

                if len(events) >= count:
                    break
    finally:
        # Always release the handle, even if reading or formatting fails
        win32evtlog.CloseEventLog(hand)

    return events


# Example: read the 50 most recent System errors (EventType 1 = Error)
errors = read_event_log("System", count=50, level=1)
print(f"Found {len(errors)} system errors:\n")
for e in errors[:10]:
    print(f"[{e['time']}] {e['level']} | {e['source']} | "
          f"EventID: {e['event_id']}")
    print(f"  {e['message'][:120]}...")
    print()
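
The `event.EventID & 0xFFFF` mask in the reader is worth a closer look. The legacy API reports a 32-bit event identifier whose upper bits carry severity and facility flags, while Event Viewer shows only the low 16 bits. The sketch below splits an identifier into those fields; `split_event_id` is a hypothetical helper written for illustration, and the sample value is constructed by hand rather than taken from a real log.

```python
def split_event_id(raw_event_id):
    """Break a 32-bit event identifier into its bit fields."""
    raw = raw_event_id & 0xFFFFFFFF  # normalize a negative signed value
    return {
        "severity": raw >> 30,            # top 2 bits
        "customer": (raw >> 29) & 0x1,    # 1 = application-defined
        "facility": (raw >> 16) & 0xFFF,  # 12-bit facility code
        "code": raw & 0xFFFF,             # the number shown in Event Viewer
    }


# 0x40001B7C: severity bits 01 (informational), facility 0, code 7036
fields = split_event_id(0x40001B7C)
print(fields["code"])  # 7036
```

Filtering on the masked code is what lets `event_id=7036` match regardless of which severity bits the source set.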

Option 2: bridging through PowerShell (no pywin32 required)

If installing pywin32 is not practical in your environment, you can call PowerShell from Python instead:

import subprocess
import json

def get_events_via_powershell(
    log_name="System",
    max_events=100,
    level=None,
    start_time=None
):
    """
    Read events through PowerShell's Get-WinEvent.
    level uses Get-WinEvent numbering: 1=Critical, 2=Error, 3=Warning,
    4=Information, 5=Verbose.
    start_time is a date string that PowerShell can cast to [datetime].
    Returns a list of dicts.
    """
    # Build a -FilterHashtable argument. LogName / Level / StartTime are
    # hashtable keys; they are not valid inside an XPath -FilterXml query.
    filter_parts = [f"LogName='{log_name}'"]
    if level:
        filter_parts.append(f"Level={int(level)}")
    if start_time:
        filter_parts.append(f"StartTime=[datetime]'{start_time}'")
    filter_hashtable = "@{" + "; ".join(filter_parts) + "}"

    # Emit TimeCreated as an ISO 8601 string so the JSON does not depend on
    # how the PowerShell version serializes DateTime values
    ps_cmd = (
        f"Get-WinEvent -FilterHashtable {filter_hashtable} "
        f"-MaxEvents {int(max_events)} | "
        "Select-Object @{n='TimeCreated';e={$_.TimeCreated.ToString('o')}}, "
        "LevelDisplayName, ProviderName, Id, MachineName, Message | "
        "ConvertTo-Json -Depth 3"
    )

    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", ps_cmd],
        capture_output=True, text=True, timeout=60
    )

    if result.returncode != 0:
        print(f"PowerShell error: {result.stderr}")
        return []

    try:
        data = json.loads(result.stdout)
    except json.JSONDecodeError:
        return []
    # ConvertTo-Json emits a bare object rather than an array for a single event
    if isinstance(data, dict):
        data = [data]
    return data


# Example: fetch the most recent System errors (Get-WinEvent Level 2 = Error)
events = get_events_via_powershell("System", max_events=20, level=2)
for e in events:
    print(f"[{e['TimeCreated']}] {e['LevelDisplayName']} | "
          f"{e['ProviderName']} | ID: {e['Id']}")
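
The `TimeCreated` field comes back as a string, and its shape depends on the PowerShell version and on how the property was selected: Windows PowerShell 5.1 typically serializes a DateTime as `/Date(milliseconds)/`, while an explicitly formatted value is an ISO 8601 string. The sketch below accepts both; `parse_ps_time` is a hypothetical helper, and the two inputs are made-up examples of each shape.

```python
import re
from datetime import datetime, timedelta, timezone

_MS_DATE = re.compile(r"^/Date\((-?\d+)\)/$")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ps_time(value):
    """Convert a TimeCreated string from ConvertTo-Json into a datetime."""
    match = _MS_DATE.match(value)
    if match:
        # Milliseconds since the Unix epoch, in UTC
        return _EPOCH + timedelta(milliseconds=int(match.group(1)))
    # ISO 8601: trim fractional seconds to 6 digits and spell out a trailing Z
    text = re.sub(r"(\.\d{6})\d+", r"\1", value)
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


print(parse_ps_time("/Date(1714964400000)/"))
print(parse_ps_time("2024-05-06T03:00:00.1234567+08:00"))
```

Trimming the fraction matters because .NET's round-trip format writes seven fractional digits, which older Python versions of `datetime.fromisoformat` reject.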

Option 3: the wevtutil command line (most widely available)

import subprocess

def export_events_wevtutil(
    log_name="System",
    output_file="events.evtx"
):
    """
    Export a log with `wevtutil epl`. The export is a binary .evtx file
    (not XML) that Event Viewer can open.
    """
    # Pass arguments as a list so names with spaces need no shell quoting
    result = subprocess.run(
        ["wevtutil", "epl", log_name, output_file],
        capture_output=True, text=True
    )

    if result.returncode == 0:
        print(f"Log exported to: {output_file}")
        return True
    print(f"Export failed: {result.stderr}")
    return False

def query_events_wevtutil(log_name="System", count=50):
    """
    Query events with wevtutil as plain text, newest first.
    """
    result = subprocess.run(
        ["wevtutil", "qe", log_name, f"/c:{count}", "/f:text", "/rd:true"],
        capture_output=True, text=True
    )
    return result.stdout
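
The text output is the awkward part of this option. Asking wevtutil for XML (`/f:xml`) gives something a standard parser can handle. The sketch below assumes the query output is a run of `<Event>` elements with no enclosing root and no XML declaration, and wraps them before parsing; `parse_wevtutil_xml` is a hypothetical helper, and the sample record is hand-written in the shape of an event, not real log data.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def parse_wevtutil_xml(xml_text):
    """Parse concatenated <Event> elements into a list of dicts."""
    # The elements have no single root, so wrap them in one
    root = ET.fromstring(f"<Events>{xml_text}</Events>")
    records = []
    for event in root.findall("e:Event", NS):
        system = event.find("e:System", NS)
        provider = system.find("e:Provider", NS)
        created = system.find("e:TimeCreated", NS)
        records.append({
            "event_id": int(system.findtext("e:EventID", default="0", namespaces=NS)),
            "level": int(system.findtext("e:Level", default="0", namespaces=NS)),
            "source": provider.get("Name") if provider is not None else "",
            "time": created.get("SystemTime") if created is not None else "",
            "computer": system.findtext("e:Computer", default="", namespaces=NS),
        })
    return records


SAMPLE = (
    "<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
    "<System>"
    "<Provider Name='Service Control Manager'/>"
    "<EventID Qualifiers='49152'>7034</EventID>"
    "<Level>2</Level>"
    "<TimeCreated SystemTime='2024-05-06T03:00:00.000000000Z'/>"
    "<Computer>WEB01</Computer>"
    "</System>"
    "</Event>"
)

records = parse_wevtutil_xml(SAMPLE)
print(records[0]["event_id"], records[0]["source"])
```

In practice the input would be the stdout of a `wevtutil qe` call made with `/f:xml` in place of `/f:text`.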

Hands-on 1: an anomaly detection engine

Paging through logs by hand is too slow; we need an engine that surfaces anomalies on its own:

from collections import Counter, defaultdict
from datetime import datetime, timedelta
import re

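class EventLogAnalyzer:
    """Analysis engine for Windows event logs"""

    # Well-known event IDs and what they mean. IDs are only unique per
    # source, so the same number from another provider can mean something else.
    KNOWN_EVENTS = {
        # Crashes and shutdowns
        41: "Kernel power error (unexpected power loss / blue screen)",
        1001: "Windows Error Reporting fault bucket",
        1074: "System restart/shutdown",
        6008: "Unexpected shutdown (previous shutdown was not clean)",
        6009: "System startup (records the OS version)",
        6013: "System uptime",

        # Services
        7036: "Service state change",
        7040: "Service start type changed",
        7034: "Service terminated unexpectedly",
        7031: "Service crashed and was restarted automatically",

        # Disk
        7: "Bad sector detected",
        9: "Device timeout",
        11: "Drive controller error",
        51: "Disk paging error",
        57: "Disk sector repair",

        # Network
        2020: "Network connection restored after an interruption",

        # DNS client
        1014: "DNS resolution failure",

        # Security
        4625: "Logon failure",
        4624: "Logon success",
        4634: "Logoff",
        4648: "Logon with explicit credentials",
        4720: "User account created",
        4732: "Member added to a local group",
        4740: "Account locked out",
        6416: "New external device detected",
    }

    def __init__(self):
        self.events = []
        self.stats = defaultdict(Counter)

    def load_events(self, events):
        """Load a list of event dicts"""
        self.events = events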

    def analyze(self, hours=24):
        """Run the full analysis and return a report dict"""
        now = datetime.now()
        cutoff = now - timedelta(hours=hours)
        recent = [
            e for e in self.events
            if isinstance(e.get("time"), datetime) and e["time"] >= cutoff
        ]

        # The report keys are read by print_report and generate_html_report,
        # so they keep their original names
        report = {
            "时间范围": f"last {hours} hours",
            "总事件数": len(recent),
            "summary": {},
            "alerts": [],
            "top_errors": [],
            "top_sources": [],
        }

        # 1. Count by level
        level_counts = Counter(e.get("level", "UNKNOWN") for e in recent)
        report["summary"]["按级别统计"] = dict(level_counts)

        # 2. Flag well-known events logged as errors or warnings
        for e in recent:
            eid = e.get("event_id")
            if eid in self.KNOWN_EVENTS and e.get("level") in ("ERROR", "WARNING"):
                report["alerts"].append({
                    "time": e["time"],
                    "event_id": eid,
                    "description": self.KNOWN_EVENTS[eid],
                    "source": e.get("source", ""),
                    "message": e.get("message", "")[:200],
                })

        # 3. Top error events
        error_events = [e for e in recent if e.get("level") == "ERROR"]
        error_counter = Counter(
            (e.get("event_id"), e.get("source"))
            for e in error_events
        )
        report["top_errors"] = [
            {"event_id": eid, "source": src, "count": cnt,
             "description": self.KNOWN_EVENTS.get(eid, "unknown")}
            for (eid, src), cnt in error_counter.most_common(10)
        ]

        # 4. Top error sources
        source_counter = Counter(e.get("source", "Unknown") for e in error_events)
        report["top_sources"] = source_counter.most_common(10)

        # 5. Detect anomalous patterns
        report["patterns"] = self._detect_patterns(recent)

        return report

    def _detect_patterns(self, events):
        """Detect anomalous patterns"""
        patterns = []

        # Pattern 1: many errors in a short time (possible attack or failure)
        error_times = sorted(
            e["time"] for e in events
            if e.get("level") == "ERROR" and isinstance(e.get("time"), datetime)
        )
        # Slide a 10-event window over the sorted times; range(len - 9) also
        # covers the case of exactly 10 errors
        for i in range(len(error_times) - 9):
            if (error_times[i + 9] - error_times[i]).total_seconds() < 60:
                patterns.append({
                    "type": "Error storm",
                    "severity": "HIGH",
                    "detail": "10+ error events within 1 minute",
                })
                break

        # Pattern 2: a service that keeps restarting
        service_restarts = Counter()
        for e in events:
            if e.get("event_id") == 7031:
                # Pull the service name out of the message. The wording is
                # localized; this pattern assumes English-language Windows.
                msg = e.get("message") or ""
                match = re.search(r"The (.+?) service terminated unexpectedly", msg)
                if match:
                    service_restarts[match.group(1)] += 1

        for svc, cnt in service_restarts.items():
            if cnt >= 3:
                patterns.append({
                    "type": "Frequent service restarts",
                    "severity": "HIGH",
                    "detail": f"Service '{svc}' restarted {cnt} times during the analysis window",
                })

        # Pattern 3: repeated logon failures (possible brute force).
        # Event 4625 is written to the Security log, so load that log to use this check.
        login_failures = Counter()
        for e in events:
            if e.get("event_id") == 4625:
                login_failures[e.get("computer", "Unknown")] += 1

        for comp, cnt in login_failures.items():
            if cnt >= 5:
                patterns.append({
                    "type": "Suspected brute force",
                    "severity": "CRITICAL",
                    "detail": f"{comp}: {cnt} failed logons",
                })

        # Pattern 4: unexpected shutdowns
        unexpected_shutdowns = sum(
            1 for e in events if e.get("event_id") == 6008
        )
        if unexpected_shutdowns > 0:
            patterns.append({
                "type": "Unexpected shutdown",
                "severity": "HIGH",
                "detail": f"Detected {unexpected_shutdowns} unexpected shutdown(s)",
            })

        return patterns

    def print_report(self, report):
        """Print the analysis report"""
        print("=" * 60)
        print("  Windows Event Log Analysis Report")
        print(f"  {report['时间范围']}")
        print("=" * 60)

        print(f"\n📊 Total events: {report['总事件数']}")
        print("\n📋 By level:")
        for level, count in report["summary"]["按级别统计"].items():
            icon = {"ERROR": "🔴", "WARNING": "🟡", "INFORMATION": "🟢"}.get(level, "⚪")
            print(f"  {icon} {level}: {count}")

        if report["alerts"]:
            print(f"\n🚨 Key events ({len(report['alerts'])}):")
            for alert in report["alerts"][:10]:
                print(f"  [{alert['time']}] EventID {alert['event_id']}")
                print(f"  {alert['description']}")
                print()

        if report["patterns"]:
            print(f"\n⚡ Detected {len(report['patterns'])} anomalous pattern(s):")
            for p in report["patterns"]:
                severity_icon = {"CRITICAL": "🔴", "HIGH": "🟠", "MEDIUM": "🟡"}.get(p["severity"], "⚪")
                print(f"  {severity_icon} [{p['type']}] {p['detail']}")

        if report["top_errors"]:
            print("\n📈 Top error events:")
            for item in report["top_errors"][:5]:
                print(f"  EventID {item['event_id']} ({item['source']}): "
                      f"{item['count']} occurrences - {item['description']}")

        print("\n" + "=" * 60)


# Usage example
events = read_event_log("System", count=500)
analyzer = EventLogAnalyzer()
analyzer.load_events(events)
report = analyzer.analyze(hours=24)
analyzer.print_report(report)
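
The "error storm" check is a sliding-window count over sorted timestamps. The standalone sketch below shows the same idea with a `deque` so it can be tried without a Windows machine; `find_bursts` is a hypothetical helper, its defaults mirror the 10-events-in-60-seconds rule, and the timestamps are synthetic.

```python
from collections import deque
from datetime import datetime, timedelta


def find_bursts(timestamps, threshold=10, window=timedelta(seconds=60)):
    """Return the start time of each burst of `threshold` events within `window`."""
    bursts = []
    recent = deque()
    in_burst = False
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop events that fall outside the window ending at ts
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            if not in_burst:
                bursts.append(recent[0])
                in_burst = True
        else:
            in_burst = False
    return bursts


# Synthetic data: 12 errors 2 s apart, then 3 isolated errors an hour later
base = datetime(2024, 5, 6, 3, 0, 0)
stamps = [base + timedelta(seconds=2 * i) for i in range(12)]
stamps += [base + timedelta(hours=1, minutes=5 * i) for i in range(3)]
print(find_bursts(stamps))
```

Unlike a check that stops at the first hit, this version reports every separate burst, which helps when a fault recurs several times in one night.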

Hands-on 2: aggregating logs across machines

Production environments usually call for analyzing several servers at once:

import win32evtlog
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timedelta

def collect_logs_from_servers(
    servers, log_name="System", hours=24, max_workers=10
):
    """
    Collect event logs from several servers in parallel.
    Returns {server_name: [events]}
    """
    results = {}
    cutoff = datetime.now() - timedelta(hours=hours)

    def fetch_from_server(server):
        try:
            events = read_event_log(log_name, server=server, count=200)
        except Exception as exc:
            # Report the failure instead of silently returning nothing
            print(f"  ✗ {server}: {exc}")
            return server, []
        # Keep only events inside the time window
        filtered = [
            e for e in events
            if isinstance(e.get("time"), datetime) and e["time"] >= cutoff
        ]
        return server, filtered

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            executor.submit(fetch_from_server, s): s
            for s in servers
        }

        for future in as_completed(futures):
            server, events = future.result()
            results[server] = events
            print(f"  ✓ {server}: fetched {len(events)} events")

    return results


def find_common_errors(server_events_map):
    """
    Find the errors shared by every server that reported at least one error.
    """
    from collections import Counter

    # Count errors per server; servers without errors are left out
    error_sets = {}
    for server, events in server_events_map.items():
        errors = Counter(
            (e.get("event_id"), e.get("source"))
            for e in events if e.get("level") == "ERROR"
        )
        if errors:
            error_sets[server] = errors

    # Intersect the (event_id, source) keys across servers
    servers = list(error_sets.keys())
    common = None

    for server in servers:
        current_keys = set(error_sets[server].keys())
        if common is None:
            common = current_keys
        else:
            common &= current_keys

    if common:
        print("\n🔍 Errors shared across servers:")
        for eid, src in sorted(common):
            desc = EventLogAnalyzer.KNOWN_EVENTS.get(eid, "unknown")
            print(f"  EventID {eid} ({src}): {desc}")
            for server in servers:
                count = error_sets[server].get((eid, src), 0)
                print(f"    {server}: {count} occurrences")

    return common
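
A strict intersection comes up empty as soon as one server lacks the error. A looser variant, sketched below, reports errors seen on at least a given number of servers. `errors_on_at_least` is a hypothetical helper, and the server names and events are synthetic, shaped like the dicts used throughout this article.

```python
from collections import Counter


def errors_on_at_least(server_events_map, min_servers=2):
    """Return {(event_id, source): number_of_servers} for widely seen errors."""
    seen_on = Counter()
    for events in server_events_map.values():
        # Build a set so each server counts once per error key
        keys = {
            (e.get("event_id"), e.get("source"))
            for e in events if e.get("level") == "ERROR"
        }
        seen_on.update(keys)
    return {key: n for key, n in seen_on.items() if n >= min_servers}


sample = {
    "web01": [
        {"event_id": 7034, "source": "Service Control Manager", "level": "ERROR"},
        {"event_id": 11, "source": "disk", "level": "ERROR"},
    ],
    "web02": [
        {"event_id": 7034, "source": "Service Control Manager", "level": "ERROR"},
    ],
    "db01": [
        {"event_id": 6013, "source": "EventLog", "level": "INFORMATION"},
    ],
}
print(errors_on_at_least(sample))
```

With `min_servers` set to the number of servers, this reduces to the strict "common to all" case.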

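Hands-on 3: trend analysis and alerting

import time
from collections import Counter
from datetime import datetime, timedelta

class LogMonitor:
    """Polling monitor for an event log"""

    def __init__(self, log_name="System", check_interval=60):
        self.log_name = log_name
        self.check_interval = check_interval
        self.last_check_time = datetime.now()
        self.baseline = Counter()  # baseline: error counts during normal operation
        self.baseline_hours = 24   # length of the baseline window in hours
        self.alert_threshold = 3   # alert when a count exceeds N times the baseline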

    def calibrate(self, duration_hours=24):
        """
        Calibrate the baseline: count errors over a period of normal operation.
        Best run while the system is healthy.
        """
        print(f"Calibrating baseline (analyzing the past {duration_hours} hours of logs)...")
        events = read_event_log(self.log_name, count=5000)
        cutoff = datetime.now() - timedelta(hours=duration_hours)
        # Remember the window length so counts can be turned into hourly rates
        self.baseline_hours = duration_hours
        self.baseline.clear()

        for e in events:
            if isinstance(e.get("time"), datetime) and e["time"] >= cutoff:
                if e.get("level") == "ERROR":
                    key = (e.get("event_id"), e.get("source"))
                    self.baseline[key] += 1

        print(f"Baseline calibrated from {sum(self.baseline.values())} errors")
        return self.baseline

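    def check_new_events(self):
        """Return events generated since the previous check, newest first"""
        hand = win32evtlog.OpenEventLog(None, self.log_name)
        flags = (
            win32evtlog.EVENTLOG_BACKWARDS_READ
            | win32evtlog.EVENTLOG_SEQUENTIAL_READ
        )

        new_events = []
        reached_old = False
        try:
            # Keep reading batches until an already-seen record turns up
            while not reached_old:
                batch = win32evtlog.ReadEventLog(hand, flags, 0)
                if not batch:
                    break

                for event in batch:
                    # Normalize to a naive local-time datetime; parsing the
                    # output of Format() with no pattern is locale-dependent
                    evt_time = event.TimeGenerated
                    if evt_time.tzinfo is not None:
                        evt_time = evt_time.astimezone()
                    evt_time = evt_time.replace(tzinfo=None)

                    if evt_time <= self.last_check_time:
                        # Records arrive newest first, so the rest are older
                        reached_old = True
                        break
                    new_events.append({
                        "time": evt_time,
                        "level": EVENT_LEVELS.get(event.EventType, "INFO"),
                        "source": event.SourceName,
                        "event_id": event.EventID & 0xFFFF,
                    })
        finally:
            win32evtlog.CloseEventLog(hand)

        if new_events:
            self.last_check_time = new_events[0]["time"]

        return new_events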

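    def evaluate(self, new_events):
        """Decide whether the new events are abnormal compared with the baseline"""
        errors = [
            e for e in new_events if e.get("level") == "ERROR"
        ]
        current = Counter(
            (e.get("event_id"), e.get("source")) for e in errors
        )
        # The baseline holds totals for the whole calibration window, so turn
        # it into an hourly average before comparing
        hours = getattr(self, "baseline_hours", 24)

        alerts = []
        for key, count in current.items():
            baseline_per_hour = self.baseline.get(key, 0) / hours
            # count covers one check interval; comparing it with the hourly
            # average is conservative when the interval is shorter than an hour
            if baseline_per_hour > 0 and count > baseline_per_hour * self.alert_threshold:
                alerts.append({
                    "severity": "HIGH",
                    "key": key,
                    "current": count,
                    "baseline_avg": baseline_per_hour,  # hourly average
                    "description": (
                        f"EventID {key[0]} ({key[1]}): {count} now, "
                        f"baseline average {baseline_per_hour:.1f}/hour"
                    ),
                })

        return alerts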

    def run_once(self):
        """Run a single check"""
        new_events = self.check_new_events()
        if not new_events:
            print("  No new events")
            return []

        print(f"  Found {len(new_events)} new events, "
              f"{sum(1 for e in new_events if e['level']=='ERROR')} of them errors")

        alerts = self.evaluate(new_events)
        if alerts:
            print("  ⚠️ Anomalies detected:")
            for a in alerts:
                print(f"    {a['description']}")

        return alerts

    def start_monitoring(self):
        """Start continuous monitoring"""
        print("Monitoring started...")
        print("Press Ctrl+C to stop\n")

        try:
            while True:
                now = datetime.now().strftime("%H:%M:%S")
                print(f"[{now}] Checking the {self.log_name} log...")
                self.run_once()
                time.sleep(self.check_interval)
        except KeyboardInterrupt:
            print("\nMonitoring stopped")


# Usage example
monitor = LogMonitor("System", check_interval=60)
monitor.calibrate(duration_hours=24)
# monitor.start_monitoring()  # enable in production
monitor.run_once()  # single check
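
Counts taken over windows of different lengths are only comparable once they share a unit. A stricter alternative to comparing raw counts is to convert both the current interval and the baseline to events per hour first. The sketch below does exactly that; `is_anomalous` and its parameters are hypothetical, and the numbers are made up for illustration.

```python
def is_anomalous(current_count, interval_seconds,
                 baseline_count, baseline_hours, factor=3.0):
    """Compare two counts after converting both to events per hour."""
    if baseline_count <= 0 or baseline_hours <= 0 or interval_seconds <= 0:
        return False  # no usable baseline to compare against
    current_rate = current_count * 3600 / interval_seconds
    baseline_rate = baseline_count / baseline_hours
    return current_rate > factor * baseline_rate


# 48 errors over 24 h is 2/hour; 5 errors in a 10-minute check is 30/hour
print(is_anomalous(5, 600, 48, 24))   # True
print(is_anomalous(1, 3600, 48, 24))  # False
```

Rate-based comparison is more sensitive with short check intervals, so the multiplier may need to be higher than 3 to avoid noisy alerts.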

Hands-on 4: IIS log analysis

For anyone running web servers, IIS log analysis is a must:

from pathlib import Path
from collections import Counter
import re

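class IISLogAnalyzer:
    """Analyzer for IIS W3C logs"""

    # Fallback field list (a common layout), used only when a file has no
    # "#Fields:" directive
    FIELDS = [
        "date", "time", "s-ip", "cs-method", "cs-uri-stem",
        "cs-uri-query", "s-port", "cs-username", "c-ip",
        "cs(User-Agent)", "cs(Referer)", "sc-status",
        "sc-substatus", "sc-win32-status", "time-taken",
    ]

    def parse_log(self, log_file):
        """Parse an IIS log file into a list of dicts"""
        events = []
        fields = self.FIELDS

        with open(log_file, "r", encoding="utf-8", errors="ignore") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                if line.startswith("#"):
                    # "#Fields:" names the columns that follow; the set is
                    # configurable per site, so prefer it over the fallback
                    if line.startswith("#Fields:"):
                        fields = line[len("#Fields:"):].split()
                    continue

                parts = line.split()
                if len(parts) != len(fields):
                    continue  # skip malformed rows

                events.append(dict(zip(fields, parts)))

        return events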

    def analyze(self, events):
        """Analyze parsed IIS log entries"""
        report = {}

        # 1. Status code distribution
        status_codes = Counter(e.get("sc-status", "0") for e in events)
        report["status_codes"] = status_codes.most_common()

        # 2. Top request paths
        top_urls = Counter(
            e.get("cs-uri-stem", "") for e in events
        )
        report["top_urls"] = top_urls.most_common(20)

        # 3. Top client IPs
        top_ips = Counter(e.get("c-ip", "") for e in events)
        report["top_ips"] = top_ips.most_common(20)

        # 4. Response times (time-taken is in milliseconds)
        response_times = []
        for e in events:
            try:
                rt = int(e.get("time-taken", 0))
            except (ValueError, TypeError):
                continue  # e.g. "-" when the field is empty
            if rt > 0:
                response_times.append(rt)

        if response_times:
            response_times.sort()
            report["response_time"] = {
                "mean": sum(response_times) / len(response_times),
                "median": response_times[len(response_times) // 2],
                "p95": response_times[int(len(response_times) * 0.95)],
                "max": max(response_times),
                "count": len(response_times),
            }

        # 5. Slow requests (over 3 seconds), counted from the parsed values
        # so a non-numeric field cannot raise here
        report["slow_requests"] = sum(1 for rt in response_times if rt > 3000)

        # 6. 4xx/5xx error details
        error_pages = Counter(
            (e.get("sc-status"), e.get("cs-uri-stem"))
            for e in events
            if e.get("sc-status", "").startswith(("4", "5"))
        )
        report["error_pages"] = error_pages.most_common(10)

        return report

    def print_report(self, report):
        """Print the analysis report"""
        print("=" * 60)
        print("  IIS Log Analysis Report")
        print("=" * 60)

        print("\n📊 Status code distribution:")
        for code, count in report["status_codes"][:10]:
            print(f"  {code}: {count}")

        if "response_time" in report:
            rt = report["response_time"]
            print("\n⏱️ Response time (ms):")
            print(f"  mean: {rt['mean']:.0f}ms | "
                  f"median: {rt['median']}ms | "
                  f"P95: {rt['p95']}ms")

        print(f"\n🐢 Slow requests (>3s): {report['slow_requests']}")

        print("\n📈 Top 5 request paths:")
        for url, count in report["top_urls"][:5]:
            print(f"  {url}: {count} requests")

        print("\n🔴 Top 5 error pages:")
        for (code, url), count in report["error_pages"][:5]:
            print(f"  [{code}] {url}: {count} requests")

        print("\n🌐 Top 5 client IPs:")
        for ip, count in report["top_ips"][:5]:
            print(f"  {ip}: {count} requests")


# Usage
analyzer = IISLogAnalyzer()
# events = analyzer.parse_log(r"C:\inetpub\logs\LogFiles\W3SVC1\u_ex260506.log")
# report = analyzer.analyze(events)
# analyzer.print_report(report)
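
The P95 figure in the report is an index into the sorted list. For readers who want the rule spelled out, here is a minimal sketch of the nearest-rank percentile, one common definition among several. `percentile` is a hypothetical helper, and the sample values are synthetic `time-taken` readings in milliseconds.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


samples = [120, 80, 95, 3100, 110, 105, 90, 100, 85, 130]
print(percentile(samples, 50))  # 100
print(percentile(samples, 95))  # 3100
```

With only ten samples, a single slow request dominates P95, which is a reminder to read percentiles alongside the request count.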

Automated report generation

Export the analysis results as an HTML report:

def generate_html_report(report, output_file="event_report.html"):
    """Generate the analysis report as HTML"""
    html = f"""<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Windows Event Log Analysis Report</title>
    <style>
        body {{ font-family: 'Segoe UI', sans-serif; background: #1a1a2e; color: #eee; padding: 20px; }}
        h1 {{ color: #00d4ff; border-bottom: 2px solid #00d4ff; padding-bottom: 10px; }}
        .section {{ background: #16213e; border-radius: 8px; padding: 20px; margin: 20px 0; }}
        .alert {{ background: #2d1f1f; border-left: 4px solid #ff4444; padding: 10px 15px; margin: 5px 0; }}
        .warning {{ background: #2d2d1f; border-left: 4px solid #ffaa00; padding: 10px 15px; margin: 5px 0; }}
        table {{ width: 100%; border-collapse: collapse; }}
        th, td {{ padding: 8px 12px; text-align: left; border-bottom: 1px solid #333; }}
        th {{ color: #00d4ff; }}
        .stat {{ display: inline-block; background: #0f3460; padding: 10px 20px; border-radius: 5px; margin: 5px; }}
        .stat .num {{ font-size: 24px; font-weight: bold; color: #00d4ff; }}
    </style>
</head>
<body>
    <h1>Windows Event Log Analysis Report</h1>
    <p>Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>

    <div class="section">
        <h2>Overview</h2>
        <div class="stat"><div class="num">{report['总事件数']}</div>Total events</div>
"""
    # Add the per-level counts
    level_icons = {"ERROR": "🔴", "WARNING": "🟡", "INFORMATION": "🟢"}
    for level, count in report["summary"]["按级别统计"].items():
        icon = level_icons.get(level, "⚪")
        html += f'        <div class="stat"><div class="num">{icon} {count}</div>{level}</div>\n'
    # Close the overview section opened in the template above
    html += '    </div>\n'

    # Add the anomalous patterns
    if report["patterns"]:
        html += '    <div class="section"><h2>Anomalous patterns</h2>\n'
        for p in report["patterns"]:
            css = "alert" if p["severity"] in ("HIGH", "CRITICAL") else "warning"
            html += f'    <div class="{css}"><strong>[{p["type"]}]</strong> {p["detail"]}</div>\n'
        html += '    </div>\n'

    html += '</body></html>'

    with open(output_file, "w", encoding="utf-8") as f:
        f.write(html)

    print(f"Report written to: {output_file}")
    return output_file


# Generate the report
# generate_html_report(report)
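
Text that originates in log messages can contain characters such as `<` or `&`, which would be interpreted as markup if written into the page unchanged. A small sketch of escaping each value with the standard library's `html.escape` before interpolation follows; `render_pattern` is a hypothetical helper and the input dict is made up.

```python
from html import escape


def render_pattern(pattern):
    """Render one pattern dict as an HTML fragment with its text escaped."""
    css = "alert" if pattern["severity"] in ("HIGH", "CRITICAL") else "warning"
    return (
        f'<div class="{css}"><strong>[{escape(pattern["type"])}]</strong> '
        f'{escape(pattern["detail"])}</div>'
    )


fragment = render_pattern({
    "type": "Error storm",
    "severity": "HIGH",
    "detail": "service <svc> & co",
})
print(fragment)
```

Importing the function by name avoids a clash with a local variable called `html`, which the report generator uses for the page text.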

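Summary

Need | Approach | Advantage
Read local logs | win32evtlog | Fastest, most complete
No pywin32 | PowerShell bridge | No extra Python packages
Cross-machine collection | win32evtlog (remote) + thread pool | Efficient in bulk
Anomaly detection | Statistical baseline + threshold | Surfaces faults automatically
IIS logs | File parsing + statistics | Essential for website troubleshooting
Report output | Generated HTML | Easy to share

Log analysis is a core IT operations skill. The Windows event log holds most of what you need for troubleshooting; what matters is how quickly you can find the one event that counts.

Coming next: Windows patch management is the lifeline of security compliance. We will look at how to use Python for automated patch scanning, installation, and compliance reporting.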