Kafka Message Queue Monitoring: Topic Lag, Throughput, Broker Load, and Full Consumer Group Observability
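This article in one sentence: on the day of a major e-commerce sale, the order Topic backed up by 2 million messages and all three consumers in the consumer group were down, yet CPU on the Kafka cluster sat at only 30%. Monitoring said "healthy" while the business was already failing.
kafka_exporter tells you that the problem is not in the machines; it is in the messages.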
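1. On sale day, the order Topic backed up by 2 million messages
During a Double 11 (Singles' Day) sale on an e-commerce platform, at 2 a.m. the on-call engineer received a flood of user complaints: "Payment succeeded, but the order status was not updated."
Operations logged in to the Kafka cluster and found all three Brokers at roughly 30% CPU, with normal memory and disks that were not full. Judged by server metrics, the cluster looked perfectly healthy. The business dashboard told a different story: consumer lag on the order Topic order_paid had reached 2 million messages and was still growing by 5,000 messages per second.
Checking the consumer group showed that all three consumers in the order_status_consumer group were offline, yet no alert had fired. Kafka itself will not tell you that the consuming side is down; it simply keeps storing messages until the retention policy deletes them or the disk fills up.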
[Figure: incident timeline, 00:00 to 00:50. Failure phase: all consumers in the group go offline; messages start to back up; lag reaches 100,000; lag reaches 1,000,000. Recovery phase: manual detection and intervention; consumers restarted; lag falls.]
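The incident took 47 minutes to detect and fix.
With a monitoring setup that shows, in real time, the lag of every Topic, the liveness of every consumer group, and the production rate against the consumption rate, detection time could have dropped from 47 minutes to about 3 minutes.
The kafka_exporter-based monitoring setup described below is built for exactly this kind of problem.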
2. How Kafka works
2.1 Message flow and how lag builds up
[Figure: message flow and lag. Producing side: a Producer writes to a Topic. Kafka Broker: the Topic is split into Partition 0, Partition 1, and Partition 2. Consuming side: Consumer 0, Consumer 1, and Consumer 2 read from the partitions.]
Lag = messages produced - messages consumed.
Lag > 0 and steadily growing means the consumers cannot keep up with the producers.
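The lag definition above can be made concrete with a short Python sketch. It assumes two hypothetical dictionaries, one with each partition's latest produced offset and one with the consumer group's committed offset; the function name and the numbers are illustrative only.

```python
def partition_lag(end_offsets, committed_offsets):
    """Return lag per partition: latest produced offset minus committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption of this sketch: a partition with no committed offset
        # is treated as entirely unconsumed.
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero, since the two offsets are sampled at slightly different times.
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a Topic with three partitions.
end_offsets = {0: 120_000, 1: 118_500, 2: 121_300}
committed_offsets = {0: 107_500, 1: 109_800, 2: 121_300}

lag_by_partition = partition_lag(end_offsets, committed_offsets)
total_lag = sum(lag_by_partition.values())
print(lag_by_partition, total_lag)
```

Partition 2 is fully caught up, so only partitions 0 and 1 contribute to the group's total lag.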
2.2 Use cases and monitoring focus
Kafka plays a different role in each business scenario, and the monitoring focus differs accordingly.
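| Business scenario | Typical Topics | Metrics that matter most | Why |
|---|---|---|---|
| E-commerce orders | order_created, order_paid | Consumer group lag, consumption rate | Lag directly leaves order status out of sync |
| Log collection | app_logs, access_log | Production throughput, Broker disk | A surge in log volume can fill the disk |
| Financial transactions | transaction_raw | End-to-end latency, partition Leader distribution | Latency-sensitive; Leader skew hurts performance |
| Data synchronization | cdc_user, cdc_order | Lag growth trend, consumer group membership | A sync outage delays data in the data warehouse |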
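At one leading e-commerce platform, before Kafka monitoring was configured, lag incidents caused by offline consumer groups occurred on average 3 times a month, with an average recovery time of 45 minutes. Once full monitoring and alerting were in place, detection time for the same kind of incident dropped to 2 minutes and recovery time to 8 minutes.
2.3 Monitoring architecture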

[Figure: monitoring architecture. Kafka cluster (Broker 1, Broker 2, Broker 3) feeds the metrics collection layer: kafka_exporter (lag, throughput, consumer groups) and JMX Exporter (JVM, disk, request rate). Both are scraped by Prometheus (metric storage and alerting), which feeds the visualization layer: Grafana dashboards and alert notifications.]
- Orange: the two collectors, kafka_exporter (consumer group lag and Topic throughput) and JMX Exporter (Broker-internal JVM and disk metrics)
- Green: the storage and visualization layer, Prometheus + Grafana
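Together, the two exporters show both "how many messages in the queue are still unconsumed" and "whether the Brokers are about to buckle."
3. Deploying kafka_exporter in practice
3.1 Containerized deployment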
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      containers:
        - name: exporter
          image: danielqsj/kafka-exporter:latest  # pin a specific version tag in production
          args:
            - --kafka.server=kafka-broker-1:9092
            - --kafka.server=kafka-broker-2:9092
            - --kafka.server=kafka-broker-3:9092
            - --kafka.version=2.8.0
            - --topic.filter=.*   # monitor all Topics
            - --group.filter=.*   # monitor all consumer groups
            # SASL is off by default, so no flag is needed here; pass
            # --sasl.enabled (with credentials) only if the cluster requires it.
          ports:
            - containerPort: 9308
              name: metrics
```
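Parameter notes:
| Parameter | Description |
|---|---|
| --kafka.server | Can be given several times to list multiple Brokers; the exporter discovers the rest of the cluster automatically |
| --topic.filter | Regular expression that selects which Topics to monitor; in production, consider leaving out internal Topics such as __consumer_offsets |
| --group.filter | Same idea: a regular expression that selects which consumer groups to monitor |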
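As a rough illustration of how such filters select names, the Python sketch below applies an include pattern and an optional exclude pattern to a list of Topic names. The helper `select_topics` and the sample names are hypothetical, and this is not the exporter's own code. kafka_exporter uses Go's regexp engine, which has no lookahead, so dropping internal Topics is usually easier with a separate exclude pattern than with a single include regex; recent kafka_exporter releases document a `--topic.exclude` flag for this, but check the version you run.

```python
import re


def select_topics(topics, include=r".*", exclude=r"^$"):
    """Keep names that match `include` and do not match `exclude`."""
    include_re = re.compile(include)
    exclude_re = re.compile(exclude)
    # search() is unanchored, so anchors must be written explicitly in the pattern.
    return [t for t in topics
            if include_re.search(t) and not exclude_re.search(t)]


topics = ["order_created", "order_paid", "app_logs", "__consumer_offsets"]

# The default patterns keep everything, internal Topics included.
everything = select_topics(topics)
# Excluding names that start with a double underscore drops internal Topics.
business_only = select_topics(topics, exclude=r"^__")
# An include pattern narrows monitoring to the order Topics.
orders_only = select_topics(topics, include=r"^order_")
print(business_only, orders_only)
```

The patterns used here avoid lookahead on purpose so that they stay valid in both Python and Go syntax.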
3.2 Prometheus scrape configuration
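```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-exporter-monitor
spec:
  selector:
    matchLabels:
      app: kafka-exporter
  endpoints:
    - port: metrics
      interval: 30s        # scraping lag every 30 seconds is sufficient
      scrapeTimeout: 10s
```
A ServiceMonitor selects Services rather than Pods, so this configuration assumes a Service labeled app: kafka-exporter that exposes a port named metrics in front of the exporter Pods.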
3.3 Verifying the metrics
Run the following query in the Prometheus UI:
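```promql
# Lag of every consumer group (kafka_exporter names this metric kafka_consumergroup_lag)
kafka_consumergroup_lag
# Should return something like:
# {consumergroup="order-processor", topic="order-events", partition="0"} 12500
# {consumergroup="order-processor", topic="order-events", partition="1"} 8700
```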
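Before Prometheus is wired up, the same check can be made against the exporter's raw /metrics text. The Python sketch below parses a hypothetical sample of that output and sums lag per consumer group and Topic. The sample values are made up, and the parser is deliberately simplified: it ignores escaped quotes inside label values and optional timestamps.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of the exporter's /metrics output (Prometheus text format).
SAMPLE = '''\
# TYPE kafka_consumergroup_lag gauge
kafka_consumergroup_lag{consumergroup="order-processor",partition="0",topic="order-events"} 12500
kafka_consumergroup_lag{consumergroup="order-processor",partition="1",topic="order-events"} 8700
kafka_brokers 3
'''

# metric name, optional {labels}, whitespace, value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text, metric):
    """Yield (labels, value) for every sample of `metric` in the text."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match and match.group(1) == metric:
            labels = dict(LABEL_RE.findall(match.group(2) or ""))
            yield labels, float(match.group(3))


# Equivalent of summing lag by (consumergroup, topic) across partitions.
lag_by_group = defaultdict(float)
for labels, value in parse_samples(SAMPLE, "kafka_consumergroup_lag"):
    lag_by_group[(labels["consumergroup"], labels["topic"])] += value
print(dict(lag_by_group))
```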
4. Core metrics and PromQL
4.1 Topic lag: the most important alerting metric
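```promql
# Total lag of one consumer group on one Topic (summed over all partitions)
sum by (consumergroup, topic) (kafka_consumergroup_lag)
# Lag growth rate (messages of backlog added per second).
# Lag is a gauge that can fall as well as rise, so use deriv() rather than rate().
sum by (consumergroup, topic) (deriv(kafka_consumergroup_lag[5m]))
```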
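A nonzero lag does not by itself indicate a problem; a small, stable backlog is normal. Alerts should focus on:
- Lag on a single partition > 10,000
- Lag that grows by more than 5,000 within 5 minutes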
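The two alert conditions above can be sketched as a small Python check over lag samples for a single partition. The thresholds come from the list above; the function name, the sample format, and the sample values are assumptions made for illustration.

```python
PARTITION_LAG_LIMIT = 10_000   # per-partition lag threshold
GROWTH_LIMIT = 5_000           # allowed lag growth...
GROWTH_WINDOW_S = 5 * 60       # ...within a 5-minute window


def lag_alerts(samples):
    """samples: list of (timestamp_seconds, lag) for one partition, oldest first."""
    reasons = []
    if not samples:
        return reasons
    latest_ts, latest_lag = samples[-1]
    if latest_lag > PARTITION_LAG_LIMIT:
        reasons.append("partition lag above threshold")
    # Compare against the oldest sample that still falls inside the window.
    in_window = [(ts, lag) for ts, lag in samples
                 if latest_ts - ts <= GROWTH_WINDOW_S]
    growth = latest_lag - in_window[0][1]
    if growth > GROWTH_LIMIT:
        reasons.append("lag grew too fast")
    return reasons


stable_backlog = [(0, 800), (150, 790), (300, 810)]
fast_growth = [(0, 1_000), (150, 4_000), (300, 7_000)]
high_but_flat = [(0, 12_000), (300, 12_100)]
print(lag_alerts(stable_backlog), lag_alerts(fast_growth), lag_alerts(high_but_flat))
```

A small, steady backlog triggers nothing, which matches the point that nonzero lag alone is not an incident.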
4.2 Throughput monitoring: production rate vs. consumption rate
[Figure: normal vs. abnormal state. Normal state: production rate 5000 msg/s, consumption rate 5000 msg/s, lag stable ✅. Abnormal state (consumers offline): production rate 5000 msg/s, consumption rate 0 msg/s, lag keeps growing ❌.]
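```promql
# Topic production rate (messages/second).
# The window is 2m so that it always spans several 30s scrapes.
sum by (topic) (rate(kafka_topic_partition_current_offset[2m]))
# Consumption rate
sum by (consumergroup, topic) (rate(kafka_consumergroup_current_offset[2m]))
```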
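Rule of thumb: when production rate - consumption rate > 0 for 10 minutes in a row, the consumers cannot keep up with the producers and lag will keep growing.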
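As a worked example of this rule, the Python sketch below compares per-interval production and consumption rates computed from offset samples and reports whether production stayed ahead for at least 10 minutes. The function name, the sample layout, and the offsets are assumptions; the 5,000 msg/s figure mirrors the incident described earlier.

```python
def falling_behind(produced, consumed, min_duration_s=600):
    """produced/consumed: lists of (timestamp_s, summed_offset) sampled at the
    same timestamps, oldest first. True if production outpaced consumption in
    every interval across a span of at least min_duration_s."""
    if len(produced) < 2 or len(produced) != len(consumed):
        return False
    span = produced[-1][0] - produced[0][0]
    if span < min_duration_s:
        return False  # not enough history to call the gap sustained
    for i in range(1, len(produced)):
        dt = produced[i][0] - produced[i - 1][0]
        produce_rate = (produced[i][1] - produced[i - 1][1]) / dt
        consume_rate = (consumed[i][1] - consumed[i - 1][1]) / dt
        if produce_rate - consume_rate <= 0:
            return False  # consumers kept up during this interval
    return True


# Producers write 5,000 msg/s for 10 minutes (samples every 5 minutes).
produced = [(0, 0), (300, 1_500_000), (600, 3_000_000)]
healthy = [(0, 0), (300, 1_500_000), (600, 3_000_000)]   # consumers keep pace
offline = [(0, 0), (300, 0), (600, 0)]                   # consumers are down

print(falling_behind(produced, healthy), falling_behind(produced, offline))
```

With consumers at 0 msg/s and producers at 5,000 msg/s, those 10 minutes alone add 3,000,000 messages of lag.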
4.3 Broker load
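```promql
# Log size on disk in bytes (exposed via JMX Exporter; the exact name depends on your JMX rules)
kafka_log_log_size
# Number of Brokers in the cluster (should equal the expected node count).
# kafka_brokers already holds the count, so it is queried directly.
kafka_brokers
# Leader distribution (ideally each Broker leads a similar number of partitions).
# kafka_topic_partition_leader stores the leader's Broker ID as its value, so count by value.
count_values("broker", kafka_topic_partition_leader)
```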
Alert: a sudden drop in kafka_brokers means a Broker has gone offline.
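To make "Leader distribution is even" measurable, the Python sketch below counts partition Leaders per Broker and derives a simple skew ratio. The mapping of partitions to Leader IDs is hypothetical, and the skew ratio is an illustrative measure rather than anything defined by Kafka or kafka_exporter.

```python
from collections import Counter


def leaders_per_broker(partition_leaders):
    """partition_leaders: {(topic, partition): leader_broker_id}."""
    return Counter(partition_leaders.values())


def leader_skew(counts, expected_brokers):
    """Busiest Broker's Leader count divided by the even share; 1.0 is perfectly even."""
    even_share = sum(counts.values()) / len(expected_brokers)
    return max(counts.get(b, 0) for b in expected_brokers) / even_share


# Hypothetical Leader assignment: Broker 1 leads four of six partitions.
partition_leaders = {
    ("order_paid", 0): 1, ("order_paid", 1): 1, ("order_paid", 2): 1,
    ("order_created", 0): 1, ("order_created", 1): 2, ("order_created", 2): 3,
}
expected_brokers = [1, 2, 3]

counts = leaders_per_broker(partition_leaders)
skew = leader_skew(counts, expected_brokers)
print(dict(counts), skew)
```

A skew of 2.0 here means Broker 1 carries twice its even share of Leaders, which is the kind of imbalance the query above is meant to surface.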
4.4 Consumer group state and Rebalance
[Figure: consumer group state diagram. Stable (group running steadily, consuming normally) moves to Rebalancing when a consumer joins or leaves; Rebalancing (consumption paused while partitions are reassigned) returns to Stable once reassignment completes; Stable moves to Dead when all consumers go offline (lag grows without bound and manual intervention is needed).]
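```promql
# Number of members in the consumer group (0 means no active consumers)
kafka_consumergroup_members
# Rebalance activity: kafka_exporter's documented metrics do not include a
# rebalance-state gauge, so frequent membership changes serve as a proxy here
changes(kafka_consumergroup_members[10m])
```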
The most dangerous signal: kafka_consumergroup_members suddenly dropping from 3 to 0 means every consumer is offline and lag will grow without bound.
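The "dropped to 0" signal can be told apart from a group that never had members with a check like the Python sketch below. It assumes a plain list of consecutive member-count readings; the function name and the readings are hypothetical.

```python
def group_went_dark(member_counts):
    """member_counts: consecutive member-count readings, oldest first.
    True when the group had members earlier and the latest reading is 0."""
    if not member_counts:
        return False
    had_members = any(n > 0 for n in member_counts[:-1])
    return member_counts[-1] == 0 and had_members


print(group_went_dark([3, 3, 0]))   # all consumers just went offline
print(group_went_dark([3, 2, 3]))   # a rebalance, but consumers are still present
print(group_went_dark([0, 0, 0]))   # never active in this window
```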
5. Key Grafana dashboard configuration
5.1 Lag leaderboard (table)
Show the 5 Topic and consumer group pairs with the highest current lag:
```promql
topk(5, sum by (consumergroup, topic) (kafka_consumergroup_lag))
```
Render it as a table sorted by lag in descending order, so you can see at a glance who is falling behind.
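For readers less familiar with topk, the Python sketch below reproduces the ranking on a hypothetical dictionary of total lag per consumer group and Topic pair. The group names and lag values are invented for the example.

```python
import heapq


def top_lagging(lag_by_pair, k=5):
    """lag_by_pair: {(consumergroup, topic): total_lag}.
    Returns the k pairs with the highest lag, largest first."""
    return heapq.nlargest(k, lag_by_pair.items(), key=lambda item: item[1])


# Hypothetical totals, already summed over partitions.
lag_by_pair = {
    ("order-processor", "order-events"): 21_200,
    ("order_status_consumer", "order_paid"): 2_000_000,
    ("log-shipper", "app_logs"): 350,
    ("cdc-sync", "cdc_order"): 48_000,
}

leaderboard = top_lagging(lag_by_pair, k=2)
print(leaderboard)
```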
5.2 Production vs. consumption rate (line chart)
Plot two lines on the same chart:
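```promql
# Production rate (green)
sum(rate(kafka_topic_partition_current_offset{topic="order-events"}[2m]))
# Consumption rate (red), restricted to the same Topic so the two lines are comparable
sum(rate(kafka_consumergroup_current_offset{consumergroup="order-processor", topic="order-events"}[2m]))
```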
Figure: Grafana line chart example (rate comparison). Green line: production rate, level with the consumption line under normal conditions. Red line: consumption rate, where a drop signals a consumer problem. The two lines pulling apart means the backlog is growing.
When the red line falls below the green line, the backlog grows. This chart is easier to read than the raw Lag number alone.
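The relationship between the two lines can be sketched in a few lines of Python. This is only an illustration: the sample values are hypothetical, and `per_second_rate` is a simplified stand-in for what PromQL's rate() does (it ignores counter resets and extrapolation).

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative offset)


def per_second_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time range")
    return (v1 - v0) / (t1 - t0)


def backlog_growth(produced: Sequence[Sample], consumed: Sequence[Sample]) -> float:
    """Messages per second by which the backlog grows (negative means draining)."""
    return per_second_rate(produced) - per_second_rate(consumed)


# Hypothetical one-minute window: producers write 5000 msg/s,
# consumers commit only 3000 msg/s.
produced = [(0.0, 1_000_000.0), (60.0, 1_300_000.0)]
consumed = [(0.0, 900_000.0), (60.0, 1_080_000.0)]

growth = backlog_growth(produced, consumed)
print(f"backlog grows by {growth:.0f} messages per second")
```

A positive result corresponds to the red line sitting below the green line on the chart.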
5.3 Broker leader distribution (bar chart)
promql
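count_values("broker", kafka_topic_partition_leader)
Show the number of leader partitions on each broker as a bar chart. In kafka_exporter the value of kafka_topic_partition_leader is the leader's broker ID, so count_values groups the partitions by that value and writes it into a broker label. If one broker is clearly higher than the rest, leadership is unevenly distributed and a preferred replica election is needed (kafka-preferred-replica-election, or kafka-leader-election with --election-type preferred on Kafka 2.4 and later, where the older tool is deprecated).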
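To make "clearly higher than the rest" concrete, here is a minimal Python sketch of the same check. The partition-to-leader mapping is hypothetical, and the 1.5x tolerance is an assumption of this sketch rather than a Kafka default.

```python
from collections import Counter
from typing import Dict, List, Tuple


def leaders_per_broker(partition_leaders: Dict[Tuple[str, int], int]) -> Counter:
    """Count how many partitions each broker ID currently leads."""
    return Counter(partition_leaders.values())


def overloaded_brokers(counts: Counter, tolerance: float = 1.5) -> List[int]:
    """Brokers leading more than `tolerance` times the average leader count.

    Brokers that lead no partition at all do not appear in `counts`,
    so the average is taken over brokers that lead at least one.
    """
    average = sum(counts.values()) / len(counts)
    return sorted(b for b, n in counts.items() if n > tolerance * average)


# Hypothetical topic with 9 partitions: broker 1 leads 6 of them.
leader_ids = [1, 1, 1, 1, 1, 1, 2, 2, 3]
partition_leaders = {("order-events", p): b for p, b in enumerate(leader_ids)}

counts = leaders_per_broker(partition_leaders)
print(dict(counts))
print(overloaded_brokers(counts))
```

With an average of 3 leaders per broker, only broker 1 exceeds the 4.5 cutoff, which is the bar that would stand out on the chart.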
5.4 Consumer group liveness (status light)
promql
kafka_consumer_group_members{consumergroup="order-processor"}
Green when the value is above 0, red when it is 0. This surfaces the problem earlier than watching Lag.
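The mapping behind the status light is simple enough to write down. In the sketch below, treating a missing value as red is an assumption added for illustration, on the reasoning that no data about a group deserves the same attention as no consumers.

```python
from typing import Optional


def status_light(members: Optional[int]) -> str:
    """Map a consumer group member count to a panel color."""
    # Assumption of this sketch: a missing value (None) is shown as red too.
    if members is None or members <= 0:
        return "red"
    return "green"


lights = [status_light(m) for m in (3, 0, None)]
print(lights)
```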
6. Alert Rules
Flowchart: metrics collected from Kafka pass through the checks below. A "yes" fires the named alert, and the cluster counts as healthy when every answer is "no".
| Check | Alert | Meaning |
|---|---|---|
| Lag > 50000? | KafkaHighLag | Topic backlog |
| Lag growth rate > 1000/s? | KafkaLagIncreasing | Backlog growing fast |
| Consumer group members = 0? | KafkaNoConsumer | All consumers offline |
| Broker count dropped? | KafkaBrokerDown | Broker offline |
| Disk usage > 90%? | KafkaDiskFull | Disk space |
yaml
groups:
  - name: kafka-alerts
    rules:
      # Backlog above the threshold
      - alert: KafkaHighLag
        expr: sum by (consumergroup, topic) (kafka_consumer_lag) > 50000
        for: 5m
        annotations:
          summary: "Topic {{ $labels.topic }} has a backlog of {{ $value }} messages"
      # Backlog growing too fast (lag is a gauge, so use deriv() rather than rate())
      - alert: KafkaLagIncreasing
        expr: sum by (consumergroup) (deriv(kafka_consumer_lag[5m])) > 1000
        for: 3m
        annotations:
          summary: "Consumer group {{ $labels.consumergroup }} is falling behind by {{ $value }} messages per second"
      # All consumers offline
      - alert: KafkaNoConsumer
        expr: kafka_consumer_group_members == 0
        for: 1m
        annotations:
          summary: "Consumer group {{ $labels.consumergroup }} has no active consumers"
      # Broker offline (kafka_brokers already holds the broker count; 3 is the expected cluster size)
      - alert: KafkaBrokerDown
        expr: kafka_brokers < 3
        for: 1m
        annotations:
          summary: "Too few Kafka brokers, currently {{ $value }}"
      # Disk almost full (log sizes summed per broker; assumes a broker label and a 500 GB disk)
      - alert: KafkaDiskFull
        expr: sum by (broker) (kafka_log_log_size) / 1024^3 > 450
        for: 10m
        annotations:
          summary: "Broker {{ $labels.broker }} is using more than 450 GB of disk"
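The KafkaLagIncreasing rule asks how fast lag is growing. Lag is a gauge that can move in both directions, so one way to think about "growth per second" is the slope of a straight line fitted through recent samples, which is how PromQL's deriv() is documented to work (simple linear regression). A minimal sketch of that calculation, with hypothetical lag samples:

```python
from typing import Sequence, Tuple


def slope_per_second(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of value over time, in units per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must have distinct timestamps")
    return numerator / denominator


# Hypothetical lag samples, one per minute, growing by 90000 per minute.
lag_samples = [(0.0, 10_000.0), (60.0, 100_000.0), (120.0, 190_000.0), (180.0, 280_000.0)]

growth = slope_per_second(lag_samples)
fires = growth > 1000  # same threshold as the rule's expression
print(growth, fires)
```

Unlike a counter-style rate, the slope goes negative when the backlog drains, so a recovering consumer group does not look like a growing one.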
7. A Complete Backlog Incident Investigation
7.1 Troubleshooting flow
Flowchart: alert fires (order-events backlog surges) → Step 1: lag ranking with sum by (consumergroup) (kafka_consumer_lag), order-processor contributes 83000 → Step 2: consumption rate with rate(kafka_consumer_group_current_offset...), the rate is 0 → Step 3: group members with kafka_consumer_group_members, the count is 0 → Conclusion: all consumers are offline, this is not slow consumption → Step 4: check downstream Pod status, all Pods restarted due to OOM → Fix: restart the downstream service, the consumer group rebalances automatically → Result: Lag starts to fall and returns to normal.
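7.2 Step-by-step details
| Step | Action | Finding |
|---|---|---|
| Step 1 | Check the lag ranking: sum by (consumergroup) (kafka_consumer_lag) | order-processor contributes 83000 |
| Step 2 | Check the consumption rate: rate(kafka_consumer_group_current_offset...) | The rate is 0, nothing is being consumed |
| Step 3 | Check the consumer group members: kafka_consumer_group_members{...} | The result is 0 |
| Conclusion | All consumers are offline; this is not a slow-consumption problem | --- |
| Step 4 | Check the downstream service | All of its Pods restarted due to OOM, and the consumers did not get back into the group in time |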
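The order of these checks can be captured as a small decision function. This is a sketch under stated assumptions: the 50000 threshold comes from the KafkaHighLag rule, while the `GroupSnapshot` type and the "stalled" and "too slow" branches are hypothetical additions that the incident itself never reached.

```python
from dataclasses import dataclass


@dataclass
class GroupSnapshot:
    lag: float           # total lag of the consumer group
    consume_rate: float  # committed offsets per second
    members: int         # active members in the group


def diagnose(snapshot: GroupSnapshot, lag_threshold: float = 50_000) -> str:
    """Walk the checks in the same order as Steps 1 to 3."""
    if snapshot.lag <= lag_threshold:
        return "healthy"
    if snapshot.members == 0:
        return "consumers offline"
    if snapshot.consume_rate == 0:
        return "consumers stalled"  # hypothetical branch: members present, no progress
    return "consumers too slow"     # hypothetical branch: progress, but below production


# The incident above: 83000 lag, zero consumption, zero members.
incident = GroupSnapshot(lag=83_000, consume_rate=0.0, members=0)
print(diagnose(incident))
```

Checking membership before rate matters: a rate of 0 alone cannot distinguish consumers that are stuck from consumers that are gone.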
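7.3 Resolution
After the downstream service was restarted, the consumer group rebalanced automatically and Lag started to fall.
7.4 Retrospective
Had there been an alert on consumer group membership, it would have fired the moment the consumers went offline, instead of after the backlog had grown to 80,000 messages.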
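A rough calculation shows why the membership alert wins. The sketch below assumes lag starts at 0, production is steady, nothing is consumed, and scrape and rule evaluation intervals are ignored. The production rate of 200 messages per second is purely hypothetical, not a figure from this incident.

```python
def lag_alert_delay(produce_rate: float,
                    threshold: float = 50_000,
                    for_seconds: float = 300) -> float:
    """Seconds from consumers going offline until a lag-threshold alert fires.

    Time to accumulate `threshold` messages plus the rule's `for` duration.
    """
    if produce_rate <= 0:
        raise ValueError("produce_rate must be positive")
    return threshold / produce_rate + for_seconds


MEMBER_ALERT_DELAY = 60.0  # the `for: 1m` of a members == 0 rule

# Hypothetical steady production rate of 200 messages per second.
lag_delay = lag_alert_delay(200.0)
print(f"lag alert after {lag_delay:.0f}s, membership alert after {MEMBER_ALERT_DELAY:.0f}s")
```

The slower the topic, the wider this gap becomes, because the lag threshold takes longer to reach while the membership check stays at one minute.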
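8. Summary
When monitoring Kafka, server CPU and memory are the least informative metrics: they can look perfectly normal while the backlog climbs to hundreds of thousands of messages.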
Diagram: kafka_exporter core capabilities and the alert each one drives. Lag monitoring (Lag > 50000 → alert), throughput comparison (consumption rate is 0 → alert), consumer group status (member count is 0 → urgent alert), Broker load (Broker offline → alert).
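Core capabilities that kafka_exporter provides:
| Dimension | What it tells you |
|---|---|
| Lag | Who is backing up, and by how much |
| Throughput comparison | Whether production or consumption is faster |
| Consumer group status | Whether the consumers are still alive |
| Broker load | Whether the cluster is balanced |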
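Remember: a Kafka cluster that is "alive" is not necessarily "healthy". When consumers go offline, the cluster itself raises no alarm; it just quietly keeps storing your messages until the disk fills up. Make kafka_exporter your first line of defense.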