Kafka Message Queue Monitoring: Full Observability of Topic Backlog, Throughput, Broker Load, and Consumer Groups


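The article in one sentence: on the day of a major e-commerce sale, the order Topic was 2 million messages behind and all three consumers in the consumer group had stalled, yet CPU usage on the Kafka cluster was only 30%. Monitoring said "normal" while the business was already down. kafka_exporter tells you that the problem is not in the machines, it is in the messages.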


1. On the Day of the Big Sale, the Order Topic Fell 2 Million Messages Behind

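During the Double 11 (Singles' Day) sale at an e-commerce platform, at 2 a.m., the on-call staff received a flood of user complaints: "Payment succeeded, but the order status has not been updated."

The ops team logged in to the Kafka cluster and found CPU usage on all three Brokers at about 30%, memory normal, and disks not full. Judged by server monitoring alone, the cluster was "very healthy". On the business dashboard, however, the consumer Lag on the order Topic order_paid had already reached 2 million messages and was still growing by 5,000 messages per second.

A check of the consumer group showed that all three consumers in the order_status_consumer group were offline, yet not a single alert had fired. Kafka itself will not tell you that the consuming side is down. It simply keeps storing messages until the disk fills up.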
[Figure: incident timeline (Gantt chart, 00:00 to 00:50). Events in order: all consumers in the group go offline; messages start to back up; Lag reaches 100,000; Lag reaches 1,000,000; the problem is found manually and staff step in; consumers are restarted; Lag falls. Phases: failure, recovery.]

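It took 47 minutes for this incident to be detected and fixed.

With a monitoring setup that shows, in real time, each Topic's backlog, each consumer group's liveness, and the production rate against the consumption rate, the time to detect this incident could have dropped from 47 minutes to 3 minutes.

The kafka_exporter-based monitoring approach below is designed to solve exactly this kind of problem.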


2. How Kafka Works

2.1 Message Flow and How a Backlog Builds Up

[Figure: message flow. Producer side: Producer. Kafka Broker: a Topic with Partition 0, Partition 1, and Partition 2. Consumer side: Consumer 0, Consumer 1, and Consumer 2.]

Backlog (Lag): Lag = messages produced - messages consumed. A Lag above 0 that keeps growing means the consumers cannot keep up with the producers.
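
As a hedged illustration of the formula Lag = produced - consumed, the PrometheusRule sketch below recomputes Lag per partition from two offsets that kafka_exporter exposes: the Topic's latest offset (kafka_topic_partition_current_offset) and the group's committed offset (kafka_consumergroup_current_offset). The rule name, the namespace, and the recorded series name are assumptions made for illustration, and the expression assumes a single exporter instance scrapes the cluster.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-lag-derived            # hypothetical name
  namespace: monitoring              # assumed namespace
spec:
  groups:
  - name: kafka-lag-derived
    rules:
    # Lag = produced - consumed, per consumer group, Topic, and partition.
    # group_right keeps the consumergroup label from the right-hand side,
    # because several groups can read the same partition.
    - record: kafka:consumergroup_lag:from_offsets   # hypothetical series name
      expr: |
        kafka_topic_partition_current_offset
          - on (topic, partition) group_right
        kafka_consumergroup_current_offset
```

kafka_exporter already publishes this difference as kafka_consumergroup_lag, so a rule like this is mainly useful for checking your understanding of where the number comes from.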

2.2 Use Cases and What to Monitor in Each

Kafka plays a different role in each business scenario, and the monitoring priorities differ accordingly.

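| Business scenario | Typical Topics | Metrics that matter most | Why |
| --- | --- | --- | --- |
| E-commerce orders | order_created, order_paid | Consumer group Lag, consumption rate | A backlog directly leaves order status out of sync |
| Log collection | app_logs, access_log | Production throughput, Broker disk | A surge in log volume can fill the disk |
| Financial transactions | transaction_raw | End-to-end latency, partition Leader distribution | Latency-sensitive; Leader skew hurts performance |
| Data synchronization | cdc_user, cdc_order | Lag growth trend, consumer group membership | A sync interruption delays data in the data warehouse |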

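At one leading e-commerce platform, before Kafka monitoring was configured, consumer groups going offline caused an average of 3 backlog incidents per month, with an average recovery time of 45 minutes. Once full monitoring and alerting were in place, detection time for the same kind of incident dropped to 2 minutes and recovery time to 8 minutes.

2.3 Monitoring Architecture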


[Figure: monitoring architecture, in four layers. Kafka cluster: Broker 1, Broker 2, Broker 3. Metrics collection: kafka_exporter (Lag / throughput / consumer groups) and JMX Exporter (JVM / disk / request rate). Storage and alerting: Prometheus (metric storage + alerting). Visualization: Grafana dashboards and alert notifications.]

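  • Orange: the two collectors, kafka_exporter (consumer group Lag and Topic throughput) and JMX Exporter (Broker-internal JVM and disk metrics)
  • Green: the Prometheus + Grafana storage and display layer

Together, the two Exporters show both "how many messages in the queue have not been consumed" and "whether the Brokers are about to buckle".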


3. Deploying kafka_exporter in Practice

3.1 Containerized Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      containers:
      - name: exporter
        image: danielqsj/kafka-exporter:latest   # consider pinning a specific tag in production
        args:
        - --kafka.server=kafka-broker-1:9092
        - --kafka.server=kafka-broker-2:9092
        - --kafka.server=kafka-broker-3:9092
        - --kafka.version=2.8.0
        - --topic.filter=.*          # monitor all Topics
        - --group.filter=.*          # monitor all consumer groups
        - --sasl.enabled=false
        ports:
        - containerPort: 9308
          name: metrics

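Parameter notes:

| Parameter | Description |
| --- | --- |
| --kafka.server | Can be given several times to list multiple Brokers; the Exporter discovers the rest of the cluster automatically |
| --topic.filter | Regex that selects which Topics to monitor; in production, consider leaving out internal Topics (such as __consumer_offsets) |
| --group.filter | Same idea for consumer groups: a regex that selects which groups to monitor |

Recent kafka_exporter releases also document --topic.exclude and --group.exclude for leaving items out; check --help on the version you run.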

3.2 Prometheus Scrape Configuration

yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-exporter-monitor
spec:
  selector:
    matchLabels:
      app: kafka-exporter
  endpoints:
  - port: metrics
    interval: 30s        # scraping Lag metrics every 30 s is sufficient
    scrapeTimeout: 10s
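
A ServiceMonitor selects Service objects by label, and its port field names a Service port, so the Deployment in 3.1 needs a Service in front of it before Prometheus can scrape it. A minimal sketch, assuming the Service is named kafka-exporter and the ServiceMonitor is created in the same monitoring namespace (both are assumptions, not part of the original setup):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-exporter        # assumed name
  namespace: monitoring
  labels:
    app: kafka-exporter       # matched by the ServiceMonitor's selector.matchLabels
spec:
  selector:
    app: kafka-exporter       # pods created by the Deployment in 3.1
  ports:
  - name: metrics             # must equal the ServiceMonitor's endpoints[].port
    port: 9308
    targetPort: metrics       # the named containerPort on the exporter pod
```

If the ServiceMonitor lives in a different namespace from the Service, it also needs a namespaceSelector, and many Prometheus Operator installs only pick up ServiceMonitors that carry a particular label, so check your Prometheus resource's serviceMonitorSelector.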

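3.3 Verifying the Metrics

Run this query in the Prometheus UI:

promql
# Lag of every consumer group (kafka_exporter exposes it as kafka_consumergroup_lag)
kafka_consumergroup_lag

# It should return something like:
# {consumergroup="order-processor", topic="order-events", partition="0"} 12500
# {consumergroup="order-processor", topic="order-events", partition="1"} 8700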

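4. Core Metrics and PromQL

4.1 Topic Backlog: the Most Important Alerting Metric

promql
# Total Lag of one consumer group on one Topic (summed over all partitions)
sum by (consumergroup, topic) (kafka_consumergroup_lag)

# Lag growth rate (messages of backlog added per second)
# Lag is a gauge that can fall as well as rise, so use deriv() rather than rate()
sum by (consumergroup, topic) (deriv(kafka_consumergroup_lag[5m]))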

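A non-zero Lag does not by itself mean something is wrong; a small backlog that stays stable over the long term is normal. Alerts should focus on:

  • A single Partition with Lag > 10000
  • Lag growing by more than 5000 within 5 minutes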
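
The two conditions above can be sketched as Prometheus alerting rules. This is a minimal example rather than a tuned production config: the rule names, severities, and the 2-minute hold time are assumptions, and the second rule assumes the growth threshold applies to a group's total Lag on a Topic rather than to each partition.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-lag-alerts             # hypothetical name
  namespace: monitoring
spec:
  groups:
  - name: kafka-lag
    rules:
    # A single partition is more than 10000 messages behind.
    - alert: KafkaPartitionLagHigh
      expr: kafka_consumergroup_lag > 10000
      for: 2m                        # assumed hold time to ride out short spikes
      labels:
        severity: warning
      annotations:
        summary: "Lag above 10000 on {{ $labels.topic }}/{{ $labels.partition }} for group {{ $labels.consumergroup }}"
    # Total Lag of a group on a Topic grew by more than 5000 in 5 minutes.
    # delta() suits a gauge such as Lag; rate() is meant for counters.
    - alert: KafkaLagGrowingFast
      expr: sum by (consumergroup, topic) (delta(kafka_consumergroup_lag[5m])) > 5000
      labels:
        severity: critical
      annotations:
        summary: "Lag for {{ $labels.consumergroup }} on {{ $labels.topic }} grew by more than 5000 in 5 minutes"
```

The thresholds come straight from the list above; sensible values depend on each Topic's normal traffic, so treat them as starting points.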

4.2 Throughput Monitoring: Production Rate vs. Consumption Rate

[Figure: normal vs. abnormal state. Normal: production rate 5000 msg/s, consumption rate 5000 msg/s, Lag stable. Abnormal (consumers offline): production rate 5000 msg/s, consumption rate 0 msg/s, Lag keeps growing.]

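promql
# Topic production rate (messages/second); the 2m window covers several 30 s scrapes
sum by (topic) (rate(kafka_topic_partition_current_offset[2m]))

# Consumption rate (kafka_exporter names this metric kafka_consumergroup_current_offset)
sum by (consumergroup, topic) (rate(kafka_consumergroup_current_offset[2m]))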

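Decision rule: when production rate minus consumption rate stays above 0 for 10 minutes, the consumers cannot keep up with the producers and the backlog will keep growing.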
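
One way to encode this decision rule is an alerting rule that subtracts the two rates per Topic and requires the gap to hold for 10 minutes. The sketch below uses kafka_exporter's offset metrics; the rule name and severity are assumptions, and it assumes one exporter instance per cluster so that each Topic has a single production-rate series.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-throughput-alerts      # hypothetical name
  namespace: monitoring
spec:
  groups:
  - name: kafka-throughput
    rules:
    # Production rate minus consumption rate, per consumer group and Topic.
    # group_right keeps the consumergroup label, since several groups
    # can consume the same Topic.
    - alert: KafkaConsumersFallingBehind
      expr: |
        (
          sum by (topic) (rate(kafka_topic_partition_current_offset[2m]))
            - on (topic) group_right
          sum by (consumergroup, topic) (rate(kafka_consumergroup_current_offset[2m]))
        ) > 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Group {{ $labels.consumergroup }} consumes {{ $labels.topic }} more slowly than it is produced"
```

A strict threshold of 0 follows the rule as stated but can be noisy on busy Topics; a small positive margin is a common adjustment. The rule also stays silent if the group's series disappear entirely, so it complements a check on group membership rather than replacing it.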

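4.3 Broker Load

promql
# Partition log size on disk, in bytes (comes from JMX Exporter; the exact name depends on your JMX rules)
kafka_log_log_size

# Number of Brokers in the cluster (should equal the expected node count)
# kafka_brokers is a gauge whose value is already the Broker count, so no count() is needed
kafka_brokers

# Is Leader distribution even? (ideally each Broker leads a similar number of partitions)
# The value of kafka_topic_partition_leader is the Leader's Broker ID, so count how often each value occurs
count_values("broker", kafka_topic_partition_leader)

Alert: a sudden drop in kafka_brokers means a Broker has gone offline.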
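
A minimal sketch of that alert, assuming the three-Broker cluster used throughout this article; the rule name, severity, and 1-minute hold time are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-broker-alerts          # hypothetical name
  namespace: monitoring
spec:
  groups:
  - name: kafka-brokers
    rules:
    # kafka_exporter reports the Broker count as the value of kafka_brokers.
    - alert: KafkaBrokerDown
      expr: kafka_brokers < 3        # 3 = expected Broker count in this example
      for: 1m                        # assumed hold time
      labels:
        severity: critical
      annotations:
        summary: "Only {{ $value }} of 3 Kafka Brokers are visible to kafka_exporter"
```

Change the threshold whenever the cluster is resized, otherwise the alert either fires constantly or never fires.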

4.4 Consumer Group State and Rebalance

[Figure: consumer group state diagram. States: Stable (consuming normally), Rebalancing (consumption paused while partitions are reassigned), Dead (Lag grows without bound; manual intervention needed). Transitions: the group runs steadily in Stable; a consumer joining or leaving moves it to Rebalancing; it returns to Stable when reassignment completes; it becomes Dead when all consumers go offline.]

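promql
# Number of members in the consumer group (0 means no active consumers)
kafka_consumergroup_members

# Has the consumer group finished its Rebalance?
# Note: this name is not in kafka_exporter's documented metric list; confirm it on your exporter's /metrics page before relying on it
kafka_consumer_group_consumed_percent

The most dangerous signal: kafka_consumergroup_members suddenly dropping from 3 to 0 means every consumer is offline and the backlog will grow without bound.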
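
This signal is the one that would have caught the incident in section 1, so it is worth an alert of its own. A hedged sketch follows, scoped to the order_status_consumer group from that incident; the rule name, severity, and 1-minute hold time are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-consumergroup-alerts   # hypothetical name
  namespace: monitoring
spec:
  groups:
  - name: kafka-consumergroups
    rules:
    # Fires when the group has no active members left.
    - alert: KafkaConsumerGroupEmpty
      expr: kafka_consumergroup_members{consumergroup="order_status_consumer"} == 0
      for: 1m                        # assumed hold time, long enough to skip a quick restart
      labels:
        severity: critical
      annotations:
        summary: "Consumer group {{ $labels.consumergroup }} has no active members"
```

Scoping the rule to named groups keeps retired or one-off groups, which legitimately sit at 0 members, from paging anyone.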


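5. Key Grafana Dashboard Configuration

5.1 Backlog Leaderboard (Table Panel)

Show the 5 Topic and consumer group pairs with the highest current Lag:

promql
topk(5, sum by (consumergroup, topic) (kafka_consumergroup_lag))

Render it as a table sorted by Lag in descending order, so you can see at a glance who is falling behind.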

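5.2 Production vs. Consumption Rate (Line Chart)

Draw two lines on the same chart:

promql
# Production rate (green)
sum(rate(kafka_topic_partition_current_offset{topic="order-events"}[2m]))

# Consumption rate (red), limited to the same Topic so the two lines are comparable
sum(rate(kafka_consumergroup_current_offset{consumergroup="order-processor", topic="order-events"}[2m]))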

[Figure: Grafana line chart example, rate comparison. Green line: production rate, level with the consumption line when healthy. Red line: consumption rate, a drop means a consumer problem. The two lines separating means lag is growing.]

When the red line falls below the green line, lag grows. This chart makes that more obvious than the Lag number alone.
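The relationship behind this chart is plain arithmetic: lag grows at the production rate minus the consumption rate. Below is a minimal Python sketch of that projection, assuming both rates stay constant over the window. The function names are illustrative and not part of kafka_exporter; the sample numbers reuse the 5000 messages/s growth from the opening incident and the 50000 threshold from the alert rules.

```python
from typing import Optional


def lag_after(current_lag: float, produce_rate: float,
              consume_rate: float, seconds: float) -> float:
    """Project lag after `seconds`, assuming both rates stay constant."""
    growth_per_second = produce_rate - consume_rate
    # Lag cannot go below zero: consumers cannot read ahead of producers.
    return max(0.0, current_lag + growth_per_second * seconds)


def seconds_until_lag(threshold: float, current_lag: float,
                      produce_rate: float, consume_rate: float) -> Optional[float]:
    """Seconds until lag reaches `threshold`, or None if lag is not growing."""
    if current_lag >= threshold:
        return 0.0
    growth_per_second = produce_rate - consume_rate
    if growth_per_second <= 0:
        return None
    return (threshold - current_lag) / growth_per_second


if __name__ == "__main__":
    # Consumers fully stopped while producers write 5000 messages/s.
    print(seconds_until_lag(50_000, 0, 5000, 0))  # prints 10.0
```

With consumers fully stopped, a threshold-only alert crosses its line within seconds, which is why the gap between the two lines is worth watching directly.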

5.3 Broker Leader Distribution (Bar Chart)

promql
# kafka_topic_partition_leader carries the leader's broker ID as its value, so group by value
count_values("broker", kafka_topic_partition_leader)

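Use a bar chart to show the number of leader partitions on each broker. If one broker is clearly higher than the others, leaders are unevenly distributed and a preferred replica election is needed (kafka-leader-election.sh --election-type preferred; older releases shipped kafka-preferred-replica-election.sh, which was removed in Kafka 3.0).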
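"Clearly higher than the others" can be made concrete with a tolerance around the mean. The sketch below is one hypothetical way to do that in Python, given per-broker leader counts such as those the query above returns. The 20% tolerance and the function name are assumptions chosen for illustration, not a Kafka or kafka_exporter default.

```python
from statistics import mean
from typing import Dict, List


def overloaded_brokers(leader_counts: Dict[str, int],
                       tolerance: float = 0.2) -> List[str]:
    """Return brokers whose leader count exceeds the mean by more than `tolerance`."""
    if not leader_counts:
        return []
    average = mean(leader_counts.values())
    limit = average * (1 + tolerance)
    return sorted(broker for broker, count in leader_counts.items() if count > limit)


if __name__ == "__main__":
    # Broker "0" leads 40 of 60 partitions; the mean is 20, the limit is 24.
    print(overloaded_brokers({"0": 40, "1": 10, "2": 10}))  # prints ['0']
```

A non-empty result is the signal to consider rebalancing leaders; an empty list means the distribution is within the chosen tolerance.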

5.4 Consumer Group Liveness (Status Light)

promql
kafka_consumer_group_members{consumergroup="order-processor"}

Green when the value is above 0, red when it is 0. This catches the problem earlier than watching Lag.


6. Alert Rules

[Figure: alert decision flow. Kafka metric collection → Lag > 50000? → KafkaHighLag (topic lag alert) → lag growth > 1000/s? → KafkaLagIncreasing (lag growth alert) → consumer group members = 0? → KafkaNoConsumer (all consumers offline) → broker count dropped? → KafkaBrokerDown (broker down) → disk usage > 90%? → KafkaDiskFull (disk space alert) → otherwise: cluster healthy.]

yaml
groups:
- name: kafka-alerts
  rules:
  # Lag above threshold
  - alert: KafkaHighLag
    expr: sum by (consumergroup, topic) (kafka_consumer_lag) > 50000
    for: 5m
    annotations:
      summary: "Topic {{ $labels.topic }} has a lag of {{ $value }} messages"

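  # Lag growing too fast
  - alert: KafkaLagIncreasing
    # Lag is a gauge, so use deriv() (per-second slope); rate() is meant for counters
    expr: sum by (consumergroup) (deriv(kafka_consumer_lag[5m])) > 1000
    for: 3m
    annotations:
      summary: "Consumer group {{ $labels.consumergroup }} lag is growing by {{ $value }} messages per second"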

  # All consumers offline
  - alert: KafkaNoConsumer
    expr: kafka_consumer_group_members == 0
    for: 1m
    annotations:
      summary: "Consumer group {{ $labels.consumergroup }} has no active consumers"

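  # Broker down
  - alert: KafkaBrokerDown
    # kafka_brokers reports the broker count as its value; count() would count series instead
    expr: min(kafka_brokers) < 3  # expects a 3-broker cluster
    for: 1m
    annotations:
      summary: "Kafka broker count is too low, currently {{ $value }}"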

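  # Disk almost full
  - alert: KafkaDiskFull
    # kafka_log_log_size is per partition, so sum per broker first
    # (assumes a `broker` label on the series and a single 500 GB disk)
    expr: sum by (broker) (kafka_log_log_size) / 1024^3 > 450
    for: 10m
    annotations:
      summary: "Broker {{ $labels.broker }} disk usage is above 450 GB"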
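
The lag-growth rule is about the slope of a gauge: lag can rise and fall, unlike a counter. PromQL's `rate()` is designed for counters, while `deriv()` estimates a gauge's per-second slope by simple linear regression. The Python sketch below shows that idea on hand-made samples; it illustrates the technique and does not reproduce Prometheus's implementation. The function name and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def slope_per_second(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("timestamps must not all be equal")
    return numerator / denominator


if __name__ == "__main__":
    # Lag scraped every 15 s while it grows by 1200 messages per second.
    lag_samples = [(0, 10_000), (15, 28_000), (30, 46_000), (45, 64_000)]
    print(slope_per_second(lag_samples) > 1000)  # prints True: the rule would match
```

A negative slope means lag is draining, which a counter-style rate cannot express.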

7. A Complete Walkthrough of a Lag Incident

7.1 Troubleshooting Flow

[Figure: troubleshooting flow. Alert fires (order-events lag spikes) → Step 1: lag ranking, order-processor accounts for 83000 → Step 2: consumption rate is 0 → Step 3: consumer group member count is 0 → conclusion: all consumers are offline, not a slow-consumer problem → Step 4: downstream Pods were all restarted due to OOM → fix: restart the downstream service, the consumer group rebalances automatically → result: lag starts to fall and returns to normal.]

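7.2 Troubleshooting Steps in Detail

| Step | Action | Finding |
| --- | --- | --- |
| Step 1 | Check the lag ranking: sum by (consumergroup) (kafka_consumer_lag) | order-processor accounts for 83000 |
| Step 2 | Check the consumption rate: rate(kafka_consumer_group_current_offset...) | Rate is 0, nothing is being consumed |
| Step 3 | Check consumer group members: kafka_consumer_group_members{...} | Result is 0 |
| Conclusion | All consumers are offline; this is not a slow-consumer problem | --- |
| Step 4 | Check the downstream service | Its Pods were all restarted due to OOM, and the consumers did not rejoin the group in time |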
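
The first three steps form a small decision procedure: find the group with the most lag, then use consumption rate and member count to separate "slow" from "offline". Below is a rough Python sketch of that logic, assuming the three values have already been read from the queries in the steps. The function names, the returned messages, and the `audit-log` group are hypothetical.

```python
from typing import Dict


def worst_group(lag_by_group: Dict[str, float]) -> str:
    """Step 1: the consumer group contributing the most lag."""
    return max(lag_by_group, key=lambda group: lag_by_group[group])


def diagnose(consume_rate: float, members: int) -> str:
    """Steps 2 and 3: classify the problem from rate and member count."""
    if consume_rate > 0:
        return "slow: consumers are running but falling behind producers"
    if members == 0:
        return "offline: no consumers in the group, check the downstream service"
    return "stuck: members are present but offsets are not advancing"


if __name__ == "__main__":
    lag = {"order-processor": 83_000, "audit-log": 1_200}  # audit-log is made up
    group = worst_group(lag)
    print(group, "->", diagnose(consume_rate=0, members=0))
```

Separating the "offline" and "stuck" branches matters because they lead to different next steps: the first points at the deployment, the second at the consumer code or its dependencies.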

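7.3 Resolution

After the downstream service was restarted, the consumer group rebalanced automatically and Lag started to fall.

7.4 Postmortem

Had there been an alert on consumer group members, it would have fired within about a minute of the consumers going offline, not after lag had piled up to 80,000 messages.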


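8. Summary

When monitoring Kafka, server CPU and memory are the least telling metrics: they can look normal while lag climbs into the hundreds of thousands.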
[Figure: kafka_exporter core capabilities. Lag monitoring: who is lagging, and by how much? Throughput comparison: which is faster, production or consumption? Consumer group status: are the consumers still alive? Broker load: is the cluster balanced? Alerts: Lag > 50000 → alert; consumption rate 0 → alert; member count 0 → critical alert; broker down → alert.]

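Core capabilities that kafka_exporter brings:

| Dimension | What it tells you |
| --- | --- |
| Lag | Who is lagging, and by how much |
| Throughput comparison | Which is faster, production or consumption |
| Consumer group status | Whether the consumers are still alive |
| Broker load | Whether the cluster is balanced |

Remember: a Kafka cluster that is "alive" is not necessarily "healthy". When consumers go offline, the cluster itself raises no alarm. It just quietly keeps storing messages until the disk fills up. Make kafka_exporter your first line of defense.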
