Kafka Monitoring: Tiered Storage Monitoring and KRaft Monitoring Metrics

Contents

1. Preface
2. Tiered Storage Monitoring
3. KRaft Monitoring Metrics
3.1. KRaft Quorum Monitoring Metrics
3.2. KRaft Controller Monitoring Metrics
3.3. KRaft Broker Monitoring Metrics

1. Preface

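Like any distributed system, Kafka needs its storage and network usage watched closely: only thorough monitoring of storage and network state lets you catch problems early and head off risks. This article covers two groups of JMX metrics: those for tiered storage, and those for the KRaft quorum, controllers, and brokers.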

2. Tiered Storage Monitoring

The following metrics are available for monitoring the tiered storage feature:

| METRIC/ATTRIBUTE NAME | DESCRIPTION | MBEAN NAME |
|---|---|---|
| Remote Fetch Bytes Per Sec | Rate of bytes read from remote storage per topic. Omitting `topic=(...)` yields the all-topic rate. | `kafka.server:type=BrokerTopicMetrics,name=RemoteFetchBytesPerSec,topic=([-.\w]+)` |
| Remote Fetch Requests Per Sec | Rate of read requests to remote storage per topic. Omitting `topic=(...)` yields the all-topic rate. | `kafka.server:type=BrokerTopicMetrics,name=RemoteFetchRequestsPerSec,topic=([-.\w]+)` |
| Remote Fetch Errors Per Sec | Rate of read errors from remote storage per topic. Omitting `topic=(...)` yields the all-topic rate. | `kafka.server:type=BrokerTopicMetrics,name=RemoteFetchErrorsPerSec,topic=([-.\w]+)` |
| Remote Copy Bytes Per Sec | Rate of bytes copied to remote storage per topic. Omitting `topic=(...)` yields the all-topic rate. | `kafka.server:type=BrokerTopicMetrics,name=RemoteCopyBytesPerSec,topic=([-.\w]+)` |
| Remote Copy Requests Per Sec | Rate of write requests to remote storage per topic. Omitting `topic=(...)` yields the all-topic rate. | `kafka.server:type=BrokerTopicMetrics,name=RemoteCopyRequestsPerSec,topic=([-.\w]+)` |
| Remote Copy Errors Per Sec | Rate of write errors to remote storage per topic. Omitting `topic=(...)` yields the all-topic rate. | `kafka.server:type=BrokerTopicMetrics,name=RemoteCopyErrorsPerSec,topic=([-.\w]+)` |
| RemoteLogReader Task Queue Size | Size of the queue holding remote storage read tasks. | `org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool,name=RemoteLogReaderTaskQueueSize` |
| RemoteLogReader Avg Idle Percent | Average idle percent of the thread pool that processes remote storage read tasks. | `org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool,name=RemoteLogReaderAvgIdlePercent` |
| RemoteLogManager Tasks Avg Idle Percent | Average idle percent of the thread pool that copies data to remote storage. | `kafka.log.remote:type=RemoteLogManager,name=RemoteLogManagerTasksAvgIdlePercent` |
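To make the per-topic versus all-topic distinction concrete, here is a minimal Python sketch that builds the object name for one of the `BrokerTopicMetrics` remote-storage metrics (dropping the `topic` key gives the all-topic rate) and derives an error ratio from an errors rate and a requests rate. The function names `remote_metric_mbean` and `error_ratio` are hypothetical, and dividing errors by requests is an assumed way of combining these metrics, not something the Kafka documentation defines.

```python
import re
from typing import Optional

# Remote-storage metric names from the BrokerTopicMetrics rows of the table above.
REMOTE_TOPIC_METRICS = {
    "RemoteFetchBytesPerSec",
    "RemoteFetchRequestsPerSec",
    "RemoteFetchErrorsPerSec",
    "RemoteCopyBytesPerSec",
    "RemoteCopyRequestsPerSec",
    "RemoteCopyErrorsPerSec",
}

# Same character class as topic=([-.\w]+); re.ASCII keeps \w to [A-Za-z0-9_].
TOPIC_RE = re.compile(r"[-.\w]+", re.ASCII)


def remote_metric_mbean(name: str, topic: Optional[str] = None) -> str:
    """Build a JMX object name; topic=None selects the all-topic rate."""
    if name not in REMOTE_TOPIC_METRICS:
        raise ValueError(f"unknown remote storage metric: {name}")
    mbean = f"kafka.server:type=BrokerTopicMetrics,name={name}"
    if topic is not None:
        if not TOPIC_RE.fullmatch(topic):
            raise ValueError(f"invalid topic name: {topic!r}")
        mbean += f",topic={topic}"
    return mbean


def error_ratio(errors_per_sec: float, requests_per_sec: float) -> float:
    """Fraction of remote requests that failed (hypothetical derived metric).

    Returns 0.0 when there is no traffic, to avoid dividing by zero.
    """
    if requests_per_sec <= 0:
        return 0.0
    return errors_per_sec / requests_per_sec
```

For example, `error_ratio` could be fed `RemoteCopyErrorsPerSec` and `RemoteCopyRequestsPerSec` for the same topic to watch the copy path, or the fetch pair to watch the read path.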

3. KRaft Monitoring Metrics

The following metrics allow monitoring of the KRaft quorum and the metadata log.

Note that some of the exposed metrics depend on the node's role, as defined by `process.roles`.
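Because the exposed metrics depend on `process.roles`, a collector can choose which MBeans to query per node. The sketch below assumes the split follows sections 3.1 to 3.3 of this article: quorum metrics on every KRaft node, controller metrics on controllers, broker metadata metrics on brokers. The pattern lists and `patterns_for_roles` are hypothetical; the trailing `,*` is standard JMX object-name wildcard syntax for "any further key properties".

```python
# JMX object-name patterns per metric group, following sections 3.1 to 3.3.
QUORUM_PATTERNS = [
    "kafka.server:type=raft-metrics,*",
    "kafka.server:type=MetadataLoader,*",
    "kafka.server:type=SnapshotEmitter,*",
]
CONTROLLER_PATTERNS = [
    "kafka.controller:type=KafkaController,*",
    "kafka.controller:type=ControllerEventManager,*",
]
BROKER_PATTERNS = [
    "kafka.server:type=broker-metadata-metrics,*",
]


def patterns_for_roles(process_roles: str) -> list[str]:
    """Map a process.roles value such as "broker,controller" to MBean patterns."""
    roles = {r.strip() for r in process_roles.split(",") if r.strip()}
    if not roles or not roles <= {"broker", "controller"}:
        raise ValueError(f"unsupported process.roles value: {process_roles!r}")
    patterns = list(QUORUM_PATTERNS)  # reported on both controllers and brokers
    if "controller" in roles:
        patterns += CONTROLLER_PATTERNS
    if "broker" in roles:
        patterns += BROKER_PATTERNS
    return patterns
```

A combined node (`process.roles=broker,controller`) gets all three groups, while a broker-only node skips the controller patterns.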

3.1. KRaft Quorum Monitoring Metrics

These metrics are reported on both controllers and brokers in a KRaft cluster:

| METRIC/ATTRIBUTE NAME | DESCRIPTION | MBEAN NAME |
|---|---|---|
| Current State | The current state of this member; possible values are leader, candidate, voted, follower, unattached, observer. | `kafka.server:type=raft-metrics,name=current-state` |
| Current Leader | The current quorum leader's id; -1 indicates unknown. | `kafka.server:type=raft-metrics,name=current-leader` |
| Current Voted | The current voted leader's id; -1 indicates not voted for anyone. | `kafka.server:type=raft-metrics,name=current-vote` |
| Current Epoch | The current quorum epoch. | `kafka.server:type=raft-metrics,name=current-epoch` |
| High Watermark | The high watermark maintained on this member; -1 if it is unknown. | `kafka.server:type=raft-metrics,name=high-watermark` |
| Log End Offset | The current raft log end offset. | `kafka.server:type=raft-metrics,name=log-end-offset` |
| Number of Unknown Voter Connections | Number of unknown voters whose connection information is not cached. The value of this metric is always 0. | `kafka.server:type=raft-metrics,name=number-unknown-voter-connections` |
| Average Commit Latency | The average time in milliseconds to commit an entry in the raft log. | `kafka.server:type=raft-metrics,name=commit-latency-avg` |
| Maximum Commit Latency | The maximum time in milliseconds to commit an entry in the raft log. | `kafka.server:type=raft-metrics,name=commit-latency-max` |
| Average Election Latency | The average time in milliseconds spent on electing a new leader. | `kafka.server:type=raft-metrics,name=election-latency-avg` |
| Maximum Election Latency | The maximum time in milliseconds spent on electing a new leader. | `kafka.server:type=raft-metrics,name=election-latency-max` |
| Fetch Records Rate | The average number of records fetched from the leader of the raft quorum. | `kafka.server:type=raft-metrics,name=fetch-records-rate` |
| Append Records Rate | The average number of records appended per sec by the leader of the raft quorum. | `kafka.server:type=raft-metrics,name=append-records-rate` |
| Average Poll Idle Ratio | The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records. | `kafka.server:type=raft-metrics,name=poll-idle-ratio-avg` |
| Current Metadata Version | Outputs the feature level of the current effective metadata version. | `kafka.server:type=MetadataLoader,name=CurrentMetadataVersion` |
| Metadata Snapshot Load Count | The total number of times a KRaft snapshot has been loaded since the process was started. | `kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount` |
| Latest Metadata Snapshot Size | The total size in bytes of the latest snapshot that the node has generated. If none have been generated yet, this is the size of the latest snapshot that was loaded. If no snapshots have been generated or loaded, this is 0. | `kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes` |
| Latest Metadata Snapshot Age | The interval in milliseconds since the latest snapshot that the node has generated. If none have been generated yet, this is approximately the time since the process was started. | `kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs` |
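The quorum gauges are most informative when compared across nodes. The following sketch takes `current-state`, `current-leader` and `current-epoch` as already scraped from each node and reports inconsistencies among the nodes on the newest epoch. The `RaftView` container and `quorum_problems` function are illustrative names, not Kafka APIs. During a normal election, nodes can briefly report no known leader, so an alert built on this check should presumably require the problem to persist across several scrapes.

```python
from dataclasses import dataclass

# Possible values of current-state, as listed in the table above.
VALID_STATES = {"leader", "candidate", "voted", "follower", "unattached", "observer"}


@dataclass
class RaftView:
    """One node's raft-metrics values (hypothetical container for scraped data)."""
    node_id: int
    current_state: str   # raft-metrics current-state
    current_leader: int  # raft-metrics current-leader; -1 means unknown
    current_epoch: int   # raft-metrics current-epoch


def quorum_problems(views: list[RaftView]) -> list[str]:
    """Return human-readable problems found across all nodes' views."""
    if not views:
        return ["no nodes reported raft-metrics"]
    problems = []
    for v in views:
        if v.current_state not in VALID_STATES:
            problems.append(f"node {v.node_id}: unexpected state {v.current_state!r}")
        if v.current_state == "leader" and v.current_leader != v.node_id:
            problems.append(f"node {v.node_id}: is leader but reports leader {v.current_leader}")
    # Nodes on an older epoch may simply not have caught up yet,
    # so only the nodes reporting the highest epoch are compared.
    top_epoch = max(v.current_epoch for v in views)
    latest = [v for v in views if v.current_epoch == top_epoch]
    known_leaders = {v.current_leader for v in latest if v.current_leader != -1}
    if not known_leaders:
        problems.append(f"epoch {top_epoch}: no node knows the leader")
    elif len(known_leaders) > 1:
        problems.append(f"epoch {top_epoch}: nodes disagree on leader {sorted(known_leaders)}")
    claimed = [v.node_id for v in latest if v.current_state == "leader"]
    if len(claimed) > 1:
        problems.append(f"epoch {top_epoch}: several nodes claim leadership {claimed}")
    return problems
```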

3.2. KRaft Controller Monitoring Metrics

| METRIC/ATTRIBUTE NAME | DESCRIPTION | MBEAN NAME |
|---|---|---|
| Active Controller Count | The number of active controllers on this node. Valid values are '0' or '1'. | `kafka.controller:type=KafkaController,name=ActiveControllerCount` |
| Event Queue Time Ms | A histogram of the time in milliseconds that requests spent waiting in the controller event queue. | `kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs` |
| Event Queue Processing Time Ms | A histogram of the time in milliseconds that requests spent being processed in the controller event queue. | `kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs` |
| Fenced Broker Count | The number of fenced brokers as observed by this controller. | `kafka.controller:type=KafkaController,name=FencedBrokerCount` |
| Active Broker Count | The number of active brokers as observed by this controller. | `kafka.controller:type=KafkaController,name=ActiveBrokerCount` |
| Global Topic Count | The number of global topics as observed by this controller. | `kafka.controller:type=KafkaController,name=GlobalTopicCount` |
| Global Partition Count | The number of global partitions as observed by this controller. | `kafka.controller:type=KafkaController,name=GlobalPartitionCount` |
| Offline Partition Count | The number of offline topic partitions (non-internal) as observed by this controller. | `kafka.controller:type=KafkaController,name=OfflinePartitionCount` |
| Preferred Replica Imbalance Count | The count of topic partitions for which the leader is not the preferred leader. | `kafka.controller:type=KafkaController,name=PreferredReplicaImbalanceCount` |
| Metadata Error Count | The number of times this controller node has encountered an error during metadata log processing. | `kafka.controller:type=KafkaController,name=MetadataErrorCount` |
| Last Applied Record Offset | The offset of the last record from the cluster metadata partition that was applied by the controller. | `kafka.controller:type=KafkaController,name=LastAppliedRecordOffset` |
| Last Committed Record Offset | The offset of the last record committed to this controller. | `kafka.controller:type=KafkaController,name=LastCommittedRecordOffset` |
| Last Applied Record Timestamp | The timestamp of the last record from the cluster metadata partition that was applied by the controller. | `kafka.controller:type=KafkaController,name=LastAppliedRecordTimestamp` |
| Last Applied Record Lag Ms | The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the controller. For active controllers, the value of this lag is always zero. | `kafka.controller:type=KafkaController,name=LastAppliedRecordLagMs` |
| ZooKeeper Write Behind Lag | The number of records by which ZooKeeper lags behind the highest committed record in the metadata log. This metric is only reported by the active KRaft controller. | `kafka.controller:type=KafkaController,name=ZkWriteBehindLag` |
| ZooKeeper Metadata Snapshot Write Time | The number of milliseconds the KRaft controller took reconciling a snapshot into ZooKeeper. | `kafka.controller:type=KafkaController,name=ZkWriteSnapshotTimeMs` |
| ZooKeeper Metadata Delta Write Time | The number of milliseconds the KRaft controller took writing a delta into ZooKeeper. | `kafka.controller:type=KafkaController,name=ZkWriteDeltaTimeMs` |
| Timed-out Broker Heartbeat Count | The number of broker heartbeats that timed out on this controller since the process was started. Only active controllers handle heartbeats, so only they will see increases in this metric. | `kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount` |
| Number Of Operations Started In Event Queue | The total number of controller event queue operations that were started, including deferred operations. | `kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount` |
| Number of Operations Timed Out In Event Queue | The total number of controller event queue operations that timed out before they could be performed. | `kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount` |
| Number Of New Controller Elections | Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted. If the same controller as before becomes active again, that still counts. | `kafka.controller:type=KafkaController,name=NewActiveControllersCount` |
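Two patterns cover much of this table. `ActiveControllerCount` is 0 or 1 on each node, so across all controllers exactly one node should report 1. The count-style metrics such as `MetadataErrorCount`, `TimedOutBrokerHeartbeatCount` and `NewActiveControllersCount` are running totals, so alerts are usually based on their increase between two scrapes. The sketch below assumes the values have already been collected per node; the function names are hypothetical, and treating a drop in a counter as a process restart (counter reset to 0) is an assumption borrowed from common counter-handling practice.

```python
def active_controllers(active_counts: dict[int, int]) -> list[int]:
    """Node ids whose ActiveControllerCount gauge reads 1 (expected: exactly one)."""
    for node_id, value in active_counts.items():
        if value not in (0, 1):  # the table lists '0' or '1' as the only valid values
            raise ValueError(f"node {node_id}: invalid ActiveControllerCount {value}")
    return sorted(node_id for node_id, value in active_counts.items() if value == 1)


def counter_increase(previous: int, current: int) -> int:
    """Growth of a cumulative counter between two scrapes.

    Assumption: a smaller current value means the process restarted and the
    counter began again from 0, so the whole current value counts as new.
    """
    if current < previous:
        return current
    return current - previous
```

An alert might then fire when `len(active_controllers(counts)) != 1`, or when `counter_increase` of `MetadataErrorCount` is greater than zero.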

3.3. KRaft Broker Monitoring Metrics

| METRIC/ATTRIBUTE NAME | DESCRIPTION | MBEAN NAME |
|---|---|---|
| Last Applied Record Offset | The offset of the last record from the cluster metadata partition that was applied by the broker. | `kafka.server:type=broker-metadata-metrics,name=last-applied-record-offset` |
| Last Applied Record Timestamp | The timestamp of the last record from the cluster metadata partition that was applied by the broker. | `kafka.server:type=broker-metadata-metrics,name=last-applied-record-timestamp` |
| Last Applied Record Lag Ms | The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the broker. | `kafka.server:type=broker-metadata-metrics,name=last-applied-record-lag-ms` |
| Metadata Load Error Count | The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it. | `kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count` |
| Metadata Apply Error Count | The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta. | `kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count` |
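A broker's metadata lag can be judged in records as well as in time. The sketch below compares each broker's `last-applied-record-offset` with the active controller's `LastCommittedRecordOffset` (both are offsets in the same cluster metadata partition) and also checks `last-applied-record-lag-ms`. Comparing these two offsets this way, the `BrokerMetadataState` container, the `lagging_brokers` function and its default thresholds are all assumptions for illustration, not values recommended by Kafka.

```python
from dataclasses import dataclass


@dataclass
class BrokerMetadataState:
    """Hypothetical container for one broker's broker-metadata-metrics values."""
    broker_id: int
    last_applied_offset: int  # last-applied-record-offset
    last_applied_lag_ms: int  # last-applied-record-lag-ms


def lagging_brokers(
    brokers: list[BrokerMetadataState],
    committed_offset: int,       # LastCommittedRecordOffset from the active controller
    max_record_lag: int = 1000,  # hypothetical threshold
    max_lag_ms: int = 30_000,    # hypothetical threshold
) -> list[int]:
    """Return ids of brokers that are too far behind the committed metadata log."""
    lagging = []
    for b in brokers:
        # Clamp at 0: the gauges are scraped at slightly different moments,
        # so a broker can momentarily appear to be ahead of the controller.
        record_lag = max(0, committed_offset - b.last_applied_offset)
        if record_lag > max_record_lag or b.last_applied_lag_ms > max_lag_ms:
            lagging.append(b.broker_id)
    return sorted(lagging)
```

Checking both dimensions catches a broker that is many records behind during a burst of metadata changes as well as one that has stopped applying records altogether.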