前言
对于 Kafka 组件而言,我们通常会对 kafka 服务端添加一些监控,来确保服务的稳定性,虽然有 kafka-exporter 来对消费者进行监控,但是指标很少,对于生产者和消费者更细粒度的监控就无法做到了。只能将监控部署在客户端上,这样我们就能拿到更加详细的监控数据。
例如:对于消费者来说,我们可以获取到消费者重平衡次数,每次poll 数据的条数等等。
这里梳理下如何对消费者添加 JMX 监控。
一、所需组件
这里我们只针对于 JAVA 语言的 KAFKA 客户端为例,需要以下东西:
- kafka-client:3.4.1
- JMX-exporter
需要提前准备好 kafka 服务和创建好topic。
二、编写测试代码
1. pom 文件
bash
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<kafka.version>3.4.1</kafka.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>${kafka.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.3.0</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
2. 生产者代码
bash
public class ProducerDemo {
public static void main(String[] args) {
//主题(当主题不存在,自动创建主题)
String topic = "test";
//配置
Properties properties = new Properties();
//kafka服务器地址
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"192.168.195.132:9092");
//反序列化器
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,StringSerializer.class);
//生产者
KafkaProducer<String,String> kafkaProducer = new KafkaProducer(properties);
//生产信息
for (int i = 0; i < 100; i++) {
String msg = String.format("hello,第%d条信息", i);
//消息(key可以为null,key值影响消息发往哪个分区)
ProducerRecord<String, String> producerRecord = new ProducerRecord<String, String>(topic, String.valueOf(i), msg);
//发送
kafkaProducer.send(producerRecord);
System.out.println("发送第"+i+"条信息");
}
//关闭
kafkaProducer.close();
}
}
3. 消费者代码
bash
public class ConsumerDemo {
public static void main(String[] args) throws Exception{
//主题
String topic = "test";
//配置
Properties properties = new Properties();
//kafka服务器地址
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"192.168.195.132:9092");
//k,v的序列化器
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,StringDeserializer.class);
//消费者分组
properties.put(ConsumerConfig.GROUP_ID_CONFIG,"Consumer-Group-1");
//offset重置模式
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");
//消费者
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer(properties);
//订阅(可以订阅多个主题)
kafkaConsumer.subscribe(Collections.singletonList(topic));
//消费
while (true){
//获取信息
ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(1000));
//遍历
records.forEach(o->{
System.out.println(String.format("topic==%s,offset==%s,key==%s,value==%s",o.topic(),o.offset(),o.key(),o.value()));
});
//睡眠
Thread.sleep(500);
}
}
}
4. 打包
bash
mvn clean package
5. 测试生产和消费
生产数据测试:
bash
java -cp kafka-1.0-jar-with-dependencies.jar ProducerDemo
消费数据测试:
bash
java -cp kafka-1.0-jar-with-dependencies.jar ConsumerDemo
三、使用jmx-agent
1. 下载
https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.18.0/jmx_prometheus_javaagent-0.18.0.jar
2. 编写kafka-client.yaml
bash
---
lowercaseOutputName: true
lowercaseOutputLabelNames: true
whitelistObjectNames:
- "kafka.consumer:*"
- "kafka.producer:*"
blacklistObjectNames:
- "kafka.admin.client:*"
- "kafka.consumer:type=*,id=*"
- "kafka.producer:type=*,id=*"
- "kafka.*:type=kafka-metrics-count,*"
rules:
# "kafka.consumer:type=app-info,client-id=*"
# "kafka.producer:type=app-info,client-id=*"
- pattern: "kafka.(.+)<type=app-info, client-id=(.+)><>(.+): (.+)"
value: 1
name: kafka_$1_app_info
cache: true
labels:
client_type: $1
client_id: $2
$3: $4
type: UNTYPED
# "kafka.consumer:type=consumer-metrics,client-id=*, protocol=*, cipher=*"
# "kafka.consumer:type=type=consumer-fetch-manager-metrics,client-id=*, topic=*, partition=*"
# "kafka.producer:type=producer-metrics,client-id=*, protocol=*, cipher=*"
- pattern: "kafka.(.+)<type=(.+), (.+)=(.+), (.+)=(.+), (.+)=(.+)><>(.+):"
name: kafka_$1_$2_$9
type: GAUGE
cache: true
labels:
client_type: $1
$3: "$4"
$5: "$6"
$7: "$8"
# "kafka.consumer:type=consumer-node-metrics,client-id=*, node-id=*"
# "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*, topic=*"
# "kafka.producer:type=producer-node-metrics,client-id=*, node-id=*"
# "kafka.producer:type=producer-topic-metrics,client-id=*, topic=*"
- pattern: "kafka.(.+)<type=(.+), (.+)=(.+), (.+)=(.+)><>(.+):"
name: kafka_$1_$2_$7
type: GAUGE
cache: true
labels:
client_type: $1
$3: "$4"
$5: "$6"
# "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*"
# "kafka.consumer:type=consumer-metrics,client-id=*"
# "kafka.producer:type=producer-metrics,client-id=*"
- pattern: "kafka.(.+)<type=(.+), (.+)=(.+)><>(.+):"
name: kafka_$1_$2_$5
type: GAUGE
cache: true
labels:
client_type: $1
$3: "$4"
- pattern: "kafka.(.+)<type=(.+)><>(.+):"
name: kafka_$1_$2_$3
cache: true
labels:
client_type: $1
3. 重新启动消费程序
bash
java -cp kafka-1.0-jar-with-dependencies.jar -javaagent:/opt/javaProject/jmx_prometheus_javaagent-0.18.0.jar=1234:/opt/javaProject/kafka_client.yml ConsumerDemo
4. 打开web ui
这里就能看到我们采集到了kafka 消费者实例的详细指标。我们就可以和Prometheus进行集成,这里就不详细介绍了。
四、具体指标
想要获取更多的指标和指标具体的用途可以参考官网的文档
https://kafka.apache.org/documentation/#consumer_monitoring
具体指标如下:
以下是添加了中文翻译后的Kafka Consumer Metrics表格:
Consumer Coordinator Metrics
METRIC NAME | DESCRIPTION | MBEAN NAME | 中文描述 |
---|---|---|---|
commit-latency-avg | The average time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 提交请求的平均时间 |
commit-latency-max | The max time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 提交请求的最大时间 |
commit-rate | The number of commit calls per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 每秒提交调用次数 |
commit-total | The total number of commit calls | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 提交调用的总次数 |
assigned-partitions | The number of partitions currently assigned to this consumer | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 当前分配给该消费者的分区数量 |
heartbeat-response-time-max | The max time taken to receive a response to a heartbeat request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 接收到心跳请求响应的最大时间 |
heartbeat-rate | The average number of heartbeats per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 每秒心跳次数的平均值 |
heartbeat-total | The total number of heartbeats | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 心跳的总次数 |
join-time-avg | The average time taken for a group rejoin | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 加入组的平均时间 |
join-time-max | The max time taken for a group rejoin | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 加入组的最大时间 |
join-rate | The number of group joins per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 每秒加入组的次数 |
join-total | The total number of group joins | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 加入组的总次数 |
sync-time-avg | The average time taken for a group sync | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 同步组的平均时间 |
sync-time-max | The max time taken for a group sync | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 同步组的最大时间 |
sync-rate | The number of group syncs per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 每秒同步组的次数 |
sync-total | The total number of group syncs | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 同步组的总次数 |
rebalance-latency-avg | The average time taken for a group rebalance | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 组重新平衡的平均时间 |
rebalance-latency-max | The max time taken for a group rebalance | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 组重新平衡的最大时间 |
rebalance-latency-total | The total time taken for group rebalances so far | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 到目前为止组重新平衡的总时间 |
rebalance-total | The total number of group rebalances participated | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 参与的组重新平衡总次数 |
rebalance-rate-per-hour | The number of group rebalance participated per hour | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 每小时参与的组重新平衡次数 |
failed-rebalance-total | The total number of failed group rebalances | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 失败的组重新平衡总次数 |
failed-rebalance-rate-per-hour | The number of failed group rebalance events per hour | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 每小时失败的组重新平衡事件次数 |
last-rebalance-seconds-ago | The number of seconds since the last rebalance event | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 自上次重新平衡事件以来的秒数 |
last-heartbeat-seconds-ago | The number of seconds since the last controller heartbeat | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 自上次控制器心跳以来的秒数 |
partitions-revoked-latency-avg | The average time taken by the on-partitions-revoked rebalance listener callback | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 分区撤销重新平衡监听器回调的平均时间 |
partitions-revoked-latency-max | The max time taken by the on-partitions-revoked rebalance listener callback | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 分区撤销重新平衡监听器回调的最大时间 |
partitions-assigned-latency-avg | The average time taken by the on-partitions-assigned rebalance listener callback | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 分配分区重新平衡监听器回调的平均时间 |
partitions-assigned-latency-max | The max time taken by the on-partitions-assigned rebalance listener callback | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 分配分区重新平衡监听器回调的最大时间 |
partitions-lost-latency-avg | The average time taken by the on-partitions-lost rebalance listener callback | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 丢失分区重新平衡监听器回调的平均时间 |
partitions-lost-latency-max | The max time taken by the on-partitions-lost rebalance listener callback | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) | 丢失分区重新平衡监听器回调的最大时间 |
Consumer Fetch Metrics
METRIC NAME | DESCRIPTION | MBEAN NAME | 中文描述 |
---|---|---|---|
bytes-consumed-rate | The average number of bytes consumed per second | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 每秒消费的字节平均数量 |
bytes-consumed-total | The total number of bytes consumed | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 消费的字节总数 |
fetch-latency-avg | The average time taken for a fetch request | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 提取请求的平均时间 |
fetch-latency-max | The max time taken for any fetch request | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 提取请求的最大时间 |
fetch-rate | The number of fetch requests per second | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 每秒提取请求的次数 |
fetch-size-avg | The average number of bytes fetched per request | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 每次请求提取的字节平均数量 |
fetch-size-max | The maximum number of bytes fetched per request | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 每次请求提取的字节最大数量 |
fetch-throttle-time-avg | The average throttle time in ms | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 平均限速时间(毫秒) |
fetch-throttle-time-max | The maximum throttle time in ms | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 最大限速时间(毫秒) |
fetch-total | The total number of fetch requests | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 提取请求的总次数 |
records-consumed-rate | The average number of records consumed per second | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 每秒消费的记录平均数量 |
records-consumed-total | The total number of records consumed | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 消费的记录总数 |
records-lag-max | The maximum lag in terms of number of records for any partition in this window | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 任一分区的记录最大滞后数量 |
records-lead-min | The minimum lead in terms of number of records for any partition in this window | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 任一分区的记录最小领先数量 |
records-per-request-avg | The average number of records in each request | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" | 每次请求的记录平均数量 |
bytes-consumed-rate | The average number of bytes consumed per second for a topic | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" | 每秒消费的字节平均数量(按主题) |
bytes-consumed-total | The total number of bytes consumed for a topic | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" | 消费的字节总数(按主题) |
fetch-size-avg | The average number of bytes fetched per request for a topic | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" | 每次请求提取的字节平均数量(按主题) |
fetch-size-max | The maximum number of bytes fetched per request for a topic | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" | 每次请求提取的字节最大数量(按主题) |
records-consumed-rate | The average number of records consumed per second for a topic | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" | 每秒消费的记录平均数量(按主题) |
records-consumed-total | The total number of records consumed for a topic | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" | 消费的记录总数(按主题) |
records-per-request-avg | The average number of records in each request for a topic | kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" | 每次请求的记录平均数量(按主题) |
preferred-read-replica | The current read replica for the partition, or -1 if reading from leader | kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" | 当前分区的读取副本,若从领导者读取则为-1 |
records-lag | The latest lag of the partition | kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" | 分区的最新滞后数量 |
records-lag-avg | The average lag of the partition | kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" | 分区的平均滞后数量 |
records-lag-max | The max lag of the partition | kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" | 分区的最大滞后数量 |
records-lead | The latest lead of the partition | kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" | 分区的最新领先数量 |
records-lead-avg | The average lead of the partition | kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" | 分区的平均领先数量 |
records-lead-min | The min lead of the partition | kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" | 分区的最小领先数量 |