【Kafka】对 kafka 消费程序客户端进行监控采集

前言

对于 Kafka 组件而言,我们通常会对 kafka 服务端添加一些监控,来确保服务的稳定性,虽然有 kafka-exporter 来对消费者进行监控,但是指标很少,对于生产者和消费者更细粒度的监控就无法做到了。只能将监控部署在客户端上,这样我们就能拿到更加详细的监控数据。

例如:对于消费者来说,我们可以获取到消费者重平衡次数,每次poll 数据的条数等等。

这里梳理下如何对消费者添加 JMX 监控。

一、所需组件

这里我们只针对于 JAVA 语言的 KAFKA 客户端为例,需要以下东西:

  • kafka-client:3.4.1
  • JMX-exporter

需要提前准备好 kafka 服务和创建好topic。

二、编写测试代码

1. pom 文件

bash 复制代码
 <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <kafka.version>3.4.1</kafka.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>${kafka.version}</version>
        </dependency>
    </dependencies>


    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.3.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

2. 生产者代码

bash 复制代码
public class ProducerDemo {

    public static void main(String[] args) {
        //主题(当主题不存在,自动创建主题)
        String topic = "test";
        //配置
        Properties properties = new Properties();
        //kafka服务器地址
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"192.168.195.132:9092");
        //反序列化器
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,StringSerializer.class);

        //生产者
        KafkaProducer<String,String> kafkaProducer = new KafkaProducer(properties);

        //生产信息
        for (int i = 0; i < 100; i++) {
            String msg = String.format("hello,第%d条信息", i);
            //消息(key可以为null,key值影响消息发往哪个分区)
            ProducerRecord<String, String> producerRecord = new ProducerRecord<String, String>(topic, String.valueOf(i), msg);
            //发送
            kafkaProducer.send(producerRecord);
            System.out.println("发送第"+i+"条信息");
        }
        //关闭
        kafkaProducer.close();
    }
}

3. 消费者代码

bash 复制代码
public class ConsumerDemo {

    public static void main(String[] args) throws Exception{
        //主题
        String topic = "test";

        //配置
        Properties properties = new Properties();
        //kafka服务器地址
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"192.168.195.132:9092");
        //k,v的序列化器
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,StringDeserializer.class);
        //消费者分组
        properties.put(ConsumerConfig.GROUP_ID_CONFIG,"Consumer-Group-1");
        //offset重置模式
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");

        //消费者
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer(properties);
        //订阅(可以订阅多个主题)
        kafkaConsumer.subscribe(Collections.singletonList(topic));

        //消费
        while (true){
            //获取信息
            ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(1000));
            //遍历
            records.forEach(o->{
                System.out.println(String.format("topic==%s,offset==%s,key==%s,value==%s",o.topic(),o.offset(),o.key(),o.value()));
            });
            //睡眠
            Thread.sleep(500);
        }
    }
}

4. 打包

bash 复制代码
mvn clean package

5. 测试生产和消费

生产数据测试:

bash 复制代码
java -cp kafka-1.0-jar-with-dependencies.jar      ProducerDemo

消费数据测试:

bash 复制代码
java -cp kafka-1.0-jar-with-dependencies.jar      ConsumerDemo

三、使用jmx-agent

1. 下载

https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.18.0/jmx_prometheus_javaagent-0.18.0.jar

2. 编写kafka-client.yaml

bash 复制代码
---
lowercaseOutputName: true
lowercaseOutputLabelNames: true
whitelistObjectNames:
  - "kafka.consumer:*"
  - "kafka.producer:*"
blacklistObjectNames:
  - "kafka.admin.client:*"
  - "kafka.consumer:type=*,id=*"
  - "kafka.producer:type=*,id=*"
  - "kafka.*:type=kafka-metrics-count,*"
rules:
  # "kafka.consumer:type=app-info,client-id=*"
  # "kafka.producer:type=app-info,client-id=*"
  - pattern: "kafka.(.+)<type=app-info, client-id=(.+)><>(.+): (.+)"
    value: 1
    name: kafka_$1_app_info
    cache: true
    labels:
      client_type: $1
      client_id: $2
      $3: $4
    type: UNTYPED
  # "kafka.consumer:type=consumer-metrics,client-id=*, protocol=*, cipher=*"
  # "kafka.consumer:type=type=consumer-fetch-manager-metrics,client-id=*, topic=*, partition=*"
  # "kafka.producer:type=producer-metrics,client-id=*, protocol=*, cipher=*"
  - pattern: "kafka.(.+)<type=(.+), (.+)=(.+), (.+)=(.+), (.+)=(.+)><>(.+):"
    name: kafka_$1_$2_$9
    type: GAUGE
    cache: true
    labels:
      client_type: $1
      $3: "$4"
      $5: "$6"
      $7: "$8"
  # "kafka.consumer:type=consumer-node-metrics,client-id=*, node-id=*"
  # "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*, topic=*"
  # "kafka.producer:type=producer-node-metrics,client-id=*, node-id=*"
  # "kafka.producer:type=producer-topic-metrics,client-id=*, topic=*"
  - pattern: "kafka.(.+)<type=(.+), (.+)=(.+), (.+)=(.+)><>(.+):"
    name: kafka_$1_$2_$7
    type: GAUGE
    cache: true
    labels:
      client_type: $1
      $3: "$4"
      $5: "$6"
  # "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*"
  # "kafka.consumer:type=consumer-metrics,client-id=*"
  # "kafka.producer:type=producer-metrics,client-id=*"
  - pattern: "kafka.(.+)<type=(.+), (.+)=(.+)><>(.+):"
    name: kafka_$1_$2_$5
    type: GAUGE
    cache: true
    labels:
      client_type: $1
      $3: "$4"
  - pattern: "kafka.(.+)<type=(.+)><>(.+):"
    name: kafka_$1_$2_$3
    cache: true
    labels:
      client_type: $1

yaml 参考:https://github.com/confluentinc/jmx-monitoring-stacks/blob/main/shared-assets/jmx-exporter/kafka_client.yml

3. 重新启动消费程序

bash 复制代码
java -cp kafka-1.0-jar-with-dependencies.jar -javaagent:/opt/javaProject/jmx_prometheus_javaagent-0.18.0.jar=1234:/opt/javaProject/kafka_client.yml ConsumerDemo
4. 打开web ui

访问http://xxx.xxx.xxx.xxx:1234

这里就能看到我们采集到了kafka 消费者实例的详细指标。我们就可以和Prometheus进行集成,这里就不详细介绍了。

四、具体指标

想要获取更多的指标和指标具体的用途可以参考官网的文档
https://kafka.apache.org/documentation/#consumer_monitoring

具体指标如下:

以下是添加了中文翻译后的Kafka Consumer Metrics表格:

Consumer Coordinator Metrics

METRIC NAME DESCRIPTION MBEAN NAME 中文描述
commit-latency-avg The average time taken for a commit request kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 提交请求的平均时间
commit-latency-max The max time taken for a commit request kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 提交请求的最大时间
commit-rate The number of commit calls per second kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 每秒提交调用次数
commit-total The total number of commit calls kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 提交调用的总次数
assigned-partitions The number of partitions currently assigned to this consumer kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 当前分配给该消费者的分区数量
heartbeat-response-time-max The max time taken to receive a response to a heartbeat request kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 接收到心跳请求响应的最大时间
heartbeat-rate The average number of heartbeats per second kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 每秒心跳次数的平均值
heartbeat-total The total number of heartbeats kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 心跳的总次数
join-time-avg The average time taken for a group rejoin kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 加入组的平均时间
join-time-max The max time taken for a group rejoin kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 加入组的最大时间
join-rate The number of group joins per second kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 每秒加入组的次数
join-total The total number of group joins kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 加入组的总次数
sync-time-avg The average time taken for a group sync kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 同步组的平均时间
sync-time-max The max time taken for a group sync kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 同步组的最大时间
sync-rate The number of group syncs per second kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 每秒同步组的次数
sync-total The total number of group syncs kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 同步组的总次数
rebalance-latency-avg The average time taken for a group rebalance kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 组重新平衡的平均时间
rebalance-latency-max The max time taken for a group rebalance kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 组重新平衡的最大时间
rebalance-latency-total The total time taken for group rebalances so far kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 到目前为止组重新平衡的总时间
rebalance-total The total number of group rebalances participated kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 参与的组重新平衡总次数
rebalance-rate-per-hour The number of group rebalance participated per hour kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 每小时参与的组重新平衡次数
failed-rebalance-total The total number of failed group rebalances kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 失败的组重新平衡总次数
failed-rebalance-rate-per-hour The number of failed group rebalance events per hour kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 每小时失败的组重新平衡事件次数
last-rebalance-seconds-ago The number of seconds since the last rebalance event kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 自上次重新平衡事件以来的秒数
last-heartbeat-seconds-ago The number of seconds since the last controller heartbeat kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 自上次控制器心跳以来的秒数
partitions-revoked-latency-avg The average time taken by the on-partitions-revoked rebalance listener callback kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 分区撤销重新平衡监听器回调的平均时间
partitions-revoked-latency-max The max time taken by the on-partitions-revoked rebalance listener callback kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 分区撤销重新平衡监听器回调的最大时间
partitions-assigned-latency-avg The average time taken by the on-partitions-assigned rebalance listener callback kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 分配分区重新平衡监听器回调的平均时间
partitions-assigned-latency-max The max time taken by the on-partitions-assigned rebalance listener callback kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 分配分区重新平衡监听器回调的最大时间
partitions-lost-latency-avg The average time taken by the on-partitions-lost rebalance listener callback kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 丢失分区重新平衡监听器回调的平均时间
partitions-lost-latency-max The max time taken by the on-partitions-lost rebalance listener callback kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) 丢失分区重新平衡监听器回调的最大时间

Consumer Fetch Metrics

METRIC NAME DESCRIPTION MBEAN NAME 中文描述
bytes-consumed-rate The average number of bytes consumed per second kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 每秒消费的字节平均数量
bytes-consumed-total The total number of bytes consumed kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 消费的字节总数
fetch-latency-avg The average time taken for a fetch request kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 提取请求的平均时间
fetch-latency-max The max time taken for any fetch request kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 提取请求的最大时间
fetch-rate The number of fetch requests per second kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 每秒提取请求的次数
fetch-size-avg The average number of bytes fetched per request kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 每次请求提取的字节平均数量
fetch-size-max The maximum number of bytes fetched per request kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 每次请求提取的字节最大数量
fetch-throttle-time-avg The average throttle time in ms kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 平均限速时间(毫秒)
fetch-throttle-time-max The maximum throttle time in ms kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 最大限速时间(毫秒)
fetch-total The total number of fetch requests kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 提取请求的总次数
records-consumed-rate The average number of records consumed per second kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 每秒消费的记录平均数量
records-consumed-total The total number of records consumed kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 消费的记录总数
records-lag-max The maximum lag in terms of number of records for any partition in this window kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 任一分区的记录最大滞后数量
records-lead-min The minimum lead in terms of number of records for any partition in this window kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 任一分区的记录最小领先数量
records-per-request-avg The average number of records in each request kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}" 每次请求的记录平均数量
bytes-consumed-rate The average number of bytes consumed per second for a topic kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" 每秒消费的字节平均数量(按主题)
bytes-consumed-total The total number of bytes consumed for a topic kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" 消费的字节总数(按主题)
fetch-size-avg The average number of bytes fetched per request for a topic kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" 每次请求提取的字节平均数量(按主题)
fetch-size-max The maximum number of bytes fetched per request for a topic kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" 每次请求提取的字节最大数量(按主题)
records-consumed-rate The average number of records consumed per second for a topic kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" 每秒消费的记录平均数量(按主题)
records-consumed-total The total number of records consumed for a topic kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" 消费的记录总数(按主题)
records-per-request-avg The average number of records in each request for a topic kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}" 每次请求的记录平均数量(按主题)
preferred-read-replica The current read replica for the partition, or -1 if reading from leader kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" 当前分区的读取副本,若从领导者读取则为-1
records-lag The latest lag of the partition kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" 分区的最新滞后数量
records-lag-avg The average lag of the partition kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" 分区的平均滞后数量
records-lag-max The max lag of the partition kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" 分区的最大滞后数量
records-lead The latest lead of the partition kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" 分区的最新领先数量
records-lead-avg The average lead of the partition kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" 分区的平均领先数量
records-lead-min The min lead of the partition kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}" 分区的最小领先数量
相关推荐
cnsxjean7 小时前
SpringBoot集成Minio实现上传凭证、分片上传、秒传和断点续传
java·前端·spring boot·分布式·后端·中间件·架构
桃园码工13 小时前
3-测试go-redis+redsync实现分布式锁 --开源项目obtain_data测试
redis·分布式·golang
sx_170613 小时前
Spark面试题
大数据·分布式·spark
wclass-zhengge15 小时前
02微服务系统与设计(D1_走出微服务误区:避免从单体到分布式单体)
分布式·微服务·架构
ZOMI酱16 小时前
【AI系统】分布式通信与 NVLink
人工智能·分布式
scc214018 小时前
kafka学习-02
分布式·学习·kafka
lzhlizihang18 小时前
使用Java代码操作Kafka(五):Kafka消费 offset API,包含指定 Offset 消费以及指定时间消费
java·kafka·offset
xidianjiapei00118 小时前
Kafka Transactions: Part 1: Exactly-Once Messaging
分布式·kafka·消息系统·exactly-once语义·精确一次语义
zmd-zk18 小时前
kafka生产者和消费者命令的使用
大数据·分布式·学习·kafka
lzhlizihang18 小时前
Flume和kafka的整合:使用Flume将日志数据抽取到Kafka中
大数据·kafka·flume