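About the author: an operations engineer whose résumé does not list a single skill as "expert". Follow 《运维小路》 for updates; the mind map for this series shows the planned content and current progress (updated irregularly).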

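The previous chapter covered one piece of middleware, ZooKeeper; this chapter covers another, Kafka. Both currently run on the JVM (ZooKeeper is written in Java, Kafka in Scala and Java).
The previous section showed how a producer sends data to Kafka. This section introduces the consumer and how to consume data.
In practice, almost all Kafka consumption goes through consumer groups. Different consumer groups can each consume the same messages in a Topic independently. Consumers that use the same consumer group ID form one consumer group.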
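Consumer
A consumer reads messages from the partitions of a Kafka Topic and processes them. Its core goal is to give applications high-throughput, reliable, and scalable message consumption. After reading (consuming) messages from Kafka, a consumer usually performs a follow-up step, such as writing them to a database or to ElasticSearch.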
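Consumer Group
- Multiple consumers can form a consumer group and consume a Topic together. One consumer can be assigned several partitions, but a group should not have more consumers than the Topic has partitions; otherwise the extra consumers receive no partition and sit idle.
- Partition assignment: each Partition is consumed by only one consumer within the same consumer group (this is what makes parallel processing possible).
- Rebalance: when consumer instances are added or removed, Kafka automatically reassigns partitions by triggering a rebalance.
- Default group: if no group is specified, the console consumer generates a random consumer group ID for that run.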
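Put simply, for a Topic with 3 partitions:
With one consumer, that consumer reads all 3 partitions.
With two consumers, one reads one partition and the other reads two.
With three consumers, each reads one partition.
With four or more consumers, only three of them read a partition and the rest stay idle.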
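The four cases above can be reproduced with a small simulation. This is only a sketch of a range-style split and not Kafka's actual assignor code: the function name `assign_partitions` and the consumer names are made up for illustration, and a real assignor works on member IDs chosen by the group coordinator.

```python
def assign_partitions(num_partitions, consumers):
    """Split partition ids 0..num_partitions-1 across consumers, range style.

    Simplified model (assumption): consumers are sorted by name and the first
    ones receive one extra partition when the split is uneven.
    """
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    base, extra = divmod(num_partitions, len(members))
    start = 0
    for index, member in enumerate(members):
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# A Topic with 3 partitions and 1 to 4 consumers in the same group.
for n in range(1, 5):
    names = [f"consumer-{i}" for i in range(n)]
    print(n, assign_partitions(3, names))
```

With four consumers the last one ends up with an empty list, which corresponds to the idle consumer described above.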
# Some data has already been written to this Topic
[root@localhost kafka_2.13-2.8.2]# ./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 192.168.31.143:9092 --topic my-topic123 --time -1
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
my-topic123:0:58
my-topic123:1:60
my-topic123:2:53
# Consume manually with Kafka's console consumer
./bin/kafka-console-consumer.sh \
--topic my-topic123 \
--bootstrap-server 192.168.31.143:9092 \
--from-beginning
# With --from-beginning the consumer starts from the earliest offset; without it, it reads only messages produced after it starts.
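The `topic:partition:offset` lines printed by GetOffsetShell can be turned into numbers with a few lines of Python. This is a minimal sketch; the helper name `parse_offsets` is hypothetical. Note that with `--time -1` these are the latest offsets, so their sum equals the message count only if no data has been removed by retention and no transactional markers are present.

```python
def parse_offsets(output):
    """Parse 'topic:partition:offset' lines into {partition: offset}."""
    offsets = {}
    for line in output.splitlines():
        parts = line.strip().rsplit(":", 2)
        # Skip JVM warnings, blank lines and anything else that does not match.
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        offsets[int(parts[1])] = int(parts[2])
    return offsets


sample = """\
my-topic123:0:58
my-topic123:1:60
my-topic123:2:53
"""
offsets = parse_offsets(sample)
print(offsets)                # {0: 58, 1: 60, 2: 53}
print(sum(offsets.values()))  # 171
```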
The Kafka broker log after the console consumer starts:
[2025-05-08 22:39:43,721] INFO [GroupCoordinator 0]: Stabilized group console-consumer-44198 generation 1 (__consumer_offsets-37) with 1 members (kafka.coordinator.group.GroupCoordinator)
[2025-05-08 22:39:43,737] INFO [GroupCoordinator 0]: Assignment received from leader for group console-consumer-44198 for generation 1. The group has 1 members, 0 of which are static. (kafka.coordinator.group.GroupCoordinator)
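The consumer group console-consumer-44198 has stabilized with 1 member. Because there is only 1 consumer, it is assigned every partition of the topic, which is the usual behavior with a single consumer. Since we did not specify a consumer group, the console consumer generates a new random group each time the command is started.
The code below consumes with a fixed consumer group, so every instance started from it joins the same group.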
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'my-topic123',
    bootstrap_servers='192.168.31.143:9092',
    group_id='my-python-consumer-group',  # consumer group ID
    enable_auto_commit=True,  # commit offsets automatically
    auto_commit_interval_ms=5000,  # commit every 5 seconds
    auto_offset_reset='earliest',  # start from the earliest offset when the group has no committed offset
    value_deserializer=lambda m: m.decode('utf-8')  # decode bytes to str
)

try:
    for message in consumer:
        print(f"Received message: {message.value} from partition {message.partition}")
        # To control offset commits manually:
        # consumer.commit()
except KeyboardInterrupt:
    print("Stopping consumer...")
finally:
    consumer.close()
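The commented-out `consumer.commit()` above hints at manual offset control. One possible shape for it is sketched below, assuming the consumer is created with `enable_auto_commit=False`. The helper `consume_with_manual_commit` and its `commit_every` parameter are hypothetical names; the function only relies on the consumer being iterable and having a `commit()` method, so it can be tried without a broker.

```python
def consume_with_manual_commit(consumer, handle, commit_every=10):
    """Process messages and commit offsets only after they were handled.

    `consumer` is any iterable with a commit() method, for example a
    kafka-python KafkaConsumer created with enable_auto_commit=False.
    """
    pending = 0
    for message in consumer:
        handle(message)  # if this raises, the batch stays uncommitted
        pending += 1
        if pending >= commit_every:
            consumer.commit()
            pending = 0
    if pending:
        # Commit whatever is left once iteration ends.
        consumer.commit()


# Possible usage against a real broker (not executed here):
# from kafka import KafkaConsumer
# consumer = KafkaConsumer(
#     'my-topic123',
#     bootstrap_servers='192.168.31.143:9092',
#     group_id='my-python-consumer-group',
#     enable_auto_commit=False,
#     consumer_timeout_ms=10000,  # stop iterating after 10 s without messages
# )
# consume_with_manual_commit(consumer, lambda m: print(m.value))
```

Committing after processing means a crash can cause some messages to be processed again, but not skipped. Without `consumer_timeout_ms` the loop over a real consumer does not end by itself, so the final commit would only run after an interruption is handled by the caller.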
Only one consumer is running at this point, so it consumes messages from all 3 partitions.
Received message: {"timestamp": "2025-05-08 22:49:21", "count": 1031, "data": "Message-1031"} from partition 1
Received message: {"timestamp": "2025-05-08 22:49:22", "count": 1032, "data": "Message-1032"} from partition 2
Received message: {"timestamp": "2025-05-08 22:49:23", "count": 1033, "data": "Message-1033"} from partition 1
Received message: {"timestamp": "2025-05-08 22:49:24", "count": 1034, "data": "Message-1034"} from partition 2
Received message: {"timestamp": "2025-05-08 22:49:25", "count": 1035, "data": "Message-1035"} from partition 1
Received message: {"timestamp": "2025-05-08 22:49:26", "count": 1036, "data": "Message-1036"} from partition 1
Received message: {"timestamp": "2025-05-08 22:49:27", "count": 1037, "data": "Message-1037"} from partition 1
Received message: {"timestamp": "2025-05-08 22:49:28", "count": 1038, "data": "Message-1038"} from partition 0
Received message: {"timestamp": "2025-05-08 22:49:29", "count": 1039, "data": "Message-1039"} from partition 1
Received message: {"timestamp": "2025-05-08 22:49:30", "count": 1040, "data": "Message-1040"} from partition 2
Received message: {"timestamp": "2025-05-08 22:49:31", "count": 1041, "data": "Message-1041"} from partition 1
Received message: {"timestamp": "2025-05-08 22:49:32", "count": 1042, "data": "Message-1042"} from partition 1
A new consumer now joins, so there are 2 consumers and a rebalance takes place. The new consumer consumes only partition 2.
# Reads only partition 2
[root@localhost kafka_2.13-2.8.2]# python3 cour.py
Received message: {"timestamp": "2025-05-08 22:50:38", "count": 1107, "data": "Message-1107"} from partition 2
Received message: {"timestamp": "2025-05-08 22:50:44", "count": 1113, "data": "Message-1113"} from partition 2
Received message: {"timestamp": "2025-05-08 22:50:46", "count": 1115, "data": "Message-1115"} from partition 2
Received message: {"timestamp": "2025-05-08 22:50:49", "count": 1118, "data": "Message-1118"} from partition 2
Received message: {"timestamp": "2025-05-08 22:50:50", "count": 1119, "data": "Message-1119"} from partition 2
Received message: {"timestamp": "2025-05-08 22:50:53", "count": 1122, "data": "Message-1122"} from partition 2
The first consumer now consumes the other 2 partitions.
# Only partitions 0 and 1 remain
Received message: {"timestamp": "2025-05-08 22:52:27", "count": 1216, "data": "Message-1216"} from partition 1
Received message: {"timestamp": "2025-05-08 22:52:28", "count": 1217, "data": "Message-1217"} from partition 1
Received message: {"timestamp": "2025-05-08 22:52:37", "count": 1226, "data": "Message-1226"} from partition 1
Received message: {"timestamp": "2025-05-08 22:52:38", "count": 1227, "data": "Message-1227"} from partition 1
Received message: {"timestamp": "2025-05-08 22:52:40", "count": 1229, "data": "Message-1229"} from partition 1
Received message: {"timestamp": "2025-05-08 22:52:44", "count": 1233, "data": "Message-1233"} from partition 1
Received message: {"timestamp": "2025-05-08 22:52:45", "count": 1234, "data": "Message-1234"} from partition 0
Received message: {"timestamp": "2025-05-08 22:52:46", "count": 1235, "data": "Message-1235"} from partition 0
Received message: {"timestamp": "2025-05-08 22:52:47", "count": 1236, "data": "Message-1236"} from partition 0
Received message: {"timestamp": "2025-05-08 22:52:48", "count": 1237, "data": "Message-1237"} from partition 1
Received message: {"timestamp": "2025-05-08 22:52:49", "count": 1238, "data": "Message-1238"} from partition 0
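The two outputs above can be checked mechanically: within one consumer group, no partition should show up in both consumers' output once the rebalance has finished. The sketch below assumes the `Received message: ... from partition N` line format printed by the script above; the helper names and the labels `first` and `second` are hypothetical.

```python
import re

PARTITION_RE = re.compile(r"from partition (\d+)\s*$")


def partitions_seen(lines):
    """Collect the partition numbers that appear in a consumer's output."""
    seen = set()
    for line in lines:
        match = PARTITION_RE.search(line)
        if match:
            seen.add(int(match.group(1)))
    return seen


def shared_partitions(outputs_by_consumer):
    """Return {partition: consumers} for partitions read by more than one consumer."""
    owners = {}
    for consumer, lines in outputs_by_consumer.items():
        for partition in partitions_seen(lines):
            owners.setdefault(partition, set()).add(consumer)
    return {p: c for p, c in owners.items() if len(c) > 1}


outputs = {
    "second": [
        'Received message: {"timestamp": "2025-05-08 22:50:38", "count": 1107, "data": "Message-1107"} from partition 2',
    ],
    "first": [
        'Received message: {"timestamp": "2025-05-08 22:52:27", "count": 1216, "data": "Message-1216"} from partition 1',
        'Received message: {"timestamp": "2025-05-08 22:52:45", "count": 1234, "data": "Message-1234"} from partition 0',
    ],
}
print(partitions_seen(outputs["first"]))  # {0, 1}
print(shared_partitions(outputs))         # {} means no partition is shared
```

The check is only meaningful for output captured after the rebalance; lines printed by the first consumer before the second one joined would still show partition 2.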