Kafka Basics and Extensions


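Basic Operations

Creating a Topic

Create a topic with two partitions and two replicas. The replication factor cannot be larger than the number of brokers in the cluster.

shell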
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --create --zookeeper hadoop01:2181 --partitions 2 --replication-factor 2 --topic hello
Created topic hello.
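Because the replication factor is bounded by the number of brokers, it can be worth checking that bound before creating a topic. The following is a minimal sketch, assuming you have copied the broker id list that `ls /brokers/ids` prints in zkCli.sh (for example `[0, 1, 2]`); `count_brokers` and `check_replication_factor` are hypothetical helper names, not Kafka tools.

```shell
#!/usr/bin/env bash
# count_brokers "[0, 1, 2]": count the broker ids in the list that
# `ls /brokers/ids` prints in zkCli.sh (hypothetical helper).
count_brokers() {
  local ids arr
  ids=${1//[][ ]/}                 # strip brackets and spaces: "[0, 1, 2]" -> "0,1,2"
  IFS=, read -r -a arr <<< "$ids"  # split the ids on commas
  echo "${#arr[@]}"
}

# check_replication_factor RF "[ids]": succeed only if RF <= number of brokers
# (hypothetical helper; Kafka performs its own check when the topic is created).
check_replication_factor() {
  local rf=$1 brokers
  brokers=$(count_brokers "$2")
  if (( rf > brokers )); then
    echo "replication factor $rf exceeds the $brokers registered broker(s)" >&2
    return 1
  fi
  echo "replication factor $rf is OK for $brokers broker(s)"
}

check_replication_factor 2 "[0, 1, 2]"
check_replication_factor 3 "[1, 2]" || echo "pick a smaller --replication-factor"
```

With only brokers 1 and 2 registered, as in the failure scenario later in this post, a topic with replication factor 3 could not be created.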

Listing Topics

shell
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --list --zookeeper hadoop01:2181
hello

# Show detailed information for every topic
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --describe --zookeeper hadoop01:2181
Topic: hello	PartitionCount: 2	ReplicationFactor: 2	Configs: 
	Topic: hello	Partition: 0	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: hello	Partition: 1	Leader: 2	Replicas: 2,0	Isr: 2,0

Modifying a Topic

The number of partitions can only be increased, never decreased.

shell
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --alter --zookeeper hadoop01:2181 --partitions 5 --topic hello
WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected
Adding partitions succeeded!


[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --describe --zookeeper hadoop01:2181
Topic: hello	PartitionCount: 5	ReplicationFactor: 2	Configs: 
	Topic: hello	Partition: 0	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: hello	Partition: 1	Leader: 2	Replicas: 2,0	Isr: 2,0
	Topic: hello	Partition: 2	Leader: 0	Replicas: 0,2	Isr: 0,2
	Topic: hello	Partition: 3	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: hello	Partition: 4	Leader: 2	Replicas: 2,0	Isr: 2,0

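Deleting a Topic

Deleting a topic is irreversible. Since Kafka 1.0, deletion is enabled by default; in earlier versions the topic is only marked for deletion, and delete.topic.enable must be set to true for it to be actually removed.

shell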
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --delete --zookeeper hadoop01:2181 --topic hello
Topic hello is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.

Producers and Consumers

Creating a producer

shell
bin/kafka-console-producer.sh

Creating a consumer

shell
bin/kafka-console-consumer.sh

shell
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --create --zookeeper hadoop01:2181 --partitions 5 --replication-factor 2 --topic hello
Created topic hello.

# producer
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-console-producer.sh --broker-list hadoop01:9092 --topic hello

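# consumer: by default this only receives messages produced after it starts
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-console-consumer.sh --bootstrap-server hadoop01:9092 --topic hello

# consume the existing messages from the beginning of each partition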
[root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-console-consumer.sh  --bootstrap-server hadoop01:9092 --topic hello --from-beginning

Broker Extensions

The configuration file server.properties:

shell
# The number of messages to accept before forcing a flush of data to disk
# i.e. flush to disk once this many messages have accumulated
log.flush.interval.messages=10000

# Flush based on elapsed time
# The maximum amount of time a message can sit in a log before we force a flush
log.flush.interval.ms=1000

# The minimum age of a log file to be eligible for deletion due to age (i.e. how long logs are kept)
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies (300000 ms: every 5 minutes, check whether segments qualify for deletion)
log.retention.check.interval.ms=300000
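To double-check what these settings mean in everyday units, the following sketch reads values from a properties file and converts them. It assumes plain key=value lines; `get_prop` is a hypothetical helper, and the sample file only repeats the values shown above, so in practice you would point it at your own config/server.properties.

```shell
#!/usr/bin/env bash
# get_prop FILE KEY: print the value of KEY from a .properties file (hypothetical helper).
# Only lines that start with "KEY=" match, so "#log.retention.bytes=..." is ignored.
get_prop() {
  awk -v k="$2" 'index($0, k "=") == 1 { v = substr($0, length(k) + 2) } END { print v }' "$1"
}

# Sample file with the values shown above; replace with config/server.properties.
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
cat > "$conf" <<'EOF'
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=168
#log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
EOF

retention_hours=$(get_prop "$conf" log.retention.hours)
segment_bytes=$(get_prop "$conf" log.segment.bytes)
check_ms=$(get_prop "$conf" log.retention.check.interval.ms)
size_limit=$(get_prop "$conf" log.retention.bytes)

echo "retention:            $retention_hours h = $(( retention_hours / 24 )) days"
echo "segment size:         $segment_bytes B = $(( segment_bytes / 1024 / 1024 )) MiB"
echo "retention check:      every $(( check_ms / 1000 / 60 )) minutes"
echo "size-based retention: ${size_limit:-not set}"
```

With these values, logs are kept for 7 days, each segment rolls at 1 GiB, and the retention check runs every 5 minutes; size-based retention stays off because log.retention.bytes is commented out.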

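Producer Extensions

  • Partitioner: decides which partition a message is sent to, using the configured algorithm (for example, one based on the message key). With the default partitioner, a message that has a key goes to the partition given by hashing the key modulo the number of partitions; a message without a key is spread across partitions (round-robin before Kafka 2.4, the "sticky" partitioner from 2.4 on), so the default is not purely random.
  • Sending mode: synchronous or asynchronous. Synchronous means the producer waits for the receiver's response before sending the next message; asynchronous means it sends the next message right away without waiting for a response. In the Java client the mode depends on whether the caller blocks on the result of send(), while the acks setting controls how many acknowledgments a send needs before it counts as successful:
    • acks=1: the default in the 2.x clients used here; only the partition leader has to confirm that it received the message.
    • acks=all (equivalent to acks=-1): the leader waits until all in-sync replicas (ISR) have the message.
    • acks=0: the producer does not wait for any acknowledgment.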
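The WARNING printed when partitions were added to the topic comes from key-based partitioning: the chosen partition depends on the partition count, so growing a topic can send the same key to a different partition. The sketch below only illustrates that effect. It uses POSIX `cksum` as a stand-in hash, whereas Kafka's default partitioner uses murmur2, so the partition numbers it prints will not match a real producer; `partition_for` is a hypothetical name.

```shell
#!/usr/bin/env bash
# partition_for KEY NUM_PARTITIONS: illustrate key-based partitioning as
# hash(key) mod number of partitions (hypothetical helper).
# Assumption: POSIX cksum (a CRC checksum) stands in for the hash; Kafka's
# default partitioner uses murmur2, so real partition numbers will differ.
partition_for() {
  local key=$1 num_partitions=$2 hash
  hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo $(( hash % num_partitions ))
}

# For a fixed partition count the same key always lands in the same partition,
# but the mapping can change when the topic grows from 2 to 5 partitions.
for key in user-1 user-2 user-3 order-42; do
  echo "$key: 2 partitions -> $(partition_for "$key" 2), 5 partitions -> $(partition_for "$key" 5)"
done
```

This is why per-key ordering is only guaranteed while the partition count stays the same.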

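Topic, Partition, and Message Extensions

  • Each partition is stored as an append-only log file: new messages are always appended to the end of the log, and each message's position in the log is called its offset.
  • More partitions can serve more consumers, because within a consumer group each partition is read by at most one consumer; this raises parallel consumption throughput.
  • Add topics when new types of business data appear; add partitions when the data volume grows.
  • Message fields: offset, of type long, the message's position within its partition; the offset can be thought of as an auto-incrementing message ID within the partition. MessageSize, of type int32, the size of the message in bytes. data, of type bytes, the actual content of the message.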

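Storage Strategy

  • In Kafka, each topic contains one or more partitions, and each partition stores a share of the messages. Each message has three attributes, one of which is the offset.
  • The offset is the unique ID of a message within its partition, and a message is located through segments plus indexes. A partition consists of several segment files, each named after the offset of its first message (its base offset). Next to each .log segment file is an .index file, a sparse index that maps offsets to byte positions in that segment. To find a message, Kafka picks the segment whose base offset is the largest one not greater than the target offset, then uses that segment's index to jump close to the message and reads forward from there.
shell
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
# i.e. the size of each segment file; once it is exceeded, a new segment file is created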
log.segment.bytes=1073741824

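Kafka's storage flow: messages from the producer are sent to the topic's partitions. The broker that leads a partition appends each incoming message to that partition's last (active) segment file, and once the file reaches log.segment.bytes a new segment file is created.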
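The segment-selection step can be sketched as follows. This is a simplification that works on file names only and skips the .index lookup; `find_segment` and the sample offsets are hypothetical. Real segment files live under the directory set by log.dirs, in one folder per partition such as hello-0.

```shell
#!/usr/bin/env bash
# find_segment TARGET_OFFSET FILE...: print the segment file that should hold
# TARGET_OFFSET, i.e. the one with the largest base offset <= TARGET_OFFSET.
# Segment files are named after their zero-padded base offset (hypothetical helper).
find_segment() {
  local target=$1 best="" best_base=-1 path name base
  shift
  for path in "$@"; do
    name=${path##*/}              # drop any leading directory
    base=$(( 10#${name%%.*} ))    # "00000000000000170410.log" -> 170410 (10# avoids octal parsing)
    if (( base <= target && base > best_base )); then
      best=$path
      best_base=$base
    fi
  done
  echo "$best"
}

# Hypothetical segment files of one partition directory such as hello-0/
segments=(00000000000000000000.log 00000000000000170410.log 00000000000000239430.log)
find_segment 170417 "${segments[@]}"
```

Offset 170417 falls in 00000000000000170410.log. Inside that segment Kafka would then consult 00000000000000170410.index; to inspect a real index or log file, Kafka ships bin/kafka-dump-log.sh with a --files option.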

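Fault Tolerance

  • Losing one broker has little impact on the cluster, as long as every partition still has a replica on a surviving broker

    shell
    # Simulate a broker failure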
    [root@hadoop01 config]# jps
    41728 NameNode
    53523 Kafka
    42246 ResourceManager
    59789 Jps
    41998 SecondaryNameNode
    52655 QuorumPeerMain
    [root@hadoop01 config]# kill 53523
    [root@hadoop01 config]# jps
    41728 NameNode
    59809 Jps
    42246 ResourceManager
    41998 SecondaryNameNode
    52655 QuorumPeerMain
    
    # Connect to ZooKeeper and check the registered brokers (broker 0 is gone)
    [root@hadoop01 zookeeper3.8.4]# bin/zkCli.sh
    [zk: localhost:2181(CONNECTED) 0] ls /
    [admin, brokers, cluster, config, consumers, controller, controller_epoch, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]
    [zk: localhost:2181(CONNECTED) 1] ls /brokers 
    [ids, seqid, topics]
    [zk: localhost:2181(CONNECTED) 2] ls /brokers/ids 
    [1, 2]
    [zk: localhost:2181(CONNECTED) 4] get /brokers/ids/1
    {"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://hadoop02:9092"],"jmx_port":-1,"host":"hadoop02","timestamp":"1710206078306","port":9092,"version":4}
    [zk: localhost:2181(CONNECTED) 5] 
    shell
    # The controller elects new leaders for the partitions broker 0 led, and broker 0 drops out of every ISR
    [root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --describe --zookeeper hadoop01:2181 --topic hello
    Topic: hello	PartitionCount: 5	ReplicationFactor: 2	Configs: 
    	Topic: hello	Partition: 0	Leader: 2	Replicas: 0,2	Isr: 2
    	Topic: hello	Partition: 1	Leader: 1	Replicas: 1,0	Isr: 1
    	Topic: hello	Partition: 2	Leader: 2	Replicas: 2,1	Isr: 2,1
    	Topic: hello	Partition: 3	Leader: 1	Replicas: 0,1	Isr: 1
    	Topic: hello	Partition: 4	Leader: 1	Replicas: 1,2	Isr: 1,2
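  • When a broker joins (or rejoins) the Kafka cluster, it registers itself in ZooKeeper; the controller notices it and can make it a partition leader when appropriate
    shell
    # Restart the broker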
    [root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-server-start.sh -daemon config/server.properties 
    [root@hadoop01 kafka_2.12-2.4.0]# jps
    41728 NameNode
    60640 Kafka
    60707 Jps
    42246 ResourceManager
    41998 SecondaryNameNode
    52655 QuorumPeerMain
    
    # Check in ZooKeeper: broker 0 is registered again
    [zk: localhost:2181(CONNECTED) 2] ls /brokers/ids
    [0, 1, 2]
    [zk: localhost:2181(CONNECTED) 3] get /brokers/ids/0
    {"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://hadoop01:9092"],"jmx_port":-1,"host":"hadoop01","timestamp":"1710221534732","port":9092,"version":4}
    
    # Describe the topic: broker 0 is back in the ISR but leads no partition
    [root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --describe --zookeeper hadoop01:2181 --topic hello
    Topic: hello	PartitionCount: 5	ReplicationFactor: 2	Configs: 
    	Topic: hello	Partition: 0	Leader: 2	Replicas: 0,2	Isr: 2,0
    	Topic: hello	Partition: 1	Leader: 1	Replicas: 1,0	Isr: 1,0
    	Topic: hello	Partition: 2	Leader: 2	Replicas: 2,1	Isr: 2,1
    	Topic: hello	Partition: 3	Leader: 1	Replicas: 0,1	Isr: 1,0
    	Topic: hello	Partition: 4	Leader: 1	Replicas: 1,2	Isr: 1,2
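  • The restarted node is not the leader of any partition, so leadership should be rebalanced evenly. Doing it by hand is optional: Kafka has a setting for this (auto.leader.rebalance.enable, true by default, lets the controller periodically move leadership back to the preferred replica, the first broker in each Replicas list), and you can also trigger a preferred leader election manually:

    shell
    [root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-leader-election.sh --bootstrap-server hadoop01:9092 --election-type preferred --all-topic-partitions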
    [root@hadoop01 kafka_2.12-2.4.0]# bin/kafka-topics.sh --describe --zookeeper hadoop01:2181 --topic hello
    Topic: hello	PartitionCount: 5	ReplicationFactor: 2	Configs: 
    	Topic: hello	Partition: 0	Leader: 0	Replicas: 0,2	Isr: 2,0
    	Topic: hello	Partition: 1	Leader: 1	Replicas: 1,0	Isr: 1,0
    	Topic: hello	Partition: 2	Leader: 2	Replicas: 2,1	Isr: 2,1
    	Topic: hello	Partition: 3	Leader: 0	Replicas: 0,1	Isr: 1,0
    	Topic: hello	Partition: 4	Leader: 1	Replicas: 1,2	Isr: 1,2
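To compare leadership and replication state before and after the election without reading the table by eye, the describe output can be summarized with awk. This is a minimal sketch: `leaders_per_broker` and `under_replicated` are hypothetical helper names, and the samples are the describe outputs shown above (after broker 0 failed, and after the preferred leader election).

```shell
#!/usr/bin/env bash
# leaders_per_broker: read `kafka-topics.sh --describe` output on stdin and
# print how many partitions each broker currently leads (hypothetical helper).
leaders_per_broker() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "Leader:") count[$(i + 1)]++ }
       END { for (b in count) print "broker " b ": " count[b] }' | sort
}

# under_replicated: print the partitions whose ISR has fewer brokers than
# their replica list (hypothetical helper).
under_replicated() {
  awk '{
    part = ""; reps = ""; isr = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Partition:") part = $(i + 1)
      if ($i == "Replicas:") reps = $(i + 1)
      if ($i == "Isr:") isr = $(i + 1)
    }
    if (reps != "" && split(isr, a, ",") < split(reps, b, ",")) print "partition " part
  }'
}

# Samples copied from the describe outputs above.
after_failure=$(cat <<'EOF'
Topic: hello PartitionCount: 5 ReplicationFactor: 2 Configs:
  Topic: hello Partition: 0 Leader: 2 Replicas: 0,2 Isr: 2
  Topic: hello Partition: 1 Leader: 1 Replicas: 1,0 Isr: 1
  Topic: hello Partition: 2 Leader: 2 Replicas: 2,1 Isr: 2,1
  Topic: hello Partition: 3 Leader: 1 Replicas: 0,1 Isr: 1
  Topic: hello Partition: 4 Leader: 1 Replicas: 1,2 Isr: 1,2
EOF
)
after_election=$(cat <<'EOF'
Topic: hello PartitionCount: 5 ReplicationFactor: 2 Configs:
  Topic: hello Partition: 0 Leader: 0 Replicas: 0,2 Isr: 2,0
  Topic: hello Partition: 1 Leader: 1 Replicas: 1,0 Isr: 1,0
  Topic: hello Partition: 2 Leader: 2 Replicas: 2,1 Isr: 2,1
  Topic: hello Partition: 3 Leader: 0 Replicas: 0,1 Isr: 1,0
  Topic: hello Partition: 4 Leader: 1 Replicas: 1,2 Isr: 1,2
EOF
)

echo "Leaders after broker 0 failed:"; leaders_per_broker <<< "$after_failure"
echo "Under-replicated after broker 0 failed:"; under_replicated <<< "$after_failure"
echo "Leaders after the preferred leader election:"; leaders_per_broker <<< "$after_election"
```

After the failure, brokers 1 and 2 lead 3 and 2 partitions and partitions 0, 1 and 3 are under-replicated; after the election, leadership is spread 2/2/1 across brokers 0, 1 and 2. For a live cluster, kafka-topics.sh --describe also accepts --under-replicated-partitions to list only those partitions.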

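In Kafka, cluster metadata (broker registrations, topic configurations, partition assignments, leaders and ISRs) is stored in ZooKeeper rather than held by any single broker, and each partition's data is replicated across brokers. A broker does keep the partition logs it hosts on its own disk, so it is not completely stateless, but a broker process going down or starting up has little impact on the cluster as a whole.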
