kafka leader traffic 重平衡

To rebalance broker traffic after a leader imbalance in Apache Kafka, you need to redistribute partition leaders more evenly across your brokers. This ensures that no broker becomes a bottleneck due to hosting too many partition leaders (which handle all client requests).


🧭 Why Leader Imbalance Happens

Leader imbalance can occur when:

  • New brokers are added
  • Broker restarts or failures occur
  • Uneven topic partition assignments
  • Manual reassignment was skipped or misconfigured

✅ Solutions to Rebalance Leader Traffic

✅ Option 1: Use Kafka's Built-In Auto Balancer (Kafka 2.4+)

Kafka has an auto leader rebalance feature.

Check if auto rebalance is enabled:

bash 复制代码
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type brokers \
  --entity-name <broker-id> \
  --all \
  --describe | grep leader

Look for:

text 复制代码
auto.leader.rebalance.enable=true

Example:

bash 复制代码
bash-4.4$ kafka-configs --bootstrap-server localhost:9092 --entity-type brokers --entity-name 1 --all  --describe | grep leader
  auto.leader.rebalance.enable=true sensitive=false synonyms={DEFAULT_CONFIG:auto.leader.rebalance.enable=true}
  leader.imbalance.check.interval.seconds=300 sensitive=false synonyms={DEFAULT_CONFIG:leader.imbalance.check.interval.seconds=300}
  leader.imbalance.per.broker.percentage=10 sensitive=false synonyms={DEFAULT_CONFIG:leader.imbalance.per.broker.percentage=10}
  unclean.leader.election.enable=false sensitive=false synonyms={DEFAULT_CONFIG:unclean.leader.election.enable=false}

This is the default setting, but if it's false, enable it:

bash 复制代码
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type brokers \
  --entity-name <broker-id> \
  --alter \
  --add-config auto.leader.rebalance.enable=true

Kafka will then periodically rebalance leaders using leader.imbalance.check.interval.seconds.

重平衡过程:

系统会自动将其他broker上的一些leader转移到新broker,

前提是新broker必须已经是该partition的follower。

转移是渐进式的,不会一次性全部迁移。

自动 Leader 重平衡 仅调整 Leader 角色,不会自动调整 Partition 的副本分配(即不会自动把 Follower 迁移到新 Broker)。


✅ Option 2: Trigger Manual Leader Reassignment

Use the built-in script:

bash 复制代码
kafka-preferred-replica-election.sh --bootstrap-server <broker>:9092

This command triggers preferred leader election . Kafka will try to move partition leaders to their preferred replicas (usually the first broker in the replica list).

This is useful if leader imbalance occurred due to broker failures or restarts.


✅ Option 3: Full Partition Reassignment (Advanced)

If imbalance persists, you may need a full partition reassignment to redistribute both partition leaders and replicas.

Steps:

a) Generate current assignment:
bash 复制代码
kafka-reassign-partitions.sh \
  --bootstrap-server <broker>:9092 \
  --generate \
  --topics-to-move-json-file topics.json \
  --broker-list "1,2,3"

topics.json format:

json 复制代码
{
  "topics": [{"topic": "your-topic-name"}],
  "version": 1
}
b) Review the proposed plan, then execute:
bash 复制代码
kafka-reassign-partitions.sh \
  --bootstrap-server <broker>:9092 \
  --execute \
  --reassignment-json-file reassignment.json

✅ Option 4: Use Cruise Control (Recommended for Large Clusters)

LinkedIn's Cruise Control is a powerful tool to automate:

  • Traffic balancing
  • Leader and replica assignment
  • Self-healing after broker failure

📌 Summary

Method Use When Command
Auto Leader Rebalance Minor imbalance auto.leader.rebalance.enable=true
Manual Election Leader drift after broker restart kafka-preferred-replica-election.sh
Full Reassignment Major imbalance or cluster expansion kafka-reassign-partitions.sh
Cruise Control Ongoing traffic management Cruise Control API/UI
相关推荐
一条闲鱼_mytube11 小时前
《分布式事务实战完全指南》:从理论到实践
分布式
这周也會开心11 小时前
RabbitMQ知识点
分布式·rabbitmq
岁岁种桃花儿12 小时前
Kafka从入门到上天系列第三篇:基础架构推演+基础组件图形推演
分布式·kafka
qq_12498707531 天前
基于Hadoop的信贷风险评估的数据可视化分析与预测系统的设计与实现(源码+论文+部署+安装)
大数据·人工智能·hadoop·分布式·信息可视化·毕业设计·计算机毕业设计
ask_baidu1 天前
KafkaUtils
kafka·bigdata
洛豳枭薰1 天前
消息队列关键问题描述
kafka·rabbitmq·rocketmq
lucky67071 天前
Spring Boot集成Kafka:最佳实践与详细指南
spring boot·kafka·linq
Coder_Boy_1 天前
基于Spring AI的分布式在线考试系统-事件处理架构实现方案
人工智能·spring boot·分布式·spring
袁煦丞 cpolar内网穿透实验室1 天前
远程调试内网 Kafka 不再求运维!cpolar 内网穿透实验室第 791 个成功挑战
运维·分布式·kafka·远程工作·内网穿透·cpolar