kafka leader traffic 重平衡

To rebalance broker traffic after a leader imbalance in Apache Kafka, you need to redistribute partition leaders more evenly across your brokers. This ensures that no broker becomes a bottleneck due to hosting too many partition leaders (which handle all client requests).


🧭 Why Leader Imbalance Happens

Leader imbalance can occur when:

  • New brokers are added
  • Broker restarts or failures occur
  • Uneven topic partition assignments
  • Manual reassignment was skipped or misconfigured

✅ Solutions to Rebalance Leader Traffic

✅ Option 1: Use Kafka's Built-In Auto Balancer (Kafka 2.4+)

Kafka has an auto leader rebalance feature.

Check if auto rebalance is enabled:

bash 复制代码
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type brokers \
  --entity-name <broker-id> \
  --all \
  --describe | grep leader

Look for:

text 复制代码
auto.leader.rebalance.enable=true

Example:

bash 复制代码
bash-4.4$ kafka-configs --bootstrap-server localhost:9092 --entity-type brokers --entity-name 1 --all  --describe | grep leader
  auto.leader.rebalance.enable=true sensitive=false synonyms={DEFAULT_CONFIG:auto.leader.rebalance.enable=true}
  leader.imbalance.check.interval.seconds=300 sensitive=false synonyms={DEFAULT_CONFIG:leader.imbalance.check.interval.seconds=300}
  leader.imbalance.per.broker.percentage=10 sensitive=false synonyms={DEFAULT_CONFIG:leader.imbalance.per.broker.percentage=10}
  unclean.leader.election.enable=false sensitive=false synonyms={DEFAULT_CONFIG:unclean.leader.election.enable=false}

This is the default setting, but if it's false, enable it:

bash 复制代码
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type brokers \
  --entity-name <broker-id> \
  --alter \
  --add-config auto.leader.rebalance.enable=true

Kafka will then periodically rebalance leaders using leader.imbalance.check.interval.seconds.

重平衡过程:

系统会自动将其他broker上的一些leader转移到新broker,

前提是新broker必须已经是该partition的follower。

转移是渐进式的,不会一次性全部迁移。

自动 Leader 重平衡 仅调整 Leader 角色,不会自动调整 Partition 的副本分配(即不会自动把 Follower 迁移到新 Broker)。


✅ Option 2: Trigger Manual Leader Reassignment

Use the built-in script:

bash 复制代码
kafka-preferred-replica-election.sh --bootstrap-server <broker>:9092

This command triggers preferred leader election . Kafka will try to move partition leaders to their preferred replicas (usually the first broker in the replica list).

This is useful if leader imbalance occurred due to broker failures or restarts.


✅ Option 3: Full Partition Reassignment (Advanced)

If imbalance persists, you may need a full partition reassignment to redistribute both partition leaders and replicas.

Steps:

a) Generate current assignment:
bash 复制代码
kafka-reassign-partitions.sh \
  --bootstrap-server <broker>:9092 \
  --generate \
  --topics-to-move-json-file topics.json \
  --broker-list "1,2,3"

topics.json format:

json 复制代码
{
  "topics": [{"topic": "your-topic-name"}],
  "version": 1
}
b) Review the proposed plan, then execute:
bash 复制代码
kafka-reassign-partitions.sh \
  --bootstrap-server <broker>:9092 \
  --execute \
  --reassignment-json-file reassignment.json

✅ Option 4: Use Cruise Control (Recommended for Large Clusters)

LinkedIn's Cruise Control is a powerful tool to automate:

  • Traffic balancing
  • Leader and replica assignment
  • Self-healing after broker failure

📌 Summary

Method Use When Command
Auto Leader Rebalance Minor imbalance auto.leader.rebalance.enable=true
Manual Election Leader drift after broker restart kafka-preferred-replica-election.sh
Full Reassignment Major imbalance or cluster expansion kafka-reassign-partitions.sh
Cruise Control Ongoing traffic management Cruise Control API/UI
相关推荐
SuperHeroWu72 小时前
【HarmonyOS 6】UIAbility跨设备连接详解(分布式软总线运用)
分布式·华为·harmonyos·鸿蒙·连接·分布式协同·跨设备链接
杜子不疼.3 小时前
【探索实战】从0到1打造分布式云原生平台:Kurator全栈实践指南
分布式·云原生
最笨的羊羊4 小时前
Flink CDC系列之:Kafka CSV 序列化器CsvSerializationSchema
kafka·csv·schema·flink cdc系列·serialization·序列化器
最笨的羊羊4 小时前
Flink CDC系列之:Kafka的Debezium JSON 结构定义类DebeziumJsonStruct
kafka·debezium·flink cdc系列·debezium json·结构定义类·jsonstruct
m***l1155 小时前
集成RabbitMQ+MQ常用操作
分布式·rabbitmq
拾忆,想起7 小时前
Dubbo分组(Group)使用指南:实现服务接口的多版本管理与环境隔离
分布式·微服务·性能优化·架构·dubbo
回家路上绕了弯7 小时前
彻底解决超卖问题:从单体到分布式的全场景技术方案
分布式·后端
拾忆,想起8 小时前
Dubbo动态配置实时生效全攻略:零停机实现配置热更新
分布式·微服务·性能优化·架构·dubbo
每天进步一点_JL1 天前
事务与消息中间件:分布式系统中的可见性边界问题
分布式·后端
静若繁花_jingjing1 天前
ZooKeeper & Nacos
分布式·zookeeper·云原生