Kafka新节点加入集群操作指南

一、环境准备

1. Java环境安装

bash 复制代码

# 安装JDK
apt-get update
apt-get install openjdk-8-jdk -y

2. 下载并解压

bash 复制代码

wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.13-2.8.1.tgz
tar xf kafka_2.13-2.8.1.tgz
mv kafka_2.13-2.8.1 kafka

二、配置环境变量

1. 创建kafka.sh

bash 复制代码

vim /etc/profile.d/kafka.sh

# 添加以下内容
export KAFKA_HOME=/data/kafka
export PATH=$PATH:$KAFKA_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
export KAFKA_OPTS="-Djava.security.auth.login.config=/data/kafka/config/kafka_jaas.conf"

2. 使环境变量生效

bash 复制代码

source /etc/profile.d/kafka.sh

三、配置文件准备

拷贝之前的

1. 修改server.properties

properties 复制代码

# 基础配置
broker.id=4  # 确保集群中唯一
delete.topic.enable=true

# 认证配置
listeners=SASL_PLAINTEXT://172.24.77.18:9092  # 修改为本机IP
advertised.listeners=SASL_PLAINTEXT://172.24.77.18:9092  # 修改为本机IP
inter.broker.listener.name=SASL_PLAINTEXT
security.protocol=SASL_PLAINTEXT
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-256
sasl.enabled.mechanisms=SCRAM-SHA-256,PLAIN

# ACL配置
allow.everyone.if.no.acl.found=true
super.users=User:super
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer

# 性能配置
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

# 日志配置
log.dirs=/data/kafka/data
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000

# ZooKeeper配置
zookeeper.connect=172.24.77.13:2181,172.24.77.14:2181  # 修改为实际ZK地址
zookeeper.connection.timeout.ms=18000
zookeeper.set.acl=true

# 复制配置
default.replication.factor=3
min.insync.replicas=2

2. 配置kafka_jaas.conf

properties 复制代码

KafkaServer {
    org.apache.kafka.common.security.scram.ScramLoginModule required
    username="super"
    password="super_password";
};
Client {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    username="super"
    password="super_password";
};
# 客户端会youxian
KafkaClient {    
    org.apache.kafka.common.security.scram.ScramLoginModule required
    username="super"
    password="super_password";
};

3. 配置admin-config.properties

properties 复制代码

sasl.mechanism=SCRAM-SHA-256
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
    username="super" \
    password="super_password";

四、启动服务

1. 启动Kafka

bash 复制代码

cd /data/kafka
bin/kafka-server-start.sh -daemon config/server.properties

2. 检查启动状态

bash 复制代码

# 查看进程
jps | grep Kafka

# 查看日志
tail -200 /data/kafka/logs/server.log

五、验证配置并迁移topic

1. 检查集群状态

bash 复制代码

# 创建测试主题
bin/kafka-topics.sh --create \
    --topic test \
    --partitions 3 \
    --replication-factor 3 \
    --bootstrap-server localhost:9092 \
    --command-config config/admin-config.properties

# 查看主题信息
bin/kafka-topics.sh --describe \
    --topic test \
    --bootstrap-server localhost:9092 \
    --command-config config/admin-config.properties

2. 验证复制

bash 复制代码

# 检查topic
bin/kafka-topics.sh --list     --bootstrap-server 172.24.77.18:9092     --command-config config/admin-config.properties
# 检查分区分布
bin/kafka-topics.sh --describe \
    --topic test-cluster \
    --bootstrap-server 172.24.77.18:9092 \
    --command-config config/admin-config.properties

3. 查看节点状态

shell 复制代码

[zk: localhost:2181(CONNECTED) 2] ls /brokers/ids
[1, 2, 3, 4]
[zk: localhost:2181(CONNECTED) 3] get /brokers/ids/4
{"features":{},"listener_security_protocol_map":{"SASL_PLAINTEXT":"SASL_PLAINTEXT"},"endpoints":["SASL_PLAINTEXT://172.24.77.18:9092"],"jmx_port":-1,"port":-1,"host":null,"version":5,"timestamp":"1731397341315"}

说明节点加入成功了

4. 假装broker3不要了，换成broker4

shell 复制代码

# 查看当前topic详情
bin/kafka-topics.sh --describe \
    --topic test-cluster \
    --bootstrap-server 172.24.77.18:9092 \
    --command-config config/admin-config.properties

# 当前状态：
Topic: test-cluster	
Partition: 0	Leader: 1	Replicas: 3,2,1	Isr: 1
Partition: 1	Leader: 1	Replicas: 2,3,1	Isr: 1
Partition: 2	Leader: 1	Replicas: 2,1,3	Isr: 1

5. 执行分区重新分配

shell 复制代码

# 1. 先列出要迁移的topics
cat > move-topics.json << EOF
{
  "topics": [
    {"topic": "test-cluster"}
  ],
  "version": 1
}
EOF

# 2. 生成迁移计划
bin/kafka-reassign-partitions.sh \
  --bootstrap-server 172.24.77.18:9092 \
  --command-config config/admin-config.properties \
  --topics-to-move-json-file move-topics.json \
  --broker-list "1,2,4" \
  --generate
会有两段json配置，当前分区信息和预分配的分区信息
Current partition replica assignment   ---这个保存下来回滚使用
...
Proposed partition reassignment configuration   ---这个迁移使用
...
# 3.  reassignment.json
把`Proposed partition reassignment configuration`下的内容放到reassignment.json里
# 4. 执行迁移并限流，限流100MB
bin/kafka-reassign-partitions.sh   --bootstrap-server 172.24.77.18:9092   --command-config config/admin-config.properties \
--reassignment-json-file reassignment.json   --execute   --throttle 100000000

# 5. 迁移后解除限流可以使用--verify参数，验证的同时解除限流
bin/kafka-reassign-partitions.sh \
  --bootstrap-server 172.24.77.18:9092 \
  --command-config config/admin-config.properties \
  --reassignment-json-file reassignment.json \
  --verify
提示：
Clearing broker-level throttles on brokers ...  # 清除 broker 级别的限流
Clearing topic-level throttles on topic ...     # 清除 topic 级别的限流

验证过程：
先检查所有分区的迁移状态
如果迁移完成，自动清除 broker 级别的限流
自动清除 topic 级别的限流

6. 理解限流的影响

复制代码

Broker1: 
  - topic-A (迁移中)
  - topic-B (正常)
  - topic-C (正常)

Broker2:
  - topic-D (完全不参与迁移)
  
Broker1 参与迁移，所以它上面的 topic-A、B、C 都会受到带宽限制
Broker2 没参与迁移，完全不受影响

所以在生产环境中要注意：
合理设置限流值，考虑到 broker 上其他 topic 的正常运行
尽量在业务低峰期进行迁移
可以分批迁移 topic，避免影响面过大

# 查看topic详细信息
 $ /data/kafka/bin/kafka-topics.sh --describe --topic test-cluster \
 --bootstrap-server 172.24.77.18:9092 --command-config 
 config/admin-config.properties

六、故障排查

1. 常见问题检查

bash 复制代码

# 检查端口
netstat -nltp | grep 9092

# 检查连接
telnet localhost 9092

# 检查ZK连接
echo stat | nc localhost 2181

2. 日志检查

bash 复制代码

# 检查Kafka日志
tail -f /data/kafka/logs/server.log

# 检查系统日志
journalctl -u kafka