🔥 极简版:每天凌晨2点自动备份 + 手动完整恢复

一、自动备份(只在 1 台节点做)
✅ 执行节点:k8s-etcd2(192.168.44.107)
集群认证密钥目录(按照你真是环境):
root@k8s-etcd2:~# find / -name "*etcd*.pem" 2>/dev/null
/etc/kubeasz/clusters/k8s-cluster1/ssl/etcd-key.pem
/etc/kubeasz/clusters/k8s-cluster1/ssl/etcd.pem

1. 创建备份目录
mkdir -p /data/nfs/etcd-backup
2. 创建自动备份脚本(已修正:只连本机)
cat > /root/etcd-backup.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/data/nfs/etcd-backup"
mkdir -p $BACKUP_DIR
etcdctl \
--endpoints=https://192.168.44.107:2379 \
--cacert=/etc/kubeasz/clusters/k8s-cluster1/ssl/ca.pem \
--cert=/etc/kubeasz/clusters/k8s-cluster1/ssl/etcd.pem \
--key=/etc/kubeasz/clusters/k8s-cluster1/ssl/etcd-key.pem \
snapshot save ${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db
# 自动删除30天前的旧备份
find $BACKUP_DIR -name "etcd-snapshot-*.db" -mtime +30 -delete
EOF
chmod +x /root/etcd-backup.sh
3. 设置定时:每天凌晨 2:00 自动备份 1 次--乌班图系统
echo "0 2 * * * /root/etcd-backup.sh" >> /var/spool/cron/crontabs/root
systemctl restart cron
4. 查看定时任务是否生效
crontab -l
看到 0 2 * * * /root/etcd-backup.sh 即成功。
二、手动恢复流程(必须严格按节点执行)
恢复前提
- 集群异常、etcd 不可用
- 使用
/data/nfs/etcd-backup/下的 同一个备份文件 - 业务低峰期操作
【第一步:停止所有相关服务】
1. 在 3台 Master 节点执行(101、102、103)
systemctl stop kube-apiserver
systemctl stop kube-controller-manager
systemctl stop kube-scheduler
2. 在 3台 Etcd 节点执行(106、107、108)
systemctl stop etcd
【第二步:复制备份文件到 3 台 Etcd】
执行节点:k8s-etcd2(107)
# 把备份文件复制到 3 台 etcd 节点的 /var/lib/ 目录
scp /data/nfs/etcd-backup/etcd-snapshot-20260331-071654.db 192.168.44.106:/var/lib/
scp /data/nfs/etcd-backup/etcd-snapshot-20260331-071654.db 192.168.44.107:/var/lib/
scp /data/nfs/etcd-backup/ tcd-snapshot-20260331-071654.db 192.168.44.108:/var/lib/
【第三步:3台 Etcd 分别执行恢复(关键!)】
👉 节点 1:k8s-etcd1(192.168.44.106)
mv /var/lib/etcd /var/lib/etcd.bak
mkdir -p /var/lib/etcd
etcdctl snapshot restore /var/lib/你要恢复的备份文件.db \
--data-dir=/var/lib/etcd \
--name=k8s-etcd1 \
--initial-cluster=k8s-etcd1=https://192.168.44.106:2380,k8s-etcd2=https://192.168.44.107:2380,k8s-etcd3=https://192.168.44.108:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.44.106:2380
👉 节点 2:k8s-etcd2(192.168.44.107)← 你常用节点
mv /var/lib/etcd /var/lib/etcd.bak
mkdir -p /var/lib/etcd
etcdctl snapshot restore /var/lib/你要恢复的备份文件.db \
--data-dir=/var/lib/etcd \
--name=k8s-etcd2 \
--initial-cluster=k8s-etcd1=https://192.168.44.106:2380,k8s-etcd2=https://192.168.44.107:2380,k8s-etcd3=https://192.168.44.108:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.44.107:2380
👉 节点 3:k8s-etcd3(192.168.44.108)
mv /var/lib/etcd /var/lib/etcd.bak
mkdir -p /var/lib/etcd
etcdctl snapshot restore /var/lib/你要恢复的备份文件.db \
--data-dir=/var/lib/etcd \
--name=k8s-etcd3 \
--initial-cluster=k8s-etcd1=https://192.168.44.106:2380,k8s-etcd2=https://192.168.44.107:2380,k8s-etcd3=https://192.168.44.108:2380 \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://192.168.44.108:2380
【第四步:启动服务】
1. 在 3台 Etcd 节点执行
systemctl start etcd
systemctl status etcd
2. 在 3台 Master 节点执行
systemctl start kube-apiserver
systemctl start kube-controller-manager
systemctl start kube-scheduler
【第五步:验证恢复成功】
任意 Master 节点执行:
kubectl get nodes
kubectl get pods -A
🔥 一句话终极总结
|----------|-------------------------------------------|
| 项目 | 内容 |
| 自动备份 | 只在 k8s-etcd2(107)配置,每天凌晨2点一次 |
| 恢复 | 3台 Etcd 各跑各的命令,必须用同一份备份 |
| 证书路径 | /etc/kubeasz/clusters/k8s-cluster1/ssl/ |
| 备份存放 | 192.168.44.120:/data/nfs/etcd-backup |
一、NFS 服务器配置(192.168.44.120)
# 创建共享目录
mkdir -p /data/nfs/etcd-backup
chmod 755 /data/nfs
# 安装 NFS 服务
apt update && apt install -y nfs-kernel-server
# 配置共享
echo "/data/nfs *(rw,sync,no_root_squash,no_subtree_check)" > /etc/exports
# 启动服务
systemctl restart nfs-server
systemctl enable nfs-server
二、k8s-etcd2 挂载 NFS(192.168.44.107)
# 安装 NFS 客户端
apt update && apt install -y nfs-common
# 挂载 NFS
mount -t nfs 192.168.44.120:/data/nfs /data/nfs
# 设置开机自动挂载
echo "192.168.44.120:/data/nfs /data/nfs nfs defaults 0 0" >> /etc/fstab
# 验证挂载
df -h /data/nfs | grep nfs
107ETCD节点:
root@k8s-etcd2:/data/nfs/etcd-backup# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/ubuntu-vg/ubuntu-lv during curtin installation
/dev/disk/by-id/dm-uuid-LVM-t1ie4orXuJ2lGIfCnfey60oGrDyMLJhOt2vUhlkSOd8Fvf1NL1H4ypB4q24GQ93W / ext4 defaults 0 1
# /boot was on /dev/sda2 during curtin installation
/dev/disk/by-uuid/c520645e-d80b-434b-b6a8-2213bd9061c6 /boot ext4 defaults 0 1
192.168.44.120:/data/nfs /data/nfs nfs defaults 0 0
root@k8s-etcd2:/data/nfs/etcd-backup#
三、验证同步
在 k8s-etcd2 上:
echo "test" > /data/nfs/test.txt
在 44.120 上:
cat /data/nfs/test.txt
看到 test 即成功。
四、恢复备份文件到 NFS
# 创建备份目录
mkdir -p /data/nfs/etcd-backup
# 复制已有备份(如果本地有)
cp /data/nfs.local/etcd-backup/*.db /data/nfs/etcd-backup/ 2>/dev/null
# 或者直接重新备份
/root/etcd-backup.sh
完成。之后 /data/nfs/etcd-backup/ 下的文件会自动同步到 44.120。