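Introduction
In a Kubernetes cluster, etcd is a critical component: it stores the cluster's state and configuration data. Everything from the cluster's specification and configuration to the state of running workloads lives in etcd. A regular backup schedule and a tested restore procedure are therefore essential for keeping a Kubernetes cluster reliable and its data intact. This article walks through backing up and restoring etcd data in Kubernetes.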
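I. Backing Up ETCD Data
(1) Define a backup strategy
Before taking any backups, define a backup strategy: how often to back up, where to store the backups, and how long to keep them. Back up etcd data on a regular schedule and keep copies in more than one secure location to guard against data loss.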
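One possible way to implement such a strategy is shown below. This is a minimal sketch that takes timestamped snapshots and keeps only the newest N. It assumes the kubeadm certificate paths used later in this article. The `etcd-snapshot-*.db` naming scheme, the retention count of 7, the function names and the cron schedule are all hypothetical choices, not part of etcd or kubeadm.

```bash
#!/usr/bin/env bash
# Sketch of a backup-with-retention script (hypothetical; not shipped with etcd).
# Assumes kubeadm's default etcd certificate paths, as used in this article.

BACKUP_DIR="${BACKUP_DIR:-/opt/etcd/backup}"  # assumed storage location
KEEP="${KEEP:-7}"                             # assumed: keep the 7 newest snapshots

# Timestamped file name; the fixed-width format makes name order match age order.
snapshot_name() {
  printf 'etcd-snapshot-%s.db\n' "$(date +%Y%m%d-%H%M%S)"
}

# Delete all but the newest $2 snapshots in directory $1.
prune_old_backups() {
  local dir="$1" keep="$2" i
  local -a files
  mapfile -t files < <(find "$dir" -maxdepth 1 -name 'etcd-snapshot-*.db' | LC_ALL=C sort)
  for (( i = 0; i < ${#files[@]} - keep; i++ )); do
    rm -f -- "${files[i]}"
  done
}

# Take one snapshot and, only if it succeeded, apply the retention policy.
take_backup() {
  mkdir -p -- "$BACKUP_DIR" || return 1
  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$BACKUP_DIR/$(snapshot_name)" &&
    prune_old_backups "$BACKUP_DIR" "$KEEP"
}

# Run the function named by the first argument. Example crontab entry
# (hypothetical script path), daily at 02:00:
#   0 2 * * * /usr/local/sbin/etcd-backup.sh take_backup
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

The "multiple locations" part of the strategy still needs a separate step, for example copying the newest file to another host with scp or rsync.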
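(2) Back up with etcdctl
1. Install etcdctl
Download etcdctl, the command-line client for interacting with an etcd cluster. Use the release that matches the cluster's etcd version, which can be read from the image tag in the etcd static Pod manifest: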
[root@master01 manifests]#cat etcd.yaml |grep image:
image: k8s.gcr.io/etcd:3.4.13-0
# Check the etcd version (from the image tag)
[root@master01 mnt]#wget https://github.com/etcd-io/etcd/releases/download/v3.4.13/etcd-v3.4.13-linux-amd64.tar.gz
# Download the release package for that version
[root@master01 mnt]#ls
etcd-v3.4.13-linux-amd64.tar.gz
[root@master01 mnt]#tar xf etcd-v3.4.13-linux-amd64.tar.gz
[root@master01 mnt]#ls
etcd-v3.4.13-linux-amd64 etcd-v3.4.13-linux-amd64.tar.gz
[root@master01 mnt]#ls etcd-v3.4.13-linux-amd64
Documentation etcd etcdctl README-etcdctl.md README.md READMEv2-etcdctl.md
[root@master01 mnt]#mv etcd-v3.4.13-linux-amd64/etcdctl /usr/local/sbin/
[root@master01 mnt]#etcdctl version
etcdctl version: 3.4.13
API version: 3.4
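The version lookup and download above can also be scripted, so that the installed etcdctl always matches the running etcd. This is a sketch under a few assumptions: a kubeadm-style manifest whose image tag looks like `etcd:3.4.13-0`, and the GitHub release URL pattern used above. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: install the etcdctl that matches the cluster's etcd image.

# Print the etcd version (e.g. 3.4.13) from the "image:" line of a manifest.
etcd_version_from_manifest() {
  sed -n 's/.*image:.*etcd:\([0-9][0-9.]*\).*/\1/p' "$1" | head -n 1
}

# Build the release tarball URL for a version (same pattern as the wget above).
etcdctl_download_url() {
  printf 'https://github.com/etcd-io/etcd/releases/download/v%s/etcd-v%s-linux-amd64.tar.gz\n' "$1" "$1"
}

# Download the tarball, extract it and install only the etcdctl binary.
install_etcdctl() {
  local version="$1" tmp
  tmp="$(mktemp -d)" || return 1
  wget -q -O "$tmp/etcd.tar.gz" "$(etcdctl_download_url "$version")" &&
    tar -xzf "$tmp/etcd.tar.gz" -C "$tmp" &&
    install -m 0755 "$tmp/etcd-v${version}-linux-amd64/etcdctl" /usr/local/sbin/etcdctl
  local rc=$?
  rm -rf -- "$tmp"
  return "$rc"
}
```

Usage on the master node: `install_etcdctl "$(etcd_version_from_manifest /etc/kubernetes/manifests/etcd.yaml)"`, followed by `etcdctl version` to confirm.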
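2. Set the ETCDCTL_API environment variable
ETCDCTL_API tells etcdctl which API version to use when talking to the etcd cluster. etcdctl only defaults to the v3 API from etcd v3.4 onward. With etcdctl v3.3 and earlier the default is still the v2 API, so the variable must be set to 3 explicitly. Setting it on newer versions does no harm.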
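[root@master01 mnt]#echo "export ETCDCTL_API=3" >> ~/.bashrc
[root@master01 mnt]#source ~/.bashrc
# "export" is needed so that child processes such as etcdctl inherit the variable; without it only the current shell sees it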
[root@master01 mnt]#echo "$ETCDCTL_API"
3
(3) Run the backup
[root@master01 mnt]#mkdir /opt/etcd/backup -p
# Create a directory for the backup files
[root@master01 mnt]#ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/etcd/backup/etcdbackup.db
{"level":"info","ts":1718807742.198743,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/opt/etcd/backup/etcdbackup.db.part"}
{"level":"info","ts":"2024-06-19T22:35:42.238+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1718807742.238828,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2024-06-19T22:35:42.945+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1718807742.9925601,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"17 MB","took":0.793473218}
{"level":"info","ts":1718807743.0122747,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/opt/etcd/backup/etcdbackup.db"}
Snapshot saved at /opt/etcd/backup/etcdbackup.db
[root@master01 mnt]#ll -h /opt/etcd/backup/etcdbackup.db
-rw------- 1 root root 17M Jun 19 22:35 /opt/etcd/backup/etcdbackup.db
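# In the command above, --endpoints is etcd's client address, and --cacert, --cert and --key are the etcd CA certificate and the certificate and private key that etcdctl presents to etcd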
[root@master01 mnt]#ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/etcd/backup/etcdbackup.db
# Verify the snapshot and print its status as a table
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 9feffbe0 |  1544178 |       1131 | 17 MB      |
+----------+----------+------------+------------+
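Typing the four connection flags on every command is error-prone. etcdctl v3 also reads each global flag from an environment variable named `ETCDCTL_` plus the flag name in upper case. The same settings can therefore be exported once, for example in `~/.bashrc`. This is a sketch, assuming the kubeadm certificate paths used above.

```bash
# Connection settings for etcdctl, equivalent to --endpoints/--cacert/--cert/--key.
# Paths assume the kubeadm layout used in this article.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
```

With these exported, `etcdctl snapshot save /opt/etcd/backup/etcdbackup.db` should behave like the long command above. Depending on the etcdctl version, giving the same option both as a variable and as a flag may be rejected as a conflict, so use one or the other.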
II. Restoring the Data
(1) Create a new resource
[root@master01 ~]#kubectl run nginx --image=nginx:1.18.0
pod/nginx created
[root@master01 ~]#kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 3s
# Create a new resource after the backup: if it is gone once the backup has been restored, the restore worked
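(2) Restore the data
1. Stop etcd and the related Kubernetes components
Before restoring, stop etcd and the related Kubernetes components (kube-apiserver, kube-controller-manager, kube-scheduler, and so on). They are deployed as static Pods, so it is enough to move their YAML manifests out of /etc/kubernetes/manifests/. The kubelet then stops every Pod whose manifest is no longer in that directory.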
[root@master01 ~]#mkdir /opt/backup/ -p
[root@master01 ~]#ls /opt/backup/
[root@master01 ~]#mv /etc/kubernetes/manifests/* /opt/backup/
[root@master01 ~]#kubectl get pod -n kube-system
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
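The kubelet tears the Pods down shortly after the manifests disappear, not instantly. The step above can be scripted so that it also waits until etcd's client port 2379 stops accepting connections before the restore starts. This is a sketch. The function names and the default 120-second timeout are hypothetical, and the port check relies on bash's `/dev/tcp` redirection.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for stopping the kubeadm static Pods before a restore.

# Move every static-Pod manifest from $1 into $2; the kubelet then stops those Pods.
stop_static_pods() {
  local manifest_dir="$1" stash_dir="$2"
  mkdir -p -- "$stash_dir" &&
    mv -- "$manifest_dir"/* "$stash_dir"/
}

# Succeed if something accepts TCP connections on 127.0.0.1:$1 (bash /dev/tcp).
port_is_open() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Wait up to $2 seconds (default 120) for port $1 to close; fail on timeout.
wait_for_port_closed() {
  local port="$1" timeout="${2:-120}" waited=0
  while port_is_open "$port"; do
    if (( waited >= timeout )); then
      echo "port $port still open after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Usage: `stop_static_pods /etc/kubernetes/manifests /opt/backup && wait_for_port_closed 2379`.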
2. Back up the current data
Before restoring, back up the current etcd data directory so that you can roll back if the restore goes wrong.
[root@master01 ~]#mv /var/lib/etcd /var/lib/etcd.bck
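There is one caveat with a fixed name like `/var/lib/etcd.bck`. If that directory already exists, for example from an earlier restore, `mv` moves `/var/lib/etcd` into it instead of renaming it. The sketch below adds a timestamp to the name to avoid that. The function name and suffix format are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move an etcd data directory aside under a unique,
# timestamped name and print the new path.
backup_data_dir() {
  local data_dir="$1" dest
  if [ ! -d "$data_dir" ]; then
    echo "no data directory at $data_dir" >&2
    return 1
  fi
  dest="${data_dir}.bck-$(date +%Y%m%d-%H%M%S)"
  if [ -e "$dest" ]; then
    echo "$dest already exists" >&2
    return 1
  fi
  mv -- "$data_dir" "$dest" && printf '%s\n' "$dest"
}
```

Running `backup_data_dir /var/lib/etcd` prints a path of the form `/var/lib/etcd.bck-YYYYMMDD-HHMMSS` and leaves `/var/lib/etcd` free for the restore.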
3. Restore the data
Use etcdctl snapshot restore to rebuild the data directory from the backup file. Specify the target data directory and the member parameters:
[root@master01 ~]#ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db --name etcd-master01 --data-dir /var/lib/etcd --initial-cluster etcd-master01=https://192.168.83.30:2380 --initial-cluster-token etcd-cluster-token --initial-advertise-peer-urls https://192.168.83.30:2380
{"level":"info","ts":1718892798.584346,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1718892798.9371617,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":1543520}
{"level":"info","ts":1718892798.9976206,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"29d76e10db177ccd","local-member-id":"0","added-peer-id":"a6f5c4f9af0db4c1","added-peer-peer-urls":["https://192.168.83.30:2380"]}
{"level":"info","ts":1718892799.0065427,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
#----------------------------------------------------------------------------------------------------------
etcdctl snapshot restore recreates an etcd data directory from a snapshot file. Several parameters in the command above must match your own etcd cluster configuration. Each one is explained below:
ETCDCTL_API=3
# This environment variable tells etcdctl to use version 3 of the etcd API. It can be set inline in front of the command, as here, or exported in your shell configuration file.
etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db
# /opt/etcd/backup/etcdbackup.db is the path of the snapshot file to restore from. Make sure the path is correct and the file is readable.
--name etcd-master01
# --name sets the name of the etcd member, here etcd-master01. It usually corresponds to a specific node in the cluster.
--data-dir /var/lib/etcd
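# --data-dir sets the directory in which etcd stores its data, and the restored data is written there. The directory must not already exist, or must be empty. etcdctl refuses to restore into a data directory that already holds data, which is why the old /var/lib/etcd was moved aside first.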
--initial-cluster etcd-master01=https://192.168.83.30:2380
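# --initial-cluster defines the initial member list of the etcd cluster. During a restore, list every member's name together with its peer URL (port 2380 here), not its client URL.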
--initial-cluster-token etcd-cluster-token
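# --initial-cluster-token is used while bootstrapping the cluster to generate unique cluster and member IDs. This keeps members of different clusters on the same network from accidentally joining each other. Use a string that is unique to this cluster, and use the same token on every member.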
--initial-advertise-peer-urls https://192.168.83.30:2380
# --initial-advertise-peer-urls is the URL this member advertises to the rest of the cluster for member-to-member (peer) traffic.
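The member name and peer URLs passed to the restore should agree with how etcd is actually started. On a kubeadm cluster those values are flags in the etcd static Pod manifest. kubeadm typically names the member after the node (for example `master01`), which may differ from the `etcd-master01` used above. The sketch below reads the values from the manifest. It assumes kubeadm's one-flag-per-line `- --flag=value` layout and that the manifest was moved to `/opt/backup/` in step 1. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: restore a snapshot with the member settings taken from
# a kubeadm etcd static-Pod manifest ("- --flag=value", one flag per line).

# Print the value of --$2 from manifest $1 (first match only).
manifest_flag() {
  sed -n "s/^[[:space:]]*- --$2=//p" "$1" | head -n 1
}

# Restore snapshot $1 using --name, --data-dir, --initial-cluster and
# --initial-advertise-peer-urls read from manifest $2.
restore_from_manifest() {
  local snapshot="$1" manifest="$2" name data_dir cluster peer_url
  name="$(manifest_flag "$manifest" name)"
  data_dir="$(manifest_flag "$manifest" data-dir)"
  cluster="$(manifest_flag "$manifest" initial-cluster)"
  peer_url="$(manifest_flag "$manifest" initial-advertise-peer-urls)"
  if [ -z "$name" ] || [ -z "$data_dir" ] || [ -z "$cluster" ] || [ -z "$peer_url" ]; then
    echo "could not read all member settings from $manifest" >&2
    return 1
  fi
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" \
    --name "$name" \
    --data-dir "$data_dir" \
    --initial-cluster "$cluster" \
    --initial-advertise-peer-urls "$peer_url"
}
```

Usage, after the old data directory has been moved aside: `restore_from_manifest /opt/etcd/backup/etcdbackup.db /opt/backup/etcd.yaml`. Because --initial-cluster-token is not passed, etcdctl uses its default token.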
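Notes
If there are multiple etcd members, --initial-cluster must list all of them. Each member's --name, --initial-advertise-peer-urls and related parameters must also be set correctly for that member. Restore every member from the same snapshot file.
Before restoring, back up any existing etcd data directory in case the restore goes wrong.
Make sure every etcd member is configured correctly and that the network between members works, so that they can reach each other.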
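For a multi-member cluster, the --initial-cluster value lists every member as `name=peer-URL`, separated by commas. Each member then runs the restore with its own --name and --initial-advertise-peer-urls. The small sketch below builds that list. It assumes every member uses HTTPS on peer port 2380. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an --initial-cluster value from name=IP pairs,
# assuming every member uses the default peer port 2380 over HTTPS.
build_initial_cluster() {
  local pair out=""
  for pair in "$@"; do
    # ${out:+,} adds a comma separator before every entry except the first.
    out+="${out:+,}${pair%%=*}=https://${pair#*=}:2380"
  done
  printf '%s\n' "$out"
}
```

For example, take `build_initial_cluster master01=192.168.83.30 master02=192.168.83.31 master03=192.168.83.32`, where master02, master03 and their addresses are hypothetical. It prints `master01=https://192.168.83.30:2380,master02=https://192.168.83.31:2380,master03=https://192.168.83.32:2380`.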
4. Restart the services
Move the manifests back into /etc/kubernetes/manifests/. The kubelet then starts etcd and the other control-plane components again.
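After the manifests are moved back, the control plane needs a little time to come up. Until kube-apiserver is serving, `kubectl` fails with "connection refused". The sketch below moves the manifests back and then polls the API server's `/readyz` endpoint through `kubectl get --raw` until it succeeds. It assumes a Kubernetes version that serves `/readyz` (1.16 and later). The function names and the 180-second timeout are hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command once per second until it succeeds,
# giving up after $1 seconds.
retry_until() {
  local timeout="$1" waited=0
  shift
  until "$@" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "gave up after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Move the static-Pod manifests back from $1 into $2 so the kubelet restarts them.
start_static_pods() {
  mv -- "$1"/* "$2"/
}
```

Usage: `start_static_pods /opt/backup /etc/kubernetes/manifests && retry_until 180 kubectl get --raw=/readyz`.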
[root@master01 ~]#mv /opt/backup/* /etc/kubernetes/manifests/
[root@master01 ~]#kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74ff55c5b-dwzdp 1/1 Running 10 35d
coredns-74ff55c5b-ws8c8 1/1 Running 10 35d
etcd-master01 1/1 Running 10 24h
kube-apiserver-master01 1/1 Running 5 24h
kube-controller-manager-master01 1/1 Running 45 24h
kube-proxy-58zbl 1/1 Running 0 4d7h
kube-proxy-9v7jw 1/1 Running 0 4d7h
kube-proxy-xdgb4 1/1 Running 0 4d7h
kube-scheduler-master01 1/1 Running 48 24h
[root@master01 ~]#kubectl get pod
No resources found in default namespace.
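# The nginx Pod created earlier is gone while the other components run normally, which shows the data was restored to the state captured in the backup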
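The check above compares `kubectl get` output by eye. It can be made explicit instead: record the object list when the backup is taken, then compare it with the list after the restore. This is a sketch. The function names are hypothetical, and `comm` requires both files to be sorted the same way, hence `LC_ALL=C sort`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for comparing cluster objects before and after a restore.

# Print "kind namespace name" for Pods and Services in all namespaces, sorted.
list_objects() {
  kubectl get pods,services -A --no-headers \
    -o custom-columns=KIND:.kind,NAMESPACE:.metadata.namespace,NAME:.metadata.name |
    LC_ALL=C sort
}

# Print lines that appear in sorted file $1 but not in sorted file $2.
only_in() {
  LC_ALL=C comm -23 "$1" "$2"
}
```

Run `list_objects > before.txt` right after the snapshot is saved and `list_objects > after.txt` after the restore. `only_in after.txt before.txt` then lists objects that should have disappeared but did not. Pods managed by controllers may legitimately be replaced under new names, so treat differences as hints to inspect rather than as failures.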
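III. Verifying the Result
1. Delete a Service resource
Note: while kube-apiserver is running it automatically recreates the default/kubernetes Service, so that Service reappearing after the restore does not by itself prove the restore worked. A resource you created yourself, such as the nginx Pod in Part II, is a more reliable indicator.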
[root@master01 ~]#kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 35d
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 35d
[root@master01 ~]#kubectl delete service kubernetes
service "kubernetes" deleted
[root@master01 ~]#kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 35d
2. Stop etcd and the Kubernetes components
[root@master01 ~]#mv /etc/kubernetes/manifests/* /opt/backup/
[root@master01 ~]#kubectl get all -A
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
3. Restore the data
[root@master01 ~]#rm -rf /var/lib/etcd
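# The data directory was already backed up earlier (to /var/lib/etcd.bck), so it can simply be deleted here; if the data has changed since that backup, back it up again first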
[root@master01 ~]#ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db --name etcd-master01 --data-dir /var/lib/etcd --initial-cluster etcd-master01=https://192.168.83.30:2380 --initial-cluster-token etcd-cluster-token --initial-advertise-peer-urls https://192.168.83.30:2380
{"level":"info","ts":1718893339.6779017,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1718893339.8944564,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":1543520}
{"level":"info","ts":1718893339.9436255,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"29d76e10db177ccd","local-member-id":"0","added-peer-id":"a6f5c4f9af0db4c1","added-peer-peer-urls":["https://192.168.83.30:2380"]}
{"level":"info","ts":1718893339.9502227,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
[root@master01 ~]#mv /opt/backup/* /etc/kubernetes/manifests/
[root@master01 ~]#kubectl get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel pod/kube-flannel-ds-8sgt8 1/1 Running 1 23d
kube-flannel pod/kube-flannel-ds-nplmm 1/1 Running 12 35d
kube-flannel pod/kube-flannel-ds-xwklx 1/1 Running 3 23d
kube-system pod/coredns-74ff55c5b-dwzdp 1/1 Running 10 35d
kube-system pod/coredns-74ff55c5b-ws8c8 1/1 Running 10 35d
kube-system pod/etcd-master01 1/1 Running 0 25h
kube-system pod/kube-apiserver-master01 1/1 Running 0 25h
kube-system pod/kube-controller-manager-master01 1/1 Running 0 25h
kube-system pod/kube-proxy-58zbl 1/1 Running 0 4d7h
kube-system pod/kube-proxy-9v7jw 1/1 Running 0 4d7h
kube-system pod/kube-proxy-xdgb4 1/1 Running 0 4d7h
kube-system pod/kube-scheduler-master01 1/1 Running 0 25h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 35d
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 35d
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-flannel daemonset.apps/kube-flannel-ds 3 3 3 3 3 <none> 35d
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 kubernetes.io/os=linux 35d
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2/2 2 2 35d
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-74ff55c5b 2 2 2 35d