[Cloud Native] Kubernetes -- Backing Up and Restoring ETCD Data

Contents

Introduction

I. ETCD Data Backup

(1) Determine a Backup Strategy

(2) Back Up with the etcdctl Tool

1. Install the etcdctl command

2. Set the ETCDCTL_API environment variable

(3) Perform the Backup

II. Data Restore

(1) Create a New Resource

(2) Restore the Data

1. Stop the etcd service and the related K8s components

2. Back up the current data

3. Restore the data

4. Restart the services

III. Verify the Result


Introduction

In a Kubernetes cluster, ETCD is a critical component: it stores the cluster's state and configuration data, from cluster specs and configuration to the state of running workloads. Regular backups of ETCD data, together with a well-defined restore procedure, are therefore essential for the reliability and data integrity of a Kubernetes cluster. This article walks through how to back up and restore ETCD data in Kubernetes.

I. ETCD Data Backup

(1) Determine a Backup Strategy

Before backing up ETCD data, decide on a backup strategy: how often to back up, where the backups will be stored, and how long they will be retained. It is recommended to back up ETCD data on a regular schedule and to keep copies in more than one secure location to guard against data loss.
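
As one way to put such a policy into practice, the sketch below is a minimal cron-driven daily backup script. The script path, timestamped file names, and the 7-day retention are illustrative assumptions; the etcdctl flags and certificate paths are the same ones used by the backup command later in this article.

#!/bin/bash
# Illustrative etcd backup script (adjust paths and retention to your own policy)
BACKUP_DIR=/opt/etcd/backup
KEEP_DAYS=7
mkdir -p "$BACKUP_DIR"
# Take a timestamped snapshot over TLS
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save "$BACKUP_DIR/etcdbackup-$(date +%Y%m%d-%H%M%S).db"
# Prune snapshots older than the retention window
find "$BACKUP_DIR" -name 'etcdbackup-*.db' -mtime +"$KEEP_DAYS" -delete
# Example crontab entry (daily at 02:00):
#   0 2 * * * /usr/local/sbin/etcd-backup.sh

Copying the resulting snapshot to storage off the node (another host, object storage, and so on) is what actually satisfies the "multiple locations" part of the policy.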

(2) Back Up with the etcdctl Tool

1. Install the etcdctl command

Download etcdctl, the command-line client for etcd, which is used to interact with the etcd cluster.

[root@master01 manifests]#cat etcd.yaml |grep  image:
    image: k8s.gcr.io/etcd:3.4.13-0
#Check the ETCD version

[root@master01 mnt]#wget https://github.com/etcd-io/etcd/releases/download/v3.4.13/etcd-v3.4.13-linux-amd64.tar.gz
#Download the release tarball that matches this version
[root@master01 mnt]#ls
etcd-v3.4.13-linux-amd64.tar.gz 
[root@master01 mnt]#tar xf etcd-v3.4.13-linux-amd64.tar.gz 
[root@master01 mnt]#ls
etcd-v3.4.13-linux-amd64  etcd-v3.4.13-linux-amd64.tar.gz
[root@master01 mnt]#ls etcd-v3.4.13-linux-amd64
Documentation  etcd  etcdctl  README-etcdctl.md  README.md  READMEv2-etcdctl.md
[root@master01 mnt]#mv etcd-v3.4.13-linux-amd64/etcdctl /usr/local/sbin/
[root@master01 mnt]#etcdctl version
etcdctl version: 3.4.13
API version: 3.4

2. Set the ETCDCTL_API environment variable

The ETCDCTL_API environment variable specifies which API version etcdctl uses when talking to the etcd cluster. Since etcd v3.4, etcdctl defaults to the v3 API; with older releases, or when interacting with older etcd clusters, you may need to set this variable explicitly.

[root@master01 mnt]#echo "ETCDCTL_API=3" >> ~/.bashrc 
[root@master01 mnt]#bash
[root@master01 mnt]#echo "$ETCDCTL_API"
3

(3) Perform the Backup

[root@master01 mnt]#mkdir /opt/etcd/backup -p
#Create a directory to hold the backup files
[root@master01 mnt]#ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/etcd/backup/etcdbackup.db
{"level":"info","ts":1718807742.198743,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/opt/etcd/backup/etcdbackup.db.part"}
{"level":"info","ts":"2024-06-19T22:35:42.238+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1718807742.238828,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2024-06-19T22:35:42.945+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1718807742.9925601,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"17 MB","took":0.793473218}
{"level":"info","ts":1718807743.0122747,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/opt/etcd/backup/etcdbackup.db"}
Snapshot saved at /opt/etcd/backup/etcdbackup.db
[root@master01 mnt]#ll -h /opt/etcd/backup/etcdbackup.db
-rw------- 1 root root 17M 6月  19 22:35 /opt/etcd/backup/etcdbackup.db

#In the command above, --endpoints specifies the etcd endpoint, and --cacert, --cert and --key specify etcd's CA certificate, client certificate and private key respectively
[root@master01 mnt]#ETCDCTL_API=3 etcdctl --write-out=table snapshot status /opt/etcd/backup/etcdbackup.db
#Check the snapshot and print the result as a table
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 9feffbe0 |  1544178 |       1131 |      17 MB |
+----------+----------+------------+------------+

II. Data Restore

(1) Create a New Resource

[root@master01 ~]#kubectl run nginx --image=nginx:1.18.0
pod/nginx created
[root@master01 ~]#kubectl get pod 
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          3s
#Create a new resource; if it disappears after the backup is restored, the restore was successful

(2) Restore the Data

1. Stop the etcd service and the related K8s components

Before restoring, stop the etcd service and the related K8s control-plane components (apiserver, controller-manager, scheduler, and so on). Because etcd is deployed as a static Pod, you can stop every component started from the YAML files under /etc/kubernetes/manifests/ by moving those manifests out of the directory.

[root@master01 ~]#mkdir /opt/backup/ -p
[root@master01 ~]#ls /opt/backup/ 
[root@master01 ~]#mv /etc/kubernetes/manifests/*  /opt/backup/
[root@master01 ~]#kubectl get pod -n kube-system
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?

2. Back up the current data

Before restoring, back up the current etcd data directory so you can roll back if the restore goes wrong.

[root@master01 ~]#mv /var/lib/etcd /var/lib/etcd.bck

3. Restore the data

Use etcdctl's snapshot restore command to restore the data from the backup file, specifying the target data directory and the other relevant parameters.

[root@master01 ~]#ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db --name etcd-master01 --data-dir /var/lib/etcd --initial-cluster etcd-master01=https://192.168.83.30:2380 --initial-cluster-token etcd-cluster-token --initial-advertise-peer-urls https://192.168.83.30:2380
{"level":"info","ts":1718892798.584346,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1718892798.9371617,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":1543520}
{"level":"info","ts":1718892798.9976206,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"29d76e10db177ccd","local-member-id":"0","added-peer-id":"a6f5c4f9af0db4c1","added-peer-peer-urls":["https://192.168.83.30:2380"]}
{"level":"info","ts":1718892799.0065427,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}

#----------------------------------------------------------------------------------------------------------
The etcdctl snapshot restore command restores data from an etcd snapshot file. In the command above, several parameters must be replaced with values that match your own etcd cluster configuration. Each parameter is explained below:

ETCDCTL_API=3
#This environment variable tells etcdctl to use version 3 of the etcd API. You can set it inline before the command, or in your shell configuration file.

etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db
#/opt/etcd/backup/etcdbackup.db is the path of the etcd snapshot file to restore from. Make sure the path is correct and the file is readable.

--name etcd-master01
#The --name parameter sets the name of the etcd member. Here it is etcd-master01, which normally corresponds to a specific node in the cluster.

--data-dir /var/lib/etcd
#The --data-dir parameter specifies the directory where etcd stores its data. The restored data is written to this directory, so make sure it is writable and contains no data you still need.

--initial-cluster etcd-master01=https://192.168.83.30:2380
#The --initial-cluster parameter defines the initial member list of the etcd cluster. When restoring, list every member of the cluster as name=peer-URL pairs.

--initial-cluster-token etcd-cluster-token
#The --initial-cluster-token parameter is used by etcd members to discover each other when the cluster first starts. It should be a string unique to your etcd cluster, and every member must start with the same token.

--initial-advertise-peer-urls https://192.168.83.30:2380
#The --initial-advertise-peer-urls parameter specifies the peer URL this etcd member advertises for cluster communication.

Notes
If you have multiple etcd nodes, make sure --initial-cluster lists every member, and that each node's --name, --initial-advertise-peer-urls and other related parameters are configured correctly.
Before restoring, back up the existing etcd data directory (if any), in case the restore fails and you need to roll back.
Make sure all etcd members are correctly configured and can reach each other over the network.
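
For a multi-member cluster, the restore command is run once on every member, each time with that member's own --name and --initial-advertise-peer-urls, while --initial-cluster lists all members. A minimal sketch for a hypothetical three-member cluster (the second and third member names and peer IPs are illustrative assumptions, not part of the setup above):

# Run on etcd-master01; repeat on the other members with their own --name and peer URL.
# The snapshot file must first be copied to every member.
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db \
  --name etcd-master01 \
  --data-dir /var/lib/etcd \
  --initial-cluster etcd-master01=https://192.168.83.30:2380,etcd-master02=https://192.168.83.31:2380,etcd-master03=https://192.168.83.32:2380 \
  --initial-cluster-token etcd-cluster-token \
  --initial-advertise-peer-urls https://192.168.83.30:2380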

4. Restart the services

Move the manifests back into /etc/kubernetes/manifests/ so that the K8s components are started again.

[root@master01 ~]#mv /opt/backup/*  /etc/kubernetes/manifests/
[root@master01 ~]#kubectl get pod -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
coredns-74ff55c5b-dwzdp            1/1     Running   10         35d
coredns-74ff55c5b-ws8c8            1/1     Running   10         35d
etcd-master01                      1/1     Running   10         24h
kube-apiserver-master01            1/1     Running   5          24h
kube-controller-manager-master01   1/1     Running   45         24h
kube-proxy-58zbl                   1/1     Running   0          4d7h
kube-proxy-9v7jw                   1/1     Running   0          4d7h
kube-proxy-xdgb4                   1/1     Running   0          4d7h
kube-scheduler-master01            1/1     Running   48         24h
[root@master01 ~]#kubectl get pod 
No resources found in default namespace.
#The nginx pod created earlier is gone while the other services are running normally, which proves the data was restored to the backed-up state

III. Verify the Result

1. Delete an svc resource

[root@master01 ~]#kubectl get svc -A
NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  35d
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   35d
[root@master01 ~]#kubectl delete service kubernetes 
service "kubernetes" deleted
[root@master01 ~]#kubectl get svc -A
NAMESPACE     NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-system   service/kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   35d

2. Stop etcd and the K8s components

[root@master01 ~]#mv /etc/kubernetes/manifests/*  /opt/backup/
[root@master01 ~]#kubectl get all -A
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?
The connection to the server 192.168.83.30:6443 was refused - did you specify the right host or port?

3. Restore the data

[root@master01 ~]#rm -rf /var/lib/etcd    
#Because the data directory was backed up earlier, it can simply be deleted here; if the data has changed since then, take a fresh backup first
[root@master01 ~]#ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd/backup/etcdbackup.db --name etcd-master01 --data-dir /var/lib/etcd --initial-cluster etcd-master01=https://192.168.83.30:2380 --initial-cluster-token etcd-cluster-token --initial-advertise-peer-urls https://192.168.83.30:2380
{"level":"info","ts":1718893339.6779017,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
{"level":"info","ts":1718893339.8944564,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":1543520}
{"level":"info","ts":1718893339.9436255,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"29d76e10db177ccd","local-member-id":"0","added-peer-id":"a6f5c4f9af0db4c1","added-peer-peer-urls":["https://192.168.83.30:2380"]}
{"level":"info","ts":1718893339.9502227,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/backup/etcdbackup.db","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
[root@master01 ~]#mv /opt/backup/*  /etc/kubernetes/manifests/
[root@master01 ~]#kubectl get all -A
NAMESPACE      NAME                                   READY   STATUS    RESTARTS   AGE
kube-flannel   pod/kube-flannel-ds-8sgt8              1/1     Running   1          23d
kube-flannel   pod/kube-flannel-ds-nplmm              1/1     Running   12         35d
kube-flannel   pod/kube-flannel-ds-xwklx              1/1     Running   3          23d
kube-system    pod/coredns-74ff55c5b-dwzdp            1/1     Running   10         35d
kube-system    pod/coredns-74ff55c5b-ws8c8            1/1     Running   10         35d
kube-system    pod/etcd-master01                      1/1     Running   0          25h
kube-system    pod/kube-apiserver-master01            1/1     Running   0          25h
kube-system    pod/kube-controller-manager-master01   1/1     Running   0          25h
kube-system    pod/kube-proxy-58zbl                   1/1     Running   0          4d7h
kube-system    pod/kube-proxy-9v7jw                   1/1     Running   0          4d7h
kube-system    pod/kube-proxy-xdgb4                   1/1     Running   0          4d7h
kube-system    pod/kube-scheduler-master01            1/1     Running   0          25h

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  35d
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   35d

NAMESPACE      NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-flannel   daemonset.apps/kube-flannel-ds   3         3         3       3            3           <none>                   35d
kube-system    daemonset.apps/kube-proxy        3         3         3       3            3           kubernetes.io/os=linux   35d

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2/2     2            2           35d

NAMESPACE     NAME                                DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-74ff55c5b   2         2         2       35d