Kubernetes Control Plane Components: Building a Highly Available etcd Cluster

1.Setting up a highly available etcd cluster

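Recommended reading first: "Kubernetes Control Plane Components: Common etcd Configuration Parameters". Once the common etcd parameters are clear, this section is much easier to follow.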

1.1.Install cfssl

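sh
# Debian/Ubuntu
apt install golang-cfssl

# Or install directly with go
go install github.com/cloudflare/cfssl/cmd/cfssl@latest
go install github.com/cloudflare/cfssl/cmd/cfssljson@latest
  • Purpose: install the cfssl tools, which are used to generate TLS certificates.
  • Why: an etcd cluster needs TLS certificates to encrypt the communication between its nodes and keep the data secure.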

1.2.Generate tls certs and clone etcd code

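sh
mkdir -p /root/go/src/github.com/etcd-io
cd /root/go/src/github.com/etcd-io
git clone https://github.com/etcd-io/etcd.git
cd etcd/hack/tls-setup
  • Purpose:
    • Create the Go workspace directory.
    • Clone the official etcd repository.
    • Enter the directory that holds the TLS certificate generation scripts. The certificates must exist before etcd can be started.
  • Why: the etcd project ships scripts and config files for generating TLS certificates, so certificates can be produced quickly.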

1.3.Edit req-csr.json and keep 127.0.0.1 and localhost only for a single-machine setup.

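sh
vi config/req-csr.json
  • Purpose: edit the certificate signing request (CSR) config file. Once it is edited, the certificates can be generated.
  • Why:
    • The CSR file shipped with etcd lists some default IPs; replace them with the IPs of your own etcd cluster.

    • Although this walkthrough builds a 3-node etcd cluster, all nodes run on one local machine, so only 127.0.0.1 and localhost need to be kept. This avoids generating certificates for hosts that are not needed.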
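
As a rough illustration of the edit described above, the sketch below writes a sample CSR config to a scratch file and prints its host entries. Only the hosts list reflects this walkthrough; the other field values and the scratch file name req-csr.sample.json are assumptions and may differ from the file in your checkout.

```shell
# Illustrative only: write a sample CSR config to a scratch file so the real
# config/req-csr.json is not overwritten. Every value except "hosts" is an
# assumption made for this sketch.
sample="${TMPDIR:-/tmp}/req-csr.sample.json"
cat > "$sample" <<'EOF'
{
  "CN": "etcd",
  "hosts": [
    "localhost",
    "127.0.0.1"
  ],
  "key": { "algo": "ecdsa", "size": 384 },
  "names": [
    { "O": "autogenerated", "OU": "etcd cluster", "L": "the internet" }
  ]
}
EOF

# Print the host entries, i.e. the names the certificate will be valid for.
grep -E '"(localhost|[0-9.]+)"' "$sample"
```

For a multi-machine cluster, the hosts list would instead carry the real IPs or DNS names of each member.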

1.4.Generate certs

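sh
export infra0=127.0.0.1
export infra1=127.0.0.1
export infra2=127.0.0.1
make
mkdir /tmp/etcd-certs
mv certs /tmp/etcd-certs
  • Purpose:
    • First set environment variables that give the IP address of each cluster node. The three etcd nodes will be named infra0, infra1, and infra2.
    • Run make to generate the TLS certificates. By default they are written to ./certs under the current directory.
    • Create a directory for the certificates and move the generated certs directory into it, so the files end up under /tmp/etcd-certs/certs.
  • Why:
    • The environment variables specify the IP address of each cluster node.
    • make invokes cfssl to generate the certificates.
    • Keeping the certificates in one place makes them easy to use later. The etcdctl commands below need to point at this certs directory.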
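
Before moving on, it can help to confirm that the files referenced by the later etcd and etcdctl commands are actually in place. The helper below is a minimal sketch; check_certs is a hypothetical name introduced here, and the three file names are the ones used by the commands in this walkthrough.

```shell
# check_certs DIR: report whether the three files used later are present.
# Returns 0 when all are found, 1 when any is missing.
check_certs() {
  dir=$1
  missing=0
  for f in ca.pem 127.0.0.1.pem 127.0.0.1-key.pem; do
    if [ -f "$dir/$f" ]; then
      echo "ok       $dir/$f"
    else
      echo "missing  $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

With the layout above it would be called as check_certs /tmp/etcd-certs/certs.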

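1.5.Start the etcd cluster members

  • Create a file named start-all.sh and copy the commands below into it.

    • The script declares 3 etcd instances with --initial-cluster-state new, and sets the certificate paths, node name, and data-dir for each.
    • Because all 3 instances run on the same machine, each one uses different ports.
    sh
    #
    # each etcd instance name needs to be unique
    # x380 is for peer communication
    # x379 is for client communication
    # data-dir cannot be shared
    #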
    nohup etcd --name infra0 \
    --data-dir=/tmp/etcd/infra0 \
    --listen-peer-urls https://127.0.0.1:3380 \
    --initial-advertise-peer-urls https://127.0.0.1:3380 \
    --listen-client-urls https://127.0.0.1:3379 \
    --advertise-client-urls https://127.0.0.1:3379 \
    --initial-cluster-token etcd-cluster-1 \
    --initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
    --initial-cluster-state new \
    --client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
    --peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra0.log 2>&1 &
    
    nohup etcd --name infra1 \
    --data-dir=/tmp/etcd/infra1 \
    --listen-peer-urls https://127.0.0.1:4380 \
    --initial-advertise-peer-urls https://127.0.0.1:4380 \
    --listen-client-urls https://127.0.0.1:4379 \
    --advertise-client-urls https://127.0.0.1:4379 \
    --initial-cluster-token etcd-cluster-1 \
    --initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
    --initial-cluster-state new \
    --client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
    --peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra1.log 2>&1 &
    
    nohup etcd --name infra2 \
    --data-dir=/tmp/etcd/infra2 \
    --listen-peer-urls https://127.0.0.1:5380 \
    --initial-advertise-peer-urls https://127.0.0.1:5380 \
    --listen-client-urls https://127.0.0.1:5379 \
    --advertise-client-urls https://127.0.0.1:5379 \
    --initial-cluster-token etcd-cluster-1 \
    --initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
    --initial-cluster-state new \
    --client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
    --peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra2.log 2>&1 &
  • Run the script to create the cluster

    shell
    chmod +x start-all.sh
    ./start-all.sh
  • The cluster is now started. ps -ef | grep etcd should show the 3 etcd nodes.

  • Common errors

    • If the script fails with "nohup: failed to run command 'etcd': No such file or directory", the etcd command is not available yet and needs to be installed:

      shell
      # on CentOS
      yum install etcd
      # use the v3 etcdctl API
      export ETCDCTL_API=3
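
The x380/x379 port convention from the comments in start-all.sh can be spelled out with a short loop. This is only a sketch: the members list and its name:prefix encoding are an illustration device introduced here, not something etcd itself understands.

```shell
# Nodes and their port prefixes, following the x380/x379 convention
# from start-all.sh (prefix 3 -> peer port 3380, client port 3379).
members="infra0:3 infra1:4 infra2:5"
host=127.0.0.1

initial_cluster=""
for m in $members; do
  name=${m%%:*}      # text before the colon
  prefix=${m##*:}    # text after the colon
  peer_url="https://$host:${prefix}380"
  client_url="https://$host:${prefix}379"
  echo "$name peer=$peer_url client=$client_url"
  # Append name=peer_url, comma-separated, as --initial-cluster expects.
  initial_cluster="${initial_cluster:+$initial_cluster,}$name=$peer_url"
done

echo "--initial-cluster $initial_cluster"
```

The last line prints the same --initial-cluster value that all three commands in start-all.sh share, which is why that flag is identical on every node while the listen and advertise URLs differ.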

1.6.Member list: verify etcd

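sh
etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem member list
  • Purpose: list the members of the etcd cluster.

  • Why: verify that the cluster is running properly and confirm that every node has joined it.

  • If it fails with "flag provided but not defined: -cert", the etcdctl API version has not been set (older etcdctl releases default to the v2 API, which does not have this flag):

    shell
    export ETCDCTL_API=3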
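
Every etcdctl command in this walkthrough repeats the same endpoint and TLS flags. With the v3 API, etcdctl also reads its global flags from ETCDCTL_-prefixed environment variables, so they can be set once per shell. The sketch below assumes the certificate paths used above.

```shell
# Set the endpoint and TLS options once for the current shell session.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:3379
export ETCDCTL_CACERT=/tmp/etcd-certs/certs/ca.pem
export ETCDCTL_CERT=/tmp/etcd-certs/certs/127.0.0.1.pem
export ETCDCTL_KEY=/tmp/etcd-certs/certs/127.0.0.1-key.pem

# With these set, the verification command shortens to:
#   etcdctl member list
```

Some etcdctl versions reject a command that sets the same option through both a variable and a flag, so use one style or the other rather than mixing them.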

2.Data backup

2.1.Insert some data

  • Insert some data to simulate normal use of etcd
    • key=a value=b
    • key=/a value=/b
    • key=/a/f value=ok
shell
# Insert 3 key-value pairs
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem put a b
OK
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem put /a /b
OK
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem put /a/f ok
OK

# List all data
[root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem get --prefix ""
/a
/b
/a/f
ok
a
b
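
The output of get --prefix "" alternates key and value lines, which is easy to misread. As a small sketch, the lines can be paired up with paste. The sample text is copied from the listing above rather than fetched from a live cluster.

```shell
# Sample output copied from the listing above: keys and values alternate.
output='/a
/b
/a/f
ok
a
b'

# Join every two lines into "key=value".
printf '%s\n' "$output" | paste -d '=' - -
```

This prints /a=/b, /a/f=ok, and a=b on separate lines. etcdctl get also accepts --keys-only and --print-value-only when only one side is wanted.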

2.2.Backup

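  • Run the backup command to take a full snapshot of the current etcd cluster. The backup is written to the file snapshot.db.

    sh
    etcdctl --endpoints https://127.0.0.1:3379 \
      --cert /tmp/etcd-certs/certs/127.0.0.1.pem \
      --key /tmp/etcd-certs/certs/127.0.0.1-key.pem \
      --cacert /tmp/etcd-certs/certs/ca.pem snapshot save snapshot.db
  • After the command finishes, the cluster is backed up. ls in the current directory shows a new snapshot.db file.

  • If the cluster fails or loses data, the data can be restored from this backup.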
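
Since the next step deletes the data directories, a cheap pre-check that the backup file really exists is worthwhile. The helper below is a sketch; snapshot_ok is a hypothetical name, and it only checks that the file is present and non-empty, not that its contents are a valid snapshot.

```shell
# snapshot_ok FILE: pre-check that the backup file exists and is not empty.
# It does not validate the snapshot's contents.
snapshot_ok() {
  if [ -s "$1" ]; then
    echo "found $1 ($(wc -c < "$1" | tr -d ' ') bytes)"
  else
    echo "missing or empty: $1" >&2
    return 1
  fi
}
```

In this walkthrough it would be called as snapshot_ok snapshot.db from the directory where the backup was taken.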

3.Destroy the etcd cluster to simulate a failure

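sh
ps -ef | grep "/tmp/etcd/infra" | grep -v grep | awk '{print $2}'|xargs kill
  • Purpose: terminate the processes of all etcd nodes.
  • Why: every etcd instance must be stopped before the data is restored.
sh
rm -rf /tmp/etcd
  • Purpose: delete the etcd data directories.
  • Why: simulate a data-loss scenario to test backup and restore.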
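
To see what the kill pipeline selects before running it for real, the same filters can be applied to sample text. The ps -ef lines below are made up for illustration; the PIDs and timestamps are placeholders.

```shell
# Made-up sample of `ps -ef` output; PIDs and times are placeholders.
sample='root  2101  1  0 10:00 ?  00:00:01 etcd --name infra0 --data-dir=/tmp/etcd/infra0
root  2102  1  0 10:00 ?  00:00:01 etcd --name infra1 --data-dir=/tmp/etcd/infra1
root  2103  1  0 10:00 ?  00:00:01 etcd --name infra2 --data-dir=/tmp/etcd/infra2
root  2200  1  0 10:01 ?  00:00:00 grep /tmp/etcd/infra
root  2300  1  0 10:02 ?  00:00:00 sshd: root@pts/0'

# Same filters as the kill pipeline, stopping before `xargs kill`.
printf '%s\n' "$sample" | grep "/tmp/etcd/infra" | grep -v grep | awk '{print $2}'
```

Only the three etcd PIDs are printed. The grep -v grep stage drops the grep process itself, which would otherwise match because its own command line contains the pattern.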

4.Restore the etcd cluster data from the snapshot

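  • Create a file named restore.sh and copy the commands below into it.

    • The snapshot is restored once for each of the 3 etcd instances; --data-dir specifies where each instance's data is restored to.
    sh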
    export ETCDCTL_API=3
    etcdctl snapshot restore snapshot.db \
      --name infra0 \
      --data-dir=/tmp/etcd/infra0 \
      --initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
      --initial-cluster-token etcd-cluster-1 \
      --initial-advertise-peer-urls https://127.0.0.1:3380
    
    etcdctl snapshot restore snapshot.db \
      --name infra1 \
      --data-dir=/tmp/etcd/infra1 \
      --initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
      --initial-cluster-token etcd-cluster-1 \
      --initial-advertise-peer-urls https://127.0.0.1:4380
    
    etcdctl snapshot restore snapshot.db \
      --name infra2 \
      --data-dir=/tmp/etcd/infra2 \
      --initial-cluster infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380 \
      --initial-cluster-token etcd-cluster-1 \
      --initial-advertise-peer-urls https://127.0.0.1:5380
  • Run the script to restore the cluster data. When it finishes, ls /tmp/etcd shows whether the data directories are back.

    shell
    chmod +x restore.sh
    ./restore.sh
    ls /tmp/etcd
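
The three restore commands differ only in the node name and peer port, so they can also be written as one function called three times. This is a sketch of that idea, not a replacement for restore.sh: restore_member and the DRY_RUN switch are hypothetical names added here. DRY_RUN defaults to 1, which only prints each command.

```shell
# Function form of restore.sh. DRY_RUN is a hypothetical switch for this
# sketch: 1 (default) prints each command, 0 actually runs etcdctl.
DRY_RUN=${DRY_RUN:-1}
cluster=infra0=https://127.0.0.1:3380,infra1=https://127.0.0.1:4380,infra2=https://127.0.0.1:5380

restore_member() {
  # $1 = node name, $2 = peer port; rebuild "$@" as the full command line.
  set -- etcdctl snapshot restore snapshot.db \
    --name "$1" \
    --data-dir="/tmp/etcd/$1" \
    --initial-cluster "$cluster" \
    --initial-cluster-token etcd-cluster-1 \
    --initial-advertise-peer-urls "https://127.0.0.1:$2"
  if [ "$DRY_RUN" = 1 ]; then
    echo "$@"
  else
    ETCDCTL_API=3 "$@"
  fi
}

restore_member infra0 3380
restore_member infra1 4380
restore_member infra2 5380
```

Running it with DRY_RUN=0 set in the environment would execute the same three restores as restore.sh.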

5.Restart the etcd cluster

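  • Create a file named restart-all.sh and copy the commands below into it.

    • It starts the 3 etcd instances again, with --data-dir pointing at each restored data directory. The --initial-* flags are omitted because the restored data directories already carry the cluster membership.
    sh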
    nohup etcd --name infra0 \
    --data-dir=/tmp/etcd/infra0 \
    --listen-peer-urls https://127.0.0.1:3380 \
    --listen-client-urls https://127.0.0.1:3379 \
    --advertise-client-urls https://127.0.0.1:3379 \
    --client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
    --peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra0.log 2>&1 &
    
    nohup etcd --name infra1 \
    --data-dir=/tmp/etcd/infra1 \
    --listen-peer-urls https://127.0.0.1:4380 \
    --listen-client-urls https://127.0.0.1:4379 \
    --advertise-client-urls https://127.0.0.1:4379 \
    --client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
    --peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra1.log 2>&1 &
    
    nohup etcd --name infra2 \
    --data-dir=/tmp/etcd/infra2 \
    --listen-peer-urls https://127.0.0.1:5380 \
    --listen-client-urls https://127.0.0.1:5379 \
    --advertise-client-urls https://127.0.0.1:5379 \
    --client-cert-auth --trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem \
    --peer-client-cert-auth --peer-trusted-ca-file=/tmp/etcd-certs/certs/ca.pem \
    --peer-cert-file=/tmp/etcd-certs/certs/127.0.0.1.pem \
    --peer-key-file=/tmp/etcd-certs/certs/127.0.0.1-key.pem > /var/log/infra2.log 2>&1 &
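  • Run the script to restart the cluster, then use ps -ef | grep etcd to check that all 3 etcd nodes are running again.

    shell
    chmod +x restart-all.sh
    ./restart-all.sh
    ps -ef | grep etcd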

6.Verify that the data was restored

  • List the etcd members and check that the nodes are healthy

    sh
    [root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem member list
    1701f7e3861531d4, started, infra0, https://127.0.0.1:3380, https://127.0.0.1:3379
    6a58b5afdcebd95d, started, infra1, https://127.0.0.1:4380, https://127.0.0.1:4379
    84a1a2f39cda4029, started, infra2, https://127.0.0.1:5380, https://127.0.0.1:5379
  • Fetch all data from etcd and verify that it was restored

    sh
    [root@VM-226-235-tencentos ~/go/src/github.com/etcd-io/etcd/hack/tls-setup]# etcdctl --endpoints https://127.0.0.1:3379 --cert /tmp/etcd-certs/certs/127.0.0.1.pem --key /tmp/etcd-certs/certs/127.0.0.1-key.pem --cacert /tmp/etcd-certs/certs/ca.pem get --prefix ""
    /a
    /b
    /a/f
    ok
    a
    b
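
The member check above can be turned into a scripted assertion. The sketch below counts the members whose status field is "started" and compares the count with the expected cluster size. The member lines are copied from the output shown earlier, so the IDs are specific to that run.

```shell
# Output copied from the member list run above.
members='1701f7e3861531d4, started, infra0, https://127.0.0.1:3380, https://127.0.0.1:3379
6a58b5afdcebd95d, started, infra1, https://127.0.0.1:4380, https://127.0.0.1:4379
84a1a2f39cda4029, started, infra2, https://127.0.0.1:5380, https://127.0.0.1:5379'

# Count members whose second field is "started" and compare with the
# expected cluster size.
expected=3
started=$(printf '%s\n' "$members" | awk -F', ' '$2 == "started"' | wc -l | tr -d ' ')

if [ "$started" -eq "$expected" ]; then
  echo "all $expected members started"
else
  echo "only $started of $expected members started"
fi
```

Against a live cluster, the members variable would be filled from the etcdctl member list command instead of the pasted text.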