k8s的etcd启动报错

背景

电脑休眠状态意外断电导致虚拟机直接进入关机状态。

问题

kubectl命令报错

bash 复制代码
[root@master01 ~]#kubectl get node 
The connection to the server master01.kktb.org:6443 was refused - did you specify the right host or port?

kubelet服务报错

bash 复制代码
Oct 15 08:39:37 master01.kktb.org kubelet[747]: E1015 08:39:37.251784     747 kubelet.go:2424] "Error getting node" err="node \"master01.kktb.org\" not found"
Oct 15 08:39:37 master01.kktb.org kubelet[747]: E1015 08:39:37.329952     747 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://master01.kktb.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master01.kktb.org?timeout=10s": dial tcp 10.0.6.5:6443: connect: connection refused
Oct 15 08:39:37 master01.kktb.org kubelet[747]: E1015 08:39:37.352614     747 kubelet.go:2424] "Error getting node" err="node \"master01.kktb.org\" not found"
Oct 15 08:39:37 master01.kktb.org kubelet[747]: I1015 08:39:37.384900     747 kubelet_node_status.go:70] "Attempting to register node" node="master01.kktb.org"
Oct 15 08:39:37 master01.kktb.org kubelet[747]: E1015 08:39:37.385258     747 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://master01.kktb.org:6443/api/v1/nodes\": dial tcp 10.0.6.5:6443: connect: connection refused" node="master01.kktb.org"

6443端口无任何程序监听,判断可能是etcd出现了故障

master节点上的容器

bash 复制代码
[root@master01 kubelet.service.d]#docker ps 
CONTAINER ID   IMAGE                       COMMAND                  CREATED          STATUS          PORTS     NAMES
9924571324d2   e3ed7dee73e9                "kube-scheduler --au..."   18 minutes ago   Up 18 minutes             k8s_kube-scheduler_kube-scheduler-master01.kktb.org_kube-system_bca604f760be36f9a20d9b66b0bf821d_7
9d6171792af2   88784fb4ac2f                "kube-controller-man..."   18 minutes ago   Up 18 minutes             k8s_kube-controller-manager_kube-controller-manager-master01.kktb.org_kube-system_13461bcfe1de9f0e85e957cde36c42d2_10
eb3fdcf0b392   registry.k8s.io/pause:3.6   "/pause"                 18 minutes ago   Up 18 minutes             k8s_POD_etcd-master01.kktb.org_kube-system_67c8baa117269300c9b5e18d5b0a0e44_5
302e36ffb616   registry.k8s.io/pause:3.6   "/pause"                 18 minutes ago   Up 18 minutes             k8s_POD_kube-scheduler-master01.kktb.org_kube-system_bca604f760be36f9a20d9b66b0bf821d_5
a8f8f02011c7   registry.k8s.io/pause:3.6   "/pause"                 18 minutes ago   Up 18 minutes             k8s_POD_kube-controller-manager-master01.kktb.org_kube-system_13461bcfe1de9f0e85e957cde36c42d2_5
3fde7a0d5aae   registry.k8s.io/pause:3.6   "/pause"                 18 minutes ago   Up 18 minutes             k8s_POD_kube-apiserver-master01.kktb.org_kube-system_5846def9d7d3ade7311eb0f023db33ff_5

镜像都存在

bash 复制代码
[root@master01 kubelet.service.d]#docker images 
REPOSITORY                           TAG       IMAGE ID       CREATED         SIZE
istio/proxyv2                        1.19.1    b3547b3ef18b   2 weeks ago     251MB
ubuntu                               jammy     3565a89d9e81   2 weeks ago     77.8MB
ubuntu                               latest    3565a89d9e81   2 weeks ago     77.8MB
flannel/flannel                      v0.22.3   e23f7ca36333   3 weeks ago     70.2MB
flannel/flannel-cni-plugin           v1.2.0    a55d1bad692b   2 months ago    8.04MB
hello-world                          latest    9c7a54a9a43c   5 months ago    13.3kB
k8s.gcr.io/kube-apiserver            v1.24.0   529072250ccc   17 months ago   130MB
k8s.gcr.io/kube-proxy                v1.24.0   77b49675beae   17 months ago   110MB
k8s.gcr.io/kube-scheduler            v1.24.0   e3ed7dee73e9   17 months ago   51MB
k8s.gcr.io/kube-controller-manager   v1.24.0   88784fb4ac2f   17 months ago   119MB
k8s.gcr.io/etcd                      3.5.3-0   aebe758cef4c   18 months ago   299MB
k8s.gcr.io/pause                     3.7       221177c6082a   19 months ago   711kB
ikubernetes/proxy                    v0.1.1    a4fcedf206e8   23 months ago   62.4MB
k8s.gcr.io/coredns/coredns           v1.8.6    a4ca41631cc7   2 years ago     46.8MB
registry.k8s.io/pause                3.6       6270bb605e12   2 years ago     683kB
ikubernetes/demoapp                  v1.0      3342b7518915   3 years ago     92.7MB

ps -a看到退出的容器

bash 复制代码
[root@master01 kubelet.service.d]#docker ps -a 
CONTAINER ID   IMAGE                       COMMAND                  CREATED          STATUS                      PORTS     NAMES
07195c9438d3   aebe758cef4c                "etcd --advertise-cl..."   34 seconds ago   Exited (2) 33 seconds ago             k8s_etcd_etcd-master01.kktb.org_kube-system_67c8baa117269300c9b5e18d5b0a0e44_30
532fabab4479   529072250ccc                "kube-apiserver --ad..."   3 minutes ago    Exited (1) 3 minutes ago              k8s_kube-apiserver_kube-apiserver-master01.kktb.org_kube-system_5846def9d7d3ade7311eb0f023db33ff_27

etcd容器报错

bash 复制代码
[root@master01 kubelet.service.d]#docker logs  07195c9438d3
{"level":"info","ts":"2023-10-15T09:31:34.713Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--advertise-client-urls=https://10.0.6.5:2379","--cert-file=/etc/kubernetes/pki/etcd/server.crt","--client-cert-auth=true","--data-dir=/var/lib/etcd","--experimental-initial-corrupt-check=true","--initial-advertise-peer-urls=https://10.0.6.5:2380","--initial-cluster=master01.kktb.org=https://10.0.6.5:2380","--key-file=/etc/kubernetes/pki/etcd/server.key","--listen-client-urls=https://127.0.0.1:2379,https://10.0.6.5:2379","--listen-metrics-urls=http://127.0.0.1:2381","--listen-peer-urls=https://10.0.6.5:2380","--name=master01.kktb.org","--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt","--peer-client-cert-auth=true","--peer-key-file=/etc/kubernetes/pki/etcd/peer.key","--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt","--snapshot-count=10000","--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"]}
{"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/etcd","dir-type":"member"}
{"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["https://10.0.6.5:2380"]}
{"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"embed/etcd.go:479","msg":"starting with peer TLS","tls-info":"cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
{"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":["https://10.0.6.5:2379","https://127.0.0.1:2379"]}
{"level":"info","ts":"2023-10-15T09:31:34.714Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.3","git-sha":"0452feec7","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"master01.kktb.org","data-dir":"/var/lib/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://10.0.6.5:2380"],"listen-peer-urls":["https://10.0.6.5:2380"],"advertise-client-urls":["https://10.0.6.5:2379"],"listen-client-urls":["https://10.0.6.5:2379","https://127.0.0.1:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"info","ts":"2023-10-15T09:31:34.716Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/etcd/member/snap/db","took":"655.189µs"}
{"level":"info","ts":"2023-10-15T09:31:35.189Z","caller":"etcdserver/server.go:508","msg":"recovered v2 store from snapshot","snapshot-index":450049,"snapshot-size":"11 kB"}
{"level":"warn","ts":"2023-10-15T09:31:35.190Z","caller":"snap/db.go:88","msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":450049,"snapshot-file-path":"/var/lib/etcd/member/snap/000000000006de01.snap.db","error":"snap: snapshot file doesn't exist"}
{"level":"panic","ts":"2023-10-15T09:31:35.190Z","caller":"etcdserver/server.go:515","msg":"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.NewServer\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdserver/server.go:515\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\t/go/src/go.etcd.io/etcd/release/etcd/server/embed/etcd.go:245\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:228\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:123\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"}
panic: failed to recover v3 backend from snapshot

是由于恢复快照数据失败。查看其他两个节点是否也有同样报错。

幸运的是节点2的etcd启动正常

bash 复制代码
[root@master02 snap]#docker ps 
CONTAINER ID   IMAGE                       COMMAND                  CREATED          STATUS          PORTS     NAMES
daa59123cbbf   aebe758cef4c                "etcd --advertise-cl..."   50 seconds ago   Up 50 seconds             k8s_etcd_etcd-master02.kktb.org_kube-system_3564c10547c1c588ab6c79bcad0e90d0_19

一些关键字日志

bash 复制代码
[root@master02 snap]#docker logs -f daa59123cbbf

{"level":"info","ts":"2023-10-15T09:39:06.599Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/etcd/member/snap/db","took":"2.526769ms"}
{"level":"warn","ts":"2023-10-15T09:39:06.599Z","caller":"wal/util.go:90","msg":"ignored file in WAL directory","path":"0000000000000004-00000000000685b1.wal.broken"}
{"level":"info","ts":"2023-10-15T09:39:07.119Z","caller":"etcdserver/server.go:508","msg":"recovered v2 store from snapshot","snapshot-index":450048,"snapshot-size":"11 kB"}
{"level":"info","ts":"2023-10-15T09:39:07.119Z","caller":"etcdserver/server.go:521","msg":"recovered v3 backend from snapshot","backend-size-bytes":6635520,"backend-size":"6.6 MB","backend-size-in-use-bytes":2555904,"backend-size-in-use":"2.6 MB"}
{"level":"warn","ts":"2023-10-15T09:39:07.119Z","caller":"wal/util.go:90","msg":"ignored file in WAL directory","path":"0000000000000004-00000000000685b1.wal.broken"}
{"level":"info","ts":"2023-10-15T09:39:07.161Z","caller":"etcdserver/raft.go:483","msg":"restarting local member","cluster-id":"e3d7c840b56e7ccb","local-member-id":"77c05bfe945af1a1","commit-index":458027}
{"level":"info","ts":"2023-10-15T09:39:07.161Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"77c05bfe945af1a1 switched to configuration voters=(6311186738797484145 8628998035010679201 11257876798449243399)"}
{"level":"info","ts":"2023-10-15T09:39:07.161Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"77c05bfe945af1a1 became follower at term 16"}



{"level":"warn","ts":"2023-10-15T09:40:06.425Z","caller":"etcdhttp/metrics.go:86","msg":"/health error","output":"{\"health\":\"false\",\"reason\":\"RAFT NO LEADER\"}","status-code":503}
{"level":"warn","ts":"2023-10-15T09:40:07.179Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"5795d6e69d2e1871","rtt":"0s","error":"dial tcp 10.0.6.5:2380: connect: connection refused"}
{"level":"warn","ts":"2023-10-15T09:40:07.179Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"9c3c034d28add507","rtt":"0s","error":"dial tcp 10.0.6.7:2380: connect: connection refused"}
{"level":"warn","ts":"2023-10-15T09:40:07.179Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"9c3c034d28add507","rtt":"0s","error":"dial tcp 10.0.6.7:2380: connect: connection refused"}
{"level":"warn","ts":"2023-10-15T09:40:07.179Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"5795d6e69d2e1871","rtt":"0s","error":"dial tcp 10.0.6.5:2380: connect: connection refused"}

解决

解决问题的方法,直接将节点1的etcd数据目录文件夹移除,尝试将etcd2上面的数据拷贝过来

bash 复制代码
# 先将节点1的数据文件移动至opt中
[root@master01 ~]#mv /var/lib/etcd/member /opt/
# 节点2拷贝数据到节点1
[root@master02 snap]#scp -r /var/lib/etcd/member 10.0.6.5:/var/lib/etcd/

etcd、api-server服务恢复

集群状态

bash 复制代码
sh-5.1# etcdctl --endpoints=https://10.0.6.5:2379,https://10.0.6.6:2379,https://10.0.6.7:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt   --cert=/etc/kubernetes/pki/etcd/server.crt  --key=/etc/kubernetes/pki/etcd/server.key member list --write-out=table
+------------------+---------+-------------------+-----------------------+-----------------------+------------+
|        ID        | STATUS  |       NAME        |      PEER ADDRS       |     CLIENT ADDRS      | IS LEARNER |
+------------------+---------+-------------------+-----------------------+-----------------------+------------+
| 5795d6e69d2e1871 | started | master01.kktb.org | https://10.0.6.5:2380 | https://10.0.6.5:2379 |      false |
| 77c05bfe945af1a1 | started | master02.kktb.org | https://10.0.6.6:2380 | https://10.0.6.6:2379 |      false |
| 9c3c034d28add507 | started | master03.kktb.org | https://10.0.6.7:2380 | https://10.0.6.7:2379 |      false |
+------------------+---------+-------------------+-----------------------+-----------------------+------------+

kubectl命令使用正常

bash 复制代码
[root@master01 ~]#kubectl get node 
NAME                STATUS   ROLES           AGE   VERSION
master01.kktb.org   Ready    control-plane   22d   v1.24.3
master02.kktb.org   Ready    control-plane   22d   v1.24.3
master03.kktb.org   Ready    control-plane   22d   v1.24.3
相关推荐
哼了个哈16 分钟前
docker镜像构建(基于ISO)
docker·容器
predisw4 小时前
请求是如何通过k8s service 路由到对应的pod
云原生·容器·kubernetes·cloud native
檀越剑指大厂11 小时前
【fnOS使用Docker快速搭建WordPress站点并免费配置公网地址】
运维·docker·容器
u01005596011 小时前
sudo mkdir -p /etc/docker其中的 -p 什么意思?
运维·docker·容器
嗑瓜子儿溜茶水儿18 小时前
docker 部署 NginX
nginx·docker·容器
G_whang19 小时前
mac m2 安装 docker
macos·docker·容器
童安格粉丝1 天前
Docker图形化界面工具Portainer最佳实践
运维·redis·docker·容器·portainer·实践·详解
蓝天~白云1 天前
docker 更换存储路径
运维·docker·容器
霍格沃兹测试开发学社测试人社区1 天前
Docker隔离及资源限制原理
软件测试·测试开发·docker·容器·eureka·kubernetes
开源架构师1 天前
开源架构的容器化部署优化版
ci/cd·docker·kubernetes·开源·容器安全·开源架构·容器化部署