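K8s Multi-Master Reboot: Procedure Walkthrough and Troubleshooting

Preface

The cluster runs on three virtual machines created in VMware, and each VM also acts as a Master (control-plane) node. Because system security updates had not been applied for a long time, I recently ran dnf upgrade-minimal --security --allowerasing to upgrade the kernel and packages. Kernel updates only take effect after a reboot, and since all three nodes run etcd, the nodes must be rebooted one at a time so that quorum is never lost and the cluster stays available.

In addition, because the cluster runs on VMware virtual machines, I ran into some environment-specific problems while following this procedure. The symptoms and fixes are collected in the Troubleshooting section at the end.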

# kubectl get node -o wide
NAME     STATUS   ROLES           AGE    VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
demo-1   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-53-generic    containerd://1.7.27
demo-2   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-64-generic    containerd://1.7.27
demo-3   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-106-generic   containerd://1.7.27

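Node reboot procedure

Run the following steps in a loop for each node: check etcd → cordon → drain Pods → reboot → uncordon. Finish all of them on one node before moving on to the next.

1. Check etcd status

etcd is best deployed with an odd number of members, so that the cluster can keep serving reads and writes as long as a quorum (majority) is alive. Quorum is ⌊n/2⌋ + 1, and the number of member failures tolerated is n minus quorum; see the fault tolerance notes in the official documentation. With the current 3 members, at least 2 etcd members must stay alive, so only 1 may be down at any time.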
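
As a quick check of that arithmetic, the sketch below prints the quorum and the tolerated failures for a few cluster sizes. The function names quorum and tolerated_failures are made up for illustration and are not part of etcd.

```shell
# quorum N: majority size for an N-member cluster, floor(N/2) + 1
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: members that may fail while quorum is still held
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated=$(tolerated_failures "$n")"
done
```

A 4-member cluster tolerates no more failures than a 3-member one, which is the usual argument for odd sizes.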

Verify the following on every etcd node:

## Verify data consistency / member health
# etcdctl --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        endpoint status --write-out=table
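## DB SIZE:    database size; the limit is set at deployment time with --quota-backend-bytes (default 2 GB)
## IS LEADER:  whether this member is the leader
## IS LEARNER: whether this member is a non-voting learner
## RAFT TERM:  current Raft term; it must be the same on all members. It increases by 1 on every leader election, which a restart or network jitter can trigger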
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://x.x.x.x:2379   | 683c58b549788bd9 |  3.5.15 |   30 MB |      true |      false |        40 |  130863104 |          130863104 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

## Verify connectivity / response latency
# etcdctl --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        endpoint health --write-out=table
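## HEALTH: whether the endpoint can serve requests
## TOOK:   time taken by a test read of the key "health"; the endpoint counts as healthy if the read returns without error. Roughly network round trip + confirmation from the leader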
+------------------------+--------+-------------+-------+
|        ENDPOINT        | HEALTH |    TOOK     | ERROR |
+------------------------+--------+-------------+-------+
| https://x.x.x.x:2379   |   true | 60.842745ms |       |
+------------------------+--------+-------------+-------+

## List members
# etcdctl --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        member list -w table
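## STATUS:       member status
## PEER ADDRS:   addresses used for member-to-member traffic
## CLIENT ADDRS: addresses that clients (the API server) connect to
## IS LEARNER:   whether this member is a non-voting learner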
+---------+---------+--------+----------------------+----------------------+------------+
|   ID    | STATUS  |  NAME  |      PEER ADDRS      |     CLIENT ADDRS     | IS LEARNER |
+---------+---------+--------+----------------------+----------------------+------------+
| xxxxxxx | started | demo-1 | https://x.x.x.x:2380 | https://x.x.x.x:2379 |      false |
| xxxxxxx | started | demo-2 | https://x.x.x.x:2380 | https://x.x.x.x:2379 |      false |
| xxxxxxx | started | demo-3 | https://x.x.x.x:2380 | https://x.x.x.x:2379 |      false |
+---------+---------+--------+----------------------+----------------------+------------+
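
Since the three checks repeat the same TLS flags, they can be wrapped once. The sketch below assumes the kubeadm certificate paths used above; etcdctl_local and precheck_etcd are hypothetical helper names.

```shell
# etcdctl_local ARGS...: run etcdctl against the local member with the
# certificate paths from the commands above.
etcdctl_local() {
  etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    "$@"
}

# precheck_etcd: run the three checks in order, stopping at the first failure.
precheck_etcd() {
  etcdctl_local endpoint status --write-out=table &&
  etcdctl_local endpoint health --write-out=table &&
  etcdctl_local member list -w table
}
```

Running precheck_etcd on each node before the cordon step gives a single exit status to gate the rest of the procedure on.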

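2. Cordon the node

Mark the node that is about to be rebooted with cordon, so that no new Pods are scheduled onto it:

Note: work on only one node at a time. Finish the cordon → reboot → uncordon cycle on that node before touching the next one!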

# kubectl cordon demo-1

3. Drain Pods

kube-apiserver, controller-manager, scheduler and etcd are static Pods managed directly by the kubelet, and drain does not evict them. As long as etcd on the other two nodes stays alive, the control plane keeps working.

# kubectl drain demo-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --timeout=300s

After draining, confirm that scheduling is disabled on the node:

# kubectl get node
NAME     STATUS                     ROLES           AGE    VERSION
demo-1   Ready,SchedulingDisabled   control-plane   336d   v1.31.14
demo-2   Ready                      control-plane   336d   v1.31.14
demo-3   Ready                      control-plane   336d   v1.31.14

Common error: a drain timeout is usually caused by a PDB (PodDisruptionBudget) that does not allow the eviction, for example:

error when evicting pods/"prometheus-k8s-0" -n "monitoring": Cannot evict pod as it would violate the pod's disruption budget.

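Inspect and handle the PDB that covers the blocked Pod:

Besides temporarily changing the PDB, you can also scale up the replicas of the workload, or delete the Pod by hand (although that defeats the purpose of the PDB).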

# kubectl get pdb -n monitoring prometheus-k8s
NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
prometheus-k8s   1               N/A               0                     337d

## Temporarily set minAvailable to 0 (restore it once the node is done)
# kubectl patch pdb prometheus-k8s -n monitoring --type=json -p='[{"op":"replace","path":"/spec/minAvailable","value":0}]'
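
To make the restore step harder to forget, the same patch can be parameterized. This is a minimal sketch; set_pdb_min_available is a hypothetical helper, and the value 1 is the MIN AVAILABLE shown in the output above.

```shell
# set_pdb_min_available NAMESPACE NAME VALUE
# Replaces spec.minAvailable on the given PDB with VALUE (an integer).
set_pdb_min_available() {
  kubectl patch pdb "$2" -n "$1" --type=json \
    -p="[{\"op\":\"replace\",\"path\":\"/spec/minAvailable\",\"value\":$3}]"
}

# Before the drain:   set_pdb_min_available monitoring prometheus-k8s 0
# After the uncordon: set_pdb_min_available monitoring prometheus-k8s 1
```

If the PDB is owned by an operator or a Helm release, the controller may revert the patch on its own, so it is worth re-checking with kubectl get pdb afterwards.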

4. Reboot the node

# ssh demo-1
# reboot

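5. Uncordon and verify

After the node has rebooted, confirm that it is back to Ready (the kernel version also changes to the updated one), then lift the scheduling restriction with uncordon: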

# kubectl get node -o wide
NAME     STATUS                      ROLES           AGE    VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
demo-1   Ready,SchedulingDisabled    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-124-generic   containerd://1.7.27
demo-2   Ready                       control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-64-generic    containerd://1.7.27
demo-3   Ready                       control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-106-generic   containerd://1.7.27

# kubectl uncordon demo-1
node/demo-1 uncordoned

# kubectl get node -o wide
NAME     STATUS   ROLES           AGE    VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
demo-1   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-124-generic   containerd://1.7.27
demo-2   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-64-generic    containerd://1.7.27
demo-3   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-106-generic   containerd://1.7.27
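
One thing to watch in this step: right after the reboot command, the API server may keep reporting the node as Ready until enough kubelet heartbeats are missed, so a plain Ready check can pass too early. The sketch below first waits for the node's boot ID to change and only then waits for Ready. The helper names and the POLL_INTERVAL variable are assumptions for illustration.

```shell
# node_boot_id NODE: boot ID reported by the kubelet; it changes on every reboot.
node_boot_id() {
  kubectl get node "$1" -o jsonpath='{.status.nodeInfo.bootID}'
}

# wait_rebooted_and_ready NODE OLD_BOOT_ID [TRIES]
# Polls until the boot ID differs from OLD_BOOT_ID, then waits for Ready.
wait_rebooted_and_ready() {
  node="$1"; old_id="$2"; tries="${3:-120}"
  while [ "$tries" -gt 0 ]; do
    new_id="$(node_boot_id "$node" 2>/dev/null)"
    if [ -n "$new_id" ] && [ "$new_id" != "$old_id" ]; then
      kubectl wait --for=condition=Ready "node/$node" --timeout=300s
      return $?
    fi
    tries=$((tries - 1))
    sleep "${POLL_INTERVAL:-5}"
  done
  return 1
}

# Usage: record the ID before rebooting, then wait before uncordoning.
#   old="$(node_boot_id demo-1)"; ssh demo-1 reboot
#   wait_rebooted_and_ready demo-1 "$old" && kubectl uncordon demo-1
```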

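6. Repeat

Once the current node is done, go back to step 1 and repeat the whole procedure for the next node.

Troubleshooting

1. Pods show <invalid> after the reboot

1.1. Symptoms

After the node reboots, the RESTARTS column of every Pod on that node shows <invalid>, and the static Pods show 0/1 in the READY column. In the Kubernetes source, the elapsed-time formatter prints <invalid> when the timestamp it is given (here the time of the last restart) lies about 2 seconds or more ahead of the current time, that is, in the future.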

# kubectl get pods -A -o wide | grep demo-3
NAMESPACE     NAME                             READY   STATUS    RESTARTS            AGE     IP         NODE
kube-system   coredns-dbbb9ff68-z8wjd          1/1     Running   0 (<invalid> ago)   131m    x.x.x.x    demo-3
kube-system   etcd-demo-3                      0/1     Running   0 (<invalid> ago)   10m     x.x.x.x    demo-3
kube-system   kube-apiserver-demo-3            0/1     Running   0 (<invalid> ago)   10m     x.x.x.x    demo-3
kube-system   kube-controller-manager-demo-3   0/1     Running   0 (<invalid> ago)   10m     x.x.x.x    demo-3
kube-system   kube-proxy-f2fj8                 1/1     Running   1 (<invalid> ago)   75d     x.x.x.x    demo-3
kube-system   kube-scheduler-demo-3            0/1     Running   0 (<invalid> ago)   10m     x.x.x.x    demo-3
kube-system   node-local-dns-xkfxc             1/1     Running   3 (<invalid> ago)   75d     x.x.x.x    demo-3
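
The rule behind <invalid> can be sketched in a few lines. This is a simplified reading of the formatter, not the actual Kubernetes code; age_or_invalid is a made-up name and both arguments are epoch seconds.

```shell
# age_or_invalid NOW TS
# Prints the age of TS relative to NOW, or "<invalid>" if TS is 2 or more
# seconds in the future. A timestamp 1 second ahead is shown as "0s".
age_or_invalid() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt -1 ]; then
    echo "<invalid>"
  elif [ "$diff" -lt 0 ]; then
    echo "0s"
  else
    echo "${diff}s"
  fi
}
```

A restart timestamp several hours ahead of the node clock therefore always ends up as <invalid>, whatever the real age of the container is.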

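1.2. Investigation

1.2.1. Compare container and node time

Taking the etcd Pod as an example, its startedAt (UTC) is about 7 hours 40 minutes later than the node's current UTC time, so it lies in the "future":

The trailing Z on the timestamp indicates that it is in UTC.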

## Current node time (CST / UTC)
# date
Wed Jun 17 09:06:16 PM CST 2026
# date -u
Wed Jun 17 01:06:16 PM UTC 2026

## Pod container status (timestamps are UTC)
# kubectl get pod -n kube-system etcd-demo-1 -o jsonpath='{.status.containerStatuses}' | jq .
[
  {
    ...
    "lastState": {
      "terminated": {
        "exitCode": 255,
        "finishedAt": "2026-06-17T20:46:05Z",
        "reason": "Unknown",
        "startedAt": "2026-06-17T09:56:07Z"
      }
    },
    "name": "etcd",
    "ready": false,
    "restartCount": 4,
    "state": {
      "running": {
        "startedAt": "2026-06-17T20:46:16Z"
      }
    }
  }
]

## Container creation/start times recorded by containerd
## https://github.com/kubernetes-sigs/cri-tools/blob/v1.26.1/cmd/crictl/container.go#L862
# crictl inspect 2ea3cdadfec6f | grep -Ei "createdAt|startedAt|finishedAt"
    "createdAt": "2026-06-18T04:46:16.17870878+08:00",
    "startedAt": "2026-06-18T04:46:16.641950022+08:00",
    "finishedAt": "0001-01-01T00:00:00Z",

## Convert to UTC for a like-for-like comparison
##   containerd:  2026-06-17 20:46:16 UTC
##   node now:    2026-06-17 13:06:16 UTC   ← the container time is in the "future" (7h40m ahead)
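
The conversion can also be left to date instead of being done by hand. The sketch below assumes GNU date (as shipped with Ubuntu) and uses the two timestamps from the output above, with the fractional seconds dropped; to_epoch is a hypothetical helper.

```shell
# to_epoch TIMESTAMP: ISO 8601 timestamp (with offset or Z) to epoch seconds.
to_epoch() {
  date -u -d "$1" +%s
}

started="$(to_epoch '2026-06-18T04:46:16+08:00')"   # containerd startedAt
now="$(to_epoch '2026-06-17T13:06:16Z')"            # node time from date -u

skew=$(( started - now ))
printf '%dh%02dm\n' $(( skew / 3600 )) $(( skew % 3600 / 60 ))   # prints 7h40m
```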

1.2.2. Check the underlying container state

This rules out a containerd / etcd fault. The underlying container is actually in the Running state:

# crictl ps -a | grep 'etcd'
CONTAINER        IMAGE            CREATED                  STATE      NAME    ATTEMPT  POD ID           POD
2ea3cdadfec6f    2e96e5913fc06    Less than a second ago   Running    etcd    4        7451b51061d22    etcd-demo-1
fab531bce16e7    2e96e5913fc06    3 hours ago              Exited     etcd    3        5f897d72fd206    etcd-demo-1

## The process is indeed running
# ps aux | grep -v 'grep' | grep 'etcd'
root  2089  ...  etcd --advertise-client-urls=https://x.x.x.x:2379 ...

## etcd member status
# etcdctl --endpoints=https://127.0.0.1:2379 \
          --cacert=/etc/kubernetes/pki/etcd/ca.crt \
          --cert=/etc/kubernetes/pki/etcd/server.crt \
          --key=/etc/kubernetes/pki/etcd/server.key \
          endpoint status -w table
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://x.x.x.x:2379   | 683c58b549788bd9 |  3.5.15 |   30 MB |     false |      false |        59 |  131325454 |          131325454 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

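1.2.3. Find the root cause of the time problem

Every node has NTP configured, and the clocks agree when compared. My reasoning was this: containerd has been around for many years and is unlikely to have a bug like this; more likely, the time it saw when it created the container really was in the "future". Since the time looks correct now, the time synchronization configured on the node must have corrected it afterwards. That is why I looked at the kernel messages recorded after boot with dmesg: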

# dmesg -T | grep -iEw "rtc|time|clock"
[Wed Jun 17 20:54:08 2026] vmware: Host bus clock speed read from hypervisor : 66000000 Hz
[Wed Jun 17 20:54:08 2026] vmware: using clock offset of 5629468472 ns
[Wed Jun 17 20:54:08 2026] PM: RTC time: 12:54:08, date: 2026-06-17
[Wed Jun 17 20:54:09 2026] PTP clock support registered
## The kernel sets the system clock (UTC)
[Wed Jun 17 20:54:10 2026] rtc_cmos 00:01: setting system clock to 2026-06-17T12:54:10 UTC (1781700850)
[Wed Jun 17 20:54:11 2026] Loaded X.509 cert 'Build time autogenerated kernel key: ...'
## About 40 seconds after the kernel set the clock, systemd-journald recorded a backwards time jump
[Wed Jun 17 20:54:50 2026] systemd-journald[525]: Time jumped backwards, rotating.

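I exported the journal with journalctl and trimmed it, since the output is long. The timeline turns out to be 20:54:xx → 04:45:xx → 20:54:xx: containerd only started while the host clock was set to the "future" (04:45), so the containers it created carry "future" timestamps too.

Note: this log does not directly prove that VMware changed the time. My reasoning is that before the clock jumped to 04:45:xx only system services such as ssh and cron had started, and they cannot change the time. VGAuthService, which ships with VMware Tools (the component that can set the guest clock), was the service started closest to the jump, so I treated VMware as the prime suspect. The root-cause fix later confirmed this.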

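## The output is long; write it to a file and keep only the relevant parts
# journalctl -b --no-pager > journalctl.txt
## -b: only show logs from the current boot
## --no-pager: print everything without paging

## Around second 50 the timestamps clearly jump: 20:54:xx → 04:45:xx → 20:54:xx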
Jun 17 20:54:19 demo-1 kernel: DMI: VMware, Inc. VMware Virtual Platform/...
Jun 17 20:54:19 demo-1 kernel: vmware: hypercall mode: 0x02
Jun 17 20:54:19 demo-1 kernel: Hypervisor detected: VMware
Jun 17 20:54:19 demo-1 kernel: vmware: TSC freq read from hypervisor : 2600.000 MHz
Jun 17 20:54:19 demo-1 kernel: vmware: Host bus clock speed read from hypervisor : 66000000 Hz
Jun 17 20:54:19 demo-1 kernel: vmware: using clock offset of 5629468472 ns
Jun 17 20:54:19 demo-1 kernel: Booting paravirtualized kernel on VMware hypervisor
Jun 17 20:54:26 demo-1 VGAuthService[799]: Using '/var/lib/vmware/VGAuth/aliasStore' for alias store root directory
Jun 17 20:54:26 demo-1 VGAuthService[799]: LoadCatalogAndSchema: Using '/etc/vmware-tools/vgauth/schemas' for SAML schemas
Jun 17 20:54:26 demo-1 VGAuthService[799]: LoadPrefs: Allowing 300 of clock skew for SAML date validation
Jun 17 20:54:26 demo-1 VGAuthService[799]: SAML_Init: Using xmlsec1 1.2.39 for XML signature support
Jun 17 20:54:26 demo-1 VGAuthService[799]: ServiceNetworkCreateSocketDir: Created socket directory '/var/run/vmware'
Jun 17 20:54:26 demo-1 VGAuthService[799]: BEGIN SERVICE
Jun 17 20:54:26 demo-1 systemd[1]: Starting etcd.service - Etcd Service...
## containerd only starts after the host clock has been moved to the "future"
Jun 18 04:45:57 demo-1 systemd-resolved[796]: Clock change detected. Flushing caches.
Jun 18 04:45:57 demo-1 systemd[1]: Started kubelet.service - kubelet: The Kubernetes Node Agent.
Jun 18 04:45:58 demo-1 systemd[1]: Starting containerd.service - containerd container runtime...
Jun 18 04:46:16 demo-1 containerd[919]: time="..." msg="CreateContainer ... for &ContainerMetadata{Name:etcd,Attempt:4,} returns container id \"2ea3cdadfec6f...\""
Jun 18 04:46:16 demo-1 containerd[919]: time="..." msg="StartContainer for \"2ea3cdadfec6f...\""
Jun 18 04:46:16 demo-1 systemd[1]: Started cri-containerd-2ea3cdadfec6f....scope - libcontainer container 2ea3cdadfec6f....
Jun 18 04:46:16 demo-1 containerd[919]: time="..." msg="StartContainer for \"2ea3cdadfec6f...\" returns successfully"
## The clock is changed again
Jun 17 20:54:50 demo-1 systemd-resolved[796]: Clock change detected. Flushing caches.
Jun 17 20:54:50 demo-1 systemd-journald[525]: Time jumped backwards, rotating.
Jun 17 20:54:50 demo-1 systemd-timesyncd[797]: Contacted time server 91.189.91.157:123 (ntp.ubuntu.com).
Jun 17 20:54:50 demo-1 systemd-timesyncd[797]: Initial clock synchronization to Wed 2026-06-17 20:54:50.365713 CST.
Jun 17 20:54:50 demo-1 systemd[1]: etcd.service: Scheduled restart job, restart counter is at 2.
Jun 17 20:54:50 demo-1 systemd[1]: Starting etcd.service - Etcd Service...
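
Rather than trimming the journal by hand, the interesting lines can be filtered out. This is a minimal sketch; the patterns are the messages that appear in the excerpt above, and clock_events is a hypothetical name.

```shell
# clock_events: read journal text on stdin and keep only the lines that report
# a wall-clock change, plus the containerd start for reference.
clock_events() {
  grep -E 'Clock change detected|Time jumped backwards|Initial clock synchronization|Starting containerd\.service'
}

# Usage: journalctl -b --no-pager | clock_events
```

journalctl -b -o short-monotonic prints timestamps relative to boot instead of wall-clock time, which keeps the entries in a readable order even when the wall clock jumps.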

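1.3. Fixes

Based on the log output above, the suspected root cause is a time jump caused by VMware, which leaves two options:

  1. Change the startup order inside the VM (workaround);
  2. Change the VMware time synchronization settings (root-cause fix).

1.3.1. Workaround: change the startup order

This suits environments where you have no access to the VMware host. Create a service that waits for clock synchronization and make kubelet depend on it, so that containers are only started once the host time is back to normal. containerd needs the same dependency; otherwise the next problem (section 2) shows up.

The exact content of this custom service does not matter much; a plain sleep 60 would achieve the same goal.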

## 1. Use a drop-in instead of editing kubelet.service directly
##    /usr/lib/systemd/system/kubelet.service is owned by the rpm/deb package and is overwritten on upgrade;
##    a systemd drop-in (/etc/systemd/system/kubelet.service.d/) belongs to no package, similar to a Helm custom values file.
# systemctl status kubelet
## ● kubelet.service - kubelet: The Kubernetes Node Agent
##      Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; preset: enabled)
##     Drop-In: /usr/lib/systemd/system/kubelet.service.d/
##              └─10-kubeadm.conf
# mkdir -p /etc/systemd/system/kubelet.service.d/

## 2. Create the service that waits for clock synchronization
cat > /etc/systemd/system/wait-for-clock-sync.service <<'EOF'
[Unit]
Description=Wait for system clock to be synchronized
After=systemd-timesyncd.service network-online.target
Before=kubelet.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c 'for i in $(seq 1 60); do if timedatectl show -p NTPSynchronized --value 2>/dev/null | grep -q yes; then exit 0; fi; sleep 1; done; exit 0'
TimeoutStartSec=70

[Install]
WantedBy=multi-user.target
EOF

## 3. Make kubelet depend on it
cat > /etc/systemd/system/kubelet.service.d/20-wait-time-sync.conf <<'EOF'
[Unit]
After=wait-for-clock-sync.service
Requires=wait-for-clock-sync.service
EOF

## 4. Enable and reload
# systemctl daemon-reload
# systemctl enable --now wait-for-clock-sync.service

## 5. Verify the dependency chain
# systemctl list-dependencies --reverse wait-for-clock-sync.service
# systemctl status wait-for-clock-sync.service
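
The steps above only wire kubelet to the wait service, while the text notes that containerd needs the same dependency. A possible way to add it is a second drop-in with the same content. The sketch below is an assumption on my part rather than something taken from the original setup; write_clock_dropin and the SYSTEMD_DIR override (handy for a dry run) are hypothetical.

```shell
# write_clock_dropin UNIT
# Writes a drop-in that starts UNIT only after wait-for-clock-sync.service.
# SYSTEMD_DIR defaults to /etc/systemd/system.
write_clock_dropin() {
  dropin_dir="${SYSTEMD_DIR:-/etc/systemd/system}/$1.d"
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/20-wait-time-sync.conf" <<'EOF'
[Unit]
After=wait-for-clock-sync.service
Requires=wait-for-clock-sync.service
EOF
}

# Usage (as root):
#   write_clock_dropin containerd.service
#   systemctl daemon-reload
```

After the reload, systemctl list-dependencies --reverse wait-for-clock-sync.service should list containerd.service next to kubelet.service.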

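1.3.2. Root-cause fix: disable VMware time synchronization

In this environment the VMware host and the VM disagreed on the time, and the VM clock was thrown off during boot. Following the official VMware blog, disable all time synchronization: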

  1. Shut down the VM;
  2. Edit the vmx configuration of that VM;
  3. Power it on.

## On the ESXi host, edit the vmx file of the VM
# grep -i 'time' /vmfs/volumes/data-1/demo-1/demo-1.vmx
time.synchronize.continue = "FALSE"
time.synchronize.restore = "FALSE"
time.synchronize.resume.disk = "FALSE"
time.synchronize.shrink = "FALSE"
time.synchronize.tools.startup = "FALSE"
time.synchronize.tools.enable = "FALSE"
time.synchronize.resume.host = "FALSE"

## After disabling synchronization and rebooting, dmesg no longer shows a time jump
# dmesg -T | grep -iEw "rtc|time|clock"
[Thu Jun 18 18:14:06 2026] vmware: Host bus clock speed read from hypervisor : 66000000 Hz
[Thu Jun 18 18:14:06 2026] vmware: using clock offset of 4177512396 ns
[Thu Jun 18 18:14:07 2026] PM: RTC time: 10:14:06, date: 2026-06-18
[Thu Jun 18 18:14:08 2026] PTP clock support registered
[Thu Jun 18 18:14:08 2026] rtc_cmos 00:01: setting system clock to 2026-06-18T10:14:08 UTC (1781777648)
[Thu Jun 18 18:14:08 2026] Loaded X.509 cert 'Build time autogenerated kernel key: ...'
[Thu Jun 18 18:14:09 2026] Loaded X.509 cert 'Build time autogenerated kernel key: ...'
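
With seven options to set on each of three VMs, a small check helps catch a missed line. The sketch below verifies that every option listed above is present and set to "FALSE" in a given vmx file; check_vmx_timesync is a hypothetical helper, and the dots in the option names are matched loosely as regex wildcards.

```shell
# check_vmx_timesync FILE
# Succeeds only if every time.synchronize.* option from the list above is
# set to "FALSE" in FILE; prints the options that are missing or not disabled.
check_vmx_timesync() {
  vmx="$1"
  rc=0
  for key in \
    time.synchronize.continue time.synchronize.restore \
    time.synchronize.resume.disk time.synchronize.shrink \
    time.synchronize.tools.startup time.synchronize.tools.enable \
    time.synchronize.resume.host
  do
    if ! grep -Eiq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"FALSE\"" "$vmx"; then
      echo "not disabled: $key"
      rc=1
    fi
  done
  return "$rc"
}

# Usage: check_vmx_timesync /vmfs/volumes/data-1/demo-1/demo-1.vmx
```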

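2. Container name reserved after the reboot, containers cannot be created

2.1. Symptoms

The trigger is the same as for problem 1 (the time jump left containerd's data only partially written), and the root-cause fix is again the one in section 1.3. This section covers how to recover by hand when the symptom has already appeared after a reboot.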

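This problem generally has one of two causes:

  1. Non-atomic writes: when creating a container, containerd writes two records (name + details); if a power loss, panic or time jump interrupts it after only one has been written, a stale record is left behind;
  2. A time jump (the root cause this time).

# kubectl get pods -n kube-system -o wide | grep demo-1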
NAME                              READY   STATUS     RESTARTS  AGE    IP        NODE
coredns-dbbb9ff68-pzr4j           1/1     Running    4         5h8m   x.x.x.x   demo-1
etcd-demo-1                       0/1     Unknown    8         20m    x.x.x.x   demo-1
kube-apiserver-demo-1             0/1     Unknown    32        20m    x.x.x.x   demo-1
kube-controller-manager-demo-1    0/1     Unknown    20        20m    x.x.x.x   demo-1
kube-proxy-jgx42                  1/1     Running    0         34m    x.x.x.x   demo-1
kube-scheduler-demo-1             0/1     Unknown    19        20m    x.x.x.x   demo-1
node-local-dns-vgjlf              1/1     Running    0         28m    x.x.x.x   demo-1

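2.2. Investigation

2.2.1. Check the underlying container and its logs

The container has been created but is in the Exited state, and its log file does not exist (the container never really started):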

# crictl ps -a | grep -i 'etcd'
CONTAINER       IMAGE           CREATED                  STATE    NAME   ATTEMPT   POD ID          POD
c93ac13a41b67   2e96e5913fc06   Less than a second ago   Exited   etcd   8         f2fc82e9a4d98   etcd-demo-1

## The log file does not exist
# crictl logs 624bf43aff2db
FATA[0000] failed to try resolving symlinks in path "/var/log/pods/kube-system_etcd-demo-1_c93ac13a41b67d89d5dbbfbc90cf9c8f/etcd/8.log":
lstat /var/log/pods/kube-system_etcd-demo-1_c93ac13a41b67d89d5dbbfbc90cf9c8f/etcd/8.log: no such file or directory

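2.2.2. Check the kubelet log

Containers are started by the kubelet, so look at its errors. The entries below all say the same thing: the kubelet wants to create etcd-demo-1, but that name is already reserved in containerd: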

# journalctl -u kubelet --since "3 min ago" --no-pager | grep -i 'etcd'
Jun 18 23:02:36 demo-1 kubelet[23429]: E0618 23:02:36.675072 ... "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to reserve sandbox name \"etcd-demo-1_kube-system_df1fae7c70ff1a1dfc6127a8f7bf67a2_6\": name ... is reserved for \"7cc9d6627964...\""
Jun 18 23:02:36 demo-1 kubelet[23429]: E0618 23:02:36.675212 ... "Failed to create sandbox for pod" err="... failed to reserve sandbox name \"etcd-demo-1_kube-system_..._6\": name ... is reserved for \"7cc9d6627964...\"" pod="kube-system/etcd-demo-1"
Jun 18 23:02:36 demo-1 kubelet[23429]: E0618 23:02:36.675273 ... "CreatePodSandbox for pod failed" err="... failed to reserve sandbox name \"etcd-demo-1_kube-system_..._6\": name ... is reserved for \"7cc9d6627964...\"" pod="kube-system/etcd-demo-1"
Jun 18 23:02:36 demo-1 kubelet[23429]: E0618 23:02:36.675410 ... "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-demo-1_kube-system(...)\" with CreatePodSandboxError: ... name ... is reserved for \"7cc9d6627964...\"" pod="kube-system/etcd-demo-1"
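
Each of these errors names the ID that holds the reservation, so it can be pulled out of the log instead of being copied by hand. This is a rough sketch that assumes the message format shown above; reserved_sandbox_id is a hypothetical helper, and the extracted ID may still be missing from crictl pods if only part of the record was written.

```shell
# reserved_sandbox_id: read kubelet log lines on stdin and print the distinct
# IDs that appear after 'is reserved for'.
reserved_sandbox_id() {
  grep -oE 'is reserved for [\\"]*[0-9a-f]+' |
    grep -oE '[0-9a-f]+$' |
    sort -u
}

# Usage:
#   journalctl -u kubelet --since "3 min ago" --no-pager | reserved_sandbox_id
```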

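2.3. Fix

Addendum: the containerd project already tracks this in issue #10848, which is fixed in 2.2.x / 2.3.x (see PR #11576); upgrading avoids a recurrence.

Stop the kubelet service, remove the sandbox that holds the name by hand, then restart containerd: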

## Stop kubelet, otherwise it keeps retrying the creation
# systemctl stop kubelet

## Find the sandbox that holds the name (look it up by name)
# crictl pods                     ## list all sandboxes
# crictl pods -q --name <name>    ## get the matching ID

## Remove the leftover sandbox
# crictl stopp "$ID"
# crictl rmp -f "$ID"

# systemctl restart containerd
# systemctl start kubelet