Symptoms
E1126 07:37:20.876269 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.135:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.135 because it doesn't contain any IP SANs" node="server-06"
E1126 07:37:20.881132 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.136:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.136 because it doesn't contain any IP SANs" node="server-07"
E1126 07:37:20.886884 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.133:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.133 because it doesn't contain any IP SANs" node="server-04"
E1126 07:37:20.891583 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.134:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.134 because it doesn't contain any IP SANs" node="server-05"
I1126 07:37:24.595189 1 server.go:192] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I1126 07:37:34.596509 1 server.go:192] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E1126 07:37:35.842888 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.131:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.131 because it doesn't contain any IP SANs" node="server-03"
E1126 07:37:35.843603 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.136:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.136 because it doesn't contain any IP SANs" node="server-07"
E1126 07:37:35.865009 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.137:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.137 because it doesn't contain any IP SANs" node="server-08"
E1126 07:37:35.869804 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.135:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.135 because it doesn't contain any IP SANs" node="server-06"
E1126 07:37:35.875473 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.133:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.133 because it doesn't contain any IP SANs" node="server-04"
E1126 07:37:35.880808 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.174.134:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.174.134 because it doesn't contain any IP SANs" node="server-05"
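To see at a glance which nodes are affected, the scrape failures can be reduced to a list of node names. This is only a minimal sketch: failed_scrape_nodes is a hypothetical helper name, and the deployment name in the usage comment (metrics-server in kube-system) is an assumption about how metrics-server was installed.

```shell
# Hypothetical helper: read metrics-server log lines on stdin and print each
# node that has at least one "Failed to scrape node" entry, once, sorted.
failed_scrape_nodes() {
  grep -F '"Failed to scrape node"' |
    sed -n 's/.*node="\([^"]*\)".*/\1/p' |
    sort -u
}

# Usage (assumes the default deployment name and namespace):
#   kubectl -n kube-system logs deploy/metrics-server | failed_scrape_nodes
```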
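Explanation
When a control-plane node is initialized or joined, and when a worker node joins, the kubelet generates a self-signed serving certificate. When a third-party component (here, metrics-server) queries the kubelet, the TLS handshake validates that serving certificate; if the IP address being requested is not listed in the certificate's SANs, validation fails and the handshake is aborted. The kubelet option serverTLSBootstrap changes this: the kubelet requests its serving certificate from the Kubernetes CA instead of signing its own.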
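# Default behavior (without serverTLSBootstrap)
kubelet starts
↓
Generates its own certificate (self-signed)
↓
Issuer: CN = server-03-ca@1764041318 # ❌ signed by the kubelet itself
Subject: CN = server-03@1764041318
↓
Other components do not trust this certificate
↓
--kubelet-insecure-tls is needed to skip verification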
After setting serverTLSBootstrap to true:
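kubelet starts
↓
No longer generates its own certificate
↓
Requests one signed by the Kubernetes CA instead
↓
Issuer: CN = kubernetes # ✅ issued by a trusted CA
Subject: CN = system:node:server-03, O = system:nodes
↓
Components that trust the cluster CA accept this certificate
↓
--kubelet-insecure-tls is not needed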
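Which of the two cases a node is in can be checked by looking at the certificate the kubelet actually presents on port 10250. The sketch below assumes OpenSSL 1.1.1 or newer (for the -ext option) and uses the address of server-03 from the logs; san_has_ip is a hypothetical helper that only parses the printed SAN text.

```shell
# Print issuer, subject and SANs of the kubelet serving certificate
# (run from any machine that can reach the node; not executed here):
#   openssl s_client -connect 192.168.174.131:10250 < /dev/null 2>/dev/null |
#     openssl x509 -noout -issuer -subject -ext subjectAltName

# Hypothetical helper: succeed if the SAN text on stdin lists the given IP.
# Expects entries in OpenSSL's "IP Address:<ip>" form, comma-separated.
san_has_ip() {
  ip="$1"
  tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    grep -qxF "IP Address:$ip"
}

# Usage: pipe the subjectAltName output of the command above into the helper:
#   openssl x509 -noout -ext subjectAltName -in kubelet.crt | san_has_ip 192.168.174.131 && echo "IP SAN present"
```

A self-signed certificate of the kind shown in the logs would make the helper fail, since the error reports that it contains no IP SANs at all.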
Solution
Edit the kubelet configuration on every node:
# Edit the kubelet configuration
vim /var/lib/kubelet/config.yaml
Add or modify:
serverTLSBootstrap: true
Restart the kubelet:
systemctl restart kubelet
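With more than a couple of nodes, the edit can be scripted instead of done by hand in vim. A minimal sketch, assuming serverTLSBootstrap appears (if at all) as a top-level key at the start of a line in the config file; ensure_server_tls_bootstrap is a hypothetical helper name.

```shell
# Hypothetical helper: make sure the given kubelet config file contains
# exactly one top-level "serverTLSBootstrap: true" line. Safe to run twice.
ensure_server_tls_bootstrap() {
  cfg="$1"
  [ -f "$cfg" ] || return 1
  tmp="$(mktemp)" || return 1
  # Copy everything except an existing serverTLSBootstrap line, then append
  # the desired value. grep exits non-zero when nothing is left to copy,
  # which is not an error here.
  grep -v '^serverTLSBootstrap:' "$cfg" > "$tmp" || true
  printf 'serverTLSBootstrap: true\n' >> "$tmp"
  # Write back through cat so the original file keeps its owner and mode.
  cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Usage on each node (as root):
#   ensure_server_tls_bootstrap /var/lib/kubelet/config.yaml && systemctl restart kubelet
```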
Approve the CSRs from the control-plane node:
# List the CSRs awaiting approval
kubectl get csr
# Approve all pending CSRs
kubectl get csr | grep Pending | awk '{print $1}' | xargs kubectl certificate approve
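The one-liner above approves every pending CSR, including any that have nothing to do with the kubelet. A narrower variant is sketched here, under the assumption that `kubectl get csr` prints the signer name in the third column and the condition in the last one; it selects only requests for the kubernetes.io/kubelet-serving signer. pending_kubelet_serving_csrs is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get csr` table output on stdin and print
# the names of CSRs that are Pending and target the kubelet-serving signer.
# Assumes column 3 is SIGNERNAME and the last column is CONDITION.
pending_kubelet_serving_csrs() {
  awk '$3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}

# Usage on a control-plane node (xargs -r is a GNU option that skips the
# command when there is nothing to approve):
#   kubectl get csr | pending_kubelet_serving_csrs | xargs -r kubectl certificate approve
```

Checking the requestor of each CSR (expected to be the node itself, system:node:<name>) before approving is still worthwhile, since approval tells the cluster CA to sign whatever names the request contains.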
Result
metrics-server can now scrape node metrics:
root@server-03:~# kubectl top nodes
NAME        CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)
server-03   108m         5%       1054Mi          27%
server-04   110m         5%       828Mi           21%
server-05   98m          4%       907Mi           24%
server-06   26m          1%       392Mi           10%
server-07   18m          0%       369Mi           9%
server-08   20m          1%       367Mi           9%