Bootstrap and authentication logic of AWS EKS nodes

Kubelet bootstrapping in a Kubernetes cluster

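According to the official documentation and related material, when a kubelet joins a cluster its CSR is usually approved automatically. This process is known as TLS bootstrapping, and the kubelet typically authenticates during it with a bootstrap token (Bootstrap Tokens authentication).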

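The kubelet looks for its kubeconfig file. If the file is not found, the kubelet:

  • looks for the bootstrap-kubeconfig file and starts the bootstrap process
  • authenticates with the API server URL and the limited-privilege token from the bootstrap-kubeconfig file
  • creates a certificate signing request (CSR) and waits for the result; kube-controller-manager approves the CSR automatically
  • retrieves the issued certificate and creates a kubeconfig file that contains the key and the signed certificate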

The log printed when initializing a control plane with kubeadm matches this bootstrap configuration:

https://kubernetes.io/zh-cn/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#bootstrap-initialization

[apiclient] All control plane components are healthy after 15.003839 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[bootstrap-token] Using token: 47wg36.zm1tr8lfosat2943
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
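
The token printed in the log follows the bootstrap token format: a 6-character public token ID, a dot, and a 16-character secret. The following is a minimal Python sketch that splits such a token and derives the user name and Secret name Kubernetes associates with it. The helper name `parse_bootstrap_token` is made up for illustration and is not part of any Kubernetes tooling.

```python
import re

# Bootstrap tokens have the form "<6-char id>.<16-char secret>",
# using lowercase letters and digits only.
BOOTSTRAP_TOKEN_RE = re.compile(r"^([a-z0-9]{6})\.([a-z0-9]{16})$")


def parse_bootstrap_token(token: str) -> dict:
    """Split a bootstrap token and derive the identities tied to it (hypothetical helper)."""
    match = BOOTSTRAP_TOKEN_RE.match(token)
    if match is None:
        raise ValueError("not a valid bootstrap token")
    token_id, secret = match.groups()
    return {
        "token_id": token_id,
        "secret": secret,
        # The API server authenticates the token as this user,
        # a member of the system:bootstrappers group.
        "username": f"system:bootstrap:{token_id}",
        # The token is stored as a Secret with this name in kube-system.
        "secret_name": f"bootstrap-token-{token_id}",
    }


if __name__ == "__main__":
    info = parse_bootstrap_token("47wg36.zm1tr8lfosat2943")
    print(info["username"], info["secret_name"])
```

The RBAC rules listed in the log are what allow a user authenticated this way to post CSRs and have them approved automatically.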

Likewise, the kubelet needs the following flags at startup to complete bootstrapping:

  • --cert-dir="/var/lib/kubelet/pki": path where the kubelet stores the key and certificate it generates (optional, a default is used otherwise)
  • --kubeconfig="/var/lib/kubelet/kubeconfig": path to a kubeconfig file that does not yet exist; the kubelet writes the bootstrapped config file here
  • --bootstrap-kubeconfig="/var/lib/kubelet/bootstrap-kubeconfig": path to a bootstrap kubeconfig file that provides the server URL and bootstrap credentials, e.g. a bootstrap token
  • Optional: instructions to rotate certificates

The /var/lib/kubelet/bootstrap-kubeconfig file has the following format:

  • the user entry authenticates with a token
  • certificate-authority is used by the kubelet to verify the API server's serving certificate

https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#kubelet-configuration

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/lib/kubernetes/ca.pem
    server: https://my.server.example.com:6443
  name: bootstrap
contexts:
- context:
    cluster: bootstrap
    user: kubelet-bootstrap
  name: bootstrap
current-context: bootstrap
preferences: {}
users:
- name: kubelet-bootstrap
  user:
    token: 07401b.f395accd246ae52d

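As for the kubeconfig file:

When the kubelet starts, if the file specified by --kubeconfig does not exist, the bootstrap kubeconfig specified by --bootstrap-kubeconfig is used to request a client certificate from the API server. Once the certificate request is approved and the kubelet has received the certificate, a kubeconfig file referencing the generated key and the obtained certificate is written to the path specified by --kubeconfig. The certificate and key files are placed in the directory specified by --cert-dir.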
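
The file lookup described above can be sketched in Python as follows. This is a simplified illustration, not the kubelet's actual code: the function name `credential_source` and its return values are made up here, and the real kubelet also checks whether an existing kubeconfig still holds a valid certificate.

```python
import os
from typing import Optional


def credential_source(kubeconfig: str, bootstrap_kubeconfig: Optional[str]) -> str:
    """Return which file a kubelet would start from (simplified, hypothetical helper)."""
    # An existing kubeconfig wins: no bootstrapping is needed.
    if os.path.exists(kubeconfig):
        return "kubeconfig"
    # Otherwise fall back to the bootstrap kubeconfig to request a client certificate.
    if bootstrap_kubeconfig and os.path.exists(bootstrap_kubeconfig):
        return "bootstrap-kubeconfig"
    # Neither file is available, so there are no API server credentials.
    return "none"


if __name__ == "__main__":
    print(credential_source("/var/lib/kubelet/kubeconfig",
                            "/var/lib/kubelet/bootstrap-kubeconfig"))
```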

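The certificate a node obtains through bootstrapping is the kubelet client certificate, which is used to communicate with the API server. The kubelet can also expose a service of its own, and the serving certificate for it can come from one of these sources:

  • the key and certificate set with --tls-private-key-file and --tls-cert-file
  • a self-signed key and certificate, created if no key and certificate are provided
  • a serving certificate requested from the cluster through the CSR API

Once configuration is complete, the kubelet uses this kubeconfig file to communicate with the API server. Note that the client certificate file it references is named kubelet-client-current.pem: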

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0F...
    server: https://192.168.1.10:6443
  name: test-cluster
contexts:
- context:
    cluster: test-cluster
    user: system:node:test-cluster-node-1
  name: system:node:test-cluster-node-1
current-context: system:node:test-cluster-node-1
kind: Config
preferences: {}
users:
- name: system:node:test-cluster-node-1
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

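Kubelet bootstrapping in an EKS cluster

EKS nodes use the bootstrap script to initialize the kubelet. The full set of startup flags is shown below.

  • --bootstrap-kubeconfig is not specified, which means no token-based TLS bootstrapping takes place
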
    /usr/bin/kubelet --cloud-provider aws --image-credential-provider-config /etc/eks/ecr-credential-provider/ecr-credential-provider-config --image-credential-provider-bin-dir /etc/eks/ecr-credential-provider --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime remote --container-runtime-endpoint unix:///run/containerd/containerd.sock --node-ip=192.168.27.37 --pod-infra-container-image=918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/eks/pause:3.5 --v=2 
  • Certificates exist under /var/lib/kubelet/pki/

    -rw------- 1 root root 1378 May 20 04:41 kubelet-server-2023-05-20-04-41-35.pem
    lrwxrwxrwx 1 root root   59 May 20 04:41 kubelet-server-current.pem -> /var/lib/kubelet/pki/kubelet-server-2023-05-20-04-41-35.pem

We launch a fresh instance from the EKS-optimized AMI to check whether these files are present under those paths:

aws ec2 run-instances --image-id ami-0b779daxxxxxf1 \
    --instance-type m4.large \
    --key-name temp-key \
    --count 1 \
    --subnet-id subnet-027xxxxxxxx60acdd \
    --security-group-ids sg-096xxxxxx7e9 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=testtmp}]' 'ResourceType=volume,Tags=[{Key=Name,Value=testena}]'

The kubeconfig on the fresh instance is a template; the final file is produced when the bootstrap script substitutes the placeholders:

$ cat /var/lib/kubelet/kubeconfig
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: MASTER_ENDPOINT
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubelet
  name: kubelet
current-context: kubelet
users:
- name: kubelet
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: /usr/bin/aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "CLUSTER_NAME"
        - --region
        - "AWS_REGION"

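There are no files under /var/lib/kubelet/pki/ on the fresh instance, so the certificates are in fact generated when the kubelet starts.

Kubelet startup log on a typical EKS node

We move the kubeconfig to /var/lib/kubelet/bootstrap-kubeconfig and pass that path to the kubelet as an additional startup flag:
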
mv /var/lib/kubelet/kubeconfig /var/lib/kubelet/bootstrap-kubeconfig
KUBELET_EXTRA_ARGS="--bootstrap-kubeconfig /var/lib/kubelet/bootstrap-kubeconfig $KUBELET_EXTRA_ARGS"

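The kubelet now fails to start and waits for a client certificate to be issued.

A new file, kubelet-client.key.tmp, appears under /var/lib/kubelet/pki/:

  • The kubelet generates a client private key and uses it to create a CSR

  • It then sends the CSR to the API server, asking for a signed certificate for the corresponding public key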

    $ ll
    total 8
    -rw------- 1 root root 227 May 20 05:59 kubelet-client.key.tmp

Restarting the kubelet again gives the same result.

Approve the CSR manually:

$ kubectl get csr
node-csr-NvE0ty-HnP-dA435aze3oo3ubHfHI70ZX_9fTIss0Z0
[ec2-user@ip-172-31-22-99 ~]$ kubectl certificate approve node-csr-NvE0ty-HnP-dA435aze3oo3ubHfHI70ZX_9fTIss0Z0
certificatesigningrequest.certificates.k8s.io/node-csr-NvE0ty-HnP-dA435aze3oo3ubHfHI70ZX_9fTIss0Z0 approved

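The kubelet now reports that the CSR is approved and that it is waiting for the certificate to be issued, but the certificate never arrives. This path evidently does not work here (issuing the certificate would presumably bypass STS).
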
"Waiting for client certificate to be issued"
certificate signing request node-csr-NvE0ty-HnP-dA435aze3oo3ubHfHI70ZX_9fTIss0Z0 is approved, waiting to be issued

Because the EKS API server is not visible to us, we cannot tell how the kubelet is authenticated. The current guess is that, as with service accounts, it is done through a webhook.

Next, log in to the node, stop the kubelet service, and start the kubelet manually with the preset flags:

# /usr/bin/kubelet --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime-endpoint unix:///run/containerd/containerd.sock --image-credential-provider-config /etc/eks/image-credential-provider/config.json --image-credential-provider-bin-dir /etc/eks/image-credential-provider --node-ip=192.168.31.153 --pod-infra-container-image=918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/eks/pause:3.5 --v=2 --cloud-provider=aws --container-runtime=remote --node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,alpha.eksctl.io/cluster-name=test124,alpha.eksctl.io/nodegroup-name=test124-ng6,eks.amazonaws.com/nodegroup-image=ami-0b779da1a68e38cf1,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=test124-ng6,eks.amazonaws.com/sourceLaunchTemplateId=lt-0adf6b991b5366a98 --max-pods=58

Check the kubelet log:

  • A CSR is indeed submitted and a kubelet certificate is issued

    server.go:1175] "Started kubelet"
    server.go:155] "Starting to listen" address="0.0.0.0" port=10250
    ...
    log.go:198] http: TLS handshake error from 192.168.9.40:33148: no serving certificate available for the kubelet
    ...
    csr.go:261] certificate signing request csr-2blqg is approved, waiting to be issued
    csr.go:257] certificate signing request csr-2blqg is issued
    
    certificate_manager.go:270] kubernetes.io/kubelet-serving: Certificate expiration is 2024-05-19 08:05:00 +0000 UTC, rotation deadline is 2024-04-09 08:32:10.925495309 +0000 UTC
    certificate_manager.go:270] kubernetes.io/kubelet-serving: Waiting 7800h21m57.699693545s for next certificate rotation
    certificate_manager.go:270] kubernetes.io/kubelet-serving: Certificate expiration is 2024-05-19 08:05:00 +0000 UTC, rotation deadline is 2024-02-19 02:05:12.905986938 +0000 UTC
    certificate_manager.go:270] kubernetes.io/kubelet-serving: Waiting 6593h54m58.680004839s for next certificate rotation
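  • Inspecting the certificate shows that the kubelet's group is system:nodes and its user name is system:node:ip-192-168-31-153.cn-north-1.compute.internal. kube-apiserver can therefore use the Node Authorizer to restrict the kubelet to reading and modifying only the resources on its own node
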
    $ sudo openssl x509 -noout -text -in /var/lib/kubelet/pki/kubelet-server-current.pem
    Certificate:
        Data:
            Version: 3 (0x2)
            Serial Number:
                2c:26:45:8d:d2:56:39:64:33:b4:b0:c6:6f:82:e7:66:14:f7:55:b9
        Signature Algorithm: sha256WithRSAEncryption
            Issuer: CN=kubernetes
            Validity
                Not Before: May 20 08:05:00 2023 GMT
                Not After : May 19 08:05:00 2024 GMT
            Subject: O=system:nodes, CN=system:node:ip-192-168-31-153.cn-north-1.compute.internal
            Subject Public Key Info:
                Public Key Algorithm: id-ecPublicKey
                    Public-Key: (256 bit)
                    pub:
                        04:25:36:df:f1:44:30:00:f7:62:43:7a:f3:cc:21:
                        22:ee:ed:40:40:0c:0b:28:2c:87:16:4f:bd:9b:c5:
                        70:83:e3:15:8a:2c:b9:f1:94:ca:53:95:d8:ee:42:
                        b4:21:ab:85:a6:25:0f:71:b8:2d:c6:b2:08:ce:e0:
                        d9:d1:c3:87:a0
                    ASN1 OID: prime256v1
                    NIST CURVE: P-256
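
The rotation deadlines in the kubelet log can be checked against the validity window of this certificate. To my understanding, the certificate manager in client-go picks a jittered deadline at roughly 70% to 90% of the certificate lifetime; treat that range as an assumption rather than a documented guarantee. The sketch below computes where the two logged deadlines fall (fractions of a second are dropped, and `rotation_fraction` is a name made up for illustration).

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Validity window from the certificate dump above.
NOT_BEFORE = datetime(2023, 5, 20, 8, 5, 0, tzinfo=UTC)
NOT_AFTER = datetime(2024, 5, 19, 8, 5, 0, tzinfo=UTC)

# The two rotation deadlines printed in the kubelet log.
DEADLINES = [
    datetime(2024, 4, 9, 8, 32, 10, tzinfo=UTC),
    datetime(2024, 2, 19, 2, 5, 12, tzinfo=UTC),
]


def rotation_fraction(not_before: datetime, not_after: datetime, deadline: datetime) -> float:
    """Fraction of the certificate lifetime that has elapsed at the rotation deadline."""
    return (deadline - not_before) / (not_after - not_before)


if __name__ == "__main__":
    for deadline in DEADLINES:
        fraction = rotation_fraction(NOT_BEFORE, NOT_AFTER, deadline)
        print(f"{deadline.isoformat()} -> {fraction:.3f} of lifetime")
```

Both deadlines land at about 0.89 and 0.75 of the one-year lifetime, which is consistent with a randomized deadline chosen each time the kubelet starts.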

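Logically, the kubelet would then use this certificate to access the API server, but in practice the certificate is not used for that. To verify this, we manually delete everything under /var/lib/kubelet/pki:
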
rm -rf /var/lib/kubelet/pki

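After some time, however, the node has not entered the NotReady state and nothing is affected.

This raises the following questions.

(1) If no client certificate is needed, why does the kubelet request a certificate at all?

It appears that, because the kubelet needs to expose a service of its own, it requests a serving certificate from the cluster through the CSR API.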

https://kubernetes.io/zh-cn/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#client-and-serving-certificates
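The kubelet also exposes an HTTPS service. Its clients are mainly kube-apiserver and monitoring components such as metrics-server. kube-apiserver needs to reach the kubelet to fetch container logs and execute commands (kubectl logs/exec), and monitoring components need to reach the cAdvisor endpoint exposed by the kubelet to collect metrics.

At startup, if no serving certificate path is specified, the kubelet creates a self-signed CA certificate and uses that CA to issue a serving certificate to itself.

Setting serverTLSBootstrap to true in the kubelet configuration file makes the kubelet request its serving certificate through the CSR API instead. On EKS nodes the kubelet is indeed configured with "serverTLSBootstrap": true.

In other words, the certificate issued to the kubelet here is a serving certificate, not a client certificate.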
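
The choice between the three serving certificate sources can be sketched in Python from a parsed kubelet configuration. The field names serverTLSBootstrap, tlsCertFile and tlsPrivateKeyFile are real KubeletConfiguration fields; the function name `serving_cert_source`, the returned labels, and the precedence order (serverTLSBootstrap first) are assumptions made for this illustration.

```python
import json


def serving_cert_source(kubelet_config: dict) -> str:
    """Guess where the kubelet serving certificate comes from (hypothetical helper)."""
    # Assumption: serverTLSBootstrap takes precedence over static files.
    if kubelet_config.get("serverTLSBootstrap"):
        return "csr-api"
    if kubelet_config.get("tlsCertFile") and kubelet_config.get("tlsPrivateKeyFile"):
        return "static-files"
    # Nothing configured: the kubelet falls back to a self-signed certificate.
    return "self-signed"


if __name__ == "__main__":
    # Fragment modeled on the EKS setting quoted above; a real
    # /etc/kubernetes/kubelet/kubelet-config.json contains many more fields.
    eks_fragment = json.loads('{"serverTLSBootstrap": true}')
    print(serving_cert_source(eks_fragment))
```

On a node, the same function could be fed the result of `json.load` on the file passed to --config.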

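(2) What does an EKS node actually use to authenticate to the API server?

The analysis above shows that EKS nodes do not communicate with the API server by means of an issued client certificate (a manually approved CSR gets stuck waiting to be issued). The only visible configuration that governs communication between the kubelet and the API server is the /var/lib/kubelet/kubeconfig file.

Manually changing the default credentials on the node causes the node to enter the NotReady state, probably because the requests made with the token the kubelet obtains from /usr/bin/aws-iam-authenticator no longer have permission.

We call the API server directly with a token generated by /usr/bin/aws-iam-authenticator:
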
/usr/bin/aws-iam-authenticator token -i test124
curl -k --header "Authorization: Bearer xxxxxxxxxxxxxxxxxxxx" https://C9611E71A15AC11DE8CF33921D4BC09B.yl4.cn-north-1.eks.amazonaws.com.cn
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:node:ip-192-168-31-153.cn-north-1.compute.internal\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

# Response to a request without valid credentials
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
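
The two responses above differ in an important way: 403 means the token was accepted and mapped to a user that lacks permission for the path, while 401 means the token was not accepted at all. A short Python sketch that tells them apart and extracts the authenticated user from the 403 message follows; the function name `explain_status` and its output strings are made up for illustration.

```python
import json
import re


def explain_status(body: str) -> str:
    """Interpret a Kubernetes Status response body (hypothetical helper)."""
    status = json.loads(body)
    code = status.get("code")
    if code == 401:
        return "authentication failed"
    if code == 403:
        # The message names the user the token was mapped to.
        match = re.search(r'User "([^"]+)"', status.get("message", ""))
        user = match.group(1) if match else "unknown"
        return f"authenticated as {user}, not authorized"
    return f"unexpected status {code}"


if __name__ == "__main__":
    forbidden = json.dumps({
        "kind": "Status",
        "status": "Failure",
        "message": 'forbidden: User "system:node:ip-192-168-31-153.cn-north-1.compute.internal" cannot get path "/"',
        "reason": "Forbidden",
        "code": 403,
    })
    print(explain_status(forbidden))
```

The 403 response shows that the authenticator token is mapped to the same system:node:... identity that the Node Authorizer expects, without any client certificate being involved.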

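The next question is why the permission errors only appear 10 to 15 minutes after the default credentials are changed.

The source code contains the following comments, which show that the token expires after 15 minutes:

https://github.com/kubernetes-sigs/aws-iam-authenticator/blob/master/pkg/token/token.go#L82
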
const (
	// The sts GetCallerIdentity request is valid for 15 minutes regardless of this parameters value after it has been
	// signed, but we set this unused parameter to 60 for legacy reasons (we check for a value between 0 and 60 on the
	// server side in 0.3.0 or earlier).  IT IS IGNORED.  If we can get STS to support x-amz-expires, then we should
	// set this parameter to the actual expiration, and make it configurable.
	requestPresignParam = 60
	// The actual token expiration (presigned STS urls are valid for 15 minutes after timestamp in x-amz-date).
	presignedURLExpiration = 15 * time.Minute
	v1Prefix               = "k8s-aws-v1."
	maxTokenLenBytes       = 1024 * 4
	clusterIDHeader        = "x-k8s-aws-id"
	// Format of the X-Amz-Date header used for expiration
	// https://golang.org/pkg/time/#pkg-constants
	dateHeaderFormat   = "20060102T150405Z"
	kindExecCredential = "ExecCredential"
	execInfoEnvKey     = "KUBERNETES_EXEC_INFO"
	stsServiceID       = "sts"
)
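
Based on these constants, the expiry of a token can be derived from the token itself: the part after the k8s-aws-v1. prefix is an unpadded base64url encoding of a presigned STS URL, whose X-Amz-Date parameter marks the signing time. The Python sketch below decodes a token and adds 15 minutes. The function name `token_expiry` is made up for illustration, and the sketch performs no signature validation.

```python
import base64
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

V1_PREFIX = "k8s-aws-v1."
PRESIGNED_URL_EXPIRATION = timedelta(minutes=15)
# Python equivalent of the Go layout "20060102T150405Z".
DATE_HEADER_FORMAT = "%Y%m%dT%H%M%SZ"


def token_expiry(token: str) -> datetime:
    """Return when an aws-iam-authenticator token stops being valid (hypothetical helper)."""
    if not token.startswith(V1_PREFIX):
        raise ValueError("not a k8s-aws-v1 token")
    payload = token[len(V1_PREFIX):]
    # The token uses base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    url = base64.urlsafe_b64decode(payload).decode("utf-8")
    # Look up the signing time case-insensitively among the query parameters.
    query = {k.lower(): v for k, v in parse_qs(urlparse(url).query).items()}
    signed_at = datetime.strptime(query["x-amz-date"][0], DATE_HEADER_FORMAT)
    return signed_at.replace(tzinfo=timezone.utc) + PRESIGNED_URL_EXPIRATION


if __name__ == "__main__":
    # Synthetic URL for illustration only; a real token carries a signed STS URL.
    fake_url = ("https://sts.example.invalid/?Action=GetCallerIdentity"
                "&X-Amz-Date=20230520T080000Z&X-Amz-Expires=60")
    fake_token = V1_PREFIX + base64.urlsafe_b64encode(fake_url.encode()).decode().rstrip("=")
    print(token_expiry(fake_token).isoformat())
```

Note that X-Amz-Expires=60 plays no part in the calculation, matching the comment that the parameter is ignored.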

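Merely editing the kubeconfig file has no effect, probably because the kubelet has already loaded the configuration into memory. We then delete the authenticator binary manually, and after a while the following error appears:
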
kubelet_node_status.go:539] "Error updating node status, will retry" err="error getting node \"ip-192-168-13-54.cn-north-1.compute.internal\": Get \"https://xxxxxxxxB.yl4.cn-north-1.eks.amazonaws.com.cn/api/v1/nodes/ip-192-168-13-54.cn-north-1.compute.internal?resourceVersion=0&timeout=10s\": getting credentials: exec: fork/exec /usr/bin/aws-iam-authenticator: no such file or directory"

As before, the update times out after 5 node status checks, which shows that the kubelet does use the authenticator's token to communicate with the API server.
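
The delay before the errors appear is consistent with how exec credential plugins work: the plugin prints an ExecCredential JSON object with a token and an expirationTimestamp, and the client can keep using the cached token until that time before running the command again. The Python sketch below parses such output and decides whether a refresh is due. The function names and the sample values are made up for illustration; the sample token is a placeholder, not real output.

```python
import json
from datetime import datetime, timezone
from typing import Tuple

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_exec_credential(output: str) -> Tuple[str, datetime]:
    """Extract the token and its expiry from exec plugin output (hypothetical helper)."""
    cred = json.loads(output)
    if cred.get("kind") != "ExecCredential":
        raise ValueError("not an ExecCredential")
    status = cred["status"]
    expires = datetime.strptime(status["expirationTimestamp"], TIMESTAMP_FORMAT)
    return status["token"], expires.replace(tzinfo=timezone.utc)


def needs_refresh(expires: datetime, now: datetime) -> bool:
    """A cached credential is reused until it expires; only then is the plugin run again."""
    return now >= expires


if __name__ == "__main__":
    sample = json.dumps({
        "kind": "ExecCredential",
        "apiVersion": "client.authentication.k8s.io/v1beta1",
        "spec": {},
        "status": {
            "expirationTimestamp": "2023-05-20T08:14:00Z",  # illustrative value
            "token": "k8s-aws-v1.PLACEHOLDER",              # placeholder, not a real token
        },
    })
    token, expires = parse_exec_credential(sample)
    print(needs_refresh(expires, datetime(2023, 5, 20, 8, 5, tzinfo=timezone.utc)))
```

Under this model, removing the binary or changing the credentials only takes effect once the cached token expires and the kubelet has to run the plugin again.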
