Node bootstrap and authentication logic on AWS EKS

Kubelet bootstrap in a Kubernetes cluster

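According to the official documentation and related material, a kubelet joining a cluster normally goes through TLS bootstrapping: it authenticates with a bootstrap token and submits a CSR that is approved automatically. This is commonly referred to as Bootstrap Token authentication.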

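The kubelet looks for a kubeconfig file. If it does not find one, it:

  • looks for the bootstrap-kubeconfig file and starts the bootstrap process
  • authenticates with the API server URL and the (limited-privilege) token from the bootstrap-kubeconfig file
  • creates a certificate signing request (CSR) and waits for it; kube-controller-manager approves the CSR automatically
  • retrieves the issued certificate and creates a kubeconfig file that references the key and the signed certificate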
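
The lookup order above can be sketched as a small shell function. This only illustrates the decision, it is not kubelet source code. The function name `kubelet_credential_source` and the "non-empty file" check are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch of the start-up decision described above (hypothetical helper).
# $1 = path given by --kubeconfig, $2 = path given by --bootstrap-kubeconfig (may be empty)
kubelet_credential_source() {
  local kubeconfig="$1" bootstrap_kubeconfig="$2"
  if [ -s "$kubeconfig" ]; then
    # An existing kubeconfig wins: no bootstrap is attempted.
    echo "kubeconfig"
  elif [ -n "$bootstrap_kubeconfig" ] && [ -s "$bootstrap_kubeconfig" ]; then
    # No kubeconfig yet: authenticate with the bootstrap token and submit a CSR.
    echo "bootstrap"
  else
    # Neither file is available: the kubelet has no way to reach the apiserver.
    echo "none"
    return 1
  fi
}
```

For example, `kubelet_credential_source /var/lib/kubelet/kubeconfig /var/lib/kubelet/bootstrap-kubeconfig` prints `bootstrap` on a node that has only the bootstrap file.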

The log printed by kubeadm when initializing a control plane matches this bootstrap setup:

https://kubernetes.io/zh-cn/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#bootstrap-initialization

[apiclient] All control plane components are healthy after 15.003839 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[bootstrap-token] Using token: 47wg36.zm1tr8lfosat2943
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key

Likewise, the kubelet needs the following flags at startup to complete initialization:

  • --cert-dir="/var/lib/kubelet/pki": path to store the key and certificate it generates (optional, can use default)
  • --kubeconfig="/var/lib/kubelet/kubeconfig": a path to a kubeconfig file that does not yet exist; the kubelet places the bootstrapped config file here
  • --bootstrap-kubeconfig="/var/lib/kubelet/bootstrap-kubeconfig": a path to a bootstrap kubeconfig file that provides the server URL and bootstrap credentials, e.g. a bootstrap token
  • Optional: instructions to rotate certificates

The /var/lib/kubelet/bootstrap-kubeconfig file has the following format:

  • the user entry authenticates with a token
  • certificate-authority is used by the kubelet to verify the apiserver's serving certificate

https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#kubelet-configuration

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/lib/kubernetes/ca.pem
    server: https://my.server.example.com:6443
  name: bootstrap
contexts:
- context:
    cluster: bootstrap
    user: kubelet-bootstrap
  name: bootstrap
current-context: bootstrap
preferences: {}
users:
- name: kubelet-bootstrap
  user:
    token: 07401b.f395accd246ae52d
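
The token value in this file follows the bootstrap token format `<token-id>.<token-secret>`, with a 6-character ID and a 16-character secret drawn from `[a-z0-9]`. The API server authenticates such a token as the user `system:bootstrap:<token-id>`. A minimal sketch for checking a token before writing it into the file follows; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a bootstrap token string.

# Succeeds if $1 looks like "<6 chars>.<16 chars>" using only [a-z0-9].
is_bootstrap_token() {
  local LC_ALL=C                      # keep [a-z] from matching uppercase in some locales
  local re='^[a-z0-9]{6}\.[a-z0-9]{16}$'
  [[ "$1" =~ $re ]]
}

# Prints the user name the API server derives from the token ID (the part before the dot).
bootstrap_username() {
  echo "system:bootstrap:${1%%.*}"
}
```

Only the token ID is public. The secret part should be treated like a password.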

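As for the kubeconfig file:

When the kubelet starts and the file given by --kubeconfig does not exist, it uses the bootstrap kubeconfig given by --bootstrap-kubeconfig to request a client certificate from the API server. Once the request is approved and the kubelet has retrieved the certificate, a kubeconfig file that references the generated key and the issued certificate is written to the path given by --kubeconfig. The certificate and key files are placed in the directory given by --cert-dir.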

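The certificate a node obtains through bootstrapping is the kubelet client certificate, used to talk to the apiserver. The kubelet can also act as a server, and its serving certificate can come from one of the following:

  • the key and certificate set with --tls-private-key-file and --tls-cert-file
  • a self-signed key and certificate, created if none are provided
  • a serving certificate requested from the cluster through the CSR API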

Once configured, the kubelet uses this kubeconfig file to talk to the apiserver. Note that the certificate file it references is named kubelet-client-current.pem.

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0F...
    server: https://192.168.1.10:6443
  name: test-cluster
contexts:
- context:
    cluster: test-cluster
    user: system:node:test-cluster-node-1
  name: system:node:test-cluster-node-1
current-context: system:node:test-cluster-node-1
kind: Config
preferences: {}
users:
- name: system:node:test-cluster-node-1
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

Kubelet bootstrap in an EKS cluster

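EKS nodes use the bootstrap script to initialize the kubelet. The full set of startup arguments is shown below.

  • --bootstrap-kubeconfig is not specified, which means no bootstrap-token TLS bootstrapping takes place
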
    /usr/bin/kubelet --cloud-provider aws --image-credential-provider-config /etc/eks/ecr-credential-provider/ecr-credential-provider-config --image-credential-provider-bin-dir /etc/eks/ecr-credential-provider --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime remote --container-runtime-endpoint unix:///run/containerd/containerd.sock --node-ip=192.168.27.37 --pod-infra-container-image=918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/eks/pause:3.5 --v=2 
  • certificates exist under /var/lib/kubelet/pki/

    -rw------- 1 root root 1378 May 20 04:41 kubelet-server-2023-05-20-04-41-35.pem
    lrwxrwxrwx 1 root root   59 May 20 04:41 kubelet-server-current.pem -> /var/lib/kubelet/pki/kubelet-server-2023-05-20-04-41-35.pem
    

We launch a fresh instance from the EKS-optimized AMI to check whether these files already exist under those paths:

aws ec2 run-instances --image-id ami-0b779daxxxxxf1 \
    --instance-type m4.large \
    --key-name temp-key \
    --count 1 \
    --subnet-id subnet-027xxxxxxxx60acdd \
    --security-group-ids sg-096xxxxxx7e9 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=testtmp}]' 'ResourceType=volume,Tags=[{Key=Name,Value=testena}]'

The kubeconfig on the fresh instance turns out to be a template whose placeholders are filled in by the bootstrap script:

$ cat /var/lib/kubelet/kubeconfig
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: MASTER_ENDPOINT
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubelet
  name: kubelet
current-context: kubelet
users:
- name: kubelet
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: /usr/bin/aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "CLUSTER_NAME"
        - --region
        - "AWS_REGION"
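
The placeholder substitution can be sketched as follows. This is a simplified stand-in for what the bootstrap script does, not a copy of it. The function name `render_kubeconfig` and the example values are assumptions, and it assumes none of the values contain a comma (used here as the sed delimiter).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the kubeconfig template on stdin and writes the
# rendered file on stdout, replacing the three placeholders seen in the template.
# $1 = API server endpoint, $2 = cluster name, $3 = AWS region
render_kubeconfig() {
  local endpoint="$1" cluster="$2" region="$3"
  sed -e "s,MASTER_ENDPOINT,${endpoint},g" \
      -e "s,CLUSTER_NAME,${cluster},g" \
      -e "s,AWS_REGION,${region},g"
}
```

For example, `render_kubeconfig "$ENDPOINT" test124 cn-north-1 < template > kubeconfig`, where `ENDPOINT` is a variable you set to the cluster's API server URL.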

There are no files under /var/lib/kubelet/pki/, so the certificates there are in fact generated when the kubelet starts.

Kubelet startup log of a typical EKS node

We rename the kubeconfig and additionally pass it to the kubelet as --bootstrap-kubeconfig /var/lib/kubelet/bootstrap-kubeconfig:

mv /var/lib/kubelet/kubeconfig /var/lib/kubelet/bootstrap-kubeconfig
KUBELET_EXTRA_ARGS="--bootstrap-kubeconfig /var/lib/kubelet/bootstrap-kubeconfig $KUBELET_EXTRA_ARGS"

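The kubelet now fails to start and waits for a client certificate to be issued.

/var/lib/kubelet/pki/ contains one extra file, kubelet-client.key.tmp:

  • the kubelet generates a client private key and uses it to create a CSR

  • it then sends the CSR to the apiserver, asking for a signed certificate for the corresponding public key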

    $ ll
    total 8
    -rw------- 1 root root 227 May 20 05:59 kubelet-client.key.tmp

Restarting the kubelet again gives the same result.

Approve the CSR manually:

$ kubectl get csr
node-csr-NvE0ty-HnP-dA435aze3oo3ubHfHI70ZX_9fTIss0Z0
[ec2-user@ip-172-31-22-99 ~]$ kubectl certificate approve node-csr-NvE0ty-HnP-dA435aze3oo3ubHfHI70ZX_9fTIss0Z0
certificatesigningrequest.certificates.k8s.io/node-csr-NvE0ty-HnP-dA435aze3oo3ubHfHI70ZX_9fTIss0Z0 approved

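The kubelet now reports that the CSR has been approved and that it is waiting for the certificate to be issued, but the certificate never arrives. This route evidently does not work on EKS (if it did, it might allow a node to bypass STS).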

"Waiting for client certificate to be issued"
certificate signing request node-csr-NvE0ty-HnP-dA435aze3oo3ubHfHI70ZX_9fTIss0Z0 is approved, waiting to be issued

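Because the EKS apiserver is not visible to us, we cannot see how kubelet authentication is actually carried out. Our current guess is that, as with service accounts, it is done through a webhook.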

Next, we log in to the node, stop the kubelet service, and start the kubelet by hand with the preset arguments:

# /usr/bin/kubelet --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime-endpoint unix:///run/containerd/containerd.sock --image-credential-provider-config /etc/eks/image-credential-provider/config.json --image-credential-provider-bin-dir /etc/eks/image-credential-provider --node-ip=192.168.31.153 --pod-infra-container-image=918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/eks/pause:3.5 --v=2 --cloud-provider=aws --container-runtime=remote --node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,alpha.eksctl.io/cluster-name=test124,alpha.eksctl.io/nodegroup-name=test124-ng6,eks.amazonaws.com/nodegroup-image=ami-0b779da1a68e38cf1,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=test124-ng6,eks.amazonaws.com/sourceLaunchTemplateId=lt-0adf6b991b5366a98 --max-pods=58

Check the kubelet log:

  • a CSR is indeed submitted and a kubelet certificate is indeed issued

    server.go:1175] "Started kubelet"
    server.go:155] "Starting to listen" address="0.0.0.0" port=10250
    ...
    log.go:198] http: TLS handshake error from 192.168.9.40:33148: no serving certificate available for the kubelet
    ...
    csr.go:261] certificate signing request csr-2blqg is approved, waiting to be issued
    csr.go:257] certificate signing request csr-2blqg is issued
    
    certificate_manager.go:270] kubernetes.io/kubelet-serving: Certificate expiration is 2024-05-19 08:05:00 +0000 UTC, rotation deadline is 2024-04-09 08:32:10.925495309 +0000 UTC
    certificate_manager.go:270] kubernetes.io/kubelet-serving: Waiting 7800h21m57.699693545s for next certificate rotation
    certificate_manager.go:270] kubernetes.io/kubelet-serving: Certificate expiration is 2024-05-19 08:05:00 +0000 UTC, rotation deadline is 2024-02-19 02:05:12.905986938 +0000 UTC
    certificate_manager.go:270] kubernetes.io/kubelet-serving: Waiting 6593h54m58.680004839s for next certificate rotation
    
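  • inspect the certificate: the kubelet's group is system:nodes and its user name is system:node:ip-192-168-31-153.cn-north-1.compute.internal. With this identity, kube-apiserver can use the Node Authorizer to restrict the kubelet to reading and modifying only the resources of its own node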

    $ sudo openssl x509 -noout -text -in /var/lib/kubelet/pki/kubelet-server-current.pem
    Certificate:
        Data:
            Version: 3 (0x2)
            Serial Number:
                2c:26:45:8d:d2:56:39:64:33:b4:b0:c6:6f:82:e7:66:14:f7:55:b9
        Signature Algorithm: sha256WithRSAEncryption
            Issuer: CN=kubernetes
            Validity
                Not Before: May 20 08:05:00 2023 GMT
                Not After : May 19 08:05:00 2024 GMT
            Subject: O=system:nodes, CN=system:node:ip-192-168-31-153.cn-north-1.compute.internal
            Subject Public Key Info:
                Public Key Algorithm: id-ecPublicKey
                    Public-Key: (256 bit)
                    pub:
                        04:25:36:df:f1:44:30:00:f7:62:43:7a:f3:cc:21:
                        22:ee:ed:40:40:0c:0b:28:2c:87:16:4f:bd:9b:c5:
                        70:83:e3:15:8a:2c:b9:f1:94:ca:53:95:d8:ee:42:
                        b4:21:ab:85:a6:25:0f:71:b8:2d:c6:b2:08:ce:e0:
                        d9:d1:c3:87:a0
                    ASN1 OID: prime256v1
                    NIST CURVE: P-256
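
To pull the Kubernetes identity out of such a certificate without reading the full dump, the Subject line can be parsed directly. The sketch below is a rough text-based helper with hypothetical function names. It assumes the subject fields are comma-separated and that the values themselves contain no commas.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the identity from an openssl subject line.

# Prints the value of one subject field. $1 = field name (O or CN),
# $2 = a line such as "Subject: O=system:nodes, CN=system:node:<name>".
subject_field() {
  printf '%s\n' "$2" | sed -e 's/^ *[Ss]ubject *[:=] *//' | tr ',' '\n' |
    awk -F'=' -v key="$1" '{
      k = $1; gsub(/^ +| +$/, "", k)          # trim the field name
      if (k == key) {
        v = $0; sub(/^[^=]*= */, "", v)       # drop "NAME =" and keep the value
        gsub(/ +$/, "", v); print v
      }
    }'
}

# Succeeds if the subject has the group and user-name shape of a node identity.
is_node_identity() {
  [ "$(subject_field O "$1")" = "system:nodes" ] &&
    case "$(subject_field CN "$1")" in system:node:?*) true ;; *) false ;; esac
}
```

The same openssl dump also contains an X509v3 Extended Key Usage line, which tells a serving certificate (server authentication) apart from a client certificate (client authentication).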
    

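In principle the kubelet would now use this certificate to access the apiserver, but in practice it does not use it at all. To verify this, we manually delete everything under /var/lib/kubelet/pki: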

rm -rf /var/lib/kubelet/pki

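After some time the node has still not gone NotReady, and nothing else is affected.

This raises the following questions.

(1) If no client certificate is needed, why is a certificate requested at all?

It appears to be because the kubelet has to expose a service of its own, so it requests a serving certificate from the cluster through the CSR API.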

https://kubernetes.io/zh-cn/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#client-and-serving-certificates
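The kubelet also exposes an HTTPS service. Its clients are mainly kube-apiserver and monitoring components such as metrics-server. kube-apiserver needs to reach the kubelet to fetch container logs and run commands (kubectl logs/exec), and monitoring components need to reach the cAdvisor endpoint exposed by the kubelet to collect metrics.

If no serving certificate path is specified at startup, the kubelet generates a self-signed certificate and key and serves with those.

Setting serverTLSBootstrap to true in the kubelet configuration file makes the kubelet request its serving certificate through the CSR API instead. On EKS nodes the kubelet is indeed configured with "serverTLSBootstrap": true.

In other words, the certificate issued to the kubelet here is a serving certificate, not a client certificate.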
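
A quick way to confirm the setting on a node is to search the kubelet configuration file for it. The sketch below uses a plain-text match rather than a JSON parser, so it is only a rough check; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the kubelet config file at $1 sets
# "serverTLSBootstrap" to true. A text match, not a real JSON parse.
server_tls_bootstrap_enabled() {
  grep -Eq '"serverTLSBootstrap"[[:space:]]*:[[:space:]]*true' "$1"
}
```

On the node examined here it would be called as `server_tls_bootstrap_enabled /etc/kubernetes/kubelet/kubelet-config.json`, the file passed to the kubelet with --config.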

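(2) What does an EKS node actually use to authenticate to the apiserver?

The analysis above shows that an EKS node does not talk to the apiserver with an issued client certificate (a manually approved CSR gets stuck waiting to be issued). The only configuration we can see that governs communication between the kubelet and the apiserver is the /var/lib/kubelet/kubeconfig file.

Manually changing the default credentials on the node makes it go NotReady, probably because the token the kubelet then obtains from /usr/bin/aws-iam-authenticator no longer carries the required permissions.

We take a token generated by /usr/bin/aws-iam-authenticator and call the apiserver with it directly:
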
/usr/bin/aws-iam-authenticator token -i test124
curl -k --header "Authorization: Bearer xxxxxxxxxxxxxxxxxxxx" https://C9611E71A15AC11DE8CF33921D4BC09B.yl4.cn-north-1.eks.amazonaws.com.cn
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:node:ip-192-168-31-153.cn-north-1.compute.internal\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

# Response to a request without valid credentials
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
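
The bearer token used above is not opaque. As far as we understand the aws-iam-authenticator format, it is the prefix `k8s-aws-v1.` followed by a base64url-encoded, unpadded presigned STS GetCallerIdentity URL. The sketch below decodes a token back into that URL, which makes the signing time and region visible. The function name is hypothetical and `base64 -d` is assumed to be available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the presigned URL embedded in an EKS bearer token.
decode_eks_token() {
  local token="$1" prefix="k8s-aws-v1." body pad
  case "$token" in
    "$prefix"*) body="${token#"$prefix"}" ;;
    *) echo "not a k8s-aws-v1 token" >&2; return 1 ;;
  esac
  # base64url to standard base64: '_' becomes '/', '-' becomes '+'
  body=$(printf '%s' "$body" | tr '_-' '/+')
  # restore the '=' padding that the token format strips
  pad=$(( (4 - ${#body} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do body="${body}="; pad=$((pad - 1)); done
  printf '%s' "$body" | base64 -d
}
```

The decoded URL is itself a usable credential until it expires, so it should not be pasted into logs or tickets.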

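The next question is why, after the default credentials are changed, the permission errors only appear 10 to 15 minutes later.

The source code contains the following comment, which shows that a token is valid for 15 minutes. The delay is consistent with a previously obtained token remaining usable until it expires: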

https://github.com/kubernetes-sigs/aws-iam-authenticator/blob/master/pkg/token/token.go#L82

const (
	// The sts GetCallerIdentity request is valid for 15 minutes regardless of this parameters value after it has been
	// signed, but we set this unused parameter to 60 for legacy reasons (we check for a value between 0 and 60 on the
	// server side in 0.3.0 or earlier).  IT IS IGNORED.  If we can get STS to support x-amz-expires, then we should
	// set this parameter to the actual expiration, and make it configurable.
	requestPresignParam = 60
	// The actual token expiration (presigned STS urls are valid for 15 minutes after timestamp in x-amz-date).
	presignedURLExpiration = 15 * time.Minute
	v1Prefix               = "k8s-aws-v1."
	maxTokenLenBytes       = 1024 * 4
	clusterIDHeader        = "x-k8s-aws-id"
	// Format of the X-Amz-Date header used for expiration
	// https://golang.org/pkg/time/#pkg-constants
	dateHeaderFormat   = "20060102T150405Z"
	kindExecCredential = "ExecCredential"
	execInfoEnvKey     = "KUBERNETES_EXEC_INFO"
	stsServiceID       = "sts"
)
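
Given the 15-minute validity stated in the comment, the expiry of a token can be estimated from the X-Amz-Date value in its presigned URL. The following is a minimal sketch, assuming GNU date and the `20060102T150405Z` layout named by dateHeaderFormat; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the expiry time (same format as the input) for a
# token whose presigned URL carries the X-Amz-Date value $1.
# Requires GNU date (uses -d with "@epoch").
token_expiry_from_amz_date() {
  local d="$1" re='^[0-9]{8}T[0-9]{6}Z$' iso epoch
  [[ "$d" =~ $re ]] || { echo "unexpected X-Amz-Date: $d" >&2; return 1; }
  # Rewrite YYYYMMDDTHHMMSSZ as "YYYY-MM-DD HH:MM:SS UTC" so date can parse it.
  iso="${d:0:4}-${d:4:2}-${d:6:2} ${d:9:2}:${d:11:2}:${d:13:2} UTC"
  epoch=$(date -u -d "$iso" +%s) || return 1
  # presignedURLExpiration is 15 minutes after the signing timestamp.
  date -u -d "@$((epoch + 15 * 60))" +%Y%m%dT%H%M%SZ
}
```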

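Simply editing the kubeconfig file has no effect, probably because the kubelet has already loaded it into memory. We then delete the authenticator binary by hand, and after a while the following error appears: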

kubelet_node_status.go:539] "Error updating node status, will retry" err="error getting node \"ip-192-168-13-54.cn-north-1.compute.internal\": Get \"https://xxxxxxxxB.yl4.cn-north-1.eks.amazonaws.com.cn/api/v1/nodes/ip-192-168-13-54.cn-north-1.compute.internal?resourceVersion=0&timeout=10s\": getting credentials: exec: fork/exec /usr/bin/aws-iam-authenticator: no such file or directory"

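As before, the node status update times out after five attempts. This confirms that the kubelet really does use the authenticator's token to communicate with the apiserver.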
