This article walks through creating AWS EKS resources quickly in an AWS global (non-China) region without using the web console.
Before starting, it helps to review the main automated resource definition/creation options AWS provides:
Name | Documentation | Description |
---|---|---|
CloudFormation | AWS CloudFormation Doc | An infrastructure-as-code service for defining cloud resources, designed to greatly simplify infrastructure management. AWS resources are described in YAML or JSON and provisioned through prebuilt logic. CloudFormation integrates easily with CI/CD systems for designing and managing AWS resources. |
CDK | AWS Cloud Development Kit (AWS CDK) v2 Doc | A development kit aimed at operations and software engineers. Unlike CloudFormation's relatively rigid YAML/JSON definitions, the CDK defines AWS resources in general-purpose programming languages (Python, JS, Java, etc.), giving more flexibility and control flow and avoiding the verbosity of hand-written CloudFormation in complex scenarios. Under the hood, the resource-creation logic is written in code, synthesized into CloudFormation, and submitted to AWS for provisioning. |
AWS CLI v2 | AWS CLI v2 Doc | The general-purpose AWS command-line tool (its workflow resembles a Linux shell). It controls AWS from the command line and can be scripted to automate complex tasks; it performs all of its work through AWS API calls. |
AWS eksctl | eksctl.io | A command-line tool dedicated to AWS EKS, used to create and manage Kubernetes clusters on Amazon EKS. |
Summary:
- CloudFormation and the CDK are both built on the infrastructure-as-code idea, focusing on code-driven, versioned definition and lifecycle management of resources
- AWS CLI v2 focuses on controlling AWS from the command line (via API requests) instead of the slow, heavy, error-prone point-and-click web console
- AWS eksctl is a command-line tool dedicated to AWS EKS control that further simplifies the AWS CLI workflow
- These approaches can be freely combined and integrated with CI/CD systems
Any of them can create AWS EKS resources. CloudFormation, the CDK, and the CLI are general-purpose and cover the vast majority of AWS services; eksctl is specific to AWS EKS.
This walkthrough runs in the US East (N. Virginia) region (us-east-1) and covers the following parts:
- Installing AWS EKS
- Setting up a node group (nodegroup) for AWS EKS
- Setting up automatic horizontal scaling of the node group for AWS EKS
  - Implementing automatic node scaling with karpenter
  - Implementing HPA for pods
- Setting up Fargate for AWS EKS
Compared with AWS EKS installation [AWS China Ningxia region], the logic and flow are essentially the same; the only change is doing everything outside the web console.
Installing AWS EKS
Preparing IAM permissions
Setting up the cluster IAM role
First, create the role by following Amazon EKS cluster IAM role; this cluster IAM role is used by the EKS cluster control plane.
Save the following trust policy as cluster-trust-policy.json:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "eks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
Create a role named AmazonEKSClusterRole and note its ARN, which generally has the form arn:aws:iam::<aws-account>:role/AmazonEKSClusterRole:
```bash
aws iam create-role --role-name AmazonEKSClusterRole --assume-role-policy-document file://"cluster-trust-policy.json"
```
The role ARN for AmazonEKSClusterRole is printed in the command output; it can also be looked up under the IAM service in the AWS web console.
Attach the AmazonEKSClusterPolicy policy to AmazonEKSClusterRole:
```bash
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy --role-name AmazonEKSClusterRole
```
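To confirm the role exists and the policy attachment took effect, a quick sanity check like the following can be run (role and policy names as created above; the expected-ARN comparison is only an illustration):

```bash
# Build the ARN we expect the role to have
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
EXPECTED_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/AmazonEKSClusterRole"

# Fetch the actual ARN and compare
ACTUAL_ARN=$(aws iam get-role --role-name AmazonEKSClusterRole --query 'Role.Arn' --output text)
[ "$ACTUAL_ARN" = "$EXPECTED_ARN" ] && echo "role OK: $ACTUAL_ARN"

# AmazonEKSClusterPolicy should appear in the attached-policy list
aws iam list-attached-role-policies --role-name AmazonEKSClusterRole \
    --query 'AttachedPolicies[].PolicyName' --output text
```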
Setting up the node IAM role
Save the following trust policy as node-role-trust-relationship.json:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
Create a role named AmazonEKSNodeRole and note its ARN, which generally has the form arn:aws:iam::<aws-account>:role/AmazonEKSNodeRole:
```bash
aws iam create-role --role-name AmazonEKSNodeRole --assume-role-policy-document file://"node-role-trust-relationship.json"
```
Attach the AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, and AmazonEKS_CNI_Policy policies to the AmazonEKSNodeRole role:
```bash
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy --role-name AmazonEKSNodeRole
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly --role-name AmazonEKSNodeRole
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy --role-name AmazonEKSNodeRole
```
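The three attach-role-policy calls above can also be written as a loop, which is easier to reuse in scripts (same role and policy names as above):

```bash
# Attach the managed policies an EKS worker node role needs
for POLICY in AmazonEKSWorkerNodePolicy AmazonEC2ContainerRegistryReadOnly AmazonEKS_CNI_Policy; do
    aws iam attach-role-policy \
        --policy-arn "arn:aws:iam::aws:policy/${POLICY}" \
        --role-name AmazonEKSNodeRole
done
```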
Creating the AWS EKS cluster and nodegroup
Create the EKS cluster; see aws cli create-cluster. Adjust the settings using your own account's AmazonEKSClusterRole, subnets within the VPC (at least two subnets in different Availability Zones), and security groups:
```bash
aws eks create-cluster --region us-east-1 --name poc-cluster --kubernetes-version 1.27 --role-arn arn:aws:iam::84xxx141269:role/AmazonEKSClusterRole --resources-vpc-config subnetIds=subnet-0a13bxxx02af606,subnet-0685xxx285a7a88a1,subnet-0346exxxe62e51ca,securityGroupIds=sg-0fcee87ea044f19ae,endpointPublicAccess=true,endpointPrivateAccess=true
```
Run the command to start creation. After it returns, the console shows the EKS cluster in a creating state....
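Instead of watching the console, the CLI can block until the cluster is ready and then wire up kubectl; a sketch using the same cluster name and region as above:

```bash
REGION=us-east-1
CLUSTER_NAME=poc-cluster

# Block until the cluster status becomes ACTIVE (this can take 10+ minutes)
aws eks wait cluster-active --region "$REGION" --name "$CLUSTER_NAME"

# Write/merge a kubeconfig entry so kubectl can talk to the new cluster
aws eks update-kubeconfig --region "$REGION" --name "$CLUSTER_NAME"

# Sanity check against the control plane
kubectl get svc kubernetes
```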
After the EKS cluster finishes creating, run the following to add the required base add-ons; see aws cli create-addon:
```bash
# Add-on 1: vpc-cni
aws eks create-addon --region us-east-1 --cluster-name poc-cluster --addon-name vpc-cni
# Add-on 2: coredns
aws eks create-addon --region us-east-1 --cluster-name poc-cluster --addon-name coredns
# Add-on 3: kube-proxy
aws eks create-addon --region us-east-1 --cluster-name poc-cluster --addon-name kube-proxy
```
Run the commands to add the add-ons; once installation finishes, all three appear in the cluster's add-ons list.
Next, add a node group to the cluster; see aws cli create-nodegroup. Adjust the settings using your own account's AmazonEKSNodeRole, subnets within the VPC (at least two subnets in different Availability Zones), and EC2 configuration:
```bash
aws eks create-nodegroup --region us-east-1 --cluster-name poc-cluster --nodegroup-name od-arm64 --subnets subnet-0a13bxxx02af606 subnet-0685xxx285a7a88a1 subnet-0346exxxe62e51ca --instance-types t4g.medium --disk-size 32 --node-role arn:aws:iam::84xxx141269:role/AmazonEKSNodeRole --ami-type AL2_ARM_64 --capacity-type ON_DEMAND --scaling-config minSize=1,maxSize=1,desiredSize=1
```
Run the command to create the node group, then wait for it to complete successfully.
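Here too, the CLI can wait for the node group rather than polling the console; a sketch with the same names as above:

```bash
# Block until the node group reaches ACTIVE
aws eks wait nodegroup-active --region us-east-1 \
    --cluster-name poc-cluster --nodegroup-name od-arm64

# The new nodes should register with the cluster shortly afterwards
kubectl get nodes -o wide
```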
At this point the EKS cluster is fully set up, and creating it from the command line is clearly simpler than clicking through the console. eksctl.io can also spin up an EKS cluster quickly and is worth exploring on your own.
Installing karpenter
Karpenter is an open-source autoscaling project built for Kubernetes. It improves the availability of Kubernetes applications without manual or excessive provisioning of compute resources. Karpenter watches the aggregate resource requests of unschedulable pods and makes launch/terminate decisions for nodes, minimizing scheduling latency and providing the right compute for your applications within seconds rather than minutes.
karpenter can replace the traditional cluster-autoscaler.
Follow the karpenter documentation for installation, choosing the Migrating from Cluster Autoscaler path to install karpenter into the existing EKS cluster.
Setting up OIDC
In the EKS console, find the cluster's OpenID Connect provider URL and copy it. Then add an identity provider in the AWS IAM console: paste the copied OpenID Connect provider URL and enter sts.amazonaws.com in the Audience field.
Installing karpenter
Set the environment variables before installing:
```bash
# Cluster name
CLUSTER_NAME=poc-cluster
# "aws" for global regions, "aws-cn" for China regions
AWS_PARTITION="aws"
# Current AWS region
AWS_REGION="$(aws configure list | grep region | tr -s " " | cut -d" " -f3)"
# OIDC endpoint
OIDC_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text)"
# AWS account id
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
```
It is recommended to run these in a bash environment (on Windows, MinGW can be installed).
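The trust policies below strip the https:// scheme from OIDC_ENDPOINT using bash parameter expansion (${OIDC_ENDPOINT#*//}); echoing the variables first is a cheap way to catch an empty value before it silently produces a broken policy:

```bash
# ${VAR#*//} deletes everything up to and including the first "//",
# e.g. https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE
#   -> oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE
echo "cluster:   ${CLUSTER_NAME}"
echo "partition: ${AWS_PARTITION}"
echo "region:    ${AWS_REGION}"
echo "account:   ${AWS_ACCOUNT_ID}"
echo "oidc:      ${OIDC_ENDPOINT#*//}"
```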
Create the role for the nodes karpenter will manage:
```bash
echo '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}' > node-trust-policy.json

aws iam create-role --role-name "KarpenterNodeRole-${CLUSTER_NAME}" --assume-role-policy-document file://node-trust-policy.json
```
Attach policies to the newly created role:
```bash
aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" --policy-arn arn:${AWS_PARTITION}:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" --policy-arn arn:${AWS_PARTITION}:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" --policy-arn arn:${AWS_PARTITION}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" --policy-arn arn:${AWS_PARTITION}:iam::aws:policy/AmazonSSMManagedInstanceCore
```
Attach the role to an EC2 instance profile:
```bash
aws iam create-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-${CLUSTER_NAME}"
aws iam add-role-to-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-${CLUSTER_NAME}" --role-name "KarpenterNodeRole-${CLUSTER_NAME}"
```
Create the role for the karpenter controller:
```bash
cat << EOF > controller-trust-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT#*//}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "${OIDC_ENDPOINT#*//}:aud": "sts.amazonaws.com",
                    "${OIDC_ENDPOINT#*//}:sub": "system:serviceaccount:karpenter:karpenter"
                }
            }
        }
    ]
}
EOF

aws iam create-role --role-name KarpenterControllerRole-${CLUSTER_NAME} --assume-role-policy-document file://controller-trust-policy.json
```
Attach a policy to the karpenter controller's role:
```bash
cat << EOF > controller-policy.json
{
    "Statement": [
        {
            "Action": [
                "ssm:GetParameter",
                "ec2:DescribeImages",
                "ec2:RunInstances",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DescribeAvailabilityZones",
                "ec2:DeleteLaunchTemplate",
                "ec2:CreateTags",
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet",
                "ec2:DescribeSpotPriceHistory",
                "pricing:GetProducts"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "Karpenter"
        },
        {
            "Action": "ec2:TerminateInstances",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/karpenter.sh/provisioner-name": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "ConditionalEC2Termination"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}",
            "Sid": "PassNodeIAMRole"
        },
        {
            "Effect": "Allow",
            "Action": "eks:DescribeCluster",
            "Resource": "arn:${AWS_PARTITION}:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${CLUSTER_NAME}",
            "Sid": "EKSClusterEndpointLookup"
        }
    ],
    "Version": "2012-10-17"
}
EOF

aws iam put-role-policy --role-name KarpenterControllerRole-${CLUSTER_NAME} --policy-name KarpenterControllerPolicy-${CLUSTER_NAME} --policy-document file://controller-policy.json
```
Tag the subnets and security groups used by the nodes karpenter will manage:
```bash
# Tag the subnets used by the existing nodegroups
for NODEGROUP in $(aws eks list-nodegroups --cluster-name ${CLUSTER_NAME} \
    --query 'nodegroups' --output text); do aws ec2 create-tags \
        --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}" \
        --resources $(aws eks describe-nodegroup --cluster-name ${CLUSTER_NAME} \
        --nodegroup-name $NODEGROUP --query 'nodegroup.subnets' --output text )
done

# Look up the first nodegroup
NODEGROUP=$(aws eks list-nodegroups --cluster-name ${CLUSTER_NAME} \
    --query 'nodegroups[0]' --output text)

# Look up its EC2 launch template
LAUNCH_TEMPLATE=$(aws eks describe-nodegroup --cluster-name ${CLUSTER_NAME} \
    --nodegroup-name ${NODEGROUP} --query 'nodegroup.launchTemplate.{id:id,version:version}' \
    --output text | tr -s "\t" ",")

# Look up the cluster security group
SECURITY_GROUPS=$(aws eks describe-cluster \
    --name ${CLUSTER_NAME} --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text)

# Tag the security group
aws ec2 create-tags \
    --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}" \
    --resources ${SECURITY_GROUPS}
```
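To confirm the discovery tags landed, the tagged subnets and security groups can be listed back by filtering on the tag key used above:

```bash
# Subnets carrying the karpenter.sh/discovery tag
aws ec2 describe-subnets \
    --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
    --query 'Subnets[].SubnetId' --output text

# Security groups carrying the same tag
aws ec2 describe-security-groups \
    --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
    --query 'SecurityGroups[].GroupId' --output text
```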
Edit aws-auth:
```bash
kubectl edit configmap aws-auth -n kube-system
```
Merge the following template into the ConfigMap. AWS_PARTITION, AWS_ACCOUNT_ID, and CLUSTER_NAME are placeholders — fill in your actual values:
```yaml
- groups:
    - system:bootstrappers
    - system:nodes
  rolearn: arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}
  username: system:node:{{EC2PrivateDNSName}}
```
Note: AWS_PARTITION, AWS_ACCOUNT_ID, and CLUSTER_NAME must be replaced with your account's actual values, otherwise nodes launched later will not have permission to join the EKS cluster.
Set the karpenter version to v0.30.0; in general, picking the latest release is fine.
```bash
export KARPENTER_VERSION=v0.30.0
```
Generate the Karpenter deployment YAML template with helm:
```bash
helm template karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} --namespace karpenter \
    --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
    --set settings.aws.clusterName=${CLUSTER_NAME} \
    --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
    --set controller.resources.requests.cpu=1 \
    --set controller.resources.requests.memory=1Gi \
    --set controller.resources.limits.cpu=1 \
    --set controller.resources.limits.memory=1Gi > karpenter.yaml
```
Edit the generated karpenter.yaml to pin the karpenter controller to the existing node group (the affinity block described in the karpenter migration guide), and make sure the value is changed to the name of your EKS nodegroup.
Deploy karpenter:
```bash
kubectl create namespace karpenter
kubectl create -f https://raw.githubusercontent.com/aws/karpenter/${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_provisioners.yaml
kubectl create -f https://raw.githubusercontent.com/aws/karpenter/${KARPENTER_VERSION}/pkg/apis/crds/karpenter.k8s.aws_awsnodetemplates.yaml
kubectl create -f https://raw.githubusercontent.com/aws/karpenter/${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_machines.yaml
kubectl apply -f karpenter.yaml
```
Create a default karpenter provisioner:
```bash
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: [c, m, r]
    - key: karpenter.k8s.aws/instance-generation
      operator: Gt
      values: ["2"]
  providerRef:
    name: default
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelector:
    karpenter.sh/discovery: "${CLUSTER_NAME}"
EOF
```
Installation is now complete. Run the command below to check the logs; as long as no errors appear, karpenter is healthy:
```bash
kubectl logs -f -n karpenter -c controller -l app.kubernetes.io/name=karpenter
```
Testing karpenter
Deploy the nginx Deployment below to trigger karpenter's automatic scaling of EC2 nodes:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eks-sample-linux-deployment
  namespace: default
  labels:
    app: eks-sample-linux-app
spec:
  replicas: 100
  selector:
    matchLabels:
      app: eks-sample-linux-app
  template:
    metadata:
      labels:
        app: eks-sample-linux-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - amd64
                      - arm64
      containers:
        - name: nginx
          image: public.ecr.aws/nginx/nginx:1.23
          ports:
            - name: http
              containerPort: 80
          imagePullPolicy: IfNotPresent
      nodeSelector:
        kubernetes.io/os: linux
```
As a result, karpenter immediately launched EC2 capacity to satisfy the resource requests of the many nginx replicas.
When we remove this deployment, karpenter automatically terminates the EC2 instances after a short while. This simulates EKS expanding and shrinking EC2 capacity according to load and resource demand; combined with HPA, both pods and EC2 instances become elastic.
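To watch the scale-in side explicitly, the test workload can be scaled to zero while watching the node list shrink (deployment name from the manifest above):

```bash
# Drop the workload; karpenter should drain and terminate the EC2 nodes
# it launched once they are empty
kubectl scale deployment eks-sample-linux-deployment --replicas=0

# Watch the nodes disappear (Ctrl-C to stop)
kubectl get nodes -w
```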
For installing HPA on AWS EKS, refer to AWS EKS installation [AWS China Ningxia region].
Setting up Fargate
Save the following as pod-execution-role-trust-policy.json, replacing region-code, aws-account, and cluster-name with your actual values:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Condition": {
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:eks:<region-code>:<aws-account>:fargateprofile/<cluster-name>/*"
                }
            },
            "Principal": {
                "Service": "eks-fargate-pods.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
Create a role named AmazonEKSFargatePodExecutionRole and attach the AmazonEKSFargatePodExecutionRolePolicy policy to it:
```bash
aws iam create-role --role-name AmazonEKSFargatePodExecutionRole --assume-role-policy-document file://"pod-execution-role-trust-policy.json"
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy --role-name AmazonEKSFargatePodExecutionRole
```
Create the Fargate profile following aws cli create-fargate-profile, adjusting the values inside to match your environment:
```bash
aws eks create-fargate-profile --fargate-profile-name default-profile --cluster-name poc-cluster --pod-execution-role-arn arn:aws:iam::843xxxx269:role/AmazonEKSFargatePodExecutionRole --subnets subnet-0685xxx5a7a88a1 subnet-0a13xxx902af606 subnet-0346xxxx62e51ca --selectors namespace=default-profile,labels={infrastructure=fargate}
```
Once the command succeeds, the Fargate profile is created.
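As a final check, the profile status can be queried; it should report ACTIVE once provisioning finishes:

```bash
aws eks describe-fargate-profile --region us-east-1 \
    --cluster-name poc-cluster --fargate-profile-name default-profile \
    --query 'fargateProfile.status' --output text
```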