Kubernetes: Horizontal Pod Autoscaling (HPA) Based on the Percentage of Memory Limits in Use
1. Purpose
Out of the box, Kubernetes cannot trigger the HPA (Horizontal Pod Autoscaler) on the percentage of a Pod's memory limits currently in use. A third-party component is needed to expose a custom metric based on that percentage, so that Pods can be scaled horizontally according to both the technical logic and the actual business load. This keeps the workload running smoothly while improving resource utilization and avoiding waste.
2. Solution
Install a custom metrics API service (Prometheus Adapter) in the Kubernetes cluster. The adapter maps metrics collected by Prometheus into Kubernetes custom metrics, which the HPA controller can then use to scale Pods automatically.
Note: Prometheus Adapter depends on Prometheus. Before deploying the adapter, make sure Prometheus is already installed in the cluster.
3. Installing Prometheus Adapter
Two installation methods are provided; choose based on whether the cluster has internet access. Method 1 installs with Helm and is the more convenient option for clusters that can reach the internet; Method 2 deploys directly from a YAML file and suits air-gapped clusters.
3.1 Method 1 (Helm)
3.1.1 Preparation
On a host that can reach the target Kubernetes cluster (preferably one that already has the kubectl command-line tool), install Helm:
```bash
curl -fsSL -o get_helm.sh https://rancher-mirror.rancher.cn/helm/get-helm-3.sh && chmod 700 get_helm.sh && ./get_helm.sh
```
3.1.2 Installation Steps
3.1.2.1 Add the Chart Repository
Add the Prometheus community Helm repository:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```
Note: if Prometheus is not yet installed in the cluster, it can be installed with:
```bash
helm install prometheus prometheus-community/kube-prometheus-stack
```
3.1.2.2 Customized Installation
- Create a custom values.yaml that configures the image source, namespace, and Prometheus address:
```yaml
image:
  repository: k8s.m.daocloud.io/prometheus-adapter/prometheus-adapter  # image mirror
  tag: ""  # if unset, defaults to appVersion from Chart.yaml
namespaceOverride: "kube-system"  # replace with the namespace you actually use
# Prometheus access configuration
prometheus:
  url: http://prometheus-k8s.kubesphere-monitoring-system.svc  # replace with the actual Prometheus address in your cluster
  port: 9090  # default Prometheus port
  path: ""
```
- Install Prometheus Adapter using the custom values.yaml:
```bash
helm install prometheus-adapter prometheus-community/prometheus-adapter -f values.yaml
```
- Configure the custom metric: edit the ConfigMap named prometheus-adapter and append the following rule, which exposes limits_memory_utilization_percentage (memory usage as a percentage of the Pod's limits). After editing, restart the adapter Pod so the new rule is loaded:
```yaml
- seriesQuery: 'container_memory_working_set_bytes{container!="POD",namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    as: "limits_memory_utilization_percentage"
  metricsQuery: |
    (
      sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
      /
      sum(kube_pod_container_resource_limits{resource="memory",<<.LabelMatchers>>}) by (<<.GroupBy>>)
    ) * 100
```
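To see what the adapter will actually send to Prometheus, the `<<.LabelMatchers>>` and `<<.GroupBy>>` template fields can be expanded by hand. The sketch below does the substitution for a single Pod; the Pod name `test-hpa-abc` is hypothetical, chosen only to illustrate the expansion:

```shell
# Hand-expand the metricsQuery template the way the adapter does:
#   <<.LabelMatchers>> -> label matchers for the HPA target's namespace and Pod
#   <<.GroupBy>>       -> the grouping labels (namespace, pod)
LABEL_MATCHERS='namespace="default",pod="test-hpa-abc"'  # hypothetical Pod name
GROUP_BY='namespace,pod'
QUERY="(
  sum(container_memory_working_set_bytes{${LABEL_MATCHERS}}) by (${GROUP_BY})
  /
  sum(kube_pod_container_resource_limits{resource=\"memory\",${LABEL_MATCHERS}}) by (${GROUP_BY})
) * 100"
echo "$QUERY"
```

The expanded query can be pasted into the Prometheus UI to confirm it returns a value before wiring it into the HPA.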
3.2 Method 2 (Direct YAML Deployment)
3.2.1 Preparation
Pull the Prometheus Adapter image:
```bash
docker pull k8s.m.daocloud.io/prometheus-adapter/prometheus-adapter:v0.12.0
```
3.2.2 Deploying the Application
3.2.2.1 Deployment Manifest
Create a deployment file named prometheus-adapter.yaml. Pay particular attention to the image name and the Prometheus address (the parts marked below must be adjusted). Full content:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
  labels:
    app: prometheus-adapter
    component: metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-adapter
  template:
    metadata:
      labels:
        app: prometheus-adapter
    spec:
      serviceAccountName: prometheus-adapter
      containers:
      - name: prometheus-adapter
        image: k8s.m.daocloud.io/prometheus-adapter/prometheus-adapter:v0.12.0  # image address
        args:
        - --secure-port=6443
        - --logtostderr=true
        - --prometheus-url=http://prometheus-server.monitoring.svc:9090  # replace with the actual Prometheus address in your cluster
        - --metrics-relist-interval=1m
        - --v=4
        ports:
        - containerPort: 6443
          name: https
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
          requests:
            cpu: 50m
            memory: 64Mi
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-adapter
  labels:
    app: prometheus-adapter
spec:
  ports:
  - port: 443
    targetPort: https
  selector:
    app: prometheus-adapter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-adapter
rules:
- apiGroups: ["custom.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: ["external.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["nodes", "pods", "services"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-adapter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-adapter
subjects:
- kind: ServiceAccount
  name: prometheus-adapter
  namespace: monitoring  # replace with the namespace you deploy into
---
# Allow the adapter to delegate authentication/authorization to the API server
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-adapter:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: prometheus-adapter
  namespace: monitoring  # replace with the namespace you deploy into
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-adapter-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: prometheus-adapter
  namespace: monitoring  # replace with the namespace you deploy into
---
# Register the adapter as the custom metrics API so the HPA controller can query it
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring  # replace with the namespace you deploy into
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-adapter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/name: prometheus-adapter
    app.kubernetes.io/version: v0.12.0
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters:
      - isNot: ^container_.*_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)$
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,container!="POD"}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_total$
      resources:
        template: <<.Resource>>
      name:
        matches: ""
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_seconds_total
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters: []
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
    # Custom metric: memory utilization as a percentage of limits
    - seriesQuery: 'container_memory_working_set_bytes{container!="POD",namespace!="",pod!=""}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        as: limits_memory_utilization_percentage
      metricsQuery: |
        (
          sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
          /
          sum(kube_pod_container_resource_limits{resource="memory",<<.LabelMatchers>>}) by (<<.GroupBy>>)
        ) * 100
```
3.2.2.2 Deployment
Deploy Prometheus Adapter with the following command, replacing the namespace with the one you actually use (monitoring is recommended):
```bash
kubectl apply -f prometheus-adapter.yaml -n monitoring
```
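Once the adapter Pod is running, the new metric should be visible through the aggregated custom metrics API (for example via `kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"`). The snippet below parses an illustrative sample of that response shape; the Pod name and value are made up, and the goal is only to show where the HPA reads the number from:

```shell
# Illustrative sample of a custom metrics API response (MetricValueList);
# the Pod name and value are fabricated for the example. Extract the metric
# value with POSIX sed, no jq required.
RESPONSE='{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","items":[{"describedObject":{"kind":"Pod","namespace":"default","name":"test-hpa-abc"},"metricName":"limits_memory_utilization_percentage","value":"65"}]}'
VALUE=$(echo "$RESPONSE" | sed -n 's/.*"value":"\([^"]*\)".*/\1/p')
echo "$VALUE"  # -> 65
```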
4. Validation
Validate that the custom metric and the HPA work end to end by deploying a test application, creating an HPA rule, and running a scale-out test.
4.1 Create the Test Application
Create a file named test-hpa.yml to deploy the test application test-hpa (with a memory limit configured, so memory pressure can be simulated):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-hpa
  name: test-hpa
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-hpa
  template:
    metadata:
      labels:
        app: test-hpa
    spec:
      containers:
      - command:
        - /bin/sh
        - -c
        args:
        - tail -f /etc/hosts
        image: quay.io/jitesoft/ubuntu:latest
        imagePullPolicy: IfNotPresent
        name: test
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 16M  # memory limit, used to compute the utilization percentage
          requests:
            cpu: 100m
            memory: 4M
```
Deploy it:
```bash
kubectl apply -f test-hpa.yml
```
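Note that the manifest uses the decimal suffix `M` (16M = 16,000,000 bytes), not the binary suffix `Mi` (16Mi = 16,777,216 bytes); the percentage metric is computed against whichever quantity the limit declares. A minimal sketch of the conversion (the helper name `to_bytes` is invented for illustration):

```shell
# Kubernetes memory quantities: decimal suffix M vs binary suffix Mi.
# to_bytes is a hypothetical helper converting a quantity string to bytes.
to_bytes() {
  case "$1" in
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;  # binary: 1Mi = 1048576 bytes
    *M)  echo $(( ${1%M} * 1000 * 1000 )) ;;   # decimal: 1M = 1000000 bytes
    *)   echo "$1" ;;                          # plain byte count
  esac
}
to_bytes 16M    # -> 16000000
to_bytes 16Mi   # -> 16777216
```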
4.2 Create the HPA Rule
Create a file named hpa.yml with an HPA rule based on the custom metric limits_memory_utilization_percentage:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-hpa  # name of the Deployment to scale (matches the test application)
  minReplicas: 1  # minimum number of Pod replicas
  maxReplicas: 3  # maximum number of Pod replicas
  metrics:
  - type: Pods
    pods:
      metric:
        name: limits_memory_utilization_percentage  # custom metric name (must match the adapter rule)
      target:
        type: AverageValue
        averageValue: "60"  # scale-out threshold: 60% of the memory limits in use
```
Apply it:
```bash
kubectl apply -f hpa.yml
```
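For intuition about when this rule scales, the HPA controller's documented core formula is desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue). A minimal sketch of that arithmetic (integer math, the function name is invented for illustration):

```shell
# HPA scaling rule: desiredReplicas = ceil(currentReplicas * currentValue / targetValue)
# Implemented with integer ceiling division: (a*b + t - 1) / t.
desired_replicas() {
  local current_replicas=$1 current_value=$2 target_value=$3
  echo $(( (current_replicas * current_value + target_value - 1) / target_value ))
}
desired_replicas 1 90 60   # 90% average usage vs a 60% target -> 2 replicas
desired_replicas 2 30 60   # 30% average usage vs a 60% target -> 1 replica
```

In practice the controller also applies a tolerance (around 10% by default) and the minReplicas/maxReplicas bounds on top of this formula.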
4.3 Confirm the HPA Rule
Check the HPA status with the commands below. If there are no errors and the custom metric is recognized, the rule is in effect:
```bash
kubectl describe hpa test-hpa
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"  # optional: list the custom metrics the adapter exposes
```
4.4 Scale-Out Test
Open a shell in the test Pod (for example with kubectl exec -it) and generate memory pressure to simulate a high business load:
```bash
dd if=/dev/zero of=/dev/null bs=10M count=50000  # dd holds a 10M block buffer, driving up the Pod's memory usage
```
Expected behavior: the Pod's memory utilization rises quickly; once it crosses the 60% threshold, the HPA scales the Deployment out, up to the maximum of 3 replicas. After the pressure is stopped and utilization falls, the HPA scales back down to the minimum of 1 replica, confirming that elastic scaling in both directions works.
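Why this particular dd command works as a memory load: dd allocates a buffer of the block size (`bs=10M`), and that buffer alone already exceeds 60% of the test Pod's 16M limit. A quick check of the arithmetic:

```shell
# dd's bs=10M buffer vs the Pod's 16M (decimal) memory limit.
BUF_BYTES=$(( 10 * 1024 * 1024 ))     # dd bs=10M -> 10485760 bytes
LIMIT_BYTES=$(( 16 * 1000 * 1000 ))   # memory limit 16M -> 16000000 bytes
PERCENT=$(( BUF_BYTES * 100 / LIMIT_BYTES ))
echo "${PERCENT}%"                    # -> 65%
```

So the buffer by itself puts utilization at about 65%, above the 60% target, before any other memory used by the container is counted.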