Kubernetes: Horizontal Pod Autoscaling (HPA) Based on the Memory Percentage of Pod limits

1. Purpose

Out of the box, Kubernetes cannot trigger HPA (Horizontal Pod Autoscaler) scaling on memory usage expressed as a percentage of a Pod's limits; the built-in `Utilization` target type is computed against requests, not limits. A third-party component is therefore needed to define a custom metric based on the Pod's memory usage as a percentage of its limits. Combined with the actual business load, this lets Pods scale elastically: the workload's demand is met while resource utilization improves and waste is avoided.

2. Solution

Install a custom-metrics API application (Prometheus Adapter) in the Kubernetes cluster. The adapter maps metrics collected by Prometheus into the Kubernetes custom metrics API, so that the HPA controller can scale Pods based on those custom metrics.

Note: Prometheus Adapter depends on Prometheus. Before deploying it, make sure Prometheus is already installed in the cluster.
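A quick way to check is to list Prometheus pods; the namespace and pod names vary with the installation method, so the command below only looks for common naming:

```shell
# Look for Prometheus pods in any namespace (names vary by install method)
kubectl get pods --all-namespaces | grep -i prometheus
```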

3. Installing Prometheus Adapter

Two installation methods are provided; choose based on whether the cluster can reach the public Internet. Method 1 installs via Helm and is the more convenient option for clusters with Internet access; Method 2 deploys directly from a YAML file and suits clusters without Internet access.

3.1 Method 1: Helm install

3.1.1 Prerequisites

On a host that can reach the target Kubernetes cluster (preferably one that already has the kubectl command-line tool), install Helm:

```bash
curl -fsSL -o get_helm.sh https://rancher-mirror.rancher.cn/helm/get-helm-3.sh && chmod 700 get_helm.sh && ./get_helm.sh
```

3.1.2 Installation steps

3.1.2.1 Add the chart repository

Add the Prometheus community Helm repository:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```

Note: if Prometheus is not yet installed in the cluster, it can be installed with:

```bash
helm install prometheus prometheus-community/kube-prometheus-stack
```

3.1.2.2 Custom installation

1. Create a custom values.yaml that sets the image source, namespace, and Prometheus address:

```yaml
image:
  repository: k8s.m.daocloud.io/prometheus-adapter/prometheus-adapter  # image mirror
  tag: ""  # if empty, the appVersion from Chart.yaml is used
namespaceOverride: "kube-system"  # replace with the namespace you actually use
# Prometheus connection settings
prometheus:
  url: http://prometheus-k8s.kubesphere-monitoring-system.svc  # replace with your cluster's actual Prometheus address
  port: 9090  # Prometheus default port
  path: ""
```

2. Install Prometheus Adapter using the custom values.yaml:

```bash
helm install prometheus-adapter prometheus-community/prometheus-adapter -f values.yaml
```

3. Configure the custom metric: edit the ConfigMap named prometheus-adapter and add a rule defining the limits_memory_utilization_percentage metric (the Pod's working-set memory as a percentage of its memory limits):

```yaml
- seriesQuery: 'container_memory_working_set_bytes{container!="POD",namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    as: "limits_memory_utilization_percentage"
  metricsQuery: |
    (
      sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
      /
      sum(kube_pod_container_resource_limits{resource="memory",<<.LabelMatchers>>}) by (<<.GroupBy>>)
    ) * 100
```
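Prometheus Adapter only reads its configuration at startup, so after editing the ConfigMap, restart the adapter and confirm the new metric shows up in the discovery list (the deployment name and namespace below follow the Helm install above and may differ in your cluster):

```shell
# Restart the adapter so it reloads the edited ConfigMap
kubectl -n kube-system rollout restart deployment prometheus-adapter
# Once the new pod is Ready, the metric should appear in the discovery list
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | grep limits_memory_utilization_percentage
```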

3.2 Method 2: direct YAML deployment

3.2.1 Prerequisites

Pull the Prometheus Adapter image:

```bash
docker pull k8s.m.daocloud.io/prometheus-adapter/prometheus-adapter:v0.12.0
```

3.2.2 Deploying the application

3.2.2.1 Deployment manifest

Create a prometheus-adapter.yaml deployment file. Pay particular attention to the image name and the Prometheus address (both marked for modification below). Full content:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
  labels:
    app: prometheus-adapter
    component: metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-adapter
  template:
    metadata:
      labels:
        app: prometheus-adapter
    spec:
      serviceAccountName: prometheus-adapter
      containers:
      - name: prometheus-adapter
        image: k8s.m.daocloud.io/prometheus-adapter/prometheus-adapter:v0.12.0  # image address
        args:
        - --secure-port=6443
        - --logtostderr=true
        - --prometheus-url=http://prometheus-server.monitoring.svc:9090  # replace with your cluster's actual Prometheus address
        - --metrics-relist-interval=1m
        - --v=4
        ports:
        - containerPort: 6443
          name: https
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
          requests:
            cpu: 50m
            memory: 64Mi
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-adapter
  labels:
    app: prometheus-adapter
spec:
  ports:
  - port: 443
    targetPort: https
  selector:
    app: prometheus-adapter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-adapter
rules:
- apiGroups: ["custom.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: ["external.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["nodes", "pods", "services"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-adapter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-adapter
subjects:
- kind: ServiceAccount
  name: prometheus-adapter
  namespace: monitoring  # required for ServiceAccount subjects; must match the deployment namespace
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-adapter
---
apiVersion: v1
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters:
      - isNot: ^container_.*_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)$
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,container!="POD"}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_total$
      resources:
        template: <<.Resource>>
      name:
        matches: ""
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_seconds_total
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters: []
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
    # Custom metric: memory utilization as a percentage of limits
    - metricsQuery: |
        (
          sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
          /
          sum(kube_pod_container_resource_limits{resource="memory",<<.LabelMatchers>>}) by (<<.GroupBy>>)
        ) * 100
      name:
        as: limits_memory_utilization_percentage
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      seriesQuery: container_memory_working_set_bytes{container!="POD",namespace!="",pod!=""}
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/name: prometheus-adapter
    app.kubernetes.io/version: v0.12.0
  name: prometheus-adapter
```

3.2.2.2 Apply the manifest

Deploy Prometheus Adapter with the command below, replacing the namespace with the one you actually use (deploying into the monitoring namespace is recommended):

```bash
kubectl apply -f prometheus-adapter.yaml -n monitoring
```

Note: for the HPA controller to be able to query these metrics, the custom metrics API (`v1beta1.custom.metrics.k8s.io`) must also be registered via an APIService object pointing at the prometheus-adapter Service, along with the RBAC the secure API setup requires (such as a `system:auth-delegator` binding). The Helm chart in Method 1 creates these automatically.
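Once the adapter Pod is Running, it helps to confirm that the custom metrics API answers; `v1beta1.custom.metrics.k8s.io` is the APIService name the adapter conventionally registers under:

```shell
# Adapter pod status (label matches the Deployment above)
kubectl -n monitoring get pods -l app=prometheus-adapter
# The custom metrics API group should report Available
kubectl get apiservice v1beta1.custom.metrics.k8s.io
# Raw discovery listing of all exposed custom metrics
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
```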

4. Verifying the solution

Verify that the custom metric and the HPA work end to end by deploying a test application, creating an HPA rule, and driving a scale-out.

4.1 Create the test application

Create a test-hpa.yml file for a test Deployment named test-hpa (with a memory limit set, so that memory pressure can be simulated):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-hpa
  name: test-hpa
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-hpa
  template:
    metadata:
      labels:
        app: test-hpa
    spec:
      containers:
      - command:
        - /bin/sh
        - -c
        args:
        - tail -f /etc/hosts
        image: quay.io/jitesoft/ubuntu:latest
        imagePullPolicy: IfNotPresent
        name: test
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 16M  # memory limit; the denominator of the utilization metric
          requests:
            cpu: 100m
            memory: 4M
```

Apply it:

```bash
kubectl apply -f test-hpa.yml
```
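Before creating the HPA, the metric for the new Pod can be read straight from the custom metrics API; this returns a value only if the adapter rule from section 3 is in place:

```shell
# Query limits_memory_utilization_percentage for all pods in default
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/limits_memory_utilization_percentage"
```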

4.2 Create the HPA rule

Create an hpa.yml file with an HPA rule driven by the custom metric limits_memory_utilization_percentage:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
  namespace: default
spec:
  maxReplicas: 3  # maximum number of Pod replicas
  minReplicas: 1  # minimum number of Pod replicas
  metrics:
  - type: Pods
    pods:
      metric:
        name: limits_memory_utilization_percentage  # custom metric name (must match the adapter rule)
      target:
        type: AverageValue
        averageValue: "60"  # scale out when average limits memory utilization reaches 60%
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-hpa  # Deployment to scale (the test application above)
```

Apply it:

```bash
kubectl apply -f hpa.yml
```

4.3 Confirm the HPA rule

Check the HPA status; if there are no errors and the custom metric is resolved, the rule is active:

```bash
kubectl describe hpa test-hpa
```

4.4 HPA scale-out test

Open a shell in the test Pod's container (for example with kubectl exec) and generate memory pressure to simulate a high-load scenario. Note that writing from /dev/zero to /dev/null does not actually consume memory (it only burns CPU); writing into tmpfs does, because tmpfs pages are charged against the container's memory cgroup:

```bash
dd if=/dev/zero of=/dev/shm/fill bs=1M count=12  # 12 MiB in tmpfs counts toward the 16M memory limit
```

Observed behaviour: the Pod's limits memory utilization climbs quickly; once it crosses the 60% threshold, the HPA scales the Deployment out, up to the maximum of 3 replicas. When the pressure is removed (rm /dev/shm/fill) and utilization falls again, the HPA scales back down to the minimum of 1 replica, confirming that elastic scaling works in both directions.
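As a sanity check on the threshold arithmetic: Kubernetes interprets the suffix `M` as decimal megabytes (10^6 bytes), while `dd`'s `bs=1M` means MiB (2^20 bytes), so writing 12 MiB against a 16M limit lands at roughly:

```shell
# Expected limits_memory_utilization_percentage for 12 MiB used vs a 16M limit
bytes_used=$((12 * 1024 * 1024))   # dd bs=1M count=12 -> 12 MiB
bytes_limit=$((16 * 1000 * 1000))  # Kubernetes "16M" = 16,000,000 bytes
awk -v u="$bytes_used" -v l="$bytes_limit" 'BEGIN { printf "%.1f\n", u / l * 100 }'  # prints 78.6
```

That is comfortably above the 60% target, and the container's baseline usage adds a little on top.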
