HPA in Practice: Automatic Elastic Scaling on Kubernetes

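The Kubernetes HPA (Horizontal Pod Autoscaler) is Kubernetes' automatic elastic scaling mechanism. It adjusts the number of Pod replicas dynamically according to real-time load, absorbing traffic fluctuations and improving resource utilization. Based on CPU utilization, the HPA can automatically scale the number of Pods in a ReplicationController, Deployment, ReplicaSet, or StatefulSet.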

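The HPA runs as a control loop inside kube-controller-manager and periodically checks the CPU usage of the Pods, every 15 seconds by default (configurable with the `--horizontal-pod-autoscaler-sync-period` flag).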

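Like the ReplicationController and the Deployment, the HPA is a Kubernetes resource object. It tracks the load of the Pods managed by a replication controller or Deployment and adjusts the replica count of those Pods accordingly.

Thresholds: an HPA defines the number of Pod replicas kept under normal load (the minimum) and the maximum number of replicas it may scale out to once the metric threshold is reached.
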
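Kubernetes uses two kinds of settings, requests and limits, to allocate resources.

   request: the minimum amount of a resource the Pod needs. The scheduler only places the Pod on a node that can satisfy it.
   limit: the maximum amount the Pod may use. For example, memory usage may grow while the Pod runs; the limit caps how much memory it can consume.

Resource types:
        CPU is measured in cores and memory in bytes.
          A container that requests 0.5 CPU asks for half of one CPU. The suffix m means one thousandth (millicores), so 100m CPU, 100 millicores, and 0.1 CPU are all the same amount.

        Memory units:
                k, M, G, T, P, E             # powers of 1000
                Ki, Mi, Gi, Ti, Pi, Ei       # powers of 1024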
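
To make the unit rules above concrete, here is a minimal Python sketch that converts CPU and memory quantities into plain numbers. The helpers `parse_cpu` and `parse_memory` are hypothetical names written for this article, not part of any Kubernetes client library, and the sketch only handles the suffixes listed above.

```python
from decimal import Decimal

# Decimal (power-of-1000) and binary (power-of-1024) memory suffixes.
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}


def parse_cpu(quantity: str) -> Decimal:
    """Return CPU cores: '100m' and '0.1' both give Decimal('0.1')."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Return a byte count for quantities such as '128Mi' or '128M'."""
    # Check two-letter binary suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # no suffix: plain bytes
```

With these helpers, `parse_memory("128Mi")` and `parse_memory("128M")` give different byte counts, which is why mixing the two suffix families in manifests is worth avoiding.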

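I. How HPA Works

Metric collection:
- Metrics Server: collects Pod CPU and memory usage by default (must be deployed beforehand).
- Custom metrics: extensions such as Prometheus and the Kubernetes Custom Metrics API add business metrics such as QPS and concurrent connections.
Decision logic:
- Periodically (every 15 seconds by default) checks the metrics of the target resource (for example, a Deployment).
- Computes the desired number of Pod replicas from the ratio of the current metric value to the target value.
- Formula:
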
```
desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))
```
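- The final replica count is bounded by minReplicas and maxReplicas.
Scaling triggers:
- Scale-out: add Pod replicas when the metric rises above the target.
- Scale-in: remove Pod replicas when the metric falls below the target (a 5-minute stabilization window applies by default to prevent flapping).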
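
The decision logic above can be sketched in a few lines of Python. The function name and parameters are illustrative only. The 10% `tolerance` is an assumption based on the controller's default `--horizontal-pod-autoscaler-tolerance` setting, which this article does not otherwise cover, and the real controller additionally handles unready Pods and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Apply ceil(current * ratio), then clamp to [min_replicas, max_replicas]."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Assumed behaviour: close enough to the target, so do not scale.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas at 74% CPU against a 30% target give `ceil(2 × 74 / 30) = 5`, so the first scale-out step under that load would be to 5 replicas rather than straight to the maximum.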

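II. HPA Configuration Steps

1. Deploy Metrics Server

Metrics Server is an efficient source of container resource metrics for Kubernetes' built-in autoscaling pipelines. It lets us monitor the resource usage of the nodes and containers in the cluster.
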
# Install Metrics Server (Kubernetes 1.20+)
[root@master ~]# mkdir hpa && cd hpa
[root@master hpa]# wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.1/components.yaml
[root@master hpa]# vim components.yaml   # edit the manifest
1. Change the image from
registry.k8s.io/metrics-server/metrics-server:v0.7.1
to
registry.aliyuncs.com/google_containers/metrics-server:v0.7.1
2. Disable kubelet certificate verification by adding the flag around line 134:
spec:
      containers:
      - args:
        - --kubelet-insecure-tls  # skip kubelet certificate verification
        - --cert-dir=/tmp
        - --secure-port=10250
        
[root@master hpa]# kubectl apply -f components.yaml

# Verify the installation
[root@master hpa]# kubectl get pod -A | grep metrics
kube-system  metrics-server-b59d5fcc6-jl9jh   1/1     Running   0      55s

[root@master hpa]# kubectl top nodes
NAME     CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)   
master   173m         8%       1281Mi          70%         
node01   57m          2%       875Mi           47%         
node02   53m          2%       862Mi           47%         
node03   65m          3%       826Mi           45%

2. Create a Deployment and Set Resource Requests

[root@master hpa]# cat test-dep.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
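          requests:   # requests must be set so the HPA can compute utilization
            cpu: 100m # the node must have at least 100 millicores of CPU available
            memory: 128Mi # the node must have at least 128Mi of memory available
            
[root@master hpa]# kubectl apply -f test-dep.yml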
[root@master hpa]# kubectl get pod 
NAME                               READY   STATUS    RESTARTS      AGE
nginx-deployment-fdfd96544-fz6z4   1/1     Running   0             33s
nginx-deployment-fdfd96544-w9pv4   1/1     Running   0             33s

Define the Service
[root@master hpa]# cat nginx-svc.yml 
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30018
  selector:
    app: nginx

[root@master hpa]# kubectl apply -f nginx-svc.yml 
service/nginx-svc created
[root@master hpa]# kubectl get svc 
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP        23h
nginx-svc    NodePort    10.103.190.174   <none>        80:30018/TCP   3s
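
Why the `requests` on the Deployment matter can be shown with a small calculation. The sketch below assumes the HPA derives a `Utilization` percentage as total usage divided by total requests across the Pods; the function name and the sample numbers are made up for illustration.

```python
def average_utilization(usage_millicores, request_millicores):
    """Utilization (%) = total usage / total requests, as an integer percentage."""
    total_request = sum(request_millicores)
    if total_request == 0:
        # Without requests there is no denominator, so utilization is undefined.
        raise ValueError("Pods without CPU requests have no defined utilization")
    return 100 * sum(usage_millicores) // total_request


# Two Pods that each request 100m and each use 74m of CPU.
print(average_utilization([74, 74], [100, 100]))
```

Halving the request to 50m would double the reported utilization for the same real usage, which is why requests should reflect what the application actually needs.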

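3. Create the HPA Resource

Core Fields Explained

1. `scaleTargetRef`

Purpose: specifies the target object the HPA controls.
Required fields:
- apiVersion: the API version of the target object (for example, apps/v1).
- kind: the type of the target object (for example, Deployment or StatefulSet).
- name: the name of the target object.

2. `minReplicas` and `maxReplicas`

Purpose: bound the range of the Pod replica count.
Default: minReplicas defaults to 1.
Notes:
- maxReplicas must be greater than or equal to minReplicas.
- Avoid an overly large maxReplicas, which can exhaust cluster resources.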
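
The field rules above can be turned into a quick pre-flight check before applying a manifest. This is a minimal sketch: `validate_hpa_spec` is a hypothetical helper that takes the `spec` section as a plain dictionary, and it mirrors the checklist in this article rather than the API server's own validation.

```python
def validate_hpa_spec(spec: dict) -> list:
    """Return a list of problems found in an HPA spec dictionary."""
    errors = []
    ref = spec.get("scaleTargetRef", {})
    for field in ("apiVersion", "kind", "name"):
        if not ref.get(field):
            errors.append(f"scaleTargetRef.{field} is required")
    min_replicas = spec.get("minReplicas", 1)  # defaults to 1 when omitted
    max_replicas = spec.get("maxReplicas")
    if max_replicas is None:
        errors.append("maxReplicas is required")
    elif max_replicas < min_replicas:
        errors.append("maxReplicas must be >= minReplicas")
    return errors
```

An empty list means the spec passes these checks; it says nothing about whether the target object actually exists in the cluster.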

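3. `metrics`

Purpose: defines the metrics that trigger scaling and their target values.

Supported metric types:

| Type | Description |
| --- | --- |
| `Resource` | Based on Pod resource usage (such as CPU or memory). |
| `Pods` | Based on custom per-Pod metrics (such as HTTP requests per second). |
| `Object` | Based on a metric of a Kubernetes object (such as an Ingress), for example request latency. |
| `External` | Based on a metric from a system outside the cluster (such as message queue length). |

Field reference:
- type: the metric type (for example, Resource).
- resource.name: the resource name (cpu or memory).
- target: defines how the target value is expressed:
  - type:
    - Utilization: a percentage of the requested amount (for example, 50% CPU utilization).
    - Value: a raw target value for the metric (used with Object and External metrics).
    - AverageValue: a target for the average across all Pods (for example, 100m CPU per Pod, or average requests per second).

Other fields of target
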
[root@master hpa]# kubectl explain hpa.spec.metrics.resource.target
GROUP:      autoscaling
KIND:       HorizontalPodAutoscaler
VERSION:    v2

FIELD: target <MetricTarget>


DESCRIPTION:
    target specifies the target value for the given metric
    MetricTarget defines the target value, average value, or average utilization
    of a specific metric
    
FIELDS:
  averageUtilization    <integer>  # target average utilization across all Pods (for example, CPU)
    averageUtilization is the target value of the average of the resource metric
    across all relevant pods, represented as a percentage of the requested value
    of the resource for the pods. Currently only valid for Resource metric
    source type

  averageValue  <Quantity>
    averageValue is the target value of the average of the metric across all
    relevant pods (as a quantity)

  type  <string> -required-
    type represents whether the metric type is Utilization, Value, or
    AverageValue

  value <Quantity>
    value is the target value of the metric (as a quantity).

Example

[root@master hpa]# cat hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
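  name: nginx-hpa  # name of the HPA resource
spec:
  scaleTargetRef:  # the target resource to scale
    apiVersion: apps/v1
    kind: Deployment # type of the target resource; Deployment and StatefulSet are supported
    name: nginx-deployment # name of the target resource
  minReplicas: 2 # minimum number of Pod replicas when scaling in (default 1)
  maxReplicas: 10 # maximum number of Pod replicas when scaling out (must be >= minReplicas)
  metrics: # list of metrics that drive scaling
  - type: Resource # metric type (Resource/Pods/Object/External)
    resource:  # resource metric definition
      name: cpu # resource name (cpu/memory)
      target: # target value
        type: Utilization # expressed as a percentage
        averageUtilization: 30  # target average CPU utilization of 30%: scale out above it, scale in below it
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50  # target memory utilization of 50%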

[root@master hpa]# kubectl apply -f hpa.yml 
horizontalpodautoscaler.autoscaling/nginx-hpa created
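
When an HPA lists several metrics, as this one does with CPU and memory, the controller evaluates each metric separately and uses the largest resulting replica count. The sketch below illustrates that rule under simplifying assumptions: the function name is hypothetical, each metric is given as a `(current, target)` utilization pair, and tolerance and unready Pods are ignored.

```python
import math


def replicas_for_metrics(current_replicas, metrics, min_replicas, max_replicas):
    """metrics: list of (current_utilization, target_utilization) pairs."""
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    # The metric asking for the most replicas wins, within the configured bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


# CPU at 74% against a 30% target, memory at 2% against a 50% target.
print(replicas_for_metrics(2, [(74, 30), (2, 50)], 2, 10))
```

Here memory alone would allow a scale-in, but CPU demands more replicas, so the CPU proposal decides the outcome.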

4. Check the HPA Status

[root@master hpa]# kubectl get hpa
NAME        REFERENCE                     TARGETS                       MINPODS   MAXPODS   REPLICAS   AGE
nginx-hpa   Deployment/nginx-deployment   cpu: 0%/30%, memory: 2%/50%   2         10        2          81s

5. Test and Verify

Install httpd-tools on any node and use the ab tool for load testing
[root@master hpa]# yum install -y httpd-tools
Run the load test against the NodePort
[root@master hpa]# ab -c 1000 -n 10000000 http://192.168.209.134:30018/

Open another terminal on the master
[root@master hpa]# watch kubectl get hpa,deployment,pods
Every 2.0s: kubectl get hpa,deployment,pods  master: Thu Mar 20 16:09:10 2025

NAME                                            REFERENCE                     TARGETS                        MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/nginx-hpa   Deployment/nginx-deployment   cpu: 74%/30%, memory: 2%/50%   2         10        10         41m

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-deployment   10/10   10           10          68m

NAME                                    READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-6d758bfc44-4zpjh   1/1     Running   0          106s
pod/nginx-deployment-6d758bfc44-9pbt9   1/1     Running   0          2m2s
pod/nginx-deployment-6d758bfc44-d2hkc   1/1     Running   0          2m18s
pod/nginx-deployment-6d758bfc44-gg2sw   1/1     Running   0          68m
pod/nginx-deployment-6d758bfc44-j7nnq   1/1     Running   0          2m2s
pod/nginx-deployment-6d758bfc44-j9j7w   1/1     Running   0          68m
pod/nginx-deployment-6d758bfc44-jjbvk   1/1     Running   0          2m2s
pod/nginx-deployment-6d758bfc44-s4lls   1/1     Running   0          106s
pod/nginx-deployment-6d758bfc44-t6656   1/1     Running   0          2m18s
pod/nginx-deployment-6d758bfc44-xmcvd   1/1     Running   0          2m2

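After the load test ends, wait 5 minutes and check the Pods again; the replica count will have returned to 2. CPU utilization is now below the 30% target, and the default scale-in stabilization window is 5 minutes, which prevents an immediate scale-in from destabilizing the service (flapping).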
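
The 5-minute delay can be modelled as a sliding window of recommendations. The sketch below is a simplified illustration, not the controller's code: the class name is hypothetical, timestamps are plain seconds, and it assumes that scale-in always uses the highest recommendation seen within the window.

```python
from collections import deque


class ScaleDownStabilizer:
    """Keep recent recommendations and scale in only to the highest of them."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended_replicas)

    def recommend(self, now, desired_replicas):
        self._history.append((now, desired_replicas))
        # Drop recommendations that have fallen out of the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        return max(replicas for _, replicas in self._history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
print(stabilizer.recommend(0, 10))    # load test still running
print(stabilizer.recommend(60, 2))    # load gone, but the window still remembers 10
print(stabilizer.recommend(301, 2))   # the old recommendation has expired
```

A short dip in load therefore does not remove Pods that a returning spike would need again moments later.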

III. Advanced HPA Configuration

1. Scaling on Custom Metrics

This requires installing Prometheus Adapter and defining metric rules.
Example (based on the HTTP request rate, QPS):

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: 100  # scale out when the average QPS per Pod exceeds 100
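
For an `AverageValue` target the arithmetic is simple: the HPA wants enough Pods that the per-Pod average falls to the target. A minimal sketch, assuming the per-Pod QPS readings are already available as a list (the function name and sample values are hypothetical):

```python
import math


def replicas_for_average_value(per_pod_values, target_average):
    """Replicas needed so the per-Pod average drops to target_average."""
    return math.ceil(sum(per_pod_values) / target_average)


# Three Pods currently serving 150, 120 and 180 requests per second.
print(replicas_for_average_value([150, 120, 180], 100))
```

The result is still subject to `minReplicas`, `maxReplicas`, and any `behavior` settings on the HPA.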

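2. Behavior Control (HPA Behavior)

Tune the scaling speed and stabilization window (Kubernetes 1.18+):

Key parameters:
- stabilizationWindowSeconds: the stabilization (cooldown) window. The HPA considers the recommendations made during this window before scaling, which avoids frequent fluctuation.
- policies: scaling policies, either Percent (a proportion of the replicas) or Pods (a fixed number of Pods).
- periodSeconds: the time window over which a policy's limit applies.

behavior: # controls scaling timing and speed
  scaleDown:  # scale-in
    stabilizationWindowSeconds: 300  # 5-minute stabilization window for scale-in
    policies:
    - type: Percent  # by percentage
      value: 10                      # remove at most 10% of the replicas per period
      periodSeconds: 60 # length of the period the policy applies to
  scaleUp: # scale-out
    stabilizationWindowSeconds: 60
    policies:
    - type: Pods # by a fixed number of Pods
      value: 4                       # add at most 4 replicas per period
      periodSeconds: 60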
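
To see what these policies allow, the sketch below computes the replica limit a single period permits. It is a simplification under stated assumptions: the function names are hypothetical, `period_start_replicas` is taken as the replica count at the start of the policy period, the policy permitting the largest change wins when several are listed, and the rounding (up for scale-out, down for scale-in) is an assumption of this sketch.

```python
import math


def scale_up_limit(period_start_replicas, policies):
    """Highest replica count the scale-out policies allow within one period."""
    limits = []
    for policy in policies:
        if policy["type"] == "Pods":
            limits.append(period_start_replicas + policy["value"])
        elif policy["type"] == "Percent":
            limits.append(math.ceil(period_start_replicas * (100 + policy["value"]) / 100))
    return max(limits)


def scale_down_limit(period_start_replicas, policies):
    """Lowest replica count the scale-in policies allow within one period."""
    limits = []
    for policy in policies:
        if policy["type"] == "Pods":
            limits.append(period_start_replicas - policy["value"])
        elif policy["type"] == "Percent":
            limits.append(period_start_replicas * (100 - policy["value"]) // 100)
    return min(limits)


up = [{"type": "Pods", "value": 4, "periodSeconds": 60}]
down = [{"type": "Percent", "value": 10, "periodSeconds": 60}]
print(scale_up_limit(2, up), scale_down_limit(10, down))
```

With the configuration above, going from 2 to 10 replicas takes at least two 60-second periods, and draining from 10 back to 2 takes considerably longer because each period removes only about a tenth of the replicas.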

Complete Example

Building on the Deployment from the previous example, add a behavior section to the existing HPA configuration.

[root@master hpa]# cat hpa.yml 
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
      
[root@master hpa]# kubectl apply -f hpa.yml
[root@master hpa]# kubectl get hpa 
NAME        REFERENCE                     TARGETS                       MINPODS   MAXPODS   REPLICAS   AGE
nginx-hpa   Deployment/nginx-deployment   cpu: 0%/30%, memory: 2%/50%   2         10        2          108s

Test

[root@master hpa]# ab -c 1000 -n 1000000 http://192.168.209.134:30018/

Open another terminal on the master and run the following command to watch the result

[root@master ~]# watch kubectl get hpa,deployment,pods

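IV. Typical Use Cases

Scenario 1: Sudden Traffic Spike on a Web Service

Configuration: scale on CPU utilization (target 70%) and QPS (target 200 per second).
Result: Pods scale out automatically to 10 when traffic surges and scale back in to 2 once it subsides.

Scenario 2: Batch Job Queue

Configuration: scale on the number of pending tasks in the message queue (a custom metric).
Result: worker Pods scale out when tasks pile up and scale in automatically once they are processed.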

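V. Best Practices

Set resource requests sensibly:
- The request values directly affect the utilization calculation, so tune them to the application's actual load.
Avoid overly frequent scaling:
- Use behavior to tune the stabilization window and step size and prevent flapping.
Monitoring and alerting:
- Use Prometheus + Grafana to monitor HPA status and metric trends.
- Alert on scaling events (for example, the replica count reaching its maximum).
Combine multiple metrics:
- Watch CPU, memory, and business metrics (such as QPS) together so that a single metric does not mislead the autoscaler.
Test and verify:
- Use load-testing tools (such as k6 or wrk) to simulate traffic and verify that the scaling policy works.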

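VI. Troubleshooting Common Problems

| Symptom | Possible cause | Solution |
| --- | --- | --- |
| HPA status shows `<unknown>` | Metrics Server is not installed or is misconfigured | Check the Metrics Server logs and network connectivity |
| Pod count does not change | The metric has not reached the threshold, or `minReplicas=maxReplicas` | Check the thresholds and replica limits in the HPA configuration |
| Scaling lags too much | Metrics Server scrape interval or HPA computation delay | Adjust the Metrics Server scrape frequency (requires changing its deployment arguments) |
| Custom metrics are not recognized | Prometheus Adapter is misconfigured | Check the Adapter's metric mapping rules and the Prometheus data source |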
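
The first symptom is easy to check in a script. The sketch below parses the TARGETS column in the format shown by `kubectl get hpa` earlier in this article and reports which metrics are `<unknown>`. The function names are hypothetical, and the column format is an assumption that may differ between kubectl versions.

```python
import re

# Matches entries such as "cpu: 74%/30%" or "cpu: <unknown>/30%".
_TARGET = re.compile(r"(?P<name>[\w-]+):\s*(?P<current><unknown>|\d+%?)/(?P<target>\d+%?)")


def parse_targets(targets: str) -> dict:
    """Map metric name -> (current, target); current is None when unknown."""
    result = {}
    for match in _TARGET.finditer(targets):
        current = match["current"]
        result[match["name"]] = (None if current == "<unknown>" else current, match["target"])
    return result


def unknown_metrics(targets: str) -> list:
    """Names of the metrics whose current value is <unknown>."""
    return [name for name, (current, _) in parse_targets(targets).items() if current is None]


print(unknown_metrics("cpu: <unknown>/30%, memory: 2%/50%"))
```

A non-empty result is a cue to look at Metrics Server first, and at missing `requests` on the target Pods second.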

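Summary

HPA is the core tool for automated elastic scaling in Kubernetes:

Basic scenarios: respond quickly to resource pressure based on CPU and memory metrics.
Advanced scenarios: combine custom metrics (such as QPS or queue length) for business-driven scaling.
With sensible configuration and monitoring, HPA can significantly improve resource utilization while keeping services stable and available.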
