Kubernetes HPA Autoscaling in Practice: From metrics-server to Multi-Metric Elastic Scaling

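Overview: It is 3 a.m. and traffic to your production service suddenly spikes. Pods get OOM-killed one after another and the service goes down across the board. A mechanism that detects rising load and scales out right away can prevent many incidents of this kind, and that is what the Kubernetes HPA (Horizontal Pod Autoscaler) provides. This article starts with deploying metrics-server, walks step by step through autoscaling on both CPU and memory, compares when to use HPA and VPA, and ends with configuration you can adapt for production.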

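1. Why autoscaling?

1.1 The pain of scaling by hand

In traditional operations, handling a traffic peak usually depends on manual intervention:

Scenario: an e-commerce sales event

Timeline:
18:00  Ops manually scales out to 20 replicas
20:00  Traffic exceeds expectations; ops scales out again to 50 replicas
02:00  The event ends, and ops forgets to scale in...
08:00  The boss sees the bill: 50 replicas sat idle all night 💸

Problems:
  ❌ Slow response: manual scaling cannot keep up with a sudden spike
  ❌ High staffing cost: someone has to watch dashboards 24×7
  ❌ Wasted resources: nobody scales in when traffic drops
  ❌ Error-prone: manual operations under pressure go wrong easily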

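1.2 Autoscaling options in K8S

K8S offers elasticity at three levels:

Level	Component	What it does	What it scales
Horizontal Pod scaling	HPA	Adds or removes Pod replicas based on metrics	Number of Pods
Vertical Pod scaling	VPA	Adjusts the CPU/memory resources of Pods	Pod resource requests and limits (up or down)
Node scaling	Cluster Autoscaler	Adds or removes cluster Nodes	Number of nodes (cloud environments)

Focus of this article: HPA, the most widely used and most mature of the three.

1.3 HPA vs VPA: key differences

Aspect	HPA (horizontal)	VPA (vertical)
Scaling direction	Adds or removes Pod replicas	Adjusts Pod CPU/memory
What is adjusted	Workloads such as Deployment / StatefulSet	The resources of individual Pods
Restarts	None; only the replica count changes	Usually recreates the Pod
Trigger metrics	CPU / memory utilization, custom metrics (QPS, etc.)	Historical resource usage trends
Primary goal	Absorb traffic swings and keep enough capacity	Improve resource utilization and reduce waste
Maturity	Stable (GA), built into Kubernetes	Beta when this was written; a separate add-on that must be installed
Typical use	Stateless services with fluctuating traffic	Stateful services, workloads with unstable resource needs

Production advice: prefer HPA. VPA was still Beta when this was written, so use it cautiously in production.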

2. Prerequisite: deploying metrics-server

2.1 Why is metrics-server needed?

HPA bases its scaling decisions on metrics such as CPU and memory usage. K8S does not provide this data by default, so kubectl top fails:

kubectl top pods
error: Metrics API not available

kubectl top nodes
error: Metrics API not available

metrics-server is the component that supplies these metrics. Its data flow looks like this:

┌────────────┐    /metrics/resource    ┌───────────────────┐     Metrics API     ┌───────────────────┐
│  Kubelet   │ ──────────────────────→ │  metrics-server   │ ──────────────────→ │        HPA        │
│ (per node) │  raw CPU + memory data  │ (aggregate+cache) │  standardized API   │ (decision engine) │
└────────────┘                         └───────────────────┘                     └───────────────────┘

Key point: metrics-server does not store historical data; it only relays kubelet metrics. For historical monitoring and alerting, use Prometheus.

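2.2 Deploying metrics-server

# 1. Download the manifest (high-availability variant)
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml

# 2. Key change: add the --kubelet-insecure-tls flag.
# Without it, metrics-server rejects the kubelet's self-signed certificate with an x509 error.
# The flag skips certificate verification, so treat it as a lab/test setting.
# Add it to the args of the Deployment:
#   - --kubelet-insecure-tls

# 3. Load the image on every node
docker load -i metrics-server-v0.7.2.tar.gz

# 4. Deploy
kubectl apply -f high-availability-1.21+.yaml

# 5. Confirm the component is running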
kubectl get pods -n kube-system -l k8s-app=metrics-server -o wide
NAME                                  READY   STATUS    RESTARTS   AGE
pod/metrics-server-6b4f784878-4jvjt   1/1     Running   0          49s
pod/metrics-server-6b4f784878-6dgvf   1/1     Running   0          49s
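
If metrics-server is already deployed without the flag, one alternative to editing the manifest is a JSON Patch applied with `kubectl patch`. The sketch below only builds and prints the patch; the helper name `insecure_tls_patch` is made up for illustration, and it assumes the Deployment is named `metrics-server` in `kube-system` with metrics-server as the first container, as in the upstream manifest.

```shell
# Hypothetical helper: prints a JSON Patch that appends the flag
# to the args of the first container in the Pod template.
insecure_tls_patch() {
  printf '%s' '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}

# Show the patch so it can be reviewed before use.
insecure_tls_patch; echo

# Usage against a live cluster (not executed here):
#   kubectl -n kube-system patch deployment metrics-server \
#     --type=json -p "$(insecure_tls_patch)"
```

Patching triggers a normal rolling update of the Deployment, so the check in step 5 applies afterwards as well.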

2.3 Verifying metrics-server

# Check node resource usage
kubectl top nodes
NAME        CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master231   198m         9%     3006Mi          79%
worker232   180m         9%     3124Mi          82%
worker233   217m         5%     5549Mi          71%

# Check Pod resource usage
kubectl top pods -A
NAMESPACE              NAME                                     CPU(cores)   MEMORY(bytes)
kube-system            metrics-server-6b4f784878-4jvjt         3m           16Mi
kube-system            metrics-server-6b4f784878-6dgvf         2m           14Mi

# You can also query the API directly (confirms the Metrics API is available)
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/worker233 | python3 -m json.tool
{
    "kind": "NodeMetrics",
    "usage": {
        "cpu": "352m",
        "memory": "2257272Ki"
    }
}

Troubleshooting tip: if kubectl top still fails, check that ① the metrics-server Pods are Running; ② --kubelet-insecure-tls was added; ③ port 10250 on each kubelet is reachable.

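3. How HPA works

3.1 The scaling algorithm

At its core, HPA uses a simple formula:

desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))

Example:

Current state:
  - Replicas: 2
  - CPU utilization: 180% (average across all replicas)
  - Target CPU utilization: 90%

Calculation:
  desiredReplicas = ceil(2 × 180 / 90) = ceil(4) = 4

Result: HPA scales the workload from 2 to 4 replicas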
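
The formula can be checked with a few lines of shell. This is a minimal sketch, not the controller's code: the function name `desired_replicas` is made up for illustration, and it also applies the controller's default 10% tolerance (the `--horizontal-pod-autoscaler-tolerance` setting of kube-controller-manager), which the formula above leaves out.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_VALUE TARGET_VALUE [TOLERANCE]
# Hypothetical helper: prints ceil(replicas * current / target), or the
# unchanged replica count when the ratio is within the tolerance band.
desired_replicas() {
  awk -v n="$1" -v cur="$2" -v tgt="$3" -v tol="${4:-0.1}" 'BEGIN {
    ratio = cur / tgt
    d = ratio - 1; if (d < 0) d = -d
    if (d <= tol) { print n; exit }     # inside the tolerance band: no change
    want = n * ratio
    r = int(want); if (r < want) r++     # ceil for positive numbers
    print r
  }'
}

desired_replicas 2 180 90   # the example above → 4
desired_replicas 2 105 95   # ratio ≈ 1.105, just outside the band → 3
desired_replicas 2 100 95   # ratio ≈ 1.05, inside the band → 2
```

The second call shows why a reading of 105% against a 95% target is enough to add a replica, while 100% against the same target is not.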

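3.2 Scaling behavior parameters

spec:
  behavior:                        # supported since K8S 1.18
    scaleDown:                     # scale-in behavior
      stabilizationWindowSeconds: 300   # scale-in stabilization window: act on the highest recommendation from the last 5 minutes
      policies:
      - type: Percent
        value: 10                  # remove at most 10% of replicas...
        periodSeconds: 60          # ...within any 60-second period
    scaleUp:                       # scale-out behavior
      stabilizationWindowSeconds: 0     # scale out immediately (no waiting)
      policies:
      - type: Percent
        value: 100                 # add at most 100% of replicas (i.e. double)...
        periodSeconds: 15          # ...within any 15-second period
      - type: Pods
        value: 4                   # or add at most 4 Pods
        periodSeconds: 15
      selectPolicy: Max            # use whichever policy allows the larger change

Default behavior: without a behavior block, scale-out has no stabilization window and may add up to 100% of the current replicas or 4 Pods (whichever is greater) every 15 seconds; scale-in waits for a 5-minute stabilization window and may then remove up to 100% of the replicas every 15 seconds.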
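
To see how the `Percent` and `Pods` policies combine under `selectPolicy`, the sketch below computes the highest replica count a scale-out may reach within one period. It is a simplified model under stated assumptions: the helper name `scale_up_limit` is hypothetical, and it ignores replicas already added earlier in the same period, which the real controller takes into account.

```shell
# scale_up_limit CURRENT_REPLICAS PERCENT PODS [Max|Min]
# Hypothetical helper: prints the replica ceiling for one policy period.
scale_up_limit() {
  awk -v n="$1" -v pct="$2" -v pods="$3" -v sel="${4:-Max}" 'BEGIN {
    by_pct = n * (1 + pct / 100)
    c = int(by_pct); if (c < by_pct) c++   # round the percentage limit up
    by_pods = n + pods
    if (sel == "Min") print (c < by_pods ? c : by_pods)
    else              print (c > by_pods ? c : by_pods)
  }'
}

scale_up_limit 2 100 4       # Percent allows 4, Pods allows 6 → 6
scale_up_limit 10 100 4      # Percent allows 20, Pods allows 14 → 20
scale_up_limit 10 100 4 Min  # the more conservative policy wins → 14
```

With small replica counts the `Pods` policy dominates, and with large ones the `Percent` policy does, which is why the two are often configured together.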

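3.3 HPA workflow

                metrics-server
                     │
                     ▼
          ┌────────────────────┐
          │   HPA controller   │ ←── queries the Metrics API periodically (default 15s)
          │ (evaluates metrics)│
          └──────────┬─────────┘
                     │
          ┌──────────┴─────────┐
          │ current vs target? │
          └──────────┬─────────┘
           above /       \ below
          scale out     scale in
               │            │
               ▼            ▼
     adjust the Deployment replica count
               │
               ▼
     Deployment controller
     creates/deletes Pods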

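4. HPA in practice: CPU metric

4.1 Prerequisites

Hard requirement for HPA: the Pod must set resources.requests.cpu, otherwise HPA cannot compute CPU utilization.

# requests.cpu is required!
resources:
  requests:
    cpu: 200m          # ← required
    memory: 300Mi

Why? CPU utilization is computed as actual usage / requests. Without requests there is no denominator, so no percentage can be computed.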
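
As a worked illustration of that ratio, the sketch below averages several Pods by dividing total usage by total requests, using integer math on millicores. The helper name `cpu_utilization` is hypothetical, and it assumes every Pod has the same request.

```shell
# cpu_utilization REQUEST_MILLICORES USAGE_MILLICORES...
# Hypothetical helper: prints total usage / total requests as a percentage.
cpu_utilization() (
  request="$1"; shift
  total=0; count=0
  for u in "$@"; do
    total=$((total + u))
    count=$((count + 1))
  done
  echo $(( total * 100 / (request * count) ))
)

cpu_utilization 200 210 210   # two Pods at 210m against a 200m request → 105
cpu_utilization 200 500 300   # 800m used against 400m requested → 200
```

Because the denominator is the request and not the limit, utilization can exceed 100% whenever limits are higher than requests, as with the 200m request and 500m limit used in this article.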

4.2 Full YAML

# 01-deploy-hpa.yaml
---
# Deployment: the workload managed by the HPA
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-stress
spec:
  replicas: 1                     # initial replica count (HPA adjusts it automatically)
  selector:
    matchLabels:
      app: stress
  template:
    metadata:
      labels:
        app: stress
    spec:
      containers:
      - image: harbor250.test.com/oldboyedu-tools/stress:v0.1
        name: c1
        args: [tail, -f, /etc/hosts]
        resources:
          requests:
            cpu: 200m              # ⚠️ denominator for HPA's CPU utilization
            memory: 300Mi
          limits:
            cpu: 500m
            memory: 500Mi

---
# HPA: the autoscaling rule
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: stress-hpa
spec:
  maxReplicas: 5                  # upper bound on replicas
  minReplicas: 2                  # lower bound on replicas
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deploy-stress           # the Deployment to scale
  targetCPUUtilizationPercentage: 95   # target CPU utilization

4.3 Deploy and observe

kubectl apply -f 01-deploy-hpa.yaml

# Initial state: the Deployment starts with 1 Pod; HPA will raise it to minReplicas=2
kubectl get deploy,hpa,po
NAME                            READY   UP-TO-DATE   AVAILABLE
deployment.apps/deploy-stress   1/1     1            1

NAME                                             REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   <unknown>/95%   2   5   0

# Once metrics-server has collected data
kubectl get deploy,hpa,po
NAME                            READY   UP-TO-DATE   AVAILABLE
deployment.apps/deploy-stress   2/2     2            2

NAME                                             REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   0%/95%    2   5   2

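Note: right after the HPA is created, TARGETS shows <unknown> until metrics-server completes its first collection (usually 30-60 seconds). Because the current replica count (1) is below minReplicas (2), HPA scales the Deployment to 2.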

4.4 Load test: triggering scale-out

# Run a CPU stress test inside a Pod
kubectl exec deploy-stress-xxx-f9rff -- stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 10m
stress: info: [7] dispatching hogs: 8 cpu, 4 io, 2 vm, 0 hdd

# Watch HPA scale out
kubectl get deploy,hpa,po
# CPU reaches 105%, above the 95% target, which triggers scale-out
NAME                                             REFERENCE                  TARGETS    MINPODS   MAXPODS   REPLICAS
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   105%/95%   2         5         2
# ↑ TARGETS is above the threshold; REPLICAS is still 2; MAXPODS caps it at 5
# After a short wait, the Deployment grows to 3 Pods
NAME                            READY   UP-TO-DATE   AVAILABLE
deployment.apps/deploy-stress   3/3     3            3

# Add more load (stress several Pods at once)
kubectl exec deploy-stress-xxx-rzgsm -- stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 10m
kubectl exec deploy-stress-xxx-zxgp6 -- stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 10m

# Scale-out continues until maxReplicas=5
NAME                            READY   UP-TO-DATE   AVAILABLE
deployment.apps/deploy-stress   5/5     5            5

NAME                                             REFERENCE                  TARGETS    MINPODS   MAXPODS   REPLICAS
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   200%/95%   2         5         5
# ↑ Load stays high; REPLICAS has reached the cap

4.5 Stop the load: observe automatic scale-in

# Stop every stress process (delete the Pods and let the Deployment recreate them)
# Note: kubectl rejects --all combined with a label selector, so only -l is used
kubectl delete pods -l app=stress

# Once CPU drops, watch HPA scale in
kubectl get hpa -w
NAME                                             REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   200%/95%   2         5         5
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   50%/95%    2         5         5
# Scale-in starts after the stabilization window (5 minutes by default)
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   0%/95%     2         5         4
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   0%/95%     2         5         3
horizontalpodautoscaler.autoscaling/stress-hpa   Deployment/deploy-stress   0%/95%     2         5         2
# Finally back at minReplicas

Note the scale-in delay: HPA applies a stabilization window to scale-in (5 minutes by default) so that fluctuating metrics do not cause repeated scale-in and scale-out ("flapping"). Scale-out takes effect immediately.
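
One way to picture the stabilization window: the controller keeps the replica counts it recommended during the window and, for scale-in, acts on the highest of them. The sketch below models only that selection step; the helper name `stabilized_replicas` is made up, and the lists of recommendations are illustrative values, not captured output.

```shell
# stabilized_replicas RECOMMENDATION...
# Hypothetical helper: prints the highest recommendation still inside the window.
stabilized_replicas() (
  max=0
  for r in "$@"; do
    if [ "$r" -gt "$max" ]; then max="$r"; fi
  done
  echo "$max"
)

stabilized_replicas 5 5 3 2 2   # a "5" from early in the window still counts → 5
stabilized_replicas 2 2 2       # once the old values age out → 2
```

This is why the replica count stays at its peak for several minutes after the load stops, even though the current metric already reads 0%.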

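5. HPA in practice: memory metric

5.1 autoscaling/v1 vs autoscaling/v2

API version	Supported metrics	Usage
`autoscaling/v1`	CPU only	`targetCPUUtilizationPercentage`
`autoscaling/v2`	CPU + memory + custom metrics	`metrics` list
`autoscaling/v2beta2`	Same as v2 (its predecessor; deprecated in 1.23, removed in 1.26)	Same as v2

A memory-based HPA must use autoscaling/v2; v1 does not support it.

5.2 Full YAML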

# 02-deploy-hpa-memory.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-xiuxian-memory
spec:
  replicas: 1
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      containers:
      - name: c1
        image: registry.cn-hangzhou.aliyuncs.com/eci_open/nginx:latest
        resources:
          requests:
            memory: 100Mi           # ⚠️ baseline for computing memory utilization
            cpu: 100m

---
# HPA based on autoscaling/v2 (supports the memory metric)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-memory
spec:
  minReplicas: 2
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deploy-xiuxian-memory
  metrics:
  - type: Resource
    resource:
      name: memory                # use the memory metric
      target:
        type: Utilization
        averageUtilization: 60    # target memory utilization: 60%

5.3 Verifying memory-based scaling

kubectl apply -f 02-deploy-hpa-memory.yaml

# Watch the HPA (after starting the load test)
kubectl get hpa -w
NAME                                             REFERENCE                   TARGETS    MINPODS   MAXPODS   REPLICAS
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   48%/60%    2         5         2
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   93%/60%    2         5         2
# ↑ Memory reaches 93%, above the 60% target, which triggers scale-out
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   49%/60%    2         5         4
# ↑ With 4 replicas, memory utilization drops
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   140%/60%   2         5         4
# ↑ Load keeps increasing
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   71%/60%    2         5         5
# ↑ maxReplicas reached
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   93%/60%    2         5         5
# After the load test stops, memory gradually drops
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   3%/60%     2         5         5
# Scale-in follows the 5-minute stabilization window
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   3%/60%     2         5         4
horizontalpodautoscaler.autoscaling/hpa-memory   Deployment/deploy-xiuxian   2%/60%     2         5         2

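6. HPA in practice: CPU + memory together

In production it is advisable to configure both CPU and memory metrics; scale-out is triggered as soon as either one exceeds its target.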
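
When a single HPA lists several metrics, the controller computes a desired replica count for each metric and uses the largest. The sketch below reproduces just that step; the helper name `multi_metric_replicas` and the `current/target` argument format are made up for illustration, and the tolerance band is ignored for brevity.

```shell
# multi_metric_replicas CURRENT_REPLICAS CURRENT/TARGET...
# Hypothetical helper: prints the largest per-metric desired replica count.
multi_metric_replicas() (
  n="$1"; shift
  printf '%s\n' "$@" | awk -F/ -v n="$n" '
    { want = n * $1 / $2
      r = int(want); if (r < want) r++   # ceil for positive numbers
      if (r > best) best = r }
    END { print best }'
)

# CPU at 40% (target 80%) alone would suggest 1 replica,
# memory at 135% (target 90%) suggests 3; the larger wins.
multi_metric_replicas 2 40/80 135/90   # → 3
multi_metric_replicas 2 190/95 45/90   # CPU drives the decision → 4
```

Taking the maximum means a quiet metric can never pull the replica count below what a busy metric needs.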

6.1 Full YAML

# 03-deploy-hpa-xiuxian.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-xiuxian
spec:
  replicas: 1
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      containers:
      - name: c1
        image: registry.cn-hangzhou.aliyuncs.com/eci_open/nginx:latest
        resources:
          requests:
            memory: 100Mi
            cpu: 100m

---
# HPA with both metrics (autoscaling/v2).
# Both metrics live in ONE HPA: two HPA objects that target the same
# Deployment would compete over its replica count.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-cpu-memory
spec:
  minReplicas: 2
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deploy-xiuxian
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 95
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 90

---
# Service: exposes the workload for load testing
apiVersion: v1
kind: Service
metadata:
  name: svc-xiuxian
spec:
  type: LoadBalancer
  selector:
    apps: xiuxian
  ports:
  - port: 80

6.2 Load testing with ab

# Install ab
apt -y install apache2-utils

# Send concurrent requests to the Service IP
ab -n 1000000 -c 100 http://10.0.0.151/

# Watch the HPA react
kubectl get hpa -w

6.3 Creating an HPA imperatively (command line)

Besides writing YAML, you can create an HPA quickly with a single command:

# Create an HPA with one command
kubectl autoscale deployment deploy-stress \
  --min=2 \
  --max=5 \
  --cpu-percent=95

# Print the equivalent YAML without creating anything
kubectl autoscale deployment deploy-stress \
  --min=2 --max=5 --cpu-percent=95 \
  -o yaml --dry-run=client

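7. HPA best practices for production

7.1 Configuration checklist

Before relying on HPA, make sure all of the following hold:

Check	Requirement	How to verify
metrics-server is deployed	Pod Running	`kubectl get pods -n kube-system -l k8s-app=metrics-server`
Metrics API is available	`kubectl top` returns data	`kubectl top nodes`
Pods set requests	`resources.requests.cpu` must be set (and `resources.requests.memory` for memory targets)	`kubectl describe deploy <name>`
Pods set limits	Recommended (keeps one Pod from exhausting node resources)	Same as above
minReplicas is sensible	Avoid 1 (single point of failure)	Review the HPA YAML
maxReplicas is sensible	Within what the cluster can actually schedule	`kubectl describe nodes`

7.2 Suggested target thresholds

Metric	Suggested target	Notes
CPU utilization	70%-80%	Leaves headroom for bursts without being so low that resources are wasted
Memory utilization	75%-85%	Exceeding the memory limit gets the container OOM-killed, which is more dangerous than CPU throttling, so keep clear headroom
Custom metrics	Driven by the business SLA	For example, set the QPS target to 70% of what a single Pod can handle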
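
The 70% rule of thumb for custom metrics also gives a quick way to sanity-check maxReplicas against an expected peak. The sketch below is back-of-the-envelope arithmetic only; the helper name `replicas_for_peak` and the sample numbers (5000 QPS peak, 400 QPS per Pod) are hypothetical.

```shell
# replicas_for_peak PEAK_QPS POD_CAPACITY_QPS [TARGET_PERCENT]
# Hypothetical helper: replicas needed so each Pod runs at TARGET_PERCENT of capacity.
replicas_for_peak() {
  awk -v peak="$1" -v cap="$2" -v pct="${3:-70}" 'BEGIN {
    per_pod = cap * pct / 100          # load each Pod is meant to carry
    want = peak / per_pod
    r = int(want); if (r < want) r++   # ceil for positive numbers
    print r
  }'
}

replicas_for_peak 5000 400       # 280 QPS per Pod at 70% → 18
replicas_for_peak 5000 400 100   # no headroom at all → 13
```

If the first number is above maxReplicas, or above what the nodes can schedule, the HPA will hit its cap before the peak is covered.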

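7.3 Recommended behavior configuration

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300     # scale-in stabilization window: 5 minutes
      policies:
      - type: Pods
        value: 1                          # remove at most 1 Pod per period (conservative scale-in)
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0       # scale out without waiting
      policies:
      - type: Pods
        value: 2                          # add at most 2 Pods per period
        periodSeconds: 15
      - type: Percent
        value: 50                         # or add at most 50% per period
        periodSeconds: 15
      selectPolicy: Max                    # use whichever policy allows the larger change

Design principle: scale out fast, scale in slowly. Fast scale-out keeps the service available; slow scale-in keeps fluctuating metrics from causing flapping.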
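
To get a feel for what "fast out, slow in" means with these numbers, the sketch below estimates how long the policies need to move between two replica counts. It is a rough lower bound under stated assumptions: the helper names are hypothetical, the policy values (2 Pods or 50% per 15 seconds out, 1 Pod per 60 seconds in) are hard-coded from the configuration above, and Pod startup time and the 300-second scale-in window are not included.

```shell
# scale_up_seconds FROM TO: periods of 15s, each allowing max(+2 Pods, +50%).
scale_up_seconds() {
  awk -v n="$1" -v goal="$2" 'BEGIN {
    n += 0; goal += 0; t = 0            # force numeric comparison
    while (n < goal) {
      by_pct = n * 1.5
      c = int(by_pct); if (c < by_pct) c++
      by_pods = n + 2
      n = (c > by_pods ? c : by_pods)
      t += 15
    }
    print t
  }'
}

# scale_down_seconds FROM TO: one Pod removed per 60-second period.
scale_down_seconds() {
  echo $(( ($1 - $2) * 60 ))
}

scale_up_seconds 2 10     # 2 → 4 → 6 → 9 → 10 (capped) → 60
scale_down_seconds 10 2   # eight single-Pod steps → 480
```

Going from 2 to 10 replicas takes about a minute of policy time, while the way back takes eight minutes on top of the stabilization window.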

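7.4 When HPA is not a good fit

Scenario	Reason	Alternative
Stateful services (databases)	Pods cannot be added or removed freely	Manual management / VPA
Batch jobs	Not long-running services	Job / CronJob
Second-level reaction needed	HPA evaluates every 15 seconds by default, and new Pods take time to start	Custom scripts + pre-provisioned buffer capacity
Single-Pod services	Cannot scale horizontally	VPA / vertical scaling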

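8. Troubleshooting guide

8.1 Common problems

Symptom	Cause	Fix
TARGETS shows `<unknown>`	metrics-server has not collected data	Check that metrics-server is Running
HPA does not scale out	requests.cpu is not set	Add `resources.requests.cpu` to the Pod
HPA does not scale in	Still inside the stabilization window	Wait 5 minutes or adjust stabilizationWindowSeconds
Pods keep scaling to maxReplicas	Target is too low or requests are too small	Raise the target or increase requests
`kubectl top` fails	metrics-server deployment problem	Check the `--kubelet-insecure-tls` flag

8.2 Troubleshooting command cheat sheet

# Show detailed HPA events
kubectl describe hpa <hpa-name>

# Keep watching HPA changes
kubectl get hpa -w

# Check Pod resource usage
kubectl top pods

# Check remaining node resources
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"

# Check the replica count of the Deployment managed by the HPA
kubectl get deploy <deploy-name>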
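
When reading `kubectl get hpa` output, the TARGETS column already tells you which row of the table above applies. The sketch below turns a TARGETS value into a short hint; the helper name `diagnose_targets` and the hint texts are made up for illustration, and it assumes percentage-style targets and the default 10% tolerance.

```shell
# diagnose_targets 'CURRENT%/TARGET%'
# Hypothetical helper: classifies one TARGETS value from `kubectl get hpa`.
diagnose_targets() (
  case "$1" in
    "<unknown>"*)
      echo "no data: check metrics-server and resources.requests"
      exit 0 ;;
  esac
  current="${1%%\%*}"                 # text before the first %
  target="${1#*/}"; target="${target%\%}"
  if [ $((current * 100)) -gt $((target * 110)) ]; then
    echo "above target: expect scale-out"
  elif [ $((current * 100)) -lt $((target * 90)) ]; then
    echo "below target: expect scale-in after the stabilization window"
  else
    echo "within tolerance: no change expected"
  fi
)

diagnose_targets '<unknown>/95%'
diagnose_targets '105%/95%'
diagnose_targets '100%/95%'
diagnose_targets '0%/95%'
```

The arguments are single-quoted so the shell does not interpret `<` as a redirection.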

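9. Frequently asked interview questions

Question	Key points
What is the HPA scaling formula?	`desiredReplicas = ceil(currentReplicas × currentValue / targetValue)`
What are the prerequisites for HPA?	metrics-server must be deployed, and Pods must set resources.requests.cpu
How do autoscaling/v1 and v2 differ?	v1 supports CPU only; v2 supports CPU + memory + custom metrics
Why is scale-out fast and scale-in slow?	Scale-out has no stabilization window (protects the service); scale-in has a 5-minute stabilization window (prevents flapping)
Can HPA manage a StatefulSet?	Yes, but it is not recommended (stateful services are a poor fit for horizontal scaling)
How is CPU utilization calculated?	`actual usage / requests.cpu × 100%`
Can VPA and HPA be used together?	Not on the same workload for the same metric (CPU or memory); they would conflict
How do metrics-server and Prometheus differ?	metrics-server serves only current data (no history); Prometheus stores history and supports alerting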

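10. Summary

This article covered K8S HPA autoscaling from deployment to hands-on use:

Prerequisite: metrics-server is the foundation of HPA; without it HPA has no metric data
CPU-based HPA: the most common strategy; autoscaling/v1 is enough, and Pods must set requests.cpu
Memory-based HPA: requires autoscaling/v2, and Pods must set requests.memory
Dual-metric HPA: recommended for production; configure CPU + memory, and either one exceeding its target triggers scale-out
behavior configuration: the core principle is "scale out fast, scale in slowly"

Production advice:

Configure an HPA for every service that needs elastic scaling

Set minReplicas to at least 2 to avoid a single point of failure

Set both requests and limits and keep them in a sensible ratio (for example requests:limits = 1:2)

Verify scaling behavior with load tests before going live

Monitor the HPA TARGETS values to make sure decisions rest on accurate data

Environment:

Kubernetes: v1.23.17
metrics-server: v0.7.2
Deployment method: kubeadm
CNI: Flannel / Calico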