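IT策士: more than 10 years of hands-on experience at major tech companies, focused on IT thinking, architecture, and career growth. I publish new articles regularly on multiple platforms to help you avoid detours.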
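In Part 44, we moved the basic architecture of the Flask + Redis counter app onto Kubernetes: Redis with persistent storage, a three-replica Flask Deployment, ConfigMap and Secret for configuration, and Services for in-cluster service discovery. The app now runs reliably inside the cluster.
But "it runs in the cluster" is still a long way from "production-ready". External users can't reach it yet; it won't scale out automatically when traffic spikes; upgrading to a new image version is still a manual job; and when something breaks, there is no monitoring or logging to tell you why.
In this article we take the last step of the migration: an Ingress for external access, HPA for autoscaling, a verified rolling update, and Prometheus monitoring plus Loki logging, bringing this running example up to production grade.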
1. Recap: where Part 44 left off
When Part 44 wrapped up, the cluster contained these resources:
```bash
$ kubectl get all
NAME                                    READY   STATUS    RESTARTS   AGE
pod/flask-deployment-xxxxxxxxx-xxxxx    1/1     Running   0          10m
pod/flask-deployment-xxxxxxxxx-yyyyy    1/1     Running   0          10m
pod/flask-deployment-xxxxxxxxx-zzzzz    1/1     Running   0          10m
pod/redis-xxxxxxxxx-xxxxx               1/1     Running   0          15m

NAME                    TYPE        CLUSTER-IP     PORT(S)    AGE
service/flask-service   ClusterIP   10.96.200.80   5000/TCP   10m
service/redis-service   ClusterIP   10.96.100.50   6379/TCP   15m

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flask-deployment   3/3     3            3           10m
deployment.apps/redis              1/1     1            1           15m
```
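Internal communication works and the counter increments correctly. But flask-service is only a ClusterIP Service, so it can't be reached from outside the cluster. That is the first problem we'll solve.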
2. Configuring Ingress: letting external traffic in
2.1 Make sure the Ingress Controller is deployed
```bash
minikube addons enable ingress
# 🌟  The 'ingress' addon is enabled

kubectl get pods -n ingress-nginx
# NAME                                        READY   STATUS    RESTARTS   AGE
# ingress-nginx-controller-7c8b6f9d5f-abcde   1/1     Running   0          30s
```
2.2 Create the Ingress resource
```yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: counter.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flask-service
                port:
                  number: 5000
```
```bash
kubectl apply -f ingress.yaml
# ingress.networking.k8s.io/flask-ingress created

kubectl get ingress
# NAME            CLASS   HOSTS           ADDRESS        PORTS   AGE
# flask-ingress   nginx   counter.local   192.168.49.2   80      10s
```
2.3 Verify external access
```bash
# Get the Minikube IP
MINIKUBE_IP=$(minikube ip)

# Access by hostname (setting the Host header by hand)
curl -H "Host: counter.local" http://$MINIKUBE_IP
# Hello World! I have been seen 1 times.
curl -H "Host: counter.local" http://$MINIKUBE_IP
# Hello World! I have been seen 2 times.
```
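External traffic now flows all the way through Ingress → flask-service → Flask Pods. The rewrite-target: / annotation rewrites every matched request path to /, so the Flask app always receives the root path. With a single path: / rule this changes nothing in practice, but keep it in mind if you later route more paths (such as /metrics) through this Ingress: they would all be rewritten to / as well.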
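The routing rule boils down to two checks: the Host header must equal the rule's host, and among the paths whose prefix matches, the longest one wins. With pathType: Prefix, matching is done element by element on /-separated segments, so /api matches /api and /api/v1 but not /apis. The toy Python model below illustrates those semantics only; it is not how ingress-nginx is implemented, and the /api rule and api-service backend are hypothetical additions for the example.

```python
from typing import Optional, Tuple

# Hypothetical rule table: (host, path prefix) -> (service name, port).
# Only the counter.local "/" rule comes from ingress.yaml; "/api" is made up.
RULES = {
    ("counter.local", "/"): ("flask-service", 5000),
    ("counter.local", "/api"): ("api-service", 8080),
}


def prefix_matches(prefix: str, path: str) -> bool:
    """pathType: Prefix compares '/'-separated path elements, not raw strings."""
    if prefix == "/":
        return True
    wanted = [part for part in prefix.split("/") if part]
    actual = [part for part in path.split("/") if part]
    return actual[: len(wanted)] == wanted


def route(host: str, path: str) -> Optional[Tuple[str, int]]:
    """Pick the backend for a request: exact host match, then longest matching prefix."""
    candidates = [
        (prefix, backend)
        for (rule_host, prefix), backend in RULES.items()
        if rule_host == host and prefix_matches(prefix, path)
    ]
    if not candidates:
        return None  # no rule matched: the controller's default backend answers
    return max(candidates, key=lambda candidate: len(candidate[0]))[1]


print(route("counter.local", "/"))
print(route("counter.local", "/api/v1"))
print(route("counter.local", "/apis"))
print(route("other.local", "/"))
```

This is also why the curl commands above must send Host: counter.local: a request to the bare IP carries a different Host header and matches no rule.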
3. Enabling HPA: automatic elastic scaling
3.1 Make sure Metrics Server is enabled
```bash
minikube addons enable metrics-server
# 🌟  The 'metrics-server' addon is enabled

kubectl top pods
# NAME                               CPU(cores)   MEMORY(bytes)
# flask-deployment-xxxxxxxxx-xxxxx   25m          95Mi
```
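3.2 Create the HPA
The HPA resource adjusts the Deployment's replica count automatically based on CPU utilization. Utilization is measured as a percentage of each container's CPU request, so this only works because the Flask containers declare resources.requests.cpu (as configured earlier in the series); without a request, the HPA target shows <unknown>.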
```yaml
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
```bash
kubectl apply -f hpa.yaml
# horizontalpodautoscaler.autoscaling/flask-hpa created

kubectl get hpa
# NAME        REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
# flask-hpa   Deployment/flask-deployment   5%/50%    2         10        3          30s
```
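Right now CPU utilization is only 5% and there are still 3 replicas. With utilization this far below the target, the HPA will eventually scale down toward minReplicas (2), but scale-downs only take effect after a stabilization window (5 minutes by default), so immediately after creation you still see 3. If load rises and CPU goes above 50%, the HPA adds replicas automatically.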
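The Kubernetes documentation gives the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), where CPU utilization is total usage as a percentage of total CPU requests, and changes within a tolerance (0.1 by default) are skipped. The Python sketch below reproduces only that arithmetic, clamped to minReplicas/maxReplicas; it ignores stabilization windows, scaling policies, and not-ready Pods, and the usage and request figures in the example are hypothetical.

```python
import math


def cpu_utilization(usages_m, requests_m):
    """Average CPU utilization (%) of a set of Pods: total usage over total requests."""
    return 100 * sum(usages_m) / sum(requests_m)


def desired_replicas(current, utilization, target, min_replicas, max_replicas, tolerance=0.1):
    """Core HPA formula: ceil(current * utilization / target), clamped to [min, max]."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to the target: leave the Deployment alone
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical: 3 Pods, each requesting 200m CPU and each using 240m under load.
util = cpu_utilization([240, 240, 240], [200, 200, 200])
print(util)
print(desired_replicas(3, util, 50, 2, 10))
print(desired_replicas(3, 5, 50, 2, 10))
```

At 120% on 3 replicas the formula gives ceil(3 × 2.4) = 8; at 5% it wants only ceil(0.3) = 1 replica, which is clamped up to minReplicas = 2.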
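3.3 Load test: verifying that HPA triggers and scales
In one terminal, send a sustained stream of concurrent requests to the app to simulate a traffic peak: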
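```bash
# Install Apache Bench if needed
# macOS: ab ships with the system
# Ubuntu: sudo apt-get install -y apache2-utils

# Send 10,000 requests with a concurrency of 100
# (if the run ends before the HPA reacts, raise -n or use -t <seconds> to run for a fixed time)
ab -n 10000 -c 100 -H "Host: counter.local" http://$(minikube ip)/
```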
In a second terminal, keep an eye on the HPA status and the replica count:
```bash
kubectl get hpa -w
# NAME        REFERENCE                     TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
# flask-hpa   Deployment/flask-deployment   5%/50%     2         10        3          1m
# flask-hpa   Deployment/flask-deployment   120%/50%   2         10        3          1m
# flask-hpa   Deployment/flask-deployment   120%/50%   2         10        6          2m
# flask-hpa   Deployment/flask-deployment   65%/50%    2         10        8          3m
# flask-hpa   Deployment/flask-deployment   40%/50%    2         10        8          4m
```
And watch the Pod list change in real time:
```bash
kubectl get pods -l app=flask-counter -w
# flask-deployment-xxxxxxxxx-xxxxx   1/1   Running             0   10m
# flask-deployment-xxxxxxxxx-yyyyy   1/1   Running             0   10m
# flask-deployment-xxxxxxxxx-zzzzz   1/1   Running             0   10m
# flask-deployment-xxxxxxxxx-aaaaa   0/1   ContainerCreating   0   2s
# flask-deployment-xxxxxxxxx-bbbbb   0/1   ContainerCreating   0   2s
# flask-deployment-xxxxxxxxx-ccccc   0/1   ContainerCreating   0   2s
# flask-deployment-xxxxxxxxx-aaaaa   1/1   Running             0   15s
# flask-deployment-xxxxxxxxx-bbbbb   1/1   Running             0   15s
# flask-deployment-xxxxxxxxx-ccccc   1/1   Running             0   15s
```
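The scaling sequence: the load test starts → CPU jumps to 120% → the HPA sees it is above the 50% target → it scales from 3 to 6 replicas, then to 8 → with the load spread across more Pods, CPU falls to 65% → as the load test winds down, it drops to 40%. This is the core value of elastic scaling in K8s: during traffic peaks the app automatically gets more Pods to share the load, with no manual intervention.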
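Why does the last line of the log still show 8 replicas at 40%? At that reading the formula alone would suggest ceil(8 × 0.8) = 7, but by default the HPA only scales down to the highest replica count it recommended during the last 300 seconds (the downscale stabilization window; scale-ups have no window by default). A minimal Python sketch of that "max over a sliding window" rule, with hypothetical timestamps:

```python
from collections import deque


class DownscaleStabilizer:
    """Return the highest replica recommendation seen within a sliding time window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp, recommended replicas), oldest first

    def stabilize(self, now, recommendation):
        self.history.append((now, recommendation))
        # Forget recommendations older than the window (the newest entry always stays).
        while self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        # A higher recommendation wins immediately (scale up);
        # a lower one only wins once every higher entry has aged out (scale down).
        return max(replicas for _, replicas in self.history)


s = DownscaleStabilizer()
print(s.stabilize(0, 8))    # under load: recommend 8
print(s.stabilize(60, 7))   # load down to 40%: the formula says 7, but 8 is kept
print(s.stabilize(400, 2))  # quiet for longer than the window: scale down to 2
```

The window trades a few minutes of extra Pods for stability: without it, a brief lull in traffic would remove Pods that are needed again seconds later.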
```bash
# View the HPA's events
kubectl describe hpa flask-hpa
```
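The Events section of the output records each scaling decision (a SuccessfulRescale event with the new size and the reason, such as CPU utilization above target), so you can reconstruct when and why the HPA scaled the Deployment.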
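4. Rolling updates: zero-downtime releases
4.1 Build the v4.0 image
Edit app.py and change the return value to a v4.0 message so the versions are easy to tell apart: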
```python
# app.py: the return statement of the request handler (v4.0)
return f'Hello K8s! You are visitor #{count}.\n'
```
Build the new image:
```bash
docker build -t flask-redis-counter:4.0 .
```
Key lines of the build output:

```text
[+] Building 32.5s (15/15) FINISHED
 => => naming to docker.io/library/flask-redis-counter:4.0
```

Then compare the sizes of the v3.0 and v4.0 images:

```bash
docker images | grep flask-redis-counter
# flask-redis-counter   4.0   b2c3d4e5f6a7   138MB
# flask-redis-counter   3.0   a1b2c3d4e5f6   138MB
```
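Both images are the same size (138MB). That is expected: only one line of app.py changed, so the base image and dependency layers are identical (and reused from the build cache), and the tiny difference in the application-code layer doesn't show at this precision.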
Then load the image into Minikube so the cluster can use it without a registry:

```bash
minikube image load flask-redis-counter:4.0
```
4.2 Trigger the rolling update
```bash
kubectl set image deployment/flask-deployment flask=flask-redis-counter:4.0
# deployment.apps/flask-deployment image updated
```
4.3 Watch the rolling update in real time
```bash
kubectl rollout status deployment/flask-deployment
# Waiting for deployment "flask-deployment" rollout to finish: 1 out of 3 new replicas have been updated...
# Waiting for deployment "flask-deployment" rollout to finish: 2 out of 3 new replicas have been updated...
# deployment "flask-deployment" successfully rolled out
```
At the same time, watch the Pods being replaced step by step:
```bash
kubectl get pods -l app=flask-counter -w
# flask-deployment-xxxxxxxxx-xxxxx   1/1   Running             0   15m
# flask-deployment-xxxxxxxxx-yyyyy   1/1   Running             0   15m
# flask-deployment-xxxxxxxxx-zzzzz   1/1   Running             0   15m
# flask-deployment-yyyyyyyy-aaaaa    0/1   ContainerCreating   0   2s
# flask-deployment-yyyyyyyy-aaaaa    1/1   Running             0   15s
# flask-deployment-xxxxxxxxx-xxxxx   1/1   Terminating         0   0s
# flask-deployment-yyyyyyyy-bbbbb    0/1   ContainerCreating   0   2s
# flask-deployment-yyyyyyyy-bbbbb    1/1   Running             0   15s
# flask-deployment-xxxxxxxxx-yyyyy   1/1   Terminating         0   0s
# flask-deployment-yyyyyyyy-ccccc    0/1   ContainerCreating   0   2s
# flask-deployment-yyyyyyyy-ccccc    1/1   Running             0   15s
# flask-deployment-xxxxxxxxx-zzzzz   1/1   Terminating         0   0s
```
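Old and new Pods are swapped one at a time: a new Pod is created first, and an old Pod is terminated only after the new one is ready. Throughout the update there are always enough Pods serving requests, which is what makes the release zero-downtime. Note that "ready" only means something if the container defines a readinessProbe; without one, a Pod counts as ready as soon as its container starts, even if the app inside is not yet accepting requests.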
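The one-in, one-out pattern follows from the Deployment's default RollingUpdate settings, maxSurge: 25% and maxUnavailable: 25%. Percentages are converted to Pod counts with maxSurge rounded up and maxUnavailable rounded down, so 3 replicas give a surge of 1 and 0 unavailable: at most 4 Pods in total, and never fewer than 3 available. A small Python sketch of that conversion (the function names are ours, not Kubernetes API names):

```python
import math


def to_count(value, replicas, round_up):
    """Convert an int-or-percent setting (e.g. "25%" or 1) into a number of Pods."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    scaled = percent * replicas / 100
    return math.ceil(scaled) if round_up else math.floor(scaled)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Upper and lower Pod-count bounds a RollingUpdate respects."""
    surge = to_count(max_surge, replicas, round_up=True)
    unavailable = to_count(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Kubernetes falls back to 1 unavailable so the rollout can still make progress.
        unavailable = 1
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(3))   # the 3-replica Deployment in this article
print(rollout_bounds(10))  # a larger, hypothetical Deployment
```

For 10 replicas the same defaults allow 3 extra Pods and 2 unavailable ones, so several Pods are replaced in parallel; with only 3 replicas the rounding forces the strict one-at-a-time sequence seen above.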
4.4 Verify the update
```bash
curl -H "Host: counter.local" http://$(minikube ip)
# Hello K8s! You are visitor #156.   ← the new v4.0 message; the counter keeps counting
```
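The image went from 3.0 to 4.0, yet the counter in Redis is intact. That is because the rolling update only replaced the stateless Flask Pods; the Redis Pod holding the count was never touched. The PVC from Part 44 is what protects the data when the Redis Pod itself is restarted or rescheduled.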
4.5 Simulate a rollback
```bash
# View the revision history
kubectl rollout history deployment/flask-deployment
# REVISION  CHANGE-CAUSE
# 1         <none>
# 2         <none>

# Roll back to the previous revision
kubectl rollout undo deployment/flask-deployment
# deployment.apps/flask-deployment rolled back

curl -H "Host: counter.local" http://$(minikube ip)
# Hello World! I have been seen 157 times.   ← back to the v3.0 message
```
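From triggering the update to rolling it back, no downtime was needed at any point, and user requests kept getting normal responses.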
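One detail worth knowing: rollout undo does not rewind the history. It reuses the ReplicaSet of the previous revision and gives it the next revision number, so after the undo above, rollout history lists revisions 2 and 3 rather than 1 and 2. The toy Python model below mimics that bookkeeping; it is not the controller's code, and it reduces each pod template to just its image tag.

```python
class RevisionHistory:
    """Toy model of Deployment revision bookkeeping (one ReplicaSet per pod template)."""

    def __init__(self):
        self.revisions = {}  # pod template (here just the image) -> revision number

    def current(self):
        return max(self.revisions, key=self.revisions.get)

    def apply(self, image):
        """Like kubectl set image: reuse a matching ReplicaSet or create a new one;
        either way it becomes the newest revision."""
        self.revisions[image] = max(self.revisions.values(), default=0) + 1

    def undo(self):
        """Like kubectl rollout undo: return to the second-newest revision."""
        ordered = sorted(self.revisions, key=self.revisions.get)
        if len(ordered) < 2:
            raise RuntimeError("no previous revision to roll back to")
        self.apply(ordered[-2])

    def history(self):
        return sorted(self.revisions.values())


h = RevisionHistory()
h.apply("flask-redis-counter:3.0")  # revision 1
h.apply("flask-redis-counter:4.0")  # revision 2
h.undo()                            # back to 3.0, renumbered as revision 3
print(h.history(), h.current())
```

As a consequence, running undo twice toggles between the last two versions instead of walking further back; use --to-revision to target an older one. The CHANGE-CAUSE column stays <none> unless the Deployment carries the kubernetes.io/change-cause annotation (for example, set with kubectl annotate).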
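5. Integrating Prometheus monitoring
5.1 Let Prometheus scrape the Flask metrics endpoint
In Part 41 we already added a Prometheus metrics endpoint, /metrics, to the Flask app. Here a ServiceMonitor (assuming the Prometheus Operator, e.g. kube-prometheus-stack, runs in the monitoring namespace) lets Prometheus discover it automatically. A ServiceMonitor selects Services, not Pods, so flask-service must carry the label app: flask-counter and give its port the name http. And because the ServiceMonitor lives in the monitoring namespace, it needs a namespaceSelector pointing at the namespace where the app runs (default, assuming the earlier commands without -n ran there):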
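```yaml
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flask-monitor
  namespace: monitoring
  labels:
    # Assumption: kube-prometheus-stack installed as Helm release "monitoring";
    # by default its Prometheus only picks up ServiceMonitors with this label.
    release: monitoring
spec:
  namespaceSelector:
    matchNames:
      - default            # namespace where flask-service lives
  selector:
    matchLabels:
      app: flask-counter   # must match flask-service's labels
  endpoints:
    - port: http           # the *name* of the Service port (5000)
      path: /metrics
      interval: 15s
```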
```bash
kubectl apply -f servicemonitor.yaml
```
5.2 View the Grafana dashboard
After importing a Flask dashboard in Grafana (reachable via kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80, then http://localhost:3000), you can see:
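*Grafana dashboard: the flask_http_request_total counter grows steadily and linearly, with QPS holding at around 120. The node_cpu_seconds_total panel shows cluster node CPU usage fluctuating between 15% and 65%. The container_memory_usage_bytes panel shows Flask Pod memory usage steady between 100MiB and 128MiB, in line with the Requests/Limits we set.*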
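A QPS figure like this comes from applying rate() to a counter such as flask_http_request_total, typically with a query along the lines of sum(rate(flask_http_request_total[1m])) (an assumed query; the imported dashboard's panels may differ). The simplified Python sketch below shows the core idea: add up the increases between samples, treat any drop as a counter reset (for example, a Pod restart), and divide by the elapsed time. PromQL's real rate() also extrapolates to the window edges, which this sketch omits.

```python
def simple_rate(samples):
    """Per-second rate of a Prometheus-style counter.

    samples: list of (unix_seconds, counter_value) pairs, sorted by time.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset (e.g. the Pod restarted): it started again from 0.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical 15-second scrapes of one Pod's request counter.
print(simple_rate([(0, 0), (15, 1800), (30, 3600)]))
print(simple_rate([(0, 1000), (15, 2800), (30, 900)]))  # reset between scrapes
```

This is why dashboards graph rate() rather than the raw counter: the raw value only ever climbs (and drops to zero on restarts), while its rate is the request throughput you actually care about.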
6. Integrating Loki logging
6.1 Make sure the Loki stack is deployed
```bash
kubectl get pods -n logging
# NAME             READY   STATUS    RESTARTS   AGE
# loki-0           1/1     Running   0          1m
# promtail-xxxxx   1/1     Running   0          1m
```
6.2 Query logs in Grafana
In Grafana Explore, query with LogQL:
```logql
{app="flask-counter"} |= "ERROR"
```
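The matching lines let you quickly pin down when a failing request happened, where it came from (source IP), and the error details, to the extent the app writes that information into its logs.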
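6.3 Monitoring and alerting working together
Once the Pod-restart alert rule configured in Prometheus is active, Prometheus fires the alert whenever its condition holds, and Alertmanager delivers the notification. To try it out, make the Flask container crash repeatedly (CrashLoopBackOff), for example by overriding its command with one that exits immediately: the restart count climbs and the alert fires. Switching the Deployment to a nonexistent image tag does not work for this rule, because with ImagePullBackOff the container never starts and its restart count stays at 0; to catch that case, alert on kube-state-metrics' kube_pod_container_status_waiting_reason{reason="ImagePullBackOff"} instead. A notification looks like this:
*Slack notification example: Alert: PodFrequentlyRestarting, Severity: warning, Pod flask-deployment-xxxxxxxx-xxxxx has restarted 3 times in the last 5 minutes.*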
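Prometheus checks an alert rule on every evaluation cycle: while the expression is true the alert is pending, and once it has stayed true for the rule's `for` duration it fires and is sent to Alertmanager, which routes it to Slack. The Python sketch below models that state machine for a condition like "3 or more restarts in the last 5 minutes" (mirroring the Slack message); the 2-minute `for` value, the evaluation times, and the sample data are assumptions, and the restart count is computed as a plain difference rather than with PromQL's increase().

```python
def restarts_in_window(samples, now, window=300):
    """Increase of a restart counter over the last `window` seconds (no resets assumed)."""
    recent = [value for ts, value in samples if now - window <= ts <= now]
    return recent[-1] - recent[0] if len(recent) >= 2 else 0


def evaluate(samples, eval_times, threshold=3, for_seconds=120):
    """Return the alert state (inactive/pending/firing) at each evaluation time."""
    states, pending_since = [], None
    for now in eval_times:
        if restarts_in_window(samples, now) >= threshold:
            if pending_since is None:
                pending_since = now  # condition just became true
            states.append("firing" if now - pending_since >= for_seconds else "pending")
        else:
            pending_since = None  # condition cleared: reset the timer
            states.append("inactive")
    return states


# Hypothetical restart counter of one crash-looping Pod, scraped every 30 s.
restarts = [(t, min(t // 60, 5)) for t in range(0, 600, 30)]
print(evaluate(restarts, eval_times=[60, 180, 240, 300, 360]))
```

The `for` clause is what keeps a single blip from paging anyone: the condition has to hold across consecutive evaluations before a notification goes out.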
7. The complete migration: before and after
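8. Technology evolution roadmap
From Part 1 of this series to now, the full path we have followed looks like this:
- Single-host containerization (Docker): package the app as an image, "build once, run anywhere".
- Single-host orchestration (Compose): manage multi-container apps with declarative YAML and start or stop everything with one command.
- Cluster orchestration (K8s core objects): use Deployments, Services, ConfigMaps, PVCs, and other objects to lift the app to cross-node, declarative, self-healing, cluster-level management.
- Production-grade operations (monitoring, logging, CI/CD, security): add Prometheus monitoring, Loki logging, HPA autoscaling, and NetworkPolicy isolation so the app has the observability and security a production environment needs.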
9. Command cheat sheet
10. Summary of this article
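- External access: an Ingress provides hostname-based routing; external traffic enters the cluster through the Ingress Controller and is dispatched to the right Service based on the Host header.
- Autoscaling: deployed an HPA with a 50% CPU target and used an `ab` load test to watch the replica count grow automatically from 3 to 8.
- Zero-downtime updates and rollbacks: triggered a rolling update with `kubectl set image` and watched old and new Pods swap one at a time, confirming that service continues during the update and that `rollout undo` rolls back with a single command.
- Closing the observability loop: connected the app to Prometheus (metrics) and Loki (aggregated logs), so metrics and logs can be analyzed side by side in Grafana. Prometheus alert rules, routed through Alertmanager, send notifications automatically when Pods restart abnormally.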
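Next up, Part 46: CI/CD integration with GitOps and ArgoCD. We'll automate this whole deployment flow: a code push automatically builds the image and deploys it to K8s, giving true end-to-end continuous delivery from Git to the cluster.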
To learn more, search for 「IT策士」 on your favorite platform, and let's level up our IT thinking together!