K8s Ingress Configuration Pitfalls: Load-Balancing Best Practices for 500+ Concurrent Connections in Production

> Tags: Kubernetes | Ingress | Nginx | Helm3 | Canary Release

> Author: 洛水石

> Updated: May 9, 2025


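Foreword

After migrating our company's core business systems to a Kubernetes cluster, we ran into three stubborn problems: ingress traffic management was chaotic, certificates were awkward to manage, and there was no canary release process. After more than six months of hardening in production, we arrived at a complete Ingress solution. This article explains how to build production-grade traffic management with Nginx Ingress Controller and Helm3.

**What you will take away:**

  • Understand the complete K8s Ingress architecture

  • Deploy and manage the Ingress Controller with Helm3

  • Automate TLS certificate management

  • Implement weight-based canary releases

  • Avoid common performance bottlenecks at 500+ concurrent connections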


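1. Ingress Architecture Explained

1.1 The Kubernetes Traffic Entry Stack

In Kubernetes, Pods are where applications ultimately run, but Pod IPs are assigned dynamically and cannot serve as stable service addresses. A Service provides a stable virtual IP and load balancing across Pods, but a ClusterIP Service is reachable only from inside the cluster. To expose services to external users, Ingress acts as the unified traffic entry point.

[Figure: overall K8s Ingress architecture]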

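**Core Ingress components:**

| Component | Role | Description |
|--------------------|----------|-------------------------|
| Ingress Resource | Declarative routing rules | Maps hosts and paths to Services |
| Ingress Controller | Enforces the rules | Watches Ingress resources and reconfigures the load balancer dynamically |
| Service | Backend routing target | ClusterIP type, reachable only inside the cluster |
| Endpoints | List of Pod IPs | The Pods that actually handle requests |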
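To make the chain in the table concrete, here is a minimal sketch that follows an Ingress to its backend Services and then to the Pod IPs behind them. The helper name `trace_ingress` is hypothetical (not part of kubectl), and it assumes `kubectl` is already configured for the cluster.

```shell
# trace_ingress NAMESPACE INGRESS
# Print "service -> pod IPs" for every backend Service referenced by the Ingress.
# (Hypothetical helper for illustration; it only wraps standard kubectl queries.)
trace_ingress() {
  ns=$1 ing=$2
  # Ingress Resource: which Services do the rules point at?
  services=$(kubectl get ingress "$ing" -n "$ns" \
    -o jsonpath='{.spec.rules[*].http.paths[*].backend.service.name}')
  for svc in $services; do
    # Endpoints: which Pod IPs currently back that Service?
    ips=$(kubectl get endpoints "$svc" -n "$ns" \
      -o jsonpath='{.subsets[*].addresses[*].ip}')
    echo "$svc -> ${ips:-<no ready endpoints>}"
  done
}
```

For example, `trace_ingress default api-ingress` lists each backend Service of `api-ingress` with its ready Pod IPs. A Service with no IPs usually means its selector matches no ready Pod.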

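1.2 Choosing an Ingress Controller

A comparison of Ingress Controllers commonly used in production:

| Feature | Nginx Ingress Controller | Traefik | Kong |
|-------|--------------------------|--------------|-----------|
| Performance | High | Medium | High |
| Community activity | Very active | Active | Active |
| Commercial support | F5 NGINX (for its own controller; ingress-nginx is community-maintained) | Traefik Labs (formerly Containous) | Kong Inc. |
| Protocol support | HTTP/TCP/UDP | HTTP/TCP/UDP | HTTP/gRPC/TCP/UDP |
| Canary release | Native (annotations) | Native | Via plugins |
| Learning curve | Low | Low | High |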

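**Why we chose Nginx Ingress Controller:**

  • Its configuration syntax matches the Nginx we already run in production, which makes troubleshooting easier

  • The official documentation is thorough and community resources are plentiful

  • It supports TCP/UDP passthrough, which suits non-HTTP workloads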


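2. Environment Preparation and Pre-flight Checks

2.1 Cluster Requirements

```bash
# Check the Kubernetes version (>= 1.19 is required for networking.k8s.io/v1 Ingress).
# The --short flag was removed in kubectl 1.28; plain `kubectl version` is already concise.
kubectl version

# Check cluster status
kubectl cluster-info

# Check node status
kubectl get nodes -o wide

# Confirm a StorageClass is available (needed for persistent volumes such as the
# Prometheus data volume in section 7.3; certificates themselves are stored in Secrets)
kubectl get storageclass
```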
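The version requirement above can be checked in a script rather than by eye. The sketch below is one way to do it, assuming GNU `sort -V` is available. `version_ge` is a hypothetical helper name, and it expects the first argument to have at least as many version components as the second.

```shell
# version_ge A B: succeed when version A >= version B.
# A leading "v" (as in "v1.28.3") is ignored. Relies on `sort -V` (version sort).
version_ge() {
  a=${1#v} b=${2#v}
  # If B sorts first (or equal), then A >= B.
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# Example: fail fast when the cluster is too old for networking.k8s.io/v1 Ingress.
# The server version is passed in by hand here; reading it from the cluster is left out.
if version_ge "v1.28.3" "1.19"; then
  echo "Kubernetes version is new enough"
else
  echo "Kubernetes >= 1.19 is required" >&2
fi
```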

**Our production environment:**

  • Kubernetes version: 1.28.x

  • Nodes: 3 × master + 5 × worker

  • Network plugin: Calico

  • StorageClass: NFS CSI

2.2 Installing Helm3

```bash
# Download and install Helm3
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Verify the installation
helm version

# Add the repositories used in this article
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo update

# List the configured repositories
helm repo list
```


3. Deploying Nginx Ingress Controller

3.1 Quick Deployment with Helm3

```bash
# Create a dedicated namespace
kubectl create namespace ingress-nginx

# Add the Jetstack repository for cert-manager (a no-op if already added in 2.2)
helm repo add jetstack https://charts.jetstack.io && helm repo update

# Deploy Nginx Ingress Controller
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --set controller.publishService.enabled=true \
  --set controller.service.externalTrafficPolicy=Local \
  --set controller.ingressClassResource.name=nginx \
  --set controller.ingressClassResource.enabled=true \
  --set controller.replicaCount=2 \
  --set controller.resources.requests.cpu=500m \
  --set controller.resources.requests.memory=512Mi \
  --set controller.resources.limits.cpu=2000m \
  --set controller.resources.limits.memory=1Gi \
  --timeout 5m \
  --wait
```

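3.2 Verifying the Deployment

```bash
# Check Pod status
kubectl get pods -n ingress-nginx -o wide

# Check the Service
kubectl get svc -n ingress-nginx

# List IngressClasses
kubectl get ingressclass

# Test the generated Nginx configuration.
# The chart labels controller Pods with app.kubernetes.io/* labels, not app=ingress-nginx.
kubectl exec -n ingress-nginx \
  "$(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}')" \
  -- nginx -t
```

**Expected output:**

```text
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
```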

3.3 High-Availability Configuration for Production

```yaml
# ingress-nginx-values.yaml
controller:
  replicaCount: 3

  # High availability
  service:
    externalTrafficPolicy: Local   # preserve the client source IP
    healthCheckNodePort: 0         # 0 lets Kubernetes allocate the health-check NodePort
    annotations:                   # AWS-specific; drop these on other platforms
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"

  # Performance
  resources:
    requests:
      cpu: "1000m"
      memory: "1Gi"
    limits:
      cpu: "4000m"
      memory: "2Gi"

  # Connection tuning (for 500+ concurrent connections)
  config:
    proxy-body-size: "50m"
    proxy-connect-timeout: "10"
    proxy-send-timeout: "60"
    proxy-read-timeout: "60"
    keep-alive: "100"
    keep-alive-requests: "10000"
    upstream-keepalive-connections: "1000"
    upstream-keepalive-timeout: "60"
    upstream-keepalive-requests: "10000"

  # Graceful shutdown
  lifecycle:
    preStop:
      exec:
        command:
          - /wait-shutdown

  # Probes
  readinessProbe:
    httpGet:
      path: /healthz
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    failureThreshold: 5
  livenessProbe:
    httpGet:
      path: /healthz
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 15
    periodSeconds: 20
    failureThreshold: 3

# PodSecurityPolicy was removed in Kubernetes 1.25, so it must stay disabled on a
# 1.28.x cluster; use Pod Security Admission for pod-level security instead.
podSecurityPolicy:
  enabled: false
```

**Apply the production configuration:**

```bash
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  -f ingress-nginx-values.yaml \
  --timeout 10m \
  --wait
```


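4. TLS Certificate Configuration

4.1 Automation with cert-manager

Use Let's Encrypt to issue and renew certificates automatically:

```bash
# Install cert-manager
kubectl create namespace cert-manager

# crds.enabled=true installs the CRDs with the chart
# (older chart versions use the deprecated installCRDs=true instead)
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --set crds.enabled=true \
  --timeout 5m \
  --wait

# Verify the installation
kubectl get pods -n cert-manager
```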

4.2 ClusterIssuer Configuration

```yaml
# cluster-issuer-letsencrypt.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
      # Optional: DNS-01 validation (required for wildcard certificates)
      # - dns01:
      #     cloudflare:
      #       apiTokenSecretRef:
      #         name: cloudflare-api-token
      #         key: api-token
```

```bash
kubectl apply -f cluster-issuer-letsencrypt.yaml
```

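4.3 Ingress TLS Configuration

```yaml
# api-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: default
  annotations:
    # Let cert-manager issue the certificate automatically
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # Redirect HTTP to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    # Force HSTS. Snippet annotations are disabled by default in recent
    # ingress-nginx releases and require controller.allowSnippetAnnotations=true.
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    # X-Real-IP, X-Forwarded-For and X-Forwarded-Proto are already set by
    # ingress-nginx. Extra proxy headers belong in a ConfigMap referenced by the
    # controller's proxy-set-headers option, not in an Ingress annotation.
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com        # placeholder domain, replace with your own
      secretName: api-tls-secret
  rules:
    - host: api.example.com      # placeholder domain, replace with your own
      http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080
          - path: /health
            pathType: Exact
            backend:
              service:
                name: health-service
                port:
                  number: 8080
```

```bash
kubectl apply -f api-ingress.yaml

# Inspect the certificate status
kubectl describe certificate api-tls-secret

# Verify the certificate is Ready
kubectl get certificate api-tls-secret -o wide
```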
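Beyond checking that the certificate is Ready, it can help to watch how many days remain before expiry. The sketch below reads the `status.notAfter` field that cert-manager records on the Certificate resource. `cert_days_left` is a hypothetical helper name, and the date parsing assumes GNU `date`.

```shell
# cert_days_left NAME NAMESPACE [NOW_EPOCH]
# Print the whole days remaining until the Certificate's status.notAfter.
# NOW_EPOCH is optional and mainly useful for testing; it defaults to the current time.
cert_days_left() {
  not_after=$(kubectl get certificate "$1" -n "$2" -o jsonpath='{.status.notAfter}')
  if [ -z "$not_after" ]; then
    echo "certificate $1 has no notAfter yet (not issued?)" >&2
    return 1
  fi
  now=${3:-$(date -u +%s)}
  expiry=$(date -u -d "$not_after" +%s) || return 1   # GNU date parses RFC 3339
  echo $(( (expiry - now) / 86400 ))
}
```

For example, `cert_days_left api-tls-secret default` prints the remaining days. With cert-manager's default renewal window, this value should not fall to zero for a healthy certificate.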

4.4 Wildcard (Multi-Subdomain) Certificate Configuration

```yaml
# wildcard-ingress.yaml
# Let's Encrypt issues wildcard certificates only through DNS-01 validation,
# so the dns01 solver in the ClusterIssuer must be enabled for this to work.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wildcard-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - "*.example.com"
      secretName: wildcard-tls-secret
  rules:
    - host: "*.example.com"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: default-backend
                port:
                  number: 80
```


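5. Advanced Helm3 Usage

5.1 Chart Development Best Practices

Create a custom Ingress chart:

```bash
# Create the chart skeleton
helm create my-ingress-chart
```

Directory layout:

```text
my-ingress-chart/
├── Chart.yaml            # chart metadata
├── values.yaml           # default configuration
├── values.schema.json    # configuration validation
├── templates/            # template directory
│   ├── NOTES.txt
│   ├── _helpers.tpl      # shared template helpers
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ingress.yaml
└── charts/               # dependency charts
```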

5.2 values.yaml Configuration

```yaml
# my-ingress-chart/values.yaml
replicaCount: 3

image:
  repository: myregistry/myapp
  tag: "v1.0.0"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 8080

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: myapp.example.com      # placeholder domain, replace with your own
      paths:
        - path: /
          pathType: Prefix
          service: myapp-service
          port: 8080
  tls:
    - secretName: myapp-tls-secret
      hosts:
        - myapp.example.com        # placeholder domain, replace with your own

resources:
  limits:
    cpu: "1000m"
    memory: "512Mi"
  requests:
    cpu: "100m"
    memory: "128Mi"

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

# Canary release settings
canary:
  enabled: false
  weight: 10
  annotation: "nginx.ingress.kubernetes.io/canary-weight"
```

5.3 Writing the Template

The template renders a single Ingress and loops over the hosts inside `rules`. Looping around the whole resource would emit several Ingress objects with the same name, and would change `.` so that `.Values` is no longer reachable.

```yaml
{{- /* my-ingress-chart/templates/ingress.yaml */ -}}
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "my-ingress-chart.fullname" . -}}
{{- $svcPort := .Values.service.port -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ $fullName }}
  labels:
    {{- include "my-ingress-chart.labels" . | nindent 4 }}
  {{- with .Values.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  ingressClassName: {{ .Values.ingress.className }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- range .Values.ingress.tls }}
    - hosts:
        {{- range .hosts }}
        - {{ . | quote }}
        {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            pathType: {{ .pathType | default "Prefix" }}
            backend:
              service:
                name: {{ .service }}
                port:
                  number: {{ .port | default $svcPort }}
          {{- end }}
    {{- end }}
{{- end }}
```

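5.4 Common Helm Operations

```bash
# Package a local chart
helm package ./my-ingress-chart

# Render templates locally (nothing is installed)
helm template my-release ./my-ingress-chart

# Debug mode: a server-side dry run validates the manifests against the
# API server without installing anything
helm install my-release ./my-ingress-chart \
  --dry-run=server \
  --debug

# Atomic upgrade (rolls back automatically on failure)
helm upgrade my-release ./my-ingress-chart \
  --atomic \
  --timeout 5m

# Show release history
helm history my-release

# Roll back to a specific revision
helm rollback my-release 1

# Uninstall and wait until all resources are deleted (foreground cascading deletion)
helm uninstall my-release --wait --cascade=foreground
```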


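6. Canary Releases in Practice

6.1 Canary Release Architecture

[Figure: Helm3 workflow]

**Canary release strategies:**

  1. **Weight-based**: split traffic by percentage, for example 90% to stable and 10% to canary

  2. **Header-based**: route on a specific request header, such as `X-Canary: true`

  3. **Cookie-based**: a cookie such as `Canary=true` selects the canary version

  4. **Service-version-based**: route according to the client version number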
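When several of these strategies are configured on the same canary Ingress, ingress-nginx evaluates them in a fixed order: header first, then cookie, then weight. The sketch below models that precedence as a plain function so the outcome of a combination is easy to reason about. It is a simplified model, not the controller's code: `route_request` is a hypothetical name, header and cookie values are reduced to `always`, `never`, or "anything else", and the random draw is passed in as a number.

```shell
# route_request HEADER_VALUE COOKIE_VALUE WEIGHT ROLL
#   HEADER_VALUE / COOKIE_VALUE: "always", "never", or anything else (no decision)
#   WEIGHT: the canary-weight percentage (0-100)
#   ROLL:   a number in 0-99 standing in for the per-request random draw
# Prints "canary" or "stable".
route_request() {
  header=$1 cookie=$2 weight=$3 roll=$4
  # 1. canary-by-header wins when it gives a definite answer
  case "$header" in
    always) echo canary; return ;;
    never)  echo stable; return ;;
  esac
  # 2. canary-by-cookie is consulted next
  case "$cookie" in
    always) echo canary; return ;;
    never)  echo stable; return ;;
  esac
  # 3. otherwise the weight decides
  if [ "$roll" -lt "$weight" ]; then echo canary; else echo stable; fi
}
```

For instance, `route_request never always 100 0` prints `stable`: a `never` header overrides both the cookie and a 100% weight.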

6.2 Weight-Based Canary Release

**Step 1: Deploy the stable version**

```yaml
# stable-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
  labels:
    app: myapp
    version: stable
spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
        version: v1.0.0
    spec:
      containers:
        - name: myapp
          image: myregistry/myapp:v1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
```

**Step 2: Deploy the canary version**

```yaml
# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
  labels:
    app: myapp
    version: canary
spec:
  replicas: 2   # fewer replicas than stable
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
        version: v1.1.0
    spec:
      containers:
        - name: myapp
          image: myregistry/myapp:v1.1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```

**Step 3: Create the Services**

```yaml
# services.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-stable-svc
  labels:
    app: myapp
    track: stable
spec:
  selector:
    app: myapp      # include app so other workloads labelled track=stable are not selected
    track: stable
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-canary-svc
  labels:
    app: myapp
    track: canary
spec:
  selector:
    app: myapp
    track: canary
  ports:
    - port: 8080
      targetPort: 8080
```

**Step 4: Configure the Ingress canary rules**

```yaml
# canary-ingress.yaml
---
# Main (stable) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-ingress
  annotations:
    # Consistent hashing by client IP on the stable route
    nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr"
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-stable-svc
                port:
                  number: 8080
---
# Canary Ingress, weight-based. Its host and path must match the main Ingress.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-route
  annotations:
    # Enable canary routing
    nginx.ingress.kubernetes.io/canary: "true"
    # Send 10% of traffic to canary
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary-svc
                port:
                  number: 8080
```
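After applying a weight, it is worth confirming that the observed split is close to the configured one. The sketch below counts what share of sampled responses came from the canary. `canary_share` is a hypothetical helper, and it assumes each backend can report its version as one line of text, for example from a hypothetical `/version` endpoint that is not part of the manifests above.

```shell
# canary_share VERSION
# Read one version string per line on stdin and print the percentage of lines
# equal to VERSION (integer, rounded down).
canary_share() {
  awk -v want="$1" '
    { total++; if ($0 == want) hits++ }
    END {
      if (total == 0) { print 0 }
      else            { print int(hits * 100 / total) }
    }'
}

# Usage sketch (the /version endpoint and <your-host> are assumptions):
#   for i in $(seq 1 200); do curl -s "https://<your-host>/version"; echo; done \
#     | canary_share v1.1.0
```

With a weight of 10, a sample of a few hundred requests should land near 10, with some random variation. A result of 0 or 100 usually points to a canary Ingress whose host or path does not match the main Ingress.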

6.3 Header-Based Canary Release

```yaml
# header-canary-ingress.yaml
# ingress-nginx supports only one canary Ingress per main Ingress rule, so use
# this instead of myapp-canary-route, or add these annotations to that Ingress.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-header-canary
  annotations:
    # Requests carrying the header "X-Canary: always" go to canary
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "always"
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary-svc
                port:
                  number: 8080
```

6.4 Managing the Canary Rollout

```bash
# Phase 1: 5% canary (internal testing)
kubectl patch ingress myapp-canary-route \
  -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"5"}}}'

# Watch the canary logs
kubectl logs -l track=canary -f

# Phase 2: 20% canary (beta users)
kubectl patch ingress myapp-canary-route \
  -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"20"}}}'

# Phase 3: 50% canary (wider audience)
kubectl patch ingress myapp-canary-route \
  -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"50"}}}'

# Phase 4: full release (100%)
# Delete the canary Ingress
kubectl delete ingress myapp-canary-route

# Upgrade the stable Deployment
kubectl set image deployment/myapp-stable myapp=myregistry/myapp:v1.1.0

# Once everything is confirmed healthy, remove the canary Deployment and Service
kubectl delete deployment myapp-canary
kubectl delete service myapp-canary-svc
```
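Because each phase repeats the same patch with a different number, a small wrapper can reduce copy-paste mistakes such as a weight above 100. This is a minimal sketch. `canary_patch_json` and `set_canary_weight` are hypothetical helper names, and the patch body is the same JSON used in the phases above.

```shell
# canary_patch_json WEIGHT
# Print the JSON merge patch that sets canary-weight; reject anything outside 0-100.
canary_patch_json() {
  weight=$1
  case "$weight" in
    ''|*[!0-9]*) echo "weight must be a non-negative integer" >&2; return 1 ;;
  esac
  if [ "$weight" -gt 100 ]; then
    echo "weight must be between 0 and 100" >&2
    return 1
  fi
  printf '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"%s"}}}\n' "$weight"
}

# set_canary_weight INGRESS WEIGHT
# Patch the canary Ingress, refusing invalid weights before touching the cluster.
set_canary_weight() {
  patch=$(canary_patch_json "$2") || return 1
  kubectl patch ingress "$1" -p "$patch"
}
```

With this in place, a phase becomes `set_canary_weight myapp-canary-route 20`.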


7. Tuning for 500+ Concurrent Connections in Production

7.1 Kernel Parameter Tuning

```bash
# Run on every node (as root).
# Some of these settings, such as net.core.somaxconn, are per network namespace,
# so node-level values may not apply inside the controller Pod unless it uses hostNetwork.
cat >> /etc/sysctl.conf << 'EOF'
# Nginx Ingress tuning
net.core.somaxconn = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_max_tw_buckets = 5000
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF

sysctl -p
```
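After applying the settings, it is useful to confirm the values that are actually in effect, both on the node and inside the controller Pod, since the two can differ. The sketch below reads single-valued integer settings straight from `/proc/sys`. `sysctl_path` and `check_sysctl` are hypothetical helper names.

```shell
# sysctl_path KEY: map a dotted sysctl key to its /proc/sys path.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# check_sysctl KEY MIN
# Report whether the current value of a single-integer sysctl is at least MIN.
check_sysctl() {
  file=$(sysctl_path "$1")
  if [ ! -r "$file" ]; then
    echo "SKIP $1 (not readable here)"
    return 0
  fi
  have=$(cat "$file")
  if [ "$have" -ge "$2" ]; then
    echo "OK   $1=$have"
  else
    echo "LOW  $1=$have (want >= $2)"
  fi
}

check_sysctl net.core.somaxconn 65535
check_sysctl net.ipv4.tcp_max_syn_backlog 65535
```

Running the same checks through `kubectl exec` in the controller Pod shows whether the Pod's network namespace sees the tuned values.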

7.2 Connection Reuse in the Ingress Controller

```yaml
# Tuned ConfigMap. When the controller is managed by Helm, put these keys under
# controller.config in the values file so a later upgrade does not overwrite them.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Timeouts
  proxy-connect-timeout: "10"
  proxy-send-timeout: "60"
  proxy-read-timeout: "60"

  # Buffers
  proxy-buffering: "on"
  proxy-buffer-size: "16k"
  proxy-buffers-number: "4"
  proxy-busy-buffers-size: "16k"

  # Keep-Alive
  keep-alive: "100"
  keep-alive-requests: "10000"
  upstream-keepalive-connections: "1000"
  upstream-keepalive-timeout: "60"
  upstream-keepalive-requests: "10000"

  # Gzip compression
  use-gzip: "true"
  gzip-level: "6"
  gzip-types: "application/json application/javascript application/xml text/css text/javascript"

  # Log format
  log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] upstream_addr=$upstream_addr upstream_response_length=$upstream_response_length upstream_response_time=$upstream_response_time upstream_status=$upstream_status'
```
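The `key=value` fields at the end of this log format make the access log easy to summarise with standard tools. The sketch below counts requests, upstream 5xx responses, and the average upstream response time. `summarize_upstream` is a hypothetical helper name, and a non-numeric time such as `-` is counted as zero, so treat the average as a rough indicator.

```shell
# summarize_upstream
# Read access-log lines on stdin and print the request count, the number of
# upstream 5xx responses, and the mean upstream_response_time in seconds.
summarize_upstream() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^upstream_response_time=/) { split($i, kv, "="); sum += kv[2]; n++ }
        if ($i ~ /^upstream_status=5/)       { errors++ }
      }
    }
    END {
      if (n == 0) { print "no samples"; exit }
      printf "requests=%d upstream_5xx=%d avg_upstream_time=%.3f\n", n, errors, sum / n
    }'
}

# Usage sketch against the controller logs:
#   kubectl logs -n ingress-nginx deployment/ingress-nginx-controller --tail=1000 \
#     | summarize_upstream
```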

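7.3 Performance Monitoring

```bash
# Add the chart repositories used below
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Deploy Prometheus + Grafana
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring --create-namespace \
  --set alertmanager.enabled=true \
  --set server.persistentVolume.enabled=true \
  --set server.persistentVolume.size=50Gi

helm install grafana grafana/grafana \
  --namespace monitoring \
  --set adminPassword='your-secure-password'

# Expose Nginx Ingress metrics. Metrics are enabled through the chart
# (a controller flag), not through a ConfigMap key.
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.metrics.enabled=true \
  --set-string controller.podAnnotations."prometheus\.io/scrape"="true" \
  --set-string controller.podAnnotations."prometheus\.io/port"="10254"

# Wait for the controller rollout to finish
kubectl rollout status deployment/ingress-nginx-controller -n ingress-nginx
```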


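8. Troubleshooting Common Problems

8.1 Ingress Has No Effect

```bash
# Check the IngressClass
kubectl get ingressclass nginx -o yaml

# Check the controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -f --tail=100

# Check that the backend Service is reachable (the application image must include curl)
kubectl exec -it -n default \
  "$(kubectl get pods -n default -l app=myapp -o jsonpath='{.items[0].metadata.name}')" \
  -- curl -v http://myapp-service:8080/health
```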

8.2 Certificate Issuance Fails

```bash
# Check the cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager -f

# Inspect the Certificate status
kubectl describe certificate api-tls-secret

# Inspect the ACME challenges
kubectl get challenges -A
kubectl describe challenge <challenge-name>
```

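8.3 502/504 Errors

Common causes and fixes:

1. The backend service is not running

```bash
kubectl get pods -l app=myapp
kubectl describe pod <pod-name>
```

2. Health checks are failing

```yaml
# Adjust the probe configuration
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30   # allow more startup time
```

3. Timeouts are set incorrectly

```yaml
# Add to the Ingress. Annotations need the nginx.ingress.kubernetes.io/ prefix;
# without it the controller ignores them.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
```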
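For the first two causes, a quick signal is whether the Service has any ready endpoints at all: a Pod that is not running or fails its readiness probe is removed from the endpoint list. The sketch below counts them. `ready_endpoint_count` is a hypothetical helper name.

```shell
# ready_endpoint_count SERVICE NAMESPACE
# Print how many ready Pod IPs currently back the Service (0 means no backend
# can receive traffic).
ready_endpoint_count() {
  kubectl get endpoints "$1" -n "$2" \
    -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w | tr -d ' '
}
```

For example, `ready_endpoint_count myapp-stable-svc default` printing `0` points to a selector mismatch or failing readiness probes rather than to Ingress timeouts.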


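9. Summary

With the configuration walked through in this article, we resolved several production challenges:

| Problem | Solution | Result |
|----------|------------------------------|------------|
| Chaotic ingress traffic | Nginx Ingress Controller as the single entry point | Better traffic observability |
| Tedious certificate management | Automated issuance with cert-manager | Certificate-expiry incidents dropped to 0 |
| High release risk | Weight-based canary releases | Production failure rate down 80% |
| Bottleneck at 500+ concurrent connections | Kernel parameter and Ingress configuration tuning | QPS up 3× |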

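**Suggested next steps:**

  1. Adopt a GitOps workflow (ArgoCD recommended)

  2. Set up end-to-end tracing for Ingress (Jaeger/OpenTelemetry)

  3. Introduce a service mesh (Istio/Linkerd) for finer-grained traffic management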


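References

  • [Nginx Ingress Controller official documentation](https://kubernetes.github.io/ingress-nginx/)

  • [cert-manager official documentation](https://cert-manager.io/docs/)


*The approach described in this article has been validated in production. If you have better practices, we would be glad to discuss them.*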
