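1. Why We Needed Cloud Native
In 2019, our systems still ran on traditional physical machines. Deploying an application once required:
- Requesting servers (1-2 weeks)
- Configuring the environment (1-2 days)
- Deploying the application (half a day)
- Setting up monitoring (half a day)
When traffic spiked, for example during a promotional campaign, there was no way to scale out in time. Once the campaign ended, the servers sat idle again.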
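Later we started containerizing with Docker. The benefits were immediate:
- Consistent environments: development, testing, and production run the same image
- Startup in seconds: about 100 times faster than bringing up a physical machine
- Elastic scaling: scale out or in within minutes
But that was not enough. We needed more than containerization; we needed cloud native: a complete set of cloud-based architectural principles and practices.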
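2. Containerization in Practice
2.1 Docker Best Practices
dockerfile
# Multi-stage build: keeps the final image small
FROM maven:3.8-openjdk-8 AS builder
WORKDIR /app
# Copy pom.xml first so the dependency layer is cached between builds
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn clean package -DskipTests
# Runtime image: contains only the JRE
FROM eclipse-temurin:8-jre-alpine
WORKDIR /app
# Security: create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Copy the build artifact (assumes the build produces exactly one jar)
COPY --from=builder /app/target/*.jar app.jar
# Set ownership
RUN chown -R appuser:appgroup /app
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s \
  CMD wget -qO- http://localhost:8080/actuator/health || exit 1
# JVM tuning (MaxRAMFraction works on JDK 8; on JDK 10+ use -XX:MaxRAMPercentage instead)
ENV JAVA_OPTS="-XX:+UseG1GC -XX:MaxRAMFraction=2 -XX:+ExitOnOutOfMemoryError"
# exec replaces the shell, so the JVM receives SIGTERM directly and can shut down gracefully
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]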
2.2 Docker Compose for Local Development
yaml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=local
      - SPRING_DATASOURCE_URL=jdbc:mysql://mysql:3306/testdb
      - SPRING_REDIS_HOST=redis
    # This short form only controls start order; it does not wait for MySQL to accept connections
    depends_on:
      - mysql
      - redis
    volumes:
      - ./logs:/app/logs
    networks:
      - app-network
  mysql:
    image: mysql:8.0
    environment:
      - MYSQL_ROOT_PASSWORD=root123
      - MYSQL_DATABASE=testdb
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - app-network
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    networks:
      - app-network
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"
    networks:
      - app-network
volumes:
  mysql-data:
  redis-data:
networks:
  app-network:
    driver: bridge
3. Kubernetes Core Concepts
3.1 Pod Configuration
yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-service-pod
  labels:
    app: order-service
    version: v1
spec:
  # Graceful termination
  terminationGracePeriodSeconds: 60
  # Init container
  initContainers:
    - name: init-db
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          echo "Waiting for database to be ready..."
          until nc -z mysql 3306; do
            sleep 2
          done
          echo "Database is ready!"
  containers:
    - name: order-service
      image: registry.example.com/order-service:v1.2.3
      ports:
        - containerPort: 8080
          name: http
        - containerPort: 9090
          name: grpc
      # Resource requests and limits
      resources:
        requests:
          memory: "512Mi"
          cpu: "250m"
        limits:
          memory: "1Gi"
          cpu: "1000m"
      # Health checks
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        initialDelaySeconds: 60
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 5
        failureThreshold: 3
      # Environment variables
      env:
        - name: SPRING_PROFILES_ACTIVE
          value: "production"
        - name: JAVA_OPTS
          value: "-Xmx768m -Xms512m -XX:+UseG1GC"
      # Volume mounts
      volumeMounts:
        - name: app-logs
          mountPath: /app/logs
        - name: config
          mountPath: /app/config
          readOnly: true
  # Volumes
  volumes:
    - name: app-logs
      emptyDir: {}
    - name: config
      configMap:
        name: order-service-config
  # Anti-affinity: prefer spreading replicas across nodes
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - order-service
            topologyKey: kubernetes.io/hostname
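The Pod above caps the container at 1Gi of memory and the JVM heap at 768 MB through -Xmx768m. As a rough sanity check, the sketch below computes how much of the limit remains for non-heap memory such as metaspace, thread stacks, and direct buffers. The helper names (parse_k8s_memory, parse_jvm_size, non_heap_headroom) are hypothetical, and the parsers only handle the Ki/Mi/Gi and k/m/g suffixes used in this article.

```python
import re

# Binary suffixes used by Kubernetes memory quantities (subset only).
_K8S_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
# JVM -Xmx/-Xms suffixes are also powers of 1024.
_JVM_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_k8s_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '1Gi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _K8S_UNITS[match.group(2)]


def parse_jvm_size(value: str) -> int:
    """Convert a JVM size such as '768m' or '1g' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG])", value)
    if match is None:
        raise ValueError(f"unsupported JVM size: {value!r}")
    return int(match.group(1)) * _JVM_UNITS[match.group(2).lower()]


def non_heap_headroom(limit: str, java_opts: str) -> int:
    """Bytes left under the container limit after the max heap (-Xmx)."""
    xmx = re.search(r"-Xmx(\S+)", java_opts)
    if xmx is None:
        raise ValueError("no -Xmx flag found in JAVA_OPTS")
    return parse_k8s_memory(limit) - parse_jvm_size(xmx.group(1))


headroom = non_heap_headroom("1Gi", "-Xmx768m -Xms512m -XX:+UseG1GC")
print(headroom // 1024**2, "MiB left for non-heap memory")  # 256 MiB
```

If the headroom is too small for the application's non-heap usage, the container can be OOM-killed even though the heap itself never fills up.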
3.2 Deployment Configuration
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  replicas: 3
  # Rolling update strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 Pod above the desired count
      maxUnavailable: 0  # never fall below the desired replica count
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
        version: v1
    spec:
      # Readiness gate used during rolling updates. readinessGates is a Pod-level
      # field, and some controller must set the PrometheusReady condition on each
      # Pod; otherwise the Pod never becomes Ready.
      readinessGates:
        - conditionType: "PrometheusReady"
      containers:
        - name: order-service
          image: registry.example.com/order-service:v1.2.3
          ports:
            - containerPort: 8080
          # CPU requests are required for the HPA Utilization target below
          # (same values as the Pod example in 3.1)
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Custom metric: requires a custom metrics adapter (for example Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
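To make the HPA settings above more concrete, here is a minimal sketch of the replica calculation described in the Kubernetes documentation: for each metric, desired = ceil(currentReplicas * currentValue / targetValue); changes within a tolerance (0.1 by default) are ignored; the largest proposal wins and is clamped to minReplicas and maxReplicas. The function names and the metric readings in the example are assumptions for illustration, and the sketch ignores the behavior policies and stabilization windows.

```python
import math


def desired_for_metric(current_replicas: int, current: float, target: float,
                       tolerance: float = 0.1) -> int:
    """Replica count proposed by a single metric."""
    ratio = current / target
    # Ratios close enough to 1.0 do not trigger scaling.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas: int, metrics, min_replicas: int,
                     max_replicas: int) -> int:
    """metrics is a list of (current_value, target_value) pairs."""
    proposals = [desired_for_metric(current_replicas, cur, tgt)
                 for cur, tgt in metrics]
    # The HPA takes the largest proposal, then clamps it to the bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


# Hypothetical readings: 90% average CPU (target 70) and
# 1500 requests/s per Pod (target 1000), with 3 replicas running.
print(desired_replicas(3, [(90, 70), (1500, 1000)], 3, 10))  # 5
```

In this example CPU alone would ask for 4 replicas and the request rate for 5, so the HPA would aim for 5. The scaleUp policy (100% every 15 seconds) still bounds how fast that number can be reached.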
4. Service Mesh with Istio
yaml
# Istio VirtualService: traffic management
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    # Requests carrying x-canary: "true" always go to v2
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: order-service
            subset: v2
          weight: 100
    # Everything else is split by weight
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10  # canary: 10% of traffic goes to v2
---
# DestinationRule: subsets and circuit breaking
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    outlierDetection:
      consecutiveGatewayErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
  # The v1 and v2 subsets referenced by the VirtualService must be defined here;
  # they select Pods by the version label
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
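The routing rules in the VirtualService are evaluated in order: a request with the header x-canary: "true" matches the first rule and always goes to v2, and everything else falls through to the 90/10 weighted split. The sketch below only imitates that decision logic to show the expected traffic distribution; it is not how Istio implements routing, and pick_subset is a hypothetical name.

```python
import random
from collections import Counter


def pick_subset(headers: dict, rng: random.Random) -> str:
    """Return the subset ('v1' or 'v2') a request would be routed to."""
    # Rule 1: an exact header match sends the request to the canary.
    if headers.get("x-canary") == "true":
        return "v2"
    # Rule 2: all other requests are split 90/10 between v1 and v2.
    return rng.choices(["v1", "v2"], weights=[90, 10], k=1)[0]


rng = random.Random(42)
# A tester who sets the header always reaches v2.
print(pick_subset({"x-canary": "true"}, rng))
# Ordinary traffic: roughly 10% of 10,000 requests should land on v2.
print(Counter(pick_subset({}, rng) for _ in range(10_000)))
```

Keeping the header rule first lets testers reach the new version deterministically while regular users are exposed to it only through the small weighted share.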
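5. Pitfalls We Hit
Pitfall 1: Pod startup order
One of our services needs a database connection, but the database was not fully ready when the Pod started, so startup failed and the Pod went into CrashLoopBackOff.
Fix: use an init container to wait until the dependency is ready: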
yaml
initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c', 'until nc -z mysql 3306; do sleep 2; done']
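The nc loop above waits forever if the database never comes up. As a sketch of the same idea with an upper bound on the wait, the hypothetical helper below retries a TCP connection until a deadline. Note that an open port only shows the server is listening, not that it is ready to serve queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Same check as `nc -z`: open a connection and close it at once.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example usage (host name taken from the manifest above):
#   if not wait_for_port("mysql", 3306, timeout=120):
#       raise SystemExit("database did not become reachable in time")
```

Failing after a bounded wait makes the problem visible in the Pod status instead of leaving the init container stuck indefinitely.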
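Pitfall 2: OOMKilled without an alert
The Pod was OOM-killed, but no alert fired, because the container simply vanished and was restarted.
Fix:
- Set reasonable resource limits
- Add an alert for OOMKilled terminations
- Use a preStop hook for graceful shutdown (this helps with normal termination only; an OOM kill is a SIGKILL, so the hook does not run)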
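Kubernetes records why a container last stopped in the Pod status, under status.containerStatuses[].lastState.terminated.reason. As a minimal sketch of what an OOMKilled check could look at, the code below scans a Pod object (the JSON shape returned by kubectl get pod -o json) for that reason. The function name find_oom_killed and the sample Pod are hypothetical, and wiring the result into an alerting system is left out.

```python
import json


def find_oom_killed(pod: dict) -> list:
    """Return names of containers whose current or last state is OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        for state_key in ("state", "lastState"):
            terminated = (status.get(state_key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append(status["name"])
                break  # report each container at most once
    return hits


# Hypothetical, trimmed-down Pod object for illustration.
SAMPLE_POD = json.loads("""
{
  "metadata": {"name": "order-service-pod"},
  "status": {
    "containerStatuses": [
      {
        "name": "order-service",
        "restartCount": 3,
        "state": {"running": {}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      }
    ]
  }
}
""")

print(find_oom_killed(SAMPLE_POD))  # ['order-service']
```

The container here is running again, which is why the kill is easy to miss: only lastState and the restart count reveal it.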
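6. Summary
Key points of a cloud native architecture:
- Containerization: consistent environments, fast deployment
- Orchestration: automated management with Kubernetes
- Microservices: independent deployment and independent scaling
- Service mesh: traffic management and observability
- Declarative configuration: GitOps, managing infrastructure as code
Cloud native is not the goal; faster delivery is.
Personal opinion, for reference only.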