Introduction
In the previous two installments, FastAPI Advanced and FastAPI Advanced 2, we explored FastAPI's core mechanics, async task processing, gRPC communication, distributed locks, caching strategies, and WebSocket real-time communication, and applied them together in an enterprise-grade order management system.
Now, as business scale keeps growing and cloud-native technology becomes mainstream, developers face a new set of challenges:
How do you containerize a FastAPI application and deploy it to Kubernetes?
How do you implement automatic scaling and load balancing?
How do you build a complete observability stack (monitoring, logging, tracing)?
How do you set up a CI/CD pipeline for continuous delivery?
How do you guarantee high availability and disaster recovery in production?
This article digs into cloud-native architecture and DevOps best practices, taking you from "deploying an application" to "automated operations". We will focus on:
1. Docker containerization - building a consistent application runtime environment
2. Kubernetes deployment - container orchestration and automated management
3. Service mesh - traffic management and security with Istio
4. Observability - unified monitoring, logging, and distributed tracing
5. CI/CD pipelines - automated testing, building, and deployment
6. High-availability architecture - multi-AZ deployment, failure recovery, blue-green deployment
These topics matter because they are the core capabilities behind cloud-native transformation, operational efficiency, and system stability - and they are the skills that separate a junior operator from a DevOps engineer.
I. Docker Containerization
Concepts
Docker is an open-source container engine that lets developers package an application and its dependencies into a lightweight, portable container: "build once, run anywhere".
Why containerize?
Problems with traditional deployment:
Inconsistent environments: differences between dev, test, and production cause "works on my machine"
Dependency conflicts: different applications need different versions of the same library
Complex deployment: environments are configured by hand and mistakes are easy
Wasted resources: each application gets a dedicated VM, so utilization is low
Advantages of containerization:
| Aspect | Traditional deployment | Containerized deployment |
|-------|-----------|---------|
| Environment consistency | Low (environments drift) | High (identical everywhere) |
| Startup time | Minutes | Seconds |
| Resource footprint | High (dedicated VM) | Low (shared kernel) |
| Portability | Poor (tied to one environment) | Good (cross-platform) |
| Scaling | Slow | Fast |
Full Implementation
1. Project structure
fastapi-k8s-app/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   ├── models.py
│   └── dependencies.py
├── tests/
│   ├── __init__.py
│   ├── test_main.py
│   └── conftest.py
├── Dockerfile            # Docker image build file
├── docker-compose.yml    # local development environment
├── .dockerignore         # files excluded from the build context
├── requirements.txt      # Python dependencies
├── kubernetes/
│   ├── deployment.yaml   # Kubernetes deployment
│   ├── service.yaml      # Kubernetes service
│   ├── ingress.yaml      # Kubernetes ingress
│   ├── configmap.yaml    # configuration
│   └── secrets.yaml      # sensitive data
└── scripts/
    ├── entrypoint.sh     # container startup script
    └── healthcheck.sh    # health-check script
2. Dockerfile (multi-stage build)
dockerfile
# ==================== Stage 1: build ====================
FROM python:3.11-slim AS builder
# Working directory
WORKDIR /build
# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    make \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*
# Copy the dependency manifest
COPY requirements.txt .
# Install Python dependencies into the user site-packages
RUN pip install --no-cache-dir --user -r requirements.txt

# ==================== Stage 2: runtime ====================
FROM python:3.11-slim
# Environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PATH="/home/appuser/.local/bin:$PATH"
# Working directory
WORKDIR /app
# Install runtime dependencies only
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libpq5 \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Create a non-root user
RUN useradd -m -u 1000 appuser && \
    chown -R appuser:appuser /app
# Copy installed packages from the build stage into the app user's home
# (copying them to /root/.local would be unreadable after switching to appuser)
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
# Copy application code
COPY --chown=appuser:appuser app ./app
COPY --chown=appuser:appuser scripts ./scripts
COPY --chown=appuser:appuser requirements.txt .
# Make scripts executable
RUN chmod +x scripts/*.sh
# Drop privileges
USER appuser
# Container-level health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
# Expose the application port
EXPOSE 8000
# Entrypoint
ENTRYPOINT ["scripts/entrypoint.sh"]
# Default command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Why multi-stage builds:
1. Smaller images: build tools never reach the final image
2. Better security: a smaller attack surface
3. Faster builds: Docker layer caching is used effectively
4. Separation of concerns: build and runtime environments stay independent
3. .dockerignore
text
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Testing
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# Git
.git/
.gitignore
# Docker
Dockerfile
docker-compose.yml
.dockerignore
# Kubernetes
kubernetes/
# CI/CD
.github/
.gitlab-ci.yml
# Documentation
*.md
docs/
# Logs
*.log
# OS
.DS_Store
Thumbs.db
4. docker-compose.yml (local development)
yaml
version: '3.8'

services:
  # FastAPI application
  api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: fastapi-app
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://appuser:apppass@postgres:5432/fastapi
      - REDIS_URL=redis://redis:6379/0
      - LOG_LEVEL=DEBUG
    volumes:
      - ./app:/app/app:ro  # dev mode: mount the code for hot reload
      - ./logs:/app/logs
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - app-network

  # PostgreSQL database
  postgres:
    image: postgres:15-alpine
    container_name: fastapi-postgres
    environment:
      POSTGRES_DB: fastapi
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: apppass
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - app-network

  # Redis cache
  redis:
    image: redis:7-alpine
    container_name: fastapi-redis
    command: redis-server --appendonly yes
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - app-network

  # Celery worker
  celery-worker:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: fastapi-celery-worker
    command: celery -A app.tasks worker --loglevel=info --concurrency=4
    environment:
      - DATABASE_URL=postgresql://appuser:apppass@postgres:5432/fastapi
      - REDIS_URL=redis://redis:6379/0
    volumes:
      - ./app:/app/app:ro
      - ./logs:/app/logs
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - app-network

  # Celery beat (scheduled tasks)
  celery-beat:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: fastapi-celery-beat
    command: celery -A app.tasks beat --loglevel=info
    environment:
      - DATABASE_URL=postgresql://appuser:apppass@postgres:5432/fastapi
      - REDIS_URL=redis://redis:6379/0
    volumes:
      - ./app:/app/app:ro
    depends_on:
      redis:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - app-network

  # Prometheus monitoring
  prometheus:
    image: prom/prometheus:latest
    container_name: fastapi-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    restart: unless-stopped
    networks:
      - app-network

  # Grafana dashboards
  grafana:
    image: grafana/grafana:latest
    container_name: fastapi-grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    depends_on:
      - prometheus
    restart: unless-stopped
    networks:
      - app-network

volumes:
  postgres-data:
    driver: local
  redis-data:
    driver: local
  prometheus-data:
    driver: local
  grafana-data:
    driver: local

networks:
  app-network:
    driver: bridge
5. Scripts
scripts/entrypoint.sh
bash
#!/bin/bash
set -e
echo "=========================================="
echo "FastAPI application startup script"
echo "=========================================="
# Wait for dependencies to become ready
echo "Waiting for the database..."
until python -c "import psycopg2; psycopg2.connect('${DATABASE_URL}')" 2>/dev/null; do
    echo "Database not ready, retrying in 5 seconds..."
    sleep 5
done
echo "Database is ready!"
echo "Waiting for Redis..."
until python -c "import redis; redis.from_url('${REDIS_URL}').ping()" 2>/dev/null; do
    echo "Redis not ready, retrying in 5 seconds..."
    sleep 5
done
echo "Redis is ready!"
# Run database migrations
echo "Running database migrations..."
if [ -f "alembic.ini" ]; then
    alembic upgrade head
else
    echo "No alembic config found, skipping migrations"
fi
# Create the log directory
mkdir -p logs
# Install dev dependencies if requested
if [ "${ENVIRONMENT:-production}" = "development" ]; then
    echo "Development mode: installing dev dependencies"
    pip install --quiet uvicorn[standard] pytest pytest-asyncio pytest-cov
fi
# Start the application
echo "=========================================="
echo "Starting the FastAPI application..."
echo "Environment: ${ENVIRONMENT:-production}"
echo "Debug mode: ${DEBUG:-false}"
echo "=========================================="
# Hand off to the container command
exec "$@"
scripts/healthcheck.sh
bash
#!/bin/bash
# Health-check script
curl -f http://localhost:8000/health || exit 1
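The "wait until ready" loops in entrypoint.sh follow one pattern: poll a check, sleep, retry, give up after a limit. A minimal Python sketch of that pattern (the names `wait_for` and `flaky_probe` are illustrative, not part of the project):

```python
import time

def wait_for(check, retries=10, delay=0.1):
    """Poll `check` until it returns True or retries run out.

    Mirrors the entrypoint's `until ...; do sleep 5; done` loops:
    `check` is any zero-argument callable, e.g. a DB connection attempt.
    Returns the attempt number that succeeded.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        time.sleep(delay)
    raise TimeoutError(f"dependency not ready after {retries} attempts")

# Example: a probe that succeeds on its third call
calls = {"n": 0}
def flaky_probe():
    calls["n"] += 1
    return calls["n"] >= 3

attempts_needed = wait_for(flaky_probe, retries=5, delay=0.01)
print(attempts_needed)  # 3
```

In the real entrypoint the check is the `psycopg2.connect(...)` one-liner; the helper just makes the retry budget and delay explicit instead of looping forever.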
6. Docker best practices
Image optimization
dockerfile
# 1. Use multi-stage builds to shrink the image
#    (already done in the Dockerfile above)
# 2. Consider an Alpine base image (watch for musl compatibility)
FROM python:3.11-alpine
# 3. On Debian-based images, clean the apt cache in the same layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends libpq5 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# 4. Use .dockerignore to exclude unnecessary files
#    (already done above)
# 5. Merge RUN commands to reduce the number of layers
RUN apt-get update && \
    apt-get install -y libpq5 && \
    pip install --no-cache-dir -r requirements.txt && \
    apt-get clean
Security hardening
dockerfile
# 1. Run as a non-root user
RUN useradd -m -u 1000 appuser
USER appuser
# 2. Expose only the ports you need
EXPOSE 8000
# 3. Use a minimal base image
FROM python:3.11-slim
# 4. Pin a specific base-image version instead of latest, and update it regularly
FROM python:3.11.5-slim
# 5. Scan images for vulnerabilities, e.g. with trivy:
# trivy image fastapi-app:latest
Docker command cheat sheet
bash
# ==================== Building images ====================
# Build an image
docker build -t fastapi-app:latest .
# Build without the cache
docker build --no-cache -t fastapi-app:latest .
# Build from a specific Dockerfile
docker build -f Dockerfile.prod -t fastapi-app:prod .
# ==================== Running containers ====================
# Run a container
docker run -d -p 8000:8000 --name fastapi-container fastapi-app:latest
# Run with a mounted volume
docker run -d -p 8000:8000 -v ./logs:/app/logs fastapi-app:latest
# Run with environment variables
docker run -d -p 8000:8000 -e DEBUG=true fastapi-app:latest
# ==================== Docker Compose ====================
# Start all services
docker-compose up -d
# Stop all services
docker-compose down
# Follow logs
docker-compose logs -f api
# Restart a service
docker-compose restart api
# Rebuild and start
docker-compose up -d --build
# ==================== Container management ====================
# List running containers
docker ps
# List all containers
docker ps -a
# Show container logs
docker logs fastapi-container
# Follow logs
docker logs -f fastapi-container
# Open a shell in the container
docker exec -it fastapi-container /bin/bash
# Stop a container
docker stop fastapi-container
# Remove a container
docker rm fastapi-container
# ==================== Image management ====================
# List local images
docker images
# Remove an image
docker rmi fastapi-app:latest
# Remove unused images
docker image prune -a
# Push an image to a registry
docker tag fastapi-app:latest registry.example.com/fastapi-app:latest
docker push registry.example.com/fastapi-app:latest
# Pull an image
docker pull registry.example.com/fastapi-app:latest
# ==================== Network management ====================
# List networks
docker network ls
# Create a network
docker network create app-network
# Attach a container to a network
docker network connect app-network fastapi-container
# ==================== Volume management ====================
# List volumes
docker volume ls
# Create a volume
docker volume create pg-data
# Remove a volume
docker volume rm pg-data
# ==================== Cleanup ====================
# Remove stopped containers
docker container prune
# Remove unused images
docker image prune -a
# Remove unused volumes
docker volume prune
# Remove unused networks
docker network prune
# Remove all unused resources
docker system prune -a --volumes
II. Kubernetes Deployment
Concepts
Kubernetes (K8s) is an open-source container orchestration platform for automating the deployment, scaling, and management of containerized applications.
Core Kubernetes concepts:
| Concept | Description | Analogy |
|------------|------------------|----------|
| Pod | Smallest deployable unit; one or more containers | A process group on a server |
| Deployment | Manages pod replica count and update strategy | Application manager |
| Service | Stable network access to a set of pods | Load balancer |
| ConfigMap | Stores configuration data | Config file |
| Secret | Stores sensitive data (passwords, keys) | Encrypted config file |
| Ingress | Manages external access rules | Gateway/router |
| Namespace | Resource isolation | Virtual cluster |
| PV/PVC | Persistent storage | Disk partition |
Why Kubernetes?
| Challenge | Docker Compose | Kubernetes |
|-------|----------------|------------------|
| Single-node deployment | ✓ | ✓ |
| Multi-node deployment | ✗ | ✓ |
| Autoscaling | Manual | Automatic |
| Self-healing | Limited | Strong |
| Service discovery | Basic | Full-featured |
| Config management | Environment variables | ConfigMap/Secret |
| Rolling updates | Basic | Full-featured |
Full Implementation
1. Deployment
yaml
# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
  namespace: fastapi
  labels:
    app: fastapi-app
    version: v1
spec:
  # Replica count
  replicas: 3
  # Selector
  selector:
    matchLabels:
      app: fastapi-app
  # Rolling-update strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # at most 25% extra pods during a rollout
      maxUnavailable: 25%  # at most 25% of pods unavailable
  # Pod template
  template:
    metadata:
      labels:
        app: fastapi-app
        version: v1
    spec:
      # Service account
      serviceAccountName: fastapi-app-sa
      # Containers
      containers:
        - name: fastapi-app
          image: fastapi-app:latest
          imagePullPolicy: Always
          # Ports
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
          # Environment variables
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: fastapi-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: fastapi-secrets
                  key: redis-url
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: fastapi-config
                  key: log-level
            - name: ENVIRONMENT
              value: "production"
          # Resource requests and limits
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          # Health checks
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
          # Graceful shutdown
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
      # Anti-affinity: spread pods across nodes where possible
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - fastapi-app
                topologyKey: kubernetes.io/hostname
      # Priority
      priorityClassName: high-priority
      # Node selector
      # nodeSelector:
      #   node-type: application
---
# PodDisruptionBudget: keep a minimum number of replicas available
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: fastapi-app-pdb
  namespace: fastapi
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: fastapi-app
2. Service
yaml
# kubernetes/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: fastapi-app
  namespace: fastapi
  labels:
    app: fastapi-app
spec:
  # ClusterIP: in-cluster access only
  type: ClusterIP
  selector:
    app: fastapi-app
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
---
# NodePort: access via a node port (dev/test only)
apiVersion: v1
kind: Service
metadata:
  name: fastapi-app-nodeport
  namespace: fastapi
  labels:
    app: fastapi-app
spec:
  type: NodePort
  selector:
    app: fastapi-app
  ports:
    - name: http
      port: 80
      targetPort: http
      nodePort: 30080  # must fall in 30000-32767
---
# LoadBalancer: cloud-provider load balancer
apiVersion: v1
kind: Service
metadata:
  name: fastapi-app-lb
  namespace: fastapi
  labels:
    app: fastapi-app
  annotations:
    # AWS load-balancer type (provider-specific annotation)
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: fastapi-app
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
3. Ingress
yaml
# kubernetes/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fastapi-app-ingress
  namespace: fastapi
  annotations:
    # Use the Nginx Ingress Controller
    kubernetes.io/ingress.class: nginx
    # TLS certificate via cert-manager
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # Redirect HTTP to HTTPS
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    # Rate limiting
    nginx.ingress.kubernetes.io/limit-rps: "100"
    # Timeouts
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "30"
    # Enable CORS
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "*"
    # Enable compression
    nginx.ingress.kubernetes.io/enable-gzip: "true"
spec:
  # TLS configuration
  tls:
    - hosts:
        - api.example.com
      secretName: fastapi-app-tls
  # Rules
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: fastapi-app
                port:
                  number: 80
---
# Canary release Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fastapi-app-canary-ingress
  namespace: fastapi
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # 10% of traffic
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: fastapi-app-v2
                port:
                  number: 80
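The `limit-rps` annotation above caps requests per second per client. Rate limiters of this kind are usually token buckets: tokens refill at the configured rate, a burst of requests can drain the bucket, and further requests are rejected until tokens return. A minimal sketch (the class and parameter names are illustrative, not the Nginx implementation):

```python
class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens/second, `capacity` burst size."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens added per second (like limit-rps)
        self.capacity = capacity  # maximum burst
        self.tokens = capacity
        self.last = now

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100, capacity=10)
# 15 requests arriving at the same instant: only the burst of 10 passes
results = [bucket.allow(now=0.0) for _ in range(15)]
print(results.count(True))    # 10
# 0.1 s later, 100 tokens/s * 0.1 s = 10 tokens have been refilled
print(bucket.allow(now=0.1))  # True
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read a monotonic clock.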
4. ConfigMap
yaml
# kubernetes/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fastapi-config
  namespace: fastapi
data:
  # Application settings
  app-name: "FastAPI application"
  app-version: "1.0.0"
  # Logging
  log-level: "INFO"
  log-format: "json"
  # Performance
  max-workers: "4"
  max-connections: "100"
  # Timeouts
  request-timeout: "30"
  worker-timeout: "300"
  # Full configuration file
  config.yaml: |
    app:
      name: "FastAPI application"
      version: "1.0.0"
      debug: false
    database:
      pool_size: 20
      max_overflow: 10
      pool_timeout: 30
      pool_recycle: 3600
    redis:
      pool_size: 10
      socket_timeout: 5
      socket_connect_timeout: 5
    cors:
      allow_origins:
        - "https://example.com"
      allow_methods:
        - "GET"
        - "POST"
        - "PUT"
        - "DELETE"
      allow_headers:
        - "Content-Type"
        - "Authorization"
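ConfigMap values injected as environment variables always arrive in the container as strings, so the application has to cast numeric settings itself. A small sketch of that parsing step (the `load_settings` helper and its keys are illustrative, not part of the project's `app/config.py`):

```python
import os

def load_settings(env):
    """Parse ConfigMap-style string values into typed settings.

    Keys mirror entries from the ConfigMap above (log-level, max-workers,
    request-timeout); everything is a string until cast explicitly.
    """
    return {
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "max_workers": int(env.get("MAX_WORKERS", "4")),
        "request_timeout": float(env.get("REQUEST_TIMEOUT", "30")),
    }

# In production you would pass os.environ; a dict makes the example testable
settings = load_settings({"LOG_LEVEL": "DEBUG", "MAX_WORKERS": "8"})
print(settings)
```

Libraries such as pydantic's settings management do this casting automatically, but the principle is the same.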
5. Secret
yaml
# kubernetes/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: fastapi-secrets
  namespace: fastapi
type: Opaque
data:
  # Base64-encoded values, e.g.:
  # echo -n "postgresql://user:pass@host/db" | base64
  database-url: cG9zdGdyZXNxbDovL3VzZXI6cGFzc0Bob3N0L2Ri
  redis-url: cmVkaXM6Ly9wYXNzd29yZEByZWRpczoxMjM0LzA=
  jwt-secret-key: eW91ci1qd3Qtc2VjcmV0LWtleS1oZXJl
  api-key: eW91ci1hcGkta2V5
# Or generate the manifest with kubectl:
# kubectl create secret generic fastapi-secrets \
#   --from-literal=database-url='postgresql://user:pass@host/db' \
#   --from-literal=redis-url='redis://password@redis:1234/0' \
#   --dry-run=client -o yaml > secrets.yaml
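The values in a Secret's `data` field are base64-encoded, not encrypted — anyone with read access can decode them. The encoding shown in the comment above can be reproduced in Python:

```python
import base64

def to_secret_value(plaintext: str) -> str:
    # Equivalent to: echo -n "<plaintext>" | base64
    return base64.b64encode(plaintext.encode()).decode()

encoded = to_secret_value("postgresql://user:pass@host/db")
print(encoded)  # cG9zdGdyZXNxbDovL3VzZXI6cGFzc0Bob3N0L2Ri

# Decoding proves the point: base64 is an encoding, not a cipher
decoded = base64.b64decode(encoded).decode()
print(decoded)  # postgresql://user:pass@host/db
```

For actual secrecy, pair Secrets with RBAC, encryption at rest, or an external secret manager.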
6. HPA (autoscaling)
yaml
# kubernetes/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastapi-app-hpa
  namespace: fastapi
spec:
  # Target Deployment
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastapi-app
  # Replica bounds
  minReplicas: 3
  maxReplicas: 10
  # Metrics
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale up above 70% CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80  # scale up above 80% memory
  # Scaling behavior
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100           # grow by at most 100% per step
          periodSeconds: 60
        - type: Pods
          value: 2             # add at most 2 pods per step
          periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 10            # shrink by at most 10% per step
          periodSeconds: 300
      selectPolicy: Min
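Under the hood the HPA computes `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, then clamps the result to `minReplicas`/`maxReplicas` (behavior policies further rate-limit the change). A sketch of that core formula:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=3, max_replicas=10):
    """HPA scaling formula: ceil(current * current/target), clamped to bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods at 95% average CPU against a 70% target -> ceil(3 * 95/70) = 5
print(desired_replicas(3, 95, 70))   # 5
# Load drops to 20%: ceil(3 * 20/70) = 1, clamped up to minReplicas
print(desired_replicas(3, 20, 70))   # 3
# Extreme load cannot exceed maxReplicas
print(desired_replicas(9, 200, 70))  # 10
```

This explains why a cluster at exactly the target utilization stays put: the ratio is 1 and the desired count equals the current count.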
7. Full deployment workflow
bash
# ==================== 1. Create the namespace ====================
kubectl create namespace fastapi
# ==================== 2. Create the Secret (sensitive data) ====================
kubectl create secret generic fastapi-secrets \
  --from-literal=database-url='postgresql://appuser:apppass@postgres-service:5432/fastapi' \
  --from-literal=redis-url='redis://redis-service:6379/0' \
  --from-literal=jwt-secret-key='your-secret-key-here' \
  --namespace=fastapi
# ==================== 3. Create the ConfigMap ====================
kubectl apply -f kubernetes/configmap.yaml
# ==================== 4. Deploy the application ====================
kubectl apply -f kubernetes/deployment.yaml
# ==================== 5. Create the Service ====================
kubectl apply -f kubernetes/service.yaml
# ==================== 6. Create the Ingress ====================
kubectl apply -f kubernetes/ingress.yaml
# ==================== 7. Create the HPA ====================
kubectl apply -f kubernetes/hpa.yaml
# ==================== 8. Check deployment status ====================
# Pod status
kubectl get pods -n fastapi
# Deployment status
kubectl get deployment -n fastapi
# Service status
kubectl get svc -n fastapi
# Ingress status
kubectl get ingress -n fastapi
# HPA status
kubectl get hpa -n fastapi
# ==================== 9. View logs ====================
# Logs from all matching pods
kubectl logs -n fastapi -l app=fastapi-app --tail=100
# Follow logs
kubectl logs -n fastapi -l app=fastapi-app -f
# Logs from a specific pod
kubectl logs -n fastapi <pod-name> --tail=100
# ==================== 10. Update the deployment ====================
# Update the image
kubectl set image deployment/fastapi-app \
  fastapi-app=fastapi-app:v2.0.0 \
  -n fastapi
# Or edit the manifest and re-apply
kubectl apply -f kubernetes/deployment.yaml
# ==================== 11. Roll back ====================
# Show rollout history
kubectl rollout history deployment/fastapi-app -n fastapi
# Roll back to the previous revision
kubectl rollout undo deployment/fastapi-app -n fastapi
# Roll back to a specific revision
kubectl rollout undo deployment/fastapi-app -n fastapi --to-revision=2
# ==================== 12. Scale ====================
# Scale up manually
kubectl scale deployment/fastapi-app --replicas=5 -n fastapi
# Scale down manually
kubectl scale deployment/fastapi-app --replicas=3 -n fastapi
# ==================== 13. Tear down ====================
# Delete all resources
kubectl delete -f kubernetes/
# Delete the namespace (and everything in it)
kubectl delete namespace fastapi
Kubernetes best practices
1. Resource limits
yaml
resources:
  requests:
    cpu: 100m      # minimum CPU request (100 millicores)
    memory: 128Mi  # minimum memory request
  limits:
    cpu: 500m      # maximum CPU
    memory: 512Mi  # maximum memory
Suggested values:
API services: 100m-500m CPU, 128Mi-512Mi memory
Worker services: 500m-2000m CPU, 512Mi-2Gi memory
Databases: 1000m-4000m CPU, 2Gi-8Gi memory
2. Health checks
yaml
livenessProbe:              # liveness: restart the pod on failure
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30   # start checking 30s after container start
  periodSeconds: 10         # check every 10s
  timeoutSeconds: 5         # 5s timeout per check
  failureThreshold: 3       # restart after 3 consecutive failures
readinessProbe:             # readiness: remove from the Service on failure
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 10   # start checking 10s after container start
  periodSeconds: 5          # check every 5s
  timeoutSeconds: 3         # 3s timeout per check
  failureThreshold: 2       # remove after 2 consecutive failures
3. Graceful shutdown
yaml
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 15"]  # wait 15s for connections to drain
# Or trigger an HTTP endpoint instead
lifecycle:
  preStop:
    httpGet:
      path: /shutdown
      port: 8000
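After the preStop hook finishes, Kubernetes sends SIGTERM to the container's main process, which should stop accepting new work and drain in-flight requests. ASGI servers like uvicorn handle this for you; a bare-bones sketch of the mechanism (POSIX-only, names illustrative):

```python
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    # Flip a flag; the serving loop checks it and stops taking new work
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate Kubernetes sending SIGTERM after the preStop hook completes
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True
```

The `sleep 15` in the preStop hook buys time for load balancers to stop routing to the pod before the SIGTERM arrives, which is why the two mechanisms are used together.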
4. Affinity scheduling
yaml
# Pod affinity: prefer scheduling pods together
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - fastapi-app
          topologyKey: kubernetes.io/zone
# Pod anti-affinity: prefer spreading pods apart
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - fastapi-app
          topologyKey: kubernetes.io/hostname
# Node affinity: schedule pods onto specific nodes
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-type
              operator: In
              values:
                - application
5. Taints and tolerations
bash
# Add a taint to a node
kubectl taint nodes node1 key=value:NoSchedule
yaml
# Let pods tolerate the taint
tolerations:
  - key: key
    operator: Equal
    value: value
    effect: NoSchedule
III. Service Mesh
Concepts
A service mesh is a dedicated infrastructure layer for service-to-service communication. It handles:
Traffic management: load balancing, canary releases, A/B testing
Security: service-to-service authentication, authorization, encryption
Observability: metrics, logs, distributed tracing
Istio is currently the most popular service mesh implementation.
Why a service mesh?
| Problem | Traditional approach | Service mesh |
|-------|----------|-------|
| Service-to-service auth | Implemented per service | Unified authentication |
| Traffic control | Manual configuration | Automatic management |
| Observability | Scattered logs | Unified tracing |
| Canary releases | Complex deployments | Simple configuration |
| Circuit breaking/retries | Implemented per service | Unified policy |
Deploying Istio
1. Install Istio
bash
# Download Istio
curl -L https://istio.io/downloadIstio | sh -
cd istio-*
# Install Istio
istioctl install --set profile=demo
# Enable automatic sidecar injection
kubectl label namespace default istio-injection=enabled
kubectl label namespace fastapi istio-injection=enabled
2. VirtualService
yaml
# kubernetes/istio/virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fastapi-app
  namespace: fastapi
spec:
  hosts:
    - "api.example.com"
  gateways:
    - fastapi-gateway
  http:
    # NOTE: routes are evaluated top-down and the first match wins;
    # the blocks below showcase individual features side by side and
    # would normally not all share the same /api prefix.
    # Basic routing
    - match:
        - uri:
            prefix: /api
      route:
        - destination:
            host: fastapi-app
            subset: v1
          weight: 100
    # Canary: v2 receives 10% of traffic
    - match:
        - uri:
            prefix: /api
      route:
        - destination:
            host: fastapi-app
            subset: v1
          weight: 90
        - destination:
            host: fastapi-app
            subset: v2
          weight: 10
    # Retry policy
    - match:
        - uri:
            prefix: /api
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure,refused-stream
      route:
        - destination:
            host: fastapi-app
            subset: v1
    # Fault injection
    - match:
        - uri:
            prefix: /api/test
      fault:
        delay:
          percentage:
            value: 10
          fixedDelay: 5s
      route:
        - destination:
            host: fastapi-app
            subset: v1
    # Timeout
    - match:
        - uri:
            prefix: /api
      timeout: 30s
      route:
        - destination:
            host: fastapi-app
            subset: v1
3. DestinationRule
yaml
# kubernetes/istio/destinationrule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: fastapi-app
  namespace: fastapi
spec:
  host: fastapi-app
  # Load-balancing policy
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN   # round robin
      # simple: LEAST_CONN  # fewest active connections
      # simple: RANDOM      # random
    # Connection pool settings
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
        maxRequestsPerConnection: 3
        idleTimeout: 180s
        h2UpgradePolicy: UPGRADE
    # Circuit-breaker (outlier detection) settings
    outlierDetection:
      consecutiveGatewayFailure: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 40
  # Subsets
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
4. Gateway
yaml
# kubernetes/istio/gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: fastapi-gateway
  namespace: fastapi
spec:
  selector:
    istio: ingressgateway  # use Istio's default ingress gateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "api.example.com"
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
        privateKey: /etc/istio/ingressgateway-certs/tls.key
      hosts:
        - "api.example.com"
5. ServiceEntry (external services)
yaml
# kubernetes/istio/serviceentry.yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-api
  namespace: fastapi
spec:
  hosts:
    - "external-api.example.com"
  ports:
    - number: 443
      name: https
      protocol: HTTPS
  location: MESH_EXTERNAL
  resolution: DNS
6. AuthorizationPolicy
yaml
# kubernetes/istio/authorization.yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: fastapi-app-authz
  namespace: fastapi
spec:
  selector:
    matchLabels:
      app: fastapi-app
  action: ALLOW
  rules:
    # Allow JWT-authenticated requests
    - when:
        - key: request.auth.claims[iss]
          values:
            - "https://auth.example.com"
    # Allow specific paths
    - to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/health", "/api/ready"]
# To deny specific source IPs instead:
# action: DENY
# rules:
#   - when:
#       - key: source.ip
#         values: ["192.168.1.100"]
7. PeerAuthentication (mTLS)
yaml
# kubernetes/istio/peerauthentication.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: fastapi-app-mtls
  namespace: fastapi
spec:
  selector:
    matchLabels:
      app: fastapi-app
  mtls:
    mode: STRICT        # enforce mTLS
    # mode: PERMISSIVE  # accept both plaintext and mTLS
    # mode: DISABLE     # disable mTLS
Istio in practice
Scenario 1: canary release
yaml
# VirtualService configuration
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fastapi-app-canary
spec:
  http:
    - route:
        - destination:
            host: fastapi-app
            subset: v1
          weight: 95  # 95% of traffic to v1
        - destination:
            host: fastapi-app
            subset: v2
          weight: 5   # 5% of traffic to v2 (the canary)
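Weighted routing like the 95/5 split above amounts to drawing a uniform number per request and mapping it onto cumulative weight ranges. A sketch of that per-request decision (not Envoy's actual implementation; names are illustrative):

```python
import random

def pick_subset(weights, r):
    """Route one request: `weights` sums to 100, `r` is uniform in [0, 100)."""
    upper = 0
    for subset, weight in weights:
        upper += weight
        if r < upper:
            return subset
    return weights[-1][0]  # guard against float edge cases

weights = [("v1", 95), ("v2", 5)]
rng = random.Random(42)  # seeded for reproducibility
routed = [pick_subset(weights, rng.uniform(0, 100)) for _ in range(10_000)]
share_v2 = routed.count("v2") / len(routed)
print(round(share_v2, 3))  # close to 0.05
```

Note the split is statistical: each request is routed independently, so short windows can deviate noticeably from the configured weights.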
Scenario 2: A/B testing
yaml
# VirtualService configuration
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fastapi-app-abtest
spec:
  http:
    - match:
        - headers:
            x-test-group:
              exact: "A"
      route:
        - destination:
            host: fastapi-app
            subset: feature-a
    - match:
        - headers:
            x-test-group:
              exact: "B"
      route:
        - destination:
            host: fastapi-app
            subset: feature-b
Scenario 3: circuit breaking
yaml
# DestinationRule configuration
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: fastapi-app-circuitbreaker
spec:
  host: fastapi-app
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5   # 5 consecutive 5xx errors
      interval: 30s             # 30-second analysis window
      baseEjectionTime: 30s     # eject the host for 30 seconds
      maxEjectionPercent: 50    # eject at most 50% of hosts
      minHealthPercent: 30      # keep at least 30% of hosts healthy
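The outlier-detection settings above describe a simple state machine: count consecutive 5xx responses, eject the host when the threshold is reached, and re-admit it after the ejection window. A stripped-down sketch for a single host (the `CircuitBreaker` class is illustrative, not Envoy's algorithm):

```python
class CircuitBreaker:
    """Eject a host after N consecutive 5xx errors; re-admit it
    after base_ejection_time seconds have elapsed."""

    def __init__(self, consecutive_errors=5, base_ejection_time=30.0):
        self.threshold = consecutive_errors
        self.ejection_time = base_ejection_time
        self.failures = 0
        self.ejected_until = None

    def available(self, now):
        # The host is unavailable while inside its ejection window
        return self.ejected_until is None or now >= self.ejected_until

    def record(self, status, now):
        if 500 <= status < 600:
            self.failures += 1
            if self.failures >= self.threshold:
                self.ejected_until = now + self.ejection_time
                self.failures = 0
        else:
            self.failures = 0  # any success resets the streak

cb = CircuitBreaker(consecutive_errors=5, base_ejection_time=30.0)
for t in range(5):
    cb.record(503, now=float(t))
print(cb.available(now=5.0))   # False: ejected after 5 consecutive errors
print(cb.available(now=40.0))  # True: the ejection window has elapsed
```

Envoy additionally caps how many hosts may be ejected at once (`maxEjectionPercent`), so the cluster never loses all capacity to the breaker itself.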
IV. Observability
Concepts
Observability is the ability to understand a system's internal state from its external outputs. It rests on three pillars:
| Pillar | Description | Tools | Data type |
|-------------|---------------|---------------------|-------------|
| Metrics | Numeric data for monitoring and alerting | Prometheus, Grafana | Counters, gauges, histograms |
| Logs | Event records for diagnostics | ELK Stack, Loki | Structured logs |
| Tracing | Request paths for performance analysis | Jaeger, Zipkin | Distributed traces |
Full Implementation
1. Prometheus monitoring
FastAPI integration
python
# app/main.py
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Collect Prometheus metrics and expose them at /metrics
instrumentator = Instrumentator()
instrumentator.instrument(app).expose(app, endpoint="/metrics")

@app.get("/api/users/{user_id}")
async def get_user(user_id: int):
    # Business logic
    return {"user_id": user_id}
Prometheus configuration
yaml
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alerts/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  # FastAPI application
  - job_name: 'fastapi-app'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - fastapi
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
  # Kubernetes cluster nodes
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  # etcd
  - job_name: 'etcd'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - kube-system
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/secrets/etcd-ca
      cert_file: /etc/prometheus/secrets/etcd-cert
      key_file: /etc/prometheus/secrets/etcd-key
      insecure_skip_verify: true
Alerting rules
yaml
# prometheus/alerts/app-alerts.yml
groups:
  - name: fastapi-app
    interval: 30s
    rules:
      # Service down
      - alert: ServiceDown
        expr: up{job="fastapi-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Instance {{ $labels.instance }} has been unreachable for more than 1 minute"
      # High error rate
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate"
          description: "Instance {{ $labels.instance }} has a 5xx error rate above 5%"
      # High latency
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency"
          description: "Instance {{ $labels.instance }} has a P99 latency above 1 second"
      # High memory usage
      - alert: HighMemoryUsage
        expr: (container_memory_usage_bytes / container_spec_memory_limit_bytes) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Container {{ $labels.instance }} memory usage is above 90%"
      # High CPU usage
      - alert: HighCPUUsage
        expr: rate(container_cpu_usage_seconds_total[5m]) > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"
          description: "Container {{ $labels.instance }} CPU usage is above 80%"
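The `histogram_quantile` function used in the HighLatency rule estimates a quantile from cumulative histogram buckets by linear interpolation inside the bucket containing the target rank. A sketch of that estimation (simplified relative to PromQL's edge-case handling; names are illustrative):

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count), sorted by bound,
    mirroring Prometheus `_bucket` series with their `le` labels.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Linear interpolation inside the matching bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Cumulative counts for le=0.1, 0.5, 1.0, 2.0 buckets
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (2.0, 100)]
print(histogram_quantile(0.99, buckets))  # 1.0
```

This is why bucket boundaries matter: the estimate can never be more precise than the bucket the quantile falls into, so choose `buckets=` values around your SLO thresholds.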
2. Grafana dashboards
Data source configuration
yaml
# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
Dashboard configuration
json
{
  "dashboard": {
    "title": "FastAPI application monitoring",
    "panels": [
      {
        "title": "Request rate",
        "targets": [
          { "expr": "sum(rate(http_requests_total[5m])) by (status)" }
        ],
        "type": "graph"
      },
      {
        "title": "Request latency",
        "targets": [
          { "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))" }
        ],
        "type": "graph"
      },
      {
        "title": "Error rate",
        "targets": [
          { "expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))" }
        ],
        "type": "graph"
      }
    ]
  }
}
3. ELK log collection
Logstash configuration
conf
# logstash/pipeline/fastapi.conf
input {
  beats {
    port => 5044
  }
}
filter {
  # Parse JSON logs
  if [fields][app] == "fastapi" {
    json {
      source => "message"
      target => "fastapi"
    }
    # Extract the timestamp
    date {
      match => ["[fastapi][timestamp]", "ISO8601"]
      target => "@timestamp"
    }
    # Extract request details
    grok {
      match => {
        "[fastapi][message]" => "%{WORD:method} %{URIPATHPARAM:path} %{INT:status}"
      }
    }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "fastapi-logs-%{+YYYY.MM.dd}"
  }
}
Filebeat configuration
yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /app/logs/*.log
    fields:
      app: fastapi
    fields_under_root: true
    json.keys_under_root: true
    json.add_error_key: true

output.logstash:
  hosts: ["logstash:5044"]

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
4. Distributed tracing
Jaeger integration
python
# app/main.py
from fastapi import FastAPI, Request
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

app = FastAPI()

# Configure the Jaeger exporter
resource = Resource.create({
    SERVICE_NAME: "fastapi-app"
})
trace.set_tracer_provider(TracerProvider(resource=resource))
jaeger_exporter = JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(jaeger_exporter)
)

# Automatic instrumentation
FastAPIInstrumentor.instrument_app(app)

@app.get("/api/users/{user_id}")
async def get_user(user_id: int, request: Request):
    # Spans are created automatically for each request
    return {"user_id": user_id}
Observability best practices
1. Structured logging
python
import logging
import json
from datetime import datetime

class JsonFormatter(logging.Formatter):
    """JSON log formatter"""
    def format(self, record):
        log_obj = {
            "timestamp": datetime.utcnow().isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        }
        if record.exc_info:
            log_obj["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_obj)

# Configure logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# Usage
logger.info("User login", extra={"user_id": 123, "ip": "192.168.1.1"})
2. 指标设计
python
from prometheus_client import Counter, Histogram, Gauge, Summary
# 计数器:单调递增的计数
REQUEST_COUNT = Counter(
'http_requests_total',
'Total HTTP requests',
['method', 'endpoint', 'status']
)
# 仪表(Gauge):可增可减的瞬时值
ACTIVE_CONNECTIONS = Gauge(
'active_connections',
'Active database connections'
)
# 直方图:分布统计
REQUEST_LATENCY = Histogram(
'http_request_duration_seconds',
'HTTP request latency',
buckets=[0.1, 0.5, 1.0, 2.0, 5.0]
)
# 摘要(Summary):类似直方图,在客户端侧计算分位数
REQUEST_SIZE = Summary(
'http_request_size_bytes',
'HTTP request size'
)
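Prometheus的Histogram采用累积桶(cumulative bucket):每个 `le=b` 的桶统计所有小于等于 b 的观测值,`+Inf` 桶等于观测总数。下面用一个纯标准库的示意函数(函数名为假设)演示上面定义的桶边界如何聚合请求延迟:

```python
# 与上面REQUEST_LATENCY相同的桶边界(秒)
BUCKETS = [0.1, 0.5, 1.0, 2.0, 5.0]

def cumulative_bucket_counts(observations, buckets=BUCKETS):
    """模拟Prometheus Histogram的累积桶计数:
    每个桶le=b统计所有 <= b 的观测值,+Inf桶等于总数"""
    counts = {}
    for b in buckets:
        counts[b] = sum(1 for v in observations if v <= b)
    counts[float("inf")] = len(observations)
    return counts

latencies = [0.05, 0.3, 0.7, 1.5, 6.0]
print(cumulative_bucket_counts(latencies))
# {0.1: 1, 0.5: 2, 1.0: 3, 2.0: 4, 5.0: 4, inf: 5}
```

理解累积桶后,PromQL中的 `histogram_quantile` 为什么只需要各桶计数就能估算分位数也就清楚了。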
3. 上下文追踪
python
import uuid
from contextvars import ContextVar
from fastapi import FastAPI, Request

app = FastAPI()
# 请求ID上下文
request_id_var: ContextVar[str] = ContextVar('request_id', default='')
@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
# 生成请求ID
request_id = str(uuid.uuid4())
request_id_var.set(request_id)
# 添加到响应头
response = await call_next(request)
response.headers["X-Request-ID"] = request_id
return response
# 在其他地方使用
current_request_id = request_id_var.get()
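把ContextVar中的请求ID注入到每条日志记录,可以让结构化日志与具体请求关联起来。下面是一个基于标准库 `logging.Filter` 的示意实现(`RequestIdFilter` 为假设的辅助类名):

```python
import logging
import uuid
from contextvars import ContextVar

request_id_var: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """把当前请求ID注入到每条日志记录(示意实现)"""
    def filter(self, record):
        record.request_id = request_id_var.get()
        return True

logger = logging.getLogger("demo")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# 模拟中间件设置请求ID后记录日志
rid = str(uuid.uuid4())
request_id_var.set(rid)
logger.info("handling request")  # 输出形如 "<request_id> handling request"
```

这样一来,通过X-Request-ID响应头拿到请求ID后,就能在日志系统里检索出该请求的全部日志。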
五、CI/CD流水线
概念解析
CI/CD(Continuous Integration/Continuous Deployment)即持续集成/持续部署,是一套自动化流程:
1. CI(持续集成) :代码提交后自动构建、测试
2. CD(持续部署):测试通过后自动部署到生产环境
CI/CD的价值:
| 价值 | 说明 |
|------|-------------|
| 提高效率 | 自动化减少人工操作 |
| 快速反馈 | 及时发现代码问题 |
| 降低风险 | 小步快跑,快速回滚 |
| 提高质量 | 强制执行测试和代码审查 |
GitLab CI/CD完整配置
yaml
# .gitlab-ci.yml
stages:
- lint
- test
- build
- deploy-staging
- deploy-production
variables:
DOCKER_IMAGE: fastapi-app
DOCKER_TAG: $CI_COMMIT_SHORT_SHA
REGISTRY_URL: registry.example.com
# ==================== 阶段1:代码检查 ====================
lint:
stage: lint
image: python:3.11
script:
- pip install black flake8 mypy
- black --check app/
- flake8 app/
- mypy app/
only:
- merge_requests
- main
- develop
# ==================== 阶段2:单元测试 ====================
test:
stage: test
image: python:3.11
services:
- postgres:15
- redis:7
variables:
POSTGRES_DB: test_db
POSTGRES_USER: test_user
POSTGRES_PASSWORD: test_pass
DATABASE_URL: postgresql://test_user:test_pass@postgres:5432/test_db
REDIS_URL: redis://redis:6379/0
before_script:
- pip install -r requirements.txt
- pip install pytest pytest-cov pytest-asyncio
script:
- pytest tests/ -v --cov=app --cov-report=xml --cov-report=html
coverage: '/TOTAL.*\s+(\d+%)$/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage.xml
paths:
- htmlcov/
expire_in: 1 week
only:
- merge_requests
- main
- develop
# ==================== 阶段3:构建Docker镜像 ====================
build:
stage: build
image: docker:24
services:
- docker:24-dind
before_script:
- docker login -u $REGISTRY_USER -p $REGISTRY_PASSWORD $REGISTRY_URL
script:
- docker build -t $REGISTRY_URL/$DOCKER_IMAGE:$DOCKER_TAG .
- docker push $REGISTRY_URL/$DOCKER_IMAGE:$DOCKER_TAG
- docker tag $REGISTRY_URL/$DOCKER_IMAGE:$DOCKER_TAG $REGISTRY_URL/$DOCKER_IMAGE:latest
- docker push $REGISTRY_URL/$DOCKER_IMAGE:latest
only:
- main
- develop
# ==================== 阶段4:部署到测试环境 ====================
deploy-staging:
stage: deploy-staging
image: bitnami/kubectl:latest
environment:
name: staging
url: https://staging-api.example.com
before_script:
- kubectl config use-context staging
script:
- kubectl set image deployment/fastapi-app \
fastapi-app=$REGISTRY_URL/$DOCKER_IMAGE:$DOCKER_TAG \
-n fastapi-staging
- kubectl rollout status deployment/fastapi-app -n fastapi-staging
only:
- develop
# ==================== 阶段5:部署到生产环境 ====================
deploy-production:
stage: deploy-production
image: bitnami/kubectl:latest
environment:
name: production
url: https://api.example.com
before_script:
- kubectl config use-context production
script:
- kubectl set image deployment/fastapi-app \
fastapi-app=$REGISTRY_URL/$DOCKER_IMAGE:$DOCKER_TAG \
-n fastapi
- kubectl rollout status deployment/fastapi-app -n fastapi
when: manual # 手动触发
only:
- main
# ==================== 额外任务:安全扫描 ====================
security-scan:
stage: build
image: aquasec/trivy:latest
script:
- trivy image --exit-code 1 --severity HIGH,CRITICAL $REGISTRY_URL/$DOCKER_IMAGE:$DOCKER_TAG
allow_failure: true
only:
- main
- develop
# ==================== 额外任务:性能测试 ====================
performance-test:
stage: test
image: python:3.11
script:
- pip install locust
- locust -f tests/performance/locustfile.py --headless --users 100 --spawn-rate 10 --run-time 1m
only:
- merge_requests
GitHub Actions配置
yaml
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline
on:
push:
branches: [ main, develop ]
pull_request:
branches: [ main ]
jobs:
# ==================== 代码检查 ====================
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install black flake8 mypy
- name: Run Black
run: black --check app/
- name: Run Flake8
run: flake8 app/
- name: Run MyPy
run: mypy app/
# ==================== 单元测试 ====================
test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:15
env:
POSTGRES_DB: test_db
POSTGRES_USER: test_user
POSTGRES_PASSWORD: test_pass
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
redis:
image: redis:7
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install pytest pytest-cov pytest-asyncio
- name: Run tests
env:
DATABASE_URL: postgresql://test_user:test_pass@localhost:5432/test_db
REDIS_URL: redis://localhost:6379/0
run: |
pytest tests/ -v --cov=app --cov-report=xml --cov-report=html
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.xml
# ==================== 构建Docker镜像 ====================
build:
runs-on: ubuntu-latest
needs: test
if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop'
steps:
- uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Build and push
uses: docker/build-push-action@v4
with:
context: .
push: true
tags: |
${{ secrets.DOCKER_USERNAME }}/fastapi-app:latest
${{ secrets.DOCKER_USERNAME }}/fastapi-app:${{ github.sha }}
# ==================== 部署到生产环境 ====================
deploy:
runs-on: ubuntu-latest
needs: build
if: github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v3
- name: Set up kubectl
uses: azure/setup-kubectl@v3
with:
version: 'v1.28.0'
- name: Configure kubeconfig
run: |
mkdir -p $HOME/.kube
echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > $HOME/.kube/config
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/fastapi-app \
fastapi-app=${{ secrets.DOCKER_USERNAME }}/fastapi-app:${{ github.sha }} \
-n fastapi
kubectl rollout status deployment/fastapi-app -n fastapi
Helm Charts
Chart.yaml
yaml
# helm/fastapi-app/Chart.yaml
apiVersion: v2
name: fastapi-app
description: A Helm chart for FastAPI application
type: application
version: 1.0.0
appVersion: "1.0.0"
values.yaml
yaml
# helm/fastapi-app/values.yaml
replicaCount: 3
image:
repository: fastapi-app
pullPolicy: Always
tag: "latest"
service:
type: ClusterIP
port: 80
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: api.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: fastapi-app-tls
hosts:
- api.example.com
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
config:
logLevel: "INFO"
maxWorkers: "4"
secrets:
databaseUrl: ""
redisUrl: ""
jwtSecretKey: ""
postgresql:
enabled: true
auth:
database: fastapi
username: appuser
password: apppass
persistence:
enabled: true
size: 20Gi
redis:
enabled: true
auth:
enabled: false
persistence:
enabled: true
size: 10Gi
部署命令
bash
# 添加Helm仓库
helm repo add fastapi https://charts.example.com
helm repo update
# 安装应用
helm install fastapi-app ./helm/fastapi-app \
--namespace fastapi \
--create-namespace \
--set image.tag=v1.0.0 \
--set secrets.databaseUrl="postgresql://..." \
--set secrets.redisUrl="redis://..."
# 升级应用
helm upgrade fastapi-app ./helm/fastapi-app \
--namespace fastapi \
--set image.tag=v1.1.0
# 回滚
helm rollback fastapi-app 1
# 卸载
helm uninstall fastapi-app --namespace fastapi
六、高可用架构
概念解析
高可用(High Availability, HA)是指系统通过冗余设计和故障转移机制,确保服务在部分组件故障时仍然可用。
高可用指标:
| 指标 | 优秀 | 良好 | 一般 |
|-------|---------|---------|----------|
| 可用性 | 99.99% | 99.9% | 99.5% |
| 年停机时间 | <53分钟 | <8.8小时 | <43.8小时 |
| 月停机时间 | <4.3分钟 | <43分钟 | <3.6小时 |
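表中的停机时间可以直接由可用性百分比推算出来,下面用几行Python验证一下:

```python
def downtime_minutes(availability: float, period_minutes: float) -> float:
    """给定可用性(如0.9999)与周期长度,计算允许的停机分钟数"""
    return (1 - availability) * period_minutes

YEAR = 365 * 24 * 60   # 525600分钟
MONTH = 30 * 24 * 60   # 43200分钟

print(round(downtime_minutes(0.9999, YEAR), 1))       # 52.6 -> 对应表中"<53分钟"
print(round(downtime_minutes(0.999, YEAR) / 60, 1))   # 8.8  -> 对应"<8.8小时"
print(round(downtime_minutes(0.9999, MONTH), 2))      # 4.32 -> 对应"<4.3分钟"(约)
```

也就是说,每提升一个9,允许的停机时间就缩小一个数量级,对架构冗余度的要求也随之成倍提高。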
高可用架构原则:
1. 消除单点故障:每个组件至少有2个实例
2. 故障自动检测:健康检查和监控告警
3. 故障自动恢复:自动重启和流量切换
4. 数据冗余:主从复制、多副本存储
5. 优雅降级:部分功能不可用时提供基本服务
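其中第5条"优雅降级"可以用一个回退包装器来示意:当依赖调用失败时返回降级的默认结果,而不是把错误传播给用户(以下装饰器和函数名均为假设的示例):

```python
import functools

def with_fallback(default):
    """优雅降级装饰器(示意):依赖调用失败时返回降级结果而非抛错"""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # 实际系统中应在此记录日志并上报降级指标
                return default
        return wrapper
    return decorator

@with_fallback(default={"recommendations": []})
def get_recommendations(user_id: int):
    raise ConnectionError("recommendation service down")  # 模拟依赖故障

print(get_recommendations(42))  # {'recommendations': []} —— 推荐栏为空,但核心页面仍可渲染
```

降级的关键在于区分"核心功能"和"增强功能":推荐、评论等增强功能失败时返回空结果,下单、支付等核心链路才需要硬失败。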
完整代码实现
1. 多可用区部署
yaml
# kubernetes/multizone/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: fastapi-app
namespace: fastapi
spec:
replicas: 6 # 6个副本,分布在3个可用区
template:
spec:
# Pod反亲和:分散到不同节点
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- fastapi-app
topologyKey: topology.kubernetes.io/zone
# 节点亲和:调度到特定可用区的节点
nodeSelector:
node-type: application
# 容忍度:允许调度到有污点的节点
tolerations:
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
2. 数据库高可用
PostgreSQL主从复制
yaml
# kubernetes/postgres/primary.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres-primary
namespace: fastapi
spec:
serviceName: postgres-primary
replicas: 1
selector:
matchLabels:
app: postgres-primary
template:
metadata:
labels:
app: postgres-primary
spec:
containers:
- name: postgres
image: postgres:15
env:
- name: POSTGRES_REPLICATION_MODE
value: "primary"
- name: POSTGRES_REPLICATION_USER
value: "repl_user"
- name: POSTGRES_REPLICATION_PASSWORD
value: "repl_password"
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 100Gi
---
# kubernetes/postgres/replica.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres-replica
namespace: fastapi
spec:
serviceName: postgres-replica
replicas: 2 # 2个副本
selector:
matchLabels:
app: postgres-replica
template:
metadata:
labels:
app: postgres-replica
spec:
containers:
- name: postgres
image: postgres:15
env:
- name: POSTGRES_REPLICATION_MODE
value: "replica"
- name: POSTGRES_PRIMARY_HOST
value: "postgres-primary-service"
- name: POSTGRES_REPLICATION_USER
value: "repl_user"
- name: POSTGRES_REPLICATION_PASSWORD
value: "repl_password"
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 100Gi
读写分离
python
# app/database.py
from sqlalchemy import select
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker
from fastapi import Depends
import random
class DatabaseManager:
"""数据库管理器(支持读写分离)"""
def __init__(self, primary_url: str, replica_urls: list[str]):
self.primary_engine = create_async_engine(primary_url)
self.replica_engines = [
create_async_engine(url) for url in replica_urls
]
def get_session(self, read_only: bool = False) -> AsyncSession:
"""
获取数据库会话
Args:
read_only: 是否只读(使用副本)
"""
if read_only and self.replica_engines:
# 随机选择一个副本
engine = random.choice(self.replica_engines)
else:
# 使用主库
engine = self.primary_engine
AsyncSessionLocal = sessionmaker(
engine,
class_=AsyncSession,
expire_on_commit=False
)
return AsyncSessionLocal()
# 使用
db_manager = DatabaseManager(
primary_url="postgresql+asyncpg://...",
replica_urls=[
"postgresql+asyncpg://replica1:5432/db",
"postgresql+asyncpg://replica2:5432/db"
]
)
# 读操作(使用副本)
async def get_users(db: AsyncSession = Depends(lambda: db_manager.get_session(read_only=True))):
users = await db.execute(select(User))
return users.scalars().all()
# 写操作(使用主库)
async def create_user(user_data: UserCreate):
async with db_manager.get_session(read_only=False) as db:
user = User(**user_data.model_dump())  # Pydantic模型转字典后再构造ORM对象
db.add(user)
await db.commit()
await db.refresh(user)
return user
3. Redis高可用
Redis Sentinel
yaml
# kubernetes/redis/sentinel.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis-sentinel
namespace: fastapi
spec:
replicas: 3
selector:
matchLabels:
app: redis-sentinel
template:
metadata:
labels:
app: redis-sentinel
spec:
containers:
- name: sentinel
image: redis:7
command:
- redis-sentinel
- /etc/redis/sentinel.conf
volumeMounts:
- name: sentinel-config
mountPath: /etc/redis
volumes:
- name: sentinel-config
configMap:
name: redis-sentinel-config
Sentinel配置
text
# redis-sentinel.conf
port 26379
sentinel monitor mymaster redis-master-service 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
sentinel parallel-syncs mymaster 1
使用Sentinel
python
import redis.sentinel
sentinel = redis.sentinel.Sentinel([
('sentinel1', 26379),
('sentinel2', 26379),
('sentinel3', 26379)
])
# 获取主库
master = sentinel.master_for('mymaster', socket_timeout=0.1)
# 获取从库
slave = sentinel.slave_for('mymaster', socket_timeout=0.1)
# 自动故障转移
master.set('key', 'value') # 如果主库故障,自动切换
4. 蓝绿部署
yaml
# kubernetes/blue-green/deployment-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: fastapi-app-green
namespace: fastapi
spec:
replicas: 3
selector:
matchLabels:
app: fastapi-app
color: green
template:
metadata:
labels:
app: fastapi-app
color: green
spec:
containers:
- name: fastapi-app
image: fastapi-app:v2.0.0 # 新版本
---
# kubernetes/blue-green/service-green.yaml
apiVersion: v1
kind: Service
metadata:
name: fastapi-app-green
namespace: fastapi
spec:
selector:
app: fastapi-app
color: green
ports:
- port: 80
targetPort: 8000
切换流量
bash
# 部署绿色环境
kubectl apply -f kubernetes/blue-green/deployment-green.yaml
kubectl apply -f kubernetes/blue-green/service-green.yaml
# 等待绿色环境就绪
kubectl wait --for=condition=available deployment/fastapi-app-green -n fastapi --timeout=300s
# 切换流量到绿色环境
kubectl patch service fastapi-app -n fastapi -p '{"spec":{"selector":{"color":"green"}}}'
# 确认无误后,删除蓝色环境
kubectl delete -f kubernetes/blue-green/deployment-blue.yaml
5. 自动故障转移
python
# app/healthcheck.py
from datetime import datetime
from fastapi import FastAPI, HTTPException, Depends
from sqlalchemy import select, func
from sqlalchemy.ext.asyncio import AsyncSession
from app.dependencies import get_db, redis  # get_db为会话依赖,redis为已初始化的客户端(示意)
app = FastAPI()
@app.get("/health")
async def health_check(db: AsyncSession = Depends(get_db)):
"""
健康检查端点
"""
try:
# 检查数据库连接
await db.execute(select(func.now()))
# 检查Redis连接
redis.ping()
# 检查其他依赖
# ...
return {
"status": "healthy",
"timestamp": datetime.now().isoformat()
}
except Exception as e:
# 健康检查失败,返回503
raise HTTPException(
status_code=503,
detail=f"Service unhealthy: {str(e)}"
)
@app.get("/ready")
async def readiness_check():
"""
就绪检查端点
"""
# 检查应用是否准备好接收流量
# ...
return {
"status": "ready"
}
# Kubernetes配置
# livenessProbe: 健康检查,失败时重启
# readinessProbe: 就绪检查,失败时从Service中移除
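Kubernetes在基础设施层完成故障转移,客户端侧也可以做类似的兜底:依次尝试多个实例,失败则切换到下一个。下面是一个纯标准库的示意实现(函数名均为假设,真实场景应配合指数退避和熔断):

```python
def call_with_failover(endpoints, request_fn, retries_per_endpoint=1):
    """依次尝试多个实例,失败则切换下一个(示意实现)"""
    last_error = None
    for endpoint in endpoints:
        for _ in range(retries_per_endpoint):
            try:
                return request_fn(endpoint)
            except ConnectionError as e:
                last_error = e
                # 实际应在此处做指数退避,避免重试风暴
    raise RuntimeError(f"all endpoints failed: {last_error}")

# 模拟:第一个实例故障,自动切换到第二个
def fake_request(endpoint):
    if endpoint == "app-1":
        raise ConnectionError("app-1 unreachable")
    return {"served_by": endpoint}

print(call_with_failover(["app-1", "app-2"], fake_request))
# {'served_by': 'app-2'}
```

这正是Sentinel、Service负载均衡在更底层替我们做的事情:探测失败、摘除实例、把流量导向健康副本。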
灾难恢复
备份策略
bash
#!/bin/bash
# scripts/backup.sh
# 先固定时间戳,避免两次调用date产生不同文件名导致上传失败
BACKUP_FILE="backup_$(date +%Y%m%d_%H%M%S).sql"
# PostgreSQL备份
kubectl exec -n fastapi postgres-primary-0 -- pg_dump -U appuser fastapi > "$BACKUP_FILE"
# Redis备份
kubectl exec -n fastapi redis-master-0 -- redis-cli BGSAVE
# 上传到S3
aws s3 cp "$BACKUP_FILE" s3://backups/postgres/
恢复策略
bash
#!/bin/bash
# scripts/restore.sh
# 从S3下载
aws s3 cp s3://backups/postgres/backup_20240101_120000.sql .
# 恢复数据库
cat backup_20240101_120000.sql | kubectl exec -i -n fastapi postgres-primary-0 -- psql -U appuser fastapi
七、常见问题与解决方案
问题1:Docker镜像构建缓慢
症状:构建Docker镜像需要很长时间,影响CI/CD效率。
解决方案:
bash
# 1. 使用多阶段构建
FROM python:3.11-slim as builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
FROM python:3.11-slim
COPY --from=builder /root/.local /root/.local
COPY app ./app
...
# 2. 利用Docker层缓存
# 先复制依赖文件,再复制代码
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app ./app
# 3. 使用BuildKit加速
# DOCKER_BUILDKIT=1 docker build .
# 4. 使用缓存挂载
RUN --mount=type=cache,target=/root/.cache pip install -r requirements.txt
问题2:Kubernetes Pod频繁重启
症状:Pod处于CrashLoopBackOff状态,反复重启。
解决方案:
bash
# 1. 查看Pod日志
kubectl logs <pod-name> -n fastapi
# 2. 查看Pod事件
kubectl describe pod <pod-name> -n fastapi
# 3. 检查资源限制
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
# 4. 检查健康检查
livenessProbe:
initialDelaySeconds: 30 # 增加初始延迟
periodSeconds: 10
failureThreshold: 3
# 5. 添加优雅关闭
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
问题3:Istio服务网格性能问题
症状:启用Istio后,服务间调用延迟增加。
解决方案:
bash
# 1. 优化mTLS模式
# PERMISSIVE模式在开发环境更好
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
spec:
mtls:
mode: PERMISSIVE
# 2. 调整连接池
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
http2MaxRequests: 100
maxRequestsPerConnection: 10
# 3. 减少Sidecar资源占用
# 只注入必要的命名空间
kubectl label namespace <namespace> istio-injection=enabled
# 4. 使用Ambient Mesh(Istio 1.18+)
# 更轻量级的sidecar替代方案
问题4:Prometheus存储空间不足
症状:Prometheus Pod因磁盘空间不足被驱逐。
解决方案:
bash
# 1. 配置数据保留策略(注意:保留时长通过启动参数设置,不是prometheus.yml字段)
# 启动参数示例:--storage.tsdb.retention.time=15d
prometheus.yml:
  global:
    scrape_interval: 15s
    evaluation_interval: 15s
# 2. 使用远程存储
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
data:
prometheus.yml: |
remote_write:
- url: "http://thanos-receive:19291/api/v1/receive"
queue_config:
capacity: 10000
max_shards: 200
min_shards: 1
max_samples_per_send: 5000
batch_send_deadline: 5s
min_backoff: 30ms
max_backoff: 100ms
# 3. 配置存储大小
volumes:
- name: prometheus-data
persistentVolumeClaim:
claimName: prometheus-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheus-pvc
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 200Gi
问题5:CI/CD流水线失败率高
症状:CI/CD流水线经常因为测试失败或构建错误而失败。
解决方案:
bash
# 1. 使用缓存加速构建
.cache_template: &cache_template
cache:
key:
files:
- requirements.txt
paths:
- .venv
- $HOME/.cache/pip
# 2. 并行执行测试
test:
stage: test
parallel: 4 # 并行运行4个测试任务
script:
- pytest tests/ -k "test_${CI_NODE_INDEX}"
# 3. 添加重试机制
test:
stage: test
retry:
max: 2
when:
- script_failure
- runner_system_failure
# 4. 使用测试矩阵
test:
stage: test
parallel:
matrix:
PYTHON_VERSION: ["3.9", "3.10", "3.11"]
script:
- python -m pytest tests/
八、总结与扩展学习
本文重点回顾
本文深入探讨了FastAPI云原生架构与DevOps最佳实践的6个核心主题:
1. Docker容器化
多阶段构建与镜像优化、Docker Compose本地开发、安全加固和最佳实践
2. Kubernetes部署
Deployment/Service/Ingress、ConfigMap与Secret管理、HPA自动扩缩容
3. 服务网格
Istio流量管理与灰度发布、mTLS安全认证、熔断降级策略
4. 可观测性
Prometheus监控与Grafana可视化、ELK日志收集、Jaeger分布式追踪
5. CI/CD流水线
GitLab CI/CD与GitHub Actions、自动化测试构建部署、Helm Charts管理
6. 高可用架构
多可用区部署、数据库主从复制、蓝绿部署与自动故障转移
进一步学习资源
认证考试:
CKA(Certified Kubernetes Administrator)
CKAD(Certified Kubernetes Application Developer)
CKS(Certified Kubernetes Security Specialist)
学习方向建议
短期目标(1-3个月):
掌握Docker容器化技术
理解Kubernetes核心概念
实现基础的CI/CD流水线
中期目标(3-6个月):
掌握服务网格架构
建立完整的可观测性体系
实现生产级高可用架构
长期目标(6个月以上):
研究云原生技术演进
参与开源项目贡献
成为云原生架构师
结语
云原生架构与DevOps实践是现代应用开发的必经之路。掌握Docker、Kubernetes、服务网格、可观测性、CI/CD等技术,将使你能够构建高可用、高性能、可扩展的企业级应用。
记住:
实践出真知:不要只看文档,动手部署
持续学习:技术发展迅速,保持学习热情
安全第一:始终考虑安全性和可维护性
自动化一切:减少人工操作,提高效率
希望《FastAPI进阶》系列能够帮助你从初学者成长为资深工程师,在云原生时代展现你的技术实力!
感谢你的阅读,祝你技术之路越走越宽!
《FastAPI进阶》系列文章