The Complete Guide to Docker Compose Configuration Files: From Basics to Production-Grade Security Practices
1. Basic Configuration Explained
1.1 File Version and Basic Structure
yaml
version: '3.8'
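| Version | Docker Engine | Feature support |
|---|---|---|
| 3.8 | 19.03.0+ | Supports modern directives such as healthcheck and deploy |
| 3.7 | 18.06.0+ | Supports init and extended volume configuration |
| 2.4 | 17.12.0+ | Supports runtime and sysctls |
Recommendation: use 3.8 or later for new projects that still run the legacy docker-compose v1 tool. Docker Compose v2 follows the Compose Specification, which treats the top-level version field as obsolete and ignores it.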
1.2 Core Service Configuration Explained
The basic configuration below is explained option by option afterwards:
yaml
services:
  mqttTrans:
    image: opmqttmd:v1.0
    container_name: mqttTrans
    volumes:
      - /cmaapp/cmamoc_ai_op/conf:/root/app/conf:ro   # read-only mount
      - /cmadata/cmamoc_ai_op/log/mqtt:/root/app/logs
    network_mode: "host"
    environment:
      - TZ=Asia/Shanghai
    restart: always
    command: >
      ./start.sh -D
    healthcheck:
      test: ["CMD", "pgrep", "-f", "OpMqttMD.bin"]
      interval: 30s
      timeout: 10s
      retries: 3
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
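Configuration options in detail
| Option | Description | Notes |
|---|---|---|
| image | Image to run | Use a specific tag; avoid latest |
| container_name | Fixed container name | Convenient to operate, but the service cannot be scaled horizontally |
| volumes | Volume and bind mounts | :ro mounts read-only, which protects configuration from tampering |
| network_mode: host | Use the host's network stack | Lowest network overhead, but no network isolation and ports may conflict |
| environment | Environment variables | Use secrets for sensitive values |
| restart | Restart policy | always restarts the container whenever it exits; a manually stopped container is started again when the Docker daemon restarts (unless-stopped avoids that) |
| healthcheck | Health check | Marks the container healthy or unhealthy; automatically replacing an unhealthy container additionally needs Swarm or an external watcher |
| logging | Logging configuration | Caps log size so the disk does not fill up |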
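2. Security Hardening
2.1 Run Containers as a Non-Root User
Security risk: containers run as root by default. Unless user namespace remapping is enabled, root inside the container is UID 0 on the host, so an attacker who breaks out of a compromised container may gain root on the host.
Method 1: Create a user in the Dockerfile (recommended)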
dockerfile
# Dockerfile
FROM node:18-alpine
# Create a non-root group and a user that belongs to it
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs
WORKDIR /app
COPY --chown=nodejs:nodejs . .
USER nodejs
EXPOSE 3000
CMD ["node", "server.js"]
Method 2: Set the user in docker-compose.yml
yaml
services:
  app:
    image: myapp:v1.0
    user: "1001:1001"   # UID:GID
    # or read it from environment variables
    # user: "${APP_UID:-1001}:${APP_GID:-1001}"
Method 3: Enable Docker user namespace remapping in /etc/docker/daemon.json (restart the Docker daemon afterwards)
json
{
  "userns-remap": "default"
}
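Whichever method is used, an entrypoint script can also refuse to start as root as a last line of defense. The following is a minimal sketch rather than part of the original configuration; the function name require_non_root is hypothetical, and the UID is passed as an argument so the check can be exercised without switching users.

```shell
#!/bin/sh
# Hypothetical entrypoint guard: fail when the given UID is 0 (root).
require_non_root() {
    uid="$1"
    if [ "$uid" -eq 0 ]; then
        echo "refusing to run as root (uid 0)" >&2
        return 1
    fi
    return 0
}

# Typical use at the top of an entrypoint script (illustrative):
#   require_non_root "$(id -u)" || exit 1
#   exec "$@"
```

With user: "1001:1001" set in the Compose file, id -u inside the container prints 1001 and the guard passes.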
2.2 Managing Sensitive Information
❌ Wrong: storing passwords in plain text in the Compose file
yaml
# Insecure!
environment:
  - DB_PASSWORD=mysecretpassword
  - API_KEY=sk-1234567890abcdef
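✅ Approach 1: use a .env file (basic)
bash
# .env file (add it to .gitignore; do not commit it to version control)
# The values are still plain text on disk and remain visible in docker inspect output
DB_PASSWORD=mysecretpassword
API_KEY=sk-1234567890abcdef
yaml
# docker-compose.yml
services:
  app:
    image: myapp:v1.0
    env_file:
      - .env
    # or reference individual variables
    environment:
      - DB_PASSWORD=${DB_PASSWORD}
      - API_KEY=${API_KEY}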
✅ Approach 2: use Docker Secrets (Swarm mode)
yaml
# docker-compose.yml (Swarm mode)
version: '3.8'
services:
  app:
    image: myapp:v1.0
    secrets:
      - db_password
      - api_key
secrets:
  db_password:
    external: true                # secret that already exists in the Swarm
  api_key:
    file: ./secrets/api_key.txt   # created from a file
Creating a secret:
bash
# Create a secret
echo "mysecretpassword" | docker secret create db_password -
# List secrets
docker secret ls
# Remove a secret
docker secret rm db_password
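Secrets are mounted as files under /run/secrets, so the application has to read the file rather than an environment variable. A common convention is a NAME_FILE variable that points at the file. The sketch below shows one way an entrypoint could turn that into a regular variable; load_secret is a hypothetical helper, not something Docker provides, and real images may implement the convention differently.

```shell
#!/bin/sh
# load_secret VAR: if VAR_FILE names a readable file, export VAR with its contents.
load_secret() {
    var="$1"
    file_var="${var}_FILE"
    # Indirect lookup of the *_FILE variable in POSIX sh
    eval "file_path=\${$file_var:-}"
    if [ -n "$file_path" ] && [ -r "$file_path" ]; then
        value="$(cat "$file_path")"   # command substitution strips the trailing newline
        export "$var=$value"
    fi
}

# Illustrative use in an entrypoint:
#   DB_PASSWORD_FILE=/run/secrets/db_password
#   load_secret DB_PASSWORD
#   exec "$@"
```

Reading the file at startup keeps the secret out of the Compose file and out of the image.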
✅ Approach 3: use HashiCorp Vault (enterprise)
yaml
services:
  app:
    image: myapp:v1.0
    environment:
      - VAULT_ADDR=http://vault:8200
    volumes:
      - vault-token:/run/secrets
volumes:
  vault-token:
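2.3 Resource Limits (Mitigating DoS Attacks)
yaml
services:
  app:
    image: myapp:v1.0
    deploy:
      resources:
        limits:
          cpus: '1.0'     # cap at 1 CPU core
          memory: 512M    # cap at 512 MB of memory
        reservations:
          cpus: '0.5'     # reserve 0.5 CPU core
          memory: 256M    # reserve 256 MB of memory
    # Service-level equivalents for single-host use
    # (not part of the version 3 schema that docker stack deploy accepts)
    mem_limit: 512m
    memswap_limit: 512m
    cpus: 1.0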
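Resource limit comparison:
| Setting | Swarm mode | Single host | Notes |
|---|---|---|---|
| deploy.resources.limits | ✅ | ⚠️ | Hard cap; on a single host it is applied by Docker Compose v2, or by docker-compose v1 when run with --compatibility |
| deploy.resources.reservations | ✅ | ⚠️ | Soft reservation; mainly a Swarm scheduling hint |
| mem_limit | ❌ | ✅ | Memory cap (legacy service-level syntax) |
| cpus | ❌ | ✅ | CPU core limit (service-level syntax) |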
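To confirm that a limit actually reached the container, the configured value can be compared with what docker inspect reports in bytes. The helper below is a small sketch for that comparison; to_bytes is a hypothetical name, and it assumes whole numbers with a single-letter binary suffix (b, k, m, g), which covers the values used in this guide.

```shell
#!/bin/sh
# to_bytes VALUE: convert a memory string such as 512M or 1G to bytes (1024-based).
to_bytes() {
    value="$1"
    number="${value%[bBkKmMgG]}"   # strip one trailing unit letter, if present
    unit="${value#$number}"
    case "$number" in
        ''|*[!0-9]*) echo "unsupported value: $value" >&2; return 1 ;;
    esac
    case "$unit" in
        ''|b|B) echo "$number" ;;
        k|K)    echo $((number * 1024)) ;;
        m|M)    echo $((number * 1024 * 1024)) ;;
        g|G)    echo $((number * 1024 * 1024 * 1024)) ;;
    esac
}

to_bytes 512M
```

The result can be checked against docker inspect --format '{{.HostConfig.Memory}}' <container_id>; a value of 0 there means no memory limit was applied.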
2.4 Network Security
Use custom isolated networks (instead of host mode)
yaml
version: '3.8'
services:
  app:
    image: myapp:v1.0
    networks:
      - frontend
      - backend
    ports:
      - "127.0.0.1:8080:8080"   # bind to the loopback interface only; not exposed publicly
    # or bind to a specific IP
    # ports:
    #   - "192.168.1.100:8080:8080"
  database:
    image: postgres:15
    networks:
      - backend
    # no host port mapping; reachable only on the internal network
networks:
  frontend:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16
  backend:
    driver: bridge
    internal: true   # internal network with no route to the outside
    ipam:
      config:
        - subnet: 172.29.0.0/16
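A port mapping without a host IP binds to every interface on the host, which is easy to miss in review. The sketch below classifies a short-syntax mapping string; binds_all_interfaces is a hypothetical helper, and it assumes IPv4 short syntax only (HOST_IP:HOST_PORT:CONTAINER_PORT), not the long syntax or bracketed IPv6 addresses.

```shell
#!/bin/sh
# binds_all_interfaces MAPPING: succeed when the mapping is published on all host interfaces.
binds_all_interfaces() {
    mapping="${1%/*}"            # drop an optional /tcp or /udp suffix
    case "$mapping" in
        0.0.0.0:*) return 0 ;;   # explicitly all interfaces
        *:*:*)     return 1 ;;   # a specific host IP was given
        *)         return 0 ;;   # HOST_PORT:CONTAINER_PORT or a bare container port
    esac
}

for m in "8080:8080" "127.0.0.1:8080:8080"; do
    if binds_all_interfaces "$m"; then
        echo "$m: published on all interfaces"
    else
        echo "$m: bound to a specific address"
    fi
done
```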
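Disable default inter-container communication
bash
# Create an isolated network with inter-container communication (ICC) disabled
docker network create --driver bridge --opt com.docker.network.bridge.enable_icc=false isolated
yaml
# Use it in docker-compose.yml as a pre-existing (external) network
services:
  app:
    networks:
      - isolated
networks:
  isolated:
    external: true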
2.5 Read-Only Filesystem and Security Options
yaml
services:
  app:
    image: myapp:v1.0
    read_only: true                      # read-only root filesystem
    tmpfs:
      - /tmp                             # temporary directory backed by tmpfs
      - /var/cache
      - /app/tmp:size=100M,mode=1777     # limit size and set permissions
    security_opt:
      - no-new-privileges:true           # forbid privilege escalation
    cap_drop:
      - ALL                              # drop every capability first
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
      - NET_BIND_SERVICE                 # add back only what is needed
    sysctls:
      - net.core.somaxconn=1024
    ulimits:
      nproc: 65535
      nofile:
        soft: 65536
        hard: 65536
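Commonly used Linux capabilities:
| Capability | Description | Typical use |
|---|---|---|
| NET_BIND_SERVICE | Bind to ports below 1024 | Web servers |
| CHOWN | Change file ownership | File operations |
| SETGID | Change the process GID | Privilege management |
| SETUID | Change the process UID | Privilege management |
| SYS_TIME | Set the system clock | NTP services |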
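The effective capability set of a process is exposed as a hexadecimal bitmask in the CapEff line of /proc/<pid>/status, where each capability is one bit (CHOWN is bit 0, SETGID 6, SETUID 7, NET_BIND_SERVICE 10, SYS_TIME 25). The sketch below tests a single bit; has_cap is a hypothetical helper, and the mask used in the example is constructed for illustration rather than taken from a real container.

```shell
#!/bin/sh
# has_cap MASK_HEX BIT: succeed when capability number BIT is set in the hex mask.
has_cap() {
    mask_hex="$1"
    bit="$2"
    mask=$((0x$mask_hex))
    [ $(( (mask >> bit) & 1 )) -eq 1 ]
}

# 0x4c1 has bits 0, 6, 7 and 10 set: CHOWN, SETGID, SETUID, NET_BIND_SERVICE
mask="00000000000004c1"
if has_cap "$mask" 10; then echo "NET_BIND_SERVICE present"; fi
if ! has_cap "$mask" 25; then echo "SYS_TIME absent"; fi
```

Inside a running container, grep CapEff /proc/1/status prints the real mask for the main process.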
2.6 Image Security Best Practices
yaml
services:
  app:
    # ✅ Use official images
    image: nginx:1.24-alpine
    # ✅ Use a specific version tag
    # image: nginx:latest  ❌ avoid
    # ✅ Pin a digest (the strictest way to fix a version)
    # image: nginx:1.24-alpine@sha256:abc123...
    # Pull policy
    pull_policy: missing        # pull only when the image is not present locally
    # pull_policy: always       # always pull
    # pull_policy: never        # use local images only
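Image security checklist:
- Use official images or trusted sources
- Use specific version tags; avoid latest
- Scan images for vulnerabilities regularly: docker scout cves <image> (the older docker scan command has been retired)
- Use minimal base images (alpine, distroless)
- Enable Docker Content Trust: export DOCKER_CONTENT_TRUST=1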
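The tag rule in the checklist can be checked mechanically. The sketch below classifies an image reference as pinned by digest, pinned by tag, or floating; image_pin_level is a hypothetical helper, and it treats a missing tag as floating because Docker then defaults to latest.

```shell
#!/bin/sh
# image_pin_level REF: print "digest", "tag" or "floating" for an image reference.
image_pin_level() {
    ref="$1"
    case "$ref" in
        *@sha256:*) echo "digest"; return 0 ;;
    esac
    name="${ref##*/}"   # last path component, so a registry port is not mistaken for a tag
    case "$name" in
        *:latest) echo "floating" ;;
        *:*)      echo "tag" ;;
        *)        echo "floating" ;;   # no tag means latest is implied
    esac
}

image_pin_level "nginx:1.24-alpine"
image_pin_level "nginx:latest"
```

A check like this could run in CI against every image line of the Compose file before deployment.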
3. Usability Improvements
3.1 Log Management
Log aggregation (ELK/Fluentd)
yaml
services:
  app:
    image: myapp:v1.0
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: docker.app
        # enrich log records with container labels and environment variables
        labels: "service,environment"
        env: "OS,HOSTNAME"
Structured logs (JSON format)
yaml
services:
  app:
    image: myapp:v1.0
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "10"
        labels: "service_name,version"
        env: "NODE_ENV"
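With the json-file driver, each container keeps at most max-file files of at most max-size each, so the product of the two is a rough upper bound on log disk usage per container. The arithmetic below is a small sketch of that estimate; max_log_mb is a hypothetical helper that takes the size in megabytes and the file count.

```shell
#!/bin/sh
# max_log_mb SIZE_MB FILES: upper bound, in MB, of json-file log storage for one container.
max_log_mb() {
    echo $(( $1 * $2 ))
}

max_log_mb 10 3      # max-size 10m, max-file 3
max_log_mb 100 10    # max-size 100m, max-file 10
```

The second setting allows up to about 1000 MB per container, which adds up quickly when several services share one host.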
3.2 Advanced Health Check Configuration
yaml
services:
  web:
    image: nginx:alpine
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s   # grace period after the container starts
      disable: false
  api:
    image: myapi:v1.0
    healthcheck:
      # assumes pg_isready is installed in the image
      test: ["CMD-SHELL", "pg_isready -U postgres || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5
Custom health check script (the list form of test must start with CMD, CMD-SHELL, or NONE):
yaml
services:
  app:
    image: myapp:v1.0
    healthcheck:
      test: ["CMD", "/app/healthcheck.sh"]
    volumes:
      - ./scripts/healthcheck.sh:/app/healthcheck.sh:ro
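Example healthcheck.sh (the file must be executable, and the image must provide bash, nc, curl, and pg_isready):
bash
#!/bin/bash
# Combined health check script
# Check the application port
if ! nc -z localhost 8080; then
    echo "Port check failed"
    exit 1
fi
# Check the API response
if ! curl -sf -o /dev/null http://localhost:8080/health; then
    echo "API health check failed"
    exit 1
fi
# Check the database connection
if ! pg_isready -h db -U app_user; then
    echo "Database connection failed"
    exit 1
fi
echo "All checks passed"
exit 0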
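3.3 Graceful Shutdown and Signal Handling
yaml
services:
  app:
    image: myapp:v1.0
    init: true               # run an init process (tini) as PID 1 so zombie processes are reaped
    stop_signal: SIGTERM     # the default signal
    stop_grace_period: 30s   # how long to wait before the container is killed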
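These settings only help if the process inside the container reacts to SIGTERM within the grace period. The following is a minimal sketch of a signal-aware worker loop in shell; the names on_term and run_worker are hypothetical, and a real service would do its own cleanup where the comment indicates.

```shell
#!/bin/sh
# Sketch of a SIGTERM-aware worker loop.
shutting_down=0

on_term() {
    # Only record the request; the main loop decides when it is safe to stop.
    shutting_down=1
}
trap on_term TERM

run_worker() {
    while [ "$shutting_down" -eq 0 ]; do
        # ... process one unit of work here ...
        sleep 1
    done
    echo "clean shutdown"
}

# A real entrypoint would end by calling:  run_worker
```

If the script is started through another wrapper, that wrapper should exec it so the signal is delivered to this process rather than to an intermediate shell.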
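3.4 Hot Reload in Development
yaml
# docker-compose.override.yml (development)
version: '3.8'
services:
  app:
    volumes:
      - ./src:/app/src:cached   # mount the source tree; cached is a Docker Desktop for Mac hint and has no effect on Linux
    environment:
      - DEBUG=true
      - NODE_ENV=development
    command: ["npm", "run", "dev"]   # start in development mode
    # or enable automatic restarts
    # restart: unless-stopped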
3.5 Managing Service Dependencies
yaml
# The condition form of depends_on needs Docker Compose v2 (or the 2.x file format);
# docker-compose v1 rejects it in version 3.x files
services:
  api:
    image: myapi:v1.0
    depends_on:
      database:
        condition: service_healthy   # start only after the database is healthy
      redis:
        condition: service_started
    restart: unless-stopped
  database:
    image: postgres:15-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
volumes:
  postgres_data:
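depends_on only orders startup; it does not protect against a dependency that becomes unavailable later, and older tooling may not support conditions at all. A retry loop in the entrypoint is a common complement. The sketch below is one way to write it; wait_for is a hypothetical helper, and the pg_isready call in the usage comment assumes that tool exists in the image.

```shell
#!/bin/sh
# wait_for ATTEMPTS DELAY_SECONDS COMMAND...: retry COMMAND until it succeeds or attempts run out.
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Illustrative use before starting the application:
#   wait_for 30 2 pg_isready -h database -U postgres || exit 1
#   exec "$@"
```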
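3.6 Validating and Formatting Configuration Files
bash
# Validate the configuration syntax
docker-compose config
# Validate a specific file and expand environment variables
docker-compose -f docker-compose.yml config
# Write the fully resolved configuration to a YAML file
docker-compose config > docker-compose.processed.yml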
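4. Managing Multiple Environments
4.1 Splitting Configuration Files
project/
├── docker-compose.yml            # base configuration (shared)
├── docker-compose.override.yml   # development (loaded automatically)
├── docker-compose.prod.yml       # production
├── docker-compose.test.yml       # testing
├── .env                          # environment variables
├── .env.production               # production environment variables
└── .env.development              # development environment variables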
4.2 Base Configuration (docker-compose.yml)
yaml
version: '3.8'
services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      - APP_NAME=${APP_NAME:-myapp}
    networks:
      - app_network
  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=${DB_NAME}
      - POSTGRES_USER=${DB_USER}
      # The password is supplied per environment (override and prod files), because the
      # postgres image rejects POSTGRES_PASSWORD and POSTGRES_PASSWORD_FILE being set together
    volumes:
      - db_data:/var/lib/postgresql/data
    networks:
      - app_network
networks:
  app_network:
    driver: bridge
volumes:
  db_data:
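Compose variable substitution follows the same rules as the shell for defaults: ${VAR:-default} falls back when the variable is unset or empty, while ${VAR-default} falls back only when it is unset. The short demonstration below uses plain shell to show the difference; the show function is just a hypothetical helper for printing both forms.

```shell
#!/bin/sh
# Print how the two default forms resolve for the current value of APP_PORT.
show() {
    echo "colon-dash=${APP_PORT:-8080} dash=${APP_PORT-8080}"
}

unset APP_PORT
show            # variable unset
APP_PORT=""
show            # variable set but empty
APP_PORT=9090
show            # variable set to a value
```

The empty case matters in practice: a line such as APP_PORT= in an .env file still triggers the :- default, but not the - default.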
4.3 Development Configuration (docker-compose.override.yml)
yaml
version: '3.8'
services:
  app:
    ports:
      - "3000:3000"
    volumes:
      - ./src:/app/src:cached
      - ./config:/app/config:ro
    environment:
      - DEBUG=true
      - LOG_LEVEL=debug
    command: ["npm", "run", "dev"]
  db:
    ports:
      - "5432:5432"   # expose the database port in development
    environment:
      - POSTGRES_PASSWORD=devpassword   # simple password for development only
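4.4 Production Configuration (docker-compose.prod.yml)
yaml
version: '3.8'
services:
  app:
    # Require an explicit release tag instead of silently falling back to latest
    image: myapp:${APP_VERSION:?set APP_VERSION to a release tag}
    build:            # inherited from the base file; production normally runs a prebuilt image
      context: .
    ports:
      - "${APP_PORT:-8080}:8080"
    environment:
      - NODE_ENV=production
      - LOG_LEVEL=warn
    deploy:
      replicas: 3     # a fixed host port conflicts across replicas on one host; this suits Swarm's routing mesh
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "10"
  db:
    environment:
      # The postgres image rejects POSTGRES_PASSWORD and POSTGRES_PASSWORD_FILE together,
      # so POSTGRES_PASSWORD must not also be set for production
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - db_data:/var/lib/postgresql/data
      - ./backups:/backups   # backup directory
    deploy:
      resources:
        limits:
          memory: 1G
    restart: always
    # no ports are published in production
    # ports:
    #   - "5432:5432"
secrets:
  db_password:
    file: ./secrets/db_password.txt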
4.5 Startup Commands per Environment
bash
# Development (the override file is loaded automatically)
docker-compose up -d
# Production
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Testing
docker-compose -f docker-compose.yml -f docker-compose.test.yml up -d
# With a specific environment file
docker-compose --env-file .env.production -f docker-compose.yml -f docker-compose.prod.yml up -d
4.6 Simplifying Operations with a Makefile
makefile
# Makefile (recipe lines must be indented with a tab)
.PHONY: dev prod test down logs build clean
# Development
dev:
	docker-compose up -d --build
# Production
prod:
	docker-compose -f docker-compose.yml -f docker-compose.prod.yml --env-file .env.production up -d
# Testing
test:
	docker-compose -f docker-compose.yml -f docker-compose.test.yml up -d --abort-on-container-exit
# Stop all services (add -v only to also delete named volumes and the data in them)
down:
	docker-compose down
	docker-compose -f docker-compose.yml -f docker-compose.prod.yml down
# Follow logs
logs:
	docker-compose logs -f
# Rebuild images
build:
	docker-compose build --no-cache
# Clean up (docker volume prune deletes unused volumes; make sure nothing needed lives there)
clean:
	docker system prune -f
	docker volume prune -f
5. Complete Production-Grade Example
5.1 Web Application + Database + Redis Stack
yaml
version: '3.8'
services:
  # Front-end Nginx
  nginx:
    image: nginx:1.24-alpine
    container_name: ${APP_NAME:-app}_nginx
    restart: unless-stopped
    ports:
      - "${HTTP_PORT:-80}:80"
      - "${HTTPS_PORT:-443}:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - static_files:/usr/share/nginx/html/static:ro
    networks:
      - frontend
    depends_on:
      - api
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "10"
        labels: "service"
  # API service
  api:
    image: ${API_IMAGE:-myapi}:${API_VERSION:-v1.0}
    # No container_name here: a fixed name cannot be combined with replicas greater than 1
    restart: unless-stopped
    user: "${APP_UID:-1000}:${APP_GID:-1000}"
    read_only: true
    environment:
      - NODE_ENV=production
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_NAME=${DB_NAME}
      - DB_USER=${DB_USER}
      - DB_PASSWORD_FILE=/run/secrets/db_password
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_PASSWORD_FILE=/run/secrets/redis_password
      - JWT_SECRET_FILE=/run/secrets/jwt_secret
      - LOG_LEVEL=info
    secrets:
      - db_password
      - redis_password
      - jwt_secret
    tmpfs:
      - /tmp:size=100M,mode=1777
      - /app/logs:size=50M,mode=1777
    volumes:
      - static_files:/app/static
    networks:
      - frontend
      - backend
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
      - CHOWN
      - SETGID
      - SETUID
    security_opt:
      - no-new-privileges:true
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "10"
  # Background job worker
  worker:
    image: ${API_IMAGE:-myapi}:${API_VERSION:-v1.0}
    container_name: ${APP_NAME:-app}_worker
    restart: unless-stopped
    user: "${APP_UID:-1000}:${APP_GID:-1000}"
    read_only: true
    environment:
      - NODE_ENV=production
      - DB_HOST=postgres
      - REDIS_HOST=redis
      - WORKER_MODE=true
    secrets:
      - db_password
      - redis_password
    tmpfs:
      - /tmp:size=50M,mode=1777
    networks:
      - backend
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    deploy:
      replicas: 1
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "5"
  # PostgreSQL database
  postgres:
    image: postgres:15-alpine
    container_name: ${APP_NAME:-app}_postgres
    restart: unless-stopped
    environment:
      - POSTGRES_DB=${DB_NAME}
      - POSTGRES_USER=${DB_USER}
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
      - POSTGRES_INITDB_ARGS=--auth-host=scram-sha-256
    secrets:
      - db_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres/init:/docker-entrypoint-initdb.d:ro
      - ./backups:/backups
    networks:
      - backend
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          memory: 512M
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER} -d ${DB_NAME}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "10"
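  # Redis cache
  redis:
    image: redis:7-alpine
    container_name: ${APP_NAME:-app}_redis
    restart: unless-stopped
    # The command runs through a shell so that the secret file is read at container start.
    # $$ escapes the dollar sign from Compose interpolation, and re-entering
    # docker-entrypoint.sh keeps the image's own privilege drop in place.
    command:
      - sh
      - -c
      - >
        exec docker-entrypoint.sh redis-server
        --requirepass "$$(cat /run/secrets/redis_password)"
        --maxmemory 256mb
        --maxmemory-policy allkeys-lru
        --appendonly yes
        --appendfsync everysec
    secrets:
      - redis_password
    volumes:
      - redis_data:/data
    networks:
      - backend
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      # Authenticate, because an unauthenticated ping is rejected once requirepass is set
      test: ["CMD-SHELL", "redis-cli --no-auth-warning -a \"$$(cat /run/secrets/redis_password)\" ping | grep -q PONG"]
      interval: 10s
      timeout: 3s
      retries: 5
    logging:
      driver: json-file
      options:
        max-size: "50m"
        max-file: "5"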
  # Scheduled backup service
  backup:
    image: postgres:15-alpine
    container_name: ${APP_NAME:-app}_backup
    restart: unless-stopped
    environment:
      - PGHOST=postgres
      - PGDATABASE=${DB_NAME}
      - PGUSER=${DB_USER}
      - BACKUP_RETENTION_DAYS=7
    secrets:
      - db_password
    volumes:
      - ./backups:/backups
      - ./scripts/backup.sh:/backup.sh:ro
    networks:
      - backend
    depends_on:
      - postgres
    # backup.sh takes one backup and exits, so loop with a daily sleep here; otherwise
    # restart: unless-stopped would rerun the backup back to back
    entrypoint: ["/bin/sh", "-c", "while true; do /bin/sh /backup.sh; sleep 86400; done"]
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
networks:
  frontend:
    driver: bridge
    internal: false
  backend:
    driver: bridge
    internal: true   # internal network with no route to the outside
volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local
  static_files:
    driver: local
secrets:
  db_password:
    file: ./secrets/db_password.txt
  redis_password:
    file: ./secrets/redis_password.txt
  jwt_secret:
    file: ./secrets/jwt_secret.txt
5.2 Deployment Script
deploy.sh:
bash
#!/bin/bash
set -euo pipefail
# ============================================
# Production deployment script
# ============================================
APP_NAME="${APP_NAME:-myapp}"
COMPOSE_FILE="docker-compose.yml"
ENV_FILE=".env.production"
# Colored output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log_info() {
    echo -e "${GREEN}[INFO]${NC} $1"
}
log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
    echo -e "${RED}[ERROR]${NC} $1" >&2
}
# Check dependencies
check_dependencies() {
    command -v docker >/dev/null 2>&1 || { log_error "Docker is not installed"; exit 1; }
    command -v docker-compose >/dev/null 2>&1 || { log_error "Docker Compose is not installed"; exit 1; }
}
# Check the secrets files
check_secrets() {
    local secrets_dir="./secrets"
    local required_secrets=("db_password.txt" "redis_password.txt" "jwt_secret.txt")
    if [[ ! -d "$secrets_dir" ]]; then
        log_error "secrets directory does not exist: $secrets_dir"
        exit 1
    fi
    for secret in "${required_secrets[@]}"; do
        if [[ ! -f "$secrets_dir/$secret" ]]; then
            log_error "missing secret file: $secrets_dir/$secret"
            exit 1
        fi
        # Check file permissions (GNU stat first, BSD/macOS stat as a fallback)
        local perms
        perms=$(stat -c "%a" "$secrets_dir/$secret" 2>/dev/null || stat -f "%Lp" "$secrets_dir/$secret")
        if [[ "$perms" != "600" && "$perms" != "400" ]]; then
            log_warn "secret file permissions are too open: $secret (current: $perms, recommended: 600); tightening"
            chmod 600 "$secrets_dir/$secret"
        fi
    done
    log_info "Secrets check passed"
}
# Create required directories
create_directories() {
    local dirs=("backups" "logs")
    for dir in "${dirs[@]}"; do
        if [[ ! -d "$dir" ]]; then
            mkdir -p "$dir"
            log_info "Created directory: $dir"
        fi
    done
}
# Pull the latest images
pull_images() {
    log_info "Pulling the latest images..."
    docker-compose --env-file "$ENV_FILE" -f "$COMPOSE_FILE" pull
}
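# Run the deployment
deploy() {
    log_info "Deploying $APP_NAME..."
    # Recreate the containers whose image or configuration changed.
    # Compose replaces containers in place; this is not a zero-downtime rolling update.
    docker-compose --env-file "$ENV_FILE" -f "$COMPOSE_FILE" up -d --remove-orphans
    # Poll health status for up to two minutes; containers without a healthcheck report "none"
    log_info "Waiting for health checks..."
    local unhealthy="" attempt
    for attempt in $(seq 1 12); do
        sleep 10
        # -x matches whole lines, so "unhealthy" is not mistaken for "healthy"
        unhealthy=$(docker-compose --env-file "$ENV_FILE" -f "$COMPOSE_FILE" ps -q \
            | xargs docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
            | grep -vx -e healthy -e none || true)
        if [[ -z "$unhealthy" ]]; then
            break
        fi
    done
    if [[ -n "$unhealthy" ]]; then
        log_error "Some services did not pass their health checks"
        docker-compose --env-file "$ENV_FILE" -f "$COMPOSE_FILE" ps
        exit 1
    fi
    log_info "Deployment succeeded!"
    docker-compose --env-file "$ENV_FILE" -f "$COMPOSE_FILE" ps
}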
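# Clean up old resources
cleanup() {
    log_info "Cleaning up old resources..."
    # Removes stopped containers, unused networks, and dangling images.
    # Volumes are deliberately left alone so that a deploy never prunes data.
    docker system prune -f
}
# Main entry point
main() {
    check_dependencies
    check_secrets
    create_directories
    pull_images
    deploy
    cleanup
    log_info "Deployment complete!"
}
main "$@"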
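One pitfall when filtering health statuses with grep is that the word unhealthy contains healthy, so a plain grep -v healthy silently drops exactly the containers that should be reported. The sketch below shows a whole-line match instead; not_healthy is a hypothetical helper that reads one status per line on standard input.

```shell
#!/bin/sh
# not_healthy: print every status line that is not exactly "healthy".
not_healthy() {
    # -x matches whole lines only; "|| true" keeps the exit status at 0 when nothing is left
    grep -vx 'healthy' || true
}

printf 'healthy\nunhealthy\nstarting\n' | not_healthy
```

With the sample input above, the substring form grep -v healthy would print only starting, while the whole-line form prints both unhealthy and starting.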
5.3 Backup Script
scripts/backup.sh:
bash
#!/bin/bash
set -euo pipefail
# Database backup script
BACKUP_DIR="/backups"
RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-7}"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/backup_${DATE}.sql"
# Read the password from the secret file
if [[ -f /run/secrets/db_password ]]; then
    PGPASSWORD=$(cat /run/secrets/db_password)
    export PGPASSWORD
fi
# Run the backup in custom format (restorable with pg_restore)
echo "Starting backup of database: ${PGDATABASE}"
pg_dump -h "$PGHOST" -U "$PGUSER" -d "$PGDATABASE" \
    --verbose \
    --no-owner \
    --no-acl \
    --format=custom \
    --file="${BACKUP_FILE}.dump"
# Also produce a plain SQL dump (easier to read)
pg_dump -h "$PGHOST" -U "$PGUSER" -d "$PGDATABASE" \
    --verbose \
    --no-owner \
    --no-acl \
    > "${BACKUP_FILE}"
# Compress
gzip -f "${BACKUP_FILE}"
gzip -f "${BACKUP_FILE}.dump"
echo "Backup finished: ${BACKUP_FILE}.gz"
# Remove old backups
echo "Removing backups older than ${RETENTION_DAYS} days..."
find "$BACKUP_DIR" -name "backup_*.sql.gz" -mtime "+${RETENTION_DAYS}" -delete
find "$BACKUP_DIR" -name "backup_*.sql.dump.gz" -mtime "+${RETENTION_DAYS}" -delete
echo "Backup job complete"
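6. Troubleshooting Common Problems
6.1 Container Fails to Start
bash
# View the service logs
docker-compose logs <service_name>
# Open a shell in the container for debugging
docker-compose exec <service_name> sh
# Inspect container details
docker-compose ps
docker inspect <container_id>
6.2 Permission Problems
bash
# Fix file ownership on the host
sudo chown -R 1000:1000 ./data
# Show the user the container runs as
docker-compose exec <service_name> id
# Enter the container as root for debugging
docker-compose exec -u root <service_name> sh
6.3 Network Problems
bash
# List networks
docker network ls
# Inspect a network
docker network inspect <network_name>
# Test connectivity between containers (requires ping in the image)
docker-compose exec <service_a> ping <service_b>
6.4 Resource Limit Problems
bash
# Show container resource usage
docker stats
# Watch for container OOM events
docker events --filter event=oom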
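Appendix: Security Checklist
Pre-deployment checks
- Containers run as a non-root user
- Sensitive values are stored in secrets or environment files (not committed to Git)
- Resource limits are set (CPU, memory)
- Health checks are configured
- Log limits are configured
- Custom isolated networks are used
- Ports are bound only to the interfaces that need them (avoid 0.0.0.0)
- The root filesystem is read-only (read_only: true)
- Unneeded Linux capabilities are dropped
- Privilege escalation is disabled (no-new-privileges:true)
- Images use specific version tags
- Images have been scanned for vulnerabilities
Runtime checks
- Update base images regularly
- Monitor container resource usage
- Configure log aggregation and alerting
- Back up data regularly
- Enable Docker audit logging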