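1. Introduction and Core Role
etcd is the core data store of a Kubernetes (K8s) cluster. It uses the Raft consensus algorithm to keep its replicas consistent. It stores:
- Resource objects such as Pods, Services, ConfigMaps, and Secrets (the full serialized objects, not only their metadata)
- All cluster state persisted by the API Server
- Cluster bootstrap information and scheduler / Controller Manager state
Storage architecture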
plaintext
┌────────────────────────────────────────────────────────────┐
│                 etcd storage architecture                  │
├────────────────────────────────────────────────────────────┤
│ Client → API Server → WAL (write-ahead log) → BoltDB (KV)  │
│                              ↓                             │
│        MVCC multi-version control + Watch mechanism        │
└────────────────────────────────────────────────────────────┘
2. How It Works
2.1 Raft Election and Log Replication
Leader election
plaintext
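Follower ──(election timeout, 150-300 ms)──► Candidate ──(wins a majority of votes)──► Leader
                                                 ↑
                                                 └──(no majority)──► retry election
Term: each term has at most one Leader, and nodes compare term numbers to detect stale messages. The 150-300 ms range is the randomized timeout suggested in the Raft paper; etcd's own default election timeout is 1000 ms (--election-timeout).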
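The election rules above can be sketched in a few lines of Python. This is a simplified illustration, not etcd's implementation: the names `Peer`, `request_vote`, and `run_election` are hypothetical, and the sketch ignores the Raft rule that a voter also checks whether the candidate's log is at least as up to date as its own.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


def election_timeout_ms(low: int = 150, high: int = 300) -> int:
    """Randomized timeout so that nodes rarely become Candidates at the same moment."""
    return random.randint(low, high)


def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


@dataclass
class Peer:
    current_term: int
    voted_for: Optional[str] = None  # candidate this peer voted for in current_term

    def request_vote(self, candidate_id: str, term: int) -> bool:
        if term < self.current_term:
            return False              # stale candidate: reject
        if term > self.current_term:
            self.current_term = term  # newer term: adopt it and forget the old vote
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False                  # already voted for someone else in this term


def run_election(candidate_id: str, term: int, peers: List[Peer]) -> bool:
    votes = 1 + sum(p.request_vote(candidate_id, term) for p in peers)  # 1 = own vote
    return votes >= quorum(len(peers) + 1)


if __name__ == "__main__":
    peers = [Peer(current_term=1) for _ in range(4)]  # 5-node cluster including the candidate
    print(run_election("node1", 2, peers))  # True: all four peers grant their vote
    print(run_election("node2", 2, peers))  # False: every peer already voted in term 2
```

The second call fails because each peer grants at most one vote per term, which is why a term can never have two Leaders.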
Log replication flow
plaintext
Client → Leader: PUT key=value
                   │
                   ▼
        ┌─────────────────────┐
        │ Append to local WAL │
        └──────────┬──────────┘
                   ▼
       Broadcast AppendEntries RPC
     │         │         │         │
     ▼         ▼         ▼         ▼
   Node2     Node3     Node4     Node5
     │         │         │         │
     └─────────┴────┬────┴─────────┘
                    │ (majority acknowledged: 3 of 5, counting the Leader)
                    ▼
        ┌─────────────────────────┐
        │ Apply to state machine  │
        │ Reply to client         │
        └─────────────────────────┘
Log entry structure:
Index │ Term │ Data │ Committed
1     │ 1    │ kv1  │ ✓
2     │ 1    │ kv2  │ ✓
3     │ 2    │ kv3  │ ✓
4     │ 2    │ kv4  │ -
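To make the "majority acknowledged" step concrete, here is a minimal sketch of how a Leader could derive the commit index from the highest log index each node is known to have stored. The function name `commit_index` and the sample values are assumptions for illustration. Real Raft additionally commits only entries from the Leader's current term, which this sketch leaves out.

```python
from typing import List


def commit_index(match_index: List[int]) -> int:
    """Return the highest log index stored on a majority of nodes.

    match_index holds one value per node, including the Leader's own last index.
    """
    majority = len(match_index) // 2 + 1
    ordered = sorted(match_index, reverse=True)
    # The majority-th largest value is present on at least `majority` nodes.
    return ordered[majority - 1]


if __name__ == "__main__":
    # 5 nodes: the Leader and one Follower have entry 4, two have entry 3, one lags at 2.
    print(commit_index([4, 4, 3, 3, 2]))  # 3
```

With these sample values, entry 4 is on only two of five nodes, so it stays uncommitted, which matches the last row of the log table above.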
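2.2 Read and Write Flow
- Write: Leader → WAL → broadcast to Followers → majority acknowledgment → state machine
- Read: linearizable reads (the default) are confirmed through the Leader, so a Follower consults the Leader before answering; serializable reads (etcdctl get --consistency=s) are served from the local member and may return stale data
- Watch: a gRPC stream pushes key change events to clients in real time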
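2.3 MVCC (Multi-Version Concurrency Control)
bash
etcdctl get --rev=100 --prefix /registry/pods/default/  # read keys as of revision 100
etcdctl compaction 10000  # compact: discard history before revision 10000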
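The two commands above rely on every write receiving a new global revision. The following toy store is a sketch of that idea only: the class `MVCCStore` is hypothetical, it keeps everything in a Python dict, and it does not model deletes (tombstones), leases, or transactions.

```python
from typing import Dict, List, Optional, Tuple


class MVCCStore:
    def __init__(self) -> None:
        self.revision = 0   # global counter, incremented on every write
        self.compacted = 0  # revisions below this value are no longer readable
        self.history: Dict[str, List[Tuple[int, str]]] = {}  # key -> [(revision, value)]

    def put(self, key: str, value: str) -> int:
        self.revision += 1
        self.history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key: str, rev: Optional[int] = None) -> Optional[str]:
        rev = self.revision if rev is None else rev
        if rev < self.compacted:
            raise ValueError("required revision has been compacted")
        value = None
        for r, v in self.history.get(key, []):  # versions are stored oldest first
            if r <= rev:
                value = v
        return value

    def compact(self, rev: int) -> None:
        for key, versions in self.history.items():
            older = [x for x in versions if x[0] <= rev]
            newer = [x for x in versions if x[0] > rev]
            # Keep only the newest version at or before rev so reads at rev still work.
            self.history[key] = older[-1:] + newer
        self.compacted = rev


if __name__ == "__main__":
    store = MVCCStore()
    store.put("/demo/a", "v1")           # revision 1
    store.put("/demo/a", "v2")           # revision 2
    print(store.get("/demo/a", rev=1))   # v1
    store.compact(2)
    print(store.get("/demo/a"))          # v2
```

After `compact(2)`, a read at revision 1 raises an error, which mirrors why historical reads fail once the requested revision has been compacted away.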
3. Cluster Deployment
3.1 kubeadm Static Pod Deployment
yaml
# /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  hostNetwork: true  # the node IP in the URLs below must be reachable by peers
  containers:
  - name: etcd
    command:
    - etcd
    - --data-dir=/var/lib/etcd
    - --wal-dir=/var/lib/etcd/wal
    - --name=node1
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --client-cert-auth=true
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-client-cert-auth=true
    - --listen-peer-urls=https://0.0.0.0:2380
    - --listen-client-urls=https://0.0.0.0:2379
    # Plain-HTTP health/metrics endpoint on localhost, used by the probe below
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --advertise-client-urls=https://192.168.1.10:2379
    # Must match this member's entry in --initial-cluster
    - --initial-advertise-peer-urls=https://192.168.1.10:2380
    - --initial-cluster=node1=https://192.168.1.10:2380,node2=https://192.168.1.11:2380,node3=https://192.168.1.12:2380
    - --initial-cluster-state=new
    - --initial-cluster-token=etcd-cluster
    - --quota-backend-bytes=8589934592
    - --auto-compaction-retention=1
    - --heartbeat-interval=500
    # Matches the 5000 ms recommended in 3.2
    - --election-timeout=5000
    image: registry.k8s.io/etcd:3.5.9-0
    livenessProbe:
      # The kubelet cannot present a client certificate, so probing the
      # mTLS client port (2379) would fail; probe the metrics listener instead.
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
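3.2 Key Parameters
| Parameter | Description | Recommended value |
|---|---|---|
| --quota-backend-bytes | Backend database quota (etcd default: 2 GB) | 8 GB (8589934592 bytes) |
| --heartbeat-interval | Heartbeat interval in ms (etcd default: 100) | 500 |
| --election-timeout | Election timeout in ms (etcd default: 1000) | 5000 |
| --auto-compaction-retention | Auto-compaction retention in hours | 1 |
| --snapshot-count | Number of committed transactions that triggers a snapshot | 5000 |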
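Because the heartbeat interval and election timeout depend on each other, it can help to sanity-check them before rolling out a manifest. The sketch below is a hypothetical helper, not part of etcd. It encodes two limits that, to the best of my knowledge, etcd enforces at startup (election timeout at least 5 times the heartbeat interval, and at most 50000 ms), so treat the exact thresholds as assumptions to verify against your etcd version.

```python
from typing import List

MAX_ELECTION_MS = 50_000  # assumed upper bound accepted by etcd


def gib_to_bytes(gib: int) -> int:
    """Convert GiB to bytes, e.g. for --quota-backend-bytes."""
    return gib * 1024 ** 3


def check_etcd_timing(heartbeat_ms: int, election_ms: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means the pair looks sane."""
    problems = []
    if election_ms < 5 * heartbeat_ms:
        problems.append(
            f"--election-timeout={election_ms} should be at least "
            f"5x --heartbeat-interval={heartbeat_ms}"
        )
    if election_ms > MAX_ELECTION_MS:
        problems.append(f"--election-timeout={election_ms} exceeds {MAX_ELECTION_MS} ms")
    return problems


if __name__ == "__main__":
    print(check_etcd_timing(500, 5000))  # []
    print(gib_to_bytes(8))               # 8589934592
```

The values 500 and 5000 from the table pass both checks, and 8 GiB converts to the 8589934592 used for --quota-backend-bytes in 3.1.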
4. Common Operational Commands
4.1 Health Checks
bash
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
etcdctl member list -w table      # member list
etcdctl endpoint health -w table  # health status
etcdctl endpoint status -w table  # detailed status (leader, DB size, raft term)
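4.2 Data Operations
bash
# Caution: Kubernetes stores objects under /registry in its own serialized format.
# Run put/del against /registry only on a disposable test cluster.
etcdctl put /registry/pods/default/nginx '{"apiVersion":"v1"}'  # write
etcdctl get /registry/pods/default/nginx                        # read
etcdctl get --prefix /registry/pods/                            # prefix query
etcdctl del /registry/pods/default/nginx                        # delete
etcdctl get --rev=100 --prefix /registry/pods/default/          # historical revision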
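4.3 Backup and Restore
bash
# Save a snapshot
etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y%m%d).db
# Inspect a snapshot (in etcd 3.5+, etcdutl snapshot status/restore is preferred)
etcdctl snapshot status /backup/etcd-snapshot.db -w table
# Restore a snapshot. Run on every member with its own --name and
# --initial-advertise-peer-urls; --data-dir must not already exist.
etcdctl snapshot restore /backup/etcd-snapshot.db \
  --name=node1 \
  --initial-cluster=node1=https://192.168.1.10:2380,node2=https://192.168.1.11:2380,node3=https://192.168.1.12:2380 \
  --initial-cluster-token=etcd-cluster \
  --initial-advertise-peer-urls=https://192.168.1.10:2380 \
  --data-dir=/var/lib/etcd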
4.4 Member Management
bash
etcdctl member add node4 --peer-urls=https://192.168.1.13:2380  # add a member
etcdctl member remove <member_id>                               # remove a member
etcdctl member update <member_id> --peer-urls=https://...       # update peer URLs
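4.5 Maintenance Operations
bash
# $ENDPOINTS is a comma-separated list of client URLs that you define yourself
etcdctl defrag --endpoints=$ENDPOINTS  # defragment (run periodically; it blocks the member while it runs)
etcdctl compaction <revision>          # compact historical revisions
etcdctl alarm disarm                   # clear the space alarm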
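5. Common Problems and Troubleshooting
5.1 Split Brain / Member Failure
Diagnosis steps:
bash
# 1. Check the logs
kubectl logs -n kube-system etcd-node1 --tail=100
# 2. Member status
etcdctl member list
# 3. Network connectivity to peers
nc -zv <peer_ip> 2380
# 4. Check that heartbeat and election timeouts are identical on every member
grep -E "heartbeat|election" /etc/kubernetes/manifests/etcd.yaml
Solutions:
bash
# Restart the failed node (systemd-managed etcd only; with kubeadm static Pods
# the kubelet restarts the container, e.g. after moving the manifest out of
# /etc/kubernetes/manifests and back)
sudo systemctl restart etcd
# Remove an unavailable member
etcdctl member remove <failed_member_id>
# If the cluster is completely unavailable, restore from a snapshot (see 4.3 and 6.4)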
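5.2 Database Space Exhausted
Error: etcdserver: mvcc: database space exceeded
Diagnosis:
bash
etcdctl endpoint status -w table  # DB size per member, to compare against the quota
du -sh /var/lib/etcd/             # actual size on disk
Fix:
bash
# 1. Compact old revisions (keep the most recent 1000)
REVISION=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')
etcdctl compaction $((REVISION - 1000))
# 2. Defragment to release the freed space
etcdctl defrag --endpoints=$ENDPOINTS
# 3. Clear the alarm only after space has been reclaimed, otherwise it is raised again
etcdctl alarm disarm
# 4. Verify
etcdctl endpoint status -w table
Prevention: set the quota to 8 GB, enable auto-compaction, and schedule periodic defragmentation and backups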
5.3 API Server Connection Timeouts
Diagnosis:
bash
# 1. etcd Pod status
kubectl get pods -n kube-system -l component=etcd
# 2. Health status
etcdctl endpoint health
# 3. API Server logs
kubectl logs -n kube-system kube-apiserver-<node> --tail=50 | grep -i etcd
# 4. Certificate validity dates
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -dates
Common causes: expired certificates, CN/SAN mismatch, network unreachable, firewall blocking ports 2379/2380
6. Best Practices
6.1 Cluster Sizing Recommendations
| Cluster scale (K8s nodes) | etcd members | CPU | Memory | Disk |
|---|---|---|---|---|
| <100 | 1-3 | 2 cores | 4 GB | 50 GB SSD |
| 100-500 | 3 | 4 cores | 8 GB | 100 GB SSD |
| 500+ | 3-5 | 8 cores | 16 GB | 200 GB SSD |
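The member counts in the table are odd for a reason: Raft needs a majority to make progress, so an even-sized cluster tolerates no more failures than the next smaller odd size. A short sketch of the arithmetic (the function name `fault_tolerance` is only illustrative):

```python
def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority (quorum) still remains."""
    quorum = members // 2 + 1
    return members - quorum


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members -> quorum {n // 2 + 1}, tolerates {fault_tolerance(n)} failure(s)")
```

Going from 3 to 4 members raises the quorum from 2 to 3 without adding any failure tolerance, while going to 5 tolerates two failures.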
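6.2 Hardware Configuration
- Disk: NVMe SSD, with the WAL on a dedicated partition
- Network: 10 Gbps, ideally on an isolated, dedicated network
- Memory: 8 GB or more; etcd keeps its key index in memory and memory-maps the database file, so memory pressure directly hurts performance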
6.3 Backup Strategy
bash
#!/bin/bash
# /usr/local/bin/etcd-backup.sh
set -euo pipefail  # abort on the first failed command or unset variable
BACKUP_DIR=/backup/etcd
DATE=$(date +%Y%m%d%H%M%S)
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
mkdir -p "${BACKUP_DIR}"
# Save a timestamped snapshot, then delete snapshots older than 7 days
etcdctl snapshot save "${BACKUP_DIR}/etcd-snapshot-${DATE}.db"
find "${BACKUP_DIR}" -name "etcd-snapshot-*.db" -mtime +7 -delete
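6.4 Disaster Recovery Procedure
bash
# The systemctl commands assume systemd-managed components. With kubeadm static
# Pods, stop them by moving the manifests out of /etc/kubernetes/manifests instead.
# 1. Stop the control plane
sudo systemctl stop kube-apiserver kube-controller-manager kube-scheduler
# 2. Stop etcd
sudo systemctl stop etcd
# 3. Move the old data aside (restore requires that --data-dir does not exist)
sudo mv /var/lib/etcd /var/lib/etcd.bak
# 4. Restore the snapshot (on every member, with its own --name and peer URL)
etcdctl snapshot restore /backup/etcd-snapshot.db \
  --name=node1 \
  --initial-cluster=node1=https://192.168.1.10:2380,node2=https://192.168.1.11:2380,node3=https://192.168.1.12:2380 \
  --initial-cluster-token=etcd-cluster \
  --initial-advertise-peer-urls=https://192.168.1.10:2380 \
  --data-dir=/var/lib/etcd
# 5. Start etcd, start the control plane again, and verify
sudo systemctl start etcd
sudo systemctl start kube-apiserver kube-controller-manager kube-scheduler
etcdctl endpoint health
kubectl get nodes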
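6.5 Monitoring Metrics
Key Prometheus metrics:
- etcd_server_leader_changes_seen_total: number of Leader changes seen
- etcd_mvcc_db_total_size_in_bytes: database size
- etcd_server_quota_backend_bytes: configured backend quota (divide the database size by it to get quota usage)
- etcd_network_peer_round_trip_time_seconds: round-trip latency between peers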
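Quota usage is not exposed as a single metric, so it has to be derived from the database size and the configured quota. The sketch below parses a Prometheus text-format scrape and computes that ratio. The sample values in `SAMPLE` are made up for illustration, and the parser is deliberately minimal: it skips labeled series and assumes no trailing timestamps.

```python
from typing import Dict

# Hypothetical scrape output; the numbers are invented for this example.
SAMPLE = """\
# TYPE etcd_mvcc_db_total_size_in_bytes gauge
etcd_mvcc_db_total_size_in_bytes 2.147483648e+09
# TYPE etcd_server_quota_backend_bytes gauge
etcd_server_quota_backend_bytes 8.589934592e+09
# TYPE etcd_server_leader_changes_seen_total counter
etcd_server_leader_changes_seen_total 3
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabeled 'name value' lines from Prometheus text exposition format."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                   # skip blanks and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if "{" in name:
            continue                   # labeled series are out of scope for this sketch
        values[name] = float(value)
    return values


def quota_usage(values: Dict[str, float]) -> float:
    """Fraction of the backend quota currently used by the database."""
    return values["etcd_mvcc_db_total_size_in_bytes"] / values["etcd_server_quota_backend_bytes"]


if __name__ == "__main__":
    metrics = parse_metrics(SAMPLE)
    print(f"quota usage: {quota_usage(metrics):.0%}")  # quota usage: 25%
```

In practice the same ratio is usually computed as a Prometheus query with an alert threshold; the Python version only shows which two metrics are involved.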
References: Kubernetes etcd documentation | etcd official documentation