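🔥 Why Is StorageClass the Soul of K8s Storage?
In traditional operations, we keep running into the same pain points:
• Chaotic storage allocation: PVs are created by hand, which is error-prone and slow
• Difficult multi-tenant isolation: the storage needs of different business lines cannot be told apart effectively
• Tedious expansion: as the business grows, expanding storage takes a lot of manual work
StorageClass addresses these problems: it acts as a "smart scheduler" for K8s storage, provisioning volumes on demand according to the policy defined for each class.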
How StorageClass Works
# High-performance SSD StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: b9127830-b4cc-4e86-9b1d-991b12c4b754
  pool: k8s-ssd-pool
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: ceph-csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/controller-expand-secret-name: ceph-csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: kube-system
  csi.storage.k8s.io/node-stage-secret-name: ceph-csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
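To see how a workload would request storage from this class, here is a minimal shell sketch that prints a PersistentVolumeClaim manifest. The helper name `render_pvc` and the claim name `demo-data` are illustrative assumptions, not part of the original setup.

```shell
# render_pvc NAME STORAGE_CLASS SIZE
# Hypothetical helper: prints a PVC manifest that requests SIZE from STORAGE_CLASS.
render_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: $2
  resources:
    requests:
      storage: $3
EOF
}

# Example: a 10Gi claim against the ceph-rbd-ssd class defined above.
render_pvc demo-data ceph-rbd-ssd 10Gi
```

Piping the output to `kubectl apply -f -` would create the claim. Because the class uses `volumeBindingMode: Immediate`, the volume is provisioned as soon as the claim exists, without waiting for a Pod to use it.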
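🚀 Ceph Integration in Practice: A Zero-Downtime Deployment Plan
Environment Checklist
Before you start, make sure your environment meets the following requirements:
• Kubernetes cluster: version 1.20 or later
• Ceph cluster: Pacific 16.x or later
• Node size: at least 4 CPU cores and 8 GB of RAM per worker node
• Network: at least 1 Gbps of bandwidth on the cluster's internal network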
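Step 1: Deploy the Ceph-CSI Driver
# Download the official deployment manifests
curl -O https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml
curl -O https://raw.githubusercontent.com/ceph/ceph-csi/master/deploy/rbd/kubernetes/csi-rbdplugin.yaml
# Create a dedicated namespace
# Note: the Secret, ConfigMap, and troubleshooting commands later in this article
# use kube-system, so keep the driver and its configuration in one consistent namespace
kubectl create namespace ceph-csi-rbd
# Deploy the CSI driver
kubectl apply -f csi-rbdplugin-provisioner.yaml
kubectl apply -f csi-rbdplugin.yaml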
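⚠️ Notes for production environments:
• Set resource limits on the CSI Pods
• Enable Pod anti-affinity to keep the driver highly available
• Set up monitoring and alerting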
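Step 2: Create the Ceph Authentication Key
# Create a dedicated user in the Ceph cluster
# (the osd cap must cover every pool that your StorageClasses reference)
ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=k8s-pool'
# Get the key, base64-encoded for use in the Secret
ceph auth get-key client.kubernetes | base64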
# ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-csi-rbd-secret
  namespace: kube-system
type: Opaque
data:
  userID: a3ViZXJuZXRlcw== # base64 of "kubernetes"
  userKey: QVFBTmVsWmZ... # base64 of your key
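Values under a Secret's `data` field must be the base64 encoding of the exact bytes, so a stray trailing newline (which a plain `echo` adds) would silently change the user ID. The following is a minimal sketch of a newline-safe encoder; the helper name `encode_secret_value` is an illustrative assumption.

```shell
# Hypothetical helper: base64-encode the exact bytes of $1, with no trailing
# newline in the input and no line wrapping in the output.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Encode the Ceph user ID used in the Secret above.
encode_secret_value kubernetes
echo
```

For the input `kubernetes` this prints `a3ViZXJuZXRlcw==`, which matches the `userID` value in the Secret.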
Step 3: Configure the ConfigMap
# ceph-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config
  namespace: kube-system
data:
  config.json: |-
    [
      {
        "clusterID": "b9127830-b4cc-4e86-9b1d-991b12c4b754",
        "monitors": [
          "192.168.1.100:6789",
          "192.168.1.101:6789",
          "192.168.1.102:6789"
        ]
      }
    ]
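The `config.json` payload is just a JSON array with one entry per Ceph cluster. The cluster ID is typically the cluster fsid (printed by `ceph fsid`), and the monitor addresses can be read from `ceph mon dump`. As a sketch, the payload could be generated instead of typed by hand; the helper name `render_csi_config` is an illustrative assumption.

```shell
# render_csi_config CLUSTER_ID MONITOR...
# Hypothetical helper: prints a single-cluster config.json payload for ceph-csi.
render_csi_config() {
  cluster_id=$1
  shift
  monitors=""
  for mon in "$@"; do
    # Append each monitor as a quoted JSON string, comma-separated.
    monitors="${monitors}${monitors:+, }\"${mon}\""
  done
  printf '[{"clusterID": "%s", "monitors": [%s]}]\n' "$cluster_id" "$monitors"
}

# Example with the cluster ID and two of the monitors from the ConfigMap above.
render_csi_config b9127830-b4cc-4e86-9b1d-991b12c4b754 \
  192.168.1.100:6789 192.168.1.101:6789
```

Generating the payload keeps the cluster ID in the ConfigMap identical to the `clusterID` parameter in each StorageClass, since the driver uses that value to look up the monitors.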
💪 Advanced Configuration
Designing a Tiered Storage Strategy
# Each class below also needs the csi.storage.k8s.io/*-secret-* parameters
# shown for ceph-rbd-ssd earlier; they are left out here to keep the example short.
# High-performance class, suited to databases
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-high-perf
  labels:
    storage.tier: "high-performance"
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: b9127830-b4cc-4e86-9b1d-991b12c4b754
  pool: ssd-pool
  imageFeatures: layering,exclusive-lock,object-map,fast-diff
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Standard class, suited to general applications
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-standard
  labels:
    storage.tier: "standard"
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: b9127830-b4cc-4e86-9b1d-991b12c4b754
  pool: hdd-pool
  imageFeatures: layering
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
---
# Cold class, suited to backups and archives
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-cold
  labels:
    storage.tier: "cold"
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: b9127830-b4cc-4e86-9b1d-991b12c4b754
  pool: cold-pool
  imageFeatures: layering
reclaimPolicy: Retain
allowVolumeExpansion: false
volumeBindingMode: WaitForFirstConsumer
Smart PVC Template
# PVC template for databases
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data-pvc
  labels:
    app: mysql
    tier: database
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ceph-rbd-high-perf
  resources:
    requests:
      storage: 100Gi
  # No selector here: a claim with a non-empty selector cannot be dynamically
  # provisioned, and StorageClass labels are not copied onto the PVs it creates.
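🔧 Performance Tuning Secrets
Ceph Cluster Tuning
# 1. Tune the OSD thread pools
# Note: osd_op_threads and osd_disk_threads are legacy options that recent Ceph
# releases may no longer recognize; check with `ceph config help <option>` first.
ceph tell osd.* injectargs '--osd-op-threads=8'
ceph tell osd.* injectargs '--osd-disk-threads=4'
# 2. Tune the RBD cache
# Note: rbd_cache is a client-side librbd option, so it is set for clients rather
# than injected into OSDs; it does not affect volumes mapped by the kernel RBD driver.
ceph config set client rbd_cache true
ceph config set client rbd_cache_size 268435456 # 256 MiB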
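# 3. Adjust the PG count (important!)
# Rule of thumb: (number of OSDs × 100) / replica count / number of pools,
# rounded to a power of two. If the PG autoscaler manages the pool, it may override manual values.
ceph osd pool set k8s-pool pg_num 256
ceph osd pool set k8s-pool pgp_num 256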
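The PG rule of thumb in the comment above can be turned into a small calculation. This is a sketch only: the helper name `suggest_pg_num` is hypothetical, and rounding up to the next power of two is an assumption of this example (the formula itself does not say which way to round).

```shell
# suggest_pg_num OSDS REPLICAS POOLS
# Hypothetical helper: applies (OSDs x 100) / replicas / pools with integer
# arithmetic, then rounds up to the next power of two (an assumption).
suggest_pg_num() {
  pg_raw=$(( $1 * 100 / $2 / $3 ))
  pg_num=1
  while [ "$pg_num" -lt "$pg_raw" ]; do
    pg_num=$(( pg_num * 2 ))
  done
  echo "$pg_num"
}

# Example: 12 OSDs, 3 replicas, 2 pools gives a raw value of 200.
suggest_pg_num 12 3 2
```

With 12 OSDs, 3 replicas, and 2 pools, the raw value is 200 and the function prints 256, the same figure used in the `ceph osd pool set` commands above.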
K8s Node Tuning
# Tune kernel parameters
cat >> /etc/sysctl.conf << EOF
# RBD performance tuning
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
vm.swappiness = 1
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
EOF
sysctl -p
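CSI Driver Resource Configuration
# Tuned csi-rbdplugin configuration (Pod spec excerpt)
spec:
  containers:
  - name: csi-rbdplugin
    resources:
      limits:
        cpu: 1000m
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    # Note: confirm that your ceph-csi version actually reads these two
    # variables before relying on them
    env:
    - name: RBD_CACHE_ENABLED
      value: "true"
    - name: RBD_CACHE_SIZE
      value: "256Mi"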
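📊 Monitoring and Alerting
Prometheus Monitoring Configuration
# ceph-monitoring.yaml
# ServiceMonitor is a Prometheus Operator resource, so it uses the operator's API group
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ceph-cluster-monitor
spec:
  selector:
    matchLabels:
      app: ceph-exporter
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics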
Alert Rules for Key Metrics
# storage-alerts.yaml
groups:
- name: ceph.storage.rules
  rules:
  - alert: CephClusterWarningState
    expr: ceph_health_status == 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Ceph cluster health is abnormal"
      description: "Cluster {{ $labels.cluster }} is in the Warning state"
  # Note: confirm that your exporter exposes ceph_rbd_provision_failed_total
  # before relying on this rule
  - alert: StorageClassProvisionFailed
    expr: increase(ceph_rbd_provision_failed_total[5m]) > 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Storage provisioning failed"
      description: "StorageClass {{ $labels.storage_class }} had provisioning failures in the last 5 minutes"
🛡️ Production-Grade Security Configuration
RBAC Access Control
# ceph-csi-rbac.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-csi-provisioner
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rbd-csi-provisioner-role
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
Network Policy Isolation
# ceph-network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ceph-csi-network-policy
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      app: csi-rbdplugin
  policyTypes:
  - Ingress
  - Egress
  egress:
  # namespaceSelector: {} matches Pods in any namespace; for a Ceph cluster
  # running outside Kubernetes, use an ipBlock rule instead
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 6789 # Ceph Monitor port
    - protocol: TCP
      port: 6800 # start of the Ceph OSD port range
      endPort: 7300
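🚨 Hands-On Troubleshooting Handbook
Diagnosing Common Problems
Problem 1: The PVC stays in the Pending state
# Check that the StorageClass is correct
kubectl get storageclass
# Show the PVC details
kubectl describe pvc <pvc-name>
# Check the status of the CSI driver Pods
kubectl get pods -n kube-system | grep csi-rbd
# Read the driver logs
kubectl logs -n kube-system <csi-rbdplugin-pod> -c csi-rbdplugin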
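Problem 2: Mount failures
# Check connectivity to the Ceph cluster
kubectl exec -it <csi-pod> -n kube-system -- rbd ls --pool=k8s-pool
# Verify the key configuration
kubectl get secret ceph-csi-rbd-secret -n kube-system -o yaml
# Check the RBD kernel module on the node
lsmod | grep rbd
modprobe rbd # if it is not loaded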
Diagnosing Performance Problems
# 1. Check Ceph cluster performance
ceph -s
ceph osd perf
ceph osd df
# 2. Benchmark the storage (assumes a PVC named test-pvc already exists)
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: storage-benchmark
spec:
  # Run the benchmark once instead of restarting the container when dd finishes
  restartPolicy: Never
  containers:
  - name: benchmark
    image: nginx
    volumeMounts:
    - name: test-volume
      mountPath: /data
    command:
    - /bin/sh
    - -c
    - |
      # Write test
      dd if=/dev/zero of=/data/test bs=1M count=1024 oflag=direct
      # Read test
      dd if=/data/test of=/dev/null bs=1M iflag=direct
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: test-pvc
EOF