Redis Kubernetes Operators Tested: The Real Gaps Between Three Options

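If you run Redis on Kubernetes, you have probably noticed that there are far fewer Operators to choose from than for PostgreSQL. This article runs three open-source Operators on a real cluster and reports measured results for deployment, failover, scaling, and parameter tuning, so you can skip the documentation pitfalls and mismatched expectations.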

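1. Background

Redis and PostgreSQL are both databases, but their Operator ecosystems on Kubernetes look very different. PostgreSQL has several options, including cloudnative-pg, the Zalando postgres-operator, and CrunchyData/postgres-operator; for Redis, the mature options can be counted on one hand.

This test covers three that are relatively active in the community:

Operator	Version	Maintainer	License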
KubeBlocks Redis Operator	1.0.2	ApeCloud	AGPL 3.0
OT-CONTAINER-KIT Redis Operator	v0.24.0	Opstree	Apache 2.0
Spotahome Redis Operator	master	Spotahome	Apache 2.0

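Test environment:

Platform: Kubernetes v1.34.1-vke.4, 3 nodes with 4 CPU cores and 16 GB memory each
Tools: kubectl v1.30.1, Helm v3.11.3
Date: April 2026

The tests focus on a few high-frequency scenarios: the deployment experience for the two mainstream architectures (Sentinel and Cluster), failover speed, scaling efficiency, and dynamic parameter changes.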

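2. Deployment experience

KubeBlocks (about 5 minutes)

KubeBlocks takes the Helm plus Cluster CRD route. Once the control plane is installed, a Redis cluster is created from a single YAML manifest, and both mainstream architectures are supported: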

kubectl create -f https://github.com/apecloud/kubeblocks/releases/download/v1.0.2/kubeblocks_crds.yaml
helm repo add kubeblocks https://apecloud.github.io/helm-charts
helm install kubeblocks kubeblocks/kubeblocks \
  --namespace kb-system --create-namespace --version=1.0.2

Replication mode (primary/replica + Sentinel):

apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: redis-replication
  namespace: demo
spec:
  terminationPolicy: Delete
  clusterDef: redis
  topology: replication
  componentSpecs:
    - name: redis
      serviceVersion: "7.2.4"
      replicas: 2
      resources:
        limits:
          cpu: '0.5'
          memory: 0.5Gi
        requests:
          cpu: '0.5'
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: "ebs-ssd"
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: redis-sentinel
      serviceVersion: "7.2.4"
      replicas: 3
      resources:
        limits:
          cpu: '0.5'
          memory: 0.5Gi
        requests:
          cpu: '0.5'
          memory: 0.5Gi

Cluster mode (sharding):

apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: redis-sharding
  namespace: demo
spec:
  terminationPolicy: Delete
  shardings:
    - name: shard
      shards: 3
      template:
        name: redis
        componentDef: redis-cluster-7
        replicas: 2
        resources:
          limits:
            cpu: '0.5'
            memory: 0.5Gi
          requests:
            cpu: '0.5'
            memory: 0.5Gi
        serviceVersion: "7.2.4"
        volumeClaimTemplates:
          - name: data
            spec:
              storageClassName: "ebs-ssd"
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi

After deployment, the password is stored in a Secret:

kubectl get secret redis-replication-redis-account-default -n demo \
  -o jsonpath='{.data.password}' | base64 -d

Pod layout: in Replication mode there are redis-0/1 plus redis-sentinel-0/1/2; in Cluster mode each shard has two replicas, for 6 Pods in total.

OT-CONTAINER-KIT (about 5 minutes)

This Operator from Opstree takes a two-CRD route, with RedisReplication and RedisCluster as separate resources. Installation with Helm is short:

helm repo add ot-helm https://ot-container-kit.github.io/helm-charts/
helm install redis-operator ot-helm/redis-operator \
  --namespace ot-operators --create-namespace \
  --set featureGates.GenerateConfigInInitContainer=true

RedisReplication (Sentinel high availability):

apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisReplication
metadata:
  name: redis-replication
spec:
  clusterSize: 3
  sentinel:
    size: 3
    image: quay.io/opstree/redis-sentinel:latest
    service:
      serviceType: NodePort
  kubernetesConfig:
    image: quay.io/opstree/redis:latest
    service:
      serviceType: NodePort
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: ebs-ssd
        resources:
          requests:
            storage: 20Gi

RedisCluster (sharding):

apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: RedisCluster
metadata:
  name: redis-cluster
spec:
  clusterSize: 3
  clusterVersion: v7
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000
  persistenceEnabled: true
  kubernetesConfig:
    image: quay.io/opstree/redis:v7.0.15
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi
    service:
      serviceType: NodePort
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:latest
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: ebs-ssd
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
    nodeConfVolume: true
    nodeConfVolumeClaimTemplate:
      spec:
        storageClassName: ebs-ssd
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi

Deployment surfaced a version restriction: Redis >= 7.2.9 is required, so Redis 6 users can rule this option out.

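Spotahome (about 10 minutes)

The problems here show up immediately: the Helm install fails with a CRD parsing error, and the operator image version is wrong as well. It only came up after switching to kubectl apply and overriding the image version: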

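# Apply the operator resources from a local checkout of the repository
kubectl apply -f redis-operator/example/operator/all-redis-operator-resources.yaml
# Install the chart with a pinned image tag. The release name "redis-operator" is an
# assumption, added because helm install needs a release name (or --generate-name).
helm install redis-operator redis-operator/charts/redisoperator --set image.tag=v1.3.0-rc1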

Only Sentinel mode is supported, not Cluster. This limitation comes from the CRD design rather than from unfinished work:

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: redisfailover-persistent
spec:
  sentinel:
    replicas: 3
  redis:
    replicas: 3
    storage:
      persistentVolumeClaim:
        spec:
          storageClassName: ebs-ssd
          resources:
            requests:
              storage: 10Gi

The bigger issue is that this project has not been updated in a long time, so think carefully before choosing it for production.

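3. Test results

3.1 Architecture support

Architecture	KubeBlocks	OT-CONTAINER-KIT	Spotahome
Redis Sentinel (primary/replica + Sentinel)	✅	✅	✅
Redis Cluster (sharding)	✅	✅	❌
Redis versions	6, 7, 8	7 and 8 only (requires >= 7.2.9)	6, 7, 8

In short: if you need Cluster mode, do not pick Spotahome; if you are still on Redis 6, do not pick OT-CONTAINER-KIT.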

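3.2 Failover speed

The master was force-killed with kubectl delete pod <master> while a script polled continuously for the new master election, recording the time from the delete to the point where writes succeeded again.

Operator	Mode	RTO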
KubeBlocks	Cluster	~6s
KubeBlocks	Sentinel	~30s
OT-CONTAINER-KIT	Cluster	~4-6s
OT-CONTAINER-KIT	Sentinel	~46s
Spotahome	Sentinel	~40-43s

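Cluster mode is consistently fast (~4-6s) because Redis Cluster runs its own built-in election and does not depend on an external component. Among the Sentinel setups KubeBlocks is the fastest (~30s), 10 to 16 seconds quicker than the other two.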
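
The measurement loop described above could look roughly like the following sketch. It is not the script used for these tests: `measure_rto` and `write_probe` are hypothetical helper names, `REDIS_HOST` and `REDIS_PASSWORD` are placeholders, and the probe assumes an endpoint that always routes to the current master. Timing uses whole seconds, which matches the precision of the table.

```bash
# measure_rto <interval_seconds> <probe command...>
# Polls the probe until it exits 0 and prints the elapsed time in seconds.
measure_rto() {
  interval="$1"; shift
  start=$(date +%s)
  # Keep probing until a write is accepted again.
  until "$@" >/dev/null 2>&1; do
    sleep "$interval"
  done
  end=$(date +%s)
  echo $((end - start))
}

# Hypothetical write probe: succeeds only when SET returns OK.
# REDIS_HOST and REDIS_PASSWORD are placeholders, not values from the test.
write_probe() {
  redis-cli -h "$REDIS_HOST" -a "$REDIS_PASSWORD" --no-auth-warning \
    SET rto:probe 1 | grep -q '^OK$'
}

# Usage (after deleting the master pod as described above):
#   kubectl delete pod <master> -n demo
#   measure_rto 1 write_probe
```

The probe checks for the literal OK reply rather than the exit code of redis-cli, since an error reply does not necessarily produce a non-zero exit status.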

After a failover, each Operator marks master and replica Pods with a different label: KubeBlocks uses kubeblocks.io/role, OT-CONTAINER-KIT uses redis-role, and Spotahome uses redisfailovers-role.
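
To find the current master after a failover, these labels can be turned into selectors. The sketch below is illustrative: the label keys are the ones named above, but the role values (`primary`, `master`) and the helper name `master_selector` are assumptions, so check them against the labels on your own Pods.

```bash
# master_selector <operator>
# Prints a label selector intended to match the current master pod.
# Label keys come from the article; the role values are assumptions.
master_selector() {
  case "$1" in
    kubeblocks) echo "kubeblocks.io/role=primary" ;;
    ot)         echo "redis-role=master" ;;
    spotahome)  echo "redisfailovers-role=master" ;;
    *) echo "unknown operator: $1" >&2; return 1 ;;
  esac
}

# Example: list the master pod of a KubeBlocks cluster in the demo namespace.
#   kubectl get pods -n demo -l "$(master_selector kubeblocks)" -o name
```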

3.3 Scaling

KubeBlocks takes the declarative OpsRequest route, with horizontal and vertical scaling as separate operations:

# Horizontal scale-out
kubectl apply -f - <<EOF
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: redis-scale-out
  namespace: demo  # must match the namespace of the Cluster
spec:
  clusterName: redis-replication
  type: HorizontalScaling
  horizontalScaling:
  - componentName: redis
    scaleOut:
      replicaChanges: 1
EOF

# Vertical scaling
kubectl apply -f - <<EOF
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: redis-verticalscaling
  namespace: demo  # must match the namespace of the Cluster
spec:
  clusterName: redis-replication
  type: VerticalScaling
  verticalScaling:
  - componentName: redis
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 1Gi
EOF

Progress can be read straight from the OpsRequest status:

$ kubectl get opsrequest redis-scale-out -n demo -w
NAME             TYPE                STATUS     AGE
redis-scale-out  HorizontalScaling   Running    8s
redis-scale-out  HorizontalScaling   Succeed    37s

OT-CONTAINER-KIT and Spotahome both rely on patching the custom resource, with no unified tracking:

# OT-CONTAINER-KIT
kubectl patch RedisReplication redis-replication -n ot-operators \
  --type='json' -p='[{"op": "replace", "path": "/spec/clusterSize", "value": 5}]'

# Spotahome
kubectl patch redisfailover redisfailover-persistent --type='json' \
  -p='[{"op": "replace", "path": "/spec/redis/replicas", "value": 5}]'
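
Since a patch gives no operation object to watch, progress has to be inferred from the Pods themselves. The following is a minimal sketch of such a check, not part of the original test: `count_ready` and `wait_for_replicas` are hypothetical helpers, and the label selector that matches the Redis Pods depends on the Operator, so it is passed in as an argument.

```bash
# count_ready <namespace> <selector>
# Counts pods whose READY column is n/n and whose STATUS is Running.
count_ready() {
  kubectl get pods -n "$1" -l "$2" --no-headers 2>/dev/null |
    awk '{ split($2, r, "/"); if (r[1] == r[2] && $3 == "Running") n++ } END { print n + 0 }'
}

# wait_for_replicas <namespace> <selector> <want> <timeout_seconds>
# Prints the elapsed seconds once <want> pods are ready; returns 1 on timeout.
wait_for_replicas() {
  start=$(date +%s)
  while :; do
    now=$(date +%s)
    if [ "$(count_ready "$1" "$2")" -ge "$3" ]; then
      echo $((now - start))
      return 0
    fi
    if [ $((now - start)) -ge "$4" ]; then
      return 1
    fi
    sleep 2
  done
}

# Example (the selector is a placeholder):
#   wait_for_replicas ot-operators "<pod label selector>" 5 300
```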

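Capability	KubeBlocks	OT-CONTAINER-KIT	Spotahome
Horizontal scaling	OpsRequest	patch	patch
Vertical scaling	OpsRequest	patch	patch
Scale-out time	~37s	~60s	~67s
Scale-in time	~8s	~30s	~30s
Progress tracking	✅ OpsRequest	❌	❌

One detail: during vertical scaling KubeBlocks updates the replicas first and the primary last, while the other two rely on the StatefulSet rolling update in ordinal order.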

3.4 Parameter changes

KubeBlocks distinguishes dynamic parameters from static ones:

apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: redis-reconfiguring
  namespace: demo  # must match the namespace of the Cluster
spec:
  clusterName: redis-replication
  type: Reconfiguring
  reconfigures:
  - componentName: redis
    parameters:
    - key: maxmemory-policy
      value: allkeys-lru

A dynamic parameter such as maxmemory-policy takes effect immediately, while a parameter handled as static, such as maxclients, triggers a Pod restart.
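
One way to confirm that a change actually reached the running server is to read the value back with CONFIG GET. This is a sketch under assumptions: `get_redis_config` is a hypothetical helper, the Pod name in the example is a guess at the naming scheme, and `REDIS_PASSWORD` is expected to hold the password read from the Secret.

```bash
# get_redis_config <namespace> <pod> <parameter>
# Prints the live value of a parameter as reported by CONFIG GET.
get_redis_config() {
  # In non-interactive mode redis-cli prints the name on line 1 and the value on line 2.
  kubectl exec -n "$1" "$2" -- \
    redis-cli -a "$REDIS_PASSWORD" --no-auth-warning CONFIG GET "$3" | sed -n '2p'
}

# Example (the pod name is an assumption):
#   get_redis_config demo redis-replication-redis-0 maxmemory-policy
```

Running it before and after the OpsRequest, and again on a freshly restarted Pod, shows whether the value was applied live and whether it persists.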

OT-CONTAINER-KIT's dynamicConfig is of limited use:

kubectl patch RedisReplication redis-replication -n ot-operators --type='json' \
  -p='[{"op": "add", "path": "/spec/redisConfig", "value": {"dynamicConfig": ["maxmemory-policy allkeys-lru"]}}]'

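In testing, this setting only took effect while the cluster was unhealthy; in a normal state the change was overwritten. The limitation is even more pronounced in Replication mode.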

Spotahome's customConfig turned out to be easier to use:

kubectl patch redisfailover redisfailover-persistent --type='json' -p='[
  {"op": "add", "path": "/spec/redis/customConfig", "value": [
    "maxmemory-policy allkeys-lru",
    "maxmemory 256mb"
  ]}
]'

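Operator	Method	Result
KubeBlocks	Reconfiguring OpsRequest	✅ Dynamic parameters take effect immediately; static parameters require a restart
OT-CONTAINER-KIT	dynamicConfig	⚠️ Effective only while the cluster is not Ready; not supported in Replication mode
Spotahome	customConfig	✅ Dynamic parameters take effect directly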

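3.5 TLS

Operator	Redis TLS	Sentinel TLS	Notes
KubeBlocks	❌	❌	Not supported in 1.0.2; the 1.0.3 beta code appears to support it, not yet verified
OT-CONTAINER-KIT	✅	⚠️ Problematic	Redis TLS works; Sentinel mode has connection problems
Spotahome	❌	❌	Not supported

OT-CONTAINER-KIT's Redis TLS works on its own, but combining Sentinel with TLS requires extra configuration.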

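3.6 Backup and restore

Capability	KubeBlocks	OT-CONTAINER-KIT	Spotahome
Backup and restore	✅	⏳ Present in code; supported for Replication clusters, Cluster not yet verified	❌

KubeBlocks has a complete solution. OT-CONTAINER-KIT relies on the cron job mechanism from PR #489; at the code level it works for Replication clusters, but the full flow was not verified end to end in this test.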

3.7 Monitoring

All three ship an exporter, and each is enabled through a field in the custom resource:

Operator	How to enable
KubeBlocks	`disableExporter: false`
OT-CONTAINER-KIT	`spec.redisExporter.enabled: true`
Spotahome	`spec.redis.exporter` config block

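3.8 Consistency of the operational interface

The three Operators differ noticeably in operating style.

OT-CONTAINER-KIT and Spotahome both follow an "edit whichever field needs changing" approach. With OT-CONTAINER-KIT, for example, scaling means changing clusterSize and parameter changes mean editing redisConfig.dynamicConfig. The API paths are scattered across the CRD, and there is no unified status tracking.

KubeBlocks takes a different route: a single CRD, OpsRequest, covers every operation type.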

# Horizontal scaling
kubectl apply -f ops-hscale.yaml    # type: HorizontalScaling

# Parameter change
kubectl apply -f ops-reconfig.yaml  # type: Reconfiguring

# Vertical scaling
kubectl apply -f ops-vscale.yaml    # type: VerticalScaling

# Check status
kubectl get opsrequest -n demo
NAME              TYPE               STATUS    AGE
redis-scale-out   HorizontalScaling  Succeed   5m
redis-reconfig    Reconfiguring      Succeed   2m
redis-vscale      VerticalScaling    Succeed   3m

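Every operation moves through four states: Pending, Running, Succeed, or Failed. A CI/CD pipeline can simply watch the OpsRequest instead of relying on a hand-written polling script. More usefully, the same interface works across engines: scaling MySQL, Redis, and PostgreSQL is done in exactly the same way.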
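
As a rough illustration of how a pipeline step might consume these states, the sketch below maps a phase to an exit code. The helper names `ops_phase` and `ops_exit_code` are hypothetical, and reading the state from `.status.phase` is an assumption about where the STATUS column comes from; verify it with `kubectl get opsrequest <name> -o yaml` on your cluster.

```bash
# ops_exit_code <phase>
# Maps an OpsRequest phase to an exit code for a CI step:
# 0 = Succeed, 1 = Failed, 2 = anything else (Pending, Running, unknown).
ops_exit_code() {
  case "$1" in
    Succeed) return 0 ;;
    Failed)  return 1 ;;
    *)       return 2 ;;
  esac
}

# ops_phase <namespace> <name>
# Reads the current phase of an OpsRequest (field path is an assumption).
ops_phase() {
  kubectl get opsrequest "$2" -n "$1" -o jsonpath='{.status.phase}'
}

# Blocking alternative using kubectl wait (jsonpath conditions need kubectl 1.23+):
#   kubectl wait opsrequest/redis-scale-out -n demo \
#     --for=jsonpath='{.status.phase}'=Succeed --timeout=300s
```

A step can then call `ops_exit_code "$(ops_phase demo redis-scale-out)"` and fail the job on exit code 1, while treating 2 as "check again later".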

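4. Side-by-side comparison

Feature	KubeBlocks	OT-CONTAINER-KIT	Spotahome
License	AGPL 3.0	Apache 2.0	Apache 2.0
Sentinel mode	✅	✅	✅
Cluster mode	✅	✅	❌
Redis versions	6, 7, 8	7 and 8 only	6, 7, 8
RTO (Cluster)	~6s	~4-6s	N/A
RTO (Sentinel)	~30s	~46s	~40-43s
Horizontal scaling	OpsRequest	patch	patch
Vertical scaling	OpsRequest	patch	patch
Dynamic parameters	✅	⚠️ Limited	✅
Backup and restore	✅	Replication not yet verified; Cluster not supported	❌
TLS	Not supported in 1.0.2; 1.0.3 not yet verified	⚠️ Sentinel problematic	❌
Prometheus monitoring	✅	✅	✅
Unified multi-database API	✅	❌	❌
Community status	Active	Active	⚠️ No longer maintained
Deployment difficulty	Easy	Easy	Moderate (Helm pitfalls)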

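5. Which one for which scenario

KubeBlocks: a good fit for teams running a mix of databases. One OpsRequest interface covers MySQL, Redis, and PostgreSQL, so nobody has to learn a separate CRD per engine. Choosing between Sentinel and Cluster mode is flexible, and backup and restore is the most complete of the three.

OT-CONTAINER-KIT: worth considering if you only run Redis, must use Cluster mode, and are sensitive to failover speed. Keep the dynamic-parameter limitation in mind, and rule it out if you are on Redis 6.

Spotahome is not recommended: the project is no longer updated, Cluster mode and backups are not supported, and the Helm chart fails with a parsing error. The risk is too high for production unless the maintainers resume work.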

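6. Conclusion

Compared side by side, KubeBlocks scores highest overall.

Its core advantages come down to three points. First, it supports both the Sentinel and Cluster architectures, whereas Spotahome does not support Cluster at all and OT-CONTAINER-KIT shuts out Redis 6 users. Second, the operational interface is uniform: scaling, parameter changes, and backup and restore are all initiated through the single OpsRequest CRD, which gives the lowest CI/CD integration cost. Third, its backup mechanism is the most complete: OT-CONTAINER-KIT's backup exists in code but was not verified, and Spotahome has none.

On failover speed, Cluster mode lands in the 4-6 second range for both Operators that support it; OT-CONTAINER-KIT is slightly faster, but the margin is small. In Sentinel mode, KubeBlocks at ~30s is clearly ahead of the other two.

If your team only runs Redis, is extremely sensitive to failover speed, and is committed to the Cluster architecture, OT-CONTAINER-KIT is worth considering, provided you can accept the Redis version restriction and the dynamic-parameter limitation. Otherwise, for most scenarios KubeBlocks remains the option with the best return on effort.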

Test date: 2026-04-17

References: KubeBlocks documentation / OT-CONTAINER-KIT GitHub / Spotahome redis-operator