8. Redis Operator (1): Standalone Deployment

0. Introduction

Kubernetes' built-in resource types cover the vast majority of needs. For special requirements that call for more freedom, users can turn to CRDs (CustomResourceDefinitions): without modifying any upstream code, you simply register custom resources with the APIServer and implement your own requirements as an extension.

A K8S Operator is exactly this kind of software extension pattern. As DeepSeek sums it up, Operator = Controller + CRD + domain knowledge: with an Operator you define CRDs to manage a particular kind of resource, and the Operator itself is deployed into K8S as a Deployment. An Operator generally consists of:

  • CRD: a custom resource with a declarative API; the program continuously drives the smallest schedulable unit (the Pod) toward the declared state;
  • Controller: watches create/update/delete events on resources and responds by triggering the Reconcile function (reconciliation), whose purpose is to move those units toward the desired end state;
  • AdmissionWebhook: intercepts invalid requests, keeps configuration consistent, and cuts down on complex validation logic.

So here I'll take Redis, a component we use all the time, as the example and implement a Redis Operator for managing Redis deployments on k8s, learning the overall Operator workflow of scaffolding, building, and deploying along the way.

1. Development Environment

1.1 Tool Versions

  • go:v1.20.2
  • kubebuilder:v3.10.0
  • kind:v0.20.0

1.2 Cluster Setup

Redis needs to 1) expose a port externally and 2) keep its data durable, which means mounting storage out of the nodes; so we start kind with the following configuration:

yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  # port forward 80 on the host to 80 on this node
  extraPortMappings:
  - containerPort: 30950
    hostPort: 80
    # optional: set the bind address on the host
    # 0.0.0.0 is the current default
    listenAddress: "127.0.0.1"
    # optional: set the protocol to one of TCP, UDP, SCTP.
    # TCP is the default
    protocol: TCP
  - containerPort: 31000
    hostPort: 6379
    protocol: TCP
- role: worker
  extraMounts:
  # map a host directory into the node container
  - hostPath: /Users/xxx/workspace/k8s/kind/data
    containerPath: /data
- role: worker
  extraMounts:
  # map a host directory into the node container
  - hostPath: /Users/xxx/workspace/k8s/kind/data
    containerPath: /data
- role: worker
  extraMounts:
  # map a host directory into the node container
  - hostPath: /Users/xxx/workspace/k8s/kind/data
    containerPath: /data

Then create the cluster with kind create cluster --name multi --config multi.yaml.
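If all goes well, the nodes come up named after the cluster (kind uses a <cluster>-<role> naming convention; the versions below depend on your kind node image and are illustrative):

bash

$ kubectl get nodes
NAME                  STATUS   ROLES           AGE   VERSION
multi-control-plane   Ready    control-plane   1m    v1.27.3
multi-worker          Ready    <none>          1m    v1.27.3
multi-worker2         Ready    <none>          1m    v1.27.3
multi-worker3         Ready    <none>          1m    v1.27.3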

1.3 Installing kubebuilder

I need to pin the specific version v3.10.0 here, so:

bash
$ curl -L -O "https://github.com/kubernetes-sigs/kubebuilder/releases/download/v3.10.0/kubebuilder_darwin_arm64"
$ chmod +x kubebuilder_darwin_arm64
$ mv kubebuilder_darwin_arm64 kubebuilder
$ sudo mv kubebuilder /usr/local/bin/
$ kubebuilder version
Version: main.version{KubeBuilderVersion:"3.10.0", KubernetesVendor:"1.26.1", GitCommit:"0fa57405d4a892efceec3c5a902f634277e30732", BuildDate:"2023-04-15T08:10:35Z", GoOs:"darwin", GoArch:"arm64"}

2. Operator Development

2.1 Initializing the Redis Operator Project

First, create the project directory:

bash
$ mkdir redis-operator && cd redis-operator

Then scaffold the Operator project:

bash
$ kubebuilder init --domain iguochan.io --repo github.com/IguoChan/redis-operator 

Here's the resulting source tree:

bash
$ tree
.
├── Dockerfile
├── LICENSE
├── Makefile
├── PROJECT
├── README.md
├── cmd
│   └── main.go
├── config
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   └── rbac
│       ├── auth_proxy_client_clusterrole.yaml
│       ├── auth_proxy_role.yaml
│       ├── auth_proxy_role_binding.yaml
│       ├── auth_proxy_service.yaml
│       ├── kustomization.yaml
│       ├── leader_election_role.yaml
│       ├── leader_election_role_binding.yaml
│       ├── role_binding.yaml
│       └── service_account.yaml
├── go.mod
├── go.sum
└── hack
    └── boilerplate.go.txt

7 directories, 25 files

Note: at this point you may hit the error go: sigs.k8s.io/[email protected]: verifying go.mod: invalid GOSUMDB: malformed verifier id; simply running go env -w GOSUMDB=off beforehand avoids it. There are surely more elegant fixes, but we won't dig into that here.

2.2 Creating the Redis API

bash
$ kubebuilder create api --group cache --version v1 --kind Redis
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/redis_types.go
api/v1/groupversion_info.go
internal/controller/suite_test.go
internal/controller/redis_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
mkdir -p /Users/xxx/workspace/github/iguochan/redis-operator/bin
test -s /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen && /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen --version | grep -q v0.11.3 || \
	GOBIN=/Users/xxx/workspace/github/iguochan/redis-operator/bin go install sigs.k8s.io/controller-tools/cmd/[email protected]
/Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests

As you can see, new source files appeared under api and internal/controller, a crd entry was added under config, and rbac gained Redis-related configuration.

bash
$ tree
.
├── Dockerfile
├── LICENSE
├── Makefile
├── PROJECT
├── README.md
├── api
│   └── v1
│       ├── groupversion_info.go
│       ├── redis_types.go
│       └── zz_generated.deepcopy.go
├── bin
│   └── controller-gen
├── cmd
│   └── main.go
├── config
│   ├── crd
│   │   ├── kustomization.yaml
│   │   ├── kustomizeconfig.yaml
│   │   └── patches
│   │       ├── cainjection_in_redis.yaml
│   │       └── webhook_in_redis.yaml
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   ├── rbac
│   │   ├── auth_proxy_client_clusterrole.yaml
│   │   ├── auth_proxy_role.yaml
│   │   ├── auth_proxy_role_binding.yaml
│   │   ├── auth_proxy_service.yaml
│   │   ├── kustomization.yaml
│   │   ├── leader_election_role.yaml
│   │   ├── leader_election_role_binding.yaml
│   │   ├── redis_editor_role.yaml
│   │   ├── redis_viewer_role.yaml
│   │   ├── role_binding.yaml
│   │   └── service_account.yaml
│   └── samples
│       ├── cache_v1_redis.yaml
│       └── kustomization.yaml
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
└── internal
    └── controller
        ├── redis_controller.go
        └── suite_test.go

15 directories, 39 files

We define the CRD's fields in redis_types.go under the api directory, and implement the Reconcile control logic in redis_controller.go under internal/controller.

2.3 Implementing the Controller

2.3.1 Defining the CRD

Define the Redis configuration in api/v1/redis_types.go:

go
/*
Copyright 2025.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1

import (
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type RedisPhase string

const (
    RedisPhasePending = "Pending"
    RedisPhaseError   = "Error"
    RedisPhaseReady   = "Ready"
)

// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.

// RedisSpec defines the desired state of Redis
type RedisSpec struct {
    // Image: Redis Image
    // +kubebuilder:default="redis:7.0"
    Image string `json:"image,omitempty"`

    // NodePort Service
    // +kubebuilder:validation:Minimum=30000
    // +kubebuilder:validation:Maximum=32767
    // +kubebuilder:default=31000
    NodePort int32 `json:"nodePort,omitempty"`

    // Storage configuration
    Storage StorageSpec `json:"storage,omitempty"`
}

// StorageSpec defines the Redis storage configuration
type StorageSpec struct {
    // Storage size
    // +kubebuilder:default="1Gi"
    Size resource.Quantity `json:"size,omitempty"`

    // Host directory path
    // +kubebuilder:default="/data"
    HostPath string `json:"hostPath,omitempty"`
}

// RedisStatus defines the observed state of Redis
type RedisStatus struct {
    // Deployment phase
    Phase RedisPhase `json:"phase,omitempty"`

    // Redis service endpoint
    Endpoint string `json:"endpoint,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:JSONPath=".status.phase",name=phase,type=string,description="current phase"
//+kubebuilder:printcolumn:name="Endpoint",type="string",JSONPath=".status.endpoint",description="access endpoint"
//+kubebuilder:printcolumn:name="Image",type="string",JSONPath=".spec.image",description="image in use"
//+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp",description="creation time"

// Redis is the Schema for the redis API
type Redis struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   RedisSpec   `json:"spec,omitempty"`
    Status RedisStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// RedisList contains a list of Redis
type RedisList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []Redis `json:"items"`
}

func init() {
    SchemeBuilder.Register(&Redis{}, &RedisList{})
}

First we define the Redis image. The // +kubebuilder:default="redis:7.0" marker means that if a user creates a Redis without explicitly specifying the spec.image field, the system automatically sets it to redis:7.0:

go
// Image: Redis Image
// +kubebuilder:default="redis:7.0"
Image string `json:"image,omitempty"`
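For example (a hypothetical CR named redis-default-demo, assuming the CRD above is installed), apply a manifest that omits spec.image:

yaml

apiVersion: cache.iguochan.io/v1
kind: Redis
metadata:
  name: redis-default-demo
spec:
  nodePort: 31001

Because the API server applies the default when persisting the object, kubectl get redis redis-default-demo -o jsonpath='{.spec.image}' should then print redis:7.0.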

Next is the NodePort setting. To make external access convenient we use a NodePort Service; and since the kind cluster config above already maps node port 31000 out to the host, nothing further needs to be configured in our cluster.

go
// NodePort Service
// +kubebuilder:validation:Minimum=30000
// +kubebuilder:validation:Maximum=32767
// +kubebuilder:default=31000
NodePort int32 `json:"nodePort,omitempty"`

As for the storage configuration, it holds Redis's data directory; we won't go into detail here.

At lines 77-80 of the file (the printcolumn markers) we define the columns that this redis CRD displays when running kubectl get redis. Without customization only the NAME and AGE columns are shown; once we add custom columns, AGE has to be declared explicitly.

Next, run make manifests and you will find:

  1. a new file cache.iguochan.io_redis.yaml under config/crd/bases, which defines the Redis CRD: its structure, validation rules, and behavior;
  2. a new file role.yaml under config/rbac, which defines the k8s API permissions the Operator needs, controlling which resources it may access and which operations it may perform (an excerpt is sketched below).
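The generated role.yaml is derived from the //+kubebuilder:rbac markers we add to the controller later; for the statefulsets marker, for instance, it contains a rule roughly like this (excerpt, abbreviated):

yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role
rules:
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch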

Then run make generate once and you will find:

  1. in api/v1's zz_generated.deepcopy.go, the DeepCopy methods of the various types have been updated; this command auto-generates the deep-copy code (a representative excerpt is sketched below).
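To give a feel for what controller-gen emits (a sketch of generated code; don't edit the real file by hand), the deep copy for RedisList looks roughly like:

go

// DeepCopyInto copies the receiver into out, deep-copying the Items slice.
func (in *RedisList) DeepCopyInto(out *RedisList) {
    *out = *in
    out.TypeMeta = in.TypeMeta
    in.ListMeta.DeepCopyInto(&out.ListMeta)
    if in.Items != nil {
       in, out := &in.Items, &out.Items
       *out = make([]Redis, len(*in))
       for i := range *in {
          (*in)[i].DeepCopyInto(&(*out)[i])
       }
    }
}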

2.3.2 Implementing the controller

Next we implement the controller logic by modifying internal/controller/redis_controller.go. First, let's pin down the design. For this Redis setup we:

  • choose a StatefulSet to manage the Pod;
  • expose the port with a NodePort Service;
  • manage storage with a PV/PVC;
  • maintain the Redis status.

So our Controller has to take care of exactly these things:

go
/*
Copyright 2025.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package controller

import (
    "context"
    "fmt"

    cachev1 "github.com/IguoChan/redis-operator/api/v1"
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/apimachinery/pkg/util/intstr"
    "k8s.io/client-go/tools/record"
    "k8s.io/utils/pointer"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"
)

// RedisReconciler reconciles a Redis object
type RedisReconciler struct {
    client.Client
    Scheme   *runtime.Scheme
    Recorder record.EventRecorder
}

const (
    RecordReasonFailed  = "Failed"
    RecordReasonWaiting = "Waiting"
)

//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch;create;update;patch;delete

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Redis object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/[email protected]/pkg/reconcile
func (r *RedisReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // Fetch the Redis instance
    redis := &cachev1.Redis{}
    if err := r.Get(ctx, req.NamespacedName, redis); err != nil {
       return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Ensure the StatefulSet exists
    if err := r.reconcileStatefulSet(ctx, redis); err != nil {
       logger.Error(err, "reconcileStatefulSet failed")
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcileStatefulSet failed: %s", err.Error())
       return ctrl.Result{}, r.updateStatus(ctx, redis)
    }

    // Ensure the Service exists
    if err := r.reconcileService(ctx, redis); err != nil {
       logger.Error(err, "reconcileService failed")
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcileService failed: %s", err.Error())
       return ctrl.Result{}, r.updateStatus(ctx, redis)
    }

    // Ensure the PVC exists
    if err := r.reconcilePVC(ctx, redis); err != nil {
       logger.Error(err, "reconcilePVC failed")
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcilePVC failed: %s", err.Error())
       return ctrl.Result{}, r.updateStatus(ctx, redis)
    }

    // Update the status
    return ctrl.Result{}, r.updateStatus(ctx, redis)
}

func (r *RedisReconciler) reconcileStatefulSet(ctx context.Context, redis *cachev1.Redis) error {
    logger := log.FromContext(ctx)

    desired := &appsv1.StatefulSet{
       ObjectMeta: metav1.ObjectMeta{
          Name:      fmt.Sprintf("%s-redis", redis.Name),
          Namespace: redis.Namespace,
       },
       Spec: appsv1.StatefulSetSpec{
          ServiceName: fmt.Sprintf("%s-svc", redis.Name),
          Replicas:    pointer.Int32(1), // standalone mode
          Selector: &metav1.LabelSelector{
             MatchLabels: map[string]string{
                "app":  "redis",
                "name": redis.Name,
             },
          },
          Template: corev1.PodTemplateSpec{
             ObjectMeta: metav1.ObjectMeta{
                Labels: map[string]string{
                   "app":  "redis",
                   "name": redis.Name,
                },
             },
             Spec: corev1.PodSpec{
                Containers: []corev1.Container{
                   {
                      Name:            "redis",
                      Image:           redis.Spec.Image,
                      ImagePullPolicy: corev1.PullIfNotPresent,
                      Ports: []corev1.ContainerPort{
                         {
                            Name:          "redis",
                            ContainerPort: 6379,
                         },
                      },
                      VolumeMounts: []corev1.VolumeMount{
                         {
                            Name:      "redis-data",
                            MountPath: "/data",
                         },
                      },
                   },
                },
             },
          },
          VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
             {
                ObjectMeta: metav1.ObjectMeta{
                   Name: "redis-data",
                },
                Spec: corev1.PersistentVolumeClaimSpec{
                   AccessModes: []corev1.PersistentVolumeAccessMode{
                      corev1.ReadWriteOnce,
                   },
                   Resources: corev1.ResourceRequirements{
                      Requests: corev1.ResourceList{
                         corev1.ResourceStorage: redis.Spec.Storage.Size,
                      },
                   },
                   StorageClassName: pointer.String("redis-storage"),
                },
             },
          },
       },
    }

    // Set the OwnerReference
    if err := ctrl.SetControllerReference(redis, desired, r.Scheme); err != nil {
       return err
    }

    existing := &appsv1.StatefulSet{}
    err := r.Get(ctx, types.NamespacedName{
       Name:      desired.Name,
       Namespace: desired.Namespace,
    }, existing)

    if errors.IsNotFound(err) {
       logger.Info("Creating StatefulSet", "name", desired.Name)
       return r.Create(ctx, desired)
    } else if err != nil {
       return err
    }

    // Update the StatefulSet (simplified handling)
    logger.Info("Updating StatefulSet", "name", desired.Name)
    desired.Spec.DeepCopyInto(&existing.Spec)
    return r.Update(ctx, existing)
}

func (r *RedisReconciler) reconcileService(ctx context.Context, redis *cachev1.Redis) error {
    logger := log.FromContext(ctx)

    desired := &corev1.Service{
       ObjectMeta: metav1.ObjectMeta{
          Name:      fmt.Sprintf("%s-svc", redis.Name),
          Namespace: redis.Namespace,
       },
       Spec: corev1.ServiceSpec{
          Type: corev1.ServiceTypeNodePort,
          Selector: map[string]string{
             "app":  "redis",
             "name": redis.Name,
          },
          Ports: []corev1.ServicePort{
             {
                Name:       "redis",
                Port:       6379,
                TargetPort: intstr.FromInt(6379),
                NodePort:   redis.Spec.NodePort,
             },
          },
       },
    }

    // Set the OwnerReference
    if err := ctrl.SetControllerReference(redis, desired, r.Scheme); err != nil {
       return err
    }

    existing := &corev1.Service{}
    err := r.Get(ctx, types.NamespacedName{
       Name:      desired.Name,
       Namespace: desired.Namespace,
    }, existing)

    if errors.IsNotFound(err) {
       logger.Info("Creating Service", "name", desired.Name)
       return r.Create(ctx, desired)
    } else if err != nil {
       return err
    }

    // Update the Service (ports only)
    if existing.Spec.Ports[0].NodePort != desired.Spec.Ports[0].NodePort {
       logger.Info("Updating Service port", "name", desired.Name)
       existing.Spec.Ports = desired.Spec.Ports
       return r.Update(ctx, existing)
    }

    return nil
}

func (r *RedisReconciler) reconcilePVC(ctx context.Context, redis *cachev1.Redis) error {
    logger := log.FromContext(ctx)

    pvName := fmt.Sprintf("%s-pv", redis.Name)

    // Create the PV (using HostPath)
    pv := &corev1.PersistentVolume{
       ObjectMeta: metav1.ObjectMeta{
          Name: pvName,
       },
       Spec: corev1.PersistentVolumeSpec{
          Capacity: corev1.ResourceList{
             corev1.ResourceStorage: redis.Spec.Storage.Size,
          },
          AccessModes: []corev1.PersistentVolumeAccessMode{
             corev1.ReadWriteOnce,
          },
          PersistentVolumeReclaimPolicy: corev1.PersistentVolumeReclaimRecycle,
          StorageClassName:              "redis-storage",
          PersistentVolumeSource: corev1.PersistentVolumeSource{
             HostPath: &corev1.HostPathVolumeSource{
                Path: redis.Spec.Storage.HostPath,
             },
          },
       },
    }

    if err := r.Create(ctx, pv); err != nil && !errors.IsAlreadyExists(err) {
       logger.Error(err, "create pv failed")
       return err
    }

    // The matching PVC is created automatically by the StatefulSet and binds to this PV
    return nil
}

func (r *RedisReconciler) updateStatus(ctx context.Context, redis *cachev1.Redis) error {
    // 1. Fetch the associated Pods
    podList := &corev1.PodList{}
    if err := r.List(ctx, podList, client.MatchingLabels{
       "name": redis.Name,
    }); err != nil {
       return err
    }

    // 2. Check the Pod state
    if len(podList.Items) == 0 {
       r.Recorder.Event(redis, corev1.EventTypeNormal, RecordReasonWaiting, "no pod is ready")
       redis.Status.Phase = cachev1.RedisPhasePending
       return r.Status().Update(ctx, redis)
    }

    // 3. Make sure every Pod is Ready
    allPodsReady := true
    for _, pod := range podList.Items {
       isPodReady := false
       for _, cond := range pod.Status.Conditions {
          if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
             isPodReady = true
             break
          }
       }
       if !isPodReady {
          allPodsReady = false
          break
       }
    }

    // 4. If the Pods aren't ready, return right away
    if !allPodsReady {
       r.Recorder.Event(redis, corev1.EventTypeNormal, RecordReasonWaiting, "not every pod is ready")
       redis.Status.Phase = cachev1.RedisPhasePending
       return r.Status().Update(ctx, redis)
    }

    // 5. Fetch the Service
    svc := &corev1.Service{}
    if err := r.Get(ctx, types.NamespacedName{
       Name:      fmt.Sprintf("%s-svc", redis.Name),
       Namespace: redis.Namespace,
    }, svc); err != nil {
       // Handle the Service not existing
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "get service failed: %s", err.Error())
       redis.Status.Phase = cachev1.RedisPhaseError
       return r.Status().Update(ctx, redis)
    }

    // 6. Verify Endpoint availability
    endpoints := &corev1.Endpoints{}
    if err := r.Get(ctx, types.NamespacedName{
       Name:      svc.Name,
       Namespace: svc.Namespace,
    }, endpoints); err != nil {
       // Handle the Endpoints not existing
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "get endpoints failed: %s", err.Error())
       redis.Status.Phase = cachev1.RedisPhaseError
       return r.Status().Update(ctx, redis)
    }

    // 7. Make sure the Endpoints have a usable address
    if len(endpoints.Subsets) == 0 || len(endpoints.Subsets[0].Addresses) == 0 {
       r.Recorder.Event(redis, corev1.EventTypeWarning, RecordReasonFailed, "endpoints have no address")
       redis.Status.Phase = cachev1.RedisPhaseError
       return r.Status().Update(ctx, redis)
    }

    // 8. Get the backing address from the Endpoints (a Pod IP)
    nodeIP := endpoints.Subsets[0].Addresses[0].IP
    endpoint := fmt.Sprintf("%s:%d", nodeIP, svc.Spec.Ports[0].TargetPort.IntVal)

    // 9. Success: update the status
    redis.Status.Endpoint = endpoint
    redis.Status.Phase = cachev1.RedisPhaseReady

    return r.Status().Update(ctx, redis)
}

// SetupWithManager sets up the controller with the Manager.
func (r *RedisReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
       For(&cachev1.Redis{}).
       Owns(&appsv1.StatefulSet{}).
       Owns(&corev1.Service{}).
       Complete(r)
}

Now let's walk through these pieces and see what each one does:

Basic logic

First, look at the Reconcile function:

go
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch;create;update;patch;delete

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Redis object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/[email protected]/pkg/reconcile
func (r *RedisReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // Fetch the Redis instance
    redis := &cachev1.Redis{}
    if err := r.Get(ctx, req.NamespacedName, redis); err != nil {
       return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Ensure the StatefulSet exists
    if err := r.reconcileStatefulSet(ctx, redis); err != nil {
       logger.Error(err, "reconcileStatefulSet failed")
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcileStatefulSet failed: %s", err.Error())
       return ctrl.Result{}, r.updateStatus(ctx, redis)
    }

    // Ensure the Service exists
    if err := r.reconcileService(ctx, redis); err != nil {
       logger.Error(err, "reconcileService failed")
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcileService failed: %s", err.Error())
       return ctrl.Result{}, r.updateStatus(ctx, redis)
    }

    // Ensure the PVC exists
    if err := r.reconcilePVC(ctx, redis); err != nil {
       logger.Error(err, "reconcilePVC failed")
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcilePVC failed: %s", err.Error())
       return ctrl.Result{}, r.updateStatus(ctx, redis)
    }

    // Update the status
    return ctrl.Result{}, r.updateStatus(ctx, redis)
}

Reconcile is the CRD's reconciliation engine: it keeps comparing the actual state (CRD Status) against the desired state (CRD Spec) and acts to bring them in line. Markers 4-6 in the snippet above grant the Operator permission to operate on its child resources; the rest of the code can be read as control logic for each child resource. Besides Reconcile, SetupWithManager is another important component:

  • For: declares the primary resource type to watch;
  • Owns: declares ownership of child resources, establishing the parent-child relationship;
  • WithOptions: tuning knobs such as concurrency and timeouts (see the sketch after the snippet below);
go
// SetupWithManager sets up the controller with the Manager.
func (r *RedisReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
       For(&cachev1.Redis{}).
       Owns(&appsv1.StatefulSet{}).
       Owns(&corev1.Service{}).
       Complete(r)
}
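For instance (a sketch: WithOptions and MaxConcurrentReconciles are real controller-runtime API, the value 2 is an arbitrary choice), raising the reconcile concurrency would look like:

go

import (
    "sigs.k8s.io/controller-runtime/pkg/controller"
)

// SetupWithManager with tuned options: reconcile up to 2 Redis objects in parallel.
func (r *RedisReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
       For(&cachev1.Redis{}).
       Owns(&appsv1.StatefulSet{}).
       Owns(&corev1.Service{}).
       WithOptions(controller.Options{MaxConcurrentReconciles: 2}).
       Complete(r)
}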
StatefulSet
go
func (r *RedisReconciler) reconcileStatefulSet(ctx context.Context, redis *cachev1.Redis) error {
    logger := log.FromContext(ctx)

    desired := &appsv1.StatefulSet{
       ObjectMeta: metav1.ObjectMeta{
          Name:      fmt.Sprintf("%s-redis", redis.Name),
          Namespace: redis.Namespace,
       },
       Spec: appsv1.StatefulSetSpec{
          ServiceName: fmt.Sprintf("%s-svc", redis.Name),
          Replicas:    pointer.Int32(1), // standalone mode
          Selector: &metav1.LabelSelector{
             MatchLabels: map[string]string{
                "app":  "redis",
                "name": redis.Name,
             },
          },
          Template: corev1.PodTemplateSpec{
             ObjectMeta: metav1.ObjectMeta{
                Labels: map[string]string{
                   "app":  "redis",
                   "name": redis.Name,
                },
             },
             Spec: corev1.PodSpec{
                Containers: []corev1.Container{
                   {
                      Name:            "redis",
                      Image:           redis.Spec.Image,
                      ImagePullPolicy: corev1.PullIfNotPresent,
                      Ports: []corev1.ContainerPort{
                         {
                            Name:          "redis",
                            ContainerPort: 6379,
                         },
                      },
                      VolumeMounts: []corev1.VolumeMount{
                         {
                            Name:      "redis-data",
                            MountPath: "/data",
                         },
                      },
                   },
                },
             },
          },
          VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
             {
                ObjectMeta: metav1.ObjectMeta{
                   Name: "redis-data",
                },
                Spec: corev1.PersistentVolumeClaimSpec{
                   AccessModes: []corev1.PersistentVolumeAccessMode{
                      corev1.ReadWriteOnce,
                   },
                   Resources: corev1.ResourceRequirements{
                      Requests: corev1.ResourceList{
                         corev1.ResourceStorage: redis.Spec.Storage.Size,
                      },
                   },
                   StorageClassName: pointer.String("redis-storage"),
                },
             },
          },
       },
    }

    // Set the OwnerReference
    if err := ctrl.SetControllerReference(redis, desired, r.Scheme); err != nil {
       return err
    }

    existing := &appsv1.StatefulSet{}
    err := r.Get(ctx, types.NamespacedName{
       Name:      desired.Name,
       Namespace: desired.Namespace,
    }, existing)

    if errors.IsNotFound(err) {
       logger.Info("Creating StatefulSet", "name", desired.Name)
       return r.Create(ctx, desired)
    } else if err != nil {
       return err
    }

    // Update the StatefulSet (simplified handling)
    logger.Info("Updating StatefulSet", "name", desired.Name)
    desired.Spec.DeepCopyInto(&existing.Spec)
    return r.Update(ctx, existing)
}

The basic logic is:

  1. build a desired object expressing the expected state;
  2. set the OwnerReference;
  3. check whether an existing object is present: if not, create desired; if so, copy desired.Spec into existing.Spec and update the object, letting k8s's built-in controller reconcile it and produce the Pod;

Note ImagePullPolicy: corev1.PullIfNotPresent: since our cluster is built with kind, docker images have to be loaded into the cluster first, which is why we pick PullIfNotPresent.

Service
go
func (r *RedisReconciler) reconcileService(ctx context.Context, redis *cachev1.Redis) error {
    logger := log.FromContext(ctx)

    desired := &corev1.Service{
       ObjectMeta: metav1.ObjectMeta{
          Name:      fmt.Sprintf("%s-svc", redis.Name),
          Namespace: redis.Namespace,
       },
       Spec: corev1.ServiceSpec{
          Type: corev1.ServiceTypeNodePort,
          Selector: map[string]string{
             "app":  "redis",
             "name": redis.Name,
          },
          Ports: []corev1.ServicePort{
             {
                Name:       "redis",
                Port:       6379,
                TargetPort: intstr.FromInt(6379),
                NodePort:   redis.Spec.NodePort,
             },
          },
       },
    }

    // Set the OwnerReference
    if err := ctrl.SetControllerReference(redis, desired, r.Scheme); err != nil {
       return err
    }

    existing := &corev1.Service{}
    err := r.Get(ctx, types.NamespacedName{
       Name:      desired.Name,
       Namespace: desired.Namespace,
    }, existing)

    if errors.IsNotFound(err) {
       logger.Info("Creating Service", "name", desired.Name)
       return r.Create(ctx, desired)
    } else if err != nil {
       return err
    }

    // Update the Service (ports only)
    if existing.Spec.Ports[0].NodePort != desired.Spec.Ports[0].NodePort {
       logger.Info("Updating Service port", "name", desired.Name)
       existing.Spec.Ports = desired.Spec.Ports
       return r.Update(ctx, existing)
    }

    return nil
}

The basic logic is much like the StatefulSet's: compare the desired value against the existing one, then create or update accordingly. Note that the Selector has to match the Pod labels.

PV/PVC
go
func (r *RedisReconciler) reconcilePVC(ctx context.Context, redis *cachev1.Redis) error {
    logger := log.FromContext(ctx)

    pvName := fmt.Sprintf("%s-pv", redis.Name)

    // Create the PV (using HostPath)
    pv := &corev1.PersistentVolume{
       ObjectMeta: metav1.ObjectMeta{
          Name: pvName,
       },
       Spec: corev1.PersistentVolumeSpec{
          Capacity: corev1.ResourceList{
             corev1.ResourceStorage: redis.Spec.Storage.Size,
          },
          AccessModes: []corev1.PersistentVolumeAccessMode{
             corev1.ReadWriteOnce,
          },
          PersistentVolumeReclaimPolicy: corev1.PersistentVolumeReclaimRecycle,
          StorageClassName:              "redis-storage",
          PersistentVolumeSource: corev1.PersistentVolumeSource{
             HostPath: &corev1.HostPathVolumeSource{
                Path: redis.Spec.Storage.HostPath,
             },
          },
       },
    }

    if err := r.Create(ctx, pv); err != nil && !errors.IsAlreadyExists(err) {
       logger.Error(err, "create pv failed")
       return err
    }

    // The matching PVC is created automatically by the StatefulSet and binds to this PV
    return nil
}

Although this is called reconcilePVC, it actually creates the PV: the PVC is already declared inside the StatefulSet, so nothing is set up here. Note: if the related PV/PVC resources should be deleted when the redis is deleted, extra logic is needed; a sketch of one approach follows.
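A common way to add that cleanup (a sketch under assumptions: the finalizer name and the handleDeletion helper are hypothetical, not part of this article's code; controllerutil is the real controller-runtime helper package) is a finalizer on the Redis object, invoked at the top of Reconcile:

go

import (
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// redisFinalizer is a hypothetical finalizer name.
const redisFinalizer = "cache.iguochan.io/pv-cleanup"

// handleDeletion sketches PV cleanup: while the finalizer is present the object
// cannot vanish, giving us a chance to delete the PV before releasing it.
func (r *RedisReconciler) handleDeletion(ctx context.Context, redis *cachev1.Redis) error {
    if redis.DeletionTimestamp.IsZero() {
       // Not being deleted: make sure the finalizer is attached.
       if !controllerutil.ContainsFinalizer(redis, redisFinalizer) {
          controllerutil.AddFinalizer(redis, redisFinalizer)
          return r.Update(ctx, redis)
       }
       return nil
    }
    // Being deleted: remove the PV we created, then drop the finalizer.
    pv := &corev1.PersistentVolume{}
    pv.Name = fmt.Sprintf("%s-pv", redis.Name)
    if err := r.Delete(ctx, pv); err != nil && !errors.IsNotFound(err) {
       return err
    }
    controllerutil.RemoveFinalizer(redis, redisFinalizer)
    return r.Update(ctx, redis)
}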

Updating the status
go
func (r *RedisReconciler) updateStatus(ctx context.Context, redis *cachev1.Redis) error {
    // 1. Fetch the associated Pods
    podList := &corev1.PodList{}
    if err := r.List(ctx, podList, client.MatchingLabels{
       "name": redis.Name,
    }); err != nil {
       return err
    }

    // 2. Check the Pod state
    if len(podList.Items) == 0 {
       r.Recorder.Event(redis, corev1.EventTypeNormal, RecordReasonWaiting, "no pod is ready")
       redis.Status.Phase = cachev1.RedisPhasePending
       return r.Status().Update(ctx, redis)
    }

    // 3. Make sure every Pod is Ready
    allPodsReady := true
    for _, pod := range podList.Items {
       isPodReady := false
       for _, cond := range pod.Status.Conditions {
          if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
             isPodReady = true
             break
          }
       }
       if !isPodReady {
          allPodsReady = false
          break
       }
    }

    // 4. If the Pods aren't ready, return right away
    if !allPodsReady {
       r.Recorder.Event(redis, corev1.EventTypeNormal, RecordReasonWaiting, "not every pod is ready")
       redis.Status.Phase = cachev1.RedisPhasePending
       return r.Status().Update(ctx, redis)
    }

    // 5. Fetch the Service
    svc := &corev1.Service{}
    if err := r.Get(ctx, types.NamespacedName{
       Name:      fmt.Sprintf("%s-svc", redis.Name),
       Namespace: redis.Namespace,
    }, svc); err != nil {
       // Handle the Service not existing
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "get service failed: %s", err.Error())
       redis.Status.Phase = cachev1.RedisPhaseError
       return r.Status().Update(ctx, redis)
    }

    // 6. Verify Endpoint availability
    endpoints := &corev1.Endpoints{}
    if err := r.Get(ctx, types.NamespacedName{
       Name:      svc.Name,
       Namespace: svc.Namespace,
    }, endpoints); err != nil {
       // Handle the Endpoints not existing
       r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "get endpoints failed: %s", err.Error())
       redis.Status.Phase = cachev1.RedisPhaseError
       return r.Status().Update(ctx, redis)
    }

    // 7. Make sure the Endpoints have a usable address
    if len(endpoints.Subsets) == 0 || len(endpoints.Subsets[0].Addresses) == 0 {
       r.Recorder.Event(redis, corev1.EventTypeWarning, RecordReasonFailed, "endpoints have no address")
       redis.Status.Phase = cachev1.RedisPhaseError
       return r.Status().Update(ctx, redis)
    }

    // 8. Get the backing address from the Endpoints (a Pod IP)
    nodeIP := endpoints.Subsets[0].Addresses[0].IP
    endpoint := fmt.Sprintf("%s:%d", nodeIP, svc.Spec.Ports[0].TargetPort.IntVal)

    // 9. Success: update the status
    redis.Status.Endpoint = endpoint
    redis.Status.Phase = cachev1.RedisPhaseReady

    return r.Status().Update(ctx, redis)
}

Here, by checking the child resources, we decide whether the Redis resource as a whole is Ready.

Recorder

For k8s's built-in resources, kubectl describe xxx usually shows an Events section when something is wrong; those entries are produced through record.EventRecorder, which lets us record to Events when anomalies occur. Note that the Recorder has to be wired up in the main function with Recorder: mgr.GetEventRecorderFor("Redis"):

go
if err = (&controller.RedisReconciler{
    Client:   mgr.GetClient(),
    Scheme:   mgr.GetScheme(),
    Recorder: mgr.GetEventRecorderFor("Redis"),
}).SetupWithManager(mgr); err != nil {
    setupLog.Error(err, "unable to create controller", "controller", "Redis")
    os.Exit(1)
}

2.3.3 Verification

Installing the CRD

We built the cluster with kind earlier; now let's verify the basic functionality. The first step is installing the CRD into the cluster. In the project directory, run make install:

bash
$ make install
test -s /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen && /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen --version | grep -q v0.11.3 || \
        GOBIN=/Users/xxx/workspace/github/iguochan/redis-operator/bin go install sigs.k8s.io/controller-tools/cmd/[email protected]
/Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
test -s /Users/xxx/workspace/github/iguochan/redis-operator/bin/kustomize || { curl -Ss "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" --output install_kustomize.sh && bash install_kustomize.sh 5.0.0 /Users/xxx/workspace/github/iguochan/redis-operator/bin; rm install_kustomize.sh; }
/Users/xxx/workspace/github/iguochan/redis-operator/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/redis.cache.iguochan.io created

$ kubectl get crd # 验证CRD
NAME                      CREATED AT
redis.cache.iguochan.io   2025-06-14T14:19:13Z

The CRD definition now exists, and the resource is visible too:

bash
$ kubectl api-resources | grep redis
redis                                          cache.iguochan.io/v1                   true         Redis
Running the Controller locally
bash
$ make run
test -s /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen && /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen --version | grep -q v0.11.3 || \
        GOBIN=/Users/xxx/workspace/github/iguochan/redis-operator/bin go install sigs.k8s.io/controller-tools/cmd/[email protected]
/Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./cmd/main.go
2025-06-14T22:22:15+08:00       INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2025-06-14T22:22:15+08:00       INFO    setup   starting manager
2025-06-14T22:22:15+08:00       INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2025-06-14T22:22:15+08:00       INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
2025-06-14T22:22:15+08:00       INFO    Starting EventSource    {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis", "source": "kind source: *v1.Redis"}
2025-06-14T22:22:15+08:00       INFO    Starting EventSource    {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis", "source": "kind source: *v1.StatefulSet"}
2025-06-14T22:22:15+08:00       INFO    Starting EventSource    {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis", "source": "kind source: *v1.Service"}
2025-06-14T22:22:15+08:00       INFO    Starting Controller     {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis"}
2025-06-14T22:22:15+08:00       INFO    Starting workers        {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis", "worker count": 1}

As you can see, make run starts a Controller locally that watches the CRD. In fact, for debugging, you can run the controller under a local debugger against the cluster; I've tried it myself and it works without a hitch.

Starting a Redis instance

Before starting the instance, create a StorageClass in the cluster with the following config, applied via kubectl apply -f redis-storage.yaml:

yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: redis-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

In config/samples/cache_v1_redis.yaml, make the following change:

yaml
apiVersion: cache.iguochan.io/v1
kind: Redis
metadata:
  labels:
    app.kubernetes.io/name: redis
    app.kubernetes.io/instance: redis-sample
    app.kubernetes.io/part-of: redis-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: redis-operator
  name: redis-sample
spec: # everything below is new
  image: redis:7.0
  nodePort: 31000
  storage:
    size: 1Gi
    hostPath: /data

Here we set a few Redis parameters; running kubectl apply -f config/samples/cache_v1_redis.yaml then creates a Redis custom resource.

bash
$ kubectl get redis
NAME           PHASE     ENDPOINT   IMAGE       AGE
redis-sample   Pending              redis:7.0   4m44s

The Redis resource is sitting in Pending. Why is that? We can debug as follows:

bash
$ kubectl describe redis redis-sample
Name:         redis-sample
Namespace:    default
...
Spec:
  Image:      redis:7.0
  Node Port:  31000
  Storage:
    Host Path:  /data
    Size:       1Gi
Status:
  Phase:  Pending
Events:
  Type    Reason   Age                    From   Message
  ----    ------   ----                   ----   -------
  Normal  Waiting  5m22s (x2 over 5m25s)  Redis  not every pod is ready

So at least one Pod isn't Ready. Time to check the Pods:

bash
$ kubectl get pod | grep redis
redis-sample-redis-0   0/1     ImagePullBackOff   0          7m15s

The Pod's image pull has a problem, most likely because the redis:7.0 image isn't available; kubectl describe pod redis-sample-redis-0 has the details. To fix it, load the image onto the kind nodes:

bash
$ kind load docker-image redis:7.0 redis:7.0 --name multi
Image: "redis:7.0" with ID "sha256:2c96e8092504e99e7601bcb6f4954b4cfed77d5125c88061a49c252fb33cf6f7" not yet present on node "multi-worker", loading...
Image: "redis:7.0" with ID "sha256:2c96e8092504e99e7601bcb6f4954b4cfed77d5125c88061a49c252fb33cf6f7" not yet present on node "multi-worker2", loading...
Image: "redis:7.0" with ID "sha256:2c96e8092504e99e7601bcb6f4954b4cfed77d5125c88061a49c252fb33cf6f7" not yet present on node "multi-worker3", loading...
Image: "redis:7.0" with ID "sha256:2c96e8092504e99e7601bcb6f4954b4cfed77d5125c88061a49c252fb33cf6f7" not yet present on node "multi-control-plane", loading...

Now all the resources check out:

bash
$ kubectl get redis
NAME           PHASE   ENDPOINT          IMAGE       AGE
redis-sample   Ready   10.244.1.2:6379   redis:7.0   11m

$ kubectl get sts | grep redis
redis-sample-redis   1/1     11m

$ kubectl get svc | grep redis
redis-sample-svc   NodePort    10.96.251.201   <none>        6379:31000/TCP   11m

$ kubectl get pod | grep redis
redis-sample-redis-0   1/1     Running   0          11m

$ kubectl get pv | grep redis
redis-sample-pv   1Gi        RWO            Recycle          Bound    default/redis-data-redis-sample-redis-0   redis-storage            12m

$ kubectl get pvc | grep redis
redis-data-redis-sample-redis-0   Bound    redis-sample-pv   1Gi        RWO            redis-storage   12m
Verification

Connect from the local machine to verify:

bash
$ redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> get key1
(nil)
127.0.0.1:6379> set key1 hello
OK
127.0.0.1:6379> get key1
"hello"

To check that the mount actually works, first delete the redis with kubectl delete redis redis-sample; at this point redis is unreachable.

Then start the redis again and fetch once more; the data is still there:

bash
127.0.0.1:6379> get key1
"hello"

A dump.rdb file appears in the local mount directory, which confirms the mount works.

2.4 Admission Control

Admission control intercepts requests to the API server after authentication and authorization and before the object is persisted. It is implemented through WebHooks, and there are two controller types:

  • Mutating admission controllers: modify the request object, e.g. auto-completing fields, injecting security configuration, encrypting sensitive data;
  • Validating admission controllers: validate it, e.g. against security policies, business rules, or resource quotas;

(The original article shows a diagram of the two here; in short, mutating webhooks run first, then schema validation, then validating webhooks, before the object is persisted.)

2.4.1 Creating the webhook

bash
$ kubebuilder create webhook --group cache --version v1 --kind Redis --defaulting --programmatic-validation

The command above scaffolds the webhook; afterwards you'll find new webhook- and certmanager-related files in the project (a trimmed sketch of the generated code follows).
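Roughly, the scaffold drops a skeleton into api/v1/redis_webhook.go and wires it up in cmd/main.go; a sketch of the generated shape, trimmed (the kubebuilder marker comments it also generates are omitted):

go

// api/v1/redis_webhook.go (generated skeleton, trimmed)
var redislog = logf.Log.WithName("redis-resource")

// SetupWebhookWithManager registers the webhook for the Redis type.
func (r *Redis) SetupWebhookWithManager(mgr ctrl.Manager) error {
    return ctrl.NewWebhookManagedBy(mgr).
       For(r).
       Complete()
}

// cmd/main.go (registration added by the scaffold)
if err = (&cachev1.Redis{}).SetupWebhookWithManager(mgr); err != nil {
    setupLog.Error(err, "unable to create webhook", "webhook", "Redis")
    os.Exit(1)
}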

2.4.2 cert-manager

Because the webhook must be callable by the API Server securely over HTTPS, the cert-manager-related configuration has to be enabled: open config/default/kustomization.yaml and uncomment the sections under the Uncomment notes, including the webhook-related ones:

yaml
# Adds namespace to all resources.
namespace: redis-operator-system

# Value of this field is prepended to the
# names of all resources, e.g. a deployment named
# "wordpress" becomes "alices-wordpress".
# Note that it should also match with the prefix (text before '-') of the namespace
# field above.
namePrefix: redis-operator-

# Labels to add to all resources and selectors.
#labels:
#- includeSelectors: true
#  pairs:
#    someName: someValue

resources:
- ../crd
- ../rbac
- ../manager
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
- ../webhook
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus

patchesStrategicMerge:
# Protect the /metrics endpoint by putting it behind auth.
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, please comment the following line.
- manager_auth_proxy_patch.yaml



# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
- manager_webhook_patch.yaml

# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'.
# Uncomment 'CERTMANAGER' sections in crd/kustomization.yaml to enable the CA injection in the admission webhooks.
# 'CERTMANAGER' needs to be enabled to use ca injection
- webhookcainjection_patch.yaml

# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
# Uncomment the following replacements to add the cert-manager CA injection annotations
replacements:
  - source: # Add cert-manager annotation to ValidatingWebhookConfiguration, MutatingWebhookConfiguration and CRDs
      kind: Certificate
      group: cert-manager.io
      version: v1
      name: serving-cert # this name should match the one in certificate.yaml
      fieldPath: .metadata.namespace # namespace of the certificate CR
    targets:
      - select:
          kind: ValidatingWebhookConfiguration
        fieldPaths:
          - .metadata.annotations.[cert-manager.io/inject-ca-from]
        options:
          delimiter: '/'
          index: 0
          create: true
      - select:
          kind: MutatingWebhookConfiguration
        fieldPaths:
          - .metadata.annotations.[cert-manager.io/inject-ca-from]
        options:
          delimiter: '/'
          index: 0
          create: true
      - select:
          kind: CustomResourceDefinition
        fieldPaths:
          - .metadata.annotations.[cert-manager.io/inject-ca-from]
        options:
          delimiter: '/'
          index: 0
          create: true
  - source:
      kind: Certificate
      group: cert-manager.io
      version: v1
      name: serving-cert # this name should match the one in certificate.yaml
      fieldPath: .metadata.name
    targets:
      - select:
          kind: ValidatingWebhookConfiguration
        fieldPaths:
          - .metadata.annotations.[cert-manager.io/inject-ca-from]
        options:
          delimiter: '/'
          index: 1
          create: true
      - select:
          kind: MutatingWebhookConfiguration
        fieldPaths:
          - .metadata.annotations.[cert-manager.io/inject-ca-from]
        options:
          delimiter: '/'
          index: 1
          create: true
      - select:
          kind: CustomResourceDefinition
        fieldPaths:
          - .metadata.annotations.[cert-manager.io/inject-ca-from]
        options:
          delimiter: '/'
          index: 1
          create: true
  - source: # Add cert-manager annotation to the webhook Service
      kind: Service
      version: v1
      name: webhook-service
      fieldPath: .metadata.name # namespace of the service
    targets:
      - select:
          kind: Certificate
          group: cert-manager.io
          version: v1
        fieldPaths:
          - .spec.dnsNames.0
          - .spec.dnsNames.1
        options:
          delimiter: '.'
          index: 0
          create: true
  - source:
      kind: Service
      version: v1
      name: webhook-service
      fieldPath: .metadata.namespace # namespace of the service
    targets:
      - select:
          kind: Certificate
          group: cert-manager.io
          version: v1
        fieldPaths:
          - .spec.dnsNames.0
          - .spec.dnsNames.1
        options:
          delimiter: '.'
          index: 1
          create: true

Likewise, since running this requires certificate validation, kubebuilder officially recommends cert-manager to simplify certificate management, so let's deploy the cert-manager service first and verify it:

bash
$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.8.0/cert-manager.yaml

$ kubectl get pods -ncert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5d6bc46969-n9znn              1/1     Running   0          37s
cert-manager-cainjector-7d8b8bb6b8-52vq7   1/1     Running   0          37s
cert-manager-webhook-5c5c5bb457-nd5g5      1/1     Running   0          37s

2.4.3 Implementing the Mutating interface

In api/v1/redis_webhook.go, modify the Default function as follows:

go
// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *Redis) Default() {
    redislog.Info("default", "name", r.Name)

    // Set the default image
    if r.Spec.Image == "" {
       r.Spec.Image = "redis:7.0"
       redislog.Info("Setting default image", "image", r.Spec.Image)
    }

    // Set the default port
    if r.Spec.NodePort == 0 {
       r.Spec.NodePort = 31000
       redislog.Info("Setting default NodePort", "nodePort", r.Spec.NodePort)
    }

    // Set the default storage path
    if r.Spec.Storage.HostPath == "" {
       r.Spec.Storage.HostPath = "/data"
       redislog.Info("Setting default host path", "hostPath", r.Spec.Storage.HostPath)
    }

    // Set the default storage size
    if r.Spec.Storage.Size.IsZero() {
       size := resource.MustParse("1Gi")
       r.Spec.Storage.Size = size
       redislog.Info("Setting default storage size", "size", size.String())
    }
}

2.4.4 Implementing the Validating interface

Likewise, in api/v1/redis_webhook.go, modify the Validate-related interfaces:

go
// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *Redis) ValidateCreate() error {
    redislog.Info("validate create", "name", r.Name)

    return r.validateRedis(r)
}

// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type
func (r *Redis) ValidateUpdate(old runtime.Object) error {
    redislog.Info("validate update", "name", r.Name)

    oldRedis, ok := old.(*Redis)
    if !ok {
       return fmt.Errorf("expected a Redis object but got %T", old)
    }

    if err := r.validateRedis(r); err != nil {
       return err
    }

    // Reject changes to immutable fields
    if oldRedis.Spec.Image != r.Spec.Image {
       return field.Forbidden(
          field.NewPath("spec", "image"),
          "image cannot be changed after creation",
       )
    }

    if oldRedis.Spec.Storage.HostPath != r.Spec.Storage.HostPath {
       return field.Forbidden(
          field.NewPath("spec", "storage", "hostPath"),
          "hostPath cannot be changed after creation",
       )
    }

    return nil
}

// ValidateDelete implements webhook.Validator so a webhook will be registered for the type
func (r *Redis) ValidateDelete() error {
    redislog.Info("validate delete", "name", r.Name)
    
    return nil
}

// validateRedis performs the actual validation logic
func (r *Redis) validateRedis(redis *Redis) error {
    allErrs := field.ErrorList{}

    // Validate the storage size range
    minSize := resource.MustParse(MinStorageSize)
    maxSize := resource.MustParse(MaxStorageSize)

    if redis.Spec.Storage.Size.Cmp(minSize) < 0 {
       allErrs = append(allErrs, field.Invalid(
          field.NewPath("spec", "storage", "size"),
          redis.Spec.Storage.Size.String(),
          fmt.Sprintf("storage size must be at least %s", MinStorageSize),
       ))
    }

    if redis.Spec.Storage.Size.Cmp(maxSize) > 0 {
       allErrs = append(allErrs, field.Invalid(
          field.NewPath("spec", "storage", "size"),
          redis.Spec.Storage.Size.String(),
          fmt.Sprintf("storage size must be no more than %s", MaxStorageSize),
       ))
    }

    // Validate the port range
    if redis.Spec.NodePort < 30000 || redis.Spec.NodePort > 32767 {
       allErrs = append(allErrs, field.Invalid(
          field.NewPath("spec", "nodePort"),
          redis.Spec.NodePort,
          "nodePort must be between 30000 and 32767",
       ))
    }

    // Validate that the host path is safe
    if !isValidHostPath(redis.Spec.Storage.HostPath) {
       allErrs = append(allErrs, field.Invalid(
          field.NewPath("spec", "storage", "hostPath"),
          redis.Spec.Storage.HostPath,
          "invalid host path, only /data directory is allowed",
       ))
    }

    if len(allErrs) == 0 {
       return nil
    }

    return allErrs.ToAggregate()
}

// Helper that validates host path safety
func isValidHostPath(path string) bool {
    // Simplified here; in practice this should follow your security policy
    allowedPaths := []string{"/data", "/data/", "/mnt/data"}
    for _, p := range allowedPaths {
       if path == p {
          return true
       }
    }
    return false
}
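Note that MinStorageSize and MaxStorageSize are referenced above but their definitions aren't shown in this article. Judging from the error message in the verification below, MaxStorageSize is "10Gi"; the minimum here is purely an assumed placeholder:

go

const (
    // Assumed values: MaxStorageSize matches the "no more than 10Gi" error
    // seen during verification; MinStorageSize is a guess for illustration.
    MinStorageSize = "100Mi"
    MaxStorageSize = "10Gi"
)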

2.4.5 Verification

Before verifying, three things need to change:

First, in the Makefile, IMG defaults to controller:latest; change it to redis-controller:latest.

Second, config/manager/manager.yaml needs two edits:

  • change image: controller:latest to image: redis-controller:latest;
  • add imagePullPolicy: IfNotPresent beneath image.

This is, again, because the cluster is deployed with kind, so the image must first be loaded in with kind load.

Third, due to version drift, make docker-build installs setup-envtest via go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest, and the versions don't match; so we pin it in the Makefile:

makefile
.PHONY: envtest
envtest: $(ENVTEST) ## Download envtest-setup locally if necessary.
$(ENVTEST): $(LOCALBIN)
    test -s $(LOCALBIN)/setup-envtest || GOBIN=$(LOCALBIN) go install sigs.k8s.io/controller-runtime/tools/[email protected]

Now make docker-build builds the image, which you can inspect:

bash
$ docker images | grep redis-controller
redis-controller                                      latest    2e0405615510   About a minute ago   50.1MB

Then upload the image with kind load docker-image redis-controller:latest redis-controller:latest --name multi.

Finally, run make deploy to deploy the Operator onto the cluster; you can see it running:

bash
$ kubectl get pod -nredis-operator-system
NAME                                                 READY   STATUS    RESTARTS   AGE
redis-operator-controller-manager-6d4c5546bd-jg44p   2/2     Running   0          19s

Now running kubectl apply -f config/samples/cache_v1_redis.yaml brings up a Redis CR, but watching the operator with kubectl logs -f shows it has no permission to List Pods: we missed this when setting up permissions at the very beginning, so add lines 5-6 below to redis_controller.go:

go
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch
//+kubebuilder:rbac:groups=core,resources=endpoints,verbs=get;list;watch
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch;create;update;patch;delete

Then run make undeploy to tear everything down; after make install and make deploy again, the Redis is managed properly.

To verify the Default function, after kubectl delete redis redis-sample, change config/samples/cache_v1_redis.yaml to the following:

yaml
apiVersion: cache.iguochan.io/v1
kind: Redis
metadata:
  labels:
    app.kubernetes.io/name: redis
    app.kubernetes.io/instance: redis-sample
    app.kubernetes.io/part-of: redis-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: redis-operator
  name: redis-sample
spec:
#  image: redis:7.0
#  nodePort: 31000
#  storage:
#    size: 1Gi
#    hostPath: /data

Apply kubectl apply -f config/samples/cache_v1_redis.yaml once more, and the redis is created successfully, entirely from default values.

bash
$ k describe redis redis-sample
Name:         redis-sample
Namespace:    default
...
Spec:
  Image:      redis:7.0
  Node Port:  31000
  Storage:
    Host Path:  /data
    Size:       1Gi
Status:
  Endpoint:  10.244.2.9:6379
  Phase:     Ready
Events:      <none>

Let's test a couple more things. Setting the storage size to 100Gi produces the error admission webhook "vredis.kb.io" denied the request: spec.storage.size: Invalid value: "100Gi": storage size must be no more than 10Gi.

And changing the image to redis:6.0 produces admission webhook "vredis.kb.io" denied the request: spec.image: Forbidden: image cannot be changed after creation.

With that, we've essentially completed building an Operator.

3. References

使用kubebuilder开发operator详解 (a detailed guide to developing operators with kubebuilder)

DeepSeek

Redis Operator
