0. Introduction
Kubernetes' built-in resource types satisfy the vast majority of needs. For special requirements that call for more freedom, users can turn to a CRD (CustomResourceDefinition): without modifying any upstream code, you register a custom resource with the APIServer and meet your own needs by way of extension.
A K8S Operator is exactly such a software extension pattern; DeepSeek sums it up as Operator = Controller + CRD + domain knowledge. Through an Operator you can define CRDs to manage a particular kind of resource, and the Operator itself is deployed into K8S as a Deployment. An Operator generally consists of:
- CRD: the custom resource, a declarative API; the program keeps driving the smallest schedulable unit (the Pod) toward the declared state;
- Controller: watches create/update/delete events on the resource and responds by triggering the Reconcile (reconciliation) function, whose purpose is to move that smallest unit toward its final desired state;
- AdmissionWebhook: intercepts invalid requests, keeps configuration consistent, and cuts down on complex validation logic.
So here, taking the Redis component we all use constantly as the example, I will implement a Redis Operator that manages Redis deployments in a k8s cluster, and learn the overall Operator flow of creation, deployment, and so on along the way.
1. Development Environment
1.1 Tool Versions
- go: v1.20.2
- kubebuilder: v3.10.0
- kind: v0.20.0
1.2 Cluster Setup
Redis needs to (1) expose a port externally, and (2) guarantee data persistence, which means mounting its storage out of the nodes; so we start kind with the following config:
yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
# port forward 80 on the host to 80 on this node
extraPortMappings:
- containerPort: 30950
hostPort: 80
# optional: set the bind address on the host
# 0.0.0.0 is the current default
listenAddress: "127.0.0.1"
# optional: set the protocol to one of TCP, UDP, SCTP.
# TCP is the default
protocol: TCP
- containerPort: 31000
hostPort: 6379
protocol: TCP
- role: worker
extraMounts:
# map the host directory into the node container
- hostPath: /Users/xxx/workspace/k8s/kind/data
containerPath: /data
- role: worker
extraMounts:
# map the host directory into the node container
- hostPath: /Users/xxx/workspace/k8s/kind/data
containerPath: /data
- role: worker
extraMounts:
# map the host directory into the node container
- hostPath: /Users/xxx/workspace/k8s/kind/data
containerPath: /data
Then create the cluster with kind create cluster --name multi --config multi.yaml.
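If the cluster came up correctly you should see one control-plane node and three workers. kind names nodes <cluster>-<role>; the exact VERSION column depends on the node image your kind version pulls, so the output below is only roughly what to expect:
bash
$ kubectl get nodes
NAME                  STATUS   ROLES           AGE   VERSION
multi-control-plane   Ready    control-plane   2m    v1.27.x
multi-worker          Ready    <none>          2m    v1.27.x
multi-worker2         Ready    <none>          2m    v1.27.x
multi-worker3         Ready    <none>          2m    v1.27.x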
1.3 Installing kubebuilder
Since I need to install the specific version v3.10.0, proceed as follows:
bash
$ curl -L -O "https://github.com/kubernetes-sigs/kubebuilder/releases/download/v3.10.0/kubebuilder_darwin_arm64"
$ chmod +x kubebuilder_darwin_arm64
$ mv kubebuilder_darwin_arm64 kubebuilder
$ sudo mv kubebuilder /usr/local/bin/
$ kubebuilder version
Version: main.version{KubeBuilderVersion:"3.10.0", KubernetesVendor:"1.26.1", GitCommit:"0fa57405d4a892efceec3c5a902f634277e30732", BuildDate:"2023-04-15T08:10:35Z", GoOs:"darwin", GoArch:"arm64"}
2. Operator Development
2.1 Initializing the Redis Operator Project
First, create the project directory:
bash
$ mkdir redis-operator && cd redis-operator
Then scaffold the Operator project:
bash
$ kubebuilder init --domain iguochan.io --repo github.com/IguoChan/redis-operator
Take a look at the resulting source tree:
bash
$ tree
.
├── Dockerfile
├── LICENSE
├── Makefile
├── PROJECT
├── README.md
├── cmd
│ └── main.go
├── config
│ ├── default
│ │ ├── kustomization.yaml
│ │ ├── manager_auth_proxy_patch.yaml
│ │ └── manager_config_patch.yaml
│ ├── manager
│ │ ├── kustomization.yaml
│ │ └── manager.yaml
│ ├── prometheus
│ │ ├── kustomization.yaml
│ │ └── monitor.yaml
│ └── rbac
│ ├── auth_proxy_client_clusterrole.yaml
│ ├── auth_proxy_role.yaml
│ ├── auth_proxy_role_binding.yaml
│ ├── auth_proxy_service.yaml
│ ├── kustomization.yaml
│ ├── leader_election_role.yaml
│ ├── leader_election_role_binding.yaml
│ ├── role_binding.yaml
│ └── service_account.yaml
├── go.mod
├── go.sum
└── hack
└── boilerplate.go.txt
7 directories, 25 files
Note: at this point you may hit the error go: sigs.k8s.io/[email protected]: verifying go.mod: invalid GOSUMDB: malformed verifier id. Simply running go env -w GOSUMDB=off beforehand works around it; there may well be more elegant solutions, but we won't dig into that here.
2.2 Creating the Redis API
bash
$ kubebuilder create api --group cache --version v1 --kind Redis
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/redis_types.go
api/v1/groupversion_info.go
internal/controller/suite_test.go
internal/controller/redis_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
mkdir -p /Users/xxx/workspace/github/iguochan/redis-operator/bin
test -s /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen && /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen --version | grep -q v0.11.3 || \
GOBIN=/Users/xxx/workspace/github/iguochan/redis-operator/bin go install sigs.k8s.io/controller-tools/cmd/[email protected]
/Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests
As you can see, this added source files under api and internal/controller, added crd to the config directory, and added redis-related entries to the rbac configuration.
bash
$ tree
.
├── Dockerfile
├── LICENSE
├── Makefile
├── PROJECT
├── README.md
├── api
│ └── v1
│ ├── groupversion_info.go
│ ├── redis_types.go
│ └── zz_generated.deepcopy.go
├── bin
│ └── controller-gen
├── cmd
│ └── main.go
├── config
│ ├── crd
│ │ ├── kustomization.yaml
│ │ ├── kustomizeconfig.yaml
│ │ └── patches
│ │ ├── cainjection_in_redis.yaml
│ │ └── webhook_in_redis.yaml
│ ├── default
│ │ ├── kustomization.yaml
│ │ ├── manager_auth_proxy_patch.yaml
│ │ └── manager_config_patch.yaml
│ ├── manager
│ │ ├── kustomization.yaml
│ │ └── manager.yaml
│ ├── prometheus
│ │ ├── kustomization.yaml
│ │ └── monitor.yaml
│ ├── rbac
│ │ ├── auth_proxy_client_clusterrole.yaml
│ │ ├── auth_proxy_role.yaml
│ │ ├── auth_proxy_role_binding.yaml
│ │ ├── auth_proxy_service.yaml
│ │ ├── kustomization.yaml
│ │ ├── leader_election_role.yaml
│ │ ├── leader_election_role_binding.yaml
│ │ ├── redis_editor_role.yaml
│ │ ├── redis_viewer_role.yaml
│ │ ├── role_binding.yaml
│ │ └── service_account.yaml
│ └── samples
│ ├── cache_v1_redis.yaml
│ └── kustomization.yaml
├── go.mod
├── go.sum
├── hack
│ └── boilerplate.go.txt
└── internal
└── controller
├── redis_controller.go
└── suite_test.go
15 directories, 39 files
Here, redis_types.go under the api directory is where we define the CRD's fields, and redis_controller.go under internal/controller is where we implement the Reconcile control logic.
2.3 Implementing the Controller
2.3.1 Defining the CRD
Define the Redis-related configuration in api/v1/redis_types.go:
go
/*
Copyright 2025.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package v1
import (
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
type RedisPhase string

const (
    RedisPhasePending RedisPhase = "Pending"
    RedisPhaseError   RedisPhase = "Error"
    RedisPhaseReady   RedisPhase = "Ready"
)
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
// RedisSpec defines the desired state of Redis
type RedisSpec struct {
// Image: Redis Image
// +kubebuilder:default="redis:7.0"
Image string `json:"image,omitempty"`
// NodePort Service
// +kubebuilder:validation:Minimum=30000
// +kubebuilder:validation:Maximum=32767
// +kubebuilder:default=31000
NodePort int32 `json:"nodePort,omitempty"`
// Storage configuration
Storage StorageSpec `json:"storage,omitempty"`
}
// StorageSpec defines the Redis storage configuration
type StorageSpec struct {
// Storage size
// +kubebuilder:default="1Gi"
Size resource.Quantity `json:"size,omitempty"`
// Host directory path
// +kubebuilder:default="/data"
HostPath string `json:"hostPath,omitempty"`
}
// RedisStatus defines the observed state of Redis
type RedisStatus struct {
// Deployment phase
Phase RedisPhase `json:"phase,omitempty"`
// Redis service endpoint
Endpoint string `json:"endpoint,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:JSONPath=".status.phase",name=phase,type=string,description="current phase"
//+kubebuilder:printcolumn:name="Endpoint",type="string",JSONPath=".status.endpoint",description="access endpoint"
//+kubebuilder:printcolumn:name="Image",type="string",JSONPath=".spec.image",description="image in use"
//+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp",description="creation time"
// Redis is the Schema for the redis API
type Redis struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec RedisSpec `json:"spec,omitempty"`
Status RedisStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// RedisList contains a list of Redis
type RedisList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []Redis `json:"items"`
}
func init() {
SchemeBuilder.Register(&Redis{}, &RedisList{})
}
First we define the Redis image. The // +kubebuilder:default="redis:7.0" marker means that if the user does not explicitly specify the spec.image field when creating a Redis, the system automatically sets it to redis:7.0.
go
// Image: Redis Image
// +kubebuilder:default="redis:7.0"
Image string `json:"image,omitempty"`
Next is the NodePort setting: for easy external access we use a Service of type NodePort. Since our kind cluster config already maps node port 31000 (to host port 6379), the default value suffices and we don't need to set this field in our cluster.
go
// NodePort Service
// +kubebuilder:validation:Minimum=30000
// +kubebuilder:validation:Maximum=32767
// +kubebuilder:default=31000
NodePort int32 `json:"nodePort,omitempty"`
The storage configuration holds Redis's data files; we won't go into detail on it here.
The four +kubebuilder:printcolumn markers above the Redis type define the columns displayed for this CRD when you run kubectl get redis. Without them, only the NAME and AGE columns are shown; once you add custom columns, AGE must be declared explicitly if you still want it.
Next, run make manifests, and you will find that:
- a new cache.iguochan.io_redis.yaml appears under config/crd/bases; it defines the Redis CRD, chiefly its structure, validation rules, and behavior;
- a new role.yaml appears under config/rbac; it defines the k8s API permissions the Operator needs, controlling which resources the Operator may access and what operations it may perform.
Then run make generate once, and you will find that:
- the DeepCopy methods for the various types in api/v1's zz_generated.deepcopy.go have been updated; this command auto-generates the deep-copy code. A generated method looks roughly like the sketch below.
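This is illustrative only; the exact code kubebuilder generates in zz_generated.deepcopy.go may differ in detail:
go
// DeepCopyInto deep-copies the receiver into out; out must be non-nil.
func (in *Redis) DeepCopyInto(out *Redis) {
    *out = *in
    out.TypeMeta = in.TypeMeta
    in.ObjectMeta.DeepCopyInto(&out.ObjectMeta)
    in.Spec.DeepCopyInto(&out.Spec) // Spec holds a resource.Quantity and needs a real deep copy
    out.Status = in.Status          // Status contains only plain value fields
}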
2.3.2 Implementing the controller
Next we implement the controller logic by editing internal/controller/redis_controller.go. First, let's pin down the design. For this Redis setup we:
- choose a StatefulSet to manage the Pod;
- expose the port with a NodePort Service;
- manage storage with PV/PVC;
- maintain the Redis status.
So our Controller has to handle exactly these things:
go
/*
Copyright 2025.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package controller
import (
"context"
"fmt"
cachev1 "github.com/IguoChan/redis-operator/api/v1"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/intstr"
"k8s.io/client-go/tools/record"
"k8s.io/utils/pointer"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
)
// RedisReconciler reconciles a Redis object
type RedisReconciler struct {
client.Client
Scheme *runtime.Scheme
Recorder record.EventRecorder
}
const (
RecordReasonFailed = "Failed"
RecordReasonWaiting = "Waiting"
)
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch;create;update;patch;delete
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Redis object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/[email protected]/pkg/reconcile
func (r *RedisReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// Fetch the Redis instance
redis := &cachev1.Redis{}
if err := r.Get(ctx, req.NamespacedName, redis); err != nil {
return ctrl.Result{}, client.IgnoreNotFound(err)
}
// Ensure the StatefulSet exists
if err := r.reconcileStatefulSet(ctx, redis); err != nil {
logger.Error(err, "reconcileStatefulSet failed")
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcileStatefulSet failed: %s", err.Error())
return ctrl.Result{}, r.updateStatus(ctx, redis)
}
// Ensure the Service exists
if err := r.reconcileService(ctx, redis); err != nil {
logger.Error(err, "reconcileService failed")
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcileService failed: %s", err.Error())
return ctrl.Result{}, r.updateStatus(ctx, redis)
}
// Ensure the PVC exists
if err := r.reconcilePVC(ctx, redis); err != nil {
logger.Error(err, "reconcilePVC failed")
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcilePVC failed: %s", err.Error())
return ctrl.Result{}, r.updateStatus(ctx, redis)
}
// Update the status
return ctrl.Result{}, r.updateStatus(ctx, redis)
}
func (r *RedisReconciler) reconcileStatefulSet(ctx context.Context, redis *cachev1.Redis) error {
logger := log.FromContext(ctx)
desired := &appsv1.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("%s-redis", redis.Name),
Namespace: redis.Namespace,
},
Spec: appsv1.StatefulSetSpec{
ServiceName: fmt.Sprintf("%s-svc", redis.Name),
Replicas: pointer.Int32(1), // standalone mode
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"app": "redis",
"name": redis.Name,
},
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"app": "redis",
"name": redis.Name,
},
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "redis",
Image: redis.Spec.Image,
ImagePullPolicy: corev1.PullIfNotPresent,
Ports: []corev1.ContainerPort{
{
Name: "redis",
ContainerPort: 6379,
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "redis-data",
MountPath: "/data",
},
},
},
},
},
},
VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
{
ObjectMeta: metav1.ObjectMeta{
Name: "redis-data",
},
Spec: corev1.PersistentVolumeClaimSpec{
AccessModes: []corev1.PersistentVolumeAccessMode{
corev1.ReadWriteOnce,
},
Resources: corev1.ResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceStorage: redis.Spec.Storage.Size,
},
},
StorageClassName: pointer.String("redis-storage"),
},
},
},
},
}
// Set the OwnerReference
if err := ctrl.SetControllerReference(redis, desired, r.Scheme); err != nil {
return err
}
existing := &appsv1.StatefulSet{}
err := r.Get(ctx, types.NamespacedName{
Name: desired.Name,
Namespace: desired.Namespace,
}, existing)
if errors.IsNotFound(err) {
logger.Info("Creating StatefulSet", "name", desired.Name)
return r.Create(ctx, desired)
} else if err != nil {
return err
}
// Update the StatefulSet (simplified handling)
logger.Info("Updating StatefulSet", "name", desired.Name)
desired.Spec.DeepCopyInto(&existing.Spec)
return r.Update(ctx, existing)
}
func (r *RedisReconciler) reconcileService(ctx context.Context, redis *cachev1.Redis) error {
logger := log.FromContext(ctx)
desired := &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("%s-svc", redis.Name),
Namespace: redis.Namespace,
},
Spec: corev1.ServiceSpec{
Type: corev1.ServiceTypeNodePort,
Selector: map[string]string{
"app": "redis",
"name": redis.Name,
},
Ports: []corev1.ServicePort{
{
Name: "redis",
Port: 6379,
TargetPort: intstr.FromInt(6379),
NodePort: redis.Spec.NodePort,
},
},
},
}
// Set the OwnerReference
if err := ctrl.SetControllerReference(redis, desired, r.Scheme); err != nil {
return err
}
existing := &corev1.Service{}
err := r.Get(ctx, types.NamespacedName{
Name: desired.Name,
Namespace: desired.Namespace,
}, existing)
if errors.IsNotFound(err) {
logger.Info("Creating Service", "name", desired.Name)
return r.Create(ctx, desired)
} else if err != nil {
return err
}
// Update the Service (ports only)
if existing.Spec.Ports[0].NodePort != desired.Spec.Ports[0].NodePort {
logger.Info("Updating Service port", "name", desired.Name)
existing.Spec.Ports = desired.Spec.Ports
return r.Update(ctx, existing)
}
return nil
}
func (r *RedisReconciler) reconcilePVC(ctx context.Context, redis *cachev1.Redis) error {
logger := log.FromContext(ctx)
pvName := fmt.Sprintf("%s-pv", redis.Name)
// Create the PV (backed by HostPath)
pv := &corev1.PersistentVolume{
ObjectMeta: metav1.ObjectMeta{
Name: pvName,
},
Spec: corev1.PersistentVolumeSpec{
Capacity: corev1.ResourceList{
corev1.ResourceStorage: redis.Spec.Storage.Size,
},
AccessModes: []corev1.PersistentVolumeAccessMode{
corev1.ReadWriteOnce,
},
PersistentVolumeReclaimPolicy: corev1.PersistentVolumeReclaimRecycle,
StorageClassName: "redis-storage",
PersistentVolumeSource: corev1.PersistentVolumeSource{
HostPath: &corev1.HostPathVolumeSource{
Path: redis.Spec.Storage.HostPath,
},
},
},
}
if err := r.Create(ctx, pv); err != nil && !errors.IsAlreadyExists(err) {
logger.Error(err, "create pv failed")
return err
}
// The PVC is created automatically by the StatefulSet and binds to this PV
return nil
}
func (r *RedisReconciler) updateStatus(ctx context.Context, redis *cachev1.Redis) error {
// 1. List the associated Pods
podList := &corev1.PodList{}
if err := r.List(ctx, podList, client.MatchingLabels{
"name": redis.Name,
}); err != nil {
return err
}
// 2. Check Pod presence
if len(podList.Items) == 0 {
r.Recorder.Event(redis, corev1.EventTypeNormal, RecordReasonWaiting, "no pod is ready")
redis.Status.Phase = cachev1.RedisPhasePending
return r.Status().Update(ctx, redis)
}
// 3. Make sure every Pod is Ready
allPodsReady := true
for _, pod := range podList.Items {
isPodReady := false
for _, cond := range pod.Status.Conditions {
if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
isPodReady = true
break
}
}
if !isPodReady {
allPodsReady = false
break
}
}
// 4. If not all Pods are ready, return early
if !allPodsReady {
r.Recorder.Event(redis, corev1.EventTypeNormal, RecordReasonWaiting, "not every pod is ready")
redis.Status.Phase = cachev1.RedisPhasePending
return r.Status().Update(ctx, redis)
}
// 5. Fetch the Service
svc := &corev1.Service{}
if err := r.Get(ctx, types.NamespacedName{
Name: fmt.Sprintf("%s-svc", redis.Name),
Namespace: redis.Namespace,
}, svc); err != nil {
// Handle a missing Service
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "get service failed: %s", err.Error())
redis.Status.Phase = cachev1.RedisPhaseError
return r.Status().Update(ctx, redis)
}
// 6. Verify Endpoint availability
endpoints := &corev1.Endpoints{}
if err := r.Get(ctx, types.NamespacedName{
Name: svc.Name,
Namespace: svc.Namespace,
}, endpoints); err != nil {
// Handle missing Endpoints
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "get endpoints failed: %s", err.Error())
redis.Status.Phase = cachev1.RedisPhaseError
return r.Status().Update(ctx, redis)
}
// 7. Make sure the Endpoints have usable targets
if len(endpoints.Subsets) == 0 || len(endpoints.Subsets[0].Addresses) == 0 {
r.Recorder.Event(redis, corev1.EventTypeWarning, RecordReasonFailed, "endpoints have no address")
redis.Status.Phase = cachev1.RedisPhaseError
return r.Status().Update(ctx, redis)
}
// 8. Take the backend address (a pod IP) from the Endpoints
nodeIP := endpoints.Subsets[0].Addresses[0].IP
endpoint := fmt.Sprintf("%s:%d", nodeIP, svc.Spec.Ports[0].TargetPort.IntVal)
// 9. Success: update the status
redis.Status.Endpoint = endpoint
redis.Status.Phase = cachev1.RedisPhaseReady
return r.Status().Update(ctx, redis)
}
// SetupWithManager sets up the controller with the Manager.
func (r *RedisReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&cachev1.Redis{}).
Owns(&appsv1.StatefulSet{}).
Owns(&corev1.Service{}).
Complete(r)
}
Now let's break the code into a few pieces and look at what each one does.
Basic logic
First, the Reconcile function:
go
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch;create;update;patch;delete
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Redis object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/[email protected]/pkg/reconcile
func (r *RedisReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// Fetch the Redis instance
redis := &cachev1.Redis{}
if err := r.Get(ctx, req.NamespacedName, redis); err != nil {
return ctrl.Result{}, client.IgnoreNotFound(err)
}
// Ensure the StatefulSet exists
if err := r.reconcileStatefulSet(ctx, redis); err != nil {
logger.Error(err, "reconcileStatefulSet failed")
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcileStatefulSet failed: %s", err.Error())
return ctrl.Result{}, r.updateStatus(ctx, redis)
}
// Ensure the Service exists
if err := r.reconcileService(ctx, redis); err != nil {
logger.Error(err, "reconcileService failed")
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcileService failed: %s", err.Error())
return ctrl.Result{}, r.updateStatus(ctx, redis)
}
// Ensure the PVC exists
if err := r.reconcilePVC(ctx, redis); err != nil {
logger.Error(err, "reconcilePVC failed")
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "reconcilePVC failed: %s", err.Error())
return ctrl.Result{}, r.updateStatus(ctx, redis)
}
// Update the status
return ctrl.Result{}, r.updateStatus(ctx, redis)
}
Reconcile is the reconciliation engine for the CRD's state: it works by repeatedly comparing the actual state (CRD Status) against the desired state (CRD Spec) and acting to bring them into line. The +kubebuilder:rbac markers at the top of this snippet grant the Operator permission to operate on its child resources; the rest of the code can be read as control logic for each of those child resources. Besides Reconcile, SetupWithManager is also an important piece:
- For: declares the primary resource type to watch;
- Owns: declares ownership of child resources, establishing the parent-child relationship;
- WithOptions: tuning knobs such as concurrency limits and timeouts (a sketch follows the code below);
go
// SetupWithManager sets up the controller with the Manager.
func (r *RedisReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&cachev1.Redis{}).
Owns(&appsv1.StatefulSet{}).
Owns(&corev1.Service{}).
Complete(r)
}
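As an illustration of WithOptions, here is a minimal sketch; it assumes the controller package import from sigs.k8s.io/controller-runtime/pkg/controller and simply raises the worker count:
go
import "sigs.k8s.io/controller-runtime/pkg/controller"

// SetupWithManager, now with a tuning option applied.
func (r *RedisReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&cachev1.Redis{}).
        Owns(&appsv1.StatefulSet{}).
        Owns(&corev1.Service{}).
        // Run up to 2 Reconcile workers concurrently (default is 1).
        WithOptions(controller.Options{MaxConcurrentReconciles: 2}).
        Complete(r)
}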
StatefulSet
go
func (r *RedisReconciler) reconcileStatefulSet(ctx context.Context, redis *cachev1.Redis) error {
logger := log.FromContext(ctx)
desired := &appsv1.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("%s-redis", redis.Name),
Namespace: redis.Namespace,
},
Spec: appsv1.StatefulSetSpec{
ServiceName: fmt.Sprintf("%s-svc", redis.Name),
Replicas: pointer.Int32(1), // standalone mode
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"app": "redis",
"name": redis.Name,
},
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"app": "redis",
"name": redis.Name,
},
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "redis",
Image: redis.Spec.Image,
ImagePullPolicy: corev1.PullIfNotPresent,
Ports: []corev1.ContainerPort{
{
Name: "redis",
ContainerPort: 6379,
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "redis-data",
MountPath: "/data",
},
},
},
},
},
},
VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
{
ObjectMeta: metav1.ObjectMeta{
Name: "redis-data",
},
Spec: corev1.PersistentVolumeClaimSpec{
AccessModes: []corev1.PersistentVolumeAccessMode{
corev1.ReadWriteOnce,
},
Resources: corev1.ResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceStorage: redis.Spec.Storage.Size,
},
},
StorageClassName: pointer.String("redis-storage"),
},
},
},
},
}
// Set the OwnerReference
if err := ctrl.SetControllerReference(redis, desired, r.Scheme); err != nil {
return err
}
existing := &appsv1.StatefulSet{}
err := r.Get(ctx, types.NamespacedName{
Name: desired.Name,
Namespace: desired.Namespace,
}, existing)
if errors.IsNotFound(err) {
logger.Info("Creating StatefulSet", "name", desired.Name)
return r.Create(ctx, desired)
} else if err != nil {
return err
}
// Update the StatefulSet (simplified handling)
logger.Info("Updating StatefulSet", "name", desired.Name)
desired.Spec.DeepCopyInto(&existing.Spec)
return r.Update(ctx, existing)
}
The basic logic is:
- build a desired object describing the expected state;
- set the OwnerReference;
- check whether an existing object is present: if not, create desired; if so, copy desired.Spec over existing.Spec and update the object, letting k8s's built-in controller reconcile it and produce the Pod.
Note ImagePullPolicy: corev1.PullIfNotPresent: since our cluster is built with kind, docker images have to be loaded into the cluster first, hence PullIfNotPresent.
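Incidentally, controller-runtime ships controllerutil.CreateOrUpdate for exactly this get/create/update pattern. A minimal sketch (the ensureStatefulSet helper is my name, not part of the scaffold; it assumes the import sigs.k8s.io/controller-runtime/pkg/controller/controllerutil):
go
// ensureStatefulSet creates or updates the StatefulSet in a single helper call.
func (r *RedisReconciler) ensureStatefulSet(ctx context.Context, redis *cachev1.Redis, desiredSpec appsv1.StatefulSetSpec) error {
    sts := &appsv1.StatefulSet{ObjectMeta: metav1.ObjectMeta{
        Name:      fmt.Sprintf("%s-redis", redis.Name),
        Namespace: redis.Namespace,
    }}
    // CreateOrUpdate fetches sts, runs the mutate function, then creates the
    // object if it was absent or issues an Update if anything changed.
    _, err := controllerutil.CreateOrUpdate(ctx, r.Client, sts, func() error {
        sts.Spec = desiredSpec
        return ctrl.SetControllerReference(redis, sts, r.Scheme)
    })
    return err
}
Note that overwriting the whole Spec, here as in the simplified handling above, can trip over StatefulSet fields that are immutable after creation.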
Service
go
func (r *RedisReconciler) reconcileService(ctx context.Context, redis *cachev1.Redis) error {
logger := log.FromContext(ctx)
desired := &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("%s-svc", redis.Name),
Namespace: redis.Namespace,
},
Spec: corev1.ServiceSpec{
Type: corev1.ServiceTypeNodePort,
Selector: map[string]string{
"app": "redis",
"name": redis.Name,
},
Ports: []corev1.ServicePort{
{
Name: "redis",
Port: 6379,
TargetPort: intstr.FromInt(6379),
NodePort: redis.Spec.NodePort,
},
},
},
}
// Set the OwnerReference
if err := ctrl.SetControllerReference(redis, desired, r.Scheme); err != nil {
return err
}
existing := &corev1.Service{}
err := r.Get(ctx, types.NamespacedName{
Name: desired.Name,
Namespace: desired.Namespace,
}, existing)
if errors.IsNotFound(err) {
logger.Info("Creating Service", "name", desired.Name)
return r.Create(ctx, desired)
} else if err != nil {
return err
}
// Update the Service (ports only)
if existing.Spec.Ports[0].NodePort != desired.Spec.Ports[0].NodePort {
logger.Info("Updating Service port", "name", desired.Name)
existing.Spec.Ports = desired.Spec.Ports
return r.Update(ctx, existing)
}
return nil
}
The basic logic mirrors the StatefulSet: compare the desired value against the existing one, then create or update accordingly. Note that the Selector must match the Pod labels.
PV/PVC
go
func (r *RedisReconciler) reconcilePVC(ctx context.Context, redis *cachev1.Redis) error {
logger := log.FromContext(ctx)
pvName := fmt.Sprintf("%s-pv", redis.Name)
// Create the PV (backed by HostPath)
pv := &corev1.PersistentVolume{
ObjectMeta: metav1.ObjectMeta{
Name: pvName,
},
Spec: corev1.PersistentVolumeSpec{
Capacity: corev1.ResourceList{
corev1.ResourceStorage: redis.Spec.Storage.Size,
},
AccessModes: []corev1.PersistentVolumeAccessMode{
corev1.ReadWriteOnce,
},
PersistentVolumeReclaimPolicy: corev1.PersistentVolumeReclaimRecycle,
StorageClassName: "redis-storage",
PersistentVolumeSource: corev1.PersistentVolumeSource{
HostPath: &corev1.HostPathVolumeSource{
Path: redis.Spec.Storage.HostPath,
},
},
},
}
if err := r.Create(ctx, pv); err != nil && !errors.IsAlreadyExists(err) {
logger.Error(err, "create pv failed")
return err
}
// The PVC is created automatically by the StatefulSet and binds to this PV
return nil
}
Although this is called reconcilePVC, it actually creates the PV; the PVC itself is already created by the StatefulSet, so nothing else is set up here. Note: if the related PV/PVC resources should also be removed when the Redis is deleted, extra logic is required, for instance a finalizer, sketched below.
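A minimal finalizer sketch; the finalizer name and the handleDeletion helper are hypothetical, not part of this project, and the controllerutil import from controller-runtime is assumed:
go
const redisFinalizer = "cache.iguochan.io/finalizer" // hypothetical finalizer name

// handleDeletion returns done=true once the deletion path has been handled.
func (r *RedisReconciler) handleDeletion(ctx context.Context, redis *cachev1.Redis) (done bool, err error) {
    if redis.DeletionTimestamp.IsZero() {
        // Live object: make sure our finalizer is attached so deletions reach us.
        if !controllerutil.ContainsFinalizer(redis, redisFinalizer) {
            controllerutil.AddFinalizer(redis, redisFinalizer)
            return false, r.Update(ctx, redis)
        }
        return false, nil
    }
    // Object is being deleted: remove the PV we created, then release the finalizer.
    pv := &corev1.PersistentVolume{ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("%s-pv", redis.Name)}}
    if delErr := r.Delete(ctx, pv); delErr != nil && !errors.IsNotFound(delErr) {
        return true, delErr
    }
    controllerutil.RemoveFinalizer(redis, redisFinalizer)
    return true, r.Update(ctx, redis)
}
Reconcile would call this first and return early whenever done is true.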
Updating the status
go
func (r *RedisReconciler) updateStatus(ctx context.Context, redis *cachev1.Redis) error {
// 1. List the associated Pods
podList := &corev1.PodList{}
if err := r.List(ctx, podList, client.MatchingLabels{
"name": redis.Name,
}); err != nil {
return err
}
// 2. Check Pod presence
if len(podList.Items) == 0 {
r.Recorder.Event(redis, corev1.EventTypeNormal, RecordReasonWaiting, "no pod is ready")
redis.Status.Phase = cachev1.RedisPhasePending
return r.Status().Update(ctx, redis)
}
// 3. Make sure every Pod is Ready
allPodsReady := true
for _, pod := range podList.Items {
isPodReady := false
for _, cond := range pod.Status.Conditions {
if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
isPodReady = true
break
}
}
if !isPodReady {
allPodsReady = false
break
}
}
// 4. If not all Pods are ready, return early
if !allPodsReady {
r.Recorder.Event(redis, corev1.EventTypeNormal, RecordReasonWaiting, "not every pod is ready")
redis.Status.Phase = cachev1.RedisPhasePending
return r.Status().Update(ctx, redis)
}
// 5. Fetch the Service
svc := &corev1.Service{}
if err := r.Get(ctx, types.NamespacedName{
Name: fmt.Sprintf("%s-svc", redis.Name),
Namespace: redis.Namespace,
}, svc); err != nil {
// Handle a missing Service
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "get service failed: %s", err.Error())
redis.Status.Phase = cachev1.RedisPhaseError
return r.Status().Update(ctx, redis)
}
// 6. Verify Endpoint availability
endpoints := &corev1.Endpoints{}
if err := r.Get(ctx, types.NamespacedName{
Name: svc.Name,
Namespace: svc.Namespace,
}, endpoints); err != nil {
// Handle missing Endpoints
r.Recorder.Eventf(redis, corev1.EventTypeWarning, RecordReasonFailed, "get endpoints failed: %s", err.Error())
redis.Status.Phase = cachev1.RedisPhaseError
return r.Status().Update(ctx, redis)
}
// 7. Make sure the Endpoints have usable targets
if len(endpoints.Subsets) == 0 || len(endpoints.Subsets[0].Addresses) == 0 {
r.Recorder.Event(redis, corev1.EventTypeWarning, RecordReasonFailed, "endpoints have no address")
redis.Status.Phase = cachev1.RedisPhaseError
return r.Status().Update(ctx, redis)
}
// 8. Take the backend address (a pod IP) from the Endpoints
nodeIP := endpoints.Subsets[0].Addresses[0].IP
endpoint := fmt.Sprintf("%s:%d", nodeIP, svc.Spec.Ports[0].TargetPort.IntVal)
// 9. Success: update the status
redis.Status.Endpoint = endpoint
redis.Status.Phase = cachev1.RedisPhaseReady
return r.Status().Update(ctx, redis)
}
Here we validate the child resources to decide whether the Redis resource as a whole is Ready. Note that the endpoint written into the status is the backend address from the Endpoints object (a pod IP), reachable from inside the cluster; external clients come in through the NodePort instead.
Recorder
For k8s's built-in resources, kubectl describe xxx usually shows an Events section when something goes wrong; those events are produced via record.EventRecorder, which lets us record to Events whenever an exception occurs. Note that the Recorder must be wired up in the main function with Recorder: mgr.GetEventRecorderFor("Redis").
go
if err = (&controller.RedisReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Recorder: mgr.GetEventRecorderFor("Redis"),
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "Redis")
os.Exit(1)
}
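Events recorded this way show up under kubectl describe redis redis-sample, and can also be listed directly:
bash
$ kubectl get events --field-selector involvedObject.name=redis-sample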
2.3.3 Verification
Installing the CRD
We already built the cluster with kind; now let's verify the basic functionality. Step one is installing the CRD into the cluster. In the project directory, run make install:
bash
$ make install
test -s /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen && /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen --version | grep -q v0.11.3 || \
GOBIN=/Users/xxx/workspace/github/iguochan/redis-operator/bin go install sigs.k8s.io/controller-tools/cmd/[email protected]
/Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
test -s /Users/xxx/workspace/github/iguochan/redis-operator/bin/kustomize || { curl -Ss "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" --output install_kustomize.sh && bash install_kustomize.sh 5.0.0 /Users/xxx/workspace/github/iguochan/redis-operator/bin; rm install_kustomize.sh; }
/Users/xxx/workspace/github/iguochan/redis-operator/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/redis.cache.iguochan.io created
$ kubectl get crd # 验证CRD
NAME CREATED AT
redis.cache.iguochan.io 2025-06-14T14:19:13Z
The CRD definition now exists, and the resource can be listed as well:
bash
$ kubectl api-resources | grep redis
redis cache.iguochan.io/v1 true Redis
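Since the CRD ships with an OpenAPI schema, kubectl explain works on it too, which is a handy way to double-check the fields and defaults defined in section 2.3.1:
bash
$ kubectl explain redis.spec
$ kubectl explain redis.spec.storage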
Running the Controller locally
bash
$ make run
test -s /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen && /Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen --version | grep -q v0.11.3 || \
GOBIN=/Users/xxx/workspace/github/iguochan/redis-operator/bin go install sigs.k8s.io/controller-tools/cmd/[email protected]
/Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/Users/xxx/workspace/github/iguochan/redis-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./cmd/main.go
2025-06-14T22:22:15+08:00 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
2025-06-14T22:22:15+08:00 INFO setup starting manager
2025-06-14T22:22:15+08:00 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2025-06-14T22:22:15+08:00 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2025-06-14T22:22:15+08:00 INFO Starting EventSource {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis", "source": "kind source: *v1.Redis"}
2025-06-14T22:22:15+08:00 INFO Starting EventSource {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis", "source": "kind source: *v1.StatefulSet"}
2025-06-14T22:22:15+08:00 INFO Starting EventSource {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis", "source": "kind source: *v1.Service"}
2025-06-14T22:22:15+08:00 INFO Starting Controller {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis"}
2025-06-14T22:22:15+08:00 INFO Starting workers {"controller": "redis", "controllerGroup": "cache.iguochan.io", "controllerKind": "Redis", "worker count": 1}
As you can see, make run starts a Controller locally that watches the CRD. In fact, for debugging you can run the controller under a local debugger directly against the cluster; I have tried it myself and it works without a hitch.
Creating a Redis instance
Before creating the instance, first create a StorageClass in the cluster with the following config, applied via kubectl apply -f redis-storage.yaml:
yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: redis-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
Then modify config/samples/cache_v1_redis.yaml as follows:
yaml
apiVersion: cache.iguochan.io/v1
kind: Redis
metadata:
labels:
app.kubernetes.io/name: redis
app.kubernetes.io/instance: redis-sample
app.kubernetes.io/part-of: redis-operator
app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/created-by: redis-operator
name: redis-sample
spec: # everything below is newly added
image: redis:7.0
nodePort: 31000
storage:
size: 1Gi
hostPath: /data
Here we set a few Redis parameters; run kubectl apply -f config/samples/cache_v1_redis.yaml and a Redis custom resource is created.
bash
$ kubectl get redis
NAME PHASE ENDPOINT IMAGE AGE
redis-sample Pending redis:7.0 4m44s
As you can see, the Redis CR is stuck in Pending. Why is that? We can debug it like this:
bash
$ kubectl describe redis redis-sample
Name: redis-sample
Namespace: default
...
Spec:
Image: redis:7.0
Node Port: 31000
Storage:
Host Path: /data
Size: 1Gi
Status:
Phase: Pending
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Waiting 5m22s (x2 over 5m25s) Redis not every pod is ready
So at least one Pod is not Ready; let's check the Pods:
bash
$ kubectl get pod | grep redis
redis-sample-redis-0 0/1 ImagePullBackOff 0 7m15s
The Pod had trouble pulling its image; the node presumably lacks the redis:7.0 image, and kubectl describe pod redis-sample-redis-0 shows the details. To fix it, load the image onto the kind nodes:
bash
$ kind load docker-image redis:7.0 --name multi
Image: "redis:7.0" with ID "sha256:2c96e8092504e99e7601bcb6f4954b4cfed77d5125c88061a49c252fb33cf6f7" not yet present on node "multi-worker", loading...
Image: "redis:7.0" with ID "sha256:2c96e8092504e99e7601bcb6f4954b4cfed77d5125c88061a49c252fb33cf6f7" not yet present on node "multi-worker2", loading...
Image: "redis:7.0" with ID "sha256:2c96e8092504e99e7601bcb6f4954b4cfed77d5125c88061a49c252fb33cf6f7" not yet present on node "multi-worker3", loading...
Image: "redis:7.0" with ID "sha256:2c96e8092504e99e7601bcb6f4954b4cfed77d5125c88061a49c252fb33cf6f7" not yet present on node "multi-control-plane", loading...
Now all the resources check out:
bash
$ kubectl get redis
NAME PHASE ENDPOINT IMAGE AGE
redis-sample Ready 10.244.1.2:6379 redis:7.0 11m
$ kubectl get sts | grep redis
redis-sample-redis 1/1 11m
$ kubectl get svc | grep redis
redis-sample-svc NodePort 10.96.251.201 <none> 6379:31000/TCP 11m
$ kubectl get pod | grep redis
redis-sample-redis-0 1/1 Running 0 11m
$ kubectl get pv | grep redis
redis-sample-pv 1Gi RWO Recycle Bound default/redis-data-redis-sample-redis-0 redis-storage 12m
$ kubectl get pvc | grep redis
redis-data-redis-sample-redis-0 Bound redis-sample-pv 1Gi RWO redis-storage 12m
Verification
Connect from the local machine to verify:
bash
$ redis-cli -h 127.0.0.1 -p 6379
127.0.0.1:6379> get key1
(nil)
127.0.0.1:6379> set key1 hello
OK
127.0.0.1:6379> get key1
"hello"
To verify that the mount works, first delete the Redis with kubectl delete redis redis-sample; at that point Redis is unreachable. Then create the Redis again and read the key back, and the data is still there:
bash
127.0.0.1:6379> get key1
"hello"
A dump.rdb file appears in the local mount directory, confirming that the mount works.
2.4 Admission Control
Admission control intercepts requests to the API server after they pass authentication and authorization, but before the object is persisted. It is implemented with WebHooks, and comes in two controller types:
- Mutating admission controllers: modify the request object, e.g. auto-filling defaults, injecting security settings, encrypting sensitive data;
- Validating admission controllers: validate it, e.g. against security policies, business rules, or resource quotas.
2.4.1 Creating the webhook
bash
$ kubebuilder create webhook --group cache --version v1 --kind Redis --defaulting --programmatic-validation
The command above scaffolds the webhook; you will find new webhook- and certmanager-related files in the project.
2.4.2 cert-manager
Because the API Server must call the webhook securely over HTTPS, the cert-manager-related configuration needs to be enabled: open config/default/kustomization.yaml and uncomment the content under the Uncomment markers, along with the webhook-related entries:
yaml
# Adds namespace to all resources.
namespace: redis-operator-system
# Value of this field is prepended to the
# names of all resources, e.g. a deployment named
# "wordpress" becomes "alices-wordpress".
# Note that it should also match with the prefix (text before '-') of the namespace
# field above.
namePrefix: redis-operator-
# Labels to add to all resources and selectors.
#labels:
#- includeSelectors: true
# pairs:
# someName: someValue
resources:
- ../crd
- ../rbac
- ../manager
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
- ../webhook
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus
patchesStrategicMerge:
# Protect the /metrics endpoint by putting it behind auth.
# If you want your controller-manager to expose the /metrics
# endpoint w/o any authn/z, please comment the following line.
- manager_auth_proxy_patch.yaml
# [WEBHOOK] To enable webhook, uncomment all the sections with [WEBHOOK] prefix including the one in
# crd/kustomization.yaml
- manager_webhook_patch.yaml
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'.
# Uncomment 'CERTMANAGER' sections in crd/kustomization.yaml to enable the CA injection in the admission webhooks.
# 'CERTMANAGER' needs to be enabled to use ca injection
- webhookcainjection_patch.yaml
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
# Uncomment the following replacements to add the cert-manager CA injection annotations
replacements:
- source: # Add cert-manager annotation to ValidatingWebhookConfiguration, MutatingWebhookConfiguration and CRDs
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # this name should match the one in certificate.yaml
fieldPath: .metadata.namespace # namespace of the certificate CR
targets:
- select:
kind: ValidatingWebhookConfiguration
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 0
create: true
- select:
kind: MutatingWebhookConfiguration
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 0
create: true
- select:
kind: CustomResourceDefinition
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 0
create: true
- source:
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # this name should match the one in certificate.yaml
fieldPath: .metadata.name
targets:
- select:
kind: ValidatingWebhookConfiguration
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 1
create: true
- select:
kind: MutatingWebhookConfiguration
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 1
create: true
- select:
kind: CustomResourceDefinition
fieldPaths:
- .metadata.annotations.[cert-manager.io/inject-ca-from]
options:
delimiter: '/'
index: 1
create: true
- source: # Add cert-manager annotation to the webhook Service
kind: Service
version: v1
name: webhook-service
fieldPath: .metadata.name # namespace of the service
targets:
- select:
kind: Certificate
group: cert-manager.io
version: v1
fieldPaths:
- .spec.dnsNames.0
- .spec.dnsNames.1
options:
delimiter: '.'
index: 0
create: true
- source:
kind: Service
version: v1
name: webhook-service
fieldPath: .metadata.namespace # namespace of the service
targets:
- select:
kind: Certificate
group: cert-manager.io
version: v1
fieldPaths:
- .spec.dnsNames.0
- .spec.dnsNames.1
options:
delimiter: '.'
index: 1
create: true
Likewise, since running this requires certificate validation, kubebuilder officially recommends cert-manager to simplify certificate management; so let's first deploy the cert-manager service and verify it:
bash
$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.8.0/cert-manager.yaml
$ kubectl get pods -ncert-manager
NAME READY STATUS RESTARTS AGE
cert-manager-5d6bc46969-n9znn 1/1 Running 0 37s
cert-manager-cainjector-7d8b8bb6b8-52vq7 1/1 Running 0 37s
cert-manager-webhook-5c5c5bb457-nd5g5 1/1 Running 0 37s
2.4.3 Implementing the Mutating interface
In api/v1/redis_webhook.go, modify the Default function as follows:
go
// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *Redis) Default() {
redislog.Info("default", "name", r.Name)
// Default image
if r.Spec.Image == "" {
r.Spec.Image = "redis:7.0"
redislog.Info("Setting default image", "image", r.Spec.Image)
}
// Default NodePort
if r.Spec.NodePort == 0 {
r.Spec.NodePort = 31000
redislog.Info("Setting default NodePort", "nodePort", r.Spec.NodePort)
}
// Default storage path
if r.Spec.Storage.HostPath == "" {
r.Spec.Storage.HostPath = "/data"
redislog.Info("Setting default host path", "hostPath", r.Spec.Storage.HostPath)
}
// Default storage size
if r.Spec.Storage.Size.IsZero() {
size := resource.MustParse("1Gi")
r.Spec.Storage.Size = size
redislog.Info("Setting default storage size", "size", size.String())
}
}
2.4.4 Implementing the Validating interface
Likewise, in api/v1/redis_webhook.go, modify the Validate-related methods:
go
// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *Redis) ValidateCreate() error {
redislog.Info("validate create", "name", r.Name)
return r.validateRedis(r)
}
// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type
func (r *Redis) ValidateUpdate(old runtime.Object) error {
redislog.Info("validate update", "name", r.Name)
oldRedis, ok := old.(*Redis)
if !ok {
return fmt.Errorf("expected a Redis object but got %T", old)
}
if err := r.validateRedis(r); err != nil {
return err
}
// Reject changes to immutable fields
if oldRedis.Spec.Image != r.Spec.Image {
return field.Forbidden(
field.NewPath("spec", "image"),
"image cannot be changed after creation",
)
}
if oldRedis.Spec.Storage.HostPath != r.Spec.Storage.HostPath {
return field.Forbidden(
field.NewPath("spec", "storage", "hostPath"),
"hostPath cannot be changed after creation",
)
}
return nil
}
// ValidateDelete implements webhook.Validator so a webhook will be registered for the type
func (r *Redis) ValidateDelete() error {
redislog.Info("validate delete", "name", r.Name)
return nil
}
// validateRedis performs the concrete validation logic
func (r *Redis) validateRedis(redis *Redis) error {
allErrs := field.ErrorList{}
// Validate the storage size range
minSize := resource.MustParse(MinStorageSize)
maxSize := resource.MustParse(MaxStorageSize)
if redis.Spec.Storage.Size.Cmp(minSize) < 0 {
allErrs = append(allErrs, field.Invalid(
field.NewPath("spec", "storage", "size"),
redis.Spec.Storage.Size.String(),
fmt.Sprintf("storage size must be at least %s", MinStorageSize),
))
}
if redis.Spec.Storage.Size.Cmp(maxSize) > 0 {
allErrs = append(allErrs, field.Invalid(
field.NewPath("spec", "storage", "size"),
redis.Spec.Storage.Size.String(),
fmt.Sprintf("storage size must be no more than %s", MaxStorageSize),
))
}
// Validate the port range
if redis.Spec.NodePort < 30000 || redis.Spec.NodePort > 32767 {
allErrs = append(allErrs, field.Invalid(
field.NewPath("spec", "nodePort"),
redis.Spec.NodePort,
"nodePort must be between 30000 and 32767",
))
}
// Validate that the host path is safe
if !isValidHostPath(redis.Spec.Storage.HostPath) {
allErrs = append(allErrs, field.Invalid(
field.NewPath("spec", "storage", "hostPath"),
redis.Spec.Storage.HostPath,
"invalid host path, only /data directory is allowed",
))
}
if len(allErrs) == 0 {
return nil
}
return allErrs.ToAggregate()
}
// Helper validating host-path safety
func isValidHostPath(path string) bool {
// Simplified here; in practice this should follow your security policy
allowedPaths := []string{"/data", "/data/", "/mnt/data"}
for _, p := range allowedPaths {
if path == p {
return true
}
}
return false
}
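One thing to note: validateRedis references the constants MinStorageSize and MaxStorageSize, which are not part of the scaffold and were not shown above. Judging by the error message in section 2.4.5 ("no more than 10Gi"), they are defined along these lines; the 1Gi minimum is my assumption:
go
const (
    MinStorageSize = "1Gi"  // assumed lower bound, pick whatever fits your policy
    MaxStorageSize = "10Gi" // implied by the "no more than 10Gi" error seen later
)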
2.4.5 Verification
Before verifying, there are three things to change.
First, in the Makefile, IMG is defined as controller:latest by default; change it to redis-controller:latest.
Second, two edits are needed in config/manager/manager.yaml:
- change image: controller:latest to image: redis-controller:latest
- add imagePullPolicy: IfNotPresent below image
This again is because the cluster is deployed with kind, so the image has to be kind load-ed in first.
Third, because of a version issue: make docker-build installs the tool via go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest, and the installed version does not match. To fix this, modify the Makefile:
makefile
.PHONY: envtest
envtest: $(ENVTEST) ## Download envtest-setup locally if necessary.
$(ENVTEST): $(LOCALBIN)
test -s $(LOCALBIN)/setup-envtest || GOBIN=$(LOCALBIN) go install sigs.k8s.io/controller-runtime/tools/[email protected]
Now make docker-build packages the image, which you can check:
bash
$ docker images | grep redis-controller
redis-controller latest 2e0405615510 About a minute ago 50.1MB
Then upload the image with kind load docker-image redis-controller:latest --name multi.
Finally, run make deploy to deploy the Operator on the cluster; you can see it running:
bash
$ kubectl get pod -nredis-operator-system
NAME READY STATUS RESTARTS AGE
redis-operator-controller-manager-6d4c5546bd-jg44p 2/2 Running 0 19s
Now run kubectl apply -f config/samples/cache_v1_redis.yaml to create a Redis CR. But watching the operator's logs with kubectl logs -f reveals that it has no permission to List Pods: we forgot these when setting up permissions at the very beginning. Add the pods and endpoints markers (the two extra lines below) in redis_controller.go:
go
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.iguochan.io,resources=redis/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch
//+kubebuilder:rbac:groups=core,resources=endpoints,verbs=get;list;watch
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch;create;update;patch;delete
Then run make undeploy to tear everything down; after make install and make deploy once more, the Operator manages Redis properly.
To verify the Default function, after kubectl delete redis redis-sample, change config/samples/cache_v1_redis.yaml to the following:
yaml
apiVersion: cache.iguochan.io/v1
kind: Redis
metadata:
labels:
app.kubernetes.io/name: redis
app.kubernetes.io/instance: redis-sample
app.kubernetes.io/part-of: redis-operator
app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/created-by: redis-operator
name: redis-sample
spec:
# image: redis:7.0
# nodePort: 31000
# storage:
# size: 1Gi
# hostPath: /data
Apply kubectl apply -f config/samples/cache_v1_redis.yaml once more and you will see the redis created successfully, entirely from the default values:
bash
$ kubectl describe redis redis-sample
Name: redis-sample
Namespace: default
...
Spec:
Image: redis:7.0
Node Port: 31000
Storage:
Host Path: /data
Size: 1Gi
Status:
Endpoint: 10.244.2.9:6379
Phase: Ready
Events: <none>
Let's verify a couple more cases. Setting the storage size to 100Gi is rejected with admission webhook "vredis.kb.io" denied the request: spec.storage.size: Invalid value: "100Gi": storage size must be no more than 10Gi.
And changing the image to redis:6.0 is rejected with admission webhook "vredis.kb.io" denied the request: spec.image: Forbidden: image cannot be changed after creation.
And with that, we have essentially completed building an Operator.