1. Understanding the Broker
The Broker is responsible for message storage, delivery, and querying, as well as for guaranteeing high availability of the service.
NameServer nodes are almost stateless, so they can be deployed as a cluster with no data synchronized between nodes. Broker deployment is comparatively complex.
In the Master-Slave architecture, Brokers are divided into Masters and Slaves. One Master can have multiple Slaves, but each Slave belongs to exactly one Master. The pairing is defined by giving both the same BrokerName and different BrokerIds: BrokerId 0 means Master, non-zero means Slave. Multiple Masters can also be deployed.
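For example, in RocketMQ's bundled two-master-two-slave sample configuration, the pairing is declared in each broker's properties file (the values below are illustrative):

```properties
# conf/2m-2s-sync/broker-a.properties  -- master of group broker-a
brokerClusterName=DefaultCluster
brokerName=broker-a
brokerId=0

# conf/2m-2s-sync/broker-a-s.properties  -- slave of group broker-a
brokerClusterName=DefaultCluster
brokerName=broker-a
brokerId=1
```

The identical `brokerName` ties the two processes into one group; the `brokerId` decides who is the Master (0) and who is the Slave (non-zero).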
2. Broker Implementation
I hand-copied the project from the official repository, so my version may differ slightly from upstream; below I'll go through the key points.
2.1 broke_types
go
// BrokeSpec defines the desired state of Broke
type BrokeSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
// Size of broker clusters
Size int `json:"size"`
// NameServers defines the name service list e.g. 192.168.1.1:9876;192.168.1.2:9876
NameServers string `json:"nameServers,omitempty"`
// ClusterMode defines the way to be a broker cluster, valid values can be one of the following:
// - STATIC: default clusters with static broker roles
// - CONTROLLER: clusters with DLedger Controller since RocketMQ 5.0
// - CONTAINER: [NOT implemented yet] enabling broker containers since RocketMQ 5.0
ClusterMode string `json:"clusterMode,omitempty"`
// ReplicaPerGroup is the number of replicas in each broker group
ReplicaPerGroup int `json:"replicaPerGroup"`
// BrokerImage is the broker image to use for the Pods
BrokerImage string `json:"brokerImage"`
// ImagePullPolicy defines how the image is pulled
ImagePullPolicy corev1.PullPolicy `json:"imagePullPolicy"`
// HostNetwork can be true or false
HostNetwork bool `json:"hostNetwork,omitempty"`
// AllowRestart defines whether allow pod restart
AllowRestart bool `json:"allowRestart"`
// Resources describes the compute resource requirements
Resources corev1.ResourceRequirements `json:"resources"`
// StorageMode can be EmptyDir, HostPath, StorageClass
StorageMode string `json:"storageMode"`
// HostPath is the local path to store data
HostPath string `json:"hostPath,omitempty"`
// Env defines custom env, e.g. BROKER_MEM
Env []corev1.EnvVar `json:"env"`
// Volumes define the broker.conf
Volumes []corev1.Volume `json:"volumes"`
// VolumeClaimTemplates defines the StorageClass
VolumeClaimTemplates []corev1.PersistentVolumeClaim `json:"volumeClaimTemplates"`
// ScalePodName is the name of the pod to copy metadata from when scaling out
ScalePodName string `json:"scalePodName"`
// Pod Security Context
PodSecurityContext *corev1.PodSecurityContext `json:"securityContext,omitempty"`
// Container Security Context
ContainerSecurityContext *corev1.SecurityContext `json:"containerSecurityContext,omitempty"`
// The secrets used to pull image from private registry
ImagePullSecrets []corev1.LocalObjectReference `json:"imagePullSecrets,omitempty"`
// Affinity the pod's scheduling constraints
Affinity *corev1.Affinity `json:"affinity,omitempty"`
// Tolerations the pod's tolerations.
Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
// NodeSelector is a selector which must be true for the pod to fit on a node
NodeSelector map[string]string `json:"nodeSelector,omitempty"`
// PriorityClassName indicates the pod's priority
PriorityClassName string `json:"priorityClassName,omitempty"`
// ServiceAccountName
ServiceAccountName string `json:"serviceAccountName,omitempty"`
}
// BrokeStatus defines the observed state of Broke
type BrokeStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make" to regenerate code after modifying this file
Nodes []string `json:"nodes,omitempty"`
Size int `json:"size"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// Broke is the Schema for the brokes API
type Broke struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec BrokeSpec `json:"spec,omitempty"`
Status BrokeStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// BrokeList contains a list of Broke
type BrokeList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []Broke `json:"items"`
}
func init() {
SchemeBuilder.Register(&Broke{}, &BrokeList{})
}
This is much the same as NameService, so I won't repeat it.
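As an aside, a Broke custom resource exercising these fields might look like this — the field names follow the json tags above, but the apiVersion group and the image tag are my assumptions:

```yaml
apiVersion: rocketmq.apache.org/v1alpha1   # group/version is illustrative
kind: Broke
metadata:
  name: broke-sample
spec:
  size: 2                 # 2 master-slave groups
  replicaPerGroup: 1      # 1 slave per master
  clusterMode: STATIC
  nameServers: ""         # empty: wait until the operator discovers them
  brokerImage: apache/rocketmq-broker:4.9.4   # illustrative tag
  imagePullPolicy: Always
  allowRestart: true
  storageMode: EmptyDir
  scalePodName: broke-sample-0-master-0       # illustrative pod name
```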
2.2 broke_controller
2.2.1 The Reconcile method
This is the core of the controller. Reconcile is fairly long, so I'll split it into several code blocks and walk through them.
- Fetch the Broker custom resource; if it can't be fetched, report the error
go
// Fetch the Broker instance
// fetch the broker resource
broker := &appsv1s.Broke{}
err := r.Client.Get(context.TODO(), req.NamespacedName, broker)
if err != nil {
if errors.IsNotFound(err) {
// Request object not found, could have been deleted after reconcile request.
// Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
// Return and don't requeue
reqLogger.Info("Broker resource not found. Ignoring since object must be deleted")
return reconcile.Result{}, nil
}
// Error reading the object - requeue the request.
reqLogger.Error(err, "Failed to get Broker")
return reconcile.Result{}, err
}
- Set the number of broker groups; think of it as the number of Master-Slave groups
go
//set the number of broker groups
if broker.Status.Size == 0 {
//desired value
share.GroupNum = broker.Spec.Size
} else {
//observed value
share.GroupNum = broker.Status.Size
}
- Set the Broker's NameServer addresses
go
//wait for the NameServer to finish initializing before starting the Broker
if broker.Spec.NameServers == "" {
// wait for name server ready when create broker cluster if nameServers is omitted
for {
if share.IsNameServersStrInitialized {
break
} else {
log.Info("Broker Waiting for name server ready...")
time.Sleep(time.Duration(constants.WaitForNameServerReadyInSecond) * time.Second)
}
}
} else {
//take the NameServer addresses from the spec
share.NameServersStr = broker.Spec.NameServers
}
This is a small but important point. If you have ever deployed MQ by hand, you know that after starting the nameservers, you bring up each broker with a command like `$ nohup sh bin/mqbroker -n localhost:9876 --enable-proxy &`. Notice that starting a broker requires the nameserver IP and port, and you have to supply them on every single machine, which is tedious. Is there a better way?
You might think: write a script that takes the nameserver addresses as a parameter and starts several brokers. That is indeed how it is commonly done, e.g. `nohup sh bin/mqbroker -n 192.168.1.1:9876 -c $ROCKETMQ_HOME/conf/2m-noslave/broker-a.properties --enable-proxy &`.
But we are already using an operator: every nameservice and broker is managed by rocketmq-operator. So why not store the nameserver addresses inside the operator and have each broker fetch them at startup? That fully automates the process, with no manual input of nameserver addresses. Combined with the mechanism from the previous post — a change in nameserver addresses triggers a command that notifies the brokers — this neatly solves the problem of brokers discovering nameserver addresses.
- Create all of the Broker's StatefulSet resources (multiple groups × multiple replicas)
go
//save the broker cluster name so the NameService can send commands to the brokers
share.BrokerClusterName = broker.Name
replicaPerGroup := broker.Spec.ReplicaPerGroup
reqLogger.Info("brokerGroupNum=" + strconv.Itoa(share.GroupNum) + ", replicaPerGroup=" + strconv.Itoa(replicaPerGroup))
//two nested loops: the outer creates the groups (masters), the inner the replicas (slaves)
//create the broker groups in a loop
for brokerGroupIndex := 0; brokerGroupIndex < share.GroupNum; brokerGroupIndex++ {
reqLogger.Info("Check Broker cluster" + strconv.Itoa(brokerGroupIndex+1) + "/" + strconv.Itoa(share.GroupNum))
//from here on it is the same as the nameservice
dep := r.getBrokerStatefulSet(broker, brokerGroupIndex, 0)
// Check if the statefulSet already exists, if not create a new one
found := &appsv1.StatefulSet{}
err = r.Client.Get(context.TODO(), types.NamespacedName{Name: dep.Name, Namespace: dep.Namespace}, found)
if err != nil && errors.IsNotFound(err) {
reqLogger.Info("Creating a new Master Broker StatefulSet.", "StatefulSet.Namespace", dep.Namespace, "StatefulSet.Name", dep.Name)
err = r.Client.Create(context.TODO(), dep)
if err != nil {
reqLogger.Error(err, "Failed to create new Broker StatefulSet.", "StatefulSet.Namespace", dep.Namespace, "StatefulSet.Name", dep.Name)
}
} else if err != nil {
reqLogger.Error(err, "Failed to get broker master StatefulSet.")
}
for replicaIndex := 1; replicaIndex <= replicaPerGroup; replicaIndex++ {
//replica (slave) node
reqLogger.Info("Check Replica Broker of cluster-" + strconv.Itoa(brokerGroupIndex) + " " + strconv.Itoa(replicaIndex) + "/" + strconv.Itoa(replicaPerGroup))
replicaDep := r.getBrokerStatefulSet(broker, brokerGroupIndex, replicaIndex)
err = r.Client.Get(context.TODO(), types.NamespacedName{Name: replicaDep.Name, Namespace: replicaDep.Namespace}, found)
if err != nil && errors.IsNotFound(err) {
reqLogger.Info("Creating a new Replica Broker StatefulSet.", "StatefulSet.Namespace", replicaDep.Namespace, "StatefulSet.Name", replicaDep.Name)
err = r.Client.Create(context.TODO(), replicaDep)
if err != nil {
reqLogger.Error(err, "Failed to create new StatefulSet of broker-"+strconv.Itoa(brokerGroupIndex)+"-replica-"+strconv.Itoa(replicaIndex), "StatefulSet.Namespace", replicaDep.Namespace, "StatefulSet.Name", replicaDep.Name)
}
} else if err != nil {
reqLogger.Error(err, "Failed to get broker replica StatefulSet.")
}
}
}
Focus on the two for loops: the first creates the master nodes and the second creates the replicas. The core is the getBrokerStatefulSet method; the only difference between master and replica is the replicaIndex passed in — 0 for the master, 1..n for the replicas (one per replica) — which shows up in the StatefulSet name. For example, with one master, one replica, and brokerName broke-sample, the resulting StatefulSet names are broke-sample-0-master and broke-sample-0-replica-1.
- If the NameServer addresses have changed, the brokers need to pick up the new addresses and their pods must be restarted.
go
// Check for name server scaling
//is the broker allowed to restart to pick up new nameService addresses?
if broker.Spec.AllowRestart {
// The following code will restart all brokers to update NAMESRV_ADDR env
// have the NameServer addresses changed?
if share.IsNameServersStrUpdated {
for brokerGroupIndex := 0; brokerGroupIndex < broker.Spec.Size; brokerGroupIndex++ {
brokerName := getBrokerName(broker, brokerGroupIndex)
// Update master broker
reqLogger.Info("Update Master Broker NAMESRV_ADDR of " + brokerName)
dep := r.getBrokerStatefulSet(broker, brokerGroupIndex, 0)
found := &appsv1.StatefulSet{}
err = r.Client.Get(context.TODO(), types.NamespacedName{
Name: dep.Name,
Namespace: dep.Namespace,
}, found)
if err != nil {
reqLogger.Error(err, "Failed to get broker master StatefulSet of "+brokerName)
} else {
//assign the nameservice addresses to the first env var of each master broker
found.Spec.Template.Spec.Containers[0].Env[0].Value = share.NameServersStr
err = r.Client.Update(context.TODO(), found)
if err != nil {
reqLogger.Error(err, "Failed to update NAMESRV_ADDR of master broker "+brokerName, "StatefulSet.Namespace", found.Namespace, "StatefulSet.Name", found.Name)
} else {
reqLogger.Info("Successfully updated NAMESRV_ADDR of master broker "+brokerName, "StatefulSet.Namespace", found.Namespace, "StatefulSet.Name", found.Name)
}
time.Sleep(time.Duration(constants.RestartBrokerPodIntervalInSecond) * time.Second)
}
// Update replicas brokers
// same for the replicas
for replicaIndex := 1; replicaIndex <= replicaPerGroup; replicaIndex++ {
reqLogger.Info("Update Replica Broker NAMESRV_ADDR of " + brokerName + " " + strconv.Itoa(replicaIndex) + "/" + strconv.Itoa(replicaPerGroup))
replicaDep := r.getBrokerStatefulSet(broker, brokerGroupIndex, replicaIndex)
replicaFound := &appsv1.StatefulSet{}
err = r.Client.Get(context.TODO(), types.NamespacedName{Name: replicaDep.Name, Namespace: replicaDep.Namespace}, replicaFound)
if err != nil {
reqLogger.Error(err, "Failed to get broker replica StatefulSet of "+brokerName)
} else {
//a replica may have several env vars; find the one named NAMESRV_ADDR
for index := range replicaFound.Spec.Template.Spec.Containers[0].Env {
if constants.EnvNameServiceAddress == replicaFound.Spec.Template.Spec.Containers[0].Env[index].Name {
//assign the new value
replicaFound.Spec.Template.Spec.Containers[0].Env[index].Value = share.NameServersStr
break
}
}
err = r.Client.Update(context.TODO(), replicaFound)
if err != nil {
reqLogger.Error(err, "Failed to update NAMESRV_ADDR of "+strconv.Itoa(brokerGroupIndex)+"-replica-"+strconv.Itoa(replicaIndex), "StatefulSet.Namespace", replicaFound.Namespace, "StatefulSet.Name", replicaFound.Name)
} else {
reqLogger.Info("Successfully updated NAMESRV_ADDR of "+strconv.Itoa(brokerGroupIndex)+"-replica-"+strconv.Itoa(replicaIndex), "StatefulSet.Namespace", replicaFound.Namespace, "StatefulSet.Name", replicaFound.Name)
}
time.Sleep(time.Duration(constants.RestartBrokerPodIntervalInSecond) * time.Second)
}
}
}
}
share.IsNameServersStrUpdated = false
}
In practice this path is rarely needed, because the NameService usually pushes address changes to the brokers through the admin tool.
- Check that every pod is ready; if any is not, return and requeue.
go
// List the pods for this broker's statefulSet
podList := &corev1.PodList{}
labelSelector := labels.SelectorFromSet(labelsForBroker(broker.Name))
listOps := &client.ListOptions{
Namespace: broker.Namespace,
LabelSelector: labelSelector,
}
err = r.Client.List(context.TODO(), podList, listOps)
if err != nil {
reqLogger.Error(err, "Failed to list pods.", "Broker.Namespace", broker.Namespace, "Broker.Name", broker.Name)
return reconcile.Result{}, err
}
// collect the names of all pods
podNames := getPodNames(podList.Items)
log.Info("broker.Status.Nodes length = " + strconv.Itoa(len(broker.Status.Nodes)))
log.Info("podNames length = " + strconv.Itoa(len(podNames)))
// Ensure every pod is ready
for _, pod := range podList.Items {
//log pods that are not yet Running
if !reflect.DeepEqual(pod.Status.Phase, corev1.PodRunning) {
log.Info("pod " + pod.Name + " phase is " + string(pod.Status.Phase) + ", wait for a moment...")
}
//requeue until every pod passes its readiness check
if !isReady(pod) {
reqLogger.Info("pod " + pod.Name + " is not ready, wait for a moment...")
return reconcile.Result{Requeue: true, RequeueAfter: time.Duration(constants.RequeueIntervalInSecond) * time.Second}, nil
}
}
- If a scale-out is detected, copy the contents of subscriptionGroup.json and topics.json from the pod named by ScalePodName, then generate a command that writes them out (think of it as copying the metadata files into the new pods)
go
if broker.Status.Size != 0 && broker.Spec.Size > broker.Status.Size {
// Get the metadata including subscriptionGroup.json and topics.json from scale source pod
k8s, err := tool.NewK8sClient()
if err != nil {
log.Error(err, "Error occurred while getting K8s Client")
}
// this pod is the one you specify manually (ScalePodName)
sourcePodName := broker.Spec.ScalePodName
topicsCommand := getCopyMetadataJsonCommand(cons.TopicJsonDir, sourcePodName, broker.Namespace, k8s)
log.Info("topicsCommand: " + topicsCommand)
subscriptionGroupCommand := getCopyMetadataJsonCommand(cons.SubscriptionGroupJsonDir, sourcePodName, broker.Namespace, k8s)
log.Info("subscriptionGroupCommand: " + subscriptionGroupCommand)
MakeConfigDirCommand := "mkdir -p " + cons.StoreConfigDir
ChmodDirCommand := "chmod a+rw " + cons.StoreConfigDir
cmdContent := MakeConfigDirCommand + " && " + ChmodDirCommand
if topicsCommand != "" {
cmdContent = cmdContent + " && " + topicsCommand
}
if subscriptionGroupCommand != "" {
cmdContent = cmdContent + " && " + subscriptionGroupCommand
}
cmd = []string{"/bin/bash", "-c", cmdContent}
}
Note that this only builds the command. When getBrokerStatefulSet from step 4 runs again, a lifecycle hook executes the command immediately after the pod is created.
- Update the status size and nodes
go
// Update status.Size if needed
if broker.Spec.Size != broker.Status.Size {
log.Info("broker.Status.Size = " + strconv.Itoa(broker.Status.Size))
log.Info("broker.Spec.Size = " + strconv.Itoa(broker.Spec.Size))
broker.Status.Size = broker.Spec.Size
err = r.Client.Status().Update(context.TODO(), broker)
if err != nil {
reqLogger.Error(err, "Failed to update Broker Size status.")
}
}
// Update status.Nodes if needed
if !reflect.DeepEqual(podNames, broker.Status.Nodes) {
broker.Status.Nodes = podNames
err = r.Client.Status().Update(context.TODO(), broker)
if err != nil {
reqLogger.Error(err, "Failed to update Broker Nodes status.")
}
}
After this status update, the next time step 4 runs for a scale-out, getBrokerStatefulSet will pick up the new cmd built in step 7. That completes the flow.
Here is a flow chart to make it easier to follow:
2.2.2 The getBrokerStatefulSet method
There are two main parts; I'll cover each in turn.
- Set the StatefulSet name, which differs between master and replicas
go
//mainly look at the STATIC cluster mode
if broker.Spec.ClusterMode == "STATIC" {
if replicaIndex == 0 { //the master is always replicaIndex 0
//build the statefulSet name
statefuleSetName = broker.Name + "-" + strconv.Itoa(brokerGroupIndex) + "-master"
} else {
statefuleSetName = broker.Name + "-" + strconv.Itoa(brokerGroupIndex) + "-replica-" + strconv.Itoa(replicaIndex)
}
} else if broker.Spec.ClusterMode == "CONTROLLER" {
statefuleSetName = broker.Name + "-" + strconv.Itoa(brokerGroupIndex) + "-" + strconv.Itoa(replicaIndex)
}
- Create the StatefulSet
go
dep := &appsv1.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
Name: statefuleSetName,
Namespace: broker.Namespace,
},
Spec: appsv1.StatefulSetSpec{
Replicas: c, //defaults to 1 — very different from NameService, which uses spec.size
Selector: &metav1.LabelSelector{
MatchLabels: ls,
},
UpdateStrategy: appsv1.StatefulSetUpdateStrategy{
Type: appsv1.RollingUpdateStatefulSetStrategyType,
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: ls,
},
Spec: corev1.PodSpec{
ServiceAccountName: broker.Spec.ServiceAccountName,
HostNetwork: broker.Spec.HostNetwork,
DNSPolicy: corev1.DNSClusterFirstWithHostNet,
Affinity: broker.Spec.Affinity,
Tolerations: broker.Spec.Tolerations,
NodeSelector: broker.Spec.NodeSelector,
PriorityClassName: broker.Spec.PriorityClassName,
Containers: []corev1.Container{{
Resources: broker.Spec.Resources,
Image: broker.Spec.BrokerImage,
Name: constants.BrokerContainerName,
Lifecycle: &corev1.Lifecycle{
PostStart: &corev1.LifecycleHandler{ //postStart hook: runs right after the container starts
Exec: &corev1.ExecAction{
Command: cmd,
},
},
},
SecurityContext: broker.GetContainerSecurityContext(),
ImagePullPolicy: broker.Spec.ImagePullPolicy,
Env: getENV(broker, replicaIndex, brokerGroupIndex),
Ports: []corev1.ContainerPort{{
ContainerPort: constants.BrokerVipContainerPort,
Name: constants.BrokerVipContainerPortName,
}, {
ContainerPort: constants.BrokerMainContainerPort,
Name: constants.BrokerMainContainerPortName,
},
{
ContainerPort: constants.BrokerHighAvailabilityContainerPort,
Name: constants.BrokerHighAvailabilityContainerPortName,
}},
VolumeMounts: []corev1.VolumeMount{{
MountPath: constants.LogMountPath,
Name: broker.Spec.VolumeClaimTemplates[0].Name,
SubPath: constants.LogSubPathName + getPathSuffix(broker, brokerGroupIndex, replicaIndex),
}, {
MountPath: constants.StoreMountPath,
Name: broker.Spec.VolumeClaimTemplates[0].Name,
SubPath: constants.StoreSubPathName + getPathSuffix(broker, brokerGroupIndex, replicaIndex),
}, {
MountPath: constants.BrokerConfigPath + "/" + constants.BrokerConfigName,
Name: broker.Spec.VolumeClaimTemplates[0].Name,
SubPath: constants.BrokerConfigName,
}},
}},
Volumes: broker.GetVolumes(),
SecurityContext: broker.GetPodSecurityContext(),
},
},
VolumeClaimTemplates: broker.GetVolumeClaimTemplates(),
},
}
First, Replicas: c — this defaults to 1, which is very different from NameService, where the replica count comes from spec.size. In other words, NameService creates a single StatefulSet regardless of how many pods it runs, whereas Broker creates one StatefulSet per pod.
Second, the snippet below: it runs our cmd right after the pod starts. This is the postStart hook from step 7 of 2.2.1 that writes subscriptionGroup.json and topics.json.
go
Lifecycle: &corev1.Lifecycle{
PostStart: &corev1.LifecycleHandler{ //postStart hook: runs right after the container starts
Exec: &corev1.ExecAction{
Command: cmd,
},
},
},
Summary
In my view the controller mainly accomplishes two things:
- Newly scaled-out Name Servers are automatically discovered by all Brokers
- Metadata is registered on newly scaled-out Brokers