# Author: 朱雷
Table of Contents
- 1. Deployment Pitfalls
  - 1.1. ConfigMap file
  - 1.2. PV file
  - 1.3. StorageClass file
  - 1.4. ServiceAccount file
  - 1.5. Headless Service file
  - 1.6. StatefulSet file
- 2. Summary of Key RabbitMQ Cluster Deployment Issues
1. Deployment Pitfalls
1.1. ConfigMap file
Edit the cm.yaml file:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: rabbitmq-clu-9
data:
  rabbitmq.conf: |
    # Basic settings
    listeners.tcp.default = 5672
    # management.listener.port = 15672
    # management.listener.ssl = false
    disk_free_limit.absolute = 1GB
    # Use the Kubernetes plugin for peer discovery
    cluster_formation.peer_discovery_backend = k8s
    # cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.address_type = hostname
    # rabbitmq-clu-9 in the suffix is the namespace; keep it consistent with the StatefulSet manifest
    cluster_formation.k8s.hostname_suffix = .rabbitmq-headless.rabbitmq-clu-9.svc.cluster.local
    cluster_formation.discovery_retry_limit = 10
    cluster_formation.discovery_retry_interval = 3000
    # service_name must match the headless Service name
    cluster_formation.k8s.service_name = rabbitmq-headless
    cluster_formation.node_cleanup.interval = 30
    cluster_formation.node_cleanup.only_log_warning = false
    cluster_formation.etcd.ssl_options.verify = verify_none
    # Memory settings
    vm_memory_high_watermark.relative = 0.6
    vm_memory_high_watermark_paging_ratio = 0.5
    # Logging settings
    log.console = true
    log.console.level = debug
    log.file = false
    # Debug logging, enabled temporarily
    log.connection.level = debug
    log.channel.level = debug
    log.queue.level = debug
```
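A quick way to confirm the ConfigMap is correct before moving on (a sketch; it assumes the manifest above is saved as cm.yaml, and the last command only works once the StatefulSet from section 1.6 is running):

```bash
# Apply the ConfigMap and confirm it exists in the target namespace
kubectl apply -f cm.yaml
kubectl -n rabbitmq-clu-9 get configmap rabbitmq-config -o yaml

# Once a pod from section 1.6 is running, confirm the file is mounted where
# RabbitMQ expects it
kubectl -n rabbitmq-clu-9 exec rabbitmq-0 -- cat /etc/rabbitmq/rabbitmq.conf
```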
1.2. PV file
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-cluster-pv-0
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: hostpath-storage
  hostPath:
    path: /tmp/rabbitmq/0
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.88.201 # change to the hostname of the node this PV is bound to
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-cluster-pv-1
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: hostpath-storage
  hostPath:
    path: /tmp/rabbitmq/1
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.88.202 # change to the hostname of the node this PV is bound to
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-cluster-pv-2
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: hostpath-storage
  hostPath:
    path: /tmp/rabbitmq/2
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.88.203 # change to the hostname of the node this PV is bound to
```
Pitfall 1: the node affinity selector matches the cluster node's hostname (the kubernetes.io/hostname label). If a node's IP and hostname differ, filling in the IP leaves the Pod stuck in Pending instead of Running.
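To avoid this pitfall, check the value of the kubernetes.io/hostname label on each node before filling in the PV manifests (a minimal check sketch):

```bash
# Show node names, IPs and the kubernetes.io/hostname label; the label value
# is what belongs under values: in the PV nodeAffinity
kubectl get nodes -o wide
kubectl get nodes -L kubernetes.io/hostname
```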
1.3. StorageClass file
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hostpath-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```
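Because volumeBindingMode is WaitForFirstConsumer, the PVCs created by the StatefulSet stay Pending until their pod is scheduled, which is expected rather than an error. A verification sketch (assuming the manifest is saved as sc.yaml, a file name not given in the text):

```bash
kubectl apply -f sc.yaml
kubectl get storageclass hostpath-storage

# Before the pods are scheduled the PVCs stay Pending and the PVs stay
# Available; both bind only after the StatefulSet pods land on their nodes
kubectl -n rabbitmq-clu-9 get pvc
kubectl get pv
```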
1.4. ServiceAccount file
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: rabbitmq-clu-9
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rabbitmq-peer-discovery
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "endpoints"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rabbitmq-peer-discovery
subjects:
- kind: ServiceAccount
  name: rabbitmq
  namespace: rabbitmq-clu-9
roleRef:
  kind: ClusterRole
  name: rabbitmq-peer-discovery
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rabbitmq-configmap
  namespace: rabbitmq-clu-9
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "update"]
  resourceNames: ["rabbitmq-config"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rabbitmq-configmap
  namespace: rabbitmq-clu-9
subjects:
- kind: ServiceAccount
  name: rabbitmq
  namespace: rabbitmq-clu-9
roleRef:
  kind: Role
  name: rabbitmq-configmap
  apiGroup: rbac.authorization.k8s.io
```
Pitfall 2: if the ClusterRole rabbitmq-peer-discovery does not grant access to the nodes resource, the Pod keeps logging errors during startup and never starts successfully.
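The grants can be verified before any pod is created by impersonating the ServiceAccount with kubectl auth can-i (a sketch; every command should print "yes"):

```bash
# Cluster-scoped permission needed by the peer discovery plugin
kubectl auth can-i list nodes \
  --as=system:serviceaccount:rabbitmq-clu-9:rabbitmq

# Namespaced permissions used during discovery and config access
kubectl -n rabbitmq-clu-9 auth can-i get endpoints \
  --as=system:serviceaccount:rabbitmq-clu-9:rabbitmq
kubectl -n rabbitmq-clu-9 auth can-i get configmaps/rabbitmq-config \
  --as=system:serviceaccount:rabbitmq-clu-9:rabbitmq
```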
1.5. Headless Service file
```yaml
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-headless
  namespace: rabbitmq-clu-9
  labels:
    app: rabbitmq
spec:
  clusterIP: None
  ports:
  - name: amqp
    port: 5672
    targetPort: 5672
  - name: management
    port: 15672
    targetPort: 15672
  - name: epmd
    port: 4369
    targetPort: 4369
  - name: dist
    port: 25672
    targetPort: 25672
  selector:
    app: rabbitmq
  publishNotReadyAddresses: true
```
Pitfall 3: all of the ports above must be exposed, otherwise cluster formation fails.
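To confirm the Service really publishes all four ports, the Endpoints object and the in-container listeners can be inspected (a sketch; the last command only works once a pod is up):

```bash
# The Endpoints object should list 5672, 15672, 4369 and 25672 for every pod
kubectl -n rabbitmq-clu-9 get svc rabbitmq-headless
kubectl -n rabbitmq-clu-9 get endpoints rabbitmq-headless -o yaml

# Cross-check from inside a running pod
kubectl -n rabbitmq-clu-9 exec rabbitmq-0 -- rabbitmq-diagnostics listeners
```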
1.6. StatefulSet file
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbitmq-clu-9
  labels:
    app: rabbitmq
spec:
  serviceName: rabbitmq-headless
  replicas: 3
  # podManagementPolicy: "Parallel"
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      terminationGracePeriodSeconds: 10
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rabbitmq
              topologyKey: kubernetes.io/hostname
      serviceAccountName: rabbitmq
      containers:
      - name: rabbitmq
        image: rabbitmq:3.8.27-management
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5672
          name: amqp
        - containerPort: 15672
          name: http
        env:
        - name: RABBITMQ_USE_LONGNAME
          value: "true"
        - name: RABBITMQ_ERLANG_COOKIE
          value: "secret-cookie"
        - name: RABBITMQ_DEFAULT_USER
          value: "admin"
        - name: RABBITMQ_DEFAULT_PASS
          value: "admin123"
        volumeMounts:
        - name: config
          mountPath: /etc/rabbitmq/rabbitmq.conf
          subPath: rabbitmq.conf
        - name: data
          mountPath: /var/lib/rabbitmq
          readOnly: false
        # readinessProbe:
        #   exec:
        #     command: ["rabbitmq-diagnostics", "status"]
        #   initialDelaySeconds: 20
        #   periodSeconds: 30
        livenessProbe:
          exec:
            command: ["rabbitmq-diagnostics", "ping"]
          initialDelaySeconds: 60
          periodSeconds: 30
      volumes:
      - name: config
        configMap:
          name: rabbitmq-config
  volumeClaimTemplates:
  - metadata:
      name: data
      namespace: rabbitmq-clu-9
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1000M
      storageClassName: hostpath-storage
```
Pitfall 4: the RABBITMQ_USE_LONGNAME environment variable must be set to true so that node names use the FQDN format; otherwise names get truncated, inter-node communication fails, and the cluster cannot form.
Pitfall 5: if podManagementPolicy is not "Parallel", the readiness probe must be disabled, otherwise cluster startup fails: with the default OrderedReady policy the next pod is only created after the previous one reports Ready, while a booting node may be waiting for its peers before it can pass the readiness check, so the rollout deadlocks.
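Once all three pods are up, cluster formation and the FQDN node names can be confirmed with rabbitmq-diagnostics (a quick check sketch):

```bash
kubectl -n rabbitmq-clu-9 get pods -l app=rabbitmq

# The output should show one cluster with three nodes named like
# rabbit@rabbitmq-0.rabbitmq-headless.rabbitmq-clu-9.svc.cluster.local
kubectl -n rabbitmq-clu-9 exec rabbitmq-0 -- rabbitmq-diagnostics cluster_status
```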
2. Summary of Key RabbitMQ Cluster Deployment Issues
The five core issues in RabbitMQ cluster deployment are summarized above; checking each configuration item against this list before rollout significantly improves the chance of a successful deployment. An apply-order sketch follows the list.
- Node selector binding: use the cluster node's hostname rather than its IP, otherwise the Pod gets stuck in Pending.
- Role authorization: make sure the rabbitmq-peer-discovery role is granted the nodes resource, otherwise the Pod errors out on startup.
- Port exposure: core ports such as 4369 (EPMD) and 25672 (Erlang distribution) must be open, otherwise cluster initialization fails.
- Long node names: set the RABBITMQ_USE_LONGNAME environment variable to true to force FQDN node names and avoid truncation breaking inter-node communication.
- Pod management policy: if the Parallel policy is not used, disable the readiness probe to prevent cluster startup from blocking.
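For reference, one possible apply order for the manifests above (a sketch; apart from cm.yaml the file names are placeholders, and the namespace is created explicitly here since no namespace manifest appears in the text):

```bash
kubectl create namespace rabbitmq-clu-9

kubectl apply -f sc.yaml               # 1.3 StorageClass
kubectl apply -f pv.yaml               # 1.2 PersistentVolumes
kubectl apply -f cm.yaml               # 1.1 ConfigMap
kubectl apply -f rbac.yaml             # 1.4 ServiceAccount / RBAC
kubectl apply -f headless-service.yaml # 1.5 Headless Service
kubectl apply -f sts.yaml              # 1.6 StatefulSet

# Watch the pods come up one by one (OrderedReady) or all at once (Parallel)
kubectl -n rabbitmq-clu-9 get pods -w
```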