postgresql+rabbitmq集群搭建方案

方案介绍

本文档仅为示例，采用内网三台集群组成集群：192.168.1.6、192.168.1.89、192.168.1.83

其中1.6作为主服务器，1.89作为备选服务器，1.83作为兜底服务器

需要安装插件介绍

keepalived

作用说明

Keepalived 的核心作用是解决 HAProxy 自身的单点故障问题，为负载均衡层提供高可用保障。

提供虚拟 IP（VIP）：Keepalived 会在多台 HAProxy 节点之间维护一个漂移的虚拟 IP。客户端只需连接这个 VIP，而非具体某个 HAProxy 的物理 IP

主备自动切换：正常情况下，VIP 只在主 HAProxy 节点上生效。如果主节点宕机或服务异常，Keepalived 会自动将 VIP 漂移到备节点，整个过程对客户端无感知。

健康检查联动：Keepalived 会持续检测 HAProxy 进程的状态（比如通过检查端口、进程或自定义脚本）。一旦发现 HAProxy 服务失效，会主动触发 VIP 切换，避免客户端被路由到已失效的负载均衡器。

通常会是这样的 2 + N 模式：

两台服务器部署 HAProxy + Keepalived 组合，形成主备关系，对外暴露一个虚拟 IP。

Patroni

它是一个Python编写的模板，用于配置和管理PostgreSQL高可用集群，利用分布式共识存储（etcd）进行领导者选举和配置管理，可以理解为它就是一个死循环，一直去分布式配置管理信息存储系统里去取数据，你可以理解为它是一个管家，管理PostgreSQL进程，自动化处理故障转移，每个安装了pgsql的机器都需要安装一个Patroni。

etcd

它的稳定性直接决定了整个高可用体系是否可靠。etcd 是通过 Raft 共识算法，为 Patroni 集群提供了一个永远不会出现"两个主库"的、绝对一致的控制中心

作用：

领导者选举：所有 Patroni 节点都盯着 etcd 中的同一把"领导者锁"，保证有且仅有一个主库。
集群状态共识：集群拓扑、各节点健康状态和复制位点等关键信息，都储存在 etcd 中，所有节点看到的视图完全一致。
动态配置下发：修改 max_connections 等参数，Patroni 会写入 etcd，其他节点 Watch 到变化后自动响应。

HAProxy

HAProxy 在这个集群方案中的核心作用是一个 四层（TCP）负载均衡器，它为客户端提供一个统一的、高可用的入口，并将请求流量分发到后端的多台服务器上

需要注意的是：

HAProxy 本身不是高可用的,如果 HAProxy 挂了，整个入口就没了。所以生产环境通常会部署多个 HAProxy 实例，并在其上层用 Keepalived (VIP) 或云负载均衡器来保证 HAProxy 自身的冗余
它不做连接池：HAProxy 是逐连接的转发，如果应用频繁短连接，后端 PostgreSQL 压力会很大。这通常通过追加PgBouncer解决，而非在 HAProxy 层面。
只做路由，不参与决策：HAProxy 完全信任 Patroni 的健康检查反馈，自身不具备任何数据库角色判断逻辑，这保证了责任清晰。

PgBouncer

PgBouncer 是 PostgreSQL 生态中最轻量、最成熟的开源连接池。在 Patroni + HAProxy 的高可用架构里，它专门解决一个关键问题：PostgreSQL 的进程模型无法承受海量短连接。

PostgreSQL 采用"一个连接一个进程"的模型，每次连接都需 fork 一个新进程，占用约 2-10MB 内存，且频繁建立/销毁连接会导致 CPU 争抢和上下文切换开销巨大。PgBouncer 通过连接复用解决了这个问题

pgsql的定时任务工具（cron）

pgsql定时执行作业的一个插件

端口使用详情

组件	节点/实例	IP地址	端口	用途
etcd (DCS)	etcd1	192.168.1.6	2379, 2380	集群状态存储与选主
	etcd2	192.168.1.89	2379, 2380
	etcd3	192.168.1.83	2379, 2380
PostgreSQL + Patroni	pg-node1	192.168.1.6	5432, 8008	主库
	pg-node2	192.168.1.89	5432, 8008	备库
	pg-node3	192.168.1.83	5432, 8008	备库
PgBouncer (连接池)	pgbouncer-1	192.168.1.6	6432	连接池，减少数据库连接开销
	pgbouncer-2	192.168.1.89	6432
	pgbouncer-3	192.168.1.83	6432
HAProxy (读写分离)	haproxy-1	192.168.1.6	5431(写), 5430(读)	将写请求路由到主库，读请求负载均衡到从库
	haproxy-2	192.168.1.89	5431(写), 5430(读)
Keepalived (VIP)	VIP	192.168.1.100	-	对外提供高可用入口，故障时自动漂移
	keeplived-1	192.168.1.6	-	MASTER（优先级100）
	keeplived-2	192.168.1.89	-	BACKUP（优先级90）
rabbitmq	rabbit-node-1	192.168.1.6	5672（AMQP）,1883（MQTT）	消息队列服务
	rabbit-node-2	192.168.1.89	5672（AMQP）,1883（MQTT）
	rabbit-node-3	192.168.1.83	5672（AMQP）,1883（MQTT）

基础配置

增加host

执行以下语句配置host

shell 复制代码

echo "192.168.1.6    master" >> /etc/hosts
echo "192.168.1.89   backup" >> /etc/hosts
echo "192.168.1.83   follower" >> /etc/hosts

安装postgresql

执行以下命令进行安装

shell 复制代码

sudo apt update
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install postgresql-18

配置运行外网访问

编辑 /etc/postgresql/18/main/pg_hba.conf文件，增加 0.0.0.0/0 运行外部访问

编辑 /etc/postgresql/18/main/postgresql.conf文件，修改监听地址为 *，把timezone 设置为'Asia/Shanghai'

pgsql创建集群所需用户

先执行以下命令进入数据库

shell 复制代码

sudo -u postgres psql

配置管理员账户设置

sql 复制代码

ALTER USER postgres WITH PASSWORD '你的新密码';

创建复制用户

sql 复制代码

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD '用于集群复制账户的密码';

创建执行比对和回滚操作的用户

sql 复制代码

CREATE USER rewind_user WITH ENCRYPTED PASSWORD '密码';

GRANT EXECUTE ON FUNCTION pg_catalog.pg_ls_dir(text, boolean, boolean) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_stat_file(text, boolean) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_read_binary_file(text) TO rewind_user;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO rewind_user;

-- 确认授权
\du rewind_user

安装rabbitmq

注意以下脚本我指定了rabbitmq版本为4.1.3

context 复制代码

#!/bin/sh

sudo apt-get install curl gnupg apt-transport-https -y

## Team RabbitMQ's main signing key
curl -1sLf "https://keys.openpgp.org/vks/v1/by-fingerprint/0A9AF2115F4687BD29803A206B73A36E6026DFCA" | sudo gpg --dearmor | sudo tee /usr/share/keyrings/com.rabbitmq.team.gpg > /dev/null
## Community mirror of Cloudsmith: modern Erlang repository
curl -1sLf https://github.com/rabbitmq/signing-keys/releases/download/3.0/cloudsmith.rabbitmq-erlang.E495BB49CC4BBE5B.key | sudo gpg --dearmor | sudo tee /usr/share/keyrings/rabbitmq.E495BB49CC4BBE5B.gpg > /dev/null
## Community mirror of Cloudsmith: RabbitMQ repository
curl -1sLf https://github.com/rabbitmq/signing-keys/releases/download/3.0/cloudsmith.rabbitmq-server.9F4587F226208342.key | sudo gpg --dearmor | sudo tee /usr/share/keyrings/rabbitmq.9F4587F226208342.gpg > /dev/null

## Add apt repositories maintained by Team RabbitMQ
sudo tee /etc/apt/sources.list.d/rabbitmq.list <<EOF
## Provides modern Erlang/OTP releases
##
deb [arch=amd64 signed-by=/usr/share/keyrings/rabbitmq.E495BB49CC4BBE5B.gpg] https://ppa1.rabbitmq.com/rabbitmq/rabbitmq-erlang/deb/ubuntu noble main
deb-src [signed-by=/usr/share/keyrings/rabbitmq.E495BB49CC4BBE5B.gpg] https://ppa1.rabbitmq.com/rabbitmq/rabbitmq-erlang/deb/ubuntu noble main

# another mirror for redundancy
deb [arch=amd64 signed-by=/usr/share/keyrings/rabbitmq.E495BB49CC4BBE5B.gpg] https://ppa2.rabbitmq.com/rabbitmq/rabbitmq-erlang/deb/ubuntu noble main
deb-src [signed-by=/usr/share/keyrings/rabbitmq.E495BB49CC4BBE5B.gpg] https://ppa2.rabbitmq.com/rabbitmq/rabbitmq-erlang/deb/ubuntu noble main

## Provides RabbitMQ
##
deb [arch=amd64 signed-by=/usr/share/keyrings/rabbitmq.9F4587F226208342.gpg] https://ppa1.rabbitmq.com/rabbitmq/rabbitmq-server/deb/ubuntu noble main
deb-src [signed-by=/usr/share/keyrings/rabbitmq.9F4587F226208342.gpg] https://ppa1.rabbitmq.com/rabbitmq/rabbitmq-server/deb/ubuntu noble main

# another mirror for redundancy
deb [arch=amd64 signed-by=/usr/share/keyrings/rabbitmq.9F4587F226208342.gpg] https://ppa2.rabbitmq.com/rabbitmq/rabbitmq-server/deb/ubuntu noble main
deb-src [signed-by=/usr/share/keyrings/rabbitmq.9F4587F226208342.gpg] https://ppa2.rabbitmq.com/rabbitmq/rabbitmq-server/deb/ubuntu noble main
EOF

## RabbitMQ Start
sudo apt-get install apt-transport-https
sudo apt-get install curl gnupg apt-transport-https -y

## Team RabbitMQ's signing key
curl -1sLf "https://keys.openpgp.org/vks/v1/by-fingerprint/0A9AF2115F4687BD29803A206B73A36E6026DFCA" | sudo gpg --dearmor | sudo tee /usr/share/keyrings/com.rabbitmq.team.gpg > /dev/null
sudo tee /etc/apt/sources.list.d/rabbitmq.list <<EOF
## Modern Erlang/OTP releases
##
deb [arch=amd64 signed-by=/usr/share/keyrings/com.rabbitmq.team.gpg] https://deb1.rabbitmq.com/rabbitmq-erlang/ubuntu/noble noble main
deb [arch=amd64 signed-by=/usr/share/keyrings/com.rabbitmq.team.gpg] https://deb2.rabbitmq.com/rabbitmq-erlang/ubuntu/noble noble main

## Provides modern RabbitMQ releases
##
deb [arch=amd64 signed-by=/usr/share/keyrings/com.rabbitmq.team.gpg] https://deb1.rabbitmq.com/rabbitmq-server/ubuntu/noble noble main
deb [arch=amd64 signed-by=/usr/share/keyrings/com.rabbitmq.team.gpg] https://deb2.rabbitmq.com/rabbitmq-server/ubuntu/noble noble main
EOF

## Update package indices
sudo apt-get update -y

## Install Erlang packages
sudo apt-get install -y erlang-base \
                        erlang-asn1 erlang-crypto erlang-eldap erlang-ftp erlang-inets \
                        erlang-mnesia erlang-os-mon erlang-parsetools erlang-public-key \
                        erlang-runtime-tools erlang-snmp erlang-ssl \
                        erlang-syntax-tools erlang-tftp erlang-tools erlang-xmerl
## Install rabbitmq-server and its dependencies
sudo apt-get install rabbitmq-server=4.1.3-1 -y --fix-missing
## RabbitMQ End

rabbitmq集群相关配置

代码方面

默认情况下创建队列的时候rabbitmq x-queue-type值为classic,但是classic不支持Raft协议。需要在创建队列的时候把类型设置为quorum。这里需要注意如果是从classic改为quorum类型的时候如果之前业务里使用到了队列的优先级参数需要调整，classic支持10级，但是quorunm只有两级一种是普通一种是高优先级（1-4为普通；5-9高）这里说的是给 BasicProperties中Priority赋的值

MQ配置方面

修改erlang的cookie文件

查看主服务器上cookie的值

shell 复制代码

cat /var/lib/rabbitmq/.erlang.cookie

把备服务器和从服务器的cookie的值修改为主服务器上的值

vi /var/lib/rabbitmq/.erlang.cookie
加入集群

shell 复制代码

rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@master
rabbitmqctl start_app

开启mqtt服务

shell 复制代码

systemctl rabbitmq-plugins enable rabbitmq_mqtt
ststemctl rabbitmq-plugins enable rabbitmq_management

集群运维常用命令

查看队列的Leader所在位置

shell 复制代码

 rabbitmq-queues quorum_status 队列名称

集群重新平衡

shell 复制代码

rabbitmq-queues rebalance quorum

查看你集群状态

shell 复制代码

sudo rabbitmqctl cluster_status

查看队列所属节点信息

shell 复制代码

sudo rabbitmqctl list_queues name type pid | grep quorum

集群问题解决

只有三台机器组成集群时，当某台机器上的rabbitmq挂掉之后，如果它是某个队列的leader，那它挂掉这个队列就无法正常使用了

1.1 执行以下命令查看是那台机器挂掉了

shell 复制代码

rabbitmq-diagnostics cluster_status

1.2 查看队列的leader处于那个节点上

shell 复制代码

 rabbitmq-queues quorum_status 队列名

1.3 执行手动转移 Leader

shell 复制代码

sudo  rabbitmq-queues rebalance quorum --vhost-pattern ".*" --queue-pattern ".*"

执行这个命令耗时会比较长，执行完成后结果如下图：

etcd安装

注意这个插件三台机器上都需要安装，最好不要放到数据库服务器上

下载

在github上下载：Releases · etcd-io/etcd

如果你的Linux是x64版本的就可以直接选择amd64

安装

解压压缩包

shell 复制代码

tar -xvf etcd-v3.6.10-linux-amd64.tar.gz

安装到系统

shell 复制代码

cd etcd-v3.6.10-linux-amd64

shell 复制代码

sudo cp etcd etcdctl /usr/local/bin/

验证是否安装成功

shell 复制代码

etcd --version
etcdctl version

配置

创建配置文件

shell 复制代码

vi /etc/etcd/etcd.conf

1.6机器上配置如下

context 复制代码

name: 'etcd1'
data-dir: /var/lib/etcd
initial-cluster-token: 'patroni-cluster'
initial-cluster: 'etcd1=http://192.168.1.6:2380,etcd2=http://192.168.1.89:2380,etcd3=http://192.168.1.83:2380'
initial-cluster-state: 'new'
listen-peer-urls: http://192.168.1.6:2380
listen-client-urls: http://192.168.1.6:2379,http://localhost:2379
advertise-client-urls: http://192.168.1.6:2379
initial-advertise-peer-urls: http://192.168.1.6:2380

1.89机器上配置如下

shell 复制代码

name: 'etcd2'
data-dir: /var/lib/etcd
initial-cluster-token: 'patroni-cluster'
initial-cluster: 'etcd1=http://192.168.1.6:2380,etcd2=http://192.168.1.89:2380,etcd3=http://192.168.1.83:2380'
initial-cluster-state: 'new'
listen-peer-urls: http://192.168.1.89:2380
listen-client-urls: http://192.168.1.89:2379,http://localhost:2379
advertise-client-urls: http://192.168.1.89:2379
initial-advertise-peer-urls: http://192.168.1.89:2380

1.83机器上配置如下

shell 复制代码

name: 'etcd3'
data-dir: /var/lib/etcd
initial-cluster-token: 'patroni-cluster'
initial-cluster: 'etcd1=http://192.168.1.6:2380,etcd2=http://192.168.1.89:2380,etcd3=http://192.168.1.83:2380'
initial-cluster-state: 'new'
listen-peer-urls: http://192.168.1.83:2380
listen-client-urls: http://192.168.1.83:2379,http://localhost:2379
advertise-client-urls: http://192.168.1.83:2379
initial-advertise-peer-urls: http://192.168.1.83:2380

其他命令

重载 systemd 配置

shell 复制代码

sudo systemctl daemon-reload

设置为开机自启

shell 复制代码

sudo systemctl enable --now etcd

查看服务状态

shell 复制代码

sudo systemctl status etcd

节点子检查

shell 复制代码

etcdctl endpoint health

查看集群详情

shell 复制代码

etcdctl member list

所有节点健康检查

shell 复制代码

etcdctl endpoint health --cluster -w table

PgBouncer安装

在安装了Postgresql的机器上都要安装

安装

执行以下命令进行安装

shell 复制代码

sudo apt update && sudo apt install -y pgbouncer

配置

修改配置文件

shell 复制代码

vi /etc/pgbouncer/pgbouncer.ini

主要需要注意的配置有以下几个，有些默认是被注释掉的需要恢复。

需要注意连接databases配置端口和dbname不要配置错了

context 复制代码

[databases]
# 将所有数据库请求路由到本地的 PostgreSQL 服务
* = host=localhost port=5433 dbname=fsmgw user=postgres

[pgbouncer]
# 监听设置
listen_addr = 0.0.0.0
listen_port = 6432
# Unix socket 设置（可选）
unix_socket_dir = /var/run/postgresql

# 认证设置
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# 连接池设置
pool_mode = transaction      # 推荐使用事务级池化
default_pool_size = 50       # 每个数据库的默认连接池大小
max_client_conn = 500        # PgBouncer 能接受的最大客户端连接数

# 日志设置
logfile = /var/log/postgresql/pgbouncer.log
pidfile = /var/run/postgresql/pgbouncer.pid

# 管理设置
admin_users = postgres
stats_users = stats, postgres

配置用户认证

shell 复制代码

vi /etc/pgbouncer/userlist.txt

加入以下内容

context 复制代码

"postgres" "数据库连接密码"

服务相关命令

启动服务并查看服务是否正常

shell 复制代码

 sudo systemctl restart    pgbouncer && systemctl status pgbouncer

cron安装

安装插件,注意以下命令的18是指pgsql的版本

shell 复制代码

sudo apt-get install postgresql-18-cron

这里单机版本下一步是直接去配置pgsql的配置文件，但是我们这里是集群安装，后续在安装Patroni的配置中进行设置

patroni安装

执行以下命令进行安装,注意以下命令的18是指pgsql的版本

shell 复制代码

sudo apt install postgresql-18 patroni

主服务器配置（1.6）

创建配置文件

shell 复制代码

vi  /etc/patroni/config.yml

把以下内容复制到文件中

context 复制代码

scope: "18-clusteredData"
namespace: "/postgresql-common/"
name: fsmgw-clustered-1-6

etcd3:
  hosts:
    - 192.168.1.6:2379
    - 192.168.1.89:2379
    - 192.168.1.83:2379   # 修正为 83 的 2379，删掉 2382

tags:
  failover_priority: 100

restapi:
  listen: 192.168.1.6:8008
  connect_address: 192.168.1.6:8008

bootstrap:
  method: pg_createcluster
  pg_createcluster:
    command: /usr/share/patroni/pg_createcluster_patroni
  dcs:   # 注意缩进：dcs 是 bootstrap 的子项
    ttl: 30
    loop_wait: 10
    retry_timeout: 5
    maximum_lag_on_failover: 1048576
    check_timeline: true
    postgresql:
      use_pg_rewind: true
      remove_data_directory_on_rewind_failure: true
      remove_data_directory_on_diverged_timelines: true
      use_slots: true
      pg_hba:
        - local   all   all   peer
        - host    all   all   127.0.0.1/32   scram-sha-256
        - host    all   all   ::1/128        scram-sha-256
        - host    replication   replicator   192.168.1.0/24   scram-sha-256
        - host    replication   replicator   127.0.0.1/32     scram-sha-256
        - host    all   all   192.168.1.0/24   scram-sha-256
      parameters:
        wal_log_hints: 'on'
        hot_standby_feedback: 'on'
        max_replication_slots: 10
        max_wal_senders: 10
        wal_keep_size: 1GB
        password_encryption: scram-sha-256
        shared_preload_libraries: pg_cron
        cron.database_name: fsmgw
        cron.use_background_workers: 'on'
    recovery_conf:   # 正确位置：dcs 的子项
      standby_mode: 'on'
    slots:           # 正确位置：dcs 的子项，与 postgresql、recovery_conf 同级
      fsmgw-clustered-1-6:
        type: physical
      fsmgw-clustered-1-89:
        type: physical
      fsmgw-clustered-1-83:
        type: physical

postgresql:
  create_replica_method:
    - pg_clonecluster
  pg_clonecluster:
    command: /usr/share/patroni/pg_clonecluster_patroni
  listen: 0.0.0.0:5433
  connect_address: 192.168.1.6:5433
  use_unix_socket: true
  data_dir: /data2/postgresql/18/clusteredData
  bin_dir: /usr/lib/postgresql/18/bin
  config_dir: /etc/postgresql/18/clusteredData
  pgpass: /var/lib/postgresql/18-clusteredData.pgpass
  authentication:
    replication:
      username: "replicator"
      password: "copyP@ssw0rd"
    superuser:
      username: "postgres"
      password: "maiyuan!2#"
    rewind:
      username: "rewind_user"
      password: "maiyuan!2#"
  parameters:
    unix_socket_directories: '/var/run/postgresql/'
    logging_collector: 'on'
    log_directory: '/var/log/postgresql'
    log_filename: 'postgresql-18-clusteredData.log'
    shared_buffers: '8GB' # 这里根据机器实际情况设置
    effective_cache_size: '24GB' # 这里根据机器实际情况设置

设置文件权限

shell 复制代码

 sudo chown postgres:postgres /etc/patroni/config.yml

从服务器配置（1.89）

从服务器的配置只有config.yml文件内容的差异，其他步骤按照上面主库的操作步骤执行即可

context 复制代码

scope: "18-clusteredData"
namespace: "/postgresql-common/"
name: fsmgw-clustered-1-89

etcd3:
  hosts:
    - 192.168.1.6:2379
    - 192.168.1.89:2379
    - 192.168.1.83:2379   

tags:
  failover_priority: 90

restapi:
  listen: 192.168.1.89:8008
  connect_address: 192.168.1.89:8008

bootstrap:
  method: pg_createcluster
  pg_createcluster:
    command: /usr/share/patroni/pg_createcluster_patroni
  dcs:   # 注意缩进：dcs 是 bootstrap 的子项
    ttl: 30
    loop_wait: 10
    retry_timeout: 5
    maximum_lag_on_failover: 1048576
    check_timeline: true
    postgresql:
      use_pg_rewind: true
      remove_data_directory_on_rewind_failure: true
      remove_data_directory_on_diverged_timelines: true
      use_slots: true
      pg_hba:
        - local   all   all   peer
        - host    all   all   127.0.0.1/32   scram-sha-256
        - host    all   all   ::1/128        scram-sha-256
        - host    replication   replicator   192.168.1.0/24   scram-sha-256
        - host    replication   replicator   127.0.0.1/32     scram-sha-256
        - host    all   all   192.168.1.0/24   scram-sha-256
      parameters:
        wal_log_hints: 'on'
        hot_standby_feedback: 'on'
        max_replication_slots: 10
        max_wal_senders: 10
        wal_keep_size: 1GB
        password_encryption: scram-sha-256
        shared_preload_libraries: pg_cron
        cron.database_name: fsmgw
        cron.use_background_workers: 'on'
    recovery_conf:   # 正确位置：dcs 的子项
      standby_mode: 'on'
    slots:           # 正确位置：dcs 的子项，与 postgresql、recovery_conf 同级
      fsmgw-clustered-1-6:
        type: physical
      fsmgw-clustered-1-89:
        type: physical
      fsmgw-clustered-1-83:
        type: physical

postgresql:
  create_replica_method:
    - pg_clonecluster
  pg_clonecluster:
    command: /usr/share/patroni/pg_clonecluster_patroni
  listen: 0.0.0.0:5433
  connect_address: 192.168.1.89:5433
  use_unix_socket: true
  data_dir: /data2/postgresql/18/clusteredData
  bin_dir: /usr/lib/postgresql/18/bin
  config_dir: /etc/postgresql/18/clusteredData
  pgpass: /var/lib/postgresql/18-clusteredData.pgpass
  authentication:
    replication:
      username: "replicator"
      password: "copyP@ssw0rd"
    superuser:
      username: "postgres"
      password: "maiyuan!2#"
    rewind:
      username: "rewind_user"
      password: "maiyuan!2#"
  parameters:
    unix_socket_directories: '/var/run/postgresql/'
    logging_collector: 'on'
    log_directory: '/var/log/postgresql'
    log_filename: 'postgresql-18-clusteredData.log'
    shared_buffers: '2GB' # 这里根据机器实际情况设置
    effective_cache_size: '10GB' # 这里根据机器实际情况设置

备服务器配置（1.83）

备服务器的配置只有config.yml文件内容的差异，其他步骤按照上面主库的操作步骤执行即可

context 复制代码

scope: "18-clusteredData"
namespace: "/postgresql-common/"
name: fsmgw-clustered-1-83

tags:
  failover_priority: 2  # 设置一个正数但不高于主库的值
etcd3:
  hosts:
    - 192.168.1.6:2379
    - 192.168.1.89:2379
    - 192.168.1.83:2379

restapi:
  listen: 192.168.1.83:8008
  connect_address: 192.168.1.83:8008

bootstrap:
  method: initdb
  initdb:
    - encoding: UTF8
    - data-checksums
    #pg_createcluster:
    # command: /usr/share/patroni/pg_createcluster_patroni

  dcs:
    # 与主库保持一致（内容相同，是全局配置）
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    check_timeline: true
    postgresql:
      use_pg_rewind: true
      remove_data_directory_on_rewind_failure: true
      remove_data_directory_on_diverged_timelines: true
      use_slots: true
      pg_hba:
        - local   all   all   peer
        - host    all   all   127.0.0.1/32   scram-sha-256
        - host    all   all   ::1/128        scram-sha-256
        - host    replication   replicator   192.168.1.0/24   scram-sha-256
        - host    replication   replicator   127.0.0.1/32     scram-sha-256
        - host    all   all   192.168.1.0/24   scram-sha-256
      parameters:
        max_replication_slots: 10
        max_wal_senders: 10
        wal_keep_size: 1GB
        password_encryption: scram-sha-256

postgresql:
  create_replica_method:
    - pg_clonecluster
  pg_clonecluster:
    command: /usr/share/patroni/pg_clonecluster_patroni

  listen: 0.0.0.0:5433
  connect_address: 192.168.1.83:5433
  use_unix_socket: true

  data_dir: /data/postgresql/18/clusteredData   # 备库也应使用独立路径，但建议与主库路径相同（如果磁盘布局一致）
  bin_dir: /usr/lib/postgresql/18/bin
  config_dir: /etc/postgresql/18/clusteredData
  pgpass: /var/lib/postgresql/18-clusteredData.pgpass

  authentication:
    replication:
      username: "replicator"
      password: "copyP@ssw0rd"
    superuser:
      username: "postgres"
      password: "maiyuan!2#"
    rewind:
      username: "rewind_user"
      password: "maiyuan!2#"

  parameters:
    unix_socket_directories: '/var/run/postgresql/'
    logging_collector: 'on'
    log_directory: '/var/log/postgresql'
    log_filename: 'postgresql-18-clusteredData.log'
    shared_buffers: '2GB' # 这里根据机器实际情况设置
    effective_cache_size: '10GB' # 这里根据机器实际情况设置

配置cron

这里我们执行以下命令进行配置

shell 复制代码

export EDITOR=vim
patronictl -c /etc/patroni/config.yml edit-config 18-clusteredData

修改配置文件内容如下

context 复制代码

check_timeline: true
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    cron.database_name: fsmgw #cron作用的表
    cron.use_background_workers: 'on' #开启后台线程执行
    max_replication_slots: 10
    max_wal_senders: 10
    password_encryption: scram-sha-256
    shared_preload_libraries: pg_cron
    wal_keep_size: 1GB
  pg_hba:
  - local   all             all                               peer
  - host    all             all             127.0.0.1/32      scram-sha-256
  - host    all             all             ::1/128           scram-sha-256
  - host    replication     replicator      192.168.1.0/24    scram-sha-256
  - host    all             all             192.168.1.0/24    scram-sha-256
  recovery_conf:
    standby_mode: 'on'
  remove_data_directory_on_diverged_timelines: true
  remove_data_directory_on_rewind_failure: true
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30

在主库机器上执行以下语句创建插件

sql 复制代码

sudo -u postgres psql -h 127.0.0.1 -p 5433 -d fsmgw -c "CREATE EXTENSION pg_cron;"

验证是否配置成功，在集群中的任意一个机器上执行以下sql

sql 复制代码

SELECT datname FROM pg_database 
WHERE datname = (
    SELECT setting FROM pg_settings 
    WHERE name = 'cron.database_name'
);

服务相关命令

启动服务

shell 复制代码

systemctl start patroni

检查服务状态

shell 复制代码

journalctl -u patroni -f

查看集群状态

shell 复制代码

 patronictl -c /etc/patroni/config.yml list

强制指定leader

shell 复制代码

sudo patronictl -c /etc/patroni/config.yml failover --candidate fsmgw-clustered-1-6 --force

补充知识：

集群里的每个集群上patroni都有自己的配置文件在：/etc/patroni/config.yml路径下。

vi /etc/patroni/config.yml 和patronictl -c /etc/patroni/config.yml edit-config看到的内容是不同的，因为前者查看的是 Patroni 进程本身启动时读取的本地静态配置文件 ；后者查看或修改的是存储在 DCS (Distributed Configuration Store，分布式配置存储) 中的集群动态配置 。本地静态配置文件的优先级要比分布式集群中存储的配置文件优先级高 （本地配置>集群配置）

HAproxy安装

为了稳定可以配置多个HAProxy实现高可用，防止一台机器上的HAProxy服务挂了数据库直接访问不了的尴尬场景,这里我们在主库(1.6)和备库(1.89)上都安装一个HAproxy

在主库和备库上都按照以下步骤执行，配置文件不需要改动。

安装

执行以下命令进行安装

shell 复制代码

sudo apt install -y haproxy

配置

shell 复制代码

sudo vi /etc/haproxy/haproxy.cfg

按照以下格式进行配置

context 复制代码

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
        ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
        ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
        ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
        log global
        mode tcp                # 使用 TCP 模式，这是 PostgreSQL 协议所必需的
        option tcplog
        retries 3
        timeout connect 10s
        timeout client 1m
        timeout server 1m
        # 确保长连接不会因空闲而被断开
        timeout client 1h
        timeout server 1h
        #errorfile 400 /etc/haproxy/errors/400.http
        #errorfile 403 /etc/haproxy/errors/403.http
        #errorfile 408 /etc/haproxy/errors/408.http
        #errorfile 500 /etc/haproxy/errors/500.http
        #errorfile 502 /etc/haproxy/errors/502.http
        #errorfile 503 /etc/haproxy/errors/503.http
        #errorfile 504 /etc/haproxy/errors/504.http


# 用于管理界面的后端配置
frontend stats
        mode http
        bind *:7000
        stats enable
        stats uri /stats
        stats refresh 10s
        stats admin if LOCALHOST

# ---------------------------------------------------------------------
# 1.pgsql 读写端口：只路由到主库
# ---------------------------------------------------------------------
listen postgres_primary
    bind *:5431
    mode tcp
    option httpchk
    http-check send meth GET uri /primary ver HTTP/1.0
    http-check expect status 200
    default-server inter 5s fall 2 rise 3 on-marked-down shutdown-sessions
    server pg-node1 192.168.1.6:6432 check port 8008
    server pg-node2 192.168.1.89:6432 check port 8008

# ---------------------------------------------------------------------
# 2.pgsql 只读端口：路由到所有健康副本 (Replicas)
# ---------------------------------------------------------------------
listen postgres_replica
    bind *:5430
    mode tcp
    balance roundrobin
    option httpchk
    http-check send meth GET uri /replica ver HTTP/1.0
    http-check expect status 200
    default-server inter 5s fall 2 rise 3 on-marked-down shutdown-sessions
    server pg-node1 192.168.1.6:6432 check port 8008
    server pg-node2 192.168.1.89:6432 check port 8008

# ---------------------------------------------------------------------
# RabbitMQ 负载均衡核心配置
# ---------------------------------------------------------------------
# RabbitMQ MQTT 代理配置
listen rabbitmq_mqtt
    bind *:1880                # 1. 监听 MQTT 默认端口
    mode tcp                   # 2. 使用四层代理模式
    balance roundrobin         # 3. 采用轮询负载均衡算法
    # 4. 后端服务器列表，配置你的RabbitMQ节点
    server rabbit-node-1 192.168.1.6:1883 check inter 5000 rise 2 fall 3
    server rabbit-node-2 192.168.1.89:1883 check inter 5000 rise 2 fall 3
    server rabbit-node-3 192.168.1.83:1883 check inter 5000 rise 2 fall 3

服务相关命令

启动服务

shell 复制代码

systemctl start haproxy

验证是否监听相关端口

shell 复制代码

 sudo netstat -tulpn | grep 5431

刷新配置文件

shell 复制代码

sudo systemctl reload haproxy

验证是否正常运行，可前往以下页面查看集群状态

http://192.168.1.100:7000/stats

keepalived安装

它是为了保障Haproxy高可用的，所以在安装了haproxy的机器上都要安装一个keepalived.

这里我们在主库(1.6)和备库(1.89)上都安装一个。

安装

执行以下命令进行安装

shell 复制代码

sudo apt install keepalived

配置

创建健康检查脚本

shell 复制代码

vi /etc/keepalived/check_haproxy.sh

将以下内容复制到检查脚本中

context 复制代码

#!/bin/bash
if pgrep haproxy > /dev/null; then
    exit 0
fi
# 尝试重启
systemctl start haproxy
sleep 2
if pgrep haproxy > /dev/null; then
    exit 0
else
    # 重启失败，停止 keepalived 释放 VIP
    systemctl stop keepalived
    exit 1
fi

授予脚本执行权限

shell 复制代码

chmod +x /etc/keepalived/check_haproxy.sh

主服务器配置（1.6）

创建配置文件

shell 复制代码

vim /etc/keepalived/keepalived.conf

将以下内容复制到配置文件中

注意以下配置文件中'eno1'为网卡名称，需要改为自己机器上的网卡名称

context 复制代码

global_defs {
    router_id RABBIT1
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    weight 2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eno1
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_haproxy
    }
}

从服务器配置（1.89）

都需要创建配置文件和健康检查脚本，以及给健康检查脚本进行授权。这里不多赘述，不同的地方只有keepalived.conf文件的内容，因此下面只展示keepalived.conf文件内容

context 复制代码

global_defs {
    router_id RABBIT2
    script_user root
    enable_script_security
}

vrrp_script chk_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass maiyuan!2#
    }
    virtual_ipaddress {
        192.168.1.100/24 dev ens160
    }
    track_script {
        chk_haproxy
    }
}

服务相关命令

启动服务

shell 复制代码

systemctl start keepalived

设置开机自启

shell 复制代码

systemctl enable keepalived

查看服务状态

shell 复制代码

systemctl status keepalived

验证 VIP 是否已绑定

shell 复制代码

ip addr show eno1 | grep 192.168.1.100

当配置完成之后就可以通过我们指定的ip(192.168.1.100)去访问mq和pgsql的服务了