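A highly available PostgreSQL cluster can be built from PostgreSQL + etcd + Patroni + HAProxy + Keepalived. PostgreSQL is the database. Patroni monitors the local PostgreSQL instance and writes its information and state to etcd, which stores the cluster state, so Patroni together with etcd provides failover (automatic or manual). HAProxy provides read/write splitting and read load balancing through separate ports. Keepalived moves the VIP between nodes, which makes HAProxy itself highly available and guards against an HAProxy outage.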

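etcd is used to share information among the Patroni nodes. Each Patroni instance monitors its local PostgreSQL. If the primary fails, Patroni promotes a standby to be the new primary. Once a failed PostgreSQL instance has been recovered, it can rejoin the cluster automatically or manually.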

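Patroni is a template written in Python. Combined with a DCS (Distributed Configuration Store, for example ZooKeeper, etcd, or Consul), it lets you build a customized PostgreSQL high-availability solution. Patroni takes over starting and stopping PostgreSQL, monitors the local instance, and writes its information to the DCS. Primary and standby roles are decided by who holds the leader key: the Patroni instance that acquires the leader key is the primary, and all the others are standbys.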
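The leader key is held with a time-to-live, so the timing settings have to leave room for a node to retry before the key expires. The Patroni documentation gives the rule loop_wait + 2 * retry_timeout <= ttl. A minimal sketch that checks this rule for a set of values; the function name check_patroni_timing is hypothetical, and the example values (ttl 30, loop_wait 10, retry_timeout 10) are the ones used in this guide's Patroni configuration:

```shell
# Check the Patroni timing guideline: loop_wait + 2 * retry_timeout <= ttl.
# Arguments (all in seconds): ttl, loop_wait, retry_timeout.
# check_patroni_timing is an illustrative helper, not part of Patroni.
check_patroni_timing() {
    ttl=$1
    loop_wait=$2
    retry_timeout=$3
    needed=$((loop_wait + 2 * retry_timeout))
    if [ "$needed" -le "$ttl" ]; then
        echo "ok: $needed <= $ttl"
    else
        echo "too tight: $needed > $ttl"
        return 1
    fi
}
```

Calling check_patroni_timing 30 10 10 reports that the values used in this guide sit exactly on the limit.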

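Patroni is easy to use and feature-rich:
* Automatic failover and on-demand switchover
* One or more standby nodes
* Cascading replication
* Synchronous and asynchronous replication
* Automatic downgrade from synchronous to asynchronous replication when the synchronous standby fails (similar in effect to MySQL semi-synchronous replication, but smarter)
* Per-node control over whether a node takes part in leader election, whether it takes part in load balancing, and whether it may become a synchronous standby
* Automatic repair of a former primary with pg_rewind
* Several ways to initialize a cluster and rebuild standbys, including pg_basebackup and custom scripts for backup tools such as WAL-E, pgBackRest, and Barman
* Custom external callback scripts
* REST API
* Split-brain protection through watchdog
* Deployment in containerized environments such as Kubernetes and Docker
* Several common DCS (Distributed Configuration Store) backends for metadata, including etcd, ZooKeeper, Consul, and Kubernetes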

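In this architecture, PostgreSQL provides the data service, Patroni handles primary/standby switching, etcd provides consistent storage, HAProxy provides access routing, Keepalived provides high availability for the VIP, and watchdog provides node liveness checks and split-brain protection. Together these six components form an enterprise-grade highly available database cluster.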
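Access routing of this kind usually relies on Patroni's REST API: a node answers HTTP 200 on /primary only while it is the leader, and 200 on /replica only while it is a healthy standby. The following is a rough sketch of that decision, assuming the REST API listens on port 8008 as configured later in this guide; the helper names role_from_codes and probe_node are hypothetical.

```shell
# Map the HTTP status codes returned by Patroni's /primary and /replica
# endpoints to a role name. Any code other than 200 (including curl's
# "000" for a failed connection) counts as "not in that role".
role_from_codes() {
    primary_code=$1
    replica_code=$2
    if [ "$primary_code" = "200" ]; then
        echo primary
    elif [ "$replica_code" = "200" ]; then
        echo replica
    else
        echo unavailable
    fi
}

# Probe one node over HTTP. Port 8008 is an assumption taken from this
# guide's Patroni configuration (restapi.listen).
probe_node() {
    host=$1
    p=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "http://$host:8008/primary")
    r=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "http://$host:8008/replica")
    role_from_codes "$p" "$r"
}
```

For example, probe_node 192.168.24.11 prints the role of the first node once Patroni is running there.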

### **1. Environment preparation**

Software versions:

PostgreSQL: 14.6

Patroni: 3.1.1

etcd: 3.3.11

HAProxy: 1.5.18

Keepalived: 1.3.5

System layout:

| Host | IP | Interface | Components | OS version | Role |
|---------|---------------|-------|------------------------------------------------|------------------------------------------|-------------------------|
| pgtest1 | 192.168.24.11 | ens33 | PostgreSQL, Patroni, etcd, HAProxy, Keepalived | CentOS 7.9 (3.10.0-1160.88.1.el7.x86_64) | Primary node (MASTER) |
| pgtest2 | 192.168.24.12 | ens33 | PostgreSQL, Patroni, etcd, HAProxy, Keepalived | CentOS 7.9 (3.10.0-1160.88.1.el7.x86_64) | Standby node 1 (BACKUP) |
| pgtest3 | 192.168.24.13 | ens33 | PostgreSQL, Patroni, etcd, HAProxy, Keepalived | CentOS 7.9 (3.10.0-1160.88.1.el7.x86_64) | Standby node 2 (BACKUP) |
| VIP | 192.168.24.15 | ens33 (bound interface) | | | |
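Running etcd on three nodes matters because etcd needs a majority of its members to agree before it accepts a write. A small sketch of that arithmetic; the function names quorum and tolerated_failures are illustrative only:

```shell
# Majority needed for an etcd cluster of N members: floor(N / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority is still reachable.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}
```

With the three nodes planned here, quorum 3 gives 2 and tolerated_failures 3 gives 1, so the cluster state store survives the loss of any single node. A fourth member would not raise that tolerance.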

Disable the firewall (the blunt approach; opening only the required ports also works):

systemctl stop firewalld

systemctl disable firewalld
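If firewalld should stay enabled instead, the required ports can be opened one by one. The sketch below only prints the firewall-cmd commands so they can be reviewed before running. Ports 5432 (PostgreSQL), 8008 (Patroni REST API), and 2379 (etcd clients) appear in this guide; 2380 is assumed here as the default etcd peer port. The HAProxy ports and the VRRP traffic used by Keepalived are not covered, and firewall_rules is a hypothetical helper name.

```shell
# Print (do not execute) the firewall-cmd calls that open the given TCP ports
# permanently, followed by the reload that activates them.
firewall_rules() {
    for port in "$@"; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    echo "firewall-cmd --reload"
}

# Example port list: 2380 is an assumption (etcd's default peer port).
# firewall_rules 5432 8008 2379 2380
```

Piping the output to sh as root applies the rules, for example firewall_rules 5432 8008 2379 2380 | sh.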

Disable SELinux:

Edit /etc/selinux/config (for example with vi) and set SELINUX=disabled.
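The same change can be scripted so that it is applied identically on all three nodes. A minimal sketch, assuming GNU sed as shipped with CentOS; disable_selinux_in is a hypothetical helper name:

```shell
# Set SELINUX=disabled in the given SELinux config file.
# Defaults to /etc/selinux/config; other lines such as SELINUXTYPE are untouched.
disable_selinux_in() {
    cfg=${1:-/etc/selinux/config}
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

The file is only read at boot, so the setting takes effect after a reboot; running setenforce 0 switches the running system to permissive mode in the meantime.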

Configure passwordless sudo for the postgres user:

cat >>/etc/sudoers <<EOF
postgres ALL=(ALL) NOPASSWD: ALL
EOF

Update /etc/hosts:

cat >>/etc/hosts <<EOF
192.168.24.11 pgtest1
192.168.24.12 pgtest2
192.168.24.13 pgtest3
EOF
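Appending with a heredoc adds the entries again every time the step is repeated. A hedged alternative is to add each entry only when that exact line is missing; add_host_entry is a hypothetical helper name:

```shell
# Append "<ip> <name>" to a hosts file unless that exact line already exists.
# Usage: add_host_entry <file> <ip> <name>
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    if ! grep -qxF "$ip $name" "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
```

For example, add_host_entry /etc/hosts 192.168.24.11 pgtest1 can be run any number of times and leaves a single entry.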

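On all nodes, set the system time and make sure that time and time zone are consistent across the nodes; if possible, synchronize against a time server: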

systemctl start chronyd.service

Install the dependency packages:

yum install -y perl-ExtUtils-Embed readline zlib-devel pam-devel libxml2-devel libxslt-devel openldap-devel python-devel gcc-c++ openssl-devel cmake gcc* readline-devel zlib bison flex bison-devel flex-devel openssl openssl-devel

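### **2. Deploy the PostgreSQL cluster**

#### **2.1 Database installation**

-- Install the database software (on all three nodes)

-- Create the user and directories (as root)

useradd postgres

echo "postgres" | passwd --stdin postgres

mkdir -p /postgresql/{pg14,pgdata,arch,soft}

chown -R postgres:postgres /postgresql/

chmod -R 700 /postgresql/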

-- Build and install (as postgres)

tar zxvf postgresql-14.6.tar.gz

cd postgresql-14.6/

./configure --prefix=/postgresql/pg14

make world && make install-world

-- Configure environment variables (as postgres)

Add the following to /home/postgres/.bash_profile:

export LANG=en_US.UTF-8

export PGHOME=/postgresql/pg14

export PGDATA=/postgresql/pgdata

export LD_LIBRARY_PATH=$PGHOME/lib:$LD_LIBRARY_PATH

export PATH=$PGHOME/bin:$PATH

Then reload the profile:

source ~/.bash_profile

#### **2.2 Database configuration**

**Primary:**

-- Initialize the database

[postgres@pgtest1 ~]$ initdb -D $PGDATA

-- Append the following to the end of postgresql.conf

cat >> /postgresql/pgdata/postgresql.conf << "EOF"
listen_addresses = '*'
archive_mode = on
archive_command = 'cp %p /postgresql/arch/%f'
log_destination = 'csvlog'
logging_collector = on
EOF

-- Insert "host all all 0.0.0.0/0 scram-sha-256" into pg_hba.conf

sed -i '/^host[[:space:]]\+all[[:space:]]\+all[[:space:]]\+127.0.0.1\/32[[:space:]]\+trust/i\ host all all 0.0.0.0/0 scram-sha-256' /postgresql/pgdata/pg_hba.conf

-- Insert "host replication all 0.0.0.0/0 scram-sha-256" into pg_hba.conf

sed -i '/^host[[:space:]]\+replication[[:space:]]\+all[[:space:]]\+127.0.0.1\/32[[:space:]]\+trust/i\ host replication all 0.0.0.0/0 scram-sha-256' /postgresql/pgdata/pg_hba.conf

-- Start the database

[postgres@pgtest1 ~]$ pg_ctl start

-- Change the default password of the postgres user

[postgres@pgtest1 postgresql-14.6]$ psql
psql (14.6)
Type "help" for help.
postgres=# alter user postgres password 'postgres';
ALTER ROLE

**Standby 1:**

-- Clone standby 1 from the primary

[postgres@pgtest2 ~]$ pg_basebackup -Fp -Pv -Xs -R -D /postgresql/pgdata -h 192.168.24.11 -p 5432 -Upostgres

-- Start standby 1

[postgres@pgtest2 ~]$ pg_ctl start

**Standby 2:**

-- Clone standby 2 from the primary

[postgres@pgtest3 ~]$ pg_basebackup -Fp -Pv -Xs -R -D /postgresql/pgdata -h 192.168.24.11 -p 5432 -Upostgres

-- Start standby 2

[postgres@pgtest3 ~]$ pg_ctl start

#### **2.3 Primary and standby status**

**Primary status:**

[postgres@pgtest1 ~]$ pg_controldata |grep cluster
Database cluster state: in production

**Standby 1 status:**

[postgres@pgtest2 ~]$ pg_controldata |grep cluster
Database cluster state: in archive recovery

**Standby 2 status:**

[postgres@pgtest3 ~]$ pg_controldata |grep cluster
Database cluster state: in archive recovery

#### **2.4 Cluster status**

1. Check the pg_controldata output. The primary reports "in production" and a standby reports "in archive recovery":

[postgres@pgtest1 postgresql-14.6]$ pg_controldata |grep state
Database cluster state: in production
Database cluster state: in archive recovery

2. Query the pg_stat_replication view. It returns rows on the primary and no rows on a standby:

postgres=# select pid,state,client_addr,sync_priority,sync_state from pg_stat_replication;
  pid  |   state   |  client_addr  | sync_priority | sync_state
-------+-----------+---------------+---------------+------------
 84644 | streaming | 192.168.24.13 |             0 | async
 84638 | streaming | 192.168.24.12 |             0 | async
(2 rows)

3. Look at the WAL processes. A node showing walsender is the primary; a node showing walreceiver is a standby:

[postgres@pgtest1 ~]$ ps -ef|grep wal
postgres 84435 84430 0 20:05 ? 00:00:00 postgres: walwriter
postgres 84638 84430 0 20:08 ? 00:00:00 postgres: walsender postgres 192.168.24.12(50458) streaming 0/5000060
postgres 84644 84430 0 20:08 ? 00:00:00 postgres: walsender postgres 192.168.24.13(35594) streaming 0/5000060

4. Use the built-in function pg_is_in_recovery():

Primary:

[postgres@pgtest1 ~]$ psql -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 f
(1 row)

Standby 1:

[postgres@pgtest2 ~]$ psql -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)

Standby 2:

[postgres@pgtest3 ~]$ psql -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)

### **3. Deploy watchdog**

**Install and configure on all 3 nodes**

[root@pgtest1 etc]# modprobe softdog

[root@pgtest1 etc]# chown postgres:postgres /dev/watchdog

### **4. Deploy etcd**

**Install and configure on all 3 nodes**

#### **4.1 Add environment variables**

Edit the root user's .bash_profile and add the following variables:

cat >> ~/.bash_profile << […]

[…] > /etc/etcd/etcd.conf << […]

[…] > /etc/etcd/etcd.conf << […]

[…] > /etc/etcd/etcd.conf << […]

[…]:python2:g" /usr/bin/yum

[root@pgtest1 ~]# sed -i "s:\[…]:python2:g" /usr/libexec/urlgrabber-ext-down

#### **5.2 Install pip**

* Download get-pip.py (download on one node at a time, not on all nodes simultaneously)

# curl https://bootstrap.pypa.io/pip/3.6/get-pip.py -o get-pip.py

If the download is slow, open https://bootstrap.pypa.io/get-pip.py in a browser, copy the code into a text editor, and save the file as get-pip.py. Alternatively, copy get-pip.py to the other nodes with scp.

* Run the installer

-- Use the Tsinghua mirror

# python3 get-pip.py -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn

#### **5.3 Install Patroni**

pip3 install psycopg2-binary -i https://pypi.tuna.tsinghua.edu.cn/simple

pip3 install psycopg2==2.7.5 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip3 install cdiff -i https://pypi.tuna.tsinghua.edu.cn/simple

pip3 install "patroni[etcd,consul]==3.1.1" -i https://pypi.tuna.tsinghua.edu.cn/simple

#### **5.4 Check the installed Patroni version**

[root@pgtest1 etcd]# patroni --version
patroni 3.1.1

#### **5.5 Edit the Patroni configuration file**

[root@pgtest1 ]# mkdir -p /app/patroni

Create the file /app/patroni/patroni_config.yml.

* **pgtest1 configuration**

cat > /app/patroni/patroni_config.yml << EOF
scope: postgres_cluster
namespace: /service/
name: pgtest1

restapi:
  listen: 192.168.24.11:8008
  connect_address: 192.168.24.11:8008

etcd:
  host: 192.168.24.11:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: on
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        listen_addresses: "0.0.0.0"
        port: 5432
        wal_level: "replica"
        hot_standby: "on"
        wal_keep_segments: 1000
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
  initdb:
  - encoding: UTF8
  - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.24.11:5432
  data_dir: /postgresql/pgdata
  bin_dir: /postgresql/pg14/bin
  authentication:
    replication:
      username: postgres
      password: postgres
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: postgres
      password: postgres

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF

* **pgtest2 configuration**

cat > /app/patroni/patroni_config.yml << EOF
scope: postgres_cluster
namespace: /service/
name: pgtest2

restapi:
  listen: 192.168.24.12:8008
  connect_address: 192.168.24.12:8008

etcd:
  host: 192.168.24.12:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: on
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        listen_addresses: "0.0.0.0"
        port: 5432
        wal_level: "replica"
        hot_standby: "on"
        wal_keep_segments: 1000
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
  initdb:
  - encoding: UTF8
  - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.24.12:5432
  data_dir: /postgresql/pgdata
  bin_dir: /postgresql/pg14/bin
  authentication:
    replication:
      username: postgres
      password: postgres
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: postgres
      password: postgres

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF

* **pgtest3 configuration**

cat > /app/patroni/patroni_config.yml << EOF
scope: postgres_cluster
namespace: /service/
name: pgtest3

restapi:
  listen: 192.168.24.13:8008
  connect_address: 192.168.24.13:8008

etcd:
  host: 192.168.24.13:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: on
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        listen_addresses: "0.0.0.0"
        port: 5432
        wal_level: "replica"
        hot_standby: "on"
        wal_keep_segments: 1000
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
  initdb:
  - encoding: UTF8
  - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.24.13:5432
  data_dir: /postgresql/pgdata
  bin_dir: /postgresql/pg14/bin
  authentication:
    replication:
      username: postgres
      password: postgres
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: postgres
      password: postgres

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF

#### **5.6 Manage the Patroni service with systemd**

Run the same steps on all nodes:

cat >> /usr/lib/systemd/system/patroni.service << 'EOF'

[Unit]
Description=patroni - a high-availability PostgreSQL
Documentation=https://patroni.readthedocs.io/en/latest/index.html
After=syslog.target network.target etcd.service
Wants=network-online.target

[Service]
Type=simple
User=postgres
Group=postgres
PermissionsStartOnly=true
ExecStart=/usr/local/bin/patroni /app/patroni/patroni_config.yml
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
KillMode=process
KillSignal=SIGINT
Restart=on-abnormal
RestartSec=30s
TimeoutSec=0

[Install]
WantedBy=multi-user.target
EOF
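A unit file written through a heredoc is easy to damage, for example by losing a section header. As a quick sanity check before starting the service, the sketch below verifies that the three section headers systemd expects in this unit are present. The helper name check_unit_sections is hypothetical and the check is deliberately shallow.

```shell
# Verify that a systemd unit file contains the [Unit], [Service] and
# [Install] section headers, each on a line of its own.
check_unit_sections() {
    unit=$1
    for section in Unit Service Install; do
        if ! grep -qx "\[$section\]" "$unit"; then
            echo "missing [$section]"
            return 1
        fi
    done
    echo "ok"
}
```

For example, check_unit_sections /usr/lib/systemd/system/patroni.service should print ok. After editing a unit file, systemctl daemon-reload makes systemd re-read it, and systemctl enable patroni starts the service at boot.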

#### **5.7 Start Patroni**

Start Patroni on every node, one after another:

# chown -R postgres:postgres /app/patroni

# systemctl start patroni
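When the nodes are started one after another, it helps to wait until one node is healthy before moving on to the next. A minimal retry helper that could be used for this; wait_until is a hypothetical name, and the health probe in the usage note assumes the REST API address from this guide's configuration:

```shell
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: wait_until <max_tries> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep only when another attempt will follow.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        try=$((try + 1))
    done
    return 1
}
```

For example, wait_until 30 2 curl -sf http://192.168.24.11:8008/health waits up to about a minute for Patroni on pgtest1 to report that PostgreSQL is running.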

#### **5.8 Check Patroni status**

##### **5.8.1 Check the Patroni service status**

Primary node

PostgreSQL 14 + Patroni + etcd + HAProxy + Keepalived cluster deployment guide