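Many articles about Patroni are available online, but most of them only walk through a manual installation and stop once the environment is up (the same goes for discussions in WeChat groups). They say nothing about how Patroni's parameters are used, how failover works, or how to exercise it in practice. This article sets up a Patroni cluster on Ubuntu 20 in about a minute using an automated install, then tests and verifies failover hands-on: it simulates realistic failures and uses the resulting failover logs to analyze how failover is carried out and how well it works.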
0. Patroni cluster status
ubuntu11 is the primary; ubuntu12 and ubuntu13 are replicas. Every test below starts from this topology: ubuntu11 as primary, ubuntu12 and ubuntu13 as replicas.
root@ubuntu11:/usr/local/patroni_install# patronictl -c /usr/local/pgsql17/patroni/patroni.yml list
+ Cluster: pg_cluster_wy_prod (7641831362696373502) ---------+----+-------------+-----+------------+-----+
| Member | Host | Role | State | TL | Receive LSN | Lag | Replay LSN | Lag |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
| ubuntu11 | 192.168.152.121:9000 | Leader | running | 6 | | | | |
| ubuntu12 | 192.168.152.122:9000 | Sync Standby | streaming | 6 | 0/F000348 | 0 | 0/F000348 | 0 |
| ubuntu13 | 192.168.152.123:9000 | Replica | streaming | 6 | 0/F000348 | 0 | 0/F000348 | 0 |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
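When this status check is repeated across many test runs, it can help to read the table programmatically. The following is a minimal sketch, not part of Patroni: `parse_patronictl_list` and `leader_of` are hypothetical helper names, and the parser assumes the default pretty-printed table layout shown above.

```python
SAMPLE = """\
+ Cluster: pg_cluster_wy_prod (7641831362696373502) ---------+----+-------------+-----+------------+-----+
| Member | Host | Role | State | TL | Receive LSN | Lag | Replay LSN | Lag |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
| ubuntu11 | 192.168.152.121:9000 | Leader | running | 6 | | | | |
| ubuntu12 | 192.168.152.122:9000 | Sync Standby | streaming | 6 | 0/F000348 | 0 | 0/F000348 | 0 |
| ubuntu13 | 192.168.152.123:9000 | Replica | streaming | 6 | 0/F000348 | 0 | 0/F000348 | 0 |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
"""


def parse_patronictl_list(text):
    """Return one dict per member row of a pretty-printed `patronictl list` table."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the cluster title and the +---+ border lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Only the uniquely named columns are kept ("Lag" appears twice).
    wanted = ("Member", "Host", "Role", "State", "TL")
    index = {name: header.index(name) for name in wanted}
    return [{name: row[i] for name, i in index.items()} for row in body]


def leader_of(members):
    """Return the member name with role Leader, or None if there is not exactly one."""
    leaders = [m["Member"] for m in members if m["Role"] == "Leader"]
    return leaders[0] if len(leaders) == 1 else None


members = parse_patronictl_list(SAMPLE)
print(leader_of(members))
```

Returning None when zero or several leaders are listed is a deliberate choice in this sketch, so that a failover test can treat anything other than exactly one leader as a failed check.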
Because this is a test setup, automatic restart of the patroni systemd service is disabled (Restart=no):
postgres@ubuntu11:~$ cat /etc/systemd/system/patroni.service
[Unit]
Description=Patroni
After=network.target etcd.service
Wants=etcd.service
[Service]
Type=simple
User=postgres
Group=postgres
Environment="TZ=Asia/Shanghai"
Environment="PYTHONUNBUFFERED=1"
ExecStart=/usr/local/bin/patroni /usr/local/pgsql17/patroni/patroni.yml
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -TERM $MAINPID
#Restart=on-failure
Restart=no
RestartSec=10
TimeoutStartSec=120
TimeoutStopSec=60
LimitNOFILE=65536
StandardOutput=null
StandardError=journal
SyslogIdentifier=patroni
[Install]
WantedBy=multi-user.target
postgres@ubuntu11:~$
Patroni log on the primary ubuntu11. The cluster state is polled every 10 seconds; the polling interval is set by the loop_wait parameter.
2026-05-21 08:46:37,142 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-21 08:46:47,145 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-21 08:46:57,190 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-21 08:47:07,148 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-21 08:47:17,145 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-21 08:47:27,189 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-21 08:47:37,153 INFO: no action. I am (ubuntu11), the leader with the lock
Patroni log on replica ubuntu12
2026-05-21 08:46:47,628 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:46:57,628 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:07,671 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:17,632 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:27,675 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:37,139 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:47,233 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:57,681 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
Patroni log on replica ubuntu13
2026-05-21 08:46:57,643 INFO: no action. I am (ubuntu13), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:07,696 INFO: no action. I am (ubuntu13), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:17,647 INFO: no action. I am (ubuntu13), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:27,688 INFO: no action. I am (ubuntu13), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:37,155 INFO: no action. I am (ubuntu13), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:47,255 INFO: no action. I am (ubuntu13), a secondary, and following a leader (ubuntu11)
2026-05-21 08:47:57,696 INFO: no action. I am (ubuntu13), a secondary, and following a leader (ubuntu11)
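The 10-second cadence can be confirmed from the timestamps instead of by eye. Below is a minimal sketch, assuming the log format shown above (a 23-character timestamp at the start of each line); `heartbeat_intervals` is a hypothetical helper name, not a Patroni function.

```python
from datetime import datetime

LOG = [
    "2026-05-21 08:46:37,142 INFO: no action. I am (ubuntu11), the leader with the lock",
    "2026-05-21 08:46:47,145 INFO: no action. I am (ubuntu11), the leader with the lock",
    "2026-05-21 08:46:57,190 INFO: no action. I am (ubuntu11), the leader with the lock",
    "2026-05-21 08:47:07,148 INFO: no action. I am (ubuntu11), the leader with the lock",
]


def heartbeat_intervals(log_lines):
    """Return the seconds elapsed between consecutive 'no action.' lines."""
    stamps = [
        datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        for line in log_lines
        if "no action." in line
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


for seconds in heartbeat_intervals(LOG):
    print(f"{seconds:.3f}")
```

On the excerpt above every interval is within a few tens of milliseconds of 10 seconds, which matches a loop_wait of 10.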
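1. Automatic failover scenario 1: the primary's OS is healthy, but the patroni service fails
The primary is healthy; stop the patroni service on it to simulate a primary failure.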
root@ubuntu11:/usr/local/patroni_install# patronictl -c /usr/local/pgsql17/patroni/patroni.yml list
+ Cluster: pg_cluster_wy_prod (7641831362696373502) ---------+----+-------------+-----+------------+-----+
| Member | Host | Role | State | TL | Receive LSN | Lag | Replay LSN | Lag |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
| ubuntu11 | 192.168.152.121:9000 | Leader | running | 6 | | | | |
| ubuntu12 | 192.168.152.122:9000 | Sync Standby | streaming | 6 | 0/F000348 | 0 | 0/F000348 | 0 |
| ubuntu13 | 192.168.152.123:9000 | Replica | streaming | 6 | 0/F000348 | 0 | 0/F000348 | 0 |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
root@ubuntu11:/usr/local/patroni_install# systemctl stop patroni
root@ubuntu11:/usr/local/patroni_install#
Cluster status as seen from replica ubuntu12; the original primary is now stopped.
root@ubuntu12:/usr/local/patroni_install# patronictl -c /usr/local/pgsql17/patroni/patroni.yml list
+ Cluster: pg_cluster_wy_prod (7641831362696373502) ---------+----+-------------+-----+------------+-----+
| Member | Host | Role | State | TL | Receive LSN | Lag | Replay LSN | Lag |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
| ubuntu11 | 192.168.152.121:9000 | Replica | stopped | | unknown | | unknown | |
| ubuntu12 | 192.168.152.122:9000 | Leader | running | 7 | | | | |
| ubuntu13 | 192.168.152.123:9000 | Sync Standby | streaming | 7 | 0/100001A8 | 0 | 0/100001A8 | 0 |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
The former replica ubuntu12 becomes the new primary. Its log:
......
2026-05-21 08:55:27,680 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:55:37,723 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:55:47,683 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:55:57,681 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-21 08:56:06,109 WARNING: Request failed to ubuntu11: GET http://192.168.152.121:8008/patroni (HTTPConnectionPool(host='192.168.152.121', port=8008): Max retries exceeded with url: /patroni (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))))
2026-05-21 08:56:06,169 INFO: promoted self to leader by acquiring session lock
2026-05-21 08:56:06,169 INFO: Lock owner: ubuntu12; I am ubuntu12
2026-05-21 08:56:06,172 INFO: updated leader lock during promote
server promoting
2026-05-21 08:56:07,185 INFO: Lock owner: ubuntu12; I am ubuntu12
2026-05-21 08:56:07,195 INFO: Assigning synchronous standby status to ['ubuntu13']
server signaled
2026-05-21 08:56:09,324 INFO: Synchronous standby status assigned to ['ubuntu13']
2026-05-21 08:56:09,369 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-21 08:56:17,196 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-21 08:56:27,187 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-21 08:56:37,242 INFO: no action. I am (ubuntu12), the leader with the lock
......
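Failover flow in this scenario:
Manually stop Patroni on the primary ubuntu11 to simulate a failure ---------> Patroni on ubuntu11 actively deletes the leader key in the DCS ---------> replica ubuntu12, on its next loop_wait polling cycle, detects that the DCS has no leader ---------> it acquires the lock and becomes Leader ---------> it promotes the local PostgreSQL to primary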
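The key point of this flow is that a clean stop releases the leader key immediately, so a replica does not have to wait for the lease to run out. The toy model below sketches that difference. It is an in-memory stand-in written for illustration only: `ToyLeaderKey` and its methods are hypothetical names, and nothing here is Patroni or etcd code.

```python
class ToyLeaderKey:
    """In-memory stand-in for a DCS leader key that carries a lease (illustration only)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = None

    def current_owner(self, now):
        # An expired lease means nobody holds the key any more.
        if self.owner is not None and now >= self.expires_at:
            self.owner = None
        return self.owner

    def acquire_or_renew(self, member, now):
        # Succeeds only if the key is free or already held by this member.
        if self.current_owner(now) in (None, member):
            self.owner = member
            self.expires_at = now + self.ttl
            return True
        return False

    def release(self, member, now):
        # What a clean shutdown does: give the key up without waiting for expiry.
        if self.current_owner(now) == member:
            self.owner = None


# Clean stop: the key is released at t=5, so a replica can take it at t=6.
clean = ToyLeaderKey(ttl=30)
clean.acquire_or_renew("ubuntu11", now=0)
clean.release("ubuntu11", now=5)
print(clean.acquire_or_renew("ubuntu12", now=6))

# Crash: nobody releases the key, so the replica is refused until the lease ends.
crashed = ToyLeaderKey(ttl=30)
crashed.acquire_or_renew("ubuntu11", now=0)
print(crashed.acquire_or_renew("ubuntu12", now=20))
print(crashed.acquire_or_renew("ubuntu12", now=30))
```

The times are plain numbers in seconds so that both cases can be replayed deterministically.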
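2. Automatic failover scenario 2: the primary server loses power
ubuntu11 is powered off with the hypervisor's "Power Off" action (not "Shut Down Guest") to simulate a sudden power loss. Understanding this scenario requires a solid grasp of the lease lifetime, i.e. the ttl parameter (default 30 seconds).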

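Cluster status as seen from ubuntu13. The first listing was taken before the power cut (ubuntu11 is Leader); the second was taken after failover (ubuntu12 is the new Leader).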
root@ubuntu13:/usr/local/patroni_install# patronictl -c /usr/local/pgsql17/patroni/patroni.yml list
+ Cluster: pg_cluster_wy_prod (7642212398676862997) ----+----+-------------+-----+------------+-----+------------------------+
| Member | Host | Role | State | TL | Receive LSN | Lag | Replay LSN | Lag | Tags |
+----------+----------------------+---------+-----------+----+-------------+-----+------------+-----+------------------------+
| ubuntu11 | 192.168.152.121:9000 | Leader | running | 10 | | | | | failover_priority: 100 |
| ubuntu12 | 192.168.152.122:9000 | Replica | streaming | 10 | 0/C000000 | 0 | 0/C000358 | 0 | failover_priority: 80 |
| ubuntu13 | 192.168.152.123:9000 | Replica | streaming | 10 | 0/C000380 | 0 | 0/C000380 | 0 | failover_priority: 60 |
+----------+----------------------+---------+-----------+----+-------------+-----+------------+-----+------------------------+
root@ubuntu13:/usr/local/patroni_install#
root@ubuntu13:/usr/local/patroni_install# patronictl -c /usr/local/pgsql17/patroni/patroni.yml list
+ Cluster: pg_cluster_wy_prod (7642212398676862997) ---------+----+-------------+-----+------------+-----+-----------------------+
| Member | Host | Role | State | TL | Receive LSN | Lag | Replay LSN | Lag | Tags |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+-----------------------+
| ubuntu12 | 192.168.152.122:9000 | Leader | running | 11 | | | | | failover_priority: 80 |
| ubuntu13 | 192.168.152.123:9000 | Sync Standby | streaming | 11 | 0/C000688 | 0 | 0/C000688 | 0 | failover_priority: 60 |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+-----------------------+
root@ubuntu13:/usr/local/patroni_install#
Patroni log on the new primary ubuntu12
2026-05-22 13:53:59,956 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 13:54:10,451 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 13:54:20,026 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 13:54:30,458 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 13:54:40,461 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 13:54:50,499 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 13:55:00,456 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 13:55:10,457 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)######power to ubuntu11 is cut at about this point
2026-05-22 13:55:20,498 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)######why does ubuntu11 still look healthy here? Because its lease has not expired yet
2026-05-22 13:55:32,106 WARNING: Request failed to ubuntu11: GET http://192.168.152.121:8008/patroni (HTTPConnectionPool(host='192.168.152.121', port=8008): Max retries exceeded with url: /patroni (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f893566d880>, 'Connection to 192.168.152.121 timed out. (connect timeout=2)')))
2026-05-22 13:55:32,114 INFO: promoted self to leader by acquiring session lock
2026-05-22 13:55:32,114 INFO: Lock owner: ubuntu12; I am ubuntu12
2026-05-22 13:55:32,115 INFO: updated leader lock during promote
2026-05-22 13:55:33,137 INFO: Lock owner: ubuntu12; I am ubuntu12
2026-05-22 13:55:33,193 INFO: Assigning synchronous standby status to ['ubuntu13']
2026-05-22 13:55:35,316 INFO: Synchronous standby status assigned to ['ubuntu13']
2026-05-22 13:55:35,322 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-22 13:55:35,377 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-22 13:55:45,324 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-22 13:55:55,367 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-22 13:56:05,329 INFO: no action. I am (ubuntu12), the leader with the lock
Checking the role of the new primary with psql
postgres=#
postgres=#
postgres=# select now(),pg_is_in_recovery(); ###########################power to the original primary ubuntu11 is cut here; the query is then repeated continuously
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:10.849473+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:11.665724+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
------------------------------+-------------------
2026-05-22 13:55:12.32947+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:13.017149+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:13.799962+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:14.902866+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:15.672331+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:16.435662+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:17.070935+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:17.816528+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:18.546785+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:19.393943+08 | t
(1 row)
#......intermediate output omitted......
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:29.759037+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:30.417626+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:31.089604+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:31.775459+08 | t
(1 row)
postgres=# select now(),pg_is_in_recovery(); ###########################about 22 seconds later, the new primary is actually promoted
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:32.400935+08 | f
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
------------------------------+-------------------
2026-05-22 13:55:33.27183+08 | f
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:33.950342+08 | f
(1 row)
postgres=# select now(),pg_is_in_recovery();
now | pg_is_in_recovery
-------------------------------+-------------------
2026-05-22 13:55:34.758651+08 | f
(1 row)
postgres=#
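The moment of promotion can also be picked out of the samples mechanically. The sketch below assumes the (now, pg_is_in_recovery) pairs have already been copied out of the psql output, with the +08 timezone suffix dropped; `first_promoted_sample` and `seconds_between` are hypothetical helper names.

```python
from datetime import datetime

# A few of the samples shown above, in query order.
SAMPLES = [
    ("2026-05-22 13:55:10.849473", "t"),
    ("2026-05-22 13:55:19.393943", "t"),
    ("2026-05-22 13:55:31.775459", "t"),
    ("2026-05-22 13:55:32.400935", "f"),
    ("2026-05-22 13:55:33.27183", "f"),
]


def first_promoted_sample(samples):
    """Return the timestamp of the first 'f' sample that follows a 't' sample, or None."""
    seen_recovery = False
    for timestamp, in_recovery in samples:
        if in_recovery == "t":
            seen_recovery = True
        elif in_recovery == "f" and seen_recovery:
            return timestamp
    return None


def seconds_between(start, end):
    """Seconds elapsed between two psql now() timestamps without a timezone suffix."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


promoted_at = first_promoted_sample(SAMPLES)
print(promoted_at)
print(round(seconds_between(SAMPLES[0][0], promoted_at), 2))
```

Measured from the first sample to the first f result, the gap is about 21.55 seconds, which is the "22 seconds" noted in the transcript.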
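With the logs above in hand, ttl can be understood by following the timeline:
1. 2026-05-22 13:55:10,457: as noted above, power to the original primary ubuntu11 is cut at about this time.
2. 2026-05-22 13:55:20,498: the Patroni log on ubuntu12 still reports ubuntu11 as healthy.
3. 2026-05-22 13:55:32.400935: querying the new primary shows that pg_is_in_recovery has only now changed to f, i.e. the failover has succeeded.
Do the logs contradict what was actually done? ubuntu11 lost power at 13:55:10, yet it still looked healthy at the 13:55:20 check, and the new primary did not start working until 13:55:32.
They do not. A few seconds before the power cut at 13:55:10 (at most one loop_wait earlier; loop_wait defaults to 10 seconds), Patroni on ubuntu11 renewed the leader key in etcd. Each renewal extends the lease by 30 seconds from that moment, so the lease did not expire until about 13:55:30. At 13:55:20, therefore, Patroni on the replica that would take over still saw an unexpired leader key.
Only in the next check cycle, at about 13:55:30, did it find the original primary unavailable and log "2026-05-22 13:55:32,106 WARNING: Request failed to ubuntu11: GET http://192.168.152.121:8008/patroni (HTTPConnectionPool(host='192.168.152.121', port=8008): Max retries exceeded with url: /patroni (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f893566d880>, 'Connection to 192.168.152.121 timed out. (connect timeout=2)')))". Why is this entry stamped 13:55:32, 2 seconds after 13:55:30? Because connect timeout=2.
This is what the ttl parameter in Patroni really means.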
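Failover flow in this scenario:
Cut power to ubuntu11 to simulate a primary failure ---------> 10 seconds later, ubuntu11's leader lease is still valid (although ubuntu11 is already down) ---------> another 10 seconds later, ubuntu11's leader lease is still valid (ubuntu11 is still down) ---------> another 10 seconds later, ubuntu12 detects that the leader lease has expired ---------> it takes over the leader key and promotes the local PostgreSQL to primary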
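The waiting time in this flow can be bounded with simple arithmetic. The sketch below assumes the leader renews its lease once per loop_wait, so that at the moment of the crash the lease was last renewed between 0 and loop_wait seconds earlier. It ignores REST API timeouts and the time PostgreSQL needs to promote; `lease_expiry_window` is a hypothetical helper name.

```python
def lease_expiry_window(ttl, loop_wait):
    """Return (shortest, longest) seconds from a leader crash until its lease expires.

    Assumes one renewal per loop_wait: the lease is at most loop_wait seconds old
    when the crash happens, so between ttl - loop_wait and ttl seconds remain.
    """
    if loop_wait <= 0 or ttl <= loop_wait:
        raise ValueError("expected 0 < loop_wait < ttl")
    return (ttl - loop_wait, ttl)


# Values used in this article: ttl=30, loop_wait=10.
print(lease_expiry_window(30, 10))  # (20, 30)
```

The roughly 22 seconds observed above (about 20 seconds of remaining lease plus the 2-second connect timeout) sits at the lower edge of this window, bearing in mind that the exact power-off moment is only known approximately.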
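So, to make Patroni failover more responsive, reduce ttl (the lease time of the leader key) and also reduce loop_wait (so that the leader key is checked more often). This speeds up both failure detection and failover. Be aware, however, that lowering these two parameters can cause unexpected failovers during network jitter.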
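When lowering these values, the three timing parameters have to stay consistent with each other. The Patroni documentation gives the rule loop_wait + 2 * retry_timeout <= ttl; retry_timeout is not discussed in this article and is brought in here only because the rule needs it. A minimal sketch of that check follows, with `timing_is_consistent` as a hypothetical helper name.

```python
def timing_is_consistent(ttl, loop_wait, retry_timeout):
    """Check the documented Patroni rule: loop_wait + 2 * retry_timeout <= ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


# Patroni defaults: ttl=30, loop_wait=10, retry_timeout=10.
print(timing_is_consistent(ttl=30, loop_wait=10, retry_timeout=10))  # True

# Halving ttl alone breaks the rule; the other two values must shrink with it.
print(timing_is_consistent(ttl=15, loop_wait=10, retry_timeout=10))  # False
print(timing_is_consistent(ttl=15, loop_wait=5, retry_timeout=5))    # True
```

Running a candidate set of values through a check like this before applying it helps avoid a configuration in which the leader cannot reliably renew its lease in time.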
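3. Automatic failover scenario 3: the primary is cut off by a network partition
Use iptables on both replicas to drop all traffic to and from ubuntu11 (192.168.152.121), for example iptables -A OUTPUT -d 192.168.152.121 -j DROP.
Replica 1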
root@ubuntu12:/usr/local/patroni_install# sudo iptables -A OUTPUT -d 192.168.152.121 -j DROP
root@ubuntu12:/usr/local/patroni_install# sudo iptables -A INPUT -s 192.168.152.121 -j DROP
root@ubuntu12:/usr/local/patroni_install#
Replica 2
root@ubuntu13:/usr/local/patroni_install# sudo iptables -A OUTPUT -d 192.168.152.121 -j DROP
root@ubuntu13:/usr/local/patroni_install# sudo iptables -A INPUT -s 192.168.152.121 -j DROP
root@ubuntu13:/usr/local/patroni_install#
The network partition is now in place.
ubuntu12 has successfully taken over as primary:
2026-05-22 14:47:38,980 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 14:47:49,402 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 14:47:58,941 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 14:48:09,491 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 14:48:19,441 INFO: no action. I am (ubuntu12), a secondary, and following a leader (ubuntu11)
2026-05-22 14:48:31,104 WARNING: Request failed to ubuntu11: GET http://192.168.152.121:8008/patroni (HTTPConnectionPool(host='192.168.152.121', port=8008): Max retries exceeded with url: /patroni (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f8988fdb1f0>, 'Connection to 192.168.152.121 timed out. (connect timeout=2)')))
2026-05-22 14:48:31,183 INFO: promoted self to leader by acquiring session lock
2026-05-22 14:48:31,187 INFO: Lock owner: ubuntu12; I am ubuntu12
2026-05-22 14:48:31,239 INFO: updated leader lock during promote
2026-05-22 14:48:32,206 INFO: Lock owner: ubuntu12; I am ubuntu12
2026-05-22 14:48:32,214 INFO: Assigning synchronous standby status to ['ubuntu13']
2026-05-22 14:48:34,337 INFO: Synchronous standby status assigned to ['ubuntu13']
2026-05-22 14:48:34,385 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-22 14:48:42,245 INFO: no action. I am (ubuntu12), the leader with the lock
2026-05-22 14:48:52,256 INFO: no action. I am (ubuntu12), the leader with the lock
At this point the original primary can no longer reach ubuntu12 and ubuntu13. Note these log entries:
2026-05-22 14:48:17,257 ERROR: Error communicating with DCS
2026-05-22 14:48:17,258 INFO: demoting self because DCS is not accessible and I was a leader
2026-05-22 14:48:17,258 INFO: Demoting self (offline)
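After the network partition, the original primary automatically demotes itself to read-only. It also keeps retrying its connection to the etcd members on ubuntu12 and ubuntu13 (the log keeps growing and is not shown in full), so that it can rejoin the cluster automatically once the network recovers.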
2026-05-22 14:47:38,896 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-22 14:47:38,963 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-22 14:47:48,912 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-22 14:47:58,948 INFO: no action. I am (ubuntu11), the leader with the lock
2026-05-22 14:48:08,903 INFO: Lock owner: ubuntu11; I am ubuntu11
2026-05-22 14:48:12,244 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=3.3332171243333355)")
2026-05-22 14:48:12,244 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:48:12,244 INFO: Retrying on http://192.168.152.123:2379
2026-05-22 14:48:13,913 ERROR: Request to server http://192.168.152.123:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.123', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04b2b0>, 'Connection to 192.168.152.123 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:48:13,913 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:48:13,913 INFO: Retrying on http://192.168.152.122:2379
2026-05-22 14:48:15,583 ERROR: Request to server http://192.168.152.122:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04b2e0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:48:15,583 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:48:17,253 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04b520>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:48:17,256 ERROR: watchprefix failed: ProtocolError('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2026-05-22 14:48:17,257 ERROR: Error communicating with DCS
2026-05-22 14:48:17,258 INFO: demoting self because DCS is not accessible and I was a leader
2026-05-22 14:48:17,258 INFO: Demoting self (offline)
2026-05-22 14:48:18,932 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04b970>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:00,355 INFO: postmaster pid=3525
2026-05-22 14:49:01,400 INFO: demoted self because DCS is not accessible and I was a leader
2026-05-22 14:49:01,403 WARNING: Loop time exceeded, rescheduling immediately.
2026-05-22 14:49:01,405 INFO: Lock owner: ubuntu11; I am ubuntu11
2026-05-22 14:49:01,405 INFO: establishing a new patroni heartbeat connection to postgres
2026-05-22 14:49:04,749 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=3.33254870033331)")
2026-05-22 14:49:04,749 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:04,749 INFO: Retrying on http://192.168.152.123:2379
2026-05-22 14:49:06,419 ERROR: Request to server http://192.168.152.123:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.123', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04bfa0>, 'Connection to 192.168.152.123 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:06,419 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:06,419 INFO: Retrying on http://192.168.152.122:2379
2026-05-22 14:49:08,089 ERROR: Request to server http://192.168.152.122:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e451c0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:08,089 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:09,758 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c53fa90>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:11,417 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=1.00350682653891)")
2026-05-22 14:49:11,417 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:13,086 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04ba60>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:13,088 ERROR: Error communicating with DCS
2026-05-22 14:49:13,088 INFO: DCS is not accessible
2026-05-22 14:49:13,088 ERROR: watchprefix failed: ProtocolError('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2026-05-22 14:49:13,090 WARNING: Loop time exceeded, rescheduling immediately.
2026-05-22 14:49:14,757 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04b820>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:14,763 INFO: Lock owner: ubuntu11; I am ubuntu11
2026-05-22 14:49:18,103 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=3.3331819403333234)")
2026-05-22 14:49:18,103 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:18,103 INFO: Retrying on http://192.168.152.122:2379
2026-05-22 14:49:19,773 ERROR: Request to server http://192.168.152.122:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e450d0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:19,773 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:19,773 INFO: Retrying on http://192.168.152.123:2379
2026-05-22 14:49:21,441 ERROR: Request to server http://192.168.152.123:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.123', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e45370>, 'Connection to 192.168.152.123 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:21,442 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:23,112 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e45670>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:23,114 ERROR: Error communicating with DCS
2026-05-22 14:49:23,114 INFO: DCS is not accessible
2026-05-22 14:49:23,114 ERROR: watchprefix failed: ProtocolError('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2026-05-22 14:49:23,115 WARNING: Loop time exceeded, rescheduling immediately.
2026-05-22 14:49:24,784 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e45b50>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:24,790 INFO: Lock owner: ubuntu11; I am ubuntu11
2026-05-22 14:49:28,128 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=3.332673626333379)")
2026-05-22 14:49:28,128 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:28,128 INFO: Retrying on http://192.168.152.123:2379
2026-05-22 14:49:29,799 ERROR: Request to server http://192.168.152.123:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.123', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e5c220>, 'Connection to 192.168.152.123 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:29,799 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:29,799 INFO: Retrying on http://192.168.152.122:2379
2026-05-22 14:49:31,469 ERROR: Request to server http://192.168.152.122:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c41f460>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:31,469 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:33,138 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c53f9a0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:34,794 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=1.1904628130222932)")
2026-05-22 14:49:34,794 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:36,464 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04b0a0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:36,468 ERROR: watchprefix failed: ProtocolError('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2026-05-22 14:49:36,468 ERROR: Error communicating with DCS
2026-05-22 14:49:36,468 INFO: DCS is not accessible
2026-05-22 14:49:36,470 WARNING: Loop time exceeded, rescheduling immediately.
2026-05-22 14:49:38,140 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04b8b0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:38,145 INFO: Lock owner: ubuntu11; I am ubuntu11
2026-05-22 14:49:41,485 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=3.33319548833335)")
2026-05-22 14:49:41,485 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:41,485 INFO: Retrying on http://192.168.152.122:2379
2026-05-22 14:49:43,153 ERROR: Request to server http://192.168.152.122:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e45fa0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:43,153 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:43,153 INFO: Retrying on http://192.168.152.123:2379
2026-05-22 14:49:44,823 ERROR: Request to server http://192.168.152.123:2379 failed: MaxRetryError("HTTPConnectionPool(host='192.168.152.123', port=2379): Max retries exceeded with url: /v3/lease/keepalive (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e459d0>, 'Connection to 192.168.152.123 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:44,823 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:46,493 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e45ac0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:48,150 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=1.4432267216546961)")
2026-05-22 14:49:48,150 INFO: Reconnection allowed, looking for another server.
2026-05-22 14:49:49,819 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e45100>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 14:49:49,821 ERROR: Error communicating with DCS
2026-05-22 14:49:49,821 ERROR: watchprefix failed: ProtocolError('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2026-05-22 14:49:49,821 INFO: DCS is not accessible
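The timeout values in these errors are worth a second look. A connect timeout of 1.6666666666666667 s and a read timeout of about 3.33 s are what you would get if a 10-second retry budget were split evenly across the three etcd endpoints, with half of each share reserved for connecting. The sketch below only reproduces that arithmetic. The 10-second budget (Patroni's default `retry_timeout`), the halving rule, and the 1-second floor are assumptions inferred from the log rather than taken from this cluster's configuration, and the function name is hypothetical.

```python
def per_endpoint_timeouts(retry_budget_s, endpoints):
    """Split a retry budget across DCS endpoints (illustrative, not Patroni's code)."""
    if endpoints <= 0:
        raise ValueError("need at least one endpoint")
    per_node = retry_budget_s / endpoints   # total time allowed per endpoint
    connect = max(1.0, per_node / 2.0)      # assumed: half for connecting, at least 1 s
    return per_node, connect


# Assumed inputs: a 10 s budget and the three etcd endpoints seen in the log.
per_node, connect = per_endpoint_timeouts(10.0, 3)
print(f"per-endpoint total={per_node:.4f}s connect={connect:.4f}s")
```

If this reading is right, a fully isolated node spends roughly one retry budget cycling through all endpoints before it reports "DCS is not accessible".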
Remove the network partition on ubuntu12:
root@ubuntu12:/usr/local/patroni_install# sudo iptables -D OUTPUT -d 192.168.152.121 -j DROP
root@ubuntu12:/usr/local/patroni_install# sudo iptables -D INPUT -s 192.168.152.121 -j DROP
Remove the network partition on ubuntu13 as well:
root@ubuntu13:/usr/local/patroni_install# sudo iptables -D OUTPUT -d 192.168.152.121 -j DROP
root@ubuntu13:/usr/local/patroni_install# sudo iptables -D INPUT -s 192.168.152.121 -j DROP
After the partition is removed, the previously isolated ubuntu11 automatically rejoins the cluster as a replica:
root@ubuntu12:/usr/local/patroni_install# patronictl -c /usr/local/pgsql17/patroni/patroni.yml list
+ Cluster: pg_cluster_wy_prod (7642589780522937440) ---------+----+-------------+-----+------------+-----+
| Member | Host | Role | State | TL | Receive LSN | Lag | Replay LSN | Lag |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
| ubuntu11 | 192.168.152.121:9000 | Replica | streaming | 5 | 0/60043F0 | 0 | 0/60043F0 | 0 |
| ubuntu12 | 192.168.152.122:9000 | Leader | running | 5 | | | | |
| ubuntu13 | 192.168.152.123:9000 | Sync Standby | streaming | 5 | 0/60043F0 | 0 | 0/60043F0 | 0 |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
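When a test like this is repeated many times, reading the table by eye gets tedious. As a minimal sketch, the table printed above can be parsed and checked in Python. The helper names are hypothetical, and the parser assumes the plain-text layout shown here (pipe-separated columns, one header row, border lines starting with `+`). If your `patronictl` version offers a structured output format, that would be a more robust input than this text parsing.

```python
TABLE = """\
+ Cluster: pg_cluster_wy_prod (7642589780522937440) ---------+----+-------------+-----+------------+-----+
| Member | Host | Role | State | TL | Receive LSN | Lag | Replay LSN | Lag |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
| ubuntu11 | 192.168.152.121:9000 | Replica | streaming | 5 | 0/60043F0 | 0 | 0/60043F0 | 0 |
| ubuntu12 | 192.168.152.122:9000 | Leader | running | 5 | | | | |
| ubuntu13 | 192.168.152.123:9000 | Sync Standby | streaming | 5 | 0/60043F0 | 0 | 0/60043F0 | 0 |
+----------+----------------------+--------------+-----------+----+-------------+-----+------------+-----+
"""


def parse_members(text):
    """Return one dict per member row; border lines and the header row are skipped."""
    rows = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    members = []
    for line in rows[1:]:  # rows[0] is the header
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # "Lag" appears twice in the header, so cells are read by position.
        members.append({
            "name": cells[0],
            "role": cells[2],
            "state": cells[3],
            "timeline": int(cells[4]),
            "replay_lag": cells[8],
        })
    return members


def cluster_is_healthy(members):
    """One running leader, all members on one timeline, replicas streaming with zero lag."""
    leaders = [m for m in members if m["role"] == "Leader"]
    replicas = [m for m in members if m["role"] != "Leader"]
    same_timeline = len({m["timeline"] for m in members}) == 1
    replicas_ok = all(m["state"] == "streaming" and m["replay_lag"] == "0" for m in replicas)
    return len(leaders) == 1 and leaders[0]["state"] == "running" and same_timeline and replicas_ok


members = parse_members(TABLE)
print([m["name"] for m in members if m["role"] == "Leader"], cluster_is_healthy(members))
```

The health rule is deliberately strict (zero replay lag); on a busy cluster a small non-zero lag is normal, so the threshold would need loosening.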
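Log on ubuntu11 after the partition is removed. Once etcd is reachable again, ubuntu11 fails to update the leader lock, finds that ubuntu12 now holds it, detects that its local timeline (4) is behind the primary's timeline (5), runs pg_rewind against ubuntu12, and then restarts as a secondary: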
2026-05-22 15:02:40,971 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e5c4c0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 15:02:42,626 ERROR: Request to server http://192.168.152.121:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='192.168.152.121', port=2379): Read timed out. (read timeout=1.3500901345867078)")
2026-05-22 15:02:42,626 INFO: Reconnection allowed, looking for another server.
2026-05-22 15:02:44,295 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2027e5cb80>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 15:02:44,296 ERROR: Error communicating with DCS
2026-05-22 15:02:44,297 ERROR: watchprefix failed: ProtocolError('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
2026-05-22 15:02:44,297 INFO: DCS is not accessible
2026-05-22 15:02:44,298 WARNING: Loop time exceeded, rescheduling immediately.
2026-05-22 15:02:45,967 ERROR: Failed to get list of machines from http://192.168.152.122:2379/v3: MaxRetryError("HTTPConnectionPool(host='192.168.152.122', port=2379): Max retries exceeded with url: /v3/cluster/member/list (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f205c04b3a0>, 'Connection to 192.168.152.122 timed out. (connect timeout=1.6666666666666667)'))")
2026-05-22 15:02:45,975 INFO: Lock owner: ubuntu11; I am ubuntu11
2026-05-22 15:02:47,985 ERROR: failed to update leader lock
2026-05-22 15:02:47,994 INFO: not promoting because failed to update leader lock in DCS
2026-05-22 15:02:47,994 INFO: Lock owner: ubuntu12; I am ubuntu11
2026-05-22 15:02:48,001 INFO: Local timeline=4 lsn=0/70000A0
2026-05-22 15:02:48,027 INFO: primary_timeline=5
2026-05-22 15:02:48,030 INFO: primary: history=1 0/504F580 no recovery target specified
2 0/6003D20 no recovery target specified
3 0/6003EC0 no recovery target specified
4 0/6004148 no recovery target specified
2026-05-22 15:02:48,049 INFO: running pg_rewind from ubuntu12
2026-05-22 15:02:49,312 INFO: running pg_rewind from dbname=postgres user=rewind_user host=192.168.152.122 port=9000 target_session_attrs=read-write
2026-05-22 15:02:50,305 INFO: pg_rewind exit code=0
2026-05-22 15:02:50,305 INFO: stdout=
2026-05-22 15:02:50,305 INFO: stderr=pg_rewind: servers diverged at WAL location 0/6004148 on timeline 4
pg_rewind: rewinding from last common checkpoint at 0/6004038 on timeline 4
pg_rewind: Done!
2026-05-22 15:02:50,307 WARNING: Postgresql is not running.
2026-05-22 15:02:50,308 INFO: Lock owner: ubuntu12; I am ubuntu11
2026-05-22 15:02:50,319 INFO: pg_controldata:
pg_control version number: 1700
Catalog version number: 202406281
Database system identifier: 7642589780522937440
Database cluster state: in archive recovery
pg_control last modified: Fri May 22 15:02:50 2026
Latest checkpoint location: 0/6004340
Latest checkpoint's REDO location: 0/60042E8
Latest checkpoint's REDO WAL file: 000000050000000000000006
Latest checkpoint's TimeLineID: 5
Latest checkpoint's PrevTimeLineID: 5
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:762
Latest checkpoint's NextOID: 24576
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 731
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 762
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Fri May 22 14:53:31 2026
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/60043F0
Min recovery ending loc's timeline: 5
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 3587dd0ff212f7ed05a16aa24aa1d6a6f187f55d5d6a2e158ce45327a7e55005
2026-05-22 15:02:50,320 INFO: Lock owner: ubuntu12; I am ubuntu11
2026-05-22 15:02:50,367 INFO: starting as a secondary
2026-05-22 15:02:50,368 INFO: closed patroni connections to postgres
2026-05-22 15:02:50,738 INFO: postmaster pid=3952
2026-05-22 15:02:51,774 INFO: Lock owner: ubuntu12; I am ubuntu11
2026-05-22 15:02:51,774 INFO: establishing a new patroni heartbeat connection to postgres
2026-05-22 15:02:51,795 INFO: Local timeline=5 lsn=0/60043F0
2026-05-22 15:02:51,803 INFO: primary_timeline=5
2026-05-22 15:02:51,812 INFO: no action. I am (ubuntu11), a secondary, and following a leader (ubuntu12)
2026-05-22 15:02:52,281 INFO: no action. I am (ubuntu11), a secondary, and following a leader (ubuntu12)
2026-05-22 15:03:02,819 INFO: no action. I am (ubuntu11), a secondary, and following a leader (ubuntu12)
2026-05-22 15:03:12,778 INFO: no action. I am (ubuntu11), a secondary, and following a leader (ubuntu12)
2026-05-22 15:03:22,777 INFO: no action. I am (ubuntu11), a secondary, and following a leader (ubuntu12)
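The rewind decision in this log can be followed by hand. ubuntu11 reports `Local timeline=4 lsn=0/70000A0`, while the new primary's history says timeline 4 ended at 0/6004148. ubuntu11's WAL extends past that switch point, so its data has diverged and pg_rewind is required, which matches the "servers diverged at WAL location 0/6004148 on timeline 4" message. The sketch below reproduces that comparison. It is a simplified illustration with hypothetical function names, not Patroni's actual rewind check.

```python
def parse_lsn(text):
    """Convert a PostgreSQL LSN such as '0/70000A0' to an integer byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def parse_history(text):
    """Map each timeline ID to the LSN at which that timeline ended."""
    switch_points = {}
    for line in text.strip().splitlines():
        timeline, lsn, _reason = line.split(None, 2)
        switch_points[int(timeline)] = parse_lsn(lsn)
    return switch_points


def needs_rewind(local_timeline, local_lsn, primary_timeline, history):
    """Simplified rule: rewind if local WAL goes past the point where its timeline ended."""
    if local_timeline >= primary_timeline:
        return False
    return parse_lsn(local_lsn) > history[local_timeline]


# Timeline history as printed in the ubuntu11 log.
HISTORY = """\
1 0/504F580 no recovery target specified
2 0/6003D20 no recovery target specified
3 0/6003EC0 no recovery target specified
4 0/6004148 no recovery target specified
"""

history = parse_history(HISTORY)
print(needs_rewind(4, "0/70000A0", 5, history))  # state before pg_rewind
print(needs_rewind(5, "0/60043F0", 5, history))  # state after the restart
```

With the values from the log, the first call reports that a rewind is needed and the second reports that it is not, since ubuntu11 is on timeline 5 after the restart.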
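4, Summary
This article stress-tested the high availability of a Patroni failover cluster with three realistic failure scenarios. In each scenario tested, Patroni handled the failure and kept the cluster available. The tests also verified in practice how the ttl and loop_wait parameters affect failover, which gave the author a deeper understanding of both parameters.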
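As a rough aid for reasoning about these parameters: Patroni's documentation asks that `loop_wait + 2 * retry_timeout <= ttl`, and the leader lock expires `ttl` seconds after its last renewal, so a replica cannot take over before that lock is gone. The sketch below checks the rule for a given set of values. `loop_wait=10` matches the 10-second polling seen in the logs, while the `retry_timeout` and `ttl` values are Patroni defaults assumed for illustration, not read from this cluster's configuration. The function name is hypothetical.

```python
def timing_rule_ok(loop_wait, retry_timeout, ttl):
    """Check the documented rule loop_wait + 2 * retry_timeout <= ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


# Assumed values: Patroni defaults, not this cluster's actual settings.
print(timing_rule_ok(loop_wait=10, retry_timeout=10, ttl=30))
# Lowering ttl alone to speed up failover breaks the rule.
print(timing_rule_ok(loop_wait=10, retry_timeout=10, ttl=20))
```

A shorter `ttl` makes failover start sooner but leaves less room for a slow DCS round trip, so the three values are best changed together.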