手册
https://patroni.readthedocs.io/en/latest/index.html
patroni运维
1.列出节点信息
patronictl -c /etc/patroni.yml list
2.重做备库
reinit先是移除了整个data目录。然后选择正确的节点进行备份恢复。
patronictl -c /etc/patroni.yml reinit patnori-test
language
# patronictl -c /etc/patroni.yml reinit pgsql
+ Cluster: pgsql (6972099274779350082)+------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+---------------------+--------+---------+----+-----------+
| pgsql_node1 | 192.168.22.128:5432 | | running | 3 | 0 |
| pgsql_node2 | 192.168.22.129:5432 | Leader | running | 3 | |
| pgsql_node3 | 192.168.22.130:5432 | | running | 3 | 0 |
+-------------+---------------------+--------+---------+----+-----------+
Select the following node names to add: pgsql_node3
Are you sure you want to reinitialize members pgsql_node3?[y/N]: y
Success: for member pgsql_node3 Perform initialization
参数查看参数
patronictl -c /etc/patroni.yaml show-config
更改参数
patronictl -c /etc/patroni.yaml edit-config
reload加载参数(同时在三个节点生效)
patronictl -c /etc/patroni.yaml reload patnori-test
重启节点/关闭节点
加- -force 强制
加- -scheduled 2023-09-13T18:00-03:00 定时
1.仅重启当前节点(node1节点)
[root@pgtest1 ~]# patronictl restart pg_cluster node1
2.如果节点是 pending 状态的,才会执行重启操作
[root@pgtest1 ~]# patronictl restart pg_cluster --pending
3.重启所有成员
[root@pgtest1 ~]# patronictl restart pg_cluster
维护模式,脱离patroni的集群管理
patronictl pause
patronictl pause暂时将Patroni集群置于维护模式并禁用自动故障转移。
在某些情况下,Patroni需要暂时退出集群管理,同时仍然在DCS中保留集群状态。可能的用例是集群上不常见的活动,例如主要版本升级或损坏恢复。在这些活动期间,节点经常因为Patroni不知道的原因而启动和停止,有些节点甚至可以暂时提升,这违反了只运行一个主节点的假设。因此,Patroni需要能够与正在运行的集群"分离",在Pacemaker中实现与维护模式相当的功能。
patronictl resume
patronictl resume将使Patroni集群退出维护模式,并重新启用自动故障转移。
自动拉起所有数据库
switchover切换
patronictl switchover
language
# Switchover
[postgres@pgtest1 ~]$ patronictl switchover
Master [pgtest1]:
Candidate ['pgtest2', 'pgtest3'] []: pgtest2
When should the switchover take place (e.g. 2021-10-28T04:45 ) [now]:
Current cluster topology
... ...
Are you sure you want to switchover cluster pg_cluster, demoting current master pgtest1? [y/N]: y
2021-10-28 03:45:35.91763 Successfully switched over to "pgtest2"
... ...
language
数据库从 pgtest1 switchover 到 pgtest2
[root@pgtest1 ~]# curl -s http://192.168.58.10:8008/switchover -XPOST -d '{"leader":"pgtest1","candidate":"pgtest2"}'
Successfully switched over to "pgtest2"
failover切换
patronictl failover
# Failover
[postgres@pgtest1 ~]$ patronictl failover
Candidate ['pgtest1', 'pgtest3'] []: pgtest1
Current cluster topology
... ...
Are you sure you want to failover cluster pg_cluster, demoting current master pgtest2? [y/N]: y
2021-10-28 03:47:56.13486 Successfully failed over to "pgtest1"
... ...
执行到特定节点的 failover,在节点都正常的情况下,执行 failover 实际上和执行 Switchover 一样
curl -s http://192.168.58.10:8008/failover -XPOST -d '{"candidate":"pgtest2"}'
language
[root@pgtest1 ~]# patronictl list
+---------+---------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) --+----+-----------+
| pgtest1 | 192.168.58.10 | Leader | running | 5 | |
| pgtest2 | 192.168.58.11 | Replica | running | 5 | 0 |
| pgtest3 | 192.168.58.12 | Replica | running | 5 | 0 |
+---------+---------------+---------+---------+----+-----------+
[root@pgtest1 ~]# curl -s http://192.168.58.10:8008/failover -XPOST -d '{"candidate":"pgtest2"}'
使用 patronictl 执行数据库查询操作
language
[postgres@pgtest1 ~]$ cat aa.sql
select * from test_1;
[postgres@pgtest1 ~]$ patronictl -c /enmo/app/patroni/pg_test01.yml query -f aa.sql --password
Password:
id create_time
1 2021-10-16 17:47:34
2 2021-10-16 17:55:06
[postgres@pgtest1 ~]$ patronictl -c /enmo/app/patroni/pg_test01.yml query -c "select * from test_1;" --password
Password:
id create_time
1 2021-10-16 17:47:34
2 2021-10-16 17:55:06
获取主节点dsn信息
$ ./patronictl -c postgres.yml dsn pgsql
host=192.168.1.143 port=5432
ETCD
1.查看etcd节点
etcdctl endpoint status --cluster -w table
language
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.58.11:2379 | 3f414532c235ce16 | 3.5.1 | 20 kB | false | false | 4 | 16 | 16 | |
| http://192.168.58.12:2379 | 41cf8a739c2e9b50 | 3.5.1 | 20 kB | false | false | 4 | 16 | 16 | |
| http://192.168.58.10:2379 | caef4208a95efee8 | 3.5.1 | 25 kB | true | false | 4 | 16 | 16 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
2.转移leader节点
etcdctl move-leader 3f414532c235ce16
3.保存数据快照(一个节点执行)
etcdctl snapshot save etcd_bak.db
4.查看快照信息
etcdctl snapshot status etcd_bak.db -w table
[root@pgtest1 ~]# etcdctl snapshot status etcd_bak.db -w table
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 230fea56 | 0 | 8 | 25 kB |
+----------+----------+------------+------------+
Zookeeper
1.查看zookeeper状态
./zkServer.sh status
2.启动zookeeper
./zkServer.sh start
3.停止zookeeper
./zkServer.sh stop
4.连接到zookeeper服务
./zkCli.sh -server localhost:2181
ls(查看当前节点数据),
ls2(查看当前节点数据并能看到更新次数等数据)
create(创建一个节点)
get(得到一个节点,包含数据和更新次数等数据)
set(修改节点)
delete(删除一个节点)
5.查看patroni key值
ls
ls /service/batman
查看相关key值
[leader, optime, failover, members, initialize, history, config, sync]
batman是patroni配置文件中scope的名字。
leader(主节点的名字)
get /service/batman/leader
leader记录主节点的名字,是临时节点,当session时间超过ttl后未响应,zookeeper就会删除该节点。
language
zk: localhost:2181(CONNECTED) 33] get /service/batman/leader
postgresql1
sync(同步复制的状态)
get /service/batman/sync
sync记录同步复制的状态,是持久节点不会因为session到期,删除该key值
language
[zk: localhost:2181(CONNECTED) 30] get /service/batman/sync
{"leader":"postgresql0","sync_standby":"postgresql1"}
optime(主库最后一次lsn位置)
get /service/batman/optime/leader
optime/leader是主库最后一次操作后的lsn位置,是持久节点不会因为session到期,删除该key值。
language
[zk: localhost:2181(CONNECTED) 71] get /service/batman/optime/leader
1342177280
failover(记录计划的切换任务)
get /service/batman/failover
failover是记录计划的切换任务,是持久节点不会因为session到期,删除该key值。
language
[zk: localhost:2181(CONNECTED) 117] get /service/batman/failover
{"leader":"postgresql1","scheduled_at":"2018-11-09T14:30:00+08:00"}
members(记录所有数据节点的连接信息和重要的状态信息)
get /service/batman/members/postgresql0
get /service/batman/members/postgresql
members是分别记录了所有数据节点的连接信息和重要的状态信息,是临时节点,当session时间超过ttl后未响应,zookeeper就会删除该节点。
language
[zk: localhost:2181(CONNECTED) 120] get /service/batman/members/postgresql0
{"conn_url":"postgres://192.168.56.5:5432/postgres","api_url":"http://192.168.56.5:8008
/patroni","timeline":58,"state":"running","role":"replica","xlog_location":1426063952}
initialize(记录数据库集群初始化的信息)
get /service/batman/initialize
initialize记录了数据库集群初始化的信息,是持久节点不会因为session到期,删除该key值。
language
[zk: localhost:2181(CONNECTED) 123] get /service/batman/initialize
6618183861621602635
//这个id是控制文件信息里的 Database system identifier
[postgres@node2 ~]$ pg_controldata
pg_control version number: 1100
Catalog version number: 201809051
Database system identifier: 6618183861621602635
history(记录的是集群中时间线变化的过程)
get /service/batman/history
history记录的是集群中时间线变化的过程,是持久节点不会因为session到期,删除该key值
language
[zk: localhost:2181(CONNECTED) 26] get /service/batman/history
[[1,67109464,"no recovery target specified"],[2,83886232,"no recovery target
specified"],[3,100663448,"no recovery target specified"],[4,218103960,"no recovery
target specified"],[5,251658392,"no recovery target specified"],[6,268435608,"no
recovery target specified"],[7,318767256,"no recovery target specified"],
[8,335544472,"no recovery target specified"],[9,352321688,"no recovery target
specified"],[10,369098904,"no recovery target specified"],[11,402653336,"no recovery
target specified"],[12,419430552,"no recovery target specified"],[13,436207768,"no
recovery target specified"],[14,452984984,"no recovery target specified"],
[15,469762200,"no recovery target specified"],[16,486539416,"no recovery target
specified"],[17,503316632,"no recovery target specified"],[18,536871064,"no recovery
target specified"],[19,553648280,"no recovery target specified"],[20,570425496,"no
recovery target specified"],[21,603979928,"no recovery target specified"],
[22,620757144,"no recovery target specified"],[23,637534360,"no recovery target
specified"],[24,654311576,"no recovery target specified"],[25,671088792,"no recovery
target specified","2018-11-05T16:44:52+08:00"],[26,704643224,"no recovery target
specified","2018-11-05T16:49:12+08:00"],[27,721420440,"no recovery target
specified","2018-11-05T23:21:12+08:00"],[28,771752088,"no recovery target
specified","2018-11-06T04:32:40+08:00"],[29,805306520,"no recovery target
specified","2018-11-06T05:22:45+08:00"],[30,822083736,"no recovery target
config(记录的是patroni配置文件的配置信息)
get /service/batman/config
config记录patroni配置文件的配置信息,是持久节点不会因为session到期,删除该key值
language
[zk: localhost:2181(CONNECTED) 28] get /service/batman/config
{"retry_timeout": 10, "postgresql": {"use_slots": true, "use_pg_rewind": true,
"parameters": {"hot_standby": "on", "wal_keep_segments": 8, "wal_level": "hot_standby",
"archive_command": "mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p
../wal_archive/%f", "wal_log_hints": "on", "max_wal_senders": 10, "archive_timeout":
2000, "archive_mode": "on", "max_replication_slots": 10, "max_connections": 300},
"recovery_conf": {"restore_command": "cp ../wal_archive/%f %p"}}, "synchronous_mode":
true, "maximum_lag_on_failover": 1048576, "loop_wait": 10, "max_connection": 300,
"archive_timeout": "2000s", "ttl": 30, "max_connections": 300}
6.zk清理快照
保留最近20个snap文件
./zkCleanup.sh -n 20
使用一些linux命令删除n天前的数据或日志
find /zookeeperData/version-2/ -name "snap*" -mtime +10 | xargs rm -f
find /zookeeperDataLog/version-2/ -name "log*" -mtime +10 | xargs rm -f
find /opt/apache-zookeeper-3.7.1-bin/logs -name "zookeeper.log.*" -mtime +10 | xargs rm --f
配置自动清理日志
从3.4.0开始,会自动清理日志了,所以这个通常不用配置。
配置autopurge.snapRetainCount和autopurge.purgeInterval参数。
保留的snapshop的数量,默认是3个,最小也是3。
language
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
3.4.0之前的版本可以通过zookeeper的配置自行对snap进行管理。如下这三个参数分别表示一个小时清理一次,log的大小(单位是kb)和快照的数量。
autopurge.purgeInterval=1
preAllocSize=131072
snapCount=300000
7.通过下面命令查看zookeeper启动的各个参数,包括java路径等
./bin/zkServer.sh print-cmd
参考链接
1.https://www.modb.pro/db/73762
2.https://www.modb.pro/db/75268
3.https://www.modb.pro/topic/152353
升级