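Today a PostgreSQL 14 database server running in a container failed to start after its host was rebooted, reporting invalid permissions on the data directory. This post records how the problem was diagnosed and fixed.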
I. Symptoms
1. Application side:
The application could not connect to the PostgreSQL 14 database, and telnet to port 5432 failed.
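Where telnet is not installed, the same reachability check can be sketched with bash alone. The helper name `port_open`, the 3-second timeout, and the `127.0.0.1` target are illustrative assumptions, not part of the original incident; the probe relies on bash's `/dev/tcp` pseudo-device and on `timeout` from coreutils.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# /dev/tcp is a bash feature, so the probe is run in an explicit bash process.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example: probe the PostgreSQL port on the local machine (address is an assumption).
if port_open 127.0.0.1 5432; then
    echo "5432 reachable"
else
    echo "5432 not reachable"
fi
```

A successful connection only proves that something is listening on the port; it says nothing about whether PostgreSQL accepts logins.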
2. Host side:
The database is deployed as a container.
2.1 Check the container status
bash
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
441b9c1e61b4 pg14withtimescale:v14.5-v3 "/sbin/init" 10 months ago Up 17 minutes 0.0.0.0:5432->5432/tcp pgtsdb
The container itself is running normally (status "Up 17 minutes").
2.2 Check the container logs
bash
# docker logs pgtsdb
SELinux: Could not open policy file <= /etc/selinux/targeted/policy/policy.33: No such file or directory
Welcome to CentOS Linux 8!
[ OK ] Reached target Swap.
[ OK ] Set up automount Arbitrary Executab...rmats File System Automount Point.
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Listening on Process Core Dump Socket.
[ OK ] Listening on initctl Compatibility Named Pipe.
[ OK ] Listening on udev Kernel Socket.
[ OK ] Reached target Remote File Systems.
[ OK ] Reached target Local File Systems.
[ OK ] Reached target Slices.
[ OK ] Reached target Network is Online.
[ OK ] Listening on udev Control Socket.
[ OK ] Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket.
Starting Read and set NIS domainname from /etc/sysconfig/network...
Starting udev Coldplug all Devices...
Starting Load/Save Random Seed...
Starting Restore /run/initramfs on shutdown...
Mounting Kernel Debug File System...
Starting Journal Service...
Mounting Kernel Configuration File System...
Starting Create Static Device Nodes in /dev...
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Paths.
Starting Apply Kernel Variables...
[ OK ] Started Load/Save Random Seed.
[ OK ] Started Apply Kernel Variables.
[ OK ] Started Restore /run/initramfs on shutdown.
[ OK ] Started Create Static Device Nodes in /dev.
Starting udev Kernel Device Manager...
[ OK ] Started udev Coldplug all Devices.
[ OK ] Mounted Kernel Debug File System.
[ OK ] Started Read and set NIS domainname from /etc/sysconfig/network.
[ OK ] Mounted Kernel Configuration File System.
[ OK ] Started Journal Service.
Starting Flush Journal to Persistent Storage...
[ OK ] Started Flush Journal to Persistent Storage.
Starting Create Volatile Files and Directories...
[ OK ] Started Create Volatile Files and Directories.
Starting Update UTMP about System Boot/Shutdown...
[ OK ] Started Update UTMP about System Boot/Shutdown.
[ OK ] Started udev Kernel Device Manager.
[ OK ] Reached target System Initialization.
[ OK ] Started dnf makecache --timer.
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Timers.
[ OK ] Listening on D-Bus System Message Bus Socket.
[ OK ] Reached target Sockets.
[ OK ] Reached target Basic System.
[ OK ] Started D-Bus System Message Bus.
Starting PostgreSQL 14 database server...
Starting Crash recovery kernel arming...
Starting Permit User Sessions...
[ OK ] Started Permit User Sessions.
[ OK ] Started Crash recovery kernel arming.
[FAILED] Failed to start PostgreSQL 14 database server.
See 'systemctl status postgresql-14.service' for details.
[ OK ] Reached target Multi-User System.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Started Update UTMP about System Runlevel Changes.
The log shows "[FAILED] Failed to start PostgreSQL 14 database server." followed by "See 'systemctl status postgresql-14.service' for details.", so the database did not start even though the container is up.
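Because the container's main process is /sbin/init, the boot log is long and the one failing unit is easy to miss. A small filter can pull out just the failures. The function name `failed_units` is hypothetical, and it matches the bare word FAILED because console output may wrap the marker in color escape sequences.

```shell
# failed_units: read a boot log on stdin and print only the lines marked FAILED.
# Matching is case-sensitive, so "Failed to start ..." alone does not match.
failed_units() {
    grep 'FAILED'
}

# Intended use against the container from this post (needs Docker, so left commented out):
#   docker logs pgtsdb 2>&1 | failed_units

# Demonstration on three lines copied from the log above:
printf '%s\n' \
    '[ OK ] Started Crash recovery kernel arming.' \
    '[FAILED] Failed to start PostgreSQL 14 database server.' \
    '[ OK ] Reached target Multi-User System.' | failed_units
```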
2.3 Check the database service status
bash
# docker exec -it pgtsdb /bin/bash
[root@441b9c1e61b4 /]# systemctl status postgresql-14.service
● postgresql-14.service - PostgreSQL 14 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-14.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2023-12-28 02:40:36 UTC; 19min ago
Docs: https://www.postgresql.org/docs/14/static/
Process: 129 ExecStart=/usr/pgsql-14/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)
Process: 117 ExecStartPre=/usr/pgsql-14/bin/postgresql-14-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 129 (code=exited, status=1/FAILURE)
Dec 28 02:40:36 441b9c1e61b4 systemd[1]: Starting PostgreSQL 14 database server...
Dec 28 02:40:36 441b9c1e61b4 systemd[1]: postgresql-14.service: Main process exited, code=exited, status=1/FAILURE
Dec 28 02:40:36 441b9c1e61b4 systemd[1]: postgresql-14.service: Failed with result 'exit-code'.
Dec 28 02:40:36 441b9c1e61b4 systemd[1]: Failed to start PostgreSQL 14 database server.
[root@441b9c1e61b4 /]#
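The main process (postmaster, PID 129) exited with status 1 during startup, so the database service never came up.
2.4 Find the cause of the failure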
bash
See "systemctl status postgresql-14.service" and "journalctl -xe" for details.
[root@441b9c1e61b4 /]# journalctl -xe
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit systemd-tmpfiles-clean.service has begun starting up.
Dec 28 02:52:25 441b9c1e61b4 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit systemd-tmpfiles-clean.service has successfully entered the 'dead' state.
Dec 28 02:52:25 441b9c1e61b4 systemd[1]: Started Cleanup of Temporary Directories.
-- Subject: Unit systemd-tmpfiles-clean.service has finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit systemd-tmpfiles-clean.service has finished starting up.
--
-- The start-up result is done.
Dec 28 02:57:47 441b9c1e61b4 kernel: xfs filesystem being remounted at /run/systemd/unit-root/var/tmp supports timestamps unti>
Dec 28 02:57:47 441b9c1e61b4 kernel: xfs filesystem being remounted at /run/systemd/unit-root/etc supports timestamps until 20>
Dec 28 02:57:47 441b9c1e61b4 kernel: xfs filesystem being remounted at /run/systemd/unit-root/etc supports timestamps until 20>
Dec 28 02:57:47 441b9c1e61b4 kernel: xfs filesystem being remounted at /run/systemd/unit-root/var/tmp supports timestamps unti>
Dec 28 03:04:27 441b9c1e61b4 systemd[1]: Starting PostgreSQL 14 database server...
-- Subject: Unit postgresql-14.service has begun start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit postgresql-14.service has begun starting up.
Dec 28 03:04:27 441b9c1e61b4 postmaster[222]: 2023-12-28 03:04:27.433 UTC [222] FATAL: data directory "/var/lib/pgsql/14/data>
Dec 28 03:04:27 441b9c1e61b4 postmaster[222]: 2023-12-28 03:04:27.433 UTC [222] DETAIL: Permissions should be u=rwx (0700) or>
Dec 28 03:04:27 441b9c1e61b4 systemd[1]: postgresql-14.service: Main process exited, code=exited, status=1/FAILURE
Dec 28 03:04:27 441b9c1e61b4 systemd[1]: postgresql-14.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit postgresql-14.service has entered the 'failed' state with result 'exit-code'.
Dec 28 03:04:27 441b9c1e61b4 systemd[1]: Failed to start PostgreSQL 14 database server.
-- Subject: Unit postgresql-14.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit postgresql-14.service has failed.
--
-- The result is failed.
...skipping...
....
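The journal shows that the postmaster checked the data directory, logged "FATAL: data directory "/var/lib/pgsql/14/data>" and "DETAIL: Permissions should be u=rwx (0700) or>" (both lines are cut off in the pager), and then exited. The cause is therefore invalid permissions on the PostgreSQL 14 data directory.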
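The trailing `>` on the FATAL and DETAIL lines is most likely the pager chopping long lines. Running journalctl with `--no-pager` and restricting it to the unit with `-u` should print the messages in full; a filter then keeps only the PostgreSQL error lines. The function name `pg_errors` is an illustrative assumption.

```shell
# pg_errors: keep only PostgreSQL FATAL/DETAIL/HINT lines from journal text on stdin.
pg_errors() {
    grep -E ' (FATAL|DETAIL|HINT): '
}

# Intended use inside the container (needs the systemd journal, so left commented out):
#   journalctl -u postgresql-14.service --no-pager | pg_errors

# Demonstration on lines copied from the (truncated) journal output above:
printf '%s\n' \
    'Dec 28 03:04:27 441b9c1e61b4 systemd[1]: Starting PostgreSQL 14 database server...' \
    'Dec 28 03:04:27 441b9c1e61b4 postmaster[222]: 2023-12-28 03:04:27.433 UTC [222] FATAL: data directory "/var/lib/pgsql/14/data>' \
    'Dec 28 03:04:27 441b9c1e61b4 postmaster[222]: 2023-12-28 03:04:27.433 UTC [222] DETAIL: Permissions should be u=rwx (0700) or>' \
    'Dec 28 03:04:27 441b9c1e61b4 systemd[1]: Failed to start PostgreSQL 14 database server.' | pg_errors
```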
II. Solution
The cause was that the permissions on the PostgreSQL 14 data directory were wrong (not 0700), so the following steps were taken.
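Before changing anything, the current mode of the directory can be confirmed. The sketch below assumes GNU `stat` (as shipped with CentOS) and treats both 0700 and 0750 as acceptable, since PostgreSQL 11 and later also allow group read access on the data directory. The function name `pgdata_mode_ok` is hypothetical.

```shell
# pgdata_mode_ok DIR: succeed if DIR's permission bits are 700 or 750.
# Uses GNU stat, where -c '%a' prints the mode in octal.
pgdata_mode_ok() {
    local mode
    mode=$(stat -c '%a' "$1") || return 2
    case $mode in
        700|750) return 0 ;;
        *) echo "unexpected mode $mode on $1" >&2; return 1 ;;
    esac
}

# Intended use inside the container (left commented out):
#   pgdata_mode_ok /var/lib/pgsql/14/data

# Demonstration on a scratch directory:
demo=$(mktemp -d)
chmod 755 "$demo"
pgdata_mode_ok "$demo" || echo "needs fixing"
chmod 700 "$demo"
pgdata_mode_ok "$demo" && echo "ok"
rm -rf "$demo"
```

PostgreSQL also requires the directory to be owned by the user that runs the server, which `stat -c '%U:%G'` would show.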
1. Correct the permissions on the PostgreSQL 14 data directory
bash
[root@441b9c1e61b4 /]# chmod 700 -R /var/lib/pgsql/14/data
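The recursive `chmod 700` above got the service running again, but it also sets the execute bit on every file under the data directory. The error message names only the directory itself, so changing just the top-level directory should satisfy the startup check. A more conservative alternative, which was not run in this incident, resets directories to 0700 and regular files to 0600, the modes initdb uses when group access is not enabled. The function name `reset_pgdata_modes` is hypothetical, and the sketch assumes the cluster was initialized without group access.

```shell
# reset_pgdata_modes DIR: set directories to 0700 and regular files to 0600.
# Assumption: the cluster does not use group access (no initdb --allow-group-access).
reset_pgdata_modes() {
    local dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -type d -exec chmod 700 {} +
    find "$dir" -type f -exec chmod 600 {} +
}

# Intended use inside the container (left commented out):
#   reset_pgdata_modes /var/lib/pgsql/14/data

# Demonstration on a scratch tree:
demo=$(mktemp -d)
mkdir -p "$demo/base"
touch "$demo/base/1" "$demo/PG_VERSION"
chmod -R 777 "$demo"
reset_pgdata_modes "$demo"
stat -c '%a %n' "$demo" "$demo/base" "$demo/base/1" "$demo/PG_VERSION"
rm -rf "$demo"
```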
2. Restart the database service
bash
[root@441b9c1e61b4 /]# systemctl start postgresql-14.service
3. Check the database service status
bash
[root@441b9c1e61b4 /]# systemctl status postgresql-14.service
● postgresql-14.service - PostgreSQL 14 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-14.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-12-28 03:07:22 UTC; 7s ago
Docs: https://www.postgresql.org/docs/14/static/
Process: 228 ExecStartPre=/usr/pgsql-14/bin/postgresql-14-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 233 (postmaster)
Tasks: 14 (limit: 304871)
Memory: 600.7M
CGroup: /docker/441b9c1e61b49e9ef7bb16c066060472984b3b01005aabf6a77539cc1497b250/system.slice/postgresql-14.service
├─233 /usr/pgsql-14/bin/postmaster -D /var/lib/pgsql/14/data/
├─234 postgres: logger
├─236 postgres: checkpointer
├─237 postgres: background writer
├─238 postgres: walwriter
├─239 postgres: autovacuum launcher
├─240 postgres: stats collector
├─241 postgres: TimescaleDB Background Worker Launcher
├─242 postgres: logical replication launcher
├─244 postgres: TimescaleDB Background Worker Scheduler
├─261 postgres: zabbix zabbix 192.168.128.2(41006) idle
├─262 postgres: zabbix zabbix 192.168.128.2(41008) SELECT
├─263 postgres: zabbix zabbix 192.168.128.2(41010) SELECT
└─265 postgres: zabbix zabbix 192.168.128.2(35946) SELECT
Dec 28 03:07:21 441b9c1e61b4 systemd[1]: Starting PostgreSQL 14 database server...
Dec 28 03:07:21 441b9c1e61b4 postmaster[233]: 2023-12-28 03:07:21.871 UTC [233] LOG: redirecting log output to logging collec>
Dec 28 03:07:21 441b9c1e61b4 postmaster[233]: 2023-12-28 03:07:21.871 UTC [233] HINT: Future log output will appear in direct>
Dec 28 03:07:22 441b9c1e61b4 systemd[1]: Started Postgr
The database service started normally and the application recovered.