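Environment
Platform: Kylin (X86_64), Linux x86-64 Red Hat Enterprise Linux 8, Linux x86-64 Red Hat Enterprise Linux 7
Version: 9.0.4, 4.5.10, 4.5.8, 9.0.3
Purpose
Systematic routine checks detect and resolve potential database problems early, keeping the database highly available, performant, and secure, and lowering the risk of business interruption caused by database faults.
Details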
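1. Disk space
Check overall disk usage.
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs 7.7G 128K 7.7G 1% /dev/shm
tmpfs 7.7G 27M 7.7G 1% /run
Raise an alert when "Use%" exceeds 85%, and expand the disk ahead of time.
Check how much disk space the database files occupy, including data files, backups, and archive files.
du -sm <directory>
- Data directory: defined by the database parameter data_directory. It is usually also set through the environment variable PGDATA, so either one can be checked.
- Backup directory: check the backup script to confirm.
- Archive directory: defined by the database parameter archive_command; check this parameter.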
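The two checks in this section can be scripted. The following is a minimal sketch rather than part of the original procedure: the function names disk_alerts and report_dir_sizes are made up for illustration, `df -P` is assumed in place of `df -h` so that each filesystem stays on a single line, and mount points are assumed to contain no spaces.

```shell
# Hypothetical helper: list filesystems whose usage exceeds a threshold (default 85%).
# Reads `df -P` output on stdin; column 5 is the usage percentage, column 6 the mount point.
disk_alerts() {
    awk -v limit="${1:-85}" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)                 # strip the trailing percent sign
        if (pct + 0 > limit + 0) print $6 " " pct "%"
    }'
}

# Hypothetical helper: print the size in MB of each directory passed as an argument.
report_dir_sizes() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            size=$(du -sm "$dir" | awk '{ print $1 }')
            echo "$size MB $dir"
        else
            echo "missing: $dir"
        fi
    done
}
```

A possible invocation is `LC_ALL=C df -P | disk_alerts 85`, followed by `report_dir_sizes "$PGDATA" <backup directory> <archive directory>`, where the last two paths come from the backup script and archive_command as described above.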
2. Database status
pg_ctl status
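pg_ctl: server is running (PID: 2129)
/opt/highgo/hgdb-see-4.5.8/bin/postgres
If a server is running, its PID and the command-line options used to invoke it are displayed. If no server is running, pg_ctl reports that no server is running and exits with a non-zero status.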
3. Cluster status
Check the cluster status.
hghactl list
+ Cluster: hgdb ---------------+--------------+---------+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Pending restart |
+--------+---------------------+--------------+---------+----+-----------+-----------------+
| hgdb01 | xxx:5866 | Sync Standby | running | 15 | 0 | * |
| hgdb02 | xxx:5866 | Replica | running | 15 | 0 | * |
| hgdb03 | xxx:5866 | Leader | running | 15 | | * |
+--------+---------------------+--------------+---------+----+-----------+-----------------+
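- Member: name of the cluster node.
- Host: IP address and port of the cluster node.
- Role: the node's role in the cluster. Leader is the primary; Replica is a standby.
- State: node state. The normal state is running; a standby may also report streaming.
Note: any other state should raise an alert; check whether the cluster has a problem.
- Lag in MB: replication lag of the node. The normal value is 0.
Note: any lag should raise an alert; check that the network is healthy. If the lag keeps growing, check whether the data has a problem.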
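The State and Lag rules above can be applied to the table automatically. This is a rough sketch that assumes the `hghactl list` layout shown in this section (pipe-separated columns in the order Member, Host, Role, State, TL, Lag in MB, Pending restart); cluster_alerts is a hypothetical name, and non-numeric lag values are not treated as lag.

```shell
# Hypothetical helper: read `hghactl list` output on stdin and print one line per problem.
cluster_alerts() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        /^\|/ {                                   # data and header rows start with "|"
            member = trim($2); state = trim($5); lag = trim($7)
            if (member == "Member") next          # skip the header row
            if (state != "running" && state != "streaming")
                print member ": unexpected state " state
            if (lag != "" && lag + 0 > 0)         # the Leader row has an empty lag cell
                print member ": replication lag " lag " MB"
        }'
}
```

Running `hghactl list | cluster_alerts` prints nothing for the healthy sample output above, so any output at all can be treated as an alert.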
4. ETCD status (cluster store)
etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://xxx:2379 | fdd02f4686cd14c1 | 3.5.9 | 34 MB | false | false | 10 | 314919 | 314919 | |
| http://xxx:2379 | c1172683a2531908 | 3.5.9 | 34 MB | true | false | 10 | 314919 | 314919 | |
| http://xxx:2379 | dcd245c7752fe1b5 | 3.5.9 | 34 MB | false | false | 10 | 314919 | 314919 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
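- ENDPOINT: address of the ETCD node.
- ID: identifier of the ETCD node.
- DB SIZE: storage size of the ETCD node.
Note: the node's storage size should not exceed 8 GB. Raise an alert as it approaches 8 GB.
- IS LEADER: role of the ETCD node. true marks the ETCD leader.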
5. Backup check
List the scheduled backup jobs.
crontab -l
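Locate the backup path
Inspect the backup script referenced by the scheduled job. The variable backup_db_cluster at the top of the script specifies the backup path.
Check the backup status
The backup path contains both the backups and the backup logs. Backup logs use the log suffix. Read the backup log to confirm the backup status.
tail -f <log file>.log
# e.g.: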
tail -f hgdbbak_2025xxxx_1.log
...
3827933/3827933 kB (100%), 3/3 tablespaces
pg_basebackup: write-ahead log end point: E1/C9000220
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
2025-xx-xx 11:22:33 The name of the backup file is :hgdbbak_20251021_1,the name of the archive is:00000001000000E1000000C9.00000028.backup
...
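- base backup completed (基础备份已完成 in a Chinese locale): marks a backup that finished normally.
Note: if this marker is missing, raise an alert; the backup may not have completed properly.
- The name of the backup file is: shows the name of this backup file.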
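The completion marker can be checked without reading the log by eye. This is a small sketch: backup_completed is a hypothetical function name, and it assumes the marker text appears in the log either in its Chinese form or in an English form beginning with "base backup complete".

```shell
# Hypothetical helper: succeed (exit 0) only if the backup log contains the completion marker.
# -F matches fixed strings; either the Chinese or the English marker is accepted.
backup_completed() {
    grep -qF -e '基础备份已完成' -e 'base backup complete' "$1"
}
```

For example, `backup_completed hgdbbak_2025xxxx_1.log || echo "backup may have failed"` prints a warning only when the marker is absent.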
6. Memory check
Check memory usage on the server.
free -h
total used free shared buff/cache available
Mem: 15G 7.1G 720M 252M 7.6G 6.4G
Swap: 3.8G 2.2M 3.8G
- available: amount of memory available.
Note: available should not drop below 2 GB. Raise an alert when it is below 2 GB or below 10% of physical memory.
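Both thresholds in the note can be checked in one pass. The sketch below assumes `free -m` output with the column order shown above (available is the seventh field of the Mem: row) and an English locale; mem_alert is a hypothetical name.

```shell
# Hypothetical helper: read `free -m` output on stdin and warn when available memory
# is below 2048 MB or below 10% of total physical memory.
mem_alert() {
    awk '$1 == "Mem:" {
        total = $2; avail = $7
        if (avail < 2048 || avail < total * 0.10)
            print "low memory: " avail " MB available of " total " MB"
    }'
}
```

It could be run as `LC_ALL=C free -m | mem_alert`; no output means both thresholds are satisfied.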
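7. Database connections
Check the current connection usage, including the maximum number of connections and the current number of connections. Use the result to decide whether the maximum connection limit needs to be raised, so that exhausted connections do not disrupt normal business operation.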
select max_conn,
max_conn - now_conn as resi_conn,
now_conn
from ( select setting::int8 as max_conn,
(select count(*) from pg_stat_activity) as now_conn
from pg_settings
where name = 'max_connections'
) t;
max_conn | resi_conn | now_conn
----------+-----------+----------
1000 | 990 | 10
- max_conn: maximum number of connections
- resi_conn: remaining connections
- now_conn: current number of connections
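To turn the two numbers into a single indicator, the usage percentage can be computed from max_conn and now_conn. This is only a sketch: conn_usage is a hypothetical name, and the 80% alert threshold is an assumption, not a value given in this document.

```shell
# Hypothetical helper: conn_usage <max_conn> <now_conn> [alert threshold in percent]
# The default threshold of 80 is an assumed value; adjust it to local policy.
conn_usage() {
    awk -v max="$1" -v now="$2" -v limit="${3:-80}" 'BEGIN {
        pct = now * 100 / max
        status = (pct >= limit) ? "ALERT" : "ok"
        printf "%s %.1f%%\n", status, pct
    }'
}
```

With the sample values above, `conn_usage 1000 10` prints `ok 1.0%`.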
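8. Tablespace size
Query the size of each tablespace and compare it with the disk space. If a tablespace takes up too large a share, expand the disk ahead of time.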
SELECT
tbs.spcname,
pg_tablespace_size(tbs.oid)/1024/1024 as used_bytes,
CASE
WHEN tbs.spcname = $$pg_default$$ THEN
(SELECT current_setting($$data_directory$$))
WHEN tbs.spcname = $$pg_global$$ THEN
(SELECT current_setting($$data_directory$$))
ELSE pg_tablespace_location(tbs.oid)
END AS location
FROM pg_tablespace tbs;
spcname | used_bytes | location
------------+------------+---------------------------------
pg_default | 3692 | /opt/highgo/hgdb-see-4.5.8/data
pg_global | 0 | /opt/highgo/hgdb-see-4.5.8/data
tbs1 | 16 | /opt/tbs
tbs2 | 0 | /opt/tbs2
- spcname: tablespace name
- used_bytes: space used by the tablespace (unit: MB, despite the column name)
- location: tablespace path
9. Database user validity
Query when user passwords expire, so that an expired password does not prevent application users from logging in.
select usename,valuntil from pg_user;
usename | valuntil
------------+------------------------
testuser | 2025-12-31 00:00:00+08
sysdba | infinity
- usename: user name
- valuntil: password expiry time. infinity means the password never expires.
10. Archive logs
Check that WAL archiving is working. If archiving fails, WAL files can pile up and cause abnormal disk usage growth.
select pg_walfile_name(pg_current_wal_lsn()) now_wal, * from pg_stat_archiver;
-[ RECORD 1 ]------+-----------------------------------------
now_wal | 00000001000000E1000000CA
archived_count | 36
last_archived_wal | 00000001000000E1000000C9.00000028.backup
last_archived_time | 2025-xx-xx 11:22:33.084855+08
failed_count | 0
last_failed_wal |
last_failed_time |
stats_reset | 2025-xx-xx 09:41:52.222+08
- last_archived_time: time of the most recent successful archive.
- last_failed_time: time of the most recent failed archive.
11. Database age
Query the age of each database and the percentage of age remaining.
select datname,
datfrozenxid,
age(datfrozenxid),
round((2 ^ 31 - age(datfrozenxid))::numeric / 2 ^ 31::numeric * 100,2) age_remain_percent,
current_setting($$autovacuum_freeze_max_age$$)
from pg_database
order by age(datfrozenxid) desc;
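- datname: database name
- datfrozenxid: frozen transaction ID (xid) of the database
- age(datfrozenxid): database age
- age_remain_percent: percentage of database age remaining
Note: raise an alert when the remaining percentage falls below 50%, and check whether a long-running transaction is preventing the age from being reclaimed.
- autovacuum_freeze_max_age: maximum age before autovacuum forces a freeze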
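The age_remain_percent column is plain arithmetic: (2^31 - age) / 2^31 * 100. The sketch below reproduces that formula outside the database so the threshold can be reasoned about; age_remain_percent as a shell function is a hypothetical helper, not something shipped with the database.

```shell
# Hypothetical helper: remaining age percentage for a given age(datfrozenxid) value,
# using the same formula as the query: (2^31 - age) / 2^31 * 100, rounded to 2 decimals.
age_remain_percent() {
    awk -v age="$1" 'BEGIN { printf "%.2f\n", (2^31 - age) / 2^31 * 100 }'
}
```

For instance, `age_remain_percent 1073741824` prints `50.00`, so the 50% alert corresponds to a database age of about 1.07 billion transactions.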
12. Table bloat
Check for table bloat. Bloat degrades query performance and wastes space, so check for it and address it promptly.
SELECT
current_database() AS db, schemaname, tablename, reltuples::bigint AS tups, relpages::bigint AS pages, otta,
ROUND(CASE WHEN otta=0 OR sml.relpages=0 OR sml.relpages=otta THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
CASE WHEN relpages < otta THEN $$0 bytes$$::text ELSE (bs*(relpages-otta))::bigint || $$ bytes$$ END AS wastedsize,
iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta,
ROUND(CASE WHEN iotta=0 OR ipages=0 OR ipages=iotta THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes,
CASE WHEN ipages < iotta THEN $$0 bytes$$ ELSE (bs*(ipages-iotta))::bigint || $$ bytes$$ END AS wastedisize,
CASE WHEN relpages < otta THEN
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta::bigint) END
ELSE CASE WHEN ipages < iotta THEN bs*(relpages-otta::bigint)
ELSE bs*(relpages-otta::bigint + ipages-iotta::bigint) END
END AS totalwastedbytes
FROM (
SELECT
nn.nspname AS schemaname,
cc.relname AS tablename,
COALESCE(cc.reltuples,0) AS reltuples,
COALESCE(cc.relpages,0) AS relpages,
COALESCE(bs,0) AS bs,
COALESCE(CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta,
COALESCE(c2.relname,$$?$$) AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM
pg_class cc
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <> $$information_schema$$
LEFT JOIN
(
SELECT
ma,bs,foo.nspname,foo.relname,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
ns.nspname, tbl.relname, hdr, ma, bs,
SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
MAX(coalesce(null_frac,0)) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
) AS nullhdr
FROM pg_attribute att
JOIN pg_class tbl ON att.attrelid = tbl.oid
JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname,
(
SELECT
(SELECT current_setting($$block_size$$)::numeric) AS bs,
CASE WHEN SUBSTRING(SPLIT_PART(v, $$ $$, 2) FROM $$#"[0-9]+.[0-9]+#"%$$ for $$#$$)
IN ($$8.0$$,$$8.1$$,$$8.2$$) THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ $$mingw32$$ OR v ~ $$64-bit$$ THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
WHERE att.attnum > 0 AND tbl.relkind=$$r$$
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
ON cc.relname = rs.relname AND nn.nspname = rs.nspname
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml order by wastedbytes desc limit 5;
-[ RECORD 1 ]----+-----------------------------
db | testdb
schemaname | tuser
tablename | test_dump
tups | 2199979
pages | 25000
otta | 20460
tbloat | 1.2
wastedpages | 4540
wastedbytes | 37191680
wastedsize | 37191680 bytes
iname | idx_test_dump_id
itups | 2199979
ipages | 6043
iotta | 15345
ibloat | 0.4
wastedipages | 0
wastedibytes | 0
wastedisize | 0 bytes
totalwastedbytes | 37191680
- tbloat: table bloat ratio.
Note: a ratio above 5 indicates severe bloat and should be dealt with promptly.
- wastedsize: disk space wasted by table bloat.
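The bloat ratio and wasted bytes follow directly from the page counts in the output: the ratio is pages / otta, and the wasted bytes are (pages - otta) * block size. The following sketch mirrors those two expressions from the query; bloat_summary is a hypothetical name, and the default block size of 8192 bytes is an assumption that should match the block_size setting.

```shell
# Hypothetical helper: bloat_summary <pages> <otta> [block size in bytes]
# Mirrors the CASE expressions in the query: ratio is 0 when either count is 0 or they are equal,
# and wasted bytes are 0 when the actual page count is below the estimate.
bloat_summary() {
    awk -v pages="$1" -v otta="$2" -v bs="${3:-8192}" 'BEGIN {
        ratio = (otta == 0 || pages == 0 || pages == otta) ? 0 : pages / otta
        wasted = (pages > otta) ? (pages - otta) * bs : 0
        printf "bloat=%.1f wastedbytes=%d\n", ratio, wasted
    }'
}
```

Using the sample record, `bloat_summary 25000 20460` prints `bloat=1.2 wastedbytes=37191680`, which matches the tbloat and wastedbytes columns above.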
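13. Database logs
Database operations, warnings (WARNING), and errors (ERROR) are all recorded in the log. Review the log to confirm whether anything abnormal has occurred.
- Log location: defined by the parameter log_directory. Usually under $PGDATA/.../hgdb_log.
- Log file name: defined by the parameter log_filename. Usually has the csv suffix and the highgodb prefix, for example /data/highgo/hgdb_log/highgodb_5.csv.
Log files are plain text and can be opened in a text editor. Pay attention to the warning (WARNING) and error (ERROR) entries.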
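A quick count of warning and error entries can show whether a log file needs a closer look. This sketch assumes the csv log stores the severity as an unquoted field (so it appears as `,WARNING,` or `,ERROR,`) and that severities are written in English; count_log_issues is a hypothetical name. Message text that happens to contain the same pattern would be counted as well.

```shell
# Hypothetical helper: print the number of lines in a csv log whose severity field
# is WARNING or ERROR (assumes the severity is an unquoted, comma-delimited field).
count_log_issues() {
    grep -cE ',(WARNING|ERROR),' "$1"
}
```

For example, `count_log_issues /data/highgo/hgdb_log/highgodb_5.csv` prints the number of matching lines; a non-zero count is a cue to open the file and read those entries.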
For further support, contact the Highgo 400 technical support hotline (400-708-8006, then press 3).