PostgreSQL 16 Server Configuration and Database Monitoring: The Ultimate Guide (Syntax, Examples, and Practice)
✅ 1. Server configuration overview
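PostgreSQL 16 server configuration is controlled mainly through configuration files and SQL commands. It covers core areas such as connections, resources, logging, query planning and statistics collection.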
📁 Main configuration files:
- postgresql.conf: main configuration file (global parameters)
- pg_hba.conf: client authentication configuration (host-based authentication)
- pg_ident.conf: user name mapping (optional)
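⚙️ Configuration levels (highest precedence first):
- Transaction level: SET LOCAL ... (until the end of the current transaction)
- Session level: SET ... (for the current connection)
- Role-and-database level: ALTER ROLE ... IN DATABASE ... SET ...
- Role (user) level: ALTER ROLE ... SET ...
- Database level: ALTER DATABASE ... SET ...
- postgresql.conf (global default)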
✅ 2. Server configuration files in detail
2.1 postgresql.conf: the main configuration file
📌 Location: usually $PGDATA/postgresql.conf
📌 Syntax:
ini
# comment
parameter = value
# string values are quoted
parameter = 'value'
# numbers and booleans need no quotes
parameter = 123
parameter = on          # boolean: on/off (also true/false, yes/no, 1/0)
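To show how the file is read, here is a rough shell sketch that prints the value a postgresql.conf-style file assigns to one parameter. The helper conf_value is hypothetical and is not a PostgreSQL tool. It follows the documented rules: parameter names are case-insensitive, the equals sign is optional, and when a parameter appears more than once the last entry wins. It assumes there are no include directives, no postgresql.auto.conf (ALTER SYSTEM) overrides and no '#' inside quoted values. SHOW and pg_settings remain the authoritative sources.

```bash
#!/usr/bin/env bash
# conf_value FILE PARAM: print the value FILE assigns to PARAM (hypothetical helper).
# Assumptions: no include/include_dir lines, no '#' inside quoted values,
# and postgresql.auto.conf (written by ALTER SYSTEM) is ignored.
conf_value() {
  awk -v want="$2" -v q="'" '
    {
      sub(/#.*/, "")                                # strip comments
      line = $0
      sub(/^[ \t]+/, "", line)                      # strip leading blanks
      if (line == "") next                          # skip empty lines
      name = line
      sub(/[ \t=].*/, "", name)                     # the name ends at a blank or "="
      if (tolower(name) != tolower(want)) next      # names are case-insensitive
      val = substr(line, length(name) + 1)
      sub(/^[ \t]*=?[ \t]*/, "", val)               # the "=" is optional
      sub(/[ \t]+$/, "", val)                       # strip trailing blanks
      if (length(val) >= 2 && substr(val, 1, 1) == q && substr(val, length(val), 1) == q)
        val = substr(val, 2, length(val) - 2)       # remove surrounding single quotes
      last = val; found = 1                         # a later entry overrides an earlier one
    }
    END { if (found) print last; else exit 1 }
  ' "$1"
}

# Example: conf_value /var/lib/pgsql/16/data/postgresql.conf shared_buffers
```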
✅ Example 1: Viewing and changing settings
bash
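# ✅ Show the path of the configuration file
psql -U postgres -c "SHOW config_file;"
# Sample output:
#                config_file
# ----------------------------------------
#  /var/lib/pgsql/16/data/postgresql.conf
# ✅ Show the current value of a parameter
psql -U postgres -c "SHOW shared_buffers;"
# ✅ Change a value for the current session only (lost when the connection closes)
psql -U postgres -c "SET work_mem = '64MB'; SHOW work_mem;"
# ✅ Change it permanently by editing postgresql.conf
nano /var/lib/pgsql/16/data/postgresql.conf
# Reload without a restart (run as the postgres OS user):
pg_ctl reload -D /var/lib/pgsql/16/data
# Or reload from SQL (returns true when the reload signal was sent):
psql -U postgres -c "SELECT pg_reload_conf();"
# Parameters shown with pending_restart = true in pg_settings only take effect after a restart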
✅ Common inspection commands:
sql
SHOW ALL;               -- all parameters
SHOW shared_buffers;    -- a single parameter
SELECT name, setting, unit FROM pg_settings WHERE name LIKE '%work%';
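✅ 3. Connection and authentication configuration (pg_hba.conf)
📌 Purpose: controls which hosts, users and databases may connect, and how they authenticate. Records are checked from top to bottom, and the first matching record is used.
📌 Syntax: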
# TYPE   DATABASE  USER    ADDRESS          METHOD  [OPTIONS]
host     all       all     192.168.1.0/24   md5
local    mydb      myuser                   peer
hostssl  all       all     0.0.0.0/0        scram-sha-256
✅ Example 2: Configuring remote connections and authentication
bash
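# ✅ Edit pg_hba.conf
nano /var/lib/pgsql/16/data/pg_hba.conf
# Add the following lines (allow the 192.168.1.x subnet to log in with MD5 password authentication)
host    all  all  192.168.1.0/24  md5
# Allow local Unix-socket connections with peer authentication (the OS user name must match the database user name)
local   all  all                  peer
# Allow any IP address over SSL with SCRAM-SHA-256 (recommended for production; requires ssl = on)
hostssl all  all  0.0.0.0/0       scram-sha-256
# ✅ In postgresql.conf, listen on all IP addresses
listen_addresses = '*'    # default is 'localhost'
# ✅ Restart the service (listen_addresses needs a restart; pg_hba.conf changes alone only need a reload)
sudo systemctl restart postgresql-16
# ✅ Test a remote connection (from 192.168.1.100)
psql -h 192.168.1.50 -U myuser -d mydb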
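✅ Authentication methods:
- peer: local socket connections only; uses the client's operating-system user name
- md5: MD5 challenge-response password authentication (not recommended: MD5 is no longer considered secure, and a stolen password hash is enough to log in)
- scram-sha-256: recommended since PostgreSQL 10; a more secure password authentication method
- cert: SSL client-certificate authentication
- reject: unconditionally reject the connection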
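PostgreSQL uses the first pg_hba.conf record whose connection type, database, user and address all match, and it refuses the connection when no record matches. Line order therefore matters. The following shell sketch models that first-match rule for a simplified subset:

- DATABASE and USER: only all or a literal name
- ADDRESS: IPv4 CIDR only
- Connection types: local, host and hostssl

The helper names (hba_method, cidr_match, ip_to_int) are hypothetical. On a running server, the pg_hba_file_rules view shows how the real file was parsed.

```bash
#!/usr/bin/env bash
# Simplified model of pg_hba.conf matching (hypothetical helpers, not a PostgreSQL tool).
# Unsupported: +group/@file names, hostnossl/hostgssenc, IPv6, samehost/samenet, host names.

ip_to_int() {                                  # 192.168.1.100 -> 32-bit integer
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

cidr_match() {                                 # cidr_match IP NETWORK/BITS
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# hba_method FILE CONN DB USER [CLIENT_IP]
# CONN: local (Unix socket), host (TCP without SSL) or ssl (TCP with SSL)
hba_method() {
  local file=$1 conn=$2 db=$3 user=$4 ip=${5:-} type d u addr method
  while read -r type d u addr method _; do
    [[ -z $type || $type == \#* ]] && continue          # blank line or comment
    if [[ $type == local ]]; then
      [[ $conn == local ]] || continue
      method=$addr                                       # local records have no ADDRESS field
    else
      [[ $type == host || $type == hostssl ]] || continue
      [[ $conn != local ]] || continue
      [[ $type == hostssl && $conn != ssl ]] && continue # hostssl requires an SSL connection
      cidr_match "$ip" "$addr" || continue
    fi
    [[ $d == all || $d == "$db" ]] || continue
    [[ $u == all || $u == "$user" ]] || continue
    echo "$method"                                       # the first matching record wins
    return 0
  done < "$file"
  echo reject                                            # no record matched: access denied
}
```

With the three records from Example 2, this model gives md5 to a client at 192.168.1.100 even over SSL. The host line matches SSL and non-SSL connections alike, and it comes first. Stricter records therefore belong above broader ones.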
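✅ 4. Resource consumption configuration
📌 Key parameters: control the use of memory, CPU and concurrency resources.
✅ Example 3: Configuring memory and concurrency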
ini
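# postgresql.conf
# Shared buffer cache (about 25% of physical RAM is a common starting point)
shared_buffers = 4GB
# Per-session buffers for temporary tables
temp_buffers = 32MB
# Memory for each sort or hash operation
work_mem = 64MB # each operation gets its own allowance: under load, total ≈ work_mem × concurrent operations (hash operations may use up to hash_mem_multiplier × work_mem)
# Memory for maintenance operations (VACUUM, CREATE INDEX)
maintenance_work_mem = 2GB
# Memory for each autovacuum worker
autovacuum_work_mem = -1 # -1 means use maintenance_work_mem
# Maximum number of connections (requires restart)
max_connections = 200
# Connection slots reserved for superusers
superuser_reserved_connections = 3
# Idle-in-transaction timeout (stops leaked transactions from holding locks indefinitely)
idle_in_transaction_session_timeout = 10min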
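work_mem is granted per sort or hash operation, not per connection, so a quick worst-case estimate helps when sizing these values. The sketch below is a rough heuristic and not a formula PostgreSQL uses:

- It assumes every connection runs OPS memory-hungry operations at once.
- It ignores temp_buffers, hash_mem_multiplier, maintenance memory and the OS page cache.

The function names are hypothetical. Values must carry a PostgreSQL unit (kB, MB, GB, TB), because a unitless work_mem means kilobytes while a unitless shared_buffers means 8kB pages.

```bash
#!/usr/bin/env bash
# Rough worst-case memory estimate (heuristic; hypothetical helper names).

to_mb() {                          # to_mb 4GB -> 4096 ; units are case-sensitive, as in PostgreSQL
  local v=$1 n unit
  n=${v%%[!0-9]*}
  unit=${v#"$n"}
  [[ -n $n ]] || { echo "not a number: $v" >&2; return 1; }
  case $unit in
    kB) echo $(( n / 1024 )) ;;
    MB) echo "$n" ;;
    GB) echo $(( n * 1024 )) ;;
    TB) echo $(( n * 1024 * 1024 )) ;;
    *)  echo "missing or unknown unit: $v" >&2; return 1 ;;
  esac
}

# worst_case_mb SHARED_BUFFERS WORK_MEM MAX_CONNECTIONS OPS_PER_CONNECTION
worst_case_mb() {
  local sb wm
  sb=$(to_mb "$1") || return 1
  wm=$(to_mb "$2") || return 1
  echo $(( sb + $3 * wm * $4 ))
}

worst_case_mb 4GB 64MB 200 2      # Example 3 values, 2 operations per connection: prints 29696
```

With the Example 3 values the estimate is 29696 MB, roughly 29 GB, which is far more than the 4GB of shared_buffers alone. This gap is one reason work_mem is often raised only per session or per role for heavy queries.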
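✅ Checking resource usage at run time:
sql
-- Current number of client connections (background processes excluded)
SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend';
-- Active sessions
SELECT pid, usename, application_name, state, query FROM pg_stat_activity WHERE state = 'active';
-- Lock requests that are still waiting
SELECT * FROM pg_locks WHERE granted = false;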
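✅ 5. Write-ahead log (WAL) configuration
📌 WAL (Write-Ahead Logging): the core mechanism that guarantees data consistency and crash recovery.
✅ Example 4: Configuring WAL parameters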
ini
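# postgresql.conf
# WAL level (replica supports streaming replication and archiving; requires restart)
wal_level = replica
# Commit durability: on = wait for the local WAL flush; off = faster, but a crash can lose the latest commits (without corruption)
# remote_write / remote_apply only matter when synchronous standbys are configured
synchronous_commit = on
# WAL buffers (default -1 = 1/32 of shared_buffers, capped at one WAL segment, normally 16MB)
wal_buffers = 16MB
# Checkpoints are triggered by time or by WAL volume, whichever comes first
checkpoint_timeout = 30min
max_wal_size = 4GB
min_wal_size = 1GB
# Archiving (needed for point-in-time recovery, PITR; archive_mode requires restart)
archive_mode = on
archive_command = 'cp %p /backup/wal/%f' # Linux example
# archive_command = 'copy "%p" "C:\\backup\\wal\\%f"' # Windows example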
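✅ Monitoring WAL:
sql
-- WAL generation statistics (pg_stat_wal, PostgreSQL 14+)
SELECT wal_records, wal_bytes, wal_buffers_full, wal_write, wal_sync FROM pg_stat_wal;
-- Current WAL write position (LSN, log sequence number)
SELECT pg_current_wal_lsn();
-- Approximate total WAL generated since the cluster was created (LSN converted to bytes)
SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/0'::pg_lsn);
-- Current size of the pg_wal directory (superuser or pg_monitor role)
SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();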
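An LSN such as 16/B374D848 is a 64-bit byte position in the WAL stream, printed as two hexadecimal halves. The part before the slash holds the high 32 bits and the part after it holds the low 32 bits, which is why subtracting '0/0' yields a byte count. The hypothetical shell helpers below do the same arithmetic outside the database, for example to compare two LSNs copied from monitoring output. Inside SQL, pg_wal_lsn_diff() or plain pg_lsn subtraction does the job.

```bash
#!/usr/bin/env bash
# LSN arithmetic in bash (hypothetical helpers; bash arithmetic is 64-bit signed,
# which is enough for any realistic WAL position).

lsn_to_bytes() {                 # lsn_to_bytes 0/16B3748 -> 23803720
  [[ $1 =~ ^[0-9A-Fa-f]+/[0-9A-Fa-f]+$ ]] || { echo "bad LSN: $1" >&2; return 1; }
  local hi=${1%/*} lo=${1#*/}
  echo $(( (16#$hi << 32) + 16#$lo ))
}

lsn_diff() {                     # bytes from LSN $2 to LSN $1, like pg_wal_lsn_diff($1, $2)
  local a b
  a=$(lsn_to_bytes "$1") || return 1
  b=$(lsn_to_bytes "$2") || return 1
  echo $(( a - b ))
}
```

For example, lsn_diff applied to the primary's pg_current_wal_lsn() and a standby's replay_lsn from pg_stat_replication gives the replay lag in bytes.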
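✅ 6. Query planning configuration
📌 Controls planner behavior and therefore which execution plans are chosen.
✅ Example 5: Configuring the query planner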
ini
# postgresql.conf
# Statistics target (higher = more accurate estimates, but slower ANALYZE)
default_statistics_target = 100
# Cost of a non-sequential page fetch (lower it on SSDs)
random_page_cost = 1.1 # default 4.0 assumes spinning disks; 1.1 is common for SSDs
# Cost of a sequential page fetch
seq_page_cost = 1.0
# CPU costs
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.005
cpu_operator_cost = 0.0025
# Parallel query
max_parallel_workers_per_gather = 4
parallel_setup_cost = 1000
parallel_tuple_cost = 0.1
# JIT compilation (PostgreSQL 11+)
jit = on
jit_above_cost = 100000
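✅ Example: steering the planner away from a sequential scan:
sql
-- If the planner wrongly picks a full table scan, temporarily penalize sequential scans
SET enable_seqscan = off;   -- makes seq scans look very expensive rather than forbidding them; for debugging only, avoid in production!
EXPLAIN SELECT * FROM employees WHERE salary > 80000;
SET enable_seqscan = on;    -- restore the default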
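The cost parameters above combine into the planner's estimates. For a plain sequential scan, the estimate is relpages × seq_page_cost + reltuples × cpu_tuple_cost. Each WHERE-clause operator adds cpu_operator_cost per row. The EXPLAIN chapter of the PostgreSQL documentation works through a 358-page, 10,000-row table whose scan costs 458, or 483 with one filter condition. The shell sketch below uses a hypothetical helper, with awk for the floating-point math, to reproduce this simple case only. It ignores parallel plans and more complex predicates.

```bash
#!/usr/bin/env bash
# seq_scan_cost RELPAGES RELTUPLES [WHERE_OPERATORS]
# Cost parameters default to the PostgreSQL defaults and can be overridden through the
# (hypothetical) environment variables SEQ_PAGE_COST, CPU_TUPLE_COST and CPU_OPERATOR_COST.
seq_scan_cost() {
  awk -v pages="$1" -v rows="$2" -v ops="${3:-0}" \
      -v spc="${SEQ_PAGE_COST:-1.0}" \
      -v ctc="${CPU_TUPLE_COST:-0.01}" \
      -v coc="${CPU_OPERATOR_COST:-0.0025}" \
      'BEGIN { printf "%.2f\n", pages * spc + rows * ctc + rows * ops * coc }'
}

seq_scan_cost 358 10000      # plain scan: prints 458.00
seq_scan_cost 358 10000 1    # one WHERE condition such as unique1 < 7000: prints 483.00
```

Changing seq_page_cost or random_page_cost shifts totals like these. That is how the settings in Example 5 influence the choice between a sequential scan and an index scan.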
✅ 7. Error reporting and logging configuration
📌 Logging parameters make the server easier to monitor and debug.
✅ Example 6: Configuring detailed logging
ini
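# postgresql.conf
# Log destination
log_destination = 'stderr'
# Enable the logging collector (requires restart)
logging_collector = on
# Log directory (relative to the data directory) and file name pattern
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
# Log rotation
log_rotation_age = 1d
log_rotation_size = 100MB
# Log slow statements (running 1 second or longer; value in ms)
log_min_duration_statement = 1000
# Log all DDL
log_statement = 'ddl' # none, ddl, mod, all
# Log connections and disconnections
log_connections = on
log_disconnections = on
# Log lock waits longer than deadlock_timeout
log_lock_waits = on
# Maximum length of bind-parameter values logged with error messages (PostgreSQL 13+)
log_parameter_max_length_on_error = 1024
# Log every temporary file (threshold 0 = all sizes)
log_temp_files = 0
# Log checkpoints
log_checkpoints = on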
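✅ Reading the logs:
bash
# Follow the logs in real time (the glob is expanded once, so files created by later rotation are not followed)
tail -f /var/lib/pgsql/16/data/log/postgresql-*.log
# Find slow statements: with log_min_duration_statement = 1000, every "duration:" line is already 1 s or longer
grep "duration:" /var/lib/pgsql/16/data/log/postgresql-*.log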
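A stricter cutoff than log_min_duration_statement, or a slowest-first listing, requires parsing the duration out of each line. The sketch below assumes the message format "duration: 1523.456 ms  statement: ..." that log_min_duration_statement writes. The helper name slow_queries is hypothetical. The log prefix (log_line_prefix) does not matter, because the pattern is searched anywhere in the line.

```bash
#!/usr/bin/env bash
# slow_queries MIN_MS [LOGFILE...]: print log lines whose duration is >= MIN_MS,
# slowest first, each prefixed by its duration and a tab. Reads stdin if no file is given.
slow_queries() {
  local min=$1
  shift
  awk -v min="$min" '
    match($0, /duration: [0-9.]+ ms/) {
      ms = substr($0, RSTART + 10, RLENGTH - 13) + 0   # the number between "duration: " and " ms"
      if (ms >= min) printf "%.3f\t%s\n", ms, $0
    }' "$@" | sort -t "$(printf '\t')" -k1,1nr
}

# Example: statements that took 5 seconds or more
# slow_queries 5000 /var/lib/pgsql/16/data/log/postgresql-*.log
```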
✅ Checking logging parameters from SQL:
sql
SHOW log_min_duration_statement;
SHOW log_statement;
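✅ 8. Run-time statistics configuration
📌 Cumulative statistics: record database activity for performance analysis. Since PostgreSQL 15 they are kept in shared memory, and there is no separate statistics collector process.
✅ Example 7: Enabling statistics collection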
ini
# postgresql.conf
# Statistics collection
track_activities = on
track_counts = on          # required by autovacuum
track_io_timing = on       # I/O timing statistics (can add overhead on some platforms; check with pg_test_timing)
track_functions = all      # none, pl, all
# Note: stats_temp_directory was removed in PostgreSQL 15 and must not be set in PostgreSQL 16
✅ Viewing the statistics:
sql
-- Database statistics
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit FROM pg_stat_database;
-- Table statistics
SELECT schemaname, relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del FROM pg_stat_user_tables;
-- Index usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_user_indexes;
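The blks_hit and blks_read counters are usually turned into a cache hit ratio, hit / (hit + read). A monitoring script can compute it outside SQL by fetching the raw counters in unaligned, tuples-only mode (psql -At -F'|') and post-processing them. The hit_ratio helper below is a hypothetical sketch. It also avoids dividing by zero for databases with no block activity yet.

```bash
#!/usr/bin/env bash
# hit_ratio: read "datname|blks_hit|blks_read" lines on stdin, print "datname<TAB>ratio".
# Assumed input, for example:
#   psql -At -F'|' -c "SELECT datname, blks_hit, blks_read FROM pg_stat_database WHERE datname IS NOT NULL"
hit_ratio() {
  awk -F'|' '{
    total = $2 + $3
    if (total == 0) printf "%s\tn/a\n", $1                 # no block access recorded yet
    else            printf "%s\t%.2f\n", $1, $2 * 100 / total
  }'
}
```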
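✅ 9. Autovacuum configuration
📌 Autovacuum: reclaims the space held by dead tuples, updates planner statistics and prevents table bloat.
✅ Example 8: Configuring autovacuum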
ini
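# postgresql.conf
# Enable autovacuum (it also runs ANALYZE automatically; there is no separate autoanalyze switch)
autovacuum = on
# Vacuum trigger: dead tuples > threshold + scale_factor × reltuples
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2 # about 20% of the table changed
# Analyze trigger: changed tuples > threshold + scale_factor × reltuples
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1 # about 10% of the table changed
# Workers and scheduling
autovacuum_max_workers = 6 # requires restart
autovacuum_naptime = 1min
autovacuum_vacuum_cost_limit = 2000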
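✅ Running maintenance manually:
sql
-- Vacuum and analyze a table
VACUUM ANALYZE employees;
-- Vacuum only (does not block reads or writes)
VACUUM employees;
-- Rewrite the table and rebuild its indexes (takes an ACCESS EXCLUSIVE lock and needs free space for the new copy)
VACUUM FULL employees;
-- Analyze only (update planner statistics)
ANALYZE employees;
✅ Monitoring autovacuum:
sql
SELECT schemaname, relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'employees';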
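Autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples, where reltuples is the estimated row count from pg_class. With the defaults (50 and 0.2), a 1,000,000-row table is not vacuumed until more than 200,050 rows are dead. This is why large tables often get per-table settings such as ALTER TABLE ... SET (autovacuum_vacuum_scale_factor = 0.01). The sketch below evaluates that formula. The helper name is hypothetical. It ignores the insert-based trigger added in PostgreSQL 13 and anti-wraparound vacuums.

```bash
#!/usr/bin/env bash
# needs_autovacuum N_DEAD_TUP RELTUPLES [THRESHOLD] [SCALE_FACTOR]
# Prints the trigger point and returns 0 if autovacuum would vacuum the table.
needs_autovacuum() {
  awk -v dead="$1" -v tuples="$2" -v th="${3:-50}" -v sf="${4:-0.2}" 'BEGIN {
    limit = th + sf * tuples
    printf "threshold=%d\n", limit
    status = (dead + 0 > limit) ? 0 : 1     # vacuum only when dead tuples exceed the limit
    exit status
  }'
}

needs_autovacuum 150000 1000000 && echo "vacuum due" || echo "not yet"
```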
✅ 10. Client connection defaults
📌 Controls the default behavior of new connections.
✅ Example 9: Setting client defaults
ini
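# postgresql.conf
# Default transaction isolation level
default_transaction_isolation = 'read committed'
# Whether new transactions are read-only by default
default_transaction_read_only = off
# Default time zone
timezone = 'UTC'
# Default client encoding
client_encoding = 'UTF8'
# Default date style
datestyle = 'ISO, MDY'
# Default schema search path
search_path = '"$user", public'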
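✅ Setting role- or database-level defaults in SQL:
sql
-- Default search_path for a role (list schemas unquoted; 'myschema, public' in quotes would become a single schema name)
ALTER ROLE myuser SET search_path = myschema, public;
-- Default time zone for a database
ALTER DATABASE mydb SET timezone = 'Asia/Shanghai';
-- Show a role's settings
SELECT rolname, rolconfig FROM pg_roles WHERE rolname = 'myuser';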
✅ 11. Lock management configuration
📌 Controls lock-wait behavior and deadlock detection.
✅ Example 10: Configuring lock parameters
ini
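# postgresql.conf
# How long to wait on a lock before checking for a deadlock (ms)
deadlock_timeout = 1000
# Maximum time to wait for a lock, per statement
lock_timeout = 0 # 0 = wait forever; 5000 (5 s) is a common production choice
# Statement timeout
statement_timeout = 30s # cancel statements after 30 s; set globally it affects every session, so it is often set per role or database instead
# Idle-in-transaction timeout
idle_in_transaction_session_timeout = 10min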
✅ Monitoring lock waits:
sql
-- Lock requests that are waiting (not yet granted)
SELECT l.locktype, l.relation::regclass, l.mode, l.granted, l.pid, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;
-- Blocking chains: who is blocked by whom
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.query AS blocked_query,
blocking_locks.pid AS blocking_pid,
blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
-- Simpler alternative (PostgreSQL 9.6+); pg_blocking_pids() also accounts for lock-mode conflicts
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
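In a long blocking chain, the session to examine first is the root blocker: one that blocks others but is not waiting itself. The hypothetical shell sketch below finds such roots from "pid blocker_pid" pairs, which could be produced with pg_blocking_pids() as shown in its comment. It counts only direct waiters. A deadlock cycle has no root; the deadlock detector resolves those after deadlock_timeout.

```bash
#!/usr/bin/env bash
# root_blockers: read "pid blocker_pid" pairs on stdin and print "pid direct_waiters"
# for every blocker that is not itself blocked, most waiters first.
# Pairs could come from (PostgreSQL 9.6+):
#   psql -At -F' ' -c "SELECT pid, unnest(pg_blocking_pids(pid)) FROM pg_stat_activity
#                      WHERE cardinality(pg_blocking_pids(pid)) > 0"
root_blockers() {
  awk '{ blocked[$1] = 1; waiters[$2]++ }
       END { for (p in waiters) if (!(p in blocked)) print p, waiters[p] }' |
    sort -k2,2nr -k1,1n
}
```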
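✅ 12. Version and platform compatibility configuration
📌 Keeps behavior compatible across versions and platforms.
✅ Example 11: Configuring compatibility parameters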
ini
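# postgresql.conf
# Treat backslashes in ordinary string literals literally, as the SQL standard requires (default on)
standard_conforming_strings = on
# Unquoted NULL in array input means a null element (default on; off restores pre-8.2 behavior)
array_nulls = on
# Date/time output format and interpretation of ambiguous dates
DateStyle = 'ISO, MDY'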
✅ Checking the version:
sql
SELECT version();
-- PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605, 64-bit
SHOW server_version;
SHOW server_version_num;
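server_version_num is easier to compare in scripts than the version() string. Since PostgreSQL 10 it equals major × 10000 + minor (160001 is 16.1). Before 10 it encoded the two-part major version, so 90624 is 9.6.24. The hypothetical helpers below use it to pick version-dependent names. One example is the pg_stat_statements column total_exec_time, which replaced total_time in PostgreSQL 13; the helper assumes the installed extension has been updated to the version shipped with the server.

```bash
#!/usr/bin/env bash
# pg_major SERVER_VERSION_NUM: print the major version (16, 13, 9.6, ...).
pg_major() {
  local n=$1
  [[ $n =~ ^[0-9]+$ ]] || { echo "bad version number: $n" >&2; return 1; }
  if (( n >= 100000 )); then
    echo $(( n / 10000 ))
  else
    echo "$(( n / 10000 )).$(( n / 100 % 100 ))"
  fi
}

# pgss_time_column SERVER_VERSION_NUM: name of the total-time column in pg_stat_statements
pgss_time_column() {
  if (( $1 >= 130000 )); then echo total_exec_time; else echo total_time; fi
}

# Example (assumed invocation):
# v=$(psql -At -c "SHOW server_version_num")
# echo "PostgreSQL $(pg_major "$v"), sort by $(pgss_time_column "$v")"
```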
✅ 13. Monitoring database activity
📌 Monitor connections, queries, locks and resource usage in real time.
✅ Example 12: Monitoring active sessions
sql
-- ✅ All sessions that are not idle
SELECT
pid,
usename,
application_name,
client_addr,
backend_start,
state,
state_change,
query,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY state_change DESC;
-- ✅ Queries running longer than 5 minutes
SELECT
pid,
now() - query_start as duration,
usename,
query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > interval '5 minutes'
ORDER BY duration DESC;
-- ✅ Sessions idle inside an open transaction (possible leaks)
SELECT
pid,
now() - state_change as idle_duration,
usename,
query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY idle_duration DESC;
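-- ✅ Terminate queries running longer than 10 minutes
-- (requires superuser, membership in pg_signal_backend, or the same role; pg_cancel_backend(pid) only cancels the query)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE now() - query_start > interval '10 minutes'
AND state = 'active'
AND pid <> pg_backend_pid();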
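✅ 14. Configuring statement statistics
📌 Enable detailed per-statement statistics (requires a restart).
✅ Example 13: Enabling pg_stat_statements (the most commonly used extension)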
sql
-- ✅ The library must be preloaded first: in postgresql.conf set
--    shared_preload_libraries = 'pg_stat_statements'
-- then restart PostgreSQL from the shell:
--    sudo systemctl restart postgresql-16
-- ✅ Create the extension (in each database where you want to query the view)
CREATE EXTENSION pg_stat_statements;
-- ✅ Top 10 statements by total execution time
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
rows,
shared_blks_hit,
shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- ✅ Reset the statistics
SELECT pg_stat_statements_reset();
✅ postgresql.conf settings:
ini
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all      # top, all, none
pg_stat_statements.max = 10000      # maximum number of distinct statements tracked
pg_stat_statements.save = on        # keep statistics across server shutdowns
✅ 15. Viewing the collected statistics
📌 System views: pg_stat_* and pg_statio_*
✅ Example 14: Comprehensive statistics analysis
sql
-- ✅ Database-level statistics
SELECT
datname,
numbackends AS connections,
xact_commit,
xact_rollback,
blks_read,
blks_hit,
pg_size_pretty(blks_read * current_setting('block_size')::bigint) AS read_bytes,
pg_size_pretty(blks_hit * current_setting('block_size')::bigint) AS hit_bytes,
round(blks_hit * 100.0 / NULLIF(blks_hit + blks_read, 0), 2) AS hit_ratio
FROM pg_stat_database
WHERE datname NOT IN ('template0', 'template1');
-- ✅ Table-level I/O statistics
SELECT
schemaname,
tablename,
heap_blks_read,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
tidx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;
-- ✅ Index usage (to spot unused indexes)
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan AS scans,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 50 -- fewer than 50 scans (unique and primary-key indexes enforce constraints even if rarely scanned)
ORDER BY pg_relation_size(indexrelid) DESC;
-- ✅ Most frequently scanned tables
SELECT
schemaname,
relname AS tablename,
seq_scan + COALESCE(idx_scan, 0) AS total_scans, -- idx_scan is NULL for tables without indexes
n_tup_ins AS inserts,
n_tup_upd AS updates,
n_tup_del AS deletes
FROM pg_stat_user_tables
ORDER BY total_scans DESC
LIMIT 10;
✅ 16. Monitoring disk usage
📌 Monitor the disk space used by tablespaces, databases, tables and indexes.
✅ Example 15: Monitoring disk usage
sql
-- ✅ Database sizes
SELECT
datname,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- ✅ Tablespace sizes
SELECT
spcname,
pg_size_pretty(pg_tablespace_size(oid)) AS size
FROM pg_tablespace;
-- ✅ Size of every table in the current database (including indexes)
SELECT
schemaname,
relname AS tablename,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
-- ✅ Size breakdown of a single table
SELECT
'table' AS type,
pg_size_pretty(pg_relation_size('employees')) AS size
UNION ALL
SELECT
'indexes',
pg_size_pretty(pg_indexes_size('employees'))
UNION ALL
SELECT
'toast (+ fsm/vm)',
pg_size_pretty(pg_total_relation_size('employees') - pg_relation_size('employees') - pg_indexes_size('employees'))
UNION ALL
SELECT
'total',
pg_size_pretty(pg_total_relation_size('employees'));
-- ✅ Largest tables in the current database
SELECT
schemaname,
relname AS tablename,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
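✅ 17. Failures caused by a full disk
📌 A full disk leads to:
- WAL cannot be written → the server crashes (PANIC)
- Temporary files cannot be created → queries fail
- Log files cannot be written → problems are hard to diagnose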
✅ Example 16: Handling a full disk
sql
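-- ✅ Check free space (not possible from SQL; run OS commands in a shell):
--    df -h /var/lib/pgsql/16/data
-- ✅ Emergency steps if the disk is full:
-- 1. Remove old server log files (if the log directory is inside the data directory), e.g.:
--    find /var/lib/pgsql/16/data/log -name 'postgresql-*.log' -mtime +7 -delete
-- 2. Never delete files in pg_wal or pg_wal/archive_status by hand. If WAL is piling up,
--    fix the failing archive_command (see pg_stat_archiver) or drop unused replication slots
--    (see pg_replication_slots) so that checkpoints can recycle old segments
-- 3. Long term: add disk space or move tables to another tablespace
-- ✅ Monitor table bloat from SQL (a common cause of a full disk)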
SELECT
schemaname,
relname AS tablename,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
n_dead_tup,
n_live_tup,
(n_dead_tup * 100.0 / (n_live_tup + n_dead_tup + 1)) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
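-- ✅ Reclaim space from a bloated table
VACUUM FULL VERBOSE ANALYZE employees; -- exclusive lock, and it needs free space for a full copy of the table; use with care!
-- Or use pg_repack (rebuilds the table online; requires installing the extension and its client)
✅ Prevention:
- Set up monitoring alerts (alert when the disk is more than 80% full)
- Run VACUUM and REINDEX regularly
- Schedule cleanup jobs (OS cron for old log files; pg_cron for SQL-level maintenance)
- Put WAL, logs and data on separate disks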
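A script can check the 80% alert level by parsing df. The sketch below reads df -P output, the POSIX format that keeps each filesystem on one line, so it can be tested with canned text. The check_disk name and its OK/ALERT output are hypothetical choices. It assumes the usage percentage is the fifth column, as in the POSIX format.

```bash
#!/usr/bin/env bash
# check_disk THRESHOLD: read `df -P PATH` output on stdin, print "OK n%" or "ALERT n%".
# Returns 0 at or below THRESHOLD, 1 above it, 2 if the input cannot be parsed.
check_disk() {
  local pct
  pct=$(awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
  [[ $pct =~ ^[0-9]+$ ]] || { echo "cannot parse df output" >&2; return 2; }
  if (( pct > $1 )); then
    echo "ALERT ${pct}%"
    return 1
  fi
  echo "OK ${pct}%"
}

# Example: df -P /var/lib/pgsql/16/data | check_disk 80
```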
✅ 18. Comprehensive hands-on examples
🎯 Example 1: A one-step database health report
sql
-- Create the health-check function
CREATE OR REPLACE FUNCTION get_db_health_report()
RETURNS TABLE(
category TEXT,
item TEXT,
value TEXT,
recommendation TEXT
) AS $$
BEGIN
-- Connection check
RETURN QUERY
SELECT
'Connections'::TEXT,
'Active Connections'::TEXT,
count(*)::TEXT,
CASE
WHEN count(*) > (SELECT setting::INT * 0.8 FROM pg_settings WHERE name = 'max_connections')
THEN 'Close to max_connections: raise it or optimize the application'
ELSE 'OK'
END
FROM pg_stat_activity WHERE state = 'active';
-- Lock-wait check
RETURN QUERY
SELECT
'Locks'::TEXT,
'Waiting Lock Requests'::TEXT,
count(*)::TEXT,
CASE
WHEN count(*) > 0 THEN 'Lock waits present: investigate blocking sessions'
ELSE 'OK'
END
FROM pg_locks WHERE NOT granted;
-- Table bloat check
RETURN QUERY
SELECT
'Bloat'::TEXT,
relname::TEXT,
pg_size_pretty(pg_total_relation_size(relid))::TEXT,
'Consider VACUUM FULL or pg_repack'::TEXT
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
AND (n_dead_tup * 100.0 / (n_live_tup + n_dead_tup + 1)) > 20
ORDER BY n_dead_tup DESC
LIMIT 5;
-- Unused index check
RETURN QUERY
SELECT
'Indexes'::TEXT,
indexrelname::TEXT,
'Scans: ' || idx_scan::TEXT,
'Consider dropping this index to save space and speed up writes'::TEXT
FROM pg_stat_user_indexes
WHERE idx_scan < 50
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 5;
-- Database size check
RETURN QUERY
SELECT
'Disk'::TEXT,
'Database Size'::TEXT,
pg_size_pretty(pg_database_size(current_database()))::TEXT,
CASE
WHEN pg_database_size(current_database()) > 100*1024^3 THEN 'Database exceeds 100GB: consider partitioning or archiving'
ELSE 'OK'
END;
END;
$$ LANGUAGE plpgsql;
-- ✅ Run the health check
SELECT * FROM get_db_health_report();
🎯 Example 2: Automated disk monitoring script
bash
#!/bin/bash
# disk_monitor.sh
PGDATA="/var/lib/pgsql/16/data"
THRESHOLD=80 # alert above 80%
ADMIN_EMAIL="admin@company.com"
# Usage of the filesystem holding the data directory (-P keeps each filesystem on one line)
USAGE=$(df -P "$PGDATA" | tail -1 | awk '{print $5}' | sed 's/%//')
echo "PostgreSQL data directory disk usage: ${USAGE}%"
if [ "$USAGE" -gt "$THRESHOLD" ]; then
  echo "⚠️ Warning: disk usage is above ${THRESHOLD}%!" >&2
  # Send an e-mail alert
  echo "PostgreSQL is running out of disk space! Current usage: ${USAGE}%" | \
    mail -s "[URGENT] PostgreSQL disk alert" "$ADMIN_EMAIL"
  # Record the alert in a log file
  echo "$(date): disk usage ${USAGE}%, above threshold ${THRESHOLD}%" >> /var/log/pg_disk_alert.log
  # Try to free space (example): delete server logs older than 7 days
  echo "Removing old log files..."
  find "$PGDATA/log" -name "postgresql-*.log" -mtime +7 -delete
  # List the largest tables (in the database psql connects to by default)
  psql -U postgres -c "
SELECT
schemaname || '.' || relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 5;
"
else
  echo "✅ Disk usage is normal"
fi
✅ Scheduling it with cron:
bash
# Open the crontab:
crontab -e
# Add this line to run the check every hour:
0 * * * * /path/to/disk_monitor.sh
🎯 Example 3: A complete PostgreSQL 16 production configuration
ini
# 🚀 postgresql.conf: example PostgreSQL 16 production configuration (assumes 32GB RAM and SSD storage)
# Connections
listen_addresses = '*'
port = 5432
max_connections = 300
superuser_reserved_connections = 5
idle_in_transaction_session_timeout = 10min
# Authentication (configured in pg_hba.conf)
# hostssl all all 0.0.0.0/0 scram-sha-256
# Memory
shared_buffers = 8GB # 25% of 32GB RAM
effective_cache_size = 24GB # 75% of 32GB RAM
work_mem = 64MB # 300 connections × 64MB ≈ 19GB with one sort/hash each; complex queries can use several times work_mem
maintenance_work_mem = 2GB
autovacuum_work_mem = -1
# WAL
wal_level = replica
synchronous_commit = on
wal_buffers = 16MB
checkpoint_timeout = 30min
max_wal_size = 4GB
min_wal_size = 1GB
archive_mode = on
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
# Query planning
random_page_cost = 1.1 # SSD
effective_io_concurrency = 200
default_statistics_target = 500
max_parallel_workers_per_gather = 4
jit = on
# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_min_duration_statement = 1000 # log statements taking 1 s or longer
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_parameter_max_length_on_error = 1024 # PostgreSQL 13+
# Statistics
track_activities = on
track_counts = on
track_io_timing = on
# Autovacuum
autovacuum = on
autovacuum_max_workers = 6
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
# Locks and timeouts
deadlock_timeout = 1s
lock_timeout = 5s
statement_timeout = 30s
# Extensions (every library listed here must be installed, or the server will not start)
shared_preload_libraries = 'pg_stat_statements,pg_cron'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
pg_stat_statements.save = on
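✅ 19. Frequently asked questions (FAQ)
❓ Question 1: When settings conflict, which one takes precedence?
Answer: PostgreSQL applies the most specific setting. From highest to lowest precedence:
- Transaction level (SET LOCAL) → only for the current transaction
- Session level (SET) → for the current connection
- Role-and-database level (ALTER ROLE ... IN DATABASE ... SET ...)
- Role level (ALTER ROLE ... SET ...)
- Database level (ALTER DATABASE ... SET ...)
- postgresql.conf → global default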
✅ Demonstrating the precedence:
sql
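-- ✅ 1. Global default
SHOW work_mem; -- assume postgresql.conf sets '4MB'
-- ✅ 2. Role-level setting (requires appropriate privileges)
ALTER ROLE myuser SET work_mem = '32MB';
-- After reconnecting as myuser:
SHOW work_mem; -- shows '32MB'
-- ✅ 3. Database-level setting (role-level settings take precedence over it)
ALTER DATABASE mydb SET work_mem = '64MB';
-- After reconnecting to mydb as myuser:
SHOW work_mem; -- still '32MB'; roles without their own setting get '64MB'
-- ✅ 4. Session-level setting (overrides role and database defaults)
SET work_mem = '128MB';
SHOW work_mem; -- shows '128MB'
-- ✅ 5. Transaction-level setting (highest precedence)
BEGIN;
SET LOCAL work_mem = '256MB';
SHOW work_mem; -- shows '256MB'
COMMIT;
SHOW work_mem; -- back to '128MB'
-- ✅ 6. Reset to the session default
RESET work_mem;
SHOW work_mem; -- back to the role-level '32MB'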
✅ Finding where a value comes from:
sql
SELECT name, setting, source FROM pg_settings WHERE name = 'work_mem';
-- the source column shows values such as: default, configuration file, database, user, database user, session, override
📌 Golden rule:
The more specific the setting, the higher its precedence: transaction > session > role > database > global!
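The precedence rule can be modeled as "the most specific level that has a value wins". The shell sketch below is only a model of that rule, and the function name is hypothetical. It takes one argument per level, from lowest to highest precedence, and prints the value that would apply.

```bash
#!/usr/bin/env bash
# effective_setting CONF DB ROLE ROLE_IN_DB SESSION TRANSACTION
# Arguments go from lowest to highest precedence; pass "" for levels where the
# parameter is not set. Prints the value that would be in effect.
effective_setting() {
  local v result=""
  for v in "$@"; do
    if [[ -n $v ]]; then result=$v; fi     # a more specific level overrides the previous one
  done
  echo "$result"
}

# postgresql.conf 4MB, database 64MB, role 32MB, nothing else set
effective_setting 4MB 64MB 32MB "" "" ""     # prints 32MB
```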
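❓ Question 2: Why is performance still poor when the disk is not full?
Answer: a disk performance bottleneck is not the same as a shortage of disk space! Common causes:
- High I/O wait: slow storage (HDD, aging SSD, degraded RAID)
- Low buffer hit ratio: shared_buffers is too small, so data is read from disk repeatedly
- Table bloat: dead tuples take up space and make scans slower
- Poor plans: stale statistics lead the planner to choose bad plans, for example skipping a useful index
- Lock contention: many row or table locks cause waits
- WAL write bottleneck: synchronous_commit = on combined with slow disks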
✅ Example: diagnosing performance problems
sql
-- ✅ 1. Buffer cache hit ratio (should be above 90%)
SELECT
sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS hit_ratio
FROM pg_stat_database;
-- ✅ 2. Table bloat
SELECT
schemaname,
relname AS tablename,
n_dead_tup,
(n_dead_tup * 100.0 / (n_live_tup + n_dead_tup + 1)) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_ratio DESC;
-- ✅ 3. I/O timing (requires track_io_timing = on); large_table and condition are placeholders
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM large_table WHERE condition;
-- Check whether the "I/O Timings" line shows high read times
-- ✅ 4. Lock waits
SELECT * FROM pg_locks WHERE NOT granted;
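-- ✅ 5. Slow statements (pg_stat_statements; column names as of PostgreSQL 13+)
SELECT
query,
total_exec_time,
calls,
mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- ✅ 6. Checkpoint statistics (in pg_stat_bgwriter for PostgreSQL 16; moved to pg_stat_checkpointer in 17)
SELECT
checkpoints_timed,
checkpoints_req,
checkpoint_write_time,
checkpoint_sync_time
FROM pg_stat_bgwriter;
-- a high checkpoint_sync_time means slow disk syncs; many checkpoints_req suggest max_wal_size is too small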
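✅ Solutions:
- Slow I/O: move to NVMe SSDs and set random_page_cost = 1.1
- Low buffer hit ratio: increase shared_buffers and pre-warm key tables (for example with pg_prewarm)
- Table bloat: VACUUM regularly and tune the autovacuum parameters
- Poor plans: ANALYZE regularly and tune default_statistics_target
- Lock contention: optimize transactions, avoid long transactions, set lock_timeout
- WAL bottleneck: consider synchronous_commit = off (if losing the last few commits after a crash is acceptable), or use faster disks
📌 Key indicators:
- Buffer cache hit ratio > 95%
- Table bloat < 10%
- Average query time < 100 ms
- Lock waits: ideally 0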
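✅ 20. Summary of monitoring best practices
- Configure first:
  - Set up postgresql.conf and pg_hba.conf sensibly
  - Enable pg_stat_statements and detailed logging
- Monitor in real time:
  - Use pg_stat_activity to watch active sessions
  - Use pg_locks to watch lock waits
  - Set statement_timeout so that slow queries cannot bring the system down
- Check regularly:
  - Check disk space daily
  - Check table bloat and index usage weekly
  - Review the slow query log monthly
- Automate alerts:
  - Disk usage > 80%
  - Connections > 80% of max_connections
  - Lock waits > 30 seconds
- Use newer features:
  - log_parameter_max_length_on_error (PostgreSQL 13+) for richer error diagnostics
  - The more detailed statistics added in PostgreSQL 16: the pg_stat_io view and the last_seq_scan / last_idx_scan columns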
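🚀 Rule of thumb:
Configure sensibly, monitor in real time, alert promptly, optimize continuously!
📚 Consider combining Prometheus + Grafana + pg_exporter to build a visual monitoring dashboard!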