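Preface
💡 Pain points: Not sure where to start with slow-query tuning? Struggling to choose a partitioning scheme? Seeing performance collapse because of MVCC bloat? Running into inconsistent data with streaming replication?
🎯 Approach: work through MVCC internals → execution plans → indexing strategy → advanced features → production operations to build a systematic picture of PostgreSQL 16.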
[Diagram: PostgreSQL architecture. Storage engine: MVCC, B-tree/BRIN/GIN/GiST indexes, WAL (16MB segments). Replication: streaming replication from the primary (writes) to a read-only replica; logical replication (publish/subscribe) to an asynchronous subscriber. Tuning tools: EXPLAIN ANALYZE, VACUUM/AUTOVACUUM.]
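PostgreSQL 16 highlights at a glance:
| Feature | Description | Performance / functional impact |
|---|---|---|
| Logical replication improvements | Parallel apply of large in-progress transactions on the subscriber; logical decoding from a standby (row filters and column lists arrived in PostgreSQL 15) | More flexible replication topologies |
| Performance improvements | Planner and executor optimizations (for example parallel string_agg/array_agg), plus cheaper VACUUM freezing | Better CPU efficiency |
| pg_stat_io | New view with I/O statistics by backend type, object, and context | Better operational diagnostics |
| Bulk COPY improvements | Faster concurrent COPY FROM through more efficient relation extension (COPY itself is not parallelized) | Faster data loading |
| Parallel query | FULL and RIGHT OUTER hash joins can run in parallel (parallel sequential scans have existed since 9.6) | Faster analytical queries |
| ICU collations | ICU support is built by default; custom ICU collation rules | Better multilingual support |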
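1. MVCC and Vacuuming
1.1 Transaction IDs and Snapshots
sql
-- ===== How PostgreSQL MVCC works =====
/*
PostgreSQL MVCC does not use an undo log; old row versions (tuples) stay in the table itself.
Every tuple carries two hidden system columns:
xmin: ID of the transaction that inserted this version
xmax: ID of the transaction that deleted it (or replaced it through an UPDATE); 0 while the version is live
Properties:
1. Reads do not block writes, and writes do not block reads
2. Isolation is implemented through tuple versions
3. Dead versions are reclaimed by VACUUM (there is no undo segment to move them to, so they stay in the heap until vacuumed)
*/
-- ===== Transaction snapshots =====
-- Inspect the current transaction (PostgreSQL 13+ also offers pg_current_xact_id() and related functions)
SELECT txid_current(); -- current transaction ID
SELECT txid_snapshot_xmin(txid_current_snapshot()); -- lowest transaction ID still active
SELECT txid_snapshot_xmax(txid_current_snapshot()); -- one past the highest completed transaction ID
-- Transaction status
SELECT pg_xact_status('100'::xid8); -- status of transaction 100: in progress, committed, or aborted (NULL if too old)
-- ===== Tuple visibility =====
/*
Simplified visibility rules (default READ COMMITTED):
1. xmin committed before the snapshot was taken (and is not in the snapshot's in-progress list) → the version is a candidate for visibility
2. xmax is set, committed, and likewise visible to the snapshot → the version has been deleted and is not visible
3. Row-level locks are also recorded in xmax (as a MultiXact ID when several transactions lock the same row)
*/
-- ===== Transaction isolation levels =====
-- Show the current isolation level
SHOW transaction_isolation; -- read committed (default)
-- Switch to REPEATABLE READ
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Difference:
-- RC: every statement takes a fresh snapshot
-- RR: the whole transaction uses the snapshot taken by its first statement (plain snapshot isolation; SSI applies only to SERIALIZABLE)
COMMIT;
-- SERIALIZABLE (PostgreSQL's Serializable Snapshot Isolation, SSI)
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Tracks read/write dependencies with predicate locks and aborts a transaction when a dangerous pattern is detected
-- Suits correctness-critical flows such as coupon issuance or inventory deduction, but the application must retry on SQLSTATE 40001 (serialization_failure)
COMMIT;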
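The visibility rules above can be sketched as a small model. This is a deliberately simplified illustration, not PostgreSQL's actual implementation: the names `Snapshot`, `TupleVersion`, `xid_visible`, and `tuple_visible` are hypothetical, and the model ignores a transaction's own changes, subtransactions, hint bits, and transaction ID wraparound.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    xmin: int                    # lowest transaction ID still active
    xmax: int                    # one past the highest completed transaction ID
    in_progress: FrozenSet[int]  # transactions running when the snapshot was taken


@dataclass(frozen=True)
class TupleVersion:
    xmin: int                    # inserting transaction
    xmax: Optional[int] = None   # deleting/updating transaction, if any


def xid_visible(xid: int, snap: Snapshot, committed: Set[int]) -> bool:
    """True if `xid` committed before the snapshot was taken."""
    if xid >= snap.xmax or xid in snap.in_progress:
        return False             # started later, or still running at snapshot time
    return xid in committed      # aborted transactions never become visible


def tuple_visible(tup: TupleVersion, snap: Snapshot, committed: Set[int]) -> bool:
    """A version is visible if its insert is visible and its delete is not."""
    if not xid_visible(tup.xmin, snap, committed):
        return False
    if tup.xmax is not None and xid_visible(tup.xmax, snap, committed):
        return False
    return True


# Hypothetical state: 100 and 101 committed, 102 still running.
committed = {100, 101}
snap = Snapshot(xmin=102, xmax=104, in_progress=frozenset({102}))

live = tuple_visible(TupleVersion(xmin=100), snap, committed)
deleted = tuple_visible(TupleVersion(xmin=100, xmax=101), snap, committed)
delete_pending = tuple_visible(TupleVersion(xmin=100, xmax=102), snap, committed)
future_insert = tuple_visible(TupleVersion(xmin=104), snap, committed)
```

Under READ COMMITTED a new `Snapshot` would be built for every statement; under REPEATABLE READ the same one would be reused for the whole transaction.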
1.2 VACUUM and Bloat Control
sql
-- ===== Diagnosing table bloat =====
-- Ratio of dead to live tuples per table
SELECT
relname AS table_name,
n_live_tup AS live_tuples,
n_dead_tup AS dead_tuples,
ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY dead_ratio DESC;
-- Total, heap, and index size per table
SELECT
relname,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_table_size(relid)) AS table_size,
pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
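-- ===== VACUUM operations =====
-- Plain VACUUM (makes space reusable, but usually does not shrink the file)
VACUUM orders;
-- VACUUM and refresh planner statistics
VACUUM ANALYZE orders;
-- VACUUM FULL (rewrites the table and shrinks the file, but holds an ACCESS EXCLUSIVE lock; use with care in production)
VACUUM FULL orders;
-- Show autovacuum settings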
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name LIKE 'autovacuum%';
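-- Tables approaching transaction ID wraparound
SELECT
relname,
age(relfrozenxid) AS xid_age,
pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'r'
AND age(relfrozenxid) > 1000000000; -- past 1 billion: freeze urgently (autovacuum_freeze_max_age defaults to 200 million)
-- ===== Tuning autovacuum parameters =====
-- Suggested autovacuum settings
/*
-- postgresql.conf
# Enable autovacuum (on by default)
autovacuum = on
# Dead-tuple threshold that triggers VACUUM
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.05 -- trigger = threshold + scale_factor × row count (the default scale factor is 0.2)
# On large tables a large scale_factor makes VACUUM run too rarely, so dead tuples pile up
# Consider per-table settings for tables above roughly 10GB:
ALTER TABLE big_orders SET (autovacuum_vacuum_scale_factor = 0.01);
ALTER TABLE big_orders SET (autovacuum_vacuum_threshold = 100);
# autovacuum concurrency and throttling
autovacuum_max_workers = 3 # maximum number of worker processes
autovacuum_naptime = 60 # delay between autovacuum runs on a database (seconds)
autovacuum_vacuum_cost_limit = 200 # cost budget accumulated before workers sleep, shared across workers (not a per-second I/O quota)
*/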
-- ===== Monitoring VACUUM progress =====
-- Which tables are being vacuumed right now
SELECT *
FROM pg_stat_progress_vacuum;
-- Progress details
SELECT
c.relname,
pv.phase,
pv.heap_blks_total,
pv.heap_blks_scanned,
pv.heap_blks_vacuumed,
pv.index_vacuum_count,
pv.max_dead_tuples,
pv.num_dead_tuples
FROM pg_stat_progress_vacuum pv
JOIN pg_class c ON c.oid = pv.relid; -- relid is the table's OID, not its relfilenode
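To see why the scale factor matters on large tables, the autovacuum trigger formula (threshold + scale factor × row count) can be worked through in a few lines. This is only arithmetic on the documented formula; the function names and the 100-million-row table are hypothetical.

```python
def autovacuum_trigger(row_count: int, threshold: int = 50, scale_factor: float = 0.2) -> float:
    """Number of dead tuples above which autovacuum vacuums the table."""
    return threshold + scale_factor * row_count


def needs_vacuum(dead_tuples: int, row_count: int, threshold: int = 50, scale_factor: float = 0.2) -> bool:
    """True once the dead-tuple count exceeds the trigger value."""
    return dead_tuples > autovacuum_trigger(row_count, threshold, scale_factor)


ROWS = 100_000_000  # hypothetical large table

default_trigger = autovacuum_trigger(ROWS)                                     # defaults: 50, 0.2
global_trigger = autovacuum_trigger(ROWS, threshold=50, scale_factor=0.05)     # values suggested above
per_table_trigger = autovacuum_trigger(ROWS, threshold=100, scale_factor=0.01) # big_orders override

for label, value in [("default", default_trigger),
                     ("scale_factor=0.05", global_trigger),
                     ("per-table 0.01", per_table_trigger)]:
    print(f"{label:>20}: vacuum after about {value:,.0f} dead tuples")
```

With the defaults, a table of this size accumulates roughly 20 million dead tuples before autovacuum starts; the per-table override brings that down to about 1 million.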
2. EXPLAIN Execution Plans
2.1 Reading an Execution Plan
sql
-- ===== EXPLAIN ANALYZE in detail =====
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT o.order_id, u.username, o.amount, o.created_at
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed'
AND o.created_at >= '2024-01-01'
ORDER BY o.amount DESC
LIMIT 100;
-- Sample output:
-- Limit (cost=1234.56..1289.01 rows=100 width=64) (actual time=45.67..48.12 rows=100 loops=1)
-- -> Sort (cost=1234.56..1345.67 rows=44444 width=64) (actual time=45.67..47.89 rows=100 loops=1)
-- Sort Key: o.amount DESC
-- Sort Method: top-N heapsort Memory: 48kB
-- -> Hash Join (cost=567.89..1234.56 rows=44444 width=64) (actual time=10.23..34.56 rows=44444 loops=1)
-- Hash Cond: (o.user_id = u.id)
-- -> Index Scan using idx_orders_status_date on orders o
-- (cost=0.56..667.89 rows=44444 width=48)
-- (actual time=0.12..15.67 rows=44444 loops=1)
-- Index Cond: ((status = 'completed'::text) AND (created_at >= '2024-01-01'::date))
-- -> Hash (cost=456.78..456.78 rows=10000 width=20) (actual time=5.67..5.67 rows=10000 loops=1)
-- Buckets: 16384 Batches: 1 Memory Usage: 1024kB
-- -> Seq Scan on users u (cost=0.00..456.78 rows=10000 width=20)
-- (actual time=0.01..4.56 rows=10000 loops=1)
--
-- Planning Time: 0.45 ms
-- Execution Time: 48.12 ms
--
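-- Buffers: shared hit=234 read=56
-- hit: pages found in shared buffers (fast, no read request issued)
-- read: pages requested from the OS (served from the OS page cache or from disk; slower)
-- ===== How to read the key fields =====
/*
cost: startup cost..total cost (planner estimate, in arbitrary units)
actual time: measured time in ms (first row..last row), averaged per loop
rows: the estimate appears in the cost parentheses, the actual count in the actual-time parentheses; a large gap points to stale or insufficient statistics
loops: number of times the node was executed
Sort Method:
- top-N heapsort: in memory, keeps only the top N rows (good)
- quicksort: in memory (good)
- external merge: exceeded work_mem and spilled to disk (poor)
Join types:
- Hash Join: builds a hash table; suits equality joins over larger inputs
- Merge Join: merges two sorted inputs; suits inputs that are already ordered
- Nested Loop: suits a small outer side with an indexed inner side
- Nested Loop + Materialize: the inner side is materialized and rescanned (mediocre)
Scan types:
- Index Scan: index lookup plus heap fetch (good for selective predicates)
- Index Only Scan: covering index; the heap is visited only for pages not marked all-visible (best case)
- Bitmap Index Scan + Bitmap Heap Scan: bitmap scan (good when many rows match)
- Seq Scan: full table scan (fine for small tables or low-selectivity predicates; worth investigating on large tables with selective filters)
*/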
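Comparing estimated and actual row counts is usually the first check when reading a plan. A minimal sketch of that comparison, assuming the single-line text format shown in the sample output above; `PLAN_NODE` and `row_estimate_error` are hypothetical helper names, and nodes whose estimate and actual figures are wrapped onto separate lines are not handled.

```python
import re
from typing import Optional

# Matches "... rows=<est> width=<n>) (actual time=<a>..<b> rows=<act> loops=<n>)"
PLAN_NODE = re.compile(
    r"rows=(?P<est>\d+) width=\d+\) "
    r"\(actual time=[\d.]+ rows=(?P<act>\d+) loops=(?P<loops>\d+)\)"
)


def row_estimate_error(plan_line: str) -> Optional[float]:
    """Factor (>= 1.0) by which the planner's row estimate is off, or None if the line has no node."""
    m = PLAN_NODE.search(plan_line)
    if m is None:
        return None
    est = max(int(m["est"]), 1)   # clamp to 1 to avoid dividing by zero
    act = max(int(m["act"]), 1)
    return max(est, act) / min(est, act)


good = row_estimate_error(
    "-> Hash Join (cost=567.89..1234.56 rows=44444 width=64) "
    "(actual time=10.23..34.56 rows=44444 loops=1)"
)
# Hypothetical node with a badly underestimated row count.
bad = row_estimate_error(
    "-> Seq Scan on orders (cost=0.00..1000.00 rows=100 width=48) "
    "(actual time=0.01..50.00 rows=25000 loops=1)"
)
```

A factor far above 1 is a common hint to run ANALYZE or raise the statistics target on the columns involved.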
2.2 Common Optimization Scenarios
sql
-- ===== Case 1: no index matches the predicates =====
-- ❌ Slow query: Seq Scan on orders
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
AND status = 'completed';
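-- ✅ Add a composite index (equality column first, range column second)
CREATE INDEX idx_orders_status_date ON orders(status, created_at);
-- Run again → Index Scan; in this example the time drops from about 5s to about 10ms
-- ===== Case 2: LIMIT + ORDER BY (keyset pagination and deferred join) =====
-- ❌ Deep pagination: reads and discards 10000 rows before returning 20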
SELECT * FROM orders
WHERE status = 'completed'
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;
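-- ✅ Keyset pagination (remember the created_at and id of the last row on the previous page)
SELECT * FROM orders
WHERE status = 'completed'
AND (created_at, id) < ('2024-01-15 12:00:00', 123456) -- last row of the previous page (example values); id breaks ties between equal timestamps
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- ✅ Deferred join (the subquery pages through a narrow index first)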
SELECT o.* FROM orders o
INNER JOIN (
SELECT id FROM orders
WHERE status = 'completed'
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000
) AS t ON o.id = t.id;
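-- Note: the deferred join still walks past the 10000 skipped index entries; it only avoids fetching wide rows for them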
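-- ===== Case 3: the index is skipped for low-selectivity predicates =====
-- ❌ WHERE status IN (...) may not use the index when the listed values match most of the table
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status IN ('pending', 'processing', 'completed');
-- PostgreSQL plans IN (...) as = ANY(...), so rewriting it as OR changes nothing; check selectivity and statistics first
-- For very long IN lists, a temp table + JOIN can help (run ANALYZE on it so the planner has statistics)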
CREATE TEMP TABLE tmp_status_list (status TEXT);
INSERT INTO tmp_status_list VALUES ('pending'), ('processing'), ('completed');
SELECT o.* FROM orders o
JOIN tmp_status_list t ON o.status = t.status;
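-- ===== Case 4: function wrapped around a column =====
-- ❌ Wrapping the column in a function prevents use of a plain index on created_at
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';
-- ✅ Rewrite as a range predicate (can use the index)
SELECT * FROM orders
WHERE created_at >= '2024-01-15'
AND created_at < '2024-01-16';
-- ✅ Or build an expression index (works as written for timestamp without time zone; the cast is not immutable for timestamptz)
CREATE INDEX idx_orders_date ON orders (DATE(created_at));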
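The pagination technique from Case 2 can be checked with an in-memory model. This sketch only demonstrates that paging by a (created_at, id) key returns the same rows as OFFSET paging, including when timestamps repeat; it says nothing about speed, since in PostgreSQL the benefit comes from the index seeking directly to the key. All names and the generated data are hypothetical.

```python
from datetime import datetime, timedelta


def make_orders(n):
    """Generate n orders; every timestamp is shared by two orders to exercise the tie-breaker."""
    base = datetime(2024, 1, 1)
    return [{"id": i, "created_at": base + timedelta(minutes=i // 2)} for i in range(1, n + 1)]


def sort_key(row):
    return (row["created_at"], row["id"])


def page_by_offset(rows, limit, offset):
    """ORDER BY created_at DESC, id DESC LIMIT limit OFFSET offset."""
    return sorted(rows, key=sort_key, reverse=True)[offset:offset + limit]


def page_by_keyset(rows, limit, after=None):
    """WHERE (created_at, id) < after ORDER BY created_at DESC, id DESC LIMIT limit."""
    candidates = [r for r in rows if after is None or sort_key(r) < after]
    return sorted(candidates, key=sort_key, reverse=True)[:limit]


def all_pages_keyset(rows, limit):
    """Walk every page, feeding the last key of each page into the next request."""
    pages, after = [], None
    while True:
        page = page_by_keyset(rows, limit, after)
        if not page:
            return pages
        pages.append(page)
        after = sort_key(page[-1])


orders = make_orders(25)
keyset_pages = all_pages_keyset(orders, limit=10)
offset_pages = [page_by_offset(orders, 10, off) for off in (0, 10, 20)]
```

If the key were `created_at` alone, rows sharing the boundary timestamp could be skipped between pages, which is why the model (and the SQL) carries `id` as a tie-breaker.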
3. Index Types and Strategies
3.1 Six Index Types Compared
sql
-- ===== 1. B-tree index (default, most common) =====
CREATE INDEX idx_users_email ON users(email);
-- Good for: equality, range queries, sorting, LIKE 'prefix%' (with the C collation or a text_pattern_ops index)
SELECT * FROM users WHERE email = 'alice@example.com'; -- Index Scan
SELECT * FROM users WHERE id > 100 AND id < 200; -- Index Scan with a range condition
SELECT * FROM users ORDER BY email; -- Index Scan (can avoid a sort)
-- ===== 2. BRIN index (time-series data, tiny footprint) =====
-- Good for: large, naturally ordered tables (logs, time series) whose rows are rarely updated
CREATE INDEX idx_orders_created_brin ON orders USING BRIN(created_at)
WITH (pages_per_range = 32); -- one summary per 32 pages
-- Size comparison quoted for a 100-million-row orders table:
-- B-tree: about 2.1GB
-- BRIN: about 4MB (roughly 500x smaller)
-- Best for wide range scans (rows in matching block ranges are rechecked against the heap)
EXPLAIN SELECT COUNT(*) FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';
-- ===== 3. GIN index (full-text search + JSONB) =====
-- Full-text search
CREATE INDEX idx_content_gin ON articles USING GIN(to_tsvector('english', content));
SELECT * FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'postgresql & performance');
-- Speeding up JSONB queries
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
data JSONB NOT NULL
);
CREATE INDEX idx_events_jsonb ON events USING GIN(data);
-- JSONB query examples
SELECT * FROM events WHERE data @> '{"type": "login", "source": "mobile"}';
SELECT * FROM events WHERE data ? 'error'; -- has the key error
SELECT * FROM events WHERE data ?| ARRAY['error', 'timeout']; -- has any of the keys
SELECT * FROM events WHERE data ?& ARRAY['user', 'device']; -- has all of the keys
-- Nested containment
SELECT * FROM events WHERE data @> '{"user": {"status": "active"}}';
-- ===== 4. GiST index (spatial + full-text) =====
CREATE INDEX idx_locations_gist ON places USING GIST(location); -- PostGIS
CREATE INDEX idx_articles_gist ON articles USING GIST(to_tsvector('english', title));
-- Spatial query (PostGIS extension)
SELECT * FROM places
WHERE ST_DWithin(location, ST_MakePoint(116.39, 39.91), 5000); -- intended as within 5 km; the distance is in meters only for geography, otherwise in the units of the column's SRID
-- ===== 5. SP-GiST index (space-partitioned trees) =====
CREATE INDEX idx_locations_spgist ON places USING SPGIST(location);
-- Good for: data that partitions into non-balanced structures (quadtrees, k-d trees, radix trees)
-- GiST is the usual default for spatial data; SP-GiST can be faster in specific cases
-- ===== 6. Hash index (equality only) =====
CREATE INDEX idx_user_id_hash ON orders USING HASH(user_id);
-- Good for: equality lookups only
EXPLAIN SELECT * FROM orders WHERE user_id = 1001; -- Index Scan using idx_user_id_hash
-- Note: hash indexes support neither ORDER BY nor range queries
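Why BRIN depends on physical ordering can be shown with a toy model: the index stores only a (min, max) summary per block range, and a query must visit every range whose summary overlaps the predicate. This is a conceptual sketch, not PostgreSQL's on-disk format; `build_brin`, `candidate_ranges`, and the data are hypothetical.

```python
def build_brin(values, rows_per_range):
    """Summarize consecutive chunks of `values` as (min, max) pairs, one per block range."""
    chunks = (values[i:i + rows_per_range] for i in range(0, len(values), rows_per_range))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def candidate_ranges(summaries, lo, hi):
    """Block ranges whose [min, max] overlaps [lo, hi]; only these must be read and rechecked."""
    return [i for i, (mn, mx) in enumerate(summaries) if mx >= lo and mn <= hi]


# Append-only data: physical order follows the value, as with created_at in a log table.
ordered = list(range(1000))
# The same values scattered across the table (multiplying by 37 modulo 1000 is a permutation).
scattered = [(i * 37) % 1000 for i in range(1000)]

ordered_hits = candidate_ranges(build_brin(ordered, 100), 250, 349)
scattered_hits = candidate_ranges(build_brin(scattered, 100), 250, 349)

print("ordered table   -> ranges to scan:", ordered_hits)
print("scattered table -> ranges to scan:", scattered_hits)
```

On the ordered data only the ranges that actually contain matching values are candidates; on the scattered data every summary overlaps the predicate, so the index prunes nothing. That is the reason BRIN is recommended above only for naturally ordered, rarely updated tables.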
3.2 Partial, Expression, and Covering Indexes
sql
-- ===== Partial index =====
-- Index only active orders (status = 'completed' makes up about 90% of rows, so it is left out)
CREATE INDEX idx_orders_active ON orders(user_id, created_at)
WHERE status IN ('pending', 'processing');
-- The planner uses the partial index when the query predicate implies the index predicate
EXPLAIN SELECT * FROM orders
WHERE status = 'pending' AND user_id = 100;
-- -> Index Scan using idx_orders_active (cost=...)
-- ===== Expression index =====
-- Lookup by the last 4 digits of the phone number
CREATE INDEX idx_users_phone_last4 ON users (RIGHT(phone, 4));
SELECT * FROM users WHERE RIGHT(phone, 4) = '1234';
-- Case-insensitive lookup
CREATE INDEX idx_users_email_ci ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- ===== Covering index (INCLUDE) =====
-- Key columns + included columns (PostgreSQL 11+); named differently from the index created in section 2.2 to avoid a name clash
CREATE INDEX idx_orders_status_date_incl ON orders(status, created_at)
INCLUDE (amount, user_id);
-- The query can be answered from the index
EXPLAIN SELECT amount, user_id FROM orders
WHERE status = 'completed' AND created_at > '2024-01-01';
-- -> Index Only Scan (heap fetches are skipped for pages marked all-visible, so keep the table vacuumed)
4. Advanced SQL Features
4.1 CTEs and Window Functions
sql
-- ===== Recursive CTE (tree queries) =====
-- Table: organization tree
CREATE TABLE org_tree (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INTEGER REFERENCES org_tree(id)
);
-- Recursively fetch a node and all of its descendants
WITH RECURSIVE org_subtree AS (
-- Anchor: the starting node
SELECT id, name, parent_id, 1 AS depth, ARRAY[id] AS path
FROM org_tree
WHERE id = 1 -- department ID
UNION ALL
-- Recursive step: children
SELECT t.id, t.name, t.parent_id, s.depth + 1, s.path || t.id
FROM org_tree t
JOIN org_subtree s ON t.parent_id = s.id
)
SELECT * FROM org_subtree ORDER BY path;
-- ===== Data-modifying CTE (writable CTE) =====
WITH deleted AS (
DELETE FROM orders
WHERE status = 'expired' AND created_at < '2023-01-01'
RETURNING *
)
INSERT INTO orders_archive SELECT * FROM deleted;
-- ===== Window functions =====
-- Ranking (ordered within each group)
SELECT
user_id,
amount,
created_at,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn,
RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS dense_rank
FROM orders;
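-- Moving average over the current row and the 6 preceding rows (7 orders, not 7 days; a time-based window needs a RANGE frame with an interval offset)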
SELECT
created_at,
amount,
AVG(amount) OVER (ORDER BY created_at ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7d
FROM orders
WHERE user_id = 100;
-- Running total
SELECT
created_at,
amount,
SUM(amount) OVER (ORDER BY created_at) AS running_total
FROM orders
WHERE user_id = 100;
-- Lag analysis (compare with the previous order)
SELECT
created_at,
amount,
LAG(amount, 1) OVER (ORDER BY created_at) AS prev_amount,
amount - LAG(amount, 1) OVER (ORDER BY created_at) AS diff
FROM orders
WHERE user_id = 100;
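What the three window queries above compute can be sketched on a plain list. This model assumes the rows are already sorted by created_at with no duplicate timestamps (with ties, SUM() OVER (ORDER BY ...) uses a RANGE frame by default and gives peer rows the same total); the function names and sample amounts are hypothetical.

```python
from itertools import accumulate


def moving_average(amounts, preceding=6):
    """AVG(amount) OVER (ORDER BY ... ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW)."""
    result = []
    for i in range(len(amounts)):
        frame = amounts[max(0, i - preceding): i + 1]  # the frame shrinks at the start
        result.append(sum(frame) / len(frame))
    return result


def running_total(amounts):
    """SUM(amount) OVER (ORDER BY ...), assuming distinct ordering keys."""
    return list(accumulate(amounts))


def lag(amounts, offset=1, default=None):
    """LAG(amount, offset) OVER (ORDER BY ...): the value `offset` rows earlier, else default."""
    return [amounts[i - offset] if i >= offset else default for i in range(len(amounts))]


amounts = [10, 20, 30, 40]
ma = moving_average(amounts, preceding=1)
totals = running_total(amounts)
previous = lag(amounts)
diffs = [a - p if p is not None else None for a, p in zip(amounts, previous)]
```

The frame in `moving_average` counts rows, not days, which mirrors the ROWS frame in the SQL above.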
4.2 JSONB and Full-Text Search
sql
-- ===== Advanced JSONB queries =====
-- Create a table with a JSONB column
CREATE TABLE product_variants (
id BIGSERIAL PRIMARY KEY,
specs JSONB NOT NULL,
inventory INT NOT NULL
);
-- Index
CREATE INDEX idx_specs ON product_variants USING GIN(specs);
-- Containment query
SELECT * FROM product_variants
WHERE specs @> '{"color": "red", "size": "XL"}';
-- Nested path query (express the path as a nested JSONB document)
SELECT * FROM product_variants
WHERE specs @> '{"dimensions": {"height": 30}}';
-- Extracting JSONB values
SELECT
id,
specs->>'color' AS color,
specs->'dimensions'->>'width' AS width,
(specs->>'price')::NUMERIC AS price
FROM product_variants;
-- Aggregating over JSONB values
SELECT
specs->>'color' AS color,
COUNT(*) AS count,
AVG((specs->>'price')::NUMERIC) AS avg_price
FROM product_variants
GROUP BY specs->>'color';
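-- ===== Full-text search (tsvector/tsquery) =====
-- Create the search table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT,
body TEXT,
search_vector tsvector -- precomputed tsvector
);
-- Trigger function that keeps search_vector up to date
CREATE OR REPLACE FUNCTION documents_search_update() RETURNS TRIGGER AS $$
BEGIN
-- coalesce() keeps a NULL title or body from turning the whole concatenation into NULL
NEW.search_vector := to_tsvector('english', coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, ''));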
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_documents_search
BEFORE INSERT OR UPDATE ON documents
FOR EACH ROW EXECUTE FUNCTION documents_search_update();
-- Query (a GIN index on search_vector is needed for this to be index-backed)
SELECT * FROM documents
WHERE search_vector @@ to_tsquery('english', 'postgresql & performance');
-- Rank the results
SELECT
title,
ts_rank(search_vector, to_tsquery('english', 'postgresql & performance')) AS rank
FROM documents
WHERE search_vector @@ to_tsquery('english', 'postgresql & performance')
ORDER BY rank DESC
LIMIT 10;
-- Highlight matches
SELECT ts_headline('english', body, to_tsquery('english', 'postgresql & performance'),
'StartSel=<b>, StopSel=</b>, MaxWords=50, MinWords=20')
FROM documents
WHERE search_vector @@ to_tsquery('english', 'postgresql & performance');
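5. Streaming and Logical Replication
5.1 Physical Streaming Replication
ini
# ===== Primary configuration =====
# postgresql.conf (primary)
wal_level = replica # WAL level (replica or logical)
max_wal_senders = 10 # maximum number of WAL sender processes
wal_keep_size = 1GB # amount of WAL retained for standbys
synchronous_standby_names = '' # synchronous standbys (empty = asynchronous)
# pg_hba.conf (primary)
# Allow the standby to connect for replication
host replication replicator 192.168.1.0/24 md5
# ===== Standby configuration =====
# postgresql.conf (standby)
primary_conninfo = 'host=192.168.1.100 port=5432 user=replicator password=xxx'
primary_slot_name = 'standby1' # the slot keeps the primary from removing WAL the standby still needs
hot_standby = on # allow read-only queries
hot_standby_feedback = on # stop the primary from vacuuming rows that standby queries still need (can add bloat on the primary)
# ===== Create the physical replica =====
# On the standby
pg_basebackup -h 192.168.1.100 -U replicator -D /var/lib/postgresql/data -P -R \
-C -S standby1 -X stream
# -R: creates standby.signal and writes the connection settings to postgresql.auto.conf
# -C: creates the replication slot named by -S before the backup starts
# -S: name of the replication slot to use
# -X stream: streams WAL while the backup runs
# ===== Synchronous replication =====
-- Check replication status on the primary (SQL)
SELECT * FROM pg_stat_replication;
# Configure synchronous replication (wait for at least 1 standby to confirm)
# postgresql.conf
synchronous_standby_names = 'FIRST 1 (*)' # wait for 1 synchronous standby
synchronous_commit = 'remote_write' # how far a commit must get before it returns
# synchronous_commit options:
# off: return without waiting for the local WAL flush (recent commits can be lost in a crash)
# local: return after the local WAL flush, without waiting for any standby
# remote_write: return after the standby has written the WAL to its OS (not yet flushed to disk)
# on: return after the standby has flushed the WAL to disk (default)
# remote_apply: return after the standby has replayed the WAL (highest latency; reads on the standby see the commit)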
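5.2 Logical Replication
sql
-- ===== Publisher =====
-- Create a publication (whole table)
CREATE PUBLICATION orders_pub
FOR TABLE orders;
-- Publish selected columns with a WHERE row filter (PostgreSQL 15+)
-- Note: if the publication includes UPDATE or DELETE, the filter columns and column list must cover the replica identity
CREATE PUBLICATION orders_partial_pub
FOR TABLE orders (order_id, user_id, amount, status)
WHERE (status = 'completed');
-- Publish all tables
CREATE PUBLICATION all_tables_pub FOR ALL TABLES;
-- ===== Subscriber =====
-- Create a subscription (the target table must already exist; the initial copy and later changes are applied asynchronously)
CREATE SUBSCRIPTION orders_sub
CONNECTION 'host=192.168.1.100 port=5432 dbname=mydb user=replicator password=xxx'
PUBLICATION orders_pub;
-- Subscribe to the filtered publication
CREATE SUBSCRIPTION orders_partial_sub
CONNECTION 'host=192.168.1.100 port=5432 dbname=mydb user=replicator password=xxx'
PUBLICATION orders_partial_pub;
-- ===== Managing logical replication =====
-- Check replication status
SELECT * FROM pg_stat_subscription; -- on the subscriber
SELECT * FROM pg_stat_replication WHERE application_name = 'orders_sub'; -- on the publisher; application_name defaults to the subscription name
-- Pause/resume a subscription
ALTER SUBSCRIPTION orders_sub DISABLE;
ALTER SUBSCRIPTION orders_sub ENABLE;
-- Refresh the subscription (needed after tables are added to the publication)
ALTER SUBSCRIPTION orders_sub REFRESH PUBLICATION;
-- Drop the subscription
DROP SUBSCRIPTION orders_sub;
-- ===== Streaming replication vs. logical replication =====
/*
Physical streaming replication:
- Replicates the whole instance (all databases, all tables)
- Standby is read-only
- Supports failover
- Suits HA and read scaling
Logical replication:
- Replicates selected tables (works across major versions)
- Subscriber is readable and writable
- Supports row and column filtering
- Suits microservices and data synchronization
*/
-- ===== Parallel apply (PostgreSQL 16) =====
-- Apply large in-progress transactions with parallel workers on the subscriber
ALTER SUBSCRIPTION orders_sub SET (streaming = parallel);
-- The number of workers is capped by max_parallel_apply_workers_per_subscription in postgresql.conf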
6. Partitioned Tables
6.1 Declarative Partitioning
sql
-- ===== Range partitioning (by time, the most common choice) =====
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT NOT NULL,
amount NUMERIC(10,2),
status TEXT,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create monthly partitions (lower bound inclusive, upper bound exclusive)
CREATE TABLE orders_202401 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_202402 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Later partitions
CREATE TABLE orders_202403 PARTITION OF orders
FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- ===== List partitioning (by fixed values) =====
CREATE TABLE t_region (
id SERIAL,
region TEXT NOT NULL,
data JSONB
) PARTITION BY LIST (region);
CREATE TABLE t_region_north PARTITION OF t_region
FOR VALUES IN ('Beijing', 'Tianjin', 'Hebei');
CREATE TABLE t_region_south PARTITION OF t_region
FOR VALUES IN ('Guangdong', 'Shenzhen', 'Hainan');
CREATE TABLE t_region_others PARTITION OF t_region DEFAULT;
-- ===== Hash partitioning (even distribution) =====
CREATE TABLE t_shard (
id BIGSERIAL,
user_id BIGINT NOT NULL,
data TEXT
) PARTITION BY HASH (user_id);
CREATE TABLE t_shard_0 PARTITION OF t_shard FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE t_shard_1 PARTITION OF t_shard FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE t_shard_2 PARTITION OF t_shard FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE t_shard_3 PARTITION OF t_shard FOR VALUES WITH (MODULUS 4, REMAINDER 3);
-- ===== Managing partitions =====
-- Show the partition layout
SELECT
parent.relname AS parent,
child.relname AS child,
pg_get_expr(child.relpartbound, child.oid) AS boundary
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE parent.relname = 'orders';
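-- Drop an old partition
DROP TABLE IF EXISTS orders_202301;
-- Or, instead of dropping it, detach it and keep it as a standalone table (DETACH PARTITION ... CONCURRENTLY, PostgreSQL 14+, avoids blocking queries on the parent)
ALTER TABLE orders DETACH PARTITION orders_202301;
-- Add a new partition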
CREATE TABLE orders_202404 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-05-01');
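-- ===== Partition pruning =====
-- The planner prunes partitions automatically
EXPLAIN SELECT * FROM orders WHERE created_at = '2024-02-15';
-- Only orders_202402 is scanned
-- Create an index on the partitioned table
CREATE INDEX ON orders(created_at);
-- PostgreSQL creates a matching index on every existing partition, and on partitions added later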
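Range partition routing and pruning come down to comparing a value against sorted bounds. A minimal model of the monthly layout above, assuming half-open ranges (lower bound inclusive, upper bound exclusive) as in FOR VALUES FROM ... TO ...; `route` and `prune` are hypothetical names, not PostgreSQL functions.

```python
from bisect import bisect_right
from datetime import datetime

# Partition k covers [BOUNDS[k], BOUNDS[k + 1])
BOUNDS = [datetime(2024, m, 1) for m in (1, 2, 3, 4, 5)]
NAMES = ["orders_202401", "orders_202402", "orders_202403", "orders_202404"]


def route(created_at):
    """Partition that would receive a row, or None if no partition covers the value."""
    k = bisect_right(BOUNDS, created_at) - 1
    if k < 0 or k >= len(NAMES):
        return None  # without a DEFAULT partition, PostgreSQL rejects such an INSERT
    return NAMES[k]


def prune(lo, hi):
    """Partitions that can contain rows with lo <= created_at <= hi."""
    return [name for k, name in enumerate(NAMES)
            if BOUNDS[k] <= hi and BOUNDS[k + 1] > lo]


single = route(datetime(2024, 2, 15))
boundary = route(datetime(2024, 2, 1))      # lower bound is inclusive
uncovered = route(datetime(2024, 5, 1))     # upper bound is exclusive
scanned = prune(datetime(2024, 1, 15), datetime(2024, 2, 15))
```

The `prune` result corresponds to the partitions that would remain in the EXPLAIN output for a BETWEEN predicate on the partition key.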
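7. Configuration Tuning
7.1 Core Parameters
ini
# ===== Core PostgreSQL parameters =====
# postgresql.conf (the example values correspond to a dedicated server with about 24GB of RAM)
# === Memory ===
shared_buffers = 6GB # shared buffer cache (about 25% of RAM)
effective_cache_size = 18GB # planner's estimate of memory available for caching (50-75% of RAM); allocates nothing
work_mem = 64MB # memory per sort/hash operation before spilling to temp files; one query can use several multiples
maintenance_work_mem = 1GB # memory for VACUUM and index builds (can be larger)
wal_buffers = 16MB # WAL buffers
# === Parallel query ===
max_parallel_workers = 8 # maximum parallel workers overall
max_parallel_workers_per_gather = 4 # maximum workers per Gather node
parallel_tuple_cost = 0.1 # cost of passing one tuple from a worker (0.1 is the default; lower values favor parallel plans)
parallel_setup_cost = 1000 # cost of launching parallel workers (1000 is the default)
# === WAL ===
wal_level = replica # WAL level needed for replication
max_wal_size = 8GB # soft limit on WAL size between checkpoints
min_wal_size = 2GB # WAL kept for recycling below this size
checkpoint_completion_target = 0.9 # fraction of the checkpoint interval over which writes are spread
wal_compression = zstd # compress full-page images in WAL (zstd needs PostgreSQL 15+ built with zstd support)
wal_log_hints = on # WAL-log hint-bit updates (required by pg_rewind unless data checksums are enabled)
# === Statistics ===
default_statistics_target = 1000 # statistics target (default 100); raising it globally slows ANALYZE and planning, so per-column SET STATISTICS is often the better tool
track_activities = on
track_counts = on
track_io_timing = on # I/O timing (low overhead on most platforms; verify with pg_test_timing)
track_functions = all
# === Vacuum ===
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 60
autovacuum_vacuum_cost_limit = 200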
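The memory percentages above translate into concrete numbers with simple arithmetic. A rough sketch of that calculation, using the 25% and 75% rules of thumb from the comments; the function names, the 100-connection figure, and the two-sort-nodes-per-query figure are assumptions for illustration, not recommendations.

```python
def suggest_memory_settings(ram_gb: float) -> dict:
    """Apply the rules of thumb: shared_buffers = 25% of RAM, effective_cache_size = 75%."""
    return {
        "shared_buffers_gb": ram_gb * 0.25,
        "effective_cache_size_gb": ram_gb * 0.75,
    }


def worst_case_work_mem_gb(work_mem_mb: int, connections: int, nodes_per_query: int) -> float:
    """Upper bound if every connection runs a query with several sort/hash nodes at once."""
    return work_mem_mb * connections * nodes_per_query / 1024


settings = suggest_memory_settings(24)          # 24GB server, matching the values above
peak = worst_case_work_mem_gb(64, 100, 2)       # work_mem = 64MB, hypothetical workload

print(settings)
print(f"worst-case work_mem usage: {peak} GB")
```

Because work_mem applies per operation rather than per server, the worst case here (12.5GB) would not fit next to 6GB of shared buffers on a 24GB machine with much headroom, which is why work_mem is usually raised per session or per role rather than globally.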
7.2 Resource Settings and Monitoring
sql
-- ===== Current settings =====
SELECT name, setting, unit, boot_val, reset_val, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
'effective_cache_size', 'max_connections');
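-- ===== Monitoring slow queries =====
-- Enable the slow-query log
-- postgresql.conf
-- log_min_duration_statement = 1000 -- log statements that run longer than 1 second (value in ms)
-- Top statements by total time (requires the pg_stat_statements extension)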
SELECT
query,
calls,
total_exec_time / 1000 AS total_sec,
mean_exec_time / 1000 AS avg_sec,
rows,
shared_blks_hit,
shared_blks_read,
temp_blks_written,
ROUND(100.0 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS hit_ratio
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- ===== Index usage statistics =====
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 -- never scanned since statistics were last reset
ORDER BY pg_relation_size(indexrelid) DESC;
-- ===== Table access statistics =====
SELECT
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 10;
8. Production Operations
8.1 Routine Health Checks
sql
-- ===== Database sizes =====
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;
-- ===== Top 10 tables by size =====
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_table_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS index_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;
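-- ===== Unused indexes (wasted space) =====
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 -- never scanned since statistics were last reset; check standbys and unique constraints before dropping
ORDER BY pg_relation_size(indexrelid) DESC;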
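-- ===== Long-running transactions =====
SELECT
pid,
state,
now() - xact_start AS xact_duration,
now() - query_start AS query_duration,
query,
wait_event,
wait_event_type
FROM pg_stat_activity
WHERE state IN ('active', 'idle in transaction') -- idle-in-transaction sessions also hold back VACUUM
AND now() - xact_start > interval '5 minutes'
ORDER BY xact_duration DESC;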
-- ===== Locating lock waits (pg_blocking_pids(pid) is a shorter alternative) =====
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.query AS blocked_query,
blocking_locks.pid AS blocking_pid,
blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
8.2 pg_stat_statements
sql
-- ===== pg_stat_statements setup =====
-- postgresql.conf
-- shared_preload_libraries = 'pg_stat_statements' -- requires a server restart
-- pg_stat_statements.track = all
-- pg_stat_statements.max = 10000
-- Create the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- ===== Ranked by total execution time =====
SELECT
query,
calls,
total_exec_time / 1000 AS total_sec,
mean_exec_time / 1000 AS avg_sec,
rows,
ROUND(100.0 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS hit_ratio,
shared_blks_read,
shared_blks_written,
temp_blks_written,
blk_read_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- ===== Reset statistics =====
SELECT pg_stat_statements_reset();
-- ===== Buffer cache hit ratio per statement =====
SELECT
query,
calls,
ROUND(100.0 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS hit_ratio,
shared_blks_read
FROM pg_stat_statements
WHERE shared_blks_read > 1000 -- statements that read many blocks from outside shared buffers
ORDER BY shared_blks_read DESC
LIMIT 10;
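9. Summary
Technology Overview
| Layer | Core concepts | Key points |
|---|---|---|
| MVCC | xmin/xmax version chain | VACUUM reclaims dead tuples |
| EXPLAIN | Hash Join / Nested Loop / Seq Scan | Actual time vs. cost estimates |
| Indexes | B-tree / BRIN / GIN / GiST / SP-GiST / Hash | BRIN for time series, GIN for JSONB |
| Advanced SQL | CTEs + window functions + JSONB + full-text search | Recursive CTEs for tree queries |
| Replication | Streaming replication / logical replication | Physical = whole instance, logical = per table |
| Partitioning | Range / List / Hash | Partition pruning happens automatically |
| Configuration | shared_buffers / effective_cache_size / work_mem | shared_buffers = about 25% of RAM |
| Monitoring | pg_stat_statements | Slow queries, lock waits, bloat |
Best Practices
| Practice | Notes |
|---|---|
| autovacuum | Monitor the dead-tuple ratio; lower scale_factor on large tables |
| BRIN indexes | First choice for naturally ordered time-series data; about 500x smaller than a B-tree in the example above |
| GIN indexes | Build them for JSONB containment queries and full-text search |
| Table bloat | Vacuum promptly; monitor n_dead_tup |
| Slow queries | log_min_duration_statement = 1000ms |
| Partitioning | Rule of thumb used here: partition tables above 2GB by time or ID range |
| Partial indexes | Index only the hot data (WHERE filter) |
| pg_stat_statements | Find the top N slowest statements |
| Replication slots | Keep the primary from removing WAL the standby still needs (monitor lag, since a stalled standby makes WAL pile up) |
This article covers the core topics of PostgreSQL 16: MVCC internals and VACUUM, reading EXPLAIN ANALYZE execution plans, six index types (B-tree/BRIN/GIN/GiST/SP-GiST/Hash) plus partial and covering indexes, advanced SQL features (recursive CTEs, window functions, JSONB queries, full-text search), physical streaming replication and logical replication, declarative partitioning, core configuration tuning, and production health checks with pg_stat_statements monitoring.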