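Preface
💡 Pain points: How do you shard or partition data in PostgreSQL? How do PostGIS spatial queries work? Logical or physical replication: which should you choose? How do you tune slow queries? How do you handle high concurrency?
🎯 Solution: This article covers PostgreSQL's advanced features: Range/List/Hash partitioned table design and partition pruning, PostGIS spatial indexes and location queries, setting up logical replication and failing over, in-depth EXPLAIN ANALYZE analysis, PgBouncer connection pooling, a performance tuning checklist, and production operations best practices.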
[Diagram: PostgreSQL topic map. Partitioning strategies: Range (by time or numeric range), List (by enumerated value), Hash (even distribution). Replication and high availability: logical replication (publish/subscribe), physical replication (streaming, asynchronous/synchronous), Patroni (HA failover). Advanced features: PostGIS (geospatial queries), full-text search (tsvector/tsquery), JSONB (semi-structured data), foreign data wrappers (fdw). Performance optimization: indexes (B-tree/Hash/GiST/SP-GiST/GIN), EXPLAIN ANALYZE (execution plan analysis), PgBouncer (connection pooling), VACUUM (garbage collection).]
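1. Partitioned Tables in Practice
1.1 Choosing a Partitioning Strategy
sql
-- ======== Create a partitioned table (Range partitioning by time) ========
-- Suited to time-series data such as logs, orders, and monitoring metrics
-- Parent table (holds no data itself; it only defines the structure)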
CREATE TABLE orders (
order_id BIGSERIAL,
user_id BIGINT NOT NULL,
total_amount DECIMAL(10, 2) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL,
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
metadata JSONB DEFAULT '{}'
) PARTITION BY RANGE (created_at);
-- Create a default partition (catches rows that match no other partition)
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
-- Create monthly partitions
CREATE TABLE orders_2025_01 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE orders_2025_03 PARTITION OF orders
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');
-- Create a quarterly partition (longer-term planning)
CREATE TABLE orders_2025_q2 PARTITION OF orders
FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
-- ======== List partitioning by region/category ========
CREATE TABLE sales (
id BIGSERIAL,
region VARCHAR(50) NOT NULL,
product_id BIGINT NOT NULL,
amount DECIMAL(12, 2) NOT NULL,
sold_at TIMESTAMPTZ NOT NULL
) PARTITION BY LIST (region);
-- Partition by region
CREATE TABLE sales_east PARTITION OF sales
FOR VALUES IN ('Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen');
CREATE TABLE sales_west PARTITION OF sales
FOR VALUES IN ('Chengdu', 'Chongqing', 'Xi''an', 'Kunming');
CREATE TABLE sales_north PARTITION OF sales
FOR VALUES IN ('Harbin', 'Changchun', 'Shenyang', 'Dalian');
CREATE TABLE sales_default PARTITION OF sales DEFAULT;
-- ======== Hash partitioning for even distribution ========
CREATE TABLE users (
user_id BIGSERIAL,
username VARCHAR(100) NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
status SMALLINT NOT NULL DEFAULT 1
) PARTITION BY HASH (user_id);
-- Four hash partitions (modulus 4)
CREATE TABLE users_p0 PARTITION OF users
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_p3 PARTITION OF users
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
-- ======== Multi-level (composite) partitioning ========
-- Partition by year, then by month
CREATE TABLE event_logs (
event_id BIGSERIAL,
event_type VARCHAR(50) NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
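-- Year-level partition (declared as partitioned itself so it can hold monthly sub-partitions)
CREATE TABLE event_logs_2025 PARTITION OF event_logs
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01')
PARTITION BY RANGE (created_at);
-- Month-level partition (a sub-partition of the yearly partition)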
CREATE TABLE event_logs_2025_m01 PARTITION OF event_logs_2025
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
1.2 Partition Management (Automatic Partitions)
sql
-- ======== Automatic partition management function ========
-- Pre-create monthly partitions with PL/pgSQL.
-- Note: a row-level trigger on orders cannot do this, because PostgreSQL rejects
-- CREATE TABLE ... PARTITION OF orders while an INSERT on orders is in progress.
-- Call this function from a scheduler (cron, pg_cron, ...) ahead of time instead.
CREATE OR REPLACE FUNCTION create_monthly_partition(p_month DATE)
RETURNS VOID AS $$
DECLARE
partition_date DATE;
partition_name TEXT;
start_date TEXT;
end_date TEXT;
BEGIN
-- Truncate the requested date to the first day of its month
partition_date := DATE_TRUNC('month', p_month);
partition_name := 'orders_' || TO_CHAR(partition_date, 'YYYY_MM');
start_date := TO_CHAR(partition_date, 'YYYY-MM-DD');
end_date := TO_CHAR(partition_date + INTERVAL '1 month', 'YYYY-MM-DD');
-- Check whether the partition already exists
IF NOT EXISTS (
SELECT 1 FROM pg_tables
WHERE schemaname = current_schema() AND tablename = partition_name
) THEN
-- Create the new partition (this fails if orders_default already holds rows in the range)
EXECUTE format(
'CREATE TABLE %I PARTITION OF orders FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
-- Create indexes
EXECUTE format(
'CREATE INDEX ON %I (user_id)',
partition_name
);
EXECUTE format(
'CREATE INDEX ON %I (status)',
partition_name
);
RAISE NOTICE 'Created partition: %', partition_name;
END IF;
END;
$$ LANGUAGE plpgsql;
-- Create next month's partition ahead of time (run this from the scheduler)
SELECT create_monthly_partition((CURRENT_DATE + INTERVAL '1 month')::date);
-- ======== Partition maintenance ========
-- Detach an old partition (it becomes a standalone table that can be archived separately)
ALTER TABLE orders DETACH PARTITION orders_2024_01;
-- Archive the detached partition to cold storage
CREATE TABLE orders_archive_2024_01 (LIKE orders_2024_01 INCLUDING ALL);
INSERT INTO orders_archive_2024_01 SELECT * FROM orders_2024_01;
DROP TABLE orders_2024_01;
-- Drop an old partition (data expiry policy)
DROP TABLE IF EXISTS orders_2023_01;
-- Inspect partition information
SELECT
parent.relname AS parent_table,
child.relname AS partition_name,
pg_get_expr(child.relpartbound, child.oid, true) AS partition_range,
pg_size_pretty(pg_relation_size(child.oid)) AS partition_size
FROM pg_inherits
JOIN pg_class parent ON parent.oid = pg_inherits.inhparent
JOIN pg_class child ON child.oid = pg_inherits.inhrelid
WHERE parent.relname = 'orders'
ORDER BY child.relname;
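Because the monthly partitions follow a fixed naming and bounds pattern, the DDL can also be generated by a small script and fed to a scheduler. The following is a minimal sketch under stated assumptions: `month_start` and `monthly_partition_ddl` are hypothetical helper names, the parent table name is trusted input (no identifier quoting), and the script only prints SQL rather than connecting to a database.

```python
from datetime import date


def month_start(year: int, month: int) -> date:
    """Return the first day of a month; month numbers above 12 roll into later years."""
    return date(year + (month - 1) // 12, (month - 1) % 12 + 1, 1)


def monthly_partition_ddl(parent: str, year: int, first_month: int, count: int) -> list[str]:
    """Build CREATE TABLE ... PARTITION OF statements for `count` consecutive months."""
    statements = []
    for offset in range(count):
        lower = month_start(year, first_month + offset)      # inclusive bound
        upper = month_start(year, first_month + offset + 1)  # exclusive bound
        name = f"{parent}_{lower:%Y_%m}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
        )
    return statements


# Three months starting in November 2025, crossing the year boundary
for stmt in monthly_partition_ddl("orders", 2025, 11, 3):
    print(stmt)
```

Generating the statements outside the database keeps partition creation out of the insert path, which is why it pairs naturally with a scheduled job.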
1.3 Partition Pruning
sql
-- ======== Partition pruning ========
-- PostgreSQL works out which partitions actually need to be scanned
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at >= '2025-03-01'
AND created_at < '2025-04-01'
AND user_id = 12345;
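-- Check that the plan only touches orders_2025_03, e.g. "Index Scan using ... on orders_2025_03"
-- When pruning takes effect, only the 2025_03 partition is scanned instead of every partition
-- Partition pruning is on by default; this just makes it explicit
SET enable_partition_pruning = on;
-- Disable partition pruning (for debugging)
SET enable_partition_pruning = off;
-- ======== Indexes on partitions ========
-- Create local indexes on individual partitions
CREATE INDEX ON orders_2025_01 (user_id);
CREATE INDEX ON orders_2025_02 (user_id);
-- Create an index on the parent table (PostgreSQL 11+ creates it on every partition automatically)
CREATE INDEX ON orders (user_id);
-- Partial index (on one specific partition only)
CREATE INDEX ON orders_2025_01 (status)
WHERE status IN ('completed', 'cancelled');
-- ======== Partition statistics ========
-- Collect statistics for the partitioned table and its partitions
ANALYZE orders;
-- Show per-table statistics (each partition appears as its own row)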
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum,
last_analyze
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_live_tup DESC;
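The idea behind partition pruning can be pictured as an interval-overlap test between the query's time range and each partition's bounds. The sketch below is only a conceptual model, not how the PostgreSQL planner is implemented; the `PARTITIONS` catalog and the `prune` function are hypothetical names that mirror the `orders` partitions defined earlier.

```python
from datetime import date

# Hypothetical catalog: partition name -> (inclusive lower bound, exclusive upper bound)
PARTITIONS = {
    "orders_2025_01": (date(2025, 1, 1), date(2025, 2, 1)),
    "orders_2025_02": (date(2025, 2, 1), date(2025, 3, 1)),
    "orders_2025_03": (date(2025, 3, 1), date(2025, 4, 1)),
    "orders_2025_q2": (date(2025, 4, 1), date(2025, 7, 1)),
}


def prune(query_from: date, query_to: date) -> list[str]:
    """Return partitions whose [lower, upper) range overlaps [query_from, query_to)."""
    return [
        name
        for name, (lower, upper) in PARTITIONS.items()
        if lower < query_to and query_from < upper
    ]


# created_at >= '2025-03-01' AND created_at < '2025-04-01' touches one partition
print(prune(date(2025, 3, 1), date(2025, 4, 1)))
# A wider range spanning mid-February to mid-April touches three partitions
print(prune(date(2025, 2, 15), date(2025, 4, 10)))
```

The practical takeaway is that pruning only helps when the WHERE clause constrains the partition key directly, so the bounds can be compared.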
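2. PostGIS Geospatial Queries
2.1 PostGIS Installation and Basics
sql
-- ======== Install PostGIS ========
-- Ubuntu/Debian
-- apt install postgresql-15-postgis-3
-- CentOS/RHEL (PGDG repository; the exact package name depends on the PostGIS version)
-- yum install postgis33_15
-- Create the PostGIS extensions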
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_raster;
CREATE EXTENSION IF NOT EXISTS postgis_topology;
CREATE EXTENSION IF NOT EXISTS postgis_sfcgal;
-- Verify the installation
SELECT postgis_full_version();
-- ======== Geospatial tables ========
-- Create a stores table with geographic coordinates
CREATE TABLE stores (
store_id BIGSERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
brand VARCHAR(100),
category VARCHAR(50),
address VARCHAR(500),
location GEOGRAPHY(Point, 4326) NOT NULL, -- WGS 84 coordinates
geom GEOMETRY(Point, 4326), -- geometry type (for planar/projected calculations)
area_sqm DECIMAL(10, 2),
opening_hours JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create the store geofence table
CREATE TABLE store_zones (
zone_id BIGSERIAL PRIMARY KEY,
store_id BIGINT REFERENCES stores(store_id),
zone_type VARCHAR(20), -- 'delivery' | 'service' | 'parking'
boundary GEOGRAPHY(Polygon, 4326) NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create the user location table
CREATE TABLE user_locations (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
location GEOGRAPHY(Point, 4326) NOT NULL,
accuracy DECIMAL(8, 2), -- meters
recorded_at TIMESTAMPTZ DEFAULT NOW()
);
-- ======== Create spatial indexes ========
-- GiST index (the most common choice; supports bounding-box and range searches)
CREATE INDEX idx_stores_location ON stores USING GIST (location);
CREATE INDEX idx_user_locations_location ON user_locations USING GIST (location);
CREATE INDEX idx_store_zones_boundary ON store_zones USING GIST (boundary);
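-- ======== Bulk insert ========
-- Build points with ST_MakePoint(longitude, latitude) and tag them with SRID 4326
INSERT INTO stores (name, brand, category, address, location, geom, area_sqm) VALUES
(
'Chaoyang Store',
'Starbucks',
'Coffee',
'88 Jianguo Road, Chaoyang District, Beijing',
ST_SetSRID(ST_MakePoint(116.4852, 39.9123), 4326)::geography,
ST_SetSRID(ST_MakePoint(116.4852, 39.9123), 4326), -- the geom column requires SRID 4326 as well
250.00
),
(
'Haidian Huangzhuang Store',
'Starbucks',
'Coffee',
'1 Zhongguancun Street, Haidian District, Beijing',
ST_SetSRID(ST_MakePoint(116.3147, 39.9863), 4326)::geography,
ST_SetSRID(ST_MakePoint(116.3147, 39.9863), 4326),
180.00
),
(
'Pudong Lujiazui Store',
'Luckin Coffee',
'Coffee',
'88 Century Avenue, Pudong New Area, Shanghai',
ST_SetSRID(ST_MakePoint(121.5064, 31.2455), 4326)::geography,
ST_SetSRID(ST_MakePoint(121.5064, 31.2455), 4326),
120.00
);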
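2.2 Distance Queries and Nearest Neighbors
sql
-- ======== Distance queries ========
-- Find all stores within N kilometers of the user's location
DO $$
DECLARE
user_point GEOGRAPHY;
BEGIN
-- Set the user's location (near People's Square, Shanghai)
user_point := ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography;
-- Find stores within 3 km, ordered by distance.
-- ORDER BY and LIMIT live in a subquery because json_agg is an aggregate.
RAISE NOTICE '%',
(SELECT json_agg(json_build_object(
'name', t.name,
'distance_m', t.distance_m,
'distance_km', t.distance_m / 1000
))
FROM (
SELECT name, ST_Distance(location, user_point) AS distance_m
FROM stores
WHERE ST_DWithin(location, user_point, 3000) -- 3 km radius filter (index-assisted)
ORDER BY location <-> user_point -- <-> is the nearest-neighbor distance operator (can use the spatial index)
LIMIT 10
) t
);
END $$;
-- ======== Nearest-neighbor query tuning ========
-- Combine the <-> operator with a spatial index to speed up nearest-neighbor search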
EXPLAIN ANALYZE
SELECT
store_id,
name,
brand,
ST_Distance(location, ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography) AS distance_m
FROM stores
WHERE ST_DWithin(
location,
ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography,
5000 -- 5km
)
ORDER BY location <-> ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography
LIMIT 5;
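-- ======== KNN nearest-neighbor query (no distance limit) ========
-- ORDER BY ... <-> combined with LIMIT lets the spatial index drive a true KNN search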
SELECT
store_id,
name,
ST_Distance(
location,
ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography
) AS distance_m
FROM stores
ORDER BY location <-> ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography
LIMIT 10;
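When validating distance queries, it can help to cross-check a few results outside the database. The sketch below uses the haversine formula on a sphere, which is an approximation: `ST_Distance` on `geography` works on the WGS 84 spheroid by default, so the two values differ slightly. The function name `haversine_m` and the mean-radius constant are assumptions introduced for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# User near People's Square vs. the Pudong Lujiazui store from the sample data
d = haversine_m(121.4726, 31.2304, 121.5064, 31.2455)
print(f"{d:.0f} m, within 3 km: {d <= 3000}, within 5 km: {d <= 5000}")
```

With the sample rows, this store sits a little over 3.5 km away, so it falls outside a 3 km `ST_DWithin` filter but inside a 5 km one.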
2.3 Spatial Relationships and Geofencing
sql
-- ======== Geofence check ========
-- Check whether a user is inside a store's delivery zone
CREATE OR REPLACE FUNCTION is_within_delivery_zone(
p_user_point GEOGRAPHY,
p_store_id BIGINT
) RETURNS BOOLEAN AS $$
DECLARE
zone_boundary GEOGRAPHY;
BEGIN
SELECT boundary INTO zone_boundary
FROM store_zones
WHERE store_id = p_store_id AND zone_type = 'delivery'
LIMIT 1;
IF zone_boundary IS NULL THEN
RETURN FALSE;
END IF;
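-- ST_Covers: is the point inside the polygon? (ST_Within accepts geometry only, not geography)
RETURN ST_Covers(zone_boundary, p_user_point);
END;
$$ LANGUAGE plpgsql STABLE; -- STABLE rather than IMMUTABLE because the function reads a table
-- Find all stores that can deliver to the user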
SELECT DISTINCT ON (s.store_id)
s.store_id,
s.name,
s.brand,
sz.zone_id
FROM stores s
JOIN store_zones sz ON sz.store_id = s.store_id AND sz.zone_type = 'delivery'
WHERE is_within_delivery_zone(
ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography,
s.store_id
)
LIMIT 10;
-- ======== Point-in-polygon test ========
-- ST_Contains / ST_Within: geometric containment (city_boundaries is assumed to exist with a geometry column geom)
SELECT name, brand
FROM stores
WHERE ST_Contains(
(SELECT geom FROM city_boundaries WHERE city_name = 'Shanghai'),
stores.geom
);
-- ======== Zone intersection query ========
-- Find stores whose zones intersect a given area
SELECT s.name, sz.zone_type
FROM stores s
JOIN store_zones sz ON sz.store_id = s.store_id
WHERE ST_Intersects(
sz.boundary,
ST_SetSRID(ST_MakeBox2D(
ST_MakePoint(116.3, 39.8),
ST_MakePoint(116.6, 40.1)
), 4326)::geography
);
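-- ======== Route planning helper ========
-- Geodesic distance between two points (ST_Distance on geography uses the WGS 84 spheroid by default, not the haversine formula)
SELECT
ST_Distance(
ST_SetSRID(ST_MakePoint(116.4852, 39.9123), 4326)::geography,
ST_SetSRID(ST_MakePoint(121.5064, 31.2455), 4326)::geography
) / 1000 AS distance_km; -- result: roughly 1,065 km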
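-- ======== Aggregate statistics ========
-- Aggregate store count and total floor area by city
-- (ST_Within needs geometry on both sides, so compare stores.geom rather than the geography column)
SELECT
CASE
WHEN ST_Within(
stores.geom,
(SELECT geom FROM city_boundaries WHERE city_name = 'Beijing')
) THEN 'Beijing'
WHEN ST_Within(
stores.geom,
(SELECT geom FROM city_boundaries WHERE city_name = 'Shanghai')
) THEN 'Shanghai'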
ELSE 'Other'
END AS city,
COUNT(*) AS store_count,
SUM(area_sqm) AS total_area,
AVG(ST_Distance(
location,
ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography
)) / 1000 AS avg_distance_km
FROM stores
GROUP BY 1
ORDER BY store_count DESC;
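The point-in-polygon test behind a geofence check can be illustrated with the classic ray-casting algorithm. This is a planar sketch only: it treats longitude and latitude as flat x/y coordinates, which is a reasonable approximation for a small delivery zone but is not what PostGIS does for `geography`. The `zone` rectangle and the function name `point_in_polygon` are hypothetical.

```python
def point_in_polygon(lon: float, lat: float, ring: list[tuple[float, float]]) -> bool:
    """Ray casting: cast a ray east from the point and toggle on every edge it crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by the ray
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular delivery zone around People's Square, Shanghai
zone = [(121.45, 31.21), (121.50, 31.21), (121.50, 31.25), (121.45, 31.25)]

print(point_in_polygon(121.4726, 31.2304, zone))  # user near People's Square
print(point_in_polygon(121.5064, 31.2455, zone))  # Lujiazui store location
```

An odd number of crossings means the point is inside; the first call returns True and the second returns False for this zone.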
2.4 Trajectory Data Processing
sql
-- ======== Trajectory table ========
CREATE TABLE user_trajectories (
trajectory_id BIGSERIAL,
user_id BIGINT NOT NULL,
trip_id UUID NOT NULL,
point GEOGRAPHY(Point, 4326) NOT NULL,
speed_kmh DECIMAL(8, 2), -- speed (km/h)
heading DECIMAL(5, 2), -- heading (degrees)
recorded_at TIMESTAMPTZ NOT NULL,
PRIMARY KEY (trajectory_id, recorded_at) -- a primary key on a partitioned table must include the partition key
) PARTITION BY RANGE (recorded_at);
-- Partition by month
CREATE TABLE user_trajectories_2025_01 PARTITION OF user_trajectories
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE INDEX idx_trajectories_point ON user_trajectories USING GIST (point);
CREATE INDEX idx_trajectories_user_trip ON user_trajectories (user_id, trip_id);
CREATE INDEX idx_trajectories_recorded ON user_trajectories (recorded_at);
-- ======== Trajectory simplification ========
-- Build the line with ST_MakeLine in time order, then simplify and encode it
CREATE OR REPLACE FUNCTION get_trip_polyline(
p_trip_id UUID
) RETURNS TEXT AS $$
DECLARE
encoded TEXT;
BEGIN
SELECT ST_AsEncodedPolyline(
ST_Simplify(
ST_MakeLine(point::geometry ORDER BY recorded_at),
0.00001 -- simplification tolerance (degrees)
)
) INTO encoded
FROM user_trajectories
WHERE trip_id = p_trip_id;
RETURN encoded;
END;
$$ LANGUAGE plpgsql STABLE;
-- ======== Trajectory length per trip ========
SELECT
trip_id,
user_id,
ST_Length(
ST_MakeLine(point::geometry ORDER BY recorded_at)::geography
) / 1000 AS total_distance_km,
COUNT(*) AS point_count,
MIN(recorded_at) AS start_time,
MAX(recorded_at) AS end_time,
EXTRACT(EPOCH FROM (MAX(recorded_at) - MIN(recorded_at))) / 3600 AS duration_hours
FROM user_trajectories
WHERE recorded_at >= '2025-01-01'
GROUP BY trip_id, user_id
ORDER BY total_distance_km DESC;
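3. Logical Replication and High Availability
3.1 Logical Replication Architecture
sql
-- ======== Publisher configuration (primary) ========
-- 1. Edit postgresql.conf
-- wal_level = logical
-- max_wal_senders = 10
-- max_replication_slots = 10
-- wal_keep_size = 1GB
-- 2. Allow the replication user in pg_hba.conf
-- (logical replication connections match the database name, not the "replication" keyword;
-- narrow 0.0.0.0/0 to the subscriber's address range in production)
-- host mydb replicator 0.0.0.0/0 md5
-- 3. Create the replication user (it also needs SELECT on the published tables for the initial copy)
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'repl_password';
-- 4. Create publications
-- Publish every table
CREATE PUBLICATION full_replication FOR ALL TABLES;
-- Publish specific tables (restricted to selected operations)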
CREATE PUBLICATION orders_publication FOR TABLE
orders, order_items, products
WITH (publish = 'insert, update, delete');
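-- Publish specific rows (row filter, PostgreSQL 15+)
CREATE PUBLICATION active_users FOR TABLE users
WHERE (status = 1);
-- Publish specific columns (column list, PostgreSQL 15+)
CREATE PUBLICATION users_public FOR TABLE users
(user_id, username, email); -- leaves out sensitive columns such as password_hash
-- List publications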
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables;
-- ======== Subscriber configuration (replica) ========
-- 1. Create the subscription
CREATE SUBSCRIPTION orders_sub CONNECTION
'host=primary-host port=5432 dbname=mydb user=replicator password=repl_password'
PUBLICATION orders_publication
WITH (slot_name = 'orders_slot', synchronous_commit = on);
-- 2. Monitor replication status (run on the subscriber)
SELECT
subname,
pid,
relid,
received_lsn,
last_msg_send_time,
last_msg_receipt_time,
latest_end_lsn,
latest_end_time
FROM pg_stat_subscription;
-- 3. Check replication lag in bytes (run on the publisher)
SELECT
application_name,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
(sent_lsn - replay_lsn) AS replication_lag
FROM pg_stat_replication;
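-- ======== Two-phase commit replication ========
-- Logical replication can replicate two-phase commits (PostgreSQL 15+, subscription created WITH (two_phase = true))
-- Publisher: PREPARE TRANSACTION ... COMMIT PREPARED
-- Subscriber: applies transactions automatically in commit order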
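Replication lag is expressed as the difference between two LSNs. An LSN such as `16/B374D848` is a 64-bit WAL position written as two hexadecimal halves, so the lag in bytes can be reproduced outside the database when processing monitoring output. A minimal sketch follows; `lsn_to_int` and `lag_bytes` are hypothetical helper names.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL that were sent but not yet replayed."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)


print(lag_bytes("0/3000060", "0/3000000"))  # 96
print(lag_bytes("1/10", "0/FFFFFFF0"))      # 32, across the 4 GB boundary
```

Comparing LSNs as plain strings gives wrong answers once the hexadecimal halves differ in length, which is why the conversion to an integer matters.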
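3.2 Failover and Slot Management
sql
-- ======== Replication slot management ========
-- Logical replication slots must be monitored; an abandoned slot makes WAL pile up on the primary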
SELECT
slot_name,
plugin,
slot_type,
database,
active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots;
-- Drop an inactive replication slot
SELECT pg_drop_replication_slot('inactive_slot');
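-- ======== Standby failover procedure ========
-- 1. On the standby, check its recovery state (returns true while it is still a standby)
SELECT pg_is_in_recovery();
-- 2. Promote the standby to primary
-- pg_ctl promote
-- 3. To rejoin the old primary as a standby, resynchronize its data directory with pg_rewind
-- (pg_rewind rewinds a diverged data directory; run it on the stopped old primary, pointing at the new primary)
-- pg_rewind --target-pgdata=/var/lib/postgresql/data
-- --source-server='host=new-primary port=5432'
-- 4. Point the subscription at the new primary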
ALTER SUBSCRIPTION orders_sub CONNECTION
'host=new-primary port=5432 dbname=mydb user=replicator password=repl_password';
-- ======== High availability with Patroni ========
-- Patroni configuration sketch (patroni.yml); verify key names against the Patroni documentation for your version
-- scope: postgres-cluster
-- namespace: /db
-- name: postgres-node1
-- restapi:
--   listen: 0.0.0.0:8008
--   connect_address: node1:8008
-- etcd:
--   host: etcd:2379
-- bootstrap:
--   dcs:
--     postgresql:
--       parameters:
--         wal_level: logical
--         max_wal_senders: 10
--         max_replication_slots: 10
--         wal_keep_size: 1GB
--     slots:
--       orders_slot:
--         type: logical
--         database: mydb
--         plugin: pgoutput
-- postgresql:
--   listen: 0.0.0.0:5432
--   connect_address: node1:5432
--   data_dir: /data/postgresql
-- tags:
--   nofailover: false
--   clonefrom: false
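During a manual failover, a common rule of thumb is to promote the standby that has replayed the most WAL, since it loses the least data. The sketch below illustrates that selection only; it is not Patroni's actual election logic, and the node names, LSN values, and the `pick_promotion_candidate` helper are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string such as '0/5000148' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_promotion_candidate(standbys: dict[str, str]) -> str:
    """Return the standby whose replay LSN is furthest ahead."""
    return max(standbys, key=lambda name: lsn_to_int(standbys[name]))


# Replay positions as reported by pg_last_wal_replay_lsn() on each standby (sample values)
standbys = {"node2": "0/5000148", "node3": "0/50000A0"}
print(pick_promotion_candidate(standbys))  # node2
```

A tool such as Patroni automates this decision together with fencing and leader election, which is why hand-rolled failover scripts are usually limited to test environments.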
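4. Performance Tuning
4.1 EXPLAIN ANALYZE in Depth
sql
-- ======== Execution plan basics ========
-- Note: the WHERE filter on o.status drops users without matching orders, so the LEFT JOIN below acts as an inner join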
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
u.user_id,
u.username,
COUNT(o.order_id) AS order_count,
SUM(o.total_amount) AS total_spent
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
WHERE u.created_at >= '2025-01-01'
AND o.status = 'completed'
GROUP BY u.user_id, u.username
HAVING SUM(o.total_amount) > 1000
ORDER BY total_spent DESC
LIMIT 100;
-- ======== Reading the key plan metrics ========
/*
Annotated sample output (illustrative numbers):
Limit (cost=1234.56..7890.12 rows=100)
-> Sort (cost=1234.56..7890.12 rows=100)
Sort Key: (sum(o.total_amount)) DESC
Sort Method: top-N heapsort Memory: 5kB
-> HashAggregate (cost=4567.89..5678.90 rows=100)
Batches: 1 Memory Usage: 523kB
-> Hash Right Join (cost=... rows=...)
Hash Cond: (o.user_id = u.user_id)
-> Seq Scan on orders o (cost=... rows=...)
Filter: ((status = 'completed') AND ...)
-> Hash (cost=... rows=...)
-> Seq Scan on users u (cost=... rows=...)
Filter: (created_at >= '2025-01-01')
Planning Time: 1.234 ms
Execution Time: 45.678 ms
*/
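-- ======== Common problems and fixes ========
-- Problem 1: sequential scans (Seq Scan) on large tables
SET enable_seqscan = off; -- discourage sequential scans to test whether an index is usable (debugging only)
-- Problem 2: Nested Loop joins over too many rows
SET enable_nestloop = off; -- debugging only
-- Problem 3: Hash Join runs short of memory
SET work_mem = '256MB'; -- raise sort/hash memory for this session
-- Problem 4: inaccurate statistics
ANALYZE VERBOSE users; -- verbose mode
-- Create extended statistics (multi-column correlation)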
CREATE STATISTICS s_users_filter (dependencies)
ON status, created_at FROM users;
CREATE STATISTICS s_orders_join (ndistinct)
ON user_id, status FROM orders;
ANALYZE;
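-- ======== Monitoring slow queries ========
-- Enable pg_stat_statements (it must be listed in shared_preload_libraries; then create the extension)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find the most expensive SQL statements by total execution time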
SELECT
query,
calls,
total_exec_time / 1000 AS total_sec,
mean_exec_time AS avg_ms,
stddev_exec_time,
rows / calls AS avg_rows,
(100 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0))::INT AS cache_hit_pct
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Find the tables with the most sequential scans
SELECT
schemaname,
relname,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY seq_scan DESC;
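Plans produced with `EXPLAIN (ANALYZE, FORMAT JSON)` are nested JSON documents, which makes it possible to scan them for expensive nodes automatically. The sketch below walks such a plan and reports sequential scans that read many rows. The embedded plan is a hand-written sample rather than real output, `find_seq_scans` is a hypothetical helper, and for simplicity it ignores the per-loop semantics of `Actual Rows`.

```python
import json

# Hand-written sample in the shape of EXPLAIN (ANALYZE, FORMAT JSON) output
plan_json = """
[{"Plan": {"Node Type": "Hash Join", "Actual Rows": 100, "Plans": [
    {"Node Type": "Seq Scan", "Relation Name": "orders", "Actual Rows": 250000,
     "Rows Removed by Filter": 750000},
    {"Node Type": "Hash", "Actual Rows": 1200, "Plans": [
        {"Node Type": "Index Scan", "Relation Name": "users", "Actual Rows": 1200}
    ]}
]}}]
"""


def find_seq_scans(node: dict, min_rows: int = 10_000):
    """Yield (relation, rows read) for sequential scans that touched at least min_rows rows."""
    if node.get("Node Type") == "Seq Scan":
        rows_read = node.get("Actual Rows", 0) + node.get("Rows Removed by Filter", 0)
        if rows_read >= min_rows:
            yield node.get("Relation Name"), rows_read
    for child in node.get("Plans", []):
        yield from find_seq_scans(child, min_rows)


root = json.loads(plan_json)[0]["Plan"]
print(list(find_seq_scans(root)))  # [('orders', 1000000)]
```

A scan that discards most of what it reads, as in this sample, is usually the first candidate for an index on the filtered column.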
4.2 Index Optimization
sql
-- ======== Index types and when to use them ========
-- B-tree (default, most common): equality lookups, range queries, sorting
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_orders_created ON orders (created_at DESC);
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
-- Hash: equality lookups only; it can be more compact than B-tree for long keys but is not necessarily faster
CREATE INDEX idx_sessions_token ON sessions USING HASH (session_token);
-- GIN: full-text search, JSONB containment queries, array containment
CREATE INDEX idx_users_interests ON users USING GIN (interests); -- array
CREATE INDEX idx_orders_metadata ON orders USING GIN (metadata); -- JSONB
CREATE INDEX idx_products_search ON products USING GIN (to_tsvector('english', name || ' ' || description));
-- GiST: geometric/geographic data, range types
CREATE INDEX idx_reservations_timerange ON reservations USING GIST (timerange);
-- Partial index: index only the rows that satisfy a condition
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
CREATE INDEX idx_users_active ON users (last_login_at)
WHERE status = 1 AND last_login_at > '2024-01-01';
-- Covering index: INCLUDE the extra columns a query needs so it can use an index-only scan
CREATE INDEX idx_orders_covering ON orders (user_id, created_at DESC)
INCLUDE (total_amount, status);
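-- ======== Expression indexes ========
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- date_trunc on timestamptz depends on the session time zone and is not IMMUTABLE, so pin the zone
CREATE INDEX idx_orders_month ON orders (DATE_TRUNC('month', created_at AT TIME ZONE 'UTC'));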
CREATE INDEX idx_products_revenue ON products ((price * stock_quantity));
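-- ======== Rebuilding indexes ========
-- Rebuild a single index
REINDEX INDEX CONCURRENTLY idx_users_email;
-- Rebuild every index on a table
REINDEX TABLE CONCURRENTLY orders;
-- Rebuild the system catalog indexes of the current database (CONCURRENTLY is not supported for system catalogs)
REINDEX SYSTEM postgres;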
4.3 Connection Pooling and Concurrency
sql
-- ======== PgBouncer configuration ========
-- pgbouncer.ini
-- [databases]
-- mydb = host=127.0.0.1 port=5432 dbname=mydb
-- [pgbouncer]
-- listen_port = 6432
-- listen_addr = 127.0.0.1
-- auth_type = md5
-- auth_file = /etc/pgbouncer/userlist.txt
-- pool_mode = transaction -- transaction-level pooling (recommended)
-- max_client_conn = 2000
-- default_pool_size = 25 -- server connections per user/database pair
-- min_pool_size = 5
-- reserve_pool_size = 5
-- reserve_pool_timeout = 5
-- server_lifetime = 3600
-- server_idle_timeout = 600
-- log_connections = 0
-- log_disconnections = 0
-- log_pooler_errors = 1
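-- ======== Pool mode comparison ========
-- session mode: the server connection is released when the client session ends
-- - suits long-lived connections
-- - session-level SET / SET LOCAL settings behave as expected
-- - drawback: low connection reuse
-- transaction mode: the server connection is released as soon as the transaction ends (recommended)
-- - first choice for high concurrency
-- - drawback: session-level SET does not persist across transactions (use SET LOCAL inside a transaction)
-- - SQL-level PREPARE is not supported; protocol-level prepared statements need PgBouncer 1.21+ (max_prepared_statements)
-- statement mode: the server connection is released after every statement
-- - only for very short queries
-- - multi-statement transactions are not supported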
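-- ======== PostgreSQL concurrency parameter tuning ========
-- postgresql.conf (comments in this file start with #)
max_connections = 200 # maximum number of connections
shared_buffers = 8GB # shared buffer cache (about 25% of RAM is a common starting point)
effective_cache_size = 24GB # planner's estimate of available cache (about 75% of RAM)
work_mem = 64MB # per sort/hash operation, per backend; 256MB across 200 connections could exhaust RAM
maintenance_work_mem = 2GB # memory for maintenance operations (VACUUM, CREATE INDEX)
max_worker_processes = 16 # maximum number of background worker processes
max_parallel_workers_per_gather = 4 # maximum workers per Gather node
max_parallel_workers = 16 # cluster-wide cap on parallel workers
parallel_leader_participation = on
effective_io_concurrency = 200 # number of concurrent I/O requests (around 200 for SSDs)
random_page_cost = 1.1 # cost of a random page read (about 1.1 on SSD; the default 4.0 suits HDD)
default_statistics_target = 500 # statistics target (default 100; higher is more accurate but slows ANALYZE)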
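-- ======== Asynchronous commit and durability ========
-- Pick one of the following levels; ALTER SYSTEM takes effect after SELECT pg_reload_conf();
-- Asynchronous commit: faster, but the most recent transactions can be lost after a crash
ALTER SYSTEM SET synchronous_commit = off;
-- Synchronous commit (default): wait for the local WAL flush (and for synchronous standbys, if configured)
ALTER SYSTEM SET synchronous_commit = on;
-- remote_write: wait until the synchronous standby has written the WAL to its operating system
ALTER SYSTEM SET synchronous_commit = remote_write;
-- remote_apply: wait until the synchronous standby has replayed the WAL
ALTER SYSTEM SET synchronous_commit = remote_apply;
-- Show the current setting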
SHOW synchronous_commit;
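Because `work_mem` applies to each sort or hash operation in each backend, its worst-case footprint scales with the number of connections. A back-of-the-envelope calculation helps when choosing a value. The sketch below is a deliberately pessimistic estimate, assuming every connection runs `nodes_per_query` memory-hungry operations at once; `worst_case_memory_gb` is a hypothetical helper name.

```python
def worst_case_memory_gb(
    shared_buffers_gb: float,
    work_mem_mb: float,
    max_connections: int,
    nodes_per_query: int = 1,
) -> float:
    """Upper-bound estimate: shared buffers plus work_mem for every active sort/hash node."""
    return shared_buffers_gb + work_mem_mb * max_connections * nodes_per_query / 1024


# 200 connections, 8 GB shared_buffers, two candidate work_mem values
print(worst_case_memory_gb(8, 256, 200))  # 58.0
print(worst_case_memory_gb(8, 64, 200))   # 20.5
```

Putting a pooler such as PgBouncer in front of the database lowers the effective number of backends, which leaves more room to raise `work_mem` per session.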
5. Extension Ecosystem
5.1 Full-Text Search
sql
-- ======== Full-text search basics ========
-- Add a tsvector column for full-text search and populate it
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector =
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(content, '')), 'B') ||
setweight(to_tsvector('english', coalesce(tags, '')), 'C');
-- Create a GIN index
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
-- Keep search_vector up to date automatically (with a trigger)
CREATE OR REPLACE FUNCTION articles_search_trigger() RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector :=
setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(NEW.content, '')), 'B') ||
setweight(to_tsvector('english', coalesce(NEW.tags, '')), 'C');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER articles_search_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();
-- Full-text search query
SELECT
title,
ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & optimize & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Highlight matches in the results
SELECT
title,
ts_headline('english', content, query, 'MaxWords=50, MinWords=20') AS snippet
FROM articles, to_tsquery('english', 'postgres') query
WHERE search_vector @@ query;
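A GIN index over a `tsvector` is, conceptually, an inverted index: each term maps to the set of rows that contain it, and an `&` query intersects those sets. The toy sketch below illustrates that idea only. It does no stemming, stop-word removal, or ranking, and the `docs` data and `search_all` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents keyed by id
docs = {
    1: "postgres performance tuning",
    2: "optimize postgres queries for performance",
    3: "mysql replication basics",
}

# Build the inverted index: term -> set of document ids containing it
index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)


def search_all(*terms: str) -> list[int]:
    """AND semantics, like the & operator in a tsquery: every term must be present."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


print(search_all("postgres", "performance"))  # [1, 2]
print(search_all("postgres", "mysql"))        # []
```

This also shows why rare terms make full-text queries fast: the intersection starts from small posting sets.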
5.2 Advanced JSONB Operations
sql
-- ======== Creating, reading, and querying JSONB ========
-- Create a table with a JSONB column
CREATE TABLE events (
event_id BIGSERIAL PRIMARY KEY,
event_type VARCHAR(50),
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_events_type ON events ((payload->>'type'));
CREATE INDEX idx_events_meta ON events USING GIN (payload);
-- JSONB operators
-- -> get a JSON object field (returns JSONB)
SELECT payload->'user' FROM events;
-- ->> get a JSON object field (returns TEXT)
SELECT payload->>'user_id' FROM events;
-- #> / #>> get the value at a nested path
SELECT payload#>'{user,profile,age}' FROM events;
SELECT payload#>>'{user,profile,age}' FROM events;
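-- @> containment (does the left value contain the given JSON?)
SELECT * FROM events WHERE payload @> '{"type":"click"}';
-- ? key existence
SELECT * FROM events WHERE payload ? 'user_id';
-- JSONB array operations
SELECT metadata->'items'->0->>'product_id' FROM orders; -- orders keeps its JSONB in the metadata column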
SELECT * FROM events WHERE payload->'tags' @> '["mobile", "ios"]'::jsonb;
-- ======== JSONB functions ========
-- jsonb_object_keys: expand an object's keys
SELECT jsonb_object_keys(payload->'metadata')
FROM events
WHERE payload ? 'metadata';
-- jsonb_each_text: expand key/value pairs (values returned as text)
SELECT key, value
FROM events,
jsonb_each_text(payload)
WHERE event_id = 1;
-- jsonb_agg / jsonb_object_agg: aggregate rows into JSONB
SELECT
user_id,
jsonb_object_agg(event_type, count) AS event_summary
FROM (
SELECT
payload->>'user_id' AS user_id,
payload->>'type' AS event_type,
COUNT(*) AS count
FROM events
GROUP BY 1, 2
) t
GROUP BY user_id;
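The `@>` containment operator is easier to reason about with a small model of its rules: objects must contain every key/value pair of the right-hand side, arrays must contain a match for every right-hand element in any order, and containment applies at the same nesting level. The sketch below is a simplified model over Python dicts and lists; it leaves out PostgreSQL's special case where a top-level array may contain a bare primitive, and `jsonb_contains` is a hypothetical name.

```python
def jsonb_contains(left, right) -> bool:
    """Simplified model of `left @> right` for parsed JSON values."""
    if isinstance(right, dict):
        # Every key of the right object must exist on the left with a containing value
        return isinstance(left, dict) and all(
            key in left and jsonb_contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(right, list):
        # Every right element must be contained in some left element; order is ignored
        return isinstance(left, list) and all(
            any(jsonb_contains(item, wanted) for item in left) for wanted in right
        )
    return left == right


event = {"type": "click", "user_id": 7, "tags": ["mobile", "ios", "beta"]}
print(jsonb_contains(event, {"type": "click"}))                  # True
print(jsonb_contains(event, {"tags": ["ios", "mobile"]}))        # True
print(jsonb_contains({"user": {"age": 30}}, {"age": 30}))        # False: wrong nesting level
```

The last case is a common surprise: containment does not search nested objects for a match, so the query must mirror the document's structure.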
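5.3 TimescaleDB Time-Series Extension
sql
-- ======== Installing and using TimescaleDB ========
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- Convert a regular table into a hypertable
-- (sensor_readings is assumed to exist with recorded_at, sensor_id, and temperature columns)
SELECT create_hypertable('sensor_readings',
'recorded_at',
chunk_time_interval => INTERVAL '1 day',
migrate_data => true
);
-- Compression settings (can save a large amount of storage)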
ALTER TABLE sensor_readings SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id'
);
-- Add a compression policy (compress chunks older than 7 days)
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');
-- Continuous aggregate (an incrementally maintained materialized view)
CREATE MATERIALIZED VIEW hourly_stats
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', recorded_at) AS bucket,
sensor_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp,
MIN(temperature) AS min_temp,
COUNT(*) AS reading_count
FROM sensor_readings
GROUP BY 1, 2
WITH NO DATA;
-- Add a refresh policy
SELECT add_continuous_aggregate_policy('hourly_stats',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);
-- Query the downsampled data (reads the pre-aggregated hourly buckets instead of raw rows)
SELECT * FROM hourly_stats
WHERE bucket >= NOW() - INTERVAL '30 days'
AND sensor_id = 'sensor_001'
ORDER BY bucket DESC;
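What `time_bucket('1 hour', ...)` with `GROUP BY` computes can be reproduced on a handful of rows to sanity-check a continuous aggregate. The sketch below floors each timestamp to the hour and aggregates per bucket and sensor; the `readings` sample data and the `hourly_stats` function are hypothetical, and only whole-hour buckets are handled.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample rows: (recorded_at, sensor_id, temperature)
readings = [
    (datetime(2025, 1, 1, 10, 5, tzinfo=timezone.utc), "sensor_001", 20.0),
    (datetime(2025, 1, 1, 10, 45, tzinfo=timezone.utc), "sensor_001", 22.0),
    (datetime(2025, 1, 1, 11, 10, tzinfo=timezone.utc), "sensor_001", 25.0),
]


def hourly_stats(rows):
    """Group rows into (hour bucket, sensor) keys and compute avg/max/min/count."""
    buckets = defaultdict(list)
    for recorded_at, sensor_id, temperature in rows:
        bucket = recorded_at.replace(minute=0, second=0, microsecond=0)
        buckets[(bucket, sensor_id)].append(temperature)
    return {
        key: {"avg": sum(v) / len(v), "max": max(v), "min": min(v), "count": len(v)}
        for key, v in sorted(buckets.items())
    }


for (bucket, sensor_id), stats in hourly_stats(readings).items():
    print(bucket.isoformat(), sensor_id, stats)
```

The continuous aggregate stores exactly this kind of per-bucket summary, so dashboards read a few rows per hour instead of every raw reading.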
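6. Checklist Summary
□ Partitioned tables
□ Range partitioning (time series)
□ List partitioning (enumerated values/regions)
□ Hash partitioning (even distribution)
□ Partition pruning takes effect (verify with EXPLAIN)
□ Automatic partition creation
□ Partition maintenance (archiving/dropping expired data)
□ Partition indexes (local indexes/partial indexes)
□ PostGIS
□ Point/line/polygon data types
□ Spatial indexes (GiST)
□ Distance queries (ST_DWithin + index)
□ Nearest-neighbor queries (<-> operator)
□ Geofencing (ST_Within / ST_Contains)
□ Zone intersection (ST_Intersects)
□ Trajectory processing (ST_MakeLine)
□ Coordinate transformation (ST_Transform)
□ Logical replication
□ Creating and filtering publications
□ Subscription configuration
□ Replication slot management
□ Lag monitoring
□ Failover procedure
□ Performance tuning
□ EXPLAIN ANALYZE analysis
□ Extended statistics
□ Index types (B-tree/GIN/GiST/Hash/Partial/Covering)
□ PgBouncer connection pooling
□ PostgreSQL parameter tuning
□ Compression settings
□ Extension ecosystem
□ Full-text search (tsvector/tsquery/GIN)
□ JSONB operations and indexes
□ TimescaleDB time-series compression
□ PostGIS trajectory processing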
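Summary
In one sentence: PostgreSQL advanced features = partition pruning + PostGIS spatial queries + logical replication + the extension ecosystem, covering everything from CRUD to TB-scale data processing.
PostgreSQL vs MySQL advanced feature comparison:
| Feature | PostgreSQL | MySQL |
|---|---|---|
| Partitioning | Range/List/Hash/multi-level | RANGE/LIST/HASH/KEY |
| Spatial queries | PostGIS (full GIS functionality) | Built-in spatial types and functions (basic) |
| Replication | Logical replication + physical streaming replication | GTID-based asynchronous/semi-synchronous |
| JSON | JSONB (native, indexable) | JSON (supported, fewer features) |
| Full-text search | Built in, with multi-language support | InnoDB FTS (basic) |
| Time-series extension | TimescaleDB | MySQL HeatWave (commercial) |
| Connection pooling | PgBouncer | ProxySQL |
Recommended next steps:
- CockroachDB distributed SQL in practice (HTAP + global distribution)
- TimescaleDB + Grafana time-series monitoring dashboards
- New features in PostgreSQL 16/17 (for example, logical replication enhancements)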