Advanced PostgreSQL in Practice: Partitioned Tables + PostGIS + Logical Replication + Performance Tuning + Extension Ecosystem

Preface

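💡 Pain points: How do you shard or partition data in PostgreSQL? How do you run spatial queries with PostGIS? When should you choose logical replication over physical replication? How do you tune slow queries? How do you handle high concurrency?

🎯 Solution: This article covers advanced PostgreSQL features: Range/List/Hash partitioned table design and partition pruning, PostGIS spatial indexes and location queries, logical replication setup and failover, in-depth EXPLAIN ANALYZE analysis, PgBouncer connection pooling, a performance tuning checklist, and production operations practices.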
[Figure: topic map of this article]
Replication and high availability: logical replication (publish/subscribe), physical replication (streaming, asynchronous/synchronous), Patroni (HA failover)
Partitioning strategies: Range (by time or numeric range), List (by enumerated value), Hash (even distribution)
Advanced features: PostGIS (geospatial queries), full-text search (tsvector/tsquery), JSONB (semi-structured data), foreign data wrappers (FDW)
Performance optimization: indexes (B-tree/Hash/GiST/SP-GiST/GIN), EXPLAIN ANALYZE (execution plan analysis), PgBouncer (connection pooling), VACUUM (dead-tuple cleanup)


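1. Partitioned Tables in Depth

1.1 Choosing a Partitioning Strategy

-- ======== Create a partitioned table (Range partitioning by time) ========
-- Suited to time-series data such as logs, orders, and monitoring metrics

-- Parent table (stores no rows itself, only defines the structure)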
CREATE TABLE orders (
    order_id    BIGSERIAL,
    user_id     BIGINT NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    status      VARCHAR(20) NOT NULL DEFAULT 'pending',
    created_at  TIMESTAMPTZ NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    metadata    JSONB DEFAULT '{}'
) PARTITION BY RANGE (created_at);

-- Default partition (catches rows that match no other partition)
CREATE TABLE orders_default PARTITION OF orders DEFAULT;

-- Monthly partitions
CREATE TABLE orders_2025_01 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE orders_2025_02 PARTITION OF orders
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

CREATE TABLE orders_2025_03 PARTITION OF orders
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Quarterly partition (for longer-range planning)
CREATE TABLE orders_2025_q2 PARTITION OF orders
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

-- ======== List partitioning by region/category ========
CREATE TABLE sales (
    id          BIGSERIAL,
    region      VARCHAR(50) NOT NULL,
    product_id  BIGINT NOT NULL,
    amount      DECIMAL(12, 2) NOT NULL,
    sold_at     TIMESTAMPTZ NOT NULL
) PARTITION BY LIST (region);

-- Partition by region
CREATE TABLE sales_east PARTITION OF sales
    FOR VALUES IN ('Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen');

CREATE TABLE sales_west PARTITION OF sales
    FOR VALUES IN ('Chengdu', 'Chongqing', 'Xi''an', 'Kunming');

CREATE TABLE sales_north PARTITION OF sales
    FOR VALUES IN ('Harbin', 'Changchun', 'Shenyang', 'Dalian');

CREATE TABLE sales_default PARTITION OF sales DEFAULT;

-- ======== Hash partitioning for even distribution ========
CREATE TABLE users (
    user_id     BIGSERIAL,
    username    VARCHAR(100) NOT NULL,
    email       VARCHAR(255) NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    status      SMALLINT NOT NULL DEFAULT 1
) PARTITION BY HASH (user_id);

-- 4 hash partitions (hash of user_id modulo 4)
CREATE TABLE users_p0 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE users_p1 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE users_p2 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE users_p3 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

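-- ======== Multi-level (composite) partitioning ========
-- Partition by year, then sub-partition each year by month
CREATE TABLE event_logs (
    event_id    BIGSERIAL,
    event_type  VARCHAR(50) NOT NULL,
    payload     JSONB NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

-- Year-level partition; it must itself be declared PARTITION BY to accept sub-partitions
CREATE TABLE event_logs_2025 PARTITION OF event_logs
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01')
    PARTITION BY RANGE (created_at);

-- Month-level partition (a partition of the year-level partition)
CREATE TABLE event_logs_2025_m01 PARTITION OF event_logs_2025
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');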
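
To check which partition a row actually lands in, the hidden tableoid column can be cast to regclass. The following is a minimal sketch on a hypothetical demo_orders table (not part of the schema above), assuming nothing beyond stock PostgreSQL:

```sql
-- Hypothetical demo table, separate from the article's orders table.
CREATE TABLE demo_orders (
    order_id   BIGINT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE demo_orders_2025_01 PARTITION OF demo_orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE demo_orders_default PARTITION OF demo_orders DEFAULT;

INSERT INTO demo_orders VALUES
    (1, '2025-01-15 10:00+00'),   -- falls inside the January range
    (2, '2025-06-01 10:00+00');   -- no matching range, goes to the default partition

-- tableoid::regclass names the partition that physically stores each row.
SELECT tableoid::regclass AS partition_name, order_id, created_at
FROM demo_orders
ORDER BY order_id;
```

Row 1 is reported under demo_orders_2025_01 and row 2 under demo_orders_default, which is a quick way to spot data piling up in a default partition.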

1.2 Partition Management (Automatic Partitioning)

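-- ======== Automatic partition management ========
-- Create new partitions ahead of time with a PL/pgSQL function driven by a scheduler.
-- A row-level BEFORE INSERT trigger on orders cannot do this: the row is routed to a
-- partition (orders_default here) before the trigger fires, and PostgreSQL rejects
-- CREATE TABLE ... PARTITION OF orders while the running INSERT is using orders.

CREATE OR REPLACE FUNCTION create_monthly_partition(p_month DATE)
RETURNS VOID AS $$
DECLARE
    partition_date DATE;
    partition_name TEXT;
    start_date TEXT;
    end_date TEXT;
BEGIN
    -- Derive the month boundaries from the requested date
    partition_date := DATE_TRUNC('month', p_month)::DATE;
    partition_name := 'orders_' || TO_CHAR(partition_date, 'YYYY_MM');
    start_date := TO_CHAR(partition_date, 'YYYY-MM-DD');
    end_date := TO_CHAR(partition_date + INTERVAL '1 month', 'YYYY-MM-DD');

    -- Skip the work if the partition already exists
    IF NOT EXISTS (
        SELECT 1 FROM pg_tables
        WHERE schemaname = current_schema()
          AND tablename = partition_name
    ) THEN
        -- Create the new partition (this fails if rows for the range already sit in
        -- orders_default or the range overlaps an existing partition such as orders_2025_q2)
        EXECUTE format(
            'CREATE TABLE %I PARTITION OF orders FOR VALUES FROM (%L) TO (%L)',
            partition_name, start_date, end_date
        );

        -- Create indexes
        EXECUTE format(
            'CREATE INDEX ON %I (user_id)',
            partition_name
        );
        EXECUTE format(
            'CREATE INDEX ON %I (status)',
            partition_name
        );

        RAISE NOTICE 'Created partition: %', partition_name;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Run from a scheduler (cron, pg_cron, or similar) so that next month's
-- partition exists before the first row for it arrives
SELECT create_monthly_partition((CURRENT_DATE + INTERVAL '1 month')::DATE);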

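-- ======== Partition maintenance ========
-- Detach an old partition (it becomes a standalone table that can be archived on its own);
-- DETACH PARTITION is issued against the parent table
ALTER TABLE orders DETACH PARTITION orders_2024_01;

-- Archive the detached partition to cold storage
CREATE TABLE orders_archive_2024_01 (LIKE orders_2024_01 INCLUDING ALL);
INSERT INTO orders_archive_2024_01 SELECT * FROM orders_2024_01;
DROP TABLE orders_2024_01;

-- Drop an old partition (data retention policy)
DROP TABLE IF EXISTS orders_2023_01;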

-- Inspect partition metadata
SELECT
    parent.relname  AS parent_table,
    child.relname   AS partition_name,
    pg_get_expr(child.relpartbound, child.oid, true) AS partition_range,
    pg_size_pretty(pg_relation_size(child.oid)) AS partition_size
FROM pg_inherits
JOIN pg_class parent ON parent.oid = pg_inherits.inhparent
JOIN pg_class child ON child.oid = pg_inherits.inhrelid
WHERE parent.relname = 'orders'
ORDER BY child.relname;
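
When rows have already accumulated in a default partition, a new partition for that range cannot simply be created, because PostgreSQL refuses a partition whose range conflicts with rows held by the default partition. One possible way around this, sketched here on a hypothetical demo_events table, is to build the partition as a standalone table, move the rows, and attach it in a single transaction:

```sql
-- Hypothetical demo table; all names are illustrative only.
CREATE TABLE demo_events (
    id         BIGINT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
CREATE TABLE demo_events_default PARTITION OF demo_events DEFAULT;

-- No July partition exists yet, so this row lands in the default partition.
INSERT INTO demo_events VALUES (1, '2025-07-10 12:00+00');

BEGIN;
-- 1. Build the future partition as a standalone table.
CREATE TABLE demo_events_2025_07
    (LIKE demo_events INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- 2. Move the July rows out of the default partition.
WITH moved AS (
    DELETE FROM demo_events_default
    WHERE created_at >= '2025-07-01' AND created_at < '2025-08-01'
    RETURNING *
)
INSERT INTO demo_events_2025_07 SELECT * FROM moved;

-- 3. Attach it; PostgreSQL verifies that the default partition holds no conflicting rows.
ALTER TABLE demo_events ATTACH PARTITION demo_events_2025_07
    FOR VALUES FROM ('2025-07-01') TO ('2025-08-01');
COMMIT;
```

The ATTACH step takes a strong lock on the default partition while it validates, so on a busy table this is best done in a quiet window.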

1.3 Partition Pruning

-- ======== Partition pruning ========
-- PostgreSQL works out which partitions a query needs and skips the rest

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at >= '2025-03-01'
  AND created_at < '2025-04-01'
  AND user_id = 12345;

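-- Check whether the output contains "Index Scan using ... on orders_2025_03"
-- When pruning works, only orders_2025_03 is scanned, not every partition

-- Partition pruning is on by default
SET enable_partition_pruning = on;

-- Disable pruning (debugging only)
SET enable_partition_pruning = off;

-- Restore the default afterwards
RESET enable_partition_pruning;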

-- ======== Indexes on partitions ========
-- Create a local index on individual partitions
CREATE INDEX ON orders_2025_01 (user_id);
CREATE INDEX ON orders_2025_02 (user_id);

-- Create the index on the parent table (PostgreSQL 11+): it cascades to every
-- existing partition and to partitions created later
CREATE INDEX ON orders (user_id);

-- Partial index on a single partition
CREATE INDEX ON orders_2025_01 (status)
    WHERE status IN ('completed', 'cancelled');

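-- ======== Partition statistics ========
-- Collect statistics for the parent table and every partition
ANALYZE orders;

-- Per-partition activity statistics
-- (the table-name column in pg_stat_user_tables is relname)
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    last_vacuum,
    last_autovacuum,
    last_analyze
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_live_tup DESC;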
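
Pruning only works when the partition key is compared directly with constants or parameters. As a rough illustration on a hypothetical demo_metrics table, the two EXPLAIN calls below should differ: the first lists a single partition, the second lists both because the key is wrapped in a function.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE demo_metrics (
    recorded_at TIMESTAMPTZ NOT NULL,
    value       DOUBLE PRECISION
) PARTITION BY RANGE (recorded_at);

CREATE TABLE demo_metrics_2025_02 PARTITION OF demo_metrics
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE demo_metrics_2025_03 PARTITION OF demo_metrics
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Prunable: the partition key is compared directly with constants.
EXPLAIN (COSTS OFF)
SELECT * FROM demo_metrics
WHERE recorded_at >= '2025-03-01' AND recorded_at < '2025-04-01';

-- Not prunable: the key is hidden inside date_trunc(), so every partition is scanned.
EXPLAIN (COSTS OFF)
SELECT * FROM demo_metrics
WHERE date_trunc('month', recorded_at) = '2025-03-01';
```

Rewriting function-wrapped predicates into explicit range comparisons is usually the cheapest fix when a plan unexpectedly touches all partitions.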

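2. Geospatial Queries with PostGIS

2.1 Installing PostGIS and the Basics

-- ======== Install PostGIS ========
-- Ubuntu/Debian
-- apt install postgresql-15-postgis-3

-- CentOS/RHEL (PGDG repository; the package name encodes the PostGIS and PostgreSQL versions)
-- yum install postgis33_15

-- Create the PostGIS extensions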
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_raster;
CREATE EXTENSION IF NOT EXISTS postgis_topology;
CREATE EXTENSION IF NOT EXISTS postgis_sfcgal;

-- Verify the installation
SELECT postgis_full_version();

-- ======== Geospatial tables ========
-- Store table with geographic coordinates
CREATE TABLE stores (
    store_id    BIGSERIAL PRIMARY KEY,
    name        VARCHAR(200) NOT NULL,
    brand       VARCHAR(100),
    category    VARCHAR(50),
    address     VARCHAR(500),
    location    GEOGRAPHY(Point, 4326) NOT NULL,  -- WGS84 coordinates
    geom        GEOMETRY(Point, 4326),              -- geometry type (for planar/projected calculations)
    area_sqm    DECIMAL(10, 2),
    opening_hours JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Store geofence table
CREATE TABLE store_zones (
    zone_id     BIGSERIAL PRIMARY KEY,
    store_id    BIGINT REFERENCES stores(store_id),
    zone_type   VARCHAR(20),  -- 'delivery' | 'service' | 'parking'
    boundary    GEOGRAPHY(Polygon, 4326) NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- User location table
CREATE TABLE user_locations (
    id          BIGSERIAL PRIMARY KEY,
    user_id     BIGINT NOT NULL,
    location    GEOGRAPHY(Point, 4326) NOT NULL,
    accuracy    DECIMAL(8, 2),   -- meters
    recorded_at  TIMESTAMPTZ DEFAULT NOW()
);

-- ======== Spatial indexes ========
-- GiST index (the most common choice; supports range and distance queries)
CREATE INDEX idx_stores_location ON stores USING GIST (location);
CREATE INDEX idx_user_locations_location ON user_locations USING GIST (location);
CREATE INDEX idx_store_zones_boundary ON store_zones USING GIST (boundary);

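-- ======== Bulk insert ========
-- Build points with ST_MakePoint(longitude, latitude); the geom column is typed
-- GEOMETRY(Point, 4326), so its value needs ST_SetSRID as well
INSERT INTO stores (name, brand, category, address, location, geom, area_sqm) VALUES
(
    'Chaoyang Store',
    'Starbucks',
    'Coffee',
    '88 Jianguo Road, Chaoyang District, Beijing',
    ST_SetSRID(ST_MakePoint(116.4852, 39.9123), 4326)::geography,
    ST_SetSRID(ST_MakePoint(116.4852, 39.9123), 4326),
    250.00
),
(
    'Haidian Huangzhuang Store',
    'Starbucks',
    'Coffee',
    '1 Zhongguancun Street, Haidian District, Beijing',
    ST_SetSRID(ST_MakePoint(116.3147, 39.9863), 4326)::geography,
    ST_SetSRID(ST_MakePoint(116.3147, 39.9863), 4326),
    180.00
),
(
    'Pudong Lujiazui Store',
    'Luckin Coffee',
    'Coffee',
    '88 Century Avenue, Pudong New Area, Shanghai',
    ST_SetSRID(ST_MakePoint(121.5064, 31.2455), 4326)::geography,
    ST_SetSRID(ST_MakePoint(121.5064, 31.2455), 4326),
    120.00
);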

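2.2 Distance Queries and Nearest Neighbors

-- ======== Distance queries ========
-- Find all stores within N kilometers of a user's position
DO $$
DECLARE
    user_point GEOGRAPHY;
BEGIN
    -- User position (near People's Square, Shanghai)
    user_point := ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography;

    -- Stores within 3 km, sorted by distance. ORDER BY and LIMIT go in a subquery:
    -- an aggregate query cannot order by a non-aggregated column
    RAISE NOTICE '%',
        (SELECT json_agg(json_build_object(
            'name', t.name,
            'distance_m', t.distance_m,
            'distance_km', t.distance_m / 1000
        ))
        FROM (
            SELECT name, ST_Distance(location, user_point) AS distance_m
            FROM stores
            WHERE ST_DWithin(location, user_point, 3000)  -- 3 km filter (index-assisted)
            ORDER BY location <-> user_point  -- <-> is the nearest-neighbor operator (uses the spatial index)
            LIMIT 10
        ) t
    );
END $$;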

-- ======== Optimizing nearest-neighbor queries ========
-- Combine the <-> operator with a spatial index to speed up nearest-neighbor search
EXPLAIN ANALYZE
SELECT
    store_id,
    name,
    brand,
    ST_Distance(location, ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography) AS distance_m
FROM stores
WHERE ST_DWithin(
    location,
    ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography,
    5000  -- 5km
)
ORDER BY location <-> ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography
LIMIT 5;

-- ======== KNN query (no distance limit) ========
-- ORDER BY <-> lets the GiST index return rows in distance order (index-assisted KNN)
SELECT
    store_id,
    name,
    ST_Distance(
        location,
        ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography
    ) AS distance_m
FROM stores
ORDER BY location <-> ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography
LIMIT 10;
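
The choice between geography and geometry changes the unit of every distance. A small sketch, assuming only that the postgis extension is installed, compares the two on the People's Square point and the Lujiazui store coordinates used in this article:

```sql
-- Requires: CREATE EXTENSION postgis;
SELECT
    -- geography: distance on the WGS84 spheroid, in meters
    ST_Distance(
        'SRID=4326;POINT(121.4726 31.2304)'::geography,
        'SRID=4326;POINT(121.5064 31.2455)'::geography
    ) AS meters_on_spheroid,
    -- geometry with SRID 4326: planar distance in degrees, not meters
    ST_Distance(
        'SRID=4326;POINT(121.4726 31.2304)'::geometry,
        'SRID=4326;POINT(121.5064 31.2455)'::geometry
    ) AS degrees_in_plane,
    -- geography: the radius argument is in meters
    ST_DWithin(
        'SRID=4326;POINT(121.4726 31.2304)'::geography,
        'SRID=4326;POINT(121.5064 31.2455)'::geography,
        5000
    ) AS within_5km;
```

The first column comes out at a few kilometers, the second at a few hundredths of a degree, and within_5km is true. Passing a meter radius to ST_DWithin on geometry columns in SRID 4326 is a common source of wrong results for exactly this reason.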

2.3 Spatial Relationships and Geofencing

-- ======== Geofence check ========
-- Check whether a user is inside a store's delivery zone
CREATE OR REPLACE FUNCTION is_within_delivery_zone(
    p_user_point GEOGRAPHY,
    p_store_id BIGINT
) RETURNS BOOLEAN AS $$
DECLARE
    zone_boundary GEOGRAPHY;
BEGIN
    SELECT boundary INTO zone_boundary
    FROM store_zones
    WHERE store_id = p_store_id AND zone_type = 'delivery'
    LIMIT 1;

    IF zone_boundary IS NULL THEN
        RETURN FALSE;
    END IF;

    -- ST_Covers accepts geography arguments (ST_Within is geometry-only):
    -- true when the polygon covers the point
    RETURN ST_Covers(zone_boundary, p_user_point);
END;
$$ LANGUAGE plpgsql STABLE;  -- reads store_zones, so STABLE rather than IMMUTABLE

-- Find every store that can deliver to this point
SELECT DISTINCT ON (s.store_id)
    s.store_id,
    s.name,
    s.brand,
    sz.zone_id
FROM stores s
JOIN store_zones sz ON sz.store_id = s.store_id AND sz.zone_type = 'delivery'
WHERE is_within_delivery_zone(
    ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography,
    s.store_id
)
LIMIT 10;

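-- ======== Point-in-polygon test ========
-- ST_Contains / ST_Within: geometric containment (geometry type only);
-- a city_boundaries table with a geom column is assumed to exist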
SELECT name, brand
FROM stores
WHERE ST_Contains(
    (SELECT geom FROM city_boundaries WHERE city_name = 'Shanghai'),
    stores.geom
);

-- ======== Region intersection query ========
-- Find stores whose zones intersect a given bounding box
SELECT s.name, sz.zone_type
FROM stores s
JOIN store_zones sz ON sz.store_id = s.store_id
WHERE ST_Intersects(
    sz.boundary,
    ST_SetSRID(ST_MakeBox2D(
        ST_MakePoint(116.3, 39.8),
        ST_MakePoint(116.6, 40.1)
    ), 4326)::geography
);

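-- ======== Route planning helper ========
-- Distance between two points; on geography, ST_Distance measures along the
-- WGS84 spheroid by default (not the haversine formula)
SELECT
    ST_Distance(
        ST_SetSRID(ST_MakePoint(116.4852, 39.9123), 4326)::geography,
        ST_SetSRID(ST_MakePoint(121.5064, 31.2455), 4326)::geography
    ) / 1000 AS distance_km;  -- result: roughly 1068 km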

-- ======== Aggregate statistics ========
-- Store count and total floor area per city
SELECT
    CASE
        WHEN ST_Within(
            geom,  -- ST_Within needs geometry, so use geom rather than the geography column
            (SELECT geom FROM city_boundaries WHERE city_name = 'Beijing')
        ) THEN 'Beijing'
        WHEN ST_Within(
            geom,
            (SELECT geom FROM city_boundaries WHERE city_name = 'Shanghai')
        ) THEN 'Shanghai'
        ELSE 'Other'
    END AS city,
    COUNT(*) AS store_count,
    SUM(area_sqm) AS total_area,
    AVG(ST_Distance(
        location,
        ST_SetSRID(ST_MakePoint(121.4726, 31.2304), 4326)::geography
    )) / 1000 AS avg_distance_km
FROM stores
GROUP BY 1
ORDER BY store_count DESC;
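
For geography values, containment checks go through ST_Covers or ST_CoveredBy rather than ST_Within or ST_Contains. Here is a self-contained sketch with a made-up rectangular zone around the People's Square point (the polygon coordinates are illustrative, not real delivery data):

```sql
-- Requires: CREATE EXTENSION postgis;
SELECT ST_Covers(
    -- Hypothetical delivery zone: a small rectangle in central Shanghai
    ST_GeogFromText('SRID=4326;POLYGON((121.46 31.22, 121.49 31.22, 121.49 31.24, 121.46 31.24, 121.46 31.22))'),
    -- The user position used earlier in this section
    ST_GeogFromText('SRID=4326;POINT(121.4726 31.2304)')
) AS inside_zone;
```

The point lies well inside the rectangle, so inside_zone is true. Against a table, the same predicate can use the GiST index on the boundary column.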

2.4 Trajectory Data

-- ======== Trajectory table ========
CREATE TABLE user_trajectories (
    trajectory_id BIGSERIAL,
    user_id       BIGINT NOT NULL,
    trip_id       UUID NOT NULL,
    point         GEOGRAPHY(Point, 4326) NOT NULL,
    speed_kmh     DECIMAL(8, 2),  -- speed (km/h)
    heading       DECIMAL(5, 2),   -- heading (degrees)
    recorded_at   TIMESTAMPTZ NOT NULL,
    -- A primary key on a partitioned table must include the partition key
    PRIMARY KEY (trajectory_id, recorded_at)
) PARTITION BY RANGE (recorded_at);

-- Monthly partition
CREATE TABLE user_trajectories_2025_01 PARTITION OF user_trajectories
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE INDEX idx_trajectories_point ON user_trajectories USING GIST (point);
CREATE INDEX idx_trajectories_user_trip ON user_trajectories (user_id, trip_id);
CREATE INDEX idx_trajectories_recorded ON user_trajectories (recorded_at);

-- ======== Trajectory simplification ========
-- Build the trip line with ST_MakeLine ordered by time, simplify it, and encode it
CREATE OR REPLACE FUNCTION get_trip_polyline(
    p_trip_id UUID
) RETURNS TEXT AS $$
DECLARE
    encoded TEXT;  -- ST_AsEncodedPolyline returns text, not geometry
BEGIN
    SELECT ST_AsEncodedPolyline(
        ST_Simplify(
            ST_MakeLine(point::geometry ORDER BY recorded_at),
            0.00001  -- simplification tolerance (degrees)
        )
    ) INTO encoded
    FROM user_trajectories
    WHERE trip_id = p_trip_id;

    RETURN encoded;
END;
$$ LANGUAGE plpgsql;

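-- ======== Trip length ========
SELECT
    trip_id,
    user_id,
    ST_Length(
        -- ST_MakeLine aggregates geometry, so cast each point first, then cast the line back
        ST_MakeLine(point::geometry ORDER BY recorded_at)::geography
    ) / 1000 AS total_distance_km,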
    COUNT(*) AS point_count,
    MIN(recorded_at) AS start_time,
    MAX(recorded_at) AS end_time,
    EXTRACT(EPOCH FROM (MAX(recorded_at) - MIN(recorded_at))) / 3600 AS duration_hours
FROM user_trajectories
WHERE recorded_at >= '2025-01-01'
GROUP BY trip_id, user_id
ORDER BY total_distance_km DESC;

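3. Logical Replication and High Availability

3.1 Logical Replication Architecture

-- ======== Publisher configuration (primary) ========
-- 1. Edit postgresql.conf
-- wal_level = logical
-- max_wal_senders = 10
-- max_replication_slots = 10
-- wal_keep_size = 1GB

-- 2. Allow the replication user in pg_hba.conf
-- Logical replication connections match on the database name, not on the
-- "replication" keyword; narrow 0.0.0.0/0 to the subscriber's address in production
-- host mydb replicator 0.0.0.0/0 md5

-- 3. Create the replication user (it also needs SELECT on the published tables for the initial copy)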
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'repl_password';

-- 4. Create publications
-- Publish every table in the database
CREATE PUBLICATION full_replication FOR ALL TABLES;

-- Publish specific tables, restricted to certain operations
CREATE PUBLICATION orders_publication FOR TABLE
    orders, order_items, products
    WITH (publish = 'insert, update, delete');

-- Publish specific rows (row filter, PostgreSQL 15+)
CREATE PUBLICATION active_users FOR TABLE users
    WHERE (status = 1);

-- Publish specific columns (column list, PostgreSQL 15+)
CREATE PUBLICATION users_public FOR TABLE users
    (user_id, username, email);  -- leaves out sensitive columns such as password_hash

-- List publications
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables;

-- ======== Subscriber configuration ========
-- 1. Create the subscription
CREATE SUBSCRIPTION orders_sub CONNECTION
    'host=primary-host port=5432 dbname=mydb user=replicator password=repl_password'
PUBLICATION orders_publication
WITH (slot_name = 'orders_slot', synchronous_commit = on);

-- 2. Monitor replication status (run on the subscriber)
SELECT
    subname,
    pid,
    received_lsn,
    last_msg_send_time,
    last_msg_receipt_time,
    latest_end_lsn,
    latest_end_time
FROM pg_stat_subscription;

-- 3. Check replication lag (run on the publisher; the LSN difference is in bytes)
SELECT
    application_name,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    (sent_lsn - replay_lsn) AS replication_lag
FROM pg_stat_replication;

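-- ======== Replicating two-phase commits ========
-- Logical replication can replicate prepared (two-phase) transactions (PostgreSQL 15+)
-- Publisher: PREPARE TRANSACTION ... followed by COMMIT PREPARED (requires max_prepared_transactions > 0)
-- Subscriber: create the subscription WITH (two_phase = true); otherwise the
--             transaction is applied only once it commits on the publisher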
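
Publishing UPDATE and DELETE requires each table to have a replica identity, normally its primary key; without one, updates on the publisher are rejected. The sketch below uses a hypothetical demo_audit table to show how this might be checked and worked around:

```sql
-- Hypothetical table without a primary key.
CREATE TABLE demo_audit (id BIGINT, note TEXT);

-- relreplident: d = default (primary key), n = nothing, f = full row, i = chosen index
SELECT c.relname,
       c.relreplident,
       EXISTS (
           SELECT 1 FROM pg_index i
           WHERE i.indrelid = c.oid AND i.indisprimary
       ) AS has_primary_key
FROM pg_class c
WHERE c.oid = 'demo_audit'::regclass;

-- Option 1 (usually preferable): add a primary key.
-- ALTER TABLE demo_audit ADD PRIMARY KEY (id);

-- Option 2: log the whole old row; works without a key but writes more WAL.
ALTER TABLE demo_audit REPLICA IDENTITY FULL;
```

The tables in this article that are declared without a primary key would need one of these two options before their updates can be published.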

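3.2 Failover and Slot Management

-- ======== Replication slot management ========
-- Logical replication slots must be monitored: an inactive slot retains WAL and can fill the disk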
SELECT
    slot_name,
    plugin,
    slot_type,
    database,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots;

-- Drop an inactive replication slot
SELECT pg_drop_replication_slot('inactive_slot');

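-- ======== Standby failover procedure ========
-- 1. On the standby, confirm it is still in recovery (returns true on a standby)
SELECT pg_is_in_recovery();

-- 2. Promote the standby to primary
-- pg_ctl promote

-- 3. Point the subscription at the new primary
--    (before PostgreSQL 17, logical slots are not synchronized to standbys,
--    so make sure orders_slot exists on the new primary first)
ALTER SUBSCRIPTION orders_sub CONNECTION
    'host=new-primary port=5432 dbname=mydb user=replicator password=repl_password';

-- 4. Rejoin the old primary as a standby with pg_rewind, which resynchronizes its
--    data directory with the new primary (run against the stopped old primary)
-- pg_rewind --target-pgdata=/var/lib/postgresql/data
--            --source-server='host=new-primary port=5432'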

-- ======== High availability with Patroni ========
-- Patroni configuration excerpt (patroni.yml); settings sit at the top level of the file,
-- and permanent logical slots belong to the dynamic (DCS) configuration
-- scope: postgres-cluster
-- namespace: /db
-- name: postgres-node1
-- restapi:
--   listen: 0.0.0.0:8008
--   connect_address: node1:8008
-- etcd:
--   host: etcd:2379
-- bootstrap:
--   dcs:
--     postgresql:
--       parameters:
--         wal_level: logical
--         max_wal_senders: 10
--         max_replication_slots: 10
--         wal_keep_size: 1GB
--     slots:
--       orders_slot:
--         type: logical
--         database: mydb
--         plugin: pgoutput
-- postgresql:
--   listen: 0.0.0.0:5432
--   connect_address: node1:5432
--   data_dir: /data/postgresql
-- tags:
--   nofailover: false
--   clonefrom: false
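
As a safety net against a stalled subscriber filling the disk, PostgreSQL 13 and later can cap how much WAL a slot may retain. A brief sketch follows; the 10GB value is purely illustrative and would need sizing for the actual workload:

```sql
-- Cap the WAL a replication slot may hold back (PostgreSQL 13+).
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();

-- wal_status reports reserved / extended / unreserved / lost;
-- safe_wal_size is the number of bytes that can still be written before the slot is at risk.
SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_replication_slots;
```

The trade-off is explicit: once a slot exceeds the cap it is invalidated and the subscriber has to be resynchronized, which is usually preferable to an outage on the primary.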

4. Performance Tuning

4.1 EXPLAIN ANALYZE in Depth

-- ======== Plan analysis basics ========
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
    u.user_id,
    u.username,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS total_spent
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
WHERE u.created_at >= '2025-01-01'
  AND o.status = 'completed'
GROUP BY u.user_id, u.username
HAVING SUM(o.total_amount) > 1000
ORDER BY total_spent DESC
LIMIT 100;

-- ======== Reading the key figures in a plan ========
/*
Typical output (abridged):
Limit  (cost=1234.56..7890.12 rows=100)
  ->  Sort  (cost=1234.56..7890.12 rows=100)
        Sort Key: (sum(o.total_amount)) DESC
        Sort Method: top-N heapsort  Memory: 5kB
        ->  HashAggregate  (cost=4567.89..5678.90 rows=100)
              Batches: 1  Memory Usage: 523kB
              ->  Hash Right Join  (cost=... rows=...)
                    Hash Cond: (o.user_id = u.user_id)
                    ->  Seq Scan on orders o  (cost=... rows=...)
                          Filter: ((status = 'completed') AND ...)
                    ->  Hash  (cost=... rows=...)
                          ->  Seq Scan on users u  (cost=... rows=...)
                                Filter: (created_at >= '2025-01-01')
Planning Time: 1.234 ms
Execution Time: 45.678 ms
*/

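-- ======== Common problems and fixes ========
-- Problem 1: unwanted Seq Scan (full table scan)
SET enable_seqscan = off;  -- steer the planner away from seq scans (debugging only)

-- Problem 2: Nested Loop blow-up
SET enable_nestloop = off;  -- debugging only

-- Problem 3: hash join or sort running out of memory
SET work_mem = '256MB';  -- more memory per sort/hash operation, for this session

-- Problem 4: inaccurate statistics
ANALYZE VERBOSE users;  -- verbose mode

-- Extended statistics (cross-column correlation)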
CREATE STATISTICS s_users_filter (dependencies)
    ON status, created_at FROM users;
CREATE STATISTICS s_orders_join (ndistinct)
    ON user_id, status FROM orders;
ANALYZE;

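-- ======== Monitoring slow queries ========
-- Enable pg_stat_statements: add it to shared_preload_libraries, restart, then create the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Statements with the highest total execution time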
SELECT
    query,
    calls,
    total_exec_time / 1000 AS total_sec,
    mean_exec_time AS avg_ms,
    stddev_exec_time,
    rows / calls AS avg_rows,
    (100 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0))::INT AS cache_hit_pct
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Tables with the most sequential scans
SELECT
    schemaname,
    relname,
    seq_scan,
    seq_tup_read,
    idx_scan,
    idx_tup_fetch,
    n_tup_ins,
    n_tup_upd,
    n_tup_del,
    n_live_tup,
    n_dead_tup,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY seq_scan DESC;
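
When experimenting with work_mem, SET LOCAL inside a transaction keeps the change from leaking into the rest of the session. A self-contained sketch using generate_series as stand-in data (the 64MB value is only an example):

```sql
BEGIN;
SET LOCAL work_mem = '64MB';  -- applies to this transaction only

-- Aggregate and sort some synthetic rows; compare the reported memory and
-- any temp-file usage under different work_mem values.
EXPLAIN (ANALYZE, BUFFERS)
SELECT g % 1000 AS bucket, count(*) AS n
FROM generate_series(1, 200000) AS g
GROUP BY 1
ORDER BY n DESC
LIMIT 10;

ROLLBACK;  -- work_mem returns to its previous value
```

Sort and hash nodes in the output show whether they ran in memory or spilled to disk, which is a more direct signal than raising work_mem globally and hoping for the best.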

4.2 Index Optimization

-- ======== When to use each index type ========
-- B-tree (the default and most common): equality, range queries, sorting
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_orders_created ON orders (created_at DESC);
CREATE INDEX idx_orders_status_created ON orders (status, created_at);

-- Hash: equality lookups only; may be smaller than a B-tree on long keys, but supports no range scans or sorting
CREATE INDEX idx_sessions_token ON sessions USING HASH (session_token);

-- GIN: full-text search, JSONB containment, array containment
CREATE INDEX idx_users_interests ON users USING GIN (interests);  -- array
CREATE INDEX idx_orders_metadata ON orders USING GIN (metadata); -- JSONB
CREATE INDEX idx_products_search ON products USING GIN (to_tsvector('english', name || ' ' || description));

-- GiST: geometric/geographic data, range types
CREATE INDEX idx_reservations_timerange ON reservations USING GIST (timerange);

-- Partial index: index only the rows that satisfy a condition
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';
CREATE INDEX idx_users_active ON users (last_login_at)
    WHERE status = 1 AND last_login_at > '2024-01-01';

-- Covering index (PostgreSQL 11+): INCLUDE the extra columns a query needs so it can use an index-only scan
CREATE INDEX idx_orders_covering ON orders (user_id, created_at DESC)
    INCLUDE (total_amount, status);

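-- ======== Expression indexes ========
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Index expressions must be IMMUTABLE; date_trunc on timestamptz depends on the
-- session time zone, so pin the zone explicitly
CREATE INDEX idx_orders_month ON orders (DATE_TRUNC('month', created_at AT TIME ZONE 'UTC'));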
CREATE INDEX idx_products_revenue ON products ((price * stock_quantity));

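-- ======== Rebuilding indexes ========
-- Rebuild a single index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Rebuild every index on a table
REINDEX TABLE CONCURRENTLY orders;

-- Rebuild the system catalog indexes of the current database
-- (REINDEX SYSTEM does not support CONCURRENTLY)
REINDEX SYSTEM postgres;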
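
Before adding more indexes it can help to see which existing ones are never used. The query below is one way to list candidates from the standard pg_stat_user_indexes view; the counters are cumulative since the last statistics reset, so a recently reset server will overstate the list:

```sql
-- Indexes that have never been scanned, largest first.
SELECT
    schemaname,
    relname       AS table_name,
    indexrelname  AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
```

Indexes backing primary keys or unique constraints can legitimately show zero scans and still be required, so the result is a list to review rather than a list to drop.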

4.3 Connection Pooling and Concurrency

-- ======== PgBouncer configuration ========
-- pgbouncer.ini
-- [databases]
-- mydb = host=127.0.0.1 port=5432 dbname=mydb

-- [pgbouncer]
-- listen_port = 6432
-- listen_addr = 127.0.0.1
-- auth_type = md5
-- auth_file = /etc/pgbouncer/userlist.txt
-- pool_mode = transaction  -- transaction-level pooling (recommended)
-- max_client_conn = 2000
-- default_pool_size = 25   -- server connections per user/database pair
-- min_pool_size = 5
-- reserve_pool_size = 5
-- reserve_pool_timeout = 5
-- server_lifetime = 3600
-- server_idle_timeout = 600
-- log_connections = 0
-- log_disconnections = 0
-- log_pooler_errors = 1

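-- ======== Pool mode comparison ========
-- session mode: the server connection is released when the client session ends
--   - suits applications with long-lived connections
--   - session state such as SET stays valid
--   - drawback: low connection reuse

-- transaction mode: the server connection is released at the end of each transaction (recommended)
--   - first choice for high concurrency
--   - drawback: session-level SET does not carry over between transactions (use SET LOCAL inside a transaction)
--   - SQL-level PREPARE is unsupported; protocol-level prepared statements need PgBouncer 1.21+ with max_prepared_statements set

-- statement mode: the server connection is released after every statement
--   - only for very short queries
--   - multi-statement transactions are not allowed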

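-- ======== PostgreSQL concurrency parameters ========
-- postgresql.conf (comments in this file start with #); values assume a 32GB server
max_connections = 200          # maximum connections
shared_buffers = 8GB            # shared buffer cache (about 25% of RAM as a starting point)
effective_cache_size = 24GB     # planner's estimate of available cache (about 75% of RAM)
work_mem = 256MB                # per sort/hash operation per backend; with 200 connections this can exhaust RAM, so prefer a lower global value and raise it per session
maintenance_work_mem = 2GB      # memory for maintenance operations (VACUUM, CREATE INDEX)
max_worker_processes = 16       # maximum background worker processes
max_parallel_workers_per_gather = 4  # maximum workers per Gather node
max_parallel_workers = 16       # maximum parallel workers system-wide
parallel_leader_participation = on
effective_io_concurrency = 200  # concurrent I/O requests PostgreSQL may issue (SSD: 200 or more)
random_page_cost = 1.1         # cost of a random page read (SSD: 1.1, HDD: 4.0)
default_statistics_target = 500  # statistics target (default 100; higher is more precise but slows ANALYZE)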

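-- ======== Asynchronous commit and durability ========
-- ALTER SYSTEM writes postgresql.auto.conf; run SELECT pg_reload_conf(); to apply.
-- The statements below are alternatives: pick one.

-- Asynchronous commit: higher throughput, but a crash can lose the most recent commits
ALTER SYSTEM SET synchronous_commit = off;

-- Synchronous commit (default): each commit waits for its WAL to be flushed
-- locally, and on synchronous standbys if any are configured
ALTER SYSTEM SET synchronous_commit = on;

-- remote_write: wait until the synchronous standby has written the WAL (not yet flushed)
ALTER SYSTEM SET synchronous_commit = remote_write;

-- remote_apply: wait until the synchronous standby has replayed the WAL
ALTER SYSTEM SET synchronous_commit = remote_apply;

-- Show the current setting
SHOW synchronous_commit;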
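
synchronous_commit does not have to be a server-wide decision. As a sketch, assuming a hypothetical low-value table demo_click_log, a single transaction can opt into asynchronous commit while everything else keeps the durable default:

```sql
-- Hypothetical table for data that can tolerate losing the last few commits.
CREATE TABLE IF NOT EXISTS demo_click_log (
    id         BIGSERIAL PRIMARY KEY,
    clicked_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

BEGIN;
SET LOCAL synchronous_commit = off;  -- only this transaction commits asynchronously
INSERT INTO demo_click_log DEFAULT VALUES;
COMMIT;

SHOW synchronous_commit;  -- the session setting is unchanged
```

This keeps orders and payments fully durable while letting high-volume, low-value writes skip the WAL flush wait.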

5. Extension Ecosystem

5.1 Full-Text Search

-- ======== Full-text search basics ========
-- Add a tsvector column to hold the search document
ALTER TABLE articles ADD COLUMN search_vector tsvector;

UPDATE articles SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(content, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(tags, '')), 'C');

-- Create a GIN index
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

-- Keep search_vector up to date with a trigger
CREATE OR REPLACE FUNCTION articles_search_trigger() RETURNS TRIGGER AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.content, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.tags, '')), 'C');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_search_update
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();

-- Full-text query
SELECT
    title,
    ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & optimize & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Highlight matches in the results
SELECT
    title,
    ts_headline('english', content, query, 'MaxWords=50, MinWords=20') AS snippet
FROM articles, to_tsquery('english', 'postgres') query
WHERE search_vector @@ query;
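
On PostgreSQL 12 and later, a stored generated column is a possible alternative to the trigger shown above, since the database then maintains the tsvector itself. A minimal sketch on a hypothetical demo_articles table (column names mirror the article's, the table itself is illustrative):

```sql
-- Hypothetical table; requires PostgreSQL 12+ for stored generated columns.
CREATE TABLE demo_articles (
    id      BIGSERIAL PRIMARY KEY,
    title   TEXT,
    content TEXT,
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(content, '')), 'B')
    ) STORED
);
CREATE INDEX idx_demo_articles_search ON demo_articles USING GIN (search_vector);

INSERT INTO demo_articles (title, content) VALUES
    ('Tuning PostgreSQL', 'Partition pruning and indexes improve query performance.');

-- websearch_to_tsquery (PostgreSQL 11+) accepts search-engine style input.
SELECT title
FROM demo_articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql performance');
```

The generated column cannot be written to directly, which removes one class of bugs where the vector drifts out of sync with the text.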

5.2 Advanced JSONB Operations

-- ======== JSONB CRUD ========
-- Create a table with a JSONB column
CREATE TABLE events (
    event_id  BIGSERIAL PRIMARY KEY,
    event_type VARCHAR(50),
    payload   JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_events_type ON events ((payload->>'type'));
CREATE INDEX idx_events_meta ON events USING GIN (payload);

-- JSONB operators
-- -> get a field or array element (returns jsonb)
SELECT payload->'user' FROM events;

-- ->> get a field or array element as text
SELECT payload->>'user_id' FROM events;

-- #> / #>> get the value at a nested path (as jsonb / as text)
SELECT payload#>'{user,profile,age}' FROM events;
SELECT payload#>>'{user,profile,age}' FROM events;

-- @> containment (does the left value contain the right one?)
SELECT * FROM events WHERE payload @> '{"type":"click"}';

-- ? key existence
SELECT * FROM events WHERE payload ? 'user_id';

-- JSONB arrays (the payload column lives on events, not orders)
SELECT payload->'items'->0->>'product_id' FROM events;
SELECT * FROM events WHERE payload->'tags' @> '["mobile", "ios"]'::jsonb;

-- ======== JSONB functions ========
-- jsonb_object_keys: expand the keys of an object
SELECT jsonb_object_keys(payload->'metadata')
FROM events
WHERE payload ? 'metadata';

-- jsonb_each_text: expand an object into key/value pairs (values as text)
SELECT key, value
FROM events,
     jsonb_each_text(payload)
WHERE event_id = 1;

-- jsonb_agg / jsonb_object_agg: aggregate rows into JSONB
SELECT
    user_id,
    jsonb_object_agg(event_type, event_count) AS event_summary
FROM (
    SELECT
        payload->>'user_id' AS user_id,
        payload->>'type' AS event_type,
        COUNT(*) AS event_count
    FROM events
    WHERE payload ? 'type'  -- jsonb_object_agg rejects NULL keys
    GROUP BY 1, 2
) t
GROUP BY user_id;
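
The operators above only read JSONB. For the update and delete side, a few built-in functions and operators cover most cases; the sketch below runs on literals so it needs no table:

```sql
-- jsonb_set: replace the value at a path
SELECT jsonb_set(
    '{"user": {"profile": {"age": 30}}}'::jsonb,
    '{user,profile,age}',
    '31'::jsonb
) AS updated;          -- {"user": {"profile": {"age": 31}}}

-- "-" removes a top-level key
SELECT '{"a": 1, "b": 2}'::jsonb - 'b' AS without_b;       -- {"a": 1}

-- "||" merges two objects (right side wins on duplicate keys)
SELECT '{"a": 1}'::jsonb || '{"b": 2}'::jsonb AS merged;   -- {"a": 1, "b": 2}

-- "#-" removes the value at a nested path
SELECT '{"user": {"profile": {"age": 30}}}'::jsonb #- '{user,profile,age}' AS without_age;
                                                           -- {"user": {"profile": {}}}
```

In an UPDATE these are used as `SET payload = jsonb_set(payload, ...)`, which rewrites the whole JSONB value, so very large documents that change often may be better split into columns.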

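5.3 TimescaleDB for Time-Series Data

-- ======== Installing and using TimescaleDB ========
-- Requires timescaledb in shared_preload_libraries
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert a regular table into a hypertable
-- (the time column is the second positional argument; its parameter name is time_column_name)
SELECT create_hypertable('sensor_readings',
    'recorded_at',
    chunk_time_interval => INTERVAL '1 day',
    migrate_data => true
);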

-- Enable compression (can save a large share of storage)
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id'
);

-- Add a compression policy (compress chunks older than 7 days)
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

-- Continuous aggregate (an incrementally refreshed materialized view)
CREATE MATERIALIZED VIEW hourly_stats
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', recorded_at) AS bucket,
    sensor_id,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp,
    COUNT(*) AS reading_count
FROM sensor_readings
GROUP BY 1, 2
WITH NO DATA;

-- Add a refresh policy
SELECT add_continuous_aggregate_policy('hourly_stats',
    start_offset => INTERVAL '3 hours',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);

-- Query the downsampled data from the continuous aggregate
SELECT * FROM hourly_stats
WHERE bucket >= NOW() - INTERVAL '30 days'
  AND sensor_id = 'sensor_001'
ORDER BY bucket DESC;

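6. Checklist Summary

□ Partitioned tables
  □ Range partitioning (time series)
  □ List partitioning (enumerated values/regions)
  □ Hash partitioning (even distribution)
  □ Partition pruning works (verify with EXPLAIN)
  □ Automatic partition creation
  □ Partition maintenance (archive/drop expired data)
  □ Partition indexes (local indexes/partial indexes)

□ PostGIS
  □ Point/line/polygon data types
  □ Spatial indexes (GiST)
  □ Distance queries (ST_DWithin + index)
  □ Nearest-neighbor queries (<-> operator)
  □ Geofencing (ST_Within / ST_Contains)
  □ Region intersection (ST_Intersects)
  □ Trajectory processing (ST_MakeLine)
  □ Coordinate transformation (ST_Transform)

□ Logical replication
  □ Publication creation and filtering
  □ Subscription configuration
  □ Replication slot management
  □ Lag monitoring
  □ Failover procedure

□ Performance tuning
  □ EXPLAIN ANALYZE analysis
  □ Extended statistics
  □ Index types (B-tree/GIN/GiST/Hash/Partial/Covering)
  □ PgBouncer connection pooling
  □ PostgreSQL parameter tuning
  □ Compression settings

□ Extension ecosystem
  □ Full-text search (tsvector/tsquery/GIN)
  □ JSONB operations and indexes
  □ TimescaleDB time-series compression
  □ PostGIS trajectory processing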

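Summary

In one sentence: advanced PostgreSQL = partition pruning + PostGIS spatial queries + logical replication + the extension ecosystem, covering everything from CRUD to TB-scale data processing.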

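PostgreSQL vs MySQL, advanced features compared:

Feature | PostgreSQL | MySQL
Partitioning | Range/List/Hash/multi-level | RANGE/LIST/HASH/KEY
Spatial queries | PostGIS (full GIS functionality) | Built-in spatial types and functions (basic)
Replication | Logical replication + physical streaming replication | GTID-based, asynchronous/semi-synchronous
JSON | JSONB (native, indexable) | JSON (supported, fewer features)
Full-text search | Built in, multiple languages | InnoDB FTS (basic)
Time-series extension | TimescaleDB | MySQL HeatWave (commercial)
Connection pooling | PgBouncer | ProxySQL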

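Recommended next steps:

  • CockroachDB distributed SQL in practice (HTAP + global distribution)
  • TimescaleDB + Grafana time-series monitoring dashboards
  • New features in PostgreSQL 16/17 (such as logical decoding on standbys and failover slots for logical replication)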