【GaussDB】 keep聚合SQL 性能优化分析:从 123s 到 18s 的优化之路

【GaussDB】 排序聚合SQL 性能优化分析:从 123s 到 18s 的优化之路 注:本文所有对象名及字段名均已做脱敏处理

一、业务背景与需求描述

1.1 业务场景

某金融交易系统需要查询每日行情数据,针对每个交易品种(asset_id),获取两个时间段的最新行情快照:

  • 全日快照 :在 update_time <= '15:00:00' 范围内,取 seq_no 最大的记录(当日收盘前最新数据)
  • 早盘快照 :在 update_time <= '09:00:00' 范围内,取 seq_no 最大的记录(开盘前最新数据)

需要返回的字段包括交易日期、品种ID、成交额、成交量、持仓量等。

1.2 数据规模

  • data_ops.t_market_data_detail(行情深度明细表)
  • 全表数据量:约 15,198,348 行
  • 过滤后数据量:约 15,189,573 行(特定交易日 + 时间范围 + 指定频道)
  • 最终输出行数:约 2,355 行(按品种分组后的结果)

1.3 查询条件

sql 复制代码
trade_date = '20250627'
update_time <= '15:00:00'
channel_id IN (100, 500, 1000, 5000)

1.4 原始SQL执行耗时对比

ORACLE 19.10 GaussDB 506.0 SPC0500 集中式
30s 123s

二、初始 SQL 分析(方案 0)

2.1 原始 SQL

(作者评:写出这个SQL的是天才,褒义的那种)

sql 复制代码
SELECT MIN(trade_date) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS trade_date,
       MIN(asset_id) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS asset_id,
       MIN(amount) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS amount,
       MIN(qty) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS qty,
       MIN(oi_value) KEEP(dense_rank FIRST ORDER BY seq_no DESC NULLS LAST) AS oi_value,
       MIN(qty) KEEP(dense_rank FIRST ORDER BY t2_seq_no DESC NULLS LAST) AS t2_qty,
       MIN(amount) KEEP(dense_rank FIRST ORDER BY t2_seq_no DESC NULLS LAST) AS t2_amount
FROM (SELECT seq_no,
             CASE WHEN update_time > '09:00:00' THEN NULL ELSE seq_no END AS t2_seq_no,
             trade_date, asset_id, amount, qty, oi_value
      FROM data_ops.t_market_data_detail s
      WHERE trade_date = '20250627'
        AND update_time <= '15:00:00'
        AND (channel_id = 1000 OR channel_id = 5000 OR channel_id = 100 OR channel_id = 500))
GROUP BY asset_id;

2.2 完整执行计划

sql 复制代码
 id |                   operation                   |   A-time   |    A-rows    |    E-rows    | Peak Memory | A-width | E-width |        E-costs         
----+-----------------------------------------------+------------+--------------+--------------+-------------+---------+---------+------------------------
 1  | ->  GroupAggregate                            | 123462.280 |         2355 |         2354 | 71kB        |         |     273 | 3241514.039..3659250.836
 2  |      ->  Sort                                 |   68854.676 |    15189573 |    15189573 | 1048966kB   |         |      49 | 3241514.039..3279487.971
 3  |           ->  Seq Scan on t_market_data_detail s | 16408.833 |    15189573 |    15189573 | 53kB        |         |      49 |   0.000..730253.225
(3 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 3 --Seq Scan on t_market_data_detail s
     Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND ((channel_id = 100) OR (channel_id = 500) OR (channel_id = 1000) OR (channel_id = 5000))), (Expression Flatten optimized), (Predicates Reordered)
     Rows Removed by Filter: 8775
(3 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 2 --Sort
       Sort Method: external merge  Disk: 938544kB
(2 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.179 ms
Datanode executor run time: 123468.607 ms
Datanode executor end time: 74.734 ms
Planner runtime: 0.873 ms
Query Id: 1946962414165264683
Total runtime: 123543.548 ms
(6 rows)

总耗时:123,543 ms(约 2 分钟)

执行计划摘要
节点 操作 实际耗时 行数
1 GroupAggregate 123,462 ms 2,355
2 Sort 68,854 ms 15,189,573
3 Seq Scan 16,408 ms 15,189,573

2.3 核心问题诊断

  1. 全表顺序扫描:无任何索引可用,对近 1500 万行数据进行全表扫描
  2. 单次大排序 :Sort 节点耗时 68.8s,使用外部归并排序,磁盘 I/O 达 938 MB
  3. KEEP 聚合函数开销:每个 KEEP 子句都需要在分组时维护有序状态,7 个 KEEP 表达式叠加计算成本极高
  4. 串行执行:未启用并行查询,单线程处理全部数据

三、优化方案演进

方案 1:启用并行查询(query_dop = 4)

3.1 优化措施
sql 复制代码
SET query_dop = 4;
-- SQL 逻辑不变
3.2 完整执行计划
sql 复制代码
 id |                    operation                   |            A-time             |    A-rows    |    E-rows    |   Peak Memory   |  A-width  | E-width |        E-costs         
----+------------------------------------------------+-------------------------------+--------------+--------------+-----------------+-----------+---------+------------------------
 1  | ->  Streaming(type: LOCAL GATHER dop: 1/4)     | [43143.728,43143.728]         |         2355 |         2354 | [136kB,136kB]   |           |     273 | 2837486.031..3255350.306
 2  |      -> GroupAggregate                         | [42060.014,43116.415]         |         2355 |         2354 | [63kB,63kB]     |           |     273 | 2837486.031..3255222.828
 3  |           -> Sort                              | [28888.504,29384.295]         |    15189573 |    15189573 | [261MB,263MB]   | [68,68]   |      49 | 2837486.031..2875459.963
 4  |              -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [18407.666,18551.991] |    15189573 |    15189573 | [134kB,134kB]   |           |      49 |   0.000..850777.167
 5  |                    -> Seq Scan on t_market_data_detail s | [4196.041,4228.979]     |    15189573 |    15189573 | [46kB,46kB]     |           |      49 |   0.000..183563.306
(5 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 5 --Seq Scan on t_market_data_detail s
     Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND ((channel_id = 100) OR (channel_id = 500) OR (channel_id = 1000) OR (channel_id = 5000))), (Expression Flatten Optimized), (Predicates Reordered)
     Rows Removed by Filter: 8775
(3 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

Sort Method: external merge  Memory: 1048576kB ~ 0kB
3 --Sort
       Sort Method: external merge  Disk: 231288kB ~ 231288kB
(3 rows)

                                 User Define Profiling
---------------------------------------------------------------------------------------------------------------------------

Segment Id: 1 Track name: Datanode build connection
     (actual time=[0.000, 0.000], calls=[1, 1])
Plan Node id: 1 Track name: Datanode start up stream thread
     (actual time=[0.347, 0.347], calls=[1, 1]))
(4 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 1.041 ms
Datanode executor run time: 43144.837 ms
Datanode executor end time: 0.065 ms
Planner runtime: 34.099 ms
Query Id: 1946962414165247362
Total runtime: 43145.962 ms
(6 rows)

总耗时:43,145 ms(约 43 秒)

执行计划摘要
节点 操作 实际耗时 说明
1 Streaming (LOCAL GATHER) 43,143 ms 并行结果汇聚
2 GroupAggregate 42,060~43,116 ms 4 个线程并行聚合
3 Sort 28,888~29,384 ms 每个线程排序约 1/4 数据
4 Streaming (LOCAL REDISTRIBUTE) 18,407~18,551 ms 数据重分布
5 Seq Scan 4,196~4,228 ms 并行扫描,每个线程约 4s
3.3 效果分析
  • 性能提升:65%(从 123s → 43s)
  • 磁盘排序量从 938MB 降至各线程约 231MB
  • 扫描时间从 16.4s 降至约 4.2s(接近理想的 4 倍加速比)
  • 局限性:仍然需要全表扫描 + 大排序,KEEP 聚合函数的本质开销未解决

方案 2:DISTINCT ON + 覆盖索引重构

3.4 优化思路

将 KEEP 聚合改写为 PostgreSQL 特有的 DISTINCT ON 语法,同时创建覆盖索引避免回表。

3.5 DDL 变更
sql 复制代码
-- 创建测试表(数据副本)
CREATE TABLE t_test_staging AS SELECT * FROM data_ops.t_market_data_detail;

-- 创建覆盖索引:排序键 + 包含所有查询字段
CREATE INDEX idx_opt_001 ON t_test_staging (asset_id, seq_no DESC)
    INCLUDE (trade_date, amount, qty, oi_value, update_time, channel_id);
3.6 SQL 改写
sql 复制代码
SELECT a.asset_id, a.trade_date, a.amount, a.qty, a.oi_value, n.t2_qty, n.t2_amount
FROM (SELECT DISTINCT ON (asset_id) asset_id, trade_date, amount, qty, oi_value
      FROM t_test_staging
      WHERE trade_date = '20250627' AND update_time <= '15:00:00'
        AND channel_id IN (100, 500, 1000, 5000)
      ORDER BY asset_id, seq_no DESC NULLS LAST) a
LEFT JOIN (SELECT DISTINCT ON (asset_id) asset_id, qty AS t2_qty, amount AS t2_amount
           FROM t_test_staging
           WHERE trade_date = '20250627' AND update_time <= '09:00:00'
             AND channel_id IN (100, 500, 1000, 5000)
           ORDER BY asset_id, seq_no DESC NULLS LAST) n
USING (asset_id);
3.7 完整执行计划
sql 复制代码
 id |                                      operation                                       |   A-time   |    A-rows    | E-rows |  Peak Memory  | A-width | E-width |        E-costs         
----+--------------------------------------------------------------------------------------+------------+--------------+--------+---------------+---------+---------+------------------------
 1  | -> Hash Left Join (2, 5)                                                             |   40774.030|         2355 |      1 |          24kB |     204 |         | 1142568.006..1142570.563
 2  |    -> Unique                                                                         |    33574.824|        2355 |      1 |          10kB |     182 |         |   571282.729..571285.264
 3  |         -> Sort                                                                      |    30952.123|     15189573|    507 |      104896kB |     182 |         |   571282.729..571283.997
 4  |              -> Index Only Scan using idx_opt_001 on t_test_staging                  |    12362.086|     15189573|    507 |         112kB |     182 |         |       0.000..571259.950
 5  |      -> Hash                                                                         |     7125.826|        2355 |      1 |         427kB |     114 |         |   571285.274..571285.274
 6  |           -> Subquery Scan on n                                                      |     7124.697|        2355 |      1 |          13kB |     114 |         |   571282.729..571285.274
 7  |                -> Unique                                                             |     7123.976|        2355 |      1 |          10kB |     128 |         |   571282.729..571285.264
 8  |                     -> Sort                                                          |     7123.138|        2373 |    507 |          3kB  |     128 |         |   571282.729..571283.997
 9  |                          -> Index Only Scan using idx_opt_001 on t_test_staging      |     7112.291|        2373 |    507 |         112kB |     128 |         |       0.000..571259.950
(9 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 1 --Hash Left Join (2, 5)
      Hash Cond: ((public.t_test_staging.asset_id)::text = (n.asset_id)::text), (Expression Flatten Optimized)
 4 --Index Only Scan using idx_opt_001 on t_test_staging
     Filter: (((trade_date)::text = '20250627'::text) AND ((update_time)::text <= '15:00:00'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
     Rows Removed by Filter: 8775
 9 --Index Only Scan using idx_opt_001 on t_test_staging
     Filter: (((trade_date)::text = '20250627'::text) AND ((update_time)::text <= '09:00:00'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
     Rows Removed by Filter: 15195975
(8 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

3 --Sort
       Sort Method: external merge  Disk: 804784kB
Buckets: 32768 Batches: 1 Memory Usage: 124kB
8 --Sort
       Sort Method: quicksort  Memory: 319kB
(5 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.114 ms
Datanode executor run time: 40775.241 ms
Datanode executor end time: 0.047 ms
Planner runtime: 0.770 ms
Query Id: 1946962414165250608
Total runtime: 40775.440 ms
(6 rows)

总耗时:40,775 ms(约 41 秒)

执行计划摘要
节点 操作 实际耗时 说明
1 Hash Left Join 40,774 ms 合并两个子查询结果
2-4 Unique/Sort/Index Only Scan(全日) 33,574 ms 覆盖索引扫描
5-9 Hash/Subquery/Unique/Sort/Index Only Scan(早盘) 7,125 ms 覆盖索引扫描
3.8 效果分析
  • 相比方案 0:提升 67%(从 123s → 41s)
  • Index Only Scan:完全通过索引获取数据,无需回表
  • 逻辑拆分:将两个时间段的查询拆为独立子查询,早盘查询过滤后仅 2,373 行
  • 局限性:仍然需要对全量数据进行 Sort(804MB 磁盘排序),未启用并行

方案 3:窗口函数 + 条件排名(探索性尝试)

3.9 SQL改写
sql 复制代码
SELECT asset_id,
       MAX(CASE WHEN rn_all = 1 THEN trade_date END) AS trade_date,
       MAX(CASE WHEN rn_all = 1 THEN amount END) AS amount,
       MAX(CASE WHEN rn_all = 1 THEN qty END) AS qty,
       MAX(CASE WHEN rn_all = 1 THEN oi_value END) AS oi_value,
       MAX(CASE WHEN rn_nine = 1 THEN qty END) AS t2_qty,
       MAX(CASE WHEN rn_nine = 1 THEN amount END) AS t2_amount
FROM (
      SELECT asset_id, trade_date, amount, qty, oi_value,
             -- 1 update_time<=15:00 seq_no KEEP
             ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY seq_no DESC NULLS LAST) AS rn_all,
             -- 2 update_time<=09:00 seq_no KEEPCASE
             ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY CASE WHEN update_time <= '09:00:00' THEN seq_no END DESC NULLS LAST) AS rn_nine
      FROM (
        SELECT asset_id, trade_date, amount, qty, oi_value, seq_no, update_time
        FROM data_ops.t_market_data_detail
        WHERE trade_date = '20250627'
          AND update_time <= '15:00:00'
          AND channel_id IN (100, 500, 1000, 5000)
      ) base
    ) ranked
GROUP BY asset_id;
3.10 完整执行计划
sql 复制代码
 id |        operation       |   A-time   |    A-rows    |    E-rows    | Peak Memory | A-width | E-width |        E-costs         
----+------------------------+------------+--------------+--------------+-------------+---------+---------+------------------------
 1  | -> GroupAggregate      |  163316.589|         2355 |        2356 |        60kB |     243 |         | 6003628.088..6988339.898
 2  |      -> windowAgg      |   155457.946|    15189573 |    15149050 |        32kB |      49 |         | 6003628.088..6344481.713
 3  |           -> Sort      |   144296.753|    15189573 |    15149050 |      6557kB |      49 |         | 6003628.088..6041500.713
 4  |                  -> windowAgg |   119080.339|    15189573 |    15149050 |        32kB |      49 |         | 3158506.822..3499360.447
 5  |                         -> Sort |   107443.791|    15189573 |    15149050 |      6557kB |      49 |         | 3158506.822..3196379.447
 6  |                                -> Seq Scan on t_market_data_detail |  21274.101|    15189573 |    15149050 |        4kB  |      49 |         |   0.000..654239.180
(6 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 6 --Seq Scan on t_market_data_detail
     Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{100,500,1000,5000}'::numeric[]))), (Expression Flatten Optimizd), (Predicates Reordered)
     Rows Removed by Filter: 8775
(3 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

3 --Sort
       Sort Method: external merge  Disk: 965808kB
5 --Sort
       Sort Method: external merge  Disk: 804920kB
(4 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 0.170 ms
Datanode executor run time: 163319.509 ms
Datanode executor end time: 60.965 ms
Planner runtime: 0.911 ms
Query Id: 1946962414165494361
Total runtime: 163380.667 ms
(6 rows)

总耗时:163,380 ms(约 163 秒) ⚠️ 性能最差

执行计划摘要
节点 操作 实际耗时
1 GroupAggregate 163,316 ms
2 WindowAgg 155,457 ms
3 Sort 144,296 ms
4 WindowAgg 119,080 ms
5 Sort 107,443 ms
6 Seq Scan 21,274 ms
3.11 失败原因分析
  • 双重窗口函数:需要两次独立排序,分别产生 965MB 和 804MB 的磁盘 I/O
  • 全量数据排名:对 1500 万行逐行计算 ROW_NUMBER,然后外层再聚合
  • 无并行、无索引:退化为最差的串行全表扫描模式
  • 结论:此方案不适用于大数据量场景,窗口函数在数据量过大时开销显著

方案 4:两阶段聚合 + 反向查找(最终最优方案)⭐

3.12 核心思路

"先找序列号,再查明细" ------ 将问题分解为两个阶段:

  1. 第一阶段(聚合) :按品种分组,直接计算出每个品种在两个时间段的最大 seq_no
  2. 第二阶段(回查) :用计算出的 seq_no 精确匹配原表,获取完整行数据
3.13 SQL 实现
sql 复制代码
SELECT /*+set(query_dop 4)*/
       asset_id,
       MAX(CASE WHEN data_type = 'GROUP_09' THEN amount END) AS amount_09,
       MAX(CASE WHEN data_type = 'GROUP_09' THEN qty END) AS qty_09,
       MAX(CASE WHEN data_type = 'GROUP_15' THEN trade_date END) AS trade_date_15,
       MAX(CASE WHEN data_type = 'GROUP_15' THEN amount END) AS amount_15,
       MAX(CASE WHEN data_type = 'GROUP_15' THEN qty END) AS qty_15
FROM (
      SELECT o.asset_id, o.seq_no, o.trade_date, o.amount, o.qty, o.oi_value,
             CASE
                 WHEN o.seq_no = kp.max_seq_no_09 THEN 'GROUP_09'
                 WHEN o.seq_no = kp.max_seq_no_15 THEN 'GROUP_15'
             END AS data_type
      FROM (
        -- 第一阶段:聚合出每个品种的最大 seq_no
        SELECT asset_id,
               MAX(CASE WHEN update_time <= '09:00:00' THEN seq_no END) AS max_seq_no_09,
               MAX(seq_no) AS max_seq_no_15
        FROM data_ops.t_market_data_detail
        WHERE trade_date = '20250627'
          AND update_time <= '15:00:00'
          AND channel_id IN (1000, 5000, 100, 500)
        GROUP BY asset_id
      ) kp
      JOIN data_ops.t_market_data_detail o
        ON o.asset_id = kp.asset_id
        AND o.trade_date = '20250627'
        AND o.channel_id IN (1000, 5000, 100, 500)
        AND o.seq_no IN (kp.max_seq_no_09, kp.max_seq_no_15)
     ) TargetRows
GROUP BY asset_id;
3.14 完整执行计划
sql 复制代码
 id |                            operation                           |            A-time             |    A-rows   | E-rows  |   Peak Memory   |  A-width  | E-width |        E-costs         
----+----------------------------------------------------------------+-------------------------------+-------------+---------+-----------------+-----------+---------+------------------------
 1  | -> Streaming(type: LOCAL GATHER dop: 1/4)                      | [18111.000,18111.000]         |        2355 |       3 | [134kB,134kB]   |           |     297 | 482084.094..482084.278
 2  |      -> HashAggregate                                          | [18099.159,18104.285]         |        2355 |       3 | [377kB,393kB]   | [27,27]   |     297 | 482084.094..482084.102
 3  |           -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4)      | [18093.913,18099.183]         |        4710 |       3 | [136kB,136kB]   |           |     105 | 203747.245..482084.047
 4  |                  -> Hash Join (5,6)                            | [17880.987,18086.743]         |        4710 |       3 | [24kB,24kB]     |           |     105 | 203747.245..482083.833
 5  |                      -> Seq Scan on t_market_data_detail o     | [3808.056,3887.799]           |   15191946 | 15196559| [41kB,41kB]     |           |      41 |   0.000..155058.039
 6  |                      -> Hash                                  | [8385.531,8404.325]           |        9420 |    9392 | [419kB,419kB]   | [39,39]   |      75 | 203717.895..203717.895
 7  |                           -> Streaming(type: BROADCAST dop: 4/4) | [8384.467,8403.213]      |        9420 |    9392 | [123kB,123kB]   |           |      75 | 202593.157..203717.895
 8  |                                -> Subquery Scan on kp         | [8379.329,8386.988]           |        2355 |    2348 | [5kB,5kB]       |           |      75 | 202593.157..202622.507
 9  |                                      -> HashAggregate         | [8379.154,8386.834]           |        2355 |    2348 | [178kB,182kB]   | [27,27]   |      89 | 202593.157..202599.027
10  |                                           -> Streaming(type: LOCAL REDISTRIBUTE dop: 4/4) | [8376.756,8384.340] |        9420 |    2348 | [114kB,114kB]   |           |      89 | 202432.420..202587.287
11  |                                                  -> HashAggregate | [7870.510,8379.085]       |        9420 |    2348 | [539kB,622kB]   | [27,27]   |      89 | 202432.420..202438.290
12  |                                                       -> Seq Scan on t_market_data_detail | [5167.157,5347.326] |   15189573 | 15149050| [37kB,37kB]     |           |      25 |   0.000..164559.795
(12 rows)

                                    Predicate Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

 4 --Hash Join (5,6)
      Hash Cond: ((o.asset_id)::text = (kp.asset_id)::text), (Expression Flatten Optimized)
      Join Filter: (((o.seq_no = kp.max_seq_no_09) OR (o.seq_no = kp.max_seq_no_15)), (Expression Flatten Optimized)
      Rows Removed by Join Filter: 15187236
 5 --Seq Scan on t_market_data_detail o
     Filter: (((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{1000,5000,100,500}'::numeric[]))), (Expression Flatten Optimized), (Predicates Reordered)
     Rows Removed by Filter: 6402
12 --Seq Scan on t_market_data_detail
     Filter: (((update_time)::text <= '15:00:00'::text) AND ((trade_date)::text = '20250627'::text) AND (channel_id = ANY ('{1000,5000,100,500}'::numeric[]))), (Expression Flatten Optimiz
d), (Predicates Reordered)
     Rows Removed by Filter: 8775
(10 rows)

                                   Memory Information (identified by plan id)
---------------------------------------------------------------------------------------------------------------------------

6 --Hash
       Max Buckets: 32768  Max Batches: 1  Max Memory Usage: 129kB
       Min Buckets: 32768  Min Batches: 1  Min Memory Usage: 129kB
(3 rows)

                                 User Define Profiling
---------------------------------------------------------------------------------------------------------------------------

Segment Id: 1 Track name: Datanode build connection
     (actual time=[0.000, 0.000], calls=[1, 1])
Plan Node id: 1 Track name: Datanode start up stream thread
     (actual time=[1.292, 1.292], calls=[1, 1]))
(4 rows)


================ Query Summary ================
-----------------------------------------------
Datanode executor start time: 2.436 ms
Datanode executor run time: 18111.228 ms
Datanode executor end time: 0.118 ms
Planner runtime: 5.341 ms
Query Id: 194696241416526033
Total runtime: 18113.822 ms
(6 rows)

总耗时:18,113 ms(约 18 秒)

执行计划摘要
节点 操作 实际耗时 说明
1 Streaming (LOCAL GATHER) 18,111 ms 并行汇聚
2 HashAggregate 18,099~18,104 ms 最终聚合
3 Streaming (LOCAL REDISTRIBUTE) 18,093~18,099 ms 数据重分布
4 Hash Join 17,880~18,086 ms 两阶段 JOIN
5 Seq Scan (o) 3,808~3,887 ms 明细表并行扫描
6-12 Hash/Broadcast/Subquery/HashAggregate 5,167~8,404 ms 聚合阶段并行执行
3.15 效果分析
对比项 方案 0 方案 1 方案 2 方案 3 方案 4
总耗时 123,543 ms 43,145 ms 40,775 ms 163,380 ms 18,113 ms
相对方案 0 100% 34.9% 33.0% 132.3% 14.7%
性能提升 - 65.1% 67.0% -32.3% 85.3%
并行度 DOP=4 DOP=4
排序方式 外部归并(938MB) 外部归并(231MB×4) 外部归并(804MB) 外部归并(1.7GB) HashAggregate
索引使用 Index Only 无需索引

四、优化策略总结

4.1 关键优化点

(1)避免大排序 ------ 用 HashAggregate 替代 Sort
复制代码
方案 0/1/2/3:扫描 → 排序(磁盘) → 聚合    【瓶颈在排序】
方案 4:      扫描 → HashAggregate → JOIN  【无大排序】

方案 4 的核心突破在于:第一阶段直接用 MAX() 聚合找到目标 seq_no,完全避免了排序操作。HashAggregate 的内存开销仅为几百 KB,远小于磁盘排序的数百 MB。

(2)减少数据流转量
方案 排序/聚合的数据量 最终输出
0/1/3 15,189,573 行(全量) 2,355 行
2 15,189,573 + 2,373 行 2,355 行
4 2,355 行(聚合后)+ 最多 4,710 行(回查) 2,355 行

方案 4 在第一阶段就将 1500 万行压缩为 2,355 个品种的最大 seq_no,第二阶段仅用这些 seq_no 精确回查,JOIN 后最多返回约 4,710 行。

(3)并行计算加速

方案 4 充分利用 GaussDB 的SMP能力:

  • LOCAL REDISTRIBUTE:将数据均匀分发到 4 个线程
  • BROADCAST:将小的聚合结果广播到所有线程
  • LOCAL GATHER:汇聚各线程的局部结果
(4)单次表扫描的设计哲学

虽然方案 4 在逻辑上"读了两遍表"(一次聚合 + 一次回查),但两遍扫描是并行分布在不同线程执行的,且第二遍扫描通过 Hash Join 的过滤条件大幅减少了有效数据量。相比方案 0/1 中排序带来的磁盘 I/O,两次内存友好的扫描反而更高效。

4.2 优化路径总结

复制代码
┌─────────────────────────────────────────────────────────────┐
│  初始状态:KEEP 聚合 + 全表扫描 + 大排序                     │
│  耗时: 123s                                                  │
│                           ↓                                 │
│  第一步:启用并行 (query_dop=4)                              │
│  效果: 123s → 43s(65% 提升)                                │
│  分析: 并行有效但未解决排序瓶颈                               │
│                           ↓                                 │
│  第二步:DISTINCT ON + 覆盖索引(探索方向)                   │
│  效果: 123s → 41s(67% 提升)                                │
│  分析: Index Only Scan 有效,但仍需大排序                     │
│                           ↓                                 │
│  第三步:窗口函数(反面教材)                                 │
│  效果: 123s → 163s(性能下降!)                              │
│  分析: 双重窗口 + 双重排序 = 灾难                             │
│                           ↓                                 │
│  第四步:两阶段聚合 + 反向查找 ⭐                             │
│  效果: 123s → 18s(85% 提升)                                │
│  分析: HashAggregate 替代排序 + 并行 = 最优解                │
└─────────────────────────────────────────────────────────────┘

4.3 通用优化建议

  1. 优先消除排序 :对于 "取每组最大值对应行" 的场景,先用 MAX()/MIN() 聚合出目标键值,再回查明细,比直接排序更高效
  2. KEEP/FIRST_VALUE 类函数的替代方案:当数据量大时,考虑用"先聚合后 JOIN"的模式替代
  3. 并行是基础优化 :在 GaussDB 全表扫描性能相比ORACLE差很多, query_dop 是最直接的优化手段
  4. 覆盖索引有前提:Index Only Scan 虽好,但如果查询仍需大排序,收益有限;且维护索引有额外成本
  5. 窗口函数需谨慎:窗口函数在数据量超过内存容量时会产生大量磁盘 I/O,不适合作为大数据量的首选方案
  6. 通过执行计划验证假设 :每次改写后务必用 EXPLAIN ANALYZE 验证实际效果,避免"理论上更好"但实际更差的情况

相关推荐
DarkAthena1 天前
【GaussDB】排查报错“ERROR: Plpgsql estate is null“
gaussdb
DarkAthena2 天前
【DuckDB】一条 SQL 同时连接 Oracle 和 GaussDB
oracle·gaussdb·duckdb
Gauss松鼠会3 天前
GaussDB(DWS) 日常维护命令
服务器·数据库·postgresql·性能优化·gaussdb·经验总结
Gauss松鼠会3 天前
GaussDB(DWS) GUC参数修改、查看
java·数据库·sql·数据库开发·gaussdb
UtopianCoding4 天前
数据库语法对比详细规则
数据库·mysql·gaussdb
Gauss松鼠会4 天前
GaussDB(DWS) 资源监控Topsql
java·网络·数据库·算法·oracle·性能优化·gaussdb
Gauss松鼠会6 天前
【GaussDB】基于SpringBoot实现操作GaussDB(DWS)的项目实战
java·数据库·经验分享·spring boot·后端·sql·gaussdb
Gauss松鼠会6 天前
【GaussDB】GaussDB 常见问题及解决方案汇总
java·数据库·算法·性能优化·gaussdb·经验总结
DarkAthena10 天前
【GaussDB】讨论下for update游标加锁的设计
gaussdb