帮朋友优化一条很简单的窗口函数 ROW_NUMBER() OVER() , Oracle 迁移 PostgreSQL项目。
原始SQL和执行计划
STUDENT_BAK 表我模拟的数据,3千万行数据。
sql
SELECT STU_ID, STU_NAME, STU_SEX, STU_AGE, STU_DATE, STU_CLASSID, STU_SAL
FROM (SELECT SB.*,
ROW_NUMBER() OVER (PARTITION BY SB.STU_CLASSID,SB.STU_AGE,SB.STU_SEX ORDER BY SB.STU_DATE DESC) RN
FROM STUDENT_BAK SB) X
WHERE RN = 1;
Oracle 执行计划
sql
--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5400 |00:00:28.92 | 160K| 166K| 5702 |
|* 1 | VIEW | | 1 | 30M| 5400 |00:00:28.92 | 160K| 166K| 5702 |
|* 2 | WINDOW SORT PUSHED RANK| | 1 | 30M| 12919 |00:00:28.92 | 160K| 166K| 5702 |
| 3 | TABLE ACCESS FULL | STUDENT_BAK | 1 | 30M| 30M|00:00:01.44 | 160K| 160K| 0 |
--------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
" 1 - filter(""RN""=1)"
" 2 - filter(ROW_NUMBER() OVER ( PARTITION BY ""SB"".""STU_CLASSID"",""SB"".""STU_AGE"",""SB"".""STU_SEX"" ORDER BY INTERNAL_FUNCTION(""SB"".""STU_DATE"") DESC "
)<=1)
PostgreSQL 执行计划
sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on x (cost=5085273.94..6210240.94 rows=149996 width=33) (actual time=64787.749..80656.358 rows=5400 loops=1)
Filter: (x.rn = 1)
Rows Removed by Filter: 29994600
-> WindowAgg (cost=5085273.94..5835251.94 rows=29999120 width=41) (actual time=64787.747..79956.535 rows=30000000 loops=1)
-> Sort (cost=5085273.94..5160271.74 rows=29999120 width=33) (actual time=64787.728..73281.302 rows=30000000 loops=1)
Sort Key: sb.stu_classid, sb.stu_age, sb.stu_sex, sb.stu_date DESC
Sort Method: external merge Disk: 1369904kB
-> Seq Scan on student_bak sb (cost=0.00..539329.20 rows=29999120 width=33) (actual time=0.008..20064.258 rows=30000000 loops=1)
Planning Time: 0.123 ms
Execution Time: 80764.258 ms
(10 行记录)
Oracle:28秒出结果,PG:80秒出结果,两者执行时间差距很大。
优化方案一、使用分区表
将 STUDENT_BAK 表的 STU_CLASSID 列作为 LIST 分区
Oracle 使用分区表优化后执行计划:
sql
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5400 |00:00:33.62 | 167K| 167K|
| 1 | PARTITION LIST ALL | | 1 | 30M| 5400 |00:00:33.62 | 167K| 167K|
|* 2 | VIEW | | 300 | 30M| 5400 |00:00:21.59 | 167K| 167K|
|* 3 | WINDOW SORT PUSHED RANK| | 300 | 30M| 12919 |00:00:18.20 | 167K| 167K|
| 4 | TABLE ACCESS FULL | STUDENT_PAR_BAK | 300 | 30M| 30M|00:00:01.15 | 167K| 167K|
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
" 2 - filter(""RN""=1)"
" 3 - filter(ROW_NUMBER() OVER ( PARTITION BY ""SB"".""STU_CLASSID"",""SB"".""STU_AGE"",""SB"".""STU_SEX"" ORDER BY "
" INTERNAL_FUNCTION(""SB"".""STU_DATE"") DESC )<=1)"
PostgreSQL 使用分区表优化后执行计划:
sql
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on x (cost=10416416.10..11541288.60 rows=149983 width=450) (actual time=54656.336..63401.133 rows=5400 loops=1)
Filter: (x.rn = 1)
Rows Removed by Filter: 29994600
-> WindowAgg (cost=10416416.10..11166331.10 rows=29996600 width=458) (actual time=54656.333..62742.542 rows=30000000 loops=1)
-> Sort (cost=10416416.10..10491407.60 rows=29996600 width=450) (actual time=54656.303..56466.219 rows=30000000 loops=1)
Sort Key: sb.stu_classid, sb.stu_age, sb.stu_sex, sb.stu_date DESC
Sort Method: external merge Disk: 1369904kB
-> Append (cost=0.00..539445.00 rows=29996600 width=450) (actual time=0.010..20511.558 rows=30000000 loops=1)
-> Seq Scan on student_par_bak_p0 sb (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.010..67.413 rows=100240 loops=1)
-> Seq Scan on student_par_bak_p1 sb_1 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.010..67.507 rows=100169 loops=1)
-> Seq Scan on student_par_bak_p2 sb_2 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.013..65.356 rows=100260 loops=1)
-> Seq Scan on student_par_bak_p3 sb_3 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.009..65.545 rows=100052 loops=1)
-> Seq Scan on student_par_bak_p4 sb_4 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.010..65.411 rows=100078 loops=1)
-> Seq Scan on student_par_bak_p5 sb_5 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.010..65.672 rows=99757 loops=1)
-> Seq Scan on student_par_bak_p6 sb_6 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.007..66.148 rows=100510 loops=1)
-> Seq Scan on student_par_bak_p7 sb_7 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.008..65.720 rows=99636 loops=1)
-> Seq Scan on student_par_bak_p8 sb_8 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.013..63.716 rows=100625 loops=1)
-> Seq Scan on student_par_bak_p9 sb_9 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.007..64.223 rows=99965 loops=1)
-> Seq Scan on student_par_bak_p10 sb_10 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.009..65.790 rows=100186 loops=1)
-> Seq Scan on student_par_bak_p11 sb_11 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.011..65.542 rows=100171 loops=1)
-> Seq Scan on student_par_bak_p12 sb_12 (cost=0.00..1798.15 rows=99989 width=450) (actual time=0.015..64.736 rows=99841 loops=1)
/* 此处省略 200 行分区表计划 */
Planning Time: 0.688 ms
Execution Time: 63466.714 ms
(310 行记录)
Oracle:33秒出结果,PG:63秒出结果。
Oracle使用分区表后反而比原来的单表慢了5秒,但是PG使用分区表以后比原来快了17秒(效果不错)。
优化方案二、分区表的基础上使用并行
*Oracle 语句添加HINT /*+ parallel(8) / 执行计划
sql
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5400 |00:00:02.80 | 910 |
| 1 | PX COORDINATOR | | 1 | | 5400 |00:00:02.80 | 910 |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 0 | 30M| 0 |00:00:00.01 | 0 |
|* 3 | VIEW | | 0 | 30M| 0 |00:00:00.01 | 0 |
|* 4 | WINDOW SORT PUSHED RANK | | 0 | 30M| 0 |00:00:00.01 | 0 |
| 5 | PX RECEIVE | | 0 | 30M| 0 |00:00:00.01 | 0 |
| 6 | PX SEND HASH | :TQ10000 | 0 | 30M| 0 |00:00:00.01 | 0 |
|* 7 | WINDOW CHILD PUSHED RANK| | 0 | 30M| 0 |00:00:00.01 | 0 |
| 8 | PX BLOCK ITERATOR | | 0 | 30M| 0 |00:00:00.01 | 0 |
|* 9 | TABLE ACCESS FULL | STUDENT_PAR_BAK | 0 | 30M| 0 |00:00:00.01 | 0 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
" 3 - filter(""RN""=1)"
" 4 - filter(ROW_NUMBER() OVER ( PARTITION BY ""SB"".""STU_CLASSID"",""SB"".""STU_AGE"",""SB"".""STU_SEX"" ORDER BY "
" INTERNAL_FUNCTION(""SB"".""STU_DATE"") DESC )<=1)"
" 7 - filter(ROW_NUMBER() OVER ( PARTITION BY ""SB"".""STU_CLASSID"",""SB"".""STU_AGE"",""SB"".""STU_SEX"" ORDER BY "
" INTERNAL_FUNCTION(""SB"".""STU_DATE"") DESC )<=1)"
9 - access(:Z>=:Z AND :Z<=:Z)
Note
-----
- Degree of Parallelism is 8 because of hint
*PostgreSQL 语句添加HINT /*+ parallel(8 sb hard) / 执行计划
sql
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on x (cost=9876971.10..11001843.60 rows=149983 width=450) (actual time=52118.006..61222.471 rows=5400 loops=1)
Filter: (x.rn = 1)
Rows Removed by Filter: 29994600
-> WindowAgg (cost=9876971.10..10626886.10 rows=29996600 width=458) (actual time=52118.003..60558.094 rows=30000000 loops=1)
-> Sort (cost=9876971.10..9951962.60 rows=29996600 width=450) (actual time=52117.986..54214.307 rows=30000000 loops=1)
Sort Key: sb.stu_classid, sb.stu_age, sb.stu_sex, sb.stu_date DESC
Sort Method: external merge Disk: 1369904kB
-> Gather (cost=0.00..0.00 rows=29996600 width=450) (actual time=2.655..9516.724 rows=30000000 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Parallel Append (cost=0.00..0.00 rows=3749575 width=450) (actual time=0.020..10158.207 rows=3333333 loops=9)
-> Parallel Seq Scan on student_par_bak_p0 sb (cost=0.00..0.00 rows=12499 width=450) (actual time=0.018..400.671 rows=100240 loops=1)
-> Parallel Seq Scan on student_par_bak_p1 sb_1 (cost=0.00..0.00 rows=12499 width=450) (actual time=0.018..410.512 rows=100169 loops=1)
-> Parallel Seq Scan on student_par_bak_p2 sb_2 (cost=0.00..0.00 rows=12499 width=450) (actual time=0.024..344.441 rows=100260 loops=1)
-> Parallel Seq Scan on student_par_bak_p3 sb_3 (cost=0.00..0.00 rows=12499 width=450) (actual time=0.021..337.809 rows=100052 loops=1)
-> Parallel Seq Scan on student_par_bak_p4 sb_4 (cost=0.00..0.00 rows=12499 width=450) (actual time=0.019..406.484 rows=100078 loops=1)
-> Parallel Seq Scan on student_par_bak_p5 sb_5 (cost=0.00..0.00 rows=12499 width=450) (actual time=0.020..385.837 rows=99757 loops=1)
-> Parallel Seq Scan on student_par_bak_p6 sb_6 (cost=0.00..0.00 rows=12499 width=450) (actual time=0.023..259.886 rows=100510 loops=1)
-> Parallel Seq Scan on student_par_bak_p7 sb_7 (cost=0.00..0.00 rows=12499 width=450) (actual time=0.019..293.689 rows=99636 loops=1)
/* 此处省略 200 行分区表计划 */
Planning Time: 0.564 ms
Execution Time: 61279.150 ms
(313 行记录)
Oracle:2.8秒出结果,PG:61秒出结果。
Oracle使用并行以后提升巨大、PG 感觉并行感觉没有对row_number进行很好的优化,和原来分区表执行时间差别不大:61s。
优化方案三、ROW_NUMBER() OVER() 等价改写
在原来分区表、8个并行的基础上等价改写 ROW_NUMBER() OVER() 逻辑
等价改写SQL不放出来,看到这里有想要继续深入学习的读者可以联系我,毕竟我也想了一会才想到实现的方法,纯逻辑转换题。
Oracle 等价改写ROW_NUMBER() OVER() 计划
sql
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5400 |00:00:00.69 | 910 | |
| 1 | PX COORDINATOR | | 1 | | 5400 |00:00:00.69 | 910 | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 0 | 37 | 0 |00:00:00.01 | 0 | |
| 3 | NESTED LOOPS | | 0 | 37 | 0 |00:00:00.01 | 0 | |
| 4 | VIEW | | 0 | 2700 | 0 |00:00:00.01 | 0 | |
| 5 | SORT GROUP BY | | 0 | 2700 | 0 |00:00:00.01 | 0 | 68608 |
| 6 | PX RECEIVE | | 0 | 2700 | 0 |00:00:00.01 | 0 | |
| 7 | PX SEND HASH | :TQ10000 | 0 | 2700 | 0 |00:00:00.01 | 0 | |
| 8 | SORT GROUP BY | | 0 | 2700 | 0 |00:00:00.01 | 0 | 585K|
| 9 | PX BLOCK ITERATOR | | 0 | 30M| 0 |00:00:00.01 | 0 | |
|* 10 | TABLE ACCESS FULL | STUDENT_PAR_BAK | 0 | 30M| 0 |00:00:00.01 | 0 | |
|* 11 | TABLE ACCESS BY USER ROWID| STUDENT_PAR_BAK | 0 | 1 | 0 |00:00:00.01 | 0 | |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
10 - access(:Z>=:Z AND :Z<=:Z)
" 11 - filter((""B"".""STU_CLASSID""=""X2"".""STU_CLASSID"" AND ""B"".""STU_AGE""=""X2"".""STU_AGE"" AND ""B"".""STU_SEX""=""X2"".""STU_SEX"" AND "
" ""B"".""STU_DATE""=""X2"".""MAX_STU_DATE""))"
Note
-----
- Degree of Parallelism is 8 because of hint
PostgreSQL 等价改写ROW_NUMBER() OVER() 计划
sql
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=5627880.30..6621517.67 rows=281 width=450) (actual time=26827.678..32286.183 rows=5400 loops=1)
Hash Cond: ((x1.stu_classid = sb.stu_classid) AND (x1.stu_age = sb.stu_age) AND ((x1.stu_sex)::text = (sb.stu_sex)::text) AND ((x1.stu_date)::timestamp without time zone = (((max(
(sb.stu_date)::timestamp without time zone))::date))::timestamp without time zone))
CTE x
-> Gather (cost=0.00..0.00 rows=29996600 width=456) (actual time=3.752..6509.459 rows=30000000 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Parallel Append (cost=0.00..0.00 rows=3749575 width=456) (actual time=0.018..10115.770 rows=3333333 loops=9)
-> Parallel Seq Scan on student_par_bak_p0 b (cost=0.00..0.00 rows=12499 width=456) (actual time=0.020..308.911 rows=100240 loops=1)
-> Parallel Seq Scan on student_par_bak_p1 b_1 (cost=0.00..0.00 rows=12499 width=456) (actual time=0.018..308.579 rows=100169 loops=1)
-> Parallel Seq Scan on student_par_bak_p2 b_2 (cost=0.00..0.00 rows=12499 width=456) (actual time=0.017..248.827 rows=100260 loops=1)
-> Parallel Seq Scan on student_par_bak_p3 b_3 (cost=0.00..0.00 rows=12499 width=456) (actual time=0.016..278.987 rows=100052 loops=1)
-> Parallel Seq Scan on student_par_bak_p4 b_4 (cost=0.00..0.00 rows=12499 width=456) (actual time=0.021..316.482 rows=100078 loops=1)
-> Parallel Seq Scan on student_par_bak_p5 b_5 (cost=0.00..0.00 rows=12499 width=456) (actual time=0.016..306.535 rows=99757 loops=1)
-> Parallel Seq Scan on student_par_bak_p6 b_6 (cost=0.00..0.00 rows=12499 width=456) (actual time=0.016..312.291 rows=100510 loops=1)
-> Parallel Seq Scan on student_par_bak_p7 b_7 (cost=0.00..0.00 rows=12499 width=456) (actual time=0.015..187.120 rows=99636 loops=1)
/* 此处省略 200 行分区表计划 */
-> CTE Scan on x x1 (cost=0.00..599932.00 rows=29996600 width=456) (actual time=3.754..1339.590 rows=30000000 loops=1)
-> Hash (cost=5560387.95..5560387.95 rows=2999660 width=30) (actual time=26808.484..26808.486 rows=5400 loops=1)
Buckets: 4194304 Batches: 1 Memory Usage: 33095kB
-> GroupAggregate (cost=5042946.60..5530391.35 rows=2999660 width=30) (actual time=20922.491..26805.407 rows=5400 loops=1)
Group Key: sb.stu_classid, sb.stu_age, sb.stu_sex
-> Sort (cost=5042946.60..5117938.10 rows=29996600 width=30) (actual time=20921.360..22292.427 rows=30000000 loops=1)
Sort Key: sb.stu_classid, sb.stu_age, sb.stu_sex
Sort Method: external merge Disk: 1174192kB
-> CTE Scan on x sb (cost=0.00..599932.00 rows=29996600 width=30) (actual time=0.004..10345.811 rows=30000000 loops=1)
Planning Time: 1.540 ms
Execution Time: 32535.572 ms
(318 行记录)
时间:32541.503 ms (00:32.542)
Oracle:0.69秒出结果,PG:32秒出结果。
Oracle执行速度感觉打了鸡血一样,太猛了,PG改写以后也能从61秒降低到32秒跑出结果
目前只能将PG的 ROW_NUMBER() OVER() 执行速度优化到32秒,后续再想想有没有更快的优化方案。
结束语
上面 ROW_NUMBER() OVER() 的三种优化手段,如果在项目上有遇到类似的案例,先要搞清楚环境需求,再选择采用哪种优化手段。
如果读者有更好的优化思路,欢迎一起交流,进步。