Lab 6: Dynamic Pruning

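Introduction

Partition pruning can greatly reduce the amount of data retrieved from disk and thereby improve query performance.

When the condition on the partition column contains a bind variable, the database cannot determine which partitions to scan while parsing the SQL statement; it can do so only from the actual parameter values at execution time. Dynamic pruning therefore takes place during SQL execution.

Using the TPC-C business tables as an example, this lab examines the basic behavior of dynamic pruning on partitioned tables, its trigger conditions, and how query behavior differs with and without pruning, to show how the database uses dynamic partition pruning to improve query performance on partitioned tables.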

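Objectives

Understand the basic principle and trigger conditions of dynamic pruning.

Learn how to determine whether dynamic pruning occurred for a SQL statement.

Learn how to use dynamic pruning appropriately to improve query performance on partitioned tables.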

Procedure

Step 1: Set the parameters.

set enable_fast_query_shipping = off;
set enable_stream_operator = on;

Both parameters are session-level and take effect only for the duration of the current session.
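To confirm that both settings took effect, the values can be read back in the same session. This is a minimal check, assuming the session has not been reconnected since the SET statements were run.

```sql
-- Read back the session-level settings applied in Step 1.
-- After the SET statements above, these are expected to report off and on.
SHOW enable_fast_query_shipping;
SHOW enable_stream_operator;
```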

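Step 2: Create the partitioned table in the database to verify dynamic pruning. The table definition and data used in this lab are identical to those in the static pruning lab.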
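Before looking at plans, it can help to list the partitions of bmsql_stock and their bounds, since the partition numbers in the plans below refer to them. The query below is a sketch only: it assumes the pg_partition system catalog exposes the columns relname, parttype, parentid, and boundaries, as it does in openGauss-family databases. Verify the column names against your version before relying on it.

```sql
-- List the partitions of bmsql_stock together with their bounds.
-- Assumption: pg_partition marks partition rows with parttype = 'p'
-- and links them to the owning table through parentid.
SELECT relname, boundaries
FROM pg_partition
WHERE parentid = 'bmsql_stock'::regclass
  AND parttype = 'p'
ORDER BY relname;
```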

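Step 3: Examine the printed plan to determine whether dynamic pruning occurs on the first-level partitions.

As with static pruning, the database applies dynamic pruning automatically when a SQL statement meets the pruning conditions. You can inspect the execution plan of the statement to determine whether dynamic pruning occurred. Note that because dynamic pruning is performed during SQL execution, the generated plan, as printed, cannot show which specific partitions are pruned.

a. View the execution plan of the following PBE (Prepare-Bind-Execute) statement.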

jiang=# PREPARE p1 AS SELECT * FROM bmsql_stock WHERE s_w_id = $1 AND s_i_id = $2; 
PREPARE
jiang=# EXPLAIN EXECUTE p1(59, 23);
 id |                   operation                   | E-rows | E-width | E-costs 
----+-----------------------------------------------+--------+---------+---------
  1 | ->  Streaming (type: GATHER)                  |      2 |    1142 | 501.62
  2 |    ->  Partition Iterator                     |      2 |    1142 | 500.50
  3 |       ->  Partitioned Seq Scan on bmsql_stock |      2 |    1142 | 500.50
(3 rows)

   Predicate Information (identified by plan id)   
---------------------------------------------------
   2 --Partition Iterator
         Iterations: 1
   3 --Partitioned Seq Scan on bmsql_stock
         Filter: ((s_w_id = 59) AND (s_i_id = 23))
         Selected Partitions:  2
(5 rows)

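The query contains the condition s_w_id = $1. During SQL parsing, the database can tell from this condition that only some of the partitions need to be scanned (possibly one, possibly none), but it cannot determine the exact number or the IDs of the partitions to scan, so dynamic pruning is applied. In the plan, Iterations is 1 and Selected Partitions is 2, indicating that only part of the table's partitions were scanned.

b. Change the condition to s_w_id < $1 AND s_i_id = $2 and view the execution plan.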

jiang=# DEALLOCATE p1;
DEALLOCATE
jiang=# PREPARE p1 AS SELECT * FROM bmsql_stock WHERE s_w_id < $1 AND s_i_id = $2; 
PREPARE
jiang=# EXPLAIN EXECUTE p1(59, 23);
 id |                   operation                   | E-rows | E-width | E-costs 
----+-----------------------------------------------+--------+---------+---------
  1 | ->  Streaming (type: GATHER)                  |     58 |    1142 | 521.69
  2 |    ->  Partition Iterator                     |     58 |    1142 | 500.50
  3 |       ->  Partitioned Seq Scan on bmsql_stock |     58 |    1142 | 500.50
(3 rows)

   Predicate Information (identified by plan id)   
---------------------------------------------------
   2 --Partition Iterator
         Iterations: 2
   3 --Partitioned Seq Scan on bmsql_stock
         Filter: ((s_w_id < 59) AND (s_i_id = 23))
         Selected Partitions:  1..2
(5 rows)

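The partition column conditions s_w_id < $1 and s_w_id = $1 produce similar pruning results: the database can determine only that some partitions will be scanned, and which ones is known only at execution time. Because a < condition on the partition key always covers the lowest range, a small bound value narrows the scan and a large one widens it. For example, if $1 is bound to 20, only the first partition is scanned; if $1 is bound to 80, all 3 partitions, up to and including stock_p3, are scanned.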
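One way to observe this is to execute the same prepared statement with different bound values and compare the Iterations and Selected Partitions lines of each plan. The sketch below assumes p1 is still prepared as in step b; the values 20 and 80 are the ones mentioned in the text, and no output is shown because it depends on the partition bounds defined for bmsql_stock.

```sql
-- Same prepared statement, different bound values for $1.
-- Compare "Iterations" and "Selected Partitions" between the two plans.
EXPLAIN EXECUTE p1(20, 23);
EXPLAIN EXECUTE p1(80, 23);
```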

c. Change the condition to s_i_id = $1 and view the execution plan.

jiang=# DEALLOCATE p1;
DEALLOCATE
jiang=# PREPARE p1 AS SELECT * FROM bmsql_stock WHERE s_i_id = $1; 
PREPARE
jiang=# EXPLAIN EXECUTE p1(23);
 id |                   operation                   | E-rows | E-width | E-costs 
----+-----------------------------------------------+--------+---------+---------
  1 | ->  Streaming (type: GATHER)                  |    100 |    1142 | 654.05
  2 |    ->  Partition Iterator                     |    100 |    1142 | 617.25
  3 |       ->  Partitioned Seq Scan on bmsql_stock |    100 |    1142 | 617.25
(3 rows)

 Predicate Information (identified by plan id) 
-----------------------------------------------
   2 --Partition Iterator
         Iterations: 3
   3 --Partitioned Seq Scan on bmsql_stock
         Filter: (s_i_id = 23)
         Selected Partitions:  1..3
(5 rows)

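There is no condition on the partition key s_w_id, so the database does not perform dynamic pruning.

Step 4: Verify the trigger conditions for dynamic pruning.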

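When the condition on the partition column contains a bind variable and the condition qualifies for pruning, dynamic pruning can be triggered. The trigger conditions are similar to those of static pruning: range expressions (>, >=, =, <=, <), IN queries, and Boolean expressions (AND, OR) that combine them. Note that Iterations and Selected Partitions being marked as PART in a query plan does not mean that dynamic pruning was necessarily performed; it means only that the database considers the statement a candidate for dynamic pruning. Whether pruning actually happens can be determined only after the parameters are bound.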
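The IN and OR forms listed above can be tried in the same way as the earlier steps. The following is a sketch, not part of the original lab: the statement names p_in and p_or and the bound values are hypothetical, and the resulting plans depend on the partition bounds of bmsql_stock, so no output is shown.

```sql
-- IN list on the partition key, with both elements supplied as bind parameters.
PREPARE p_in AS SELECT * FROM bmsql_stock WHERE s_w_id IN ($1, $2);
EXPLAIN EXECUTE p_in(10, 59);
DEALLOCATE p_in;

-- OR of two range expressions on the partition key.
PREPARE p_or AS SELECT * FROM bmsql_stock WHERE s_w_id < $1 OR s_w_id >= $2;
EXPLAIN EXECUTE p_or(10, 90);
DEALLOCATE p_or;
```

In each plan, check whether Iterations is smaller than the total number of partitions to see whether the bound values allowed any partitions to be skipped.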

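Dynamic pruning can also be triggered when the condition on the partition column involves a type conversion. In the example below, the bound value 59.1 is converted to 59, and only one partition is scanned.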

jiang=# DEALLOCATE p1;
DEALLOCATE
jiang=# PREPARE p1 AS SELECT * FROM bmsql_stock WHERE s_w_id = $1; 
PREPARE
jiang=# EXPLAIN ANALYZE EXECUTE p1(59.1);
 id |                   operation                   |    A-time     | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs 
----+-----------------------------------------------+---------------+--------+--------+-------------+---------+---------+---------
  1 | ->  Streaming (type: GATHER)                  | 5.261         |    999 |    996 | 104KB       |         |    1142 | 787.51
  2 |    ->  Partition Iterator                     | [2.321,2.321] |    999 |    996 | [8KB,8KB]   |         |    1142 | 417.25
  3 |       ->  Partitioned Seq Scan on bmsql_stock | [2.195,2.195] |    999 |    996 | [46KB,46KB] |         |    1142 | 417.25
(3 rows)

 Predicate Information (identified by plan id) 
-----------------------------------------------
   2 --Partition Iterator
         Iterations: 1
   3 --Partitioned Seq Scan on bmsql_stock
         Filter: (s_w_id = 59)
         Rows Removed by Filter: 10989
         Selected Partitions:  2
(6 rows)

 Memory Information (identified by plan id) 
--------------------------------------------
 Coordinator Query Peak Memory:
         Query Peak Memory: 1MB
 Datanode:
         Max Query Peak Memory: 1MB
         Min Query Peak Memory: 1MB
(5 rows)

                           User Define Profiling                           
---------------------------------------------------------------------------
 Plan Node id: 1  Track name: coordinator get datanode connection
  (actual time=[0.789, 0.789], calls=[1, 1])
 Plan Node id: 1  Track name: Coordinator serialize plan
  (actual time=[0.730, 0.730], calls=[1, 1])
 Plan Node id: 1  Track name: Coordinator send begin command
  (actual time=[0.000, 0.000], calls=[1, 1])
 Plan Node id: 1  Track name: Coordinator start transaction and send query
  (actual time=[0.016, 0.016], calls=[1, 1])
(8 rows)

                                ====== Query Summary =====                                
------------------------------------------------------------------------------------------
 Datanode executor start time [dn_6007_6008_6009, dn_6007_6008_6009]: [0.081 ms,0.081 ms]
 Datanode executor run time [dn_6007_6008_6009, dn_6007_6008_6009]: [2.965 ms,2.965 ms]
 Datanode executor end time [dn_6007_6008_6009, dn_6007_6008_6009]: [0.008 ms,0.008 ms]
 Coordinator executor start time: 0.064 ms
 Coordinator executor run time: 5.335 ms
 Coordinator executor end time: 0.015 ms
 Planner runtime: 0.097 ms
 Plan size: 4039 byte
 Query Id: 72902018968264835
 Total runtime: 5.436 ms
(10 rows)

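Summary

This lab analyzed the behavior and trigger conditions of dynamic pruning on partitioned tables to show how the database uses it to optimize queries. When the condition on the partition column contains a bind variable, the database cannot determine which partitions to scan while parsing the SQL statement. If the condition qualifies for pruning, the database applies dynamic pruning and scans only some of the partitions at execution time, which improves query performance.

Typical TPC-C workloads issue their SQL through PBE. By adjusting how tables are partitioned, modifying query conditions, and so on, you can make as many TPC-C queries as possible trigger dynamic pruning, thereby improving the performance of queries over large data volumes.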
