数仓调优实践丨多次关联发散导致数据爆炸案例分析改写

本文分享自华为云社区《GaussDB(DWS)性能调优:求字段全体值中大于本行值的最小值------多次关联发散导致数据爆炸案例分析改写》,作者: Zawami 。

1、【问题描述】

语句中存在同一个表多次自关联,且均为发散关联,数据爆炸导致性能瓶颈。

2、【原始SQL】

sql 复制代码
explain verbose
WITH TMP AS
(
    SELECT WH_ID
         , (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || STOP_TIME)::TIMESTAMP AS STOP_TIME
         , (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || '23:59:59')::TIMESTAMP AS MAX_ASD
      FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D
    WHERE IS_OPEN = 'Y'
      AND STOP_TIME IS NOT NULL
)
SELECT T1.WH_ID
     , T1.THE_DATE
     , T1.IS_OPEN
     , MIN(T2.STOP_TIME) AS STOP_TIME
     , MIN(T2.MAX_ASD) AS TODAY_MAX_ASD
     , MIN(T3.MAX_ASD) AS NEXT_MAX_ASD
FROM (SELECT WH_ID
           , THE_DATE
           , IS_OPEN
           , (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || STOP_TIME)::TIMESTAMP AS STOP_TIME
        FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D
     ) T1
LEFT JOIN TMP T2
ON T1.WH_ID = T2.WH_ID
AND T1.THE_DATE < T2.STOP_TIME

LEFT JOIN TMP T3
ON T1.WH_ID = T3.WH_ID
AND ADDDATE(T1.THE_DATE,1) < T3.STOP_TIME

GROUP BY T1.WH_ID, T1.THE_DATE, T1.IS_OPEN;

从SQL中不难看出,物理表HOLIDAY_D使用WH_ID为关联键,并使用其它字段做不等值关联。

3、【性能分析】

scss 复制代码
QUERY PLAN                                                                                                                                                                                                                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 id |                                    operation                                     |    E-rows     | E-distinct |   E-memory    | E-width |     E-costs                                                                                                    |
----+----------------------------------------------------------------------------------+---------------+------------+---------------+---------+-----------------                                                                                               |
  1 | ->  Row Adapter                                                                  |         51584 |            |               |      67 | 377559930171.36                                                                                                |
  2 |    ->  Vector Streaming (type: GATHER)                                           |         51584 |            |               |      67 | 377559930171.36                                                                                                |
  3 |       ->  Vector Hash Aggregate                                                  |         51584 |            | 16MB          |      67 | 377559929546.36                                                                                                |
  4 |          ->  Vector CTE Append(5, 7)                                             | 5699739636332 |            | 1MB           |      43 | 292063834485.54                                                                                                |
  5 |             ->  Vector Streaming(type: BROADCAST)                                |        757752 |            | 2MB           |      22 | 1474.87                                                                                                        |
  6 |                ->  CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d  [5, CTE tmp(1)] |        757752 |            | 1MB           |      22 | 1474.87                                                                                                        |
  7 |             ->  Vector Hash Left Join (8, 11)                                    | 5699739636332 |            | 107MB(6863MB) |      43 | 292063833010.67                                                                                                |
  8 |                ->  Vector Hash Right Join (9, 10)                                |     542231841 | 50         | 16MB          |      27 | 22365789.31                                                                                                    |
  9 |                   ->  Vector CTE Scan on tmp(1) t3                               |         31573 | 50         | 1MB           |      48 | 15155.04                                                                                                       |
 10 |                   ->  CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d               |         51584 | 50         | 1MB           |      19 | 556.58                                                                                                         |
 11 |                ->  Vector CTE Scan on tmp(1) t2                                  |         31573 | 50         | 1MB           |      48 | 15155.04                                                                                                       |

由于SQL非常慢,难以打出performance计划,我们先看verbose计划。从计划中我们看到,经过两次的关联发散,估计数据量达到了5万亿行;因为hash join根据WH_ID列进行关联,实际不会有这么多。所以调优的思路就是取消一些发散,让中间结果集行数变少。

4、【改写SQL】

分析SQL,可知发散是为了寻找所有 STOP_TIME中大于本行 THE_DATE的最小值。像这种每行都需要用到本行数据和所有数据的逻辑,或许可以使用窗口函数进行编写;但囿于笔者能力,先提供单次自关联的方法。

SQL改写如下:

sql 复制代码
explain performance
	WITH TMP AS
    (
        SELECT WH_ID
             , (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || STOP_TIME)::TIMESTAMP AS STOP_TIME
             , (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || '23:59:59')::TIMESTAMP AS MAX_ASD
          FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D
        WHERE IS_OPEN = 'Y'
    	  AND STOP_TIME IS NOT NULL
    )
    SELECT T1.WH_ID
         , T1.THE_DATE
		 , T1.IS_OPEN
         , MIN(CASE WHEN T1.THE_DATE < T2.STOP_TIME THEN STOP_TIME ELSE NULL END) AS STOP_TIME
         , MIN(CASE WHEN T1.THE_DATE < T2.STOP_TIME THEN T2.MAX_ASD ELSE NULL END) AS TODAY_MAX_ASD
         , MIN(CASE WHEN ADDDATE(T1.THE_DATE, 1) < T2.STOP_TIME THEN T2.MAX_ASD ELSE NULL END) AS NEXT_MAX_ASD
    FROM (SELECT DISTINCT WH_ID
               , THE_DATE
               , IS_OPEN
            FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D
         ) T1
    LEFT JOIN TMP T2
    ON T1.WH_ID = T2.WH_ID
	GROUP BY
		T1.WH_ID
         , T1.THE_DATE
		 , T1.IS_OPEN
    ;

经过改写,取消了一次自关联,SQL的中间结果集变小。在关联后,通过条件聚合来得到需要的值。

scss 复制代码
 id |                            operation                            |        A-time        |  A-rows  | E-rows | E-distinct |  Peak Memory   | E-memory |  A-width  | E-width | E-costs  
----+-----------------------------------------------------------------+----------------------+----------+--------+------------+----------------+----------+-----------+---------+----------
  1 | ->  Row Adapter                                                 | 7490.354             |    34035 |    200 |            | 70KB           |          |           |      58 | 15149.80 
  2 |    ->  Vector Streaming (type: GATHER)                          | 7488.129             |    34035 |    200 |            | 216KB          |          |           |      58 | 15149.80 
  3 |       ->  Vector Hash Aggregate                                 | [7481.430, 7481.430] |    34035 |    200 |            | [9MB, 9MB]     | 16MB     | [112,112] |      58 | 15137.30 
  4 |          ->  Vector Hash Left Join (5, 7)                       | [909.377, 909.377]   | 31204164 | 109803 |            | [2MB, 2MB]     | 16MB     |           |      34 | 3880.50  
  5 |             ->  Vector Sonic Hash Aggregate                     | [5.876, 5.876]       |    34035 |  34036 | 6807       | [3MB, 3MB]     | 16MB     | [51,51]   |      18 | 1127.67  
  6 |                ->  CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d | [0.199, 0.199]       |    34036 |  34036 |            | [792KB, 792KB] | 1MB      |           |      18 | 532.04   
  7 |             ->  CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d    | [40.794, 40.794]     |    25122 |  21960 | 19         | [1MB, 1MB]     | 1MB      | [59,59]   |      24 | 617.13   

从执行计划中可以看到,中间结果集大小已经在可接受的范围内。但是又看到聚合 3千万数据使用了6s+的时间,这是过慢的,需要看执行计划中的DN信息寻找原因 。

ini 复制代码
                                                                                               Datanode Information (identified by plan id)                                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1 --Row Adapter
        (actual time=7486.498..7490.354 rows=34035 loops=1)
        (CPU: ex c/r=107, ex row=34035, ex cyc=3668104, inc cyc=22468059912)
  2 --Vector Streaming (type: GATHER)
        (actual time=7486.466..7488.129 rows=34035 loops=1)
        (Buffers: shared hit=1)
        (CPU: ex c/r=660037, ex row=34035, ex cyc=22464391808, inc cyc=22464391808)
  3 --Vector Hash Aggregate
        dn_6083_6084 (actual time=7479.644..7481.430 rows=34035 loops=1) (projection time=4488.807)
        dn_6083_6084 (Buffers: shared hit=40)
        dn_6083_6084 (CPU: ex c/r=631, ex row=31204164, ex cyc=19718763112, inc cyc=22443886288)
  4 --Vector Hash Left Join (5, 7)
        dn_6083_6084 (actual time=48.009..909.377 rows=31204164 loops=1)
        dn_6083_6084 (Buffers: shared hit=36)
        dn_6083_6084 (CPU: ex c/r=43699, ex row=59157, ex cyc=2585141400, inc cyc=2725123176)
  5 --Vector Sonic Hash Aggregate
        dn_6083_6084 (actual time=5.177..5.876 rows=34035 loops=1)
        dn_6083_6084 (Buffers: shared hit=11)
        dn_6083_6084 (CPU: ex c/r=500, ex row=34036, ex cyc=17027544, inc cyc=17619064)
  6 --CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d
        dn_6083_6084 (actual time=0.043..0.199 rows=34036 loops=1) (CU ScanInfo: smallCu: 0, totalCu: 1, avrCuRow: 34036, totalDeadRows: 0)
        dn_6083_6084 (Buffers: shared hit=11)
        dn_6083_6084 (CPU: ex c/r=17, ex row=34036, ex cyc=591520, inc cyc=591520)
  7 --CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d
        dn_6083_6084 (actual time=6.464..40.794 rows=25122 loops=1) (filter time=0.872 projection time=33.671) (RoughCheck CU: CUNone: 0, CUTagNone: 0, CUSome: 1) (CU ScanInfo: smallCu: 0, totalCu: 1, avrCuRow: 34036, totalDeadRows: 0)
        dn_6083_6084 (Buffers: shared hit=25)
        dn_6083_6084 (CPU: ex c/r=3595, ex row=34036, ex cyc=122362712, inc cyc=122362712)

从中可以看出,所有算子都只在一个DN上运行了。这可以视为严重的计算倾斜 ,若对单点性能有更高要求需要继续优化。查看DMISC.DM_DIM_CBG_WH_HOLIDAY_D表的定义,发现它是一个复制表(distribute by replication),在进行各层运算的时候只用其中一个DN来算。而在本SQL中,使用到这张表的时候,关联键都是WH_ID

再查看调整分布列为WH_ID的倾斜情况:

csharp 复制代码
select * from pg_catalog.table_skewness('DMISC.DM_DIM_CBG_WH_HOLIDAY_D', 'wh_id');

结果有23行,小于集群DN个数 ,且存在倾斜 。但是本SQL需要使用该表的全量数据,故可以把这张表改为使用WH_ID作为分步键进行重分布。

由表分布方式为复制表导致的计算倾斜无法使用skew hint解决,可以改变物理表分布方式或者创建临时表来解决(复制表通常较小)。由于表在SQL中的使用情况和表的倾斜情况,不适合更改物理表分步键为WH_ID,故本例中试使用创建临时表指定重分布方式的办法解决。

sql 复制代码
DROP TABLE IF EXISTS holiday_d_tmp;
CREATE TEMP TABLE holiday_d_tmp WITH ( orientation = COLUMN, compression = low ) distribute BY hash ( wh_id ) AS ( SELECT * FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D );
EXPLAIN performance WITH TMP AS (
	SELECT
		WH_ID,
		( IFNULL ( SUBSTR( THE_DATE, 1, 10 ), '1900-01-01' ) || ' ' || STOP_TIME ) :: TIMESTAMP AS STOP_TIME,
		( IFNULL ( SUBSTR( THE_DATE, 1, 10 ), '1900-01-01' ) || ' ' || '23:59:59' ) :: TIMESTAMP AS MAX_ASD 
	FROM
		holiday_d_tmp 
	WHERE
		IS_OPEN = 'Y' 
		AND STOP_TIME IS NOT NULL 
	) SELECT
	T1.WH_ID,
	T1.THE_DATE,
	T1.IS_OPEN,
	MIN ( CASE WHEN T1.THE_DATE < T2.STOP_TIME THEN STOP_TIME ELSE NULL END ) AS STOP_TIME,
	MIN ( CASE WHEN T1.THE_DATE < T2.STOP_TIME THEN T2.MAX_ASD ELSE NULL END ) AS TODAY_MAX_ASD,
	MIN ( CASE WHEN ADDDATE ( T1.THE_DATE, 1 ) < T2.STOP_TIME THEN T2.MAX_ASD ELSE NULL END ) AS NEXT_MAX_ASD 
FROM
	( SELECT WH_ID, THE_DATE, IS_OPEN FROM holiday_d_tmp ) T1
	LEFT JOIN TMP T2 ON T1.WH_ID = T2.WH_ID 
GROUP BY
	T1.WH_ID,
	T1.THE_DATE,
	T1.IS_OPEN;

下面是对应的执行计划:

scss 复制代码
 id |                                      operation                                       |      A-time      |  A-rows  |  E-rows  | E-distinct |  Peak Memory   | E-memory | A-width | E-width | E-costs  
----+--------------------------------------------------------------------------------------+------------------+----------+----------+------------+----------------+----------+---------+---------+----------
  1 | ->  Row Adapter                                                                      | 673.495          |    34035 |    34032 |            | 70KB           |          |         |      58 | 68112.60 
  2 |    ->  Vector Streaming (type: GATHER)                                               | 671.103          |    34035 |    34032 |            | 216KB          |          |         |      58 | 68112.60 
  3 |       ->  Vector Hash Aggregate                                                      | [0.079, 672.724] |    34035 |    34032 |            | [1MB, 1MB]     | 16MB     | [0,114] |      58 | 67794.10 
  4 |          ->  Vector Hash Left Join (5, 6)                                            | [0.047, 76.395]  | 31205167 | 27587201 |            | [324KB, 485KB] | 16MB     |         |      34 | 8876.88  
  5 |             ->  CStore Scan on pg_temp_cn_5003_6_22022_139764371019520.holiday_d_tmp | [0.004, 0.098]   |    34036 |    34036 | 1          | [760KB, 792KB] | 1MB      |         |      18 | 1553.65  
  6 |             ->  CStore Scan on pg_temp_cn_5003_6_22022_139764371019520.holiday_d_tmp | [0.008, 3.253]   |    25122 |    22018 | 1          | [880KB, 1MB]   | 1MB      | [0,61]  |      24 | 1557.76  

从计划中我们可以看到,耗时比单个DN运算快了不少,当然这里没有算上创建临时表的时间约0.2s。

5、【调优总结】

在本案例中,因为实际执行SQL时间太长先看了verbose计划而非performance计划,发现中间结果集发散 问题后,进行等价逻辑改写 ,把两个(等值-不等值)关联改为一个等值关联和条件聚合 。之后,我们发现SQL因复制表存在计算倾斜 问题,考虑SQL消费表数据的方式表的统计数据 ,采用了使用临时表重新指定分布方式的方法,解决了计算倾斜问题,SQL从单点25min+优化到单点800ms。

点击关注,第一时间了解华为云新鲜技术~

相关推荐
QX_hao1 小时前
【Go】--map和struct数据类型
开发语言·后端·golang
MC丶科2 小时前
【SpringBoot 快速上手实战系列】5 分钟用 Spring Boot 搭建一个用户管理系统(含前后端分离)!新手也能一次跑通!
java·vue.js·spring boot·后端
G探险者2 小时前
为何一个系统上线要经过N轮测试?带你看懂企业级发布体系
后端
TDengine (老段)3 小时前
TDengine 数学函数 DEGRESS 用户手册
大数据·数据库·sql·物联网·时序数据库·iot·tdengine
TDengine (老段)3 小时前
TDengine 数学函数 GREATEST 用户手册
大数据·数据库·物联网·时序数据库·iot·tdengine·涛思数据
安当加密4 小时前
云原生时代的数据库字段加密:在微服务与 Kubernetes 中实现合规与敏捷的统一
数据库·微服务·云原生
lang201509284 小时前
Spring Boot 入门:5分钟搭建Hello World
java·spring boot·后端
爱喝白开水a4 小时前
LangChain 基础系列之 Prompt 工程详解:从设计原理到实战模板_langchain prompt
开发语言·数据库·人工智能·python·langchain·prompt·知识图谱
想ai抽4 小时前
深入starrocks-多列联合统计一致性探查与策略(YY一下)
java·数据库·数据仓库