GaussDB(DWS)性能调优,解决DM区大内存占用问题

本文分享自华为云社区《GaussDB(DWS)性能调优:DM区优化案例------维度表关联条件存在会计期》,作者: O泡果奶~。

当前DM(P1、P3、CBGDM)存在维度表与主表关联时使用会计期作为关联条件,会导致出现大内存占用或未识别数据倾斜的问题

【场景一】f.period_id = 维度表.period_id

1.1、【问题描述】

主表和维度表关联过程中将会计期作为关联条件,导致维度表未进行分区剪枝,可能会产生大内存占用的情况

1.2、【原始SQL】

仅呈现SQL中的问题,详细SQL见附件

复制代码
FROM
        DMACC.dm_adp_ar_trx_dtl_tmp F
        INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
        INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 
        AND T9.PROD_POV_ID = 1
        INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
        INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
        LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 
        AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
        LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 
        AND F.PERIOD_ID = PP.PERIOD_ID
        LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
        INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
        LEFT JOIN (
        SELECT C
            .CONTRACT_KEY,
            D.COMPANY_KEY,
            R.FIRST_SHIP_DATE 
        FROM
            DMDIM.dm_dim_contract_d C,
            DMDIM.DM_DIM_COMPANY_D D,
            DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 
        WHERE
            C.CONTRACT_ID = R.CONTRACT_ID 
            AND D.COMPANY_ID = R.COMPANY_ID 
        ) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 
        AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
        INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
        JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
        JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 

1.3、【性能分析】



从上图的执行计划可以看出,由于用会计期作为关联条件,导致维度表未进行分区剪枝,数据量大,不但产生了数据倾斜,同时还由于数据量大出现了关联下盘,大大降低了sql执行性能。

主表只有一个会计期,可以识别出对应的会计期,然后对SQL进行如下改写:

复制代码
FROM
        DMACC.dm_adp_ar_trx_dtl_tmp F
        INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
        INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 
        AND T9.PROD_POV_ID = 1
        INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
        INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
        LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 
        AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
        LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 
        AND PP.PERIOD_ID = '202406'
        LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
        INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
        LEFT JOIN (
        SELECT C
            .CONTRACT_KEY,
            D.COMPANY_KEY,
            R.FIRST_SHIP_DATE 
        FROM
            DMDIM.dm_dim_contract_d C,
            DMDIM.DM_DIM_COMPANY_D D,
            DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 
        WHERE
            C.CONTRACT_ID = R.CONTRACT_ID 
            AND D.COMPANY_ID = R.COMPANY_ID 
        ) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 
        AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
        INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
        JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
        JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 

经优化后,执行计划如下图所示,维度表进行了分区剪枝,数据量减少,缓解了数据倾斜,也避免了关联下盘的问题。

【场景二】f left join 维度表 on f.period_id = 维度表.period_id and 维度表.period_id = '会计期'

2.1、【问题描述】

主表和维度表关联过程中将会计期作为关联条件,同时还为维度表会计期进行赋值,可能会产生数据倾斜未识别的情况

2.2、【原始SQL】

复制代码
FROM
        dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1
        LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 
        AND LT1.period_id = LT2.period_id 
        AND LT2.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.period_id = LT3.period_id 
        AND LT1.geo_pc_key = LT3.old_key 
        AND LT3.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.period_id = LT4.period_id 
        AND LT1.account_dept_cust_key = LT4.old_key 
        AND LT4.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.period_id = LT5.period_id 
        AND LT1.proj_key = LT5.old_key 
        AND LT5.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.period_id = LT6.period_id 
        AND LT1.enterprise_cust_key = LT6.old_key 
        AND LT6.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.period_id = LT7.period_id 
        AND LT1.report_item_id = LT7.old_key 
        AND LT7.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.period_id = LT8.period_id 
        AND LT1.supply_center_key = LT8.old_key 
        AND LT8.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.period_id = LT9.period_id 
        AND LT1.inventory_class_key = LT9.old_key 
        AND LT9.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.period_id = LT10.period_id 
        AND LT1.business_status_key = LT10.old_key 
        AND LT10.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.period_id = LT11.period_id 
        AND LT1.hisi_prod_key = LT11.old_key 
        AND LT11.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.period_id = LT12.period_id 
        AND LT1.inventory_org_key = LT12.old_key 
        AND LT12.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.period_id = LT13.period_id 
        AND LT1.end_cust_key = LT13.old_key 
        AND LT13.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.period_id = LT14.period_id 
        AND LT1.sign_cust_key = LT14.old_key 
        AND LT14.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.period_id = LT15.period_id 
        AND LT1.agent_distribution_cust_key = LT15.old_key 
        AND LT15.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.period_id = LT16.period_id 
        AND LT1.company_key = LT16.old_key 
        AND LT16.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.period_id = LT17.period_id 
        AND LT1.contract_key = LT17.old_key 
        AND LT17.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.period_id = LT18.period_id 
        AND LT1.loan_contract_key = LT18.old_key 
        AND LT18.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.period_id = LT19.period_id 
        AND LT1.target_supply_center_key = LT19.old_key 
        AND LT19.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.period_id = LT20.period_id 
        AND LT1.subinventory_key = LT20.old_key 
        AND LT20.PERIOD_ID = 202406 
    WHERE
        1 = 1 
    AND partition_value IN ( 0, 1 )

2.3、【性能分析】


上图的执行计划可以看出,在主表一开始关联过程中就存在数据倾斜,导致SQL执行性能差。

详细执行计划中,虽然维度表进行了分区剪枝,但由于使用了 left join,导致关联条件中维度表的常量period_id不能直接赋值给主表period_id,主表关联后的结果重分布时将period_id作为了分布键之一,这会影响优化器的倾斜优化。

可以将f.period_id = 维度表.period_id这一关联条件删掉,对sql进行如下改写

复制代码
FROM
        dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1
        LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 
        AND LT2.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.geo_pc_key = LT3.old_key 
        AND LT3.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.account_dept_cust_key = LT4.old_key 
        AND LT4.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.proj_key = LT5.old_key 
        AND LT5.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.enterprise_cust_key = LT6.old_key 
        AND LT6.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.report_item_id = LT7.old_key 
        AND LT7.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.supply_center_key = LT8.old_key 
        AND LT8.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.inventory_class_key = LT9.old_key 
        AND LT9.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.business_status_key = LT10.old_key 
        AND LT10.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.hisi_prod_key = LT11.old_key 
        AND LT11.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.inventory_org_key = LT12.old_key 
        AND LT12.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.end_cust_key = LT13.old_key 
        AND LT13.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.sign_cust_key = LT14.old_key 
        AND LT14.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.agent_distribution_cust_key = LT15.old_key 
        AND LT15.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.company_key = LT16.old_key 
        AND LT16.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.contract_key = LT17.old_key 
        AND LT17.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.loan_contract_key = LT18.old_key 
        AND LT18.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.target_supply_center_key = LT19.old_key 
        AND LT19.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.subinventory_key = LT20.old_key 
        AND LT20.PERIOD_ID = 202406 
    WHERE
        1 = 1 
    AND partition_value IN ( 0, 1 )

改写后,执行计划如下所示

可以看出,执行计划不但进行了分区剪枝,同时优化器还进行了倾斜优化,提高了SQL执行性能

点击关注,第一时间了解华为云新鲜技术~

相关推荐
张永清1 天前
每周读书与学习->初识JMeter 元件(一)
性能调优·jmeter性能测试·性能分析·每周读书与学习
张永清8 天前
每周读书与学习->认识性能测试工具JMeter
性能调优·jmeter性能测试·性能分析·每周读书与学习
张永清13 天前
Apache JMeter性能测试工具属性配置最全的整理
jmeter·性能测试·性能调优·性能分析·教材
p@nd@16 天前
【SQL优化案例】SQL改写 - 用 UNION ALL 替代 OR
oracle·性能调优·sql优化·oracle优化
张永清19 天前
性能测试中性能分析与调优学习大纲整理
性能测试·性能调优·性能分析
张永清25 天前
《JMeter核心技术、性能测试与性能分析》 教学大纲及标准
jmeter·性能测试·性能调优·性能分析·教材
张永清1 个月前
软件性能测试工具的发展以及不同性能测试工具之间的使用对比总结
jmeter·性能测试·性能调优·性能分析
张永清1 个月前
【图书介绍】清华大学出版社出版的《JMeter核心技术、性能测试与性能分析》,专为解决实际问题而生
jmeter·性能测试·性能调优·性能分析
OceanBase数据库官方博客1 个月前
如何利用 rowid 在OceanBase 中处理大表时提效
oceanbase·分布式数据库·性能调优·实践经验
练习时长两年半的程序员小胡1 个月前
JVM 性能调优实战:让系统性能 “飞” 起来的核心策略
java·jvm·性能调优·jvm调优