GaussDB(DWS)性能调优,解决DM区大内存占用问题

本文分享自华为云社区《GaussDB(DWS)性能调优:DM区优化案例------维度表关联条件存在会计期》,作者: O泡果奶~。

当前DM(P1、P3、CBGDM)存在维度表与主表关联时使用会计期作为关联条件,会导致出现大内存占用或未识别数据倾斜的问题

【场景一】f.period_id = 维度表.period_id

1.1、【问题描述】

主表和维度表关联过程中将会计期作为关联条件,导致维度表未进行分区剪枝,可能会产生大内存占用的情况

1.2、【原始SQL】

仅呈现SQL中的问题,详细SQL见附件

复制代码
FROM
        DMACC.dm_adp_ar_trx_dtl_tmp F
        INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
        INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 
        AND T9.PROD_POV_ID = 1
        INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
        INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
        LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 
        AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
        LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 
        AND F.PERIOD_ID = PP.PERIOD_ID
        LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
        INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
        LEFT JOIN (
        SELECT C
            .CONTRACT_KEY,
            D.COMPANY_KEY,
            R.FIRST_SHIP_DATE 
        FROM
            DMDIM.dm_dim_contract_d C,
            DMDIM.DM_DIM_COMPANY_D D,
            DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 
        WHERE
            C.CONTRACT_ID = R.CONTRACT_ID 
            AND D.COMPANY_ID = R.COMPANY_ID 
        ) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 
        AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
        INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
        JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
        JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 

1.3、【性能分析】



从上图的执行计划可以看出,由于用会计期作为关联条件,导致维度表未进行分区剪枝,数据量大,不但产生了数据倾斜,同时还由于数据量大出现了关联下盘,大大降低了sql执行性能。

主表只有一个会计期,可以识别出对应的会计期,然后对SQL进行如下改写:

复制代码
FROM
        DMACC.dm_adp_ar_trx_dtl_tmp F
        INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY
        INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 
        AND T9.PROD_POV_ID = 1
        INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY
        INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY
        LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 
        AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY
        LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 
        AND PP.PERIOD_ID = '202406'
        LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID
        INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID
        INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID
        LEFT JOIN (
        SELECT C
            .CONTRACT_KEY,
            D.COMPANY_KEY,
            R.FIRST_SHIP_DATE 
        FROM
            DMDIM.dm_dim_contract_d C,
            DMDIM.DM_DIM_COMPANY_D D,
            DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 
        WHERE
            C.CONTRACT_ID = R.CONTRACT_ID 
            AND D.COMPANY_ID = R.COMPANY_ID 
        ) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 
        AND F.COA_COMPANY_KEY = FR.COMPANY_KEY
        INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY
        JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID
        JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 

经优化后,执行计划如下图所示,维度表进行了分区剪枝,数据量减少,缓解了数据倾斜,也避免了关联下盘的问题。

【场景二】f left join 维度表 on f.period_id = 维度表.period_id and 维度表.period_id = '会计期'

2.1、【问题描述】

主表和维度表关联过程中将会计期作为关联条件,同时还为维度表会计期进行赋值,可能会产生数据倾斜未识别的情况

2.2、【原始SQL】

复制代码
FROM
        dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1
        LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 
        AND LT1.period_id = LT2.period_id 
        AND LT2.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.period_id = LT3.period_id 
        AND LT1.geo_pc_key = LT3.old_key 
        AND LT3.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.period_id = LT4.period_id 
        AND LT1.account_dept_cust_key = LT4.old_key 
        AND LT4.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.period_id = LT5.period_id 
        AND LT1.proj_key = LT5.old_key 
        AND LT5.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.period_id = LT6.period_id 
        AND LT1.enterprise_cust_key = LT6.old_key 
        AND LT6.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.period_id = LT7.period_id 
        AND LT1.report_item_id = LT7.old_key 
        AND LT7.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.period_id = LT8.period_id 
        AND LT1.supply_center_key = LT8.old_key 
        AND LT8.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.period_id = LT9.period_id 
        AND LT1.inventory_class_key = LT9.old_key 
        AND LT9.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.period_id = LT10.period_id 
        AND LT1.business_status_key = LT10.old_key 
        AND LT10.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.period_id = LT11.period_id 
        AND LT1.hisi_prod_key = LT11.old_key 
        AND LT11.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.period_id = LT12.period_id 
        AND LT1.inventory_org_key = LT12.old_key 
        AND LT12.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.period_id = LT13.period_id 
        AND LT1.end_cust_key = LT13.old_key 
        AND LT13.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.period_id = LT14.period_id 
        AND LT1.sign_cust_key = LT14.old_key 
        AND LT14.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.period_id = LT15.period_id 
        AND LT1.agent_distribution_cust_key = LT15.old_key 
        AND LT15.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.period_id = LT16.period_id 
        AND LT1.company_key = LT16.old_key 
        AND LT16.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.period_id = LT17.period_id 
        AND LT1.contract_key = LT17.old_key 
        AND LT17.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.period_id = LT18.period_id 
        AND LT1.loan_contract_key = LT18.old_key 
        AND LT18.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.period_id = LT19.period_id 
        AND LT1.target_supply_center_key = LT19.old_key 
        AND LT19.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.period_id = LT20.period_id 
        AND LT1.subinventory_key = LT20.old_key 
        AND LT20.PERIOD_ID = 202406 
    WHERE
        1 = 1 
    AND partition_value IN ( 0, 1 )

2.3、【性能分析】


上图的执行计划可以看出,在主表一开始关联过程中就存在数据倾斜,导致SQL执行性能差。

详细执行计划中,虽然维度表进行了分区剪枝,但由于使用了 left join,导致关联条件中维度表的常量period_id不能直接赋值给主表period_id,主表关联后的结果重分布时将period_id作为了分布键之一,这会影响优化器的倾斜优化。

可以将f.period_id = 维度表.period_id这一关联条件删掉,对sql进行如下改写

复制代码
FROM
        dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1
        LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 
        AND LT2.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.geo_pc_key = LT3.old_key 
        AND LT3.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.account_dept_cust_key = LT4.old_key 
        AND LT4.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.proj_key = LT5.old_key 
        AND LT5.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.enterprise_cust_key = LT6.old_key 
        AND LT6.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.report_item_id = LT7.old_key 
        AND LT7.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.supply_center_key = LT8.old_key 
        AND LT8.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.inventory_class_key = LT9.old_key 
        AND LT9.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.business_status_key = LT10.old_key 
        AND LT10.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.hisi_prod_key = LT11.old_key 
        AND LT11.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.inventory_org_key = LT12.old_key 
        AND LT12.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.end_cust_key = LT13.old_key 
        AND LT13.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.sign_cust_key = LT14.old_key 
        AND LT14.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.agent_distribution_cust_key = LT15.old_key 
        AND LT15.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.company_key = LT16.old_key 
        AND LT16.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.contract_key = LT17.old_key 
        AND LT17.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.loan_contract_key = LT18.old_key 
        AND LT18.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.target_supply_center_key = LT19.old_key 
        AND LT19.PERIOD_ID = 202406
        LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.subinventory_key = LT20.old_key 
        AND LT20.PERIOD_ID = 202406 
    WHERE
        1 = 1 
    AND partition_value IN ( 0, 1 )

改写后,执行计划如下所示

可以看出,执行计划不但进行了分区剪枝,同时优化器还进行了倾斜优化,提高了SQL执行性能

点击关注,第一时间了解华为云新鲜技术~

相关推荐
华为云开发者联盟2 天前
结合RNN与Transformer双重优点,深度解析大语言模型RWKV
rnn·深度学习·大模型·transformers·rwkv·华为云开发者联盟
华为云开发者联盟2 天前
从安装到配置,教你用Argo CD对接CCE集群完成测试、生产部署
gitops·华为云开发者联盟·持续交付·argo cd
华为云开发者联盟2 天前
寻找适合编写静态分析规则的语言
静态分析·软件开发·华为云开发者联盟·dsl
华为云开发者联盟3 天前
一文为你深度解析LLaMA2模型架构
大模型·nlp·llama·transformers·华为云开发者联盟
华为云开发者联盟4 天前
一文了解Spring Boot启动类SpringApplication
java·spring boot·华为云开发者联盟·springapplication
华为云开发者联盟4 天前
从零开始带你上手体验Sermant自定义插件开发
微服务·云原生·开源·华为云开发者联盟
华为云开发者联盟5 天前
深度解读昇腾CANN多流并行技术,提高硬件资源利用率
cann·华为云开发者联盟·华为云图引擎ges·多流并行
华为云开发者联盟6 天前
探秘数据库中的并行计算技术应用
mysql·并行计算·华为云开发者联盟
华为云开发者联盟10 天前
深入理解Spring AOP中的@EnableAspectJAutoProxy
java·spring·aop·华为云开发者联盟