GaussDB分布式数据库调优-基本步骤

1 业务背景

GaussDB分布式与集中式数据库在数据存储、扩展性、数据一致性与执行计划的差异外,在数据库调优方面确认分布键与复制表的选择。

2 分布式数据库调优基本步骤

3 操作实践

3.1 数据库参数

只需要设置几个常规参数,特别的参数设置需要根据后面优化中遇到的问题针对性的设置即可。

常规参数如下:

  • 流控参数
bash 复制代码
gs_guc reload -Z datanode -Z coordinator -N all -I all -c "audit_enabled=off";
  • 开启stream
bash 复制代码
 gs_guc reload  -Z datanode  -Z coordinator -N all -I all -c "enable_stream_operator='on'";
  • 线程池:
bash 复制代码
  gs_guc set  -Z datanode  -Z coordinator -N all -I all -c "enable_thread_pool='on'"
  • arm 线程池 绑核
bash 复制代码
gs_guc set -Z datanode -N all -I all -c "numa_distribute_mode='all'";
gs_guc set -Z datanode -N all -I all -c "thread_pool_attr='1024,4,(numabind: 0-31,32-63,64-95,96-127)'";
  • sql优化相关 (sql优化完成后正式压测需要关闭一下参数)
bash 复制代码
gs_guc reload -Z datanode  -Z coordinator -N all -I all -c "track_stmt_parameter='on'";  ---显示PBE 入参值
gs_guc reload -Z datanode  -Z coordinator  -N all -I all -c "instr_unique_sql_count=50000"  --dbe_perf.summary_statement unique sql收集数量
gs_guc reload -Z datanode  -Z coordinator  -N all -I all -c "track_activity_query_size=10000" --- dbe_perf.summary_statement query长度
gs_guc reload -Z datanode  -Z coordinator  -N all -I all -c "log_min_duration_statement=10";   ---  捕获大于10ms的sql记录到statement_history中 ------ 调试完后 调整为3S

3.2 布键 和 复制表的选择

分布式数据库中在两表join中如果非分布键 就是产生 broadcast或者redistribute算子,这两种很影响sql的执行效率。特别是再并发的情况下。所有一般的做法是把整个业务的sql提前出来,把有join的sql中的表提取出来,根据表名来筛选看看这次跟哪些表有join,join的字段是什么,以此为依据选出最适合的分布键 和 复制表。

怎么提前分析业务sql:通过 summary_statement 视图来提取,分布式 业务sql耗时:

sql 复制代码
select 
        node_name,
         unique_sql_id ,
         round(total_elapse_time/n_calls,2) as avg_time,
        n_calls as n_calls,
        round(plan_time/total_elapse_time,2)*100 as plan_time_rto,
        replace(replace(query,CHR(10),''),'  ',' ') as query,
       last_updated
from dbe_perf.summary_statement 
where user_name='xxx'  
and n_calls>10 order by 3 desc;

3.3 sql 优化

分布键 和 复制表的选择后,可以先跑一轮压测。通过上面的sql来提取 平均耗时最长的sql。找到平均耗时最长的sql后 可以通过unique_sql_id字段 反查只要知道sql的执行计划,入参值,等待时间等来分析:查看sql耗时详细内容需解析details:

sql 复制代码
 select pg_catalog.statement_detail_decode(details,'plaintext',true) as detail
from
 ( 
      select 
        *
     from dbe_perf.get_global_full_sql_by_timestamp(now()-1/24,now())
) a 
where a.unique_query_id=xxx
order by start_time desc limit 1;

如果details 信息中 LockMgrLock 耗着占比很高。可以通过修改num_internal_lock_partitions 参数 减少 轻量级锁 耗时,只要修改 LOG2_LOCKTABLE_PART,FASTPATH_PART:

bash 复制代码
  gs_guc set -Z coordinator -Z datanode -N all -I all -c "num_internal_lock_partitions='CLOG_PART=256,CSNLOG_PART=512,LOG2_LOCKTABLE_PART=4,TWOPHASE_PART=1,FASTPATH_PART=150'";
    

通过上面的sql获取到入参值后:我们可以手动构成pbe ,来完成模拟java pbe执行,然后看具体的执行计划 (为啥要模拟java pbe的行为,因为在集中式中引入了sql执行计划自适应功能,再某些特定场景下执行计划会变,gplan和cplan的跳变。) :

sql 复制代码
PREPARE p2(varchar(20),varchar(20),varchar(20))

   as ( 
    select   a.AAZ007,  a.AAZ617,  a.AAE140,  a.AAE002,  
    a.AAA039,  a.AAE719,  a.AAA028,  a.AAZ010,  a.AAE734,  a.AAE145,  a.AAE760,  a.AAE008,  a.AAE009,  a.AAE010,  a.AAA036,  a.AAA345,  a.AAA346, 
    a.AAE100,  a.AAA079,  a.AAE696,  a.AAE101,  a.AAZ661,  a.AIC161,  a.AAZ654,  a.AAZ903,  a.AAZ901,  a.AAB001,  a.AIC162,  a.AAZ902,  a.AAF002,  a.AAC157, 
    a.AAF200,  a.AAF216,  a.AAF217,  a.AAE041,  a.AAE042,  a.AAE013,  a.AAE007,  a.AAE006,  a.BAZ002, a.AAE011, a.AAE036, a.AAB034, a.BZE011, a.BZE036, 
    a.AAA027, a.BZE300, a.AAB359, a.AAB360, a.AAZ692, a.AAA431, a.AAF018, a.AAA508, a.AAE930, c.AAC059, c.AAE668, d.AAC107  
    from xxxx1 a,       
    xxx2 b,        
    xxx_tc c,        
    xx_kz d  
    where a.aaz010 = b.aac001    
    and a.baz002 = b.baz002     
    and c.aaz257 = b.aaz257    
    and c.aaz649 = b.baz002     
    and c.aaz661 = a.aaz661     
    and c.aaz661 = d.aaz661     
    and c.aaz257 = d.aaz257    
    AND B.BZ = '1'   
    AND B.JOB_NAME = $1
    AND B.PCH = $2   
    AND B.BAZ002 = $3);

再分析执行计划:

sql 复制代码
explain analyse
execute p2('200001260412803','1222','200001260412803');

对于几毫秒以内的sql,且数据很小的场景,一般情况,我们都是通过 enable_stream_operator=off 来优化。 因为走stream gather算子对于几毫秒的sql底噪很大。

索引不优: 通过 ind_ac82_111 索引扫描后,Filter aaz649, Rows Removed by Filter: 4141249 。Filter的数据量太大,说明索引定位数据不好。

sql 复制代码
 --Index Scan using ind_ac82_111 on ac82 c
     Index Cond: (c.aaz257 = xxxxx)
     Filter: (aaz649 = xxxxx::numeric)
     Rows Removed by Filter: 4141249 

Join顺序 、 Join方式的、Scan方式 等等 根据实际情况我们都可以通过hint 的方式来优化。一般在现场都是通过后台绑定sql_patch处理。

sql 复制代码
  --- 创建 patch
  select * from dbe_sql_util.create_hint_sql_patch('patch名称',&unique_query_id, 'set (enable_stream_operator off)');
sql 复制代码
 --- 删除 patch
 select * from dbe_sql_util.drop_sql_patch('patch名称'); 
      
sql 复制代码
select *from  global_sql_patch_func(); -- 全局各个节点上的SQL PATCH信息,用于返回global_sql_patch视图的结果。

3.4 sql改写

sql 复制代码
  INSTR((SELECT idpath FROM taarea WHERE areacode = (SELECT area FROM taorg WHERE customno = $5)),areaid) > ? 

---改写前
explain analyse 
select
    BCA001 as bca001, ADA120 as ada120, ADA113 as ada113, BCA008 as bca008, BCA002 as bca002, BCA003 as bca003,   BCA004 as bca004,
    BCA005 as bca005, BCA006 as bca006, BCA007 as bca007, AAE030 as aae030, AAE031 as aae031, AAE013 as aae013,
    AAE100 as aae100, AAE011 as aae011, YAE116 as yae116, AAE017 as aae017, AAE017_DSC as aae017Dsc, AAB302 as
    aab302, AAB302_DSC as aab302Dsc, AAE036 as aae036, AAE163 as aae163
    from xxx01
    where aae100 = '1'
        and ADA120 = 'JY198'
        and BCA008 = 'JY198_1'
      AND (aae017 = '220000000000354256' OR (bca003 = '1' 
AND aab302 != (SELECT area FROM taorg WHERE customno = '220xxxx300051034442') AND aab302 IN
(SELECT areaid  FROM xxx WHERE 
INSTR((SELECT idpath FROM xxx WHERE areacode = (SELECT area FROM xxx WHERE customno = '2201840300051034442')),areaid) > 0
)))
order by aab302 desc;

 id |                                          operation                                           | A-time | A-rows | E-rows | Peak Memory | A-width | E-width |      E-costs
----+----------------------------------------------------------------------------------------------+--------+--------+--------+-------------+---------+---------+--------------------
  1 | ->  Sort                                                                                     | 48.172 |      1 |     80 | 41KB        |         |     654 | 3254.664..3254.864
  2 |    ->  Hash Right Join (3, 7)                                                                | 48.150 |      1 |     80 | 33KB        |         |     654 | 2806.615..3249.852
  3 |       ->  HashAggregate                                                                      | 47.822 |      4 |  19623 | 280KB       |         |      13 | 2803.657..2999.887
  4 |          ->  Seq Scan on taarea                                                              | 47.626 |      4 |  19623 | 20KB        |         |      13 | 0.000..2750.035
  5 |          ->  Index Scan using index_taarea_2 on taarea  [4, InitPlan 3 (returns $2)]         |        |        |      1 |             |         |      66 | 2.282..4.565
  6 |             ->  Index Scan using index_taorg_customno on taorg  [5, InitPlan 2 (returns $1)] |        |        |      1 |             |         |      13 | 0.000..2.282
  7 |       ->  Hash                                                                               | 0.077  |      1 |      1 | 307KB       |         |     654 | 2.945..2.945
  8 |          ->  Seq Scan on ca01                                                                | 0.071  |      1 |      1 | 43KB        |         |     654 | 0.000..2.945
  9 |    ->  Index Scan using index_taorg_customno on taorg  [8, InitPlan 1 (returns $0)]          |        |        |      1 |             |         |      13 | 0.000..2.282
(9 rows)

                                                                                                    Predicate Information (identified by plan id)
------------------------------------------------------------------------------------------------------------------------------------------------------
   9 --Index Scan using index_taorg_customno on taorg
         Index Cond: ((customno)::text = '2201840300051034442'::text)
   2 --Hash Right Join (3, 7)
         Hash Cond: ((jy13kwsource.taarea.areaid)::text = (ca01.aab302)::text)
         Filter: (((ca01.aae017)::text = '220000000000354256'::text) OR (((ca01.bca003)::text = '1'::text) AND ((ca01.aab302)::text <> ($0)::text) AND (jy13kwsource.taarea.areaid IS NOT NULL)))
   5 --Index Scan using index_taarea_2 on taarea
         Index Cond: ((areacode)::text = ($1)::text)
   6 --Index Scan using index_taorg_customno on taorg
         Index Cond: ((customno)::text = '2201840300051034442'::text)
   4 --Seq Scan on taarea
         Filter: (instr(($2)::text, (areaid)::text) > 1)
         Rows Removed by Filter: 58865
   8 --Seq Scan on ca01
         Filter: (((aae100)::text = '1'::text) AND ((ada120)::text = 'JY198'::text) AND ((bca008)::text = 'JY198_1'::text) AND (((aae017)::text = '220000000000354256'::text) OR (((bca003)::text = '1'
::text) AND ((aab302)::text <> ($0)::text))))
         Rows Removed by Filter: 53
(15 rows)

     

可能会用到的一些命令:

sql 复制代码
select reset_unique_sql('GLOBAL','ALL',0); ---- 清空 summary_statement 视图
select pg_terminate_session(pid,sessionid) from pgxc_stat_activity a where application_name='PostgreSQL JDBC Driver';   ---- 杀掉所有应用链接

4 简单总结

分布式数据库中在两表join中如果非分布键 就是产生 broadcast或者redistribute算子,这两种很影响sql的执行效率。

转载:华为云社区

相关推荐
天空属于哈夫克37 分钟前
企业微信API常见的错误和解决方案
java·数据库·企业微信
东风破13717 分钟前
DM8达梦数据库备份、恢复原理介绍
数据库·oracle·dm达梦数据库
鹏子训30 分钟前
AI记忆新思路:用SQLite替代向量数据库,去EMBEDDINGS化,谷歌开源Google Always On Memory Agent
数据库·人工智能·sqlite·embedding
Frank_refuel1 小时前
终端环境下:Ubuntu 22.04.1 安装 MySQL 数据库
数据库·mysql·ubuntu
虹科网络安全2 小时前
艾体宝产品|深度解读 Redis 8.4 新增功能:原子化 Slot 迁移(下)
数据库·redis·bootstrap
有味道的男人3 小时前
对接亚马逊平台接口,商品全量信息一键抓取
数据库
Web极客码3 小时前
2026年Linux VPS安全加固清单:SSH、防火墙与审计就绪配置
运维·服务器·数据库
逻辑驱动的ken5 小时前
Java高频面试考点18
java·开发语言·数据库·算法·面试·职场和发展·哈希算法
qq_392690665 小时前
Redis怎样应对Redis集群整体宕机带来的雪崩
jvm·数据库·python
快乐非自愿5 小时前
Redis--SDS字符串与集合的底层实现原理
数据库·redis·缓存