OceanBase中NOT EXISTS是否需要被改写

作者简介

张瑞远,曾经从事银行、证券数仓设计、开发、优化类工作,现主要从事电信级IT系统及数据库的规划设计、架构设计、运维实施、运维服务、故障处理、性能优化等工作。 持有Orale OCM,MySQL OCP及国产代表数据库认证。 获得的专业技能与认证包括 OceanBase OBCE、Oracle OCP 11g、OracleOCM 11g 、MySQL OCP 5.7 、腾讯云TBase、腾讯云TDSQ

背景:

前段时间写了一篇《 关于OB中左外连接和反连接的探究 》的文章。OceanBase官网知识库后来也更新了这部分的内容,链接如下:

https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000475695

这意味着在OceanBase中就不建议使用not exists,或者说只能通过改写来优化它吗?

实际上,并非如此。包括我在文章中的介绍,也仅仅是表明在特定情况下,batch rescan的优化性能可能会优于无法使用该特性的anti join,但这并不意味着not exists的优化途径仅此一种。

为了更直观地说明这一点,我仍倾向于通过实验来验证。

接下来,我将以我之前文章中的sql为例,进行一次优化实验。

实验过程:

sql原文及效率和执行计划如下,因为CTM_GM是视图,所以执行计划中看到的表名是SD_CTM_GM。

复制代码
#####sql文本
 select  count(1)
    from tttt.mmmmm_sssssale t
   where t.sssss  not in ('e111', 'ddddda')
     and t.stats  = '1'
     and t.parean  is null
     and t.city  = 2208
     AND (t.cusystatus  = 'FFFFGGGGG')
     AND (T.ORGGGGGGNEL  is null or T.ORGGGGGGNEL  != 'infonow')
      AND NOT EXISTS (SELECT  1
            FROM tttt.TTTT_OWN_C  TOW
           WHERE T.PID  = TOW.OID
             AND TOW.CT_ID  = T.city 
             AND TOW.OODDD_sTS   IN
                 (SELECT DDDC
                    FROM tttt.CTM_GM
                   WHERE GPID  = 'OtherThing '
       AND stats  = '1'))  and to_char(createdate,'yyyymmdd') between 20150101 and 202301211;	
###执行时间
 +----------+
| COUNT(1) |
+----------+
|    29493 |
+----------+
1 row in set (43.87 sec)
####执行计划
| ===========================================================================================
|ID|OPERATOR                           |NAME                             |EST. ROWS|COST  |
-------------------------------------------------------------------------------------------
|0 |SCALAR GROUP BY                    |                                 |1        |161064|
|1 | NESTED-LOOP ANTI JOIN             |                                 |808      |161033|
|2 |  TABLE SCAN                       |T                                |1278     |40623 |
|3 |  PX COORDINATOR                   |                                 |1        |94    |
|4 |   EXCHANGE OUT DISTR              |:EX10001                         |1        |94    |
|5 |    SUBPLAN SCAN                   |VIEW2                            |1        |94    |
|6 |     NESTED-LOOP JOIN              |                                 |1        |94    |
|7 |      EXCHANGE IN DISTR            |                                 |1        |92    |
|8 |       EXCHANGE OUT DISTR (BC2HOST)|:EX10000                         |1        |92    |
|9 |        TABLE SCAN                 |TOW(IDX_TTTT_OWN_C _ORDERID)     |1        |92    |
|10|      TABLE SCAN                   |SD_CTM_GM(PK_SD_CTM_GM)          |1        |32    | 
===========================================================================================

Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_COUNT(*)]), filter(nil), 
      group(nil), agg_func([T_FUN_COUNT(*)])
  1 - output([1]), filter(nil), 
      conds(nil), nl_params_([T.PID ])
  2 - output([T.PID ]), filter([T.city  = 2208], [cast(cast(TO_CHAR(T.CREATEDATE, ?), VARCHAR2(256 BYTE)), NUMBER(-1, -85)) >= 20150101], [cast(cast(TO_CHAR(T.CREATEDATE, ?), VARCHAR2(256 BYTE)), NUMBER(-1, -85)) <= 202301211], [(T_OP_IS, T.ORGGGGGGNEL , NULL, 0) OR T.ORGGGGGGNEL  != ?], [(T_OP_NOT_IN, T.sssss , (?, ?))], [(T_OP_IS, T.parean , NULL, 0)], [T.stats  = ?], [T.cusystatus  = ?]), 
      access([T.sssss ], [T.stats ], [T.parean ], [T.city ], [T.cusystatus ], [T.ORGGGGGGNEL ], [T.PID ], [T.CREATEDATE]), partitions(p0)
  3 - output([1]), filter(nil)
  4 - output([1]), filter(nil), is_single, dop=1
  5 - output([1]), filter(nil), 
      access([VIEW2.TOW.OID])
  6 - output([TOW.OID]), filter(nil), 
      conds(nil), nl_params_([TOW.OODDD_sTS  ])
  7 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter(nil)
  8 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter(nil), is_single, dop=1
  9 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter([TOW.CT_ID  = 2208]), 
      access([TOW.OID], [TOW.CT_ID ], [TOW.OODDD_sTS  ]), partitions(p0)
  10 - output([1]), filter([SD_CTM_GM.stats  = ?]), 
      access([SD_CTM_GM.stats ]), partitions(p0)

可以看到该sql走了NESTED-LOOP ANTI JOIN,执行时间是43s,执行时间比较长。

按照正常的优化思路来分析下,咱们先看下实际的数据量。

复制代码
obclient>  select  count(1)
    ->     from tttt.mmmmm_sssssale t
    ->    where t.sssss  not in ('e111', 'ddddda')
    ->      and t.stats  = '1'
    ->      and t.parean  is null
    ->      and t.city  = 2208
    ->      AND (t.cusystatus  = 'FFFFGGGGG')
    ->      AND (T.ORGGGGGGNEL  is null or T.ORGGGGGGNEL  != 'infonow') and to_char(createdate,'yyyymmdd') between 20150101 and 202301211; 
+----------+
| COUNT(1) |
+----------+
|    29493 |
+----------+
1 row in set (0.08 sec)

obclient> SELECT count(*)
    ->                     FROM tttt.CTM_GM
    ->                    WHERE GPID  = 'OtherThing '
    ->        AND stats  = '1' ;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

obclient> SELECT count(*) from  tttt.TTTT_OWN_C  TOW;
+----------+
| COUNT(*) |
+----------+
|  5087140 |
+----------+
1 row in set (1.99 sec)

可以看到CTM_GM的结果是0行,扫描速度很快,那么not exists的子查询结果也是0行,但是TOW表有500W数据,原本的执行计划是TOW通过nl连接CTM_GM,要取TOW的结果集去匹配CTM_GM,理论上我们修改这两个表关联顺序,就可以只取CTM_GM的0行消除掉TOW表这么大数据量的消耗代价。

复制代码
######sql文本
select /*+use_nl(@"SEL$1" ("VIEW2"@"SEL$1" ))*/ count(1)
    from tttt.mmmmm_sssssale t
   where t.sssss  not in ('e111', 'ddddda')
     and t.stats  = '1'
     and t.parean  is null
     and t.city  = 2208
     AND (t.cusystatus  = 'FFFFGGGGG')
     AND (T.ORGGGGGGNEL  is null or T.ORGGGGGGNEL  != 'infonow')
      AND NOT EXISTS (SELECT  /*+leading(CTM_GM) use_nl(CTM_GM,TOW) */ 1
            FROM tttt.TTTT_OWN_C  TOW
           WHERE T.PID  = TOW.OID
             AND TOW.CT_ID  = T.city 
             AND TOW.OODDD_sTS   IN
                 (SELECT DDDC
                    FROM tttt.CTM_GM
                   WHERE GPID  = 'OtherThing '
       AND stats  = '1'))  and to_char(createdate,'yyyymmdd') between 20150101 and 202301211;		

#######执行效率
+----------+
| COUNT(1) |
+----------+
|    29493 |
+----------+
1 row in set (0.21 sec)	  


####执行计划

| =========================================================================================
|ID|OPERATOR                |NAME                                      |EST. ROWS|COST  |
-----------------------------------------------------------------------------------------
|0 |SCALAR GROUP BY         |                                          |1        |275986|
|1 | NESTED-LOOP ANTI JOIN  |                                          |808      |275955|
|2 |  PX COORDINATOR        |                                          |1278     |41514 |
|3 |   EXCHANGE OUT DISTR   |:EX10000                                  |1278     |40623 |
|4 |    TABLE SCAN          |T                                         |1278     |40623 |
|5 |  SUBPLAN SCAN          |VIEW2                                     |1        |183   |
|6 |   NESTED-LOOP JOIN     |                                          |1        |183   |
|7 |    TABLE SCAN          |SD_CTM_GM(INX_SD_CTM_GM_GPID )|1        |92    |
|8 |    MATERIAL            |                                          |1        |92    |
|9 |     PX COORDINATOR     |                                          |1        |92    |
|10|      EXCHANGE OUT DISTR|:EX20000                                  |1        |92    |
|11|       TABLE SCAN       |TOW(IDX_TTTT_OWN_C _ORDERID)            |1        |92    |
=========================================================================================

Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_COUNT(*)]), filter(nil), 
      group(nil), agg_func([T_FUN_COUNT(*)])
  1 - output([1]), filter(nil), 
      conds(nil), nl_params_([T.PID ])
  2 - output([T.PID ]), filter(nil)
  3 - output([T.PID ]), filter(nil), is_single, dop=1
  4 - output([T.PID ]), filter([T.city  = 2208], [cast(cast(TO_CHAR(T.CREATEDATE, ?), VARCHAR2(256 BYTE)), NUMBER(-1, -85)) >= 20150101], [cast(cast(TO_CHAR(T.CREATEDATE, ?), VARCHAR2(256 BYTE)), NUMBER(-1, -85)) <= 202301211], [(T_OP_IS, T.ORGGGGGGNEL , NULL, 0) OR T.ORGGGGGGNEL  != ?], [(T_OP_NOT_IN, T.sssss , (?, ?))], [(T_OP_IS, T.parean , NULL, 0)], [T.stats  = ?], [T.cusystatus  = ?]), 
      access([T.sssss ], [T.stats ], [T.parean ], [T.city ], [T.cusystatus ], [T.ORGGGGGGNEL ], [T.PID ], [T.CREATEDATE]), partitions(p0)
  5 - output([1]), filter(nil), 
      access([VIEW2.TOW.OID])
  6 - output([TOW.OID]), filter(nil), 
      conds([TOW.OODDD_sTS   = SD_CTM_GM.DDDC]), nl_params_(nil)
  7 - output([SD_CTM_GM.DDDC]), filter([SD_CTM_GM.stats  = ?]), 
      access([SD_CTM_GM.stats ], [SD_CTM_GM.DDDC]), partitions(p0)
  8 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter(nil)
  9 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter(nil)
  10 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter(nil), is_single, dop=1
  11 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter([TOW.CT_ID  = 2208]), 
      access([TOW.OID], [TOW.CT_ID ], [TOW.OODDD_sTS  ]), partitions(p0) 

通过添加hint的方式我们可以看到我只调整了TOW和CTM_GM的连接顺序,效率从43s优化到了0.21s,而我上一篇文章中改写之后的效率是1s,所以该sql不需要改写就可以优化掉。

那这个效率还可以优化吗,看起来还有优化空间,既然not exists的结果集是0,那我用上面同样的方式,干预下view结果集和T表的顺序,能不能得到更好的结果那?

复制代码
####sql文本
select /*+leading(("VIEW2"@"SEL$1" ))*/ count(1)
    from tttt.mmmmm_sssssale t
   where t.sssss  not in ('e111', 'ddddda')
     and t.stats  = '1'
     and t.parean  is null
     and t.city  = 2208
     AND (t.cusystatus  = 'FFFFGGGGG')
     AND (T.ORGGGGGGNEL  is null or T.ORGGGGGGNEL  != 'infonow')
      AND NOT EXISTS (SELECT  /*+leading(CTM_GM) use_nl(CTM_GM,TOW) */ 1
            FROM tttt.TTTT_OWN_C  TOW
           WHERE T.PID  = TOW.OID
             AND TOW.CT_ID  = T.city 
             AND TOW.OODDD_sTS   IN
                 (SELECT DDDC
                    FROM tttt.CTM_GM
                   WHERE GPID  = 'OtherThing '
       AND stats  = '1'))  and to_char(createdate,'yyyymmdd') between 20150101 and 202301211;		  
####执行效率	   
+----------+
| COUNT(1) |
+----------+
|    29493 |
+----------+
1 row in set (0.08 sec)
#########执行计划	   
| ==========================================================================================
|ID|OPERATOR                |NAME                                      |EST. ROWS|COST   |
------------------------------------------------------------------------------------------
|0 |SCALAR GROUP BY         |                                          |1        |2231317|
|1 | HASH RIGHT ANTI JOIN   |                                          |808      |2231286|
|2 |  SUBPLAN SCAN          |VIEW2                                     |470      |2188504|
|3 |   NESTED-LOOP JOIN     |                                          |470      |2188497|
|4 |    TABLE SCAN          |SD_CTM_GM(INX_SD_CTM_GM_GPID )|1        |92     |
|5 |    MATERIAL            |                                          |299244   |2182932|
|6 |     PX COORDINATOR     |                                          |299244   |2169701|
|7 |      EXCHANGE OUT DISTR|:EX10000                                  |299244   |2075029|
|8 |       TABLE SCAN       |TOW                                       |299244   |2075029|
|9 |  PX COORDINATOR        |                                          |1278     |41514  |
|10|   EXCHANGE OUT DISTR   |:EX20000                                  |1278     |40623  |
|11|    TABLE SCAN          |T                                         |1278     |40623  |
==========================================================================================

Outputs & filters: 
-------------------------------------
  0 - output([T_FUN_COUNT(*)]), filter(nil), 
      group(nil), agg_func([T_FUN_COUNT(*)])
  1 - output([1]), filter(nil), 
      equal_conds([T.PID  = VIEW2.TOW.OID]), other_conds(nil)
  2 - output([VIEW2.TOW.OID]), filter(nil), 
      access([VIEW2.TOW.OID])
  3 - output([TOW.OID]), filter(nil), 
      conds([TOW.OODDD_sTS   = SD_CTM_GM.DDDC]), nl_params_(nil)
  4 - output([SD_CTM_GM.DDDC]), filter([SD_CTM_GM.stats  = ?]), 
      access([SD_CTM_GM.stats ], [SD_CTM_GM.DDDC]), partitions(p0)
  5 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter(nil)
  6 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter(nil)
  7 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter(nil), is_single, dop=1
  8 - output([TOW.OID], [TOW.OODDD_sTS  ]), filter([TOW.CT_ID  = 2208]), 
      access([TOW.OID], [TOW.CT_ID ], [TOW.OODDD_sTS  ]), partitions(p0)
  9 - output([T.PID ]), filter(nil)
  10 - output([T.PID ]), filter(nil), is_single, dop=1
  11 - output([T.PID ]), filter([T.city  = 2208], [cast(cast(TO_CHAR(T.CREATEDATE, ?), VARCHAR2(256 BYTE)), NUMBER(-1, -85)) >= 20150101], [cast(cast(TO_CHAR(T.CREATEDATE, ?), VARCHAR2(256 BYTE)), NUMBER(-1, -85)) <= 202301211], [(T_OP_IS, T.ORGGGGGGNEL , NULL, 0) OR T.ORGGGGGGNEL  != ?], [(T_OP_NOT_IN, T.sssss , (?, ?))], [(T_OP_IS, T.parean , NULL, 0)], [T.stats  = ?], [T.cusystatus  = ?]), 
      access([T.sssss ], [T.stats ], [T.parean ], [T.city ], [T.cusystatus ], [T.ORGGGGGGNEL ], [T.PID ], [T.CREATEDATE]), partitions(p0)	   

可以看到现在效率0.08s比一开始的43s提升了500多倍比单纯的改写效率提升了12倍多,所以NOT EXISTS不一定需要改写,可能只是计划走错了,这种情况下我们绑定下计划就好了,很多时候应用改代码的代价也很大。

结论:

很多sql不是一定要去改写才能解决的,虽然改写有可能可以使用到一些优化算子,但是可能问题的根因不在这里,我们一定要提高自己分析问题的能力,准确判断分析问题,这样在工作中可以得心应手,也可以避免很多不必要的冗余的工作。

行之所向,莫问远方。

相关推荐
想七想八不如114089 小时前
数据库--样题复习
数据库·sql·oracle
551只玄猫9 小时前
【数据库原理 实验报告1】创建和管理数据库
数据库·sql·学习·mysql·课程设计·实验报告·数据库原理
551只玄猫12 小时前
【数据库原理 实验报告3】索引的创建以及数据更新
数据库·sql·课程设计·实验报告·操作系统原理
养生技术人14 小时前
Oracle OCP认证考试题目详解082系列第5题
运维·数据库·sql·oracle·开闭原则
551只玄猫20 小时前
【数据库原理 实验报告5】数据查询的应用(连接)
数据库·sql·mysql·课程设计·实验报告
Gauss松鼠会20 小时前
【GaussDB】GaussDB 重要内存参数设置
数据库·oracle·性能优化·database·gaussdb
被考核重击20 小时前
虚拟列表(动态高度,性能优化,骨架屏)
javascript·vue.js·性能优化·虚拟列表
桌面运维家20 小时前
Windows Hyper-V:VHD/VHDX磁盘性能优化指南
windows·性能优化
551只玄猫20 小时前
【数据库原理 实验报告2】创建和管理数据表
数据库·sql·mysql·课程设计·实验报告
weixin_3077791321 小时前
2025年中国研究生数学建模竞赛A题:通用神经网络处理器下的核内调度问题——解决方案与实现
开发语言·人工智能·python·数学建模·性能优化