Verifying Whether YashanDB (崖山) Scalar Subqueries Have a Cache

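Oracle evaluates a scalar subquery as follows:

However many rows the driving table returns, the subquery's table is scanned only about COUNT(DISTINCT NVL(driving-table join column, 0)) times. In other words, Oracle caches scalar subquery results per distinct input value. (The cache is a hash table of limited size, so with very many distinct values or hash collisions the subquery can run somewhat more often than that.)

Most Chinese domestic databases have not implemented scalar subquery caching yet; their algorithm is still one scan of the subquery's table per row returned by the driving table.

Scalar subquery caching is very useful in extreme performance tuning, and I have used it to optimize several hundred SQL statements.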

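This summer (2025), while tuning SQL for a leading securities firm, I ran into a statement that took 3.7 seconds and 700,000 logical reads, and returned 4,294 rows after the GROUP BY. The requirement was to get it under 1 second.

The SQL looked roughly like this (this is the securities industry, so screenshots are watermarked and confidentiality rules apply; I cannot show the full SQL or the execution plan, thanks for understanding):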

select ...,b.one_column,d.one_column
         from (select c_pa_code,
                       n_hldcst,
                       n_hldcst_locl,
                       c_ml_attr,
                       c_cury_code,
                       n_valrate,
                       c_sec_code,
                       c_port_code,
                       d_hold
                  from gzdb.vh_repurchase
                 where c_pa_code in
                       ('MRFSJRZC', 'MCHGJRZC', 'YSLX_ZQ', 'YFLX_ZQ', 'JZZB')) a
          left join gzdb.vb_port_baseinfo b -- b.c_port_code is unique
            on a.c_port_code = b.c_port_code
          left join gzdb.vb_security c
            on c.c_sec_code = a.c_sec_code
           and c.c_sec_var in ('CJ', 'HG')
          left join (select c_port_code,
                           d_biz,
                           n_hldmkv_locl,
                           row_number() over(partition by c_port_code, d_biz order by c_update_time desc) as r
                      from gzdb.vn_port_index
                     where c_idx_code = 'ZCJZ'
                       and c_port_class = 'NA') d
            on a.c_port_code = d.c_port_code
           and a.d_hold = d.d_biz
           and d.r = 1
         where a.d_hold = to_date(:endt, 'yyyy-mm-dd')) a
 group by ....

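After filtering, a returns 31,793 rows. The join-column predicates from a are pushed into d, so d is scanned 31,793 times. By the end of the nested loop join from a to d the cumulative time is 3.5 seconds; note that the whole statement takes 3.7 seconds.

a.c_port_code and a.d_hold contain many duplicate values, b and d each contribute only one column, b.c_port_code is unique, and d is partitioned by the join columns and then restricted to a single row.

The fix is to rewrite b and d as scalar subqueries, so that the scalar subquery cache reduces the number of scans of d and of b. (b is not the cause of the performance problem, so it can be rewritten or left alone; d is the one that matters.)

I won't paste the actual rewritten SQL. After the rewrite, logical reads dropped from 700,000 to 200,000 and the execution time fell below 1 second.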
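Purely to illustrate the shape of such a rewrite (this is not the production SQL), here is a minimal sketch. It makes several assumptions: b.some_column is a made-up column name, n_hldmkv_locl is assumed to be the one column taken from d, the join to c and the GROUP BY are left out, and row_number() ... r = 1 is replaced with MAX ... KEEP (DENSE_RANK FIRST), which picks the largest value when several rows tie on c_update_time, whereas row_number() picks an arbitrary one.

```sql
select a.c_port_code,
       a.d_hold,
       -- b: c_port_code is unique, so at most one row comes back
       (select b.some_column               -- hypothetical column name
          from gzdb.vb_port_baseinfo b
         where b.c_port_code = a.c_port_code) as b_col,
       -- d: value from the most recently updated row per (c_port_code, d_biz);
       -- an aggregate without GROUP BY returns NULL when nothing matches,
       -- which mirrors the NULLs produced by the original left join
       (select max(d.n_hldmkv_locl)
                 keep (dense_rank first order by d.c_update_time desc)
          from gzdb.vn_port_index d
         where d.c_idx_code   = 'ZCJZ'
           and d.c_port_class = 'NA'
           and d.c_port_code  = a.c_port_code
           and d.d_biz        = a.d_hold) as d_col
  from gzdb.vh_repurchase a
 where a.c_pa_code in ('MRFSJRZC', 'MCHGJRZC', 'YSLX_ZQ', 'YFLX_ZQ', 'JZZB')
   and a.d_hold = to_date(:endt, 'yyyy-mm-dd');
```

Each subquery is correlated only one level deep, which keeps the sketch valid on 11g, where a correlated reference cannot reach into a nested inline view.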

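Notes:

1. If a.c_port_code and a.d_hold did not contain many duplicate values, rewriting to scalar subqueries would not improve performance.

2. If Oracle scalar subqueries had no cache, the only option would be to wrap d and b in functions and enable RESULT CACHE on those functions to reduce the number of scans, which is far more trouble.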
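Before attempting such a rewrite, one way to check the first note is to compare the number of driving rows with the number of distinct correlation keys. The query below is a sketch against the table and filters shown in the SQL above; the column aliases are my own.

```sql
-- driving_rows : rows that a passes to the subqueries
-- distinct_keys: distinct (c_port_code, d_hold) pairs, i.e. roughly how many
--                times a cached scalar subquery would actually have to run
select sum(cnt) as driving_rows,
       count(*) as distinct_keys
  from (select c_port_code, d_hold, count(*) as cnt
          from gzdb.vh_repurchase
         where c_pa_code in ('MRFSJRZC', 'MCHGJRZC', 'YSLX_ZQ', 'YFLX_ZQ', 'JZZB')
           and d_hold = to_date(:endt, 'yyyy-mm-dd')
         group by c_port_code, d_hold);
```

If distinct_keys is close to driving_rows, the cache has little to work with and the rewrite is unlikely to help.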

For readers with less SQL tuning background, here is an example that demonstrates the scalar subquery cache in Oracle (11.2.0.4):

SQL> select *
  from (select owner, object_id
          from test02
         where owner = 'SYS'
         order by object_id)
 where rownum <= 3;

OWNER       OBJECT_ID
------------------------------ ----------
SYS         2
SYS         3
SYS         4

SQL> select *
  from (select owner, object_id
          from test02
         where owner = 'PUBLIC'
         order by object_id)
 where rownum <= 3;

OWNER       OBJECT_ID
------------------------------ ----------
PUBLIC              117
PUBLIC              280
PUBLIC              367

SQL> alter session set statistics_level=all;

Session altered.

SQL> select object_id, (select count(*) from test01 where owner = t2.owner) cnt 
  from test02 t2
 where object_id in (2, 3, 4, 117, 280, 367);

 OBJECT_ID	  CNT
---------- ----------
   2   19348992
   3   19348992
   4   19348992
 117   17408512
 280   17408512
 367   17408512

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
SQL_ID  dcm994g6n6tq0, child number 3
-------------------------------------
select object_id, (select count(*) from test01 where owner = t2.owner)
cnt   from test02 t2  where object_id in (2, 3, 4, 117, 280, 367)

Plan hash value: 384367355

--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |      6 |00:00:00.01 |      15 |
|   1 |  SORT AGGREGATE              |               |      2 |      1 |      2 |00:00:05.60 |    1271K|
|*  2 |   TABLE ACCESS FULL          | TEST01        |      2 |    421K|     36M|00:00:04.52 |    1271K|
|   3 |  INLIST ITERATOR             |               |      1 |        |      6 |00:00:00.01 |      15 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TEST02        |      6 |      6 |      6 |00:00:00.01 |      15 |
|*  5 |    INDEX RANGE SCAN          | IDX_TEST02_ID |      6 |      6 |      6 |00:00:00.01 |       9 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"=:B1)
   5 - access(("OBJECT_ID"=2 OR "OBJECT_ID"=3 OR "OBJECT_ID"=4 OR "OBJECT_ID"=117 OR
        "OBJECT_ID"=280 OR "OBJECT_ID"=367))

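The driving table TEST02 returns 6 rows after filtering and passes each row's OWNER value, the correlation column, to the scalar subquery on TEST01.

After filtering, the OWNER column of TEST02 has only 2 distinct values: SYS and PUBLIC.

In the execution plan, Starts is the number of times an operation was executed. Starts for Id=2 is 2, which shows that Oracle caches scalar subquery results; without the cache, TEST01 would have been scanned 6 times.

Of course, the OWNER column of TEST01 ought to be indexed, as everyone on Earth knows; we deliberately leave it unindexed here.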
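The Starts value of 2 can be cross-checked against the COUNT(DISTINCT ...) rule from the beginning of the article with a small query on the driving table. This is only a sketch; the aliases are my own, and NULL is counted as one extra key explicitly instead of through NVL so that it cannot be confused with a real value.

```sql
-- driving_rows : rows TEST02 feeds into the scalar subquery
-- distinct_keys: distinct OWNER values among them, NULL counted as one key
select count(*) as driving_rows,
       count(distinct owner)
         + max(case when owner is null then 1 else 0 end) as distinct_keys
  from test02
 where object_id in (2, 3, 4, 117, 280, 367);
```

On the data shown above this should come out as 6 driving rows and 2 distinct keys, matching Starts=2 for the full scan of TEST01.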

Now switch the database to YashanDB 23.5.1:

SQL> alter session set statistics_level=all;

Succeed.

SQL> set autot trace
SQL> select object_id, (select count(*) from test01 where owner = t2.owner) cnt
  from test02 t2
 where object_id in (2, 3, 4, 117, 280, 367);

Execution Plan                                                   
---------------------------------------------------------------- 
SQL hash value: 2923445856                                      
Optimizer: ADOPT_C                                              
                                                                
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type                 | Name                 | Owner      | E - Rows | A - Rows | Cost(%CPU)  | A - Time | Loops    | Memory   | Disk     | Partition info                 |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
|  0 | SELECT STATEMENT               |                      |            |          |         6|             |        90|         7|         0|         0|                                |
|  1 |  SUBQUERY                      | QUERY[1]             |            |          |         2|             |   7889087|         4|         0|         0|                                |
|  2 |   AGGREGATE                    |                      |            |         1|         2|  1151005( 0)|   7889079|         4|         0|         0|                                |
|* 3 |    TABLE ACCESS FULL           | TEST01               | SCOTT      |       446|  36757504|  1151005( 0)|   6349946|  36757506|         0|         0|                                |
|  4 |  TABLE ACCESS BY INDEX ROWID   | TEST02               | SCOTT      |         6|          |        1( 0)|          |          |          |          |                                |
|* 5 |   INDEX RANGE SCAN             | IDX_TEST02_ID        | SCOTT      |         6|         6|        1( 0)|        88|         7|         0|         0|                                |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
                                                                
Operation Information (identified by operation id):             
---------------------------------------------------             
                                                                
   1 - Subquery NDV info - NDV percentage: 0.500000, NDV Expression: ("T2"."OWNER"[OPTMZ-3])
   3 - Predicate : filter("TEST01"."OWNER"[OPTMZ-1] = "T2"."OWNER"[OPTMZ-3][OPTMZ-2])
   5 - Predicate : access("T2"."OBJECT_ID" IN (2[OPTMZ-0], 3[OPTMZ-0], 4[OPTMZ-0], 117[OPTMZ-0], 280[OPTMZ-0], 367[OPTMZ-0]))

Statistics
----------------------------------------------------------------------------------------------------
                    0 physical reads                                                  
              1299440 db block gets                                                   
                    0 consistent gets                                                 
                    0 redo size                                                       
                    0 recursive calls                                                 
                    0 bytes sent via SQL*Net to client                                
                    0 bytes received via SQL*Net from client                          
                    0 SQL*Net roundtrips to/from client                               
                    0 sorts (memory)                                                  
                    0 sorts (disk)                                                    
                    6 rows processed                                                  
                    0 bytes sent via PX                                               
                    0 block received                                                  

33 rows fetched.

Elapsed: 00:00:07.897

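YashanDB's A-Time is not formatted as hours:minutes:seconds the way Oracle's A-Time is; judging by 7889087 against an elapsed time of 7.897 seconds, the unit appears to be microseconds. I hope this gets improved.

Id=3 shows Loops=36757506. What is going on here? (With a cache, Loops for Id=3 should be 2; without one it should be 6.) The value is A-Rows plus 2, so Loops seems to count row fetches rather than operator starts. I hope this is changed to behave exactly like Oracle.

Since Loops cannot be trusted, how do we verify whether YashanDB scalar subqueries are cached?

We can judge from the SQL execution time.

1. Create a function (note: I only resort to a function because Loops is inaccurate, nothing more):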

CREATE OR REPLACE FUNCTION f_get_cnt_by_owner(p_owner varchar) RETURN int AS
  v_cnt int;
BEGIN
  select count(*) into v_cnt from test01 where owner = p_owner;
  RETURN v_cnt;
END;
/

2. Set statistics_level=TYPICAL; it was set to ALL earlier, so restore it:

alter session set statistics_level=TYPICAL; 

One function call takes 2.7 seconds:

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2);

  OBJECT_ID F_GET_CNT_BY_OWNER(OWNER) 
----------- ------------------------- 
          2                  19348992

1 row fetched.

Elapsed: 00:00:02.707

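Six function calls take 5.7 seconds. Huh? What happened? With 6 scans this should take 2.7*6=16.2 seconds. 5.7 seconds is roughly 2.7*2=5.4, which means only 2 scans: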

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2,3,4,117,280,367);

  OBJECT_ID F_GET_CNT_BY_OWNER(OWNER) 
----------- ------------------------- 
          2                  19348992
          3                  19348992
          4                  19348992
        117                  17408512
        280                  17408512
        367                  17408512

6 rows fetched.

Elapsed: 00:00:05.731

Let's look at the execution plan:

SQL> set autot trace
SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2,3,4,117,280,367);

Execution Plan                                                   
---------------------------------------------------------------- 
SQL hash value: 1080230727                                      
Optimizer: ADOPT_C                                              
                                                                
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type                 | Name                 | Owner      | E - Rows | A - Rows | Cost(%CPU)  | A - Time | Loops    | Memory   | Disk     | Partition info                 |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
|  0 | SELECT STATEMENT               |                      |            |          |          |             |          |          |          |          |                                |
|  1 |  SUBQUERY                      | QUERY[1]             |            |          |          |             |          |          |          |          |                                |
|  2 |   AGGREGATE                    |                      |            |         1|          |  1151005( 0)|          |          |          |          |                                |
|* 3 |    TABLE ACCESS FULL           | TEST01               | SCOTT      |       446|          |  1151005( 0)|          |          |          |          |                                |
|  4 |  TABLE ACCESS BY INDEX ROWID   | TEST02               | SCOTT      |         6|          |        1( 0)|          |          |          |          |                                |
|* 5 |   INDEX RANGE SCAN             | IDX_TEST02_ID        | SCOTT      |         6|          |        1( 0)|          |          |          |          |                                |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
                                                                
Operation Information (identified by operation id):             
---------------------------------------------------             
                                                                
   1 - Subquery NDV info - NDV percentage: 0.500000, NDV Expression: ("TEST02"."OWNER")
   3 - Predicate : filter("TEST01"."OWNER" = "TEST02"."OWNER")  
   5 - Predicate : access("TEST02"."OBJECT_ID" IN (2, 3, 4, 117, 280, 367))


Statistics
----------------------------------------------------------------------------------------------------

20 rows fetched.

Elapsed: 00:00:05.612

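So YashanDB converts a user-defined function called in the SELECT list directly into a scalar subquery, which also proves that YashanDB scalar subqueries are cached.

In Oracle, a user-defined function called in the SELECT list has to be rewritten as SELECT (SELECT user_function FROM DUAL) FROM ...,

or RESULT CACHE has to be enabled on the function, before the number of function calls goes down.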

SQL> select object_id,(select f_get_cnt_by_owner(owner) from dual) from test02 where object_id in(2,3,4,117,280,367);

  OBJECT_ID (SELECTF_GET_CNT_BY_OWNER(OWNER)FROMDUAL) 
----------- ----------------------------------------- 
          2                                  19348992
          3                                  19348992
          4                                  19348992
        117                                  17408512
        280                                  17408512
        367                                  17408512

6 rows fetched.

Elapsed: 00:00:05.803
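For the RESULT CACHE alternative mentioned above, a minimal Oracle sketch for the test function might look like the following. The name f_get_cnt_by_owner_rc is made up for illustration, and the sketch assumes an Oracle edition and configuration in which the server result cache is available and enabled (result_cache_max_size greater than 0); I have not tried an equivalent on YashanDB.

```sql
-- Oracle 11.2 syntax: RESULT_CACHE goes between the RETURN type and AS/IS.
-- Results are cached per distinct p_owner value and shared across sessions;
-- Oracle invalidates them automatically when TEST01 changes.
CREATE OR REPLACE FUNCTION f_get_cnt_by_owner_rc(p_owner varchar2)
  RETURN number
  RESULT_CACHE
AS
  v_cnt number;
BEGIN
  select count(*) into v_cnt from test01 where owner = p_owner;
  RETURN v_cnt;
END;
/

-- Called directly in the SELECT list, without the FROM DUAL wrapper
select object_id, f_get_cnt_by_owner_rc(owner)
  from test02
 where object_id in (2, 3, 4, 117, 280, 367);
```

Unlike the scalar subquery cache, which lives only for the duration of one statement execution, the result cache persists between executions, so a second run may not scan TEST01 at all.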

Now modify the function slightly:

CREATE OR REPLACE FUNCTION f_get_cnt_by_owner(p_owner varchar) RETURN int AS
  v_cnt int;
BEGIN
  IF 1=1 THEN  -- added
  select count(*) into v_cnt from test01 where owner = p_owner;
  END IF;      -- added
  RETURN v_cnt;
END;
/

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2,3,4,117,280,367);

  OBJECT_ID F_GET_CNT_BY_OWNER(OWNER) 
----------- ------------------------- 
          2                  19348992
          3                  19348992
          4                  19348992
        117                  17408512
        280                  17408512
        367                  17408512

6 rows fetched.

Elapsed: 00:00:15.228

The statement takes 15.2 seconds. 2.7*6=16.2 is close to 15.2, so the function was called 6 times.

SQL> select object_id,(select f_get_cnt_by_owner(owner) from dual) from test02 where object_id in(2,3,4,117,280,367);

  OBJECT_ID (SELECTF_GET_CNT_BY_OWNER(OWNER)FROMDUAL) 
----------- ----------------------------------------- 
          2                                  19348992
          3                                  19348992
          4                                  19348992
        117                                  17408512
        280                                  17408512
        367                                  17408512

6 rows fetched.

Elapsed: 00:00:09.910

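Rewriting the function call in the SELECT list as SELECT (SELECT user_function FROM DUAL) FROM ... cuts the time from 15.2 seconds to 9.9 seconds.

I have a question about the 9.9 seconds: with 2 calls the time should be somewhere between 2.7*2=5.4 and 6 seconds, so why 9.9? I hope YashanDB optimizes this scenario later.

The tests above show that when a user-defined function called in the SELECT list is pure SQL, YashanDB automatically converts it into a scalar subquery.

When the function is not pure SQL it cannot be converted, and the call still has to be rewritten by hand as SELECT (SELECT user_function FROM DUAL) FROM ...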

Now switch back to Oracle 11.2.0.4.

1. Create the function, without the IF 1=1:

CREATE OR REPLACE FUNCTION f_get_cnt_by_owner(p_owner varchar) RETURN int AS
  v_cnt int;
BEGIN
  select count(*) into v_cnt from test01 where owner = p_owner;
  RETURN v_cnt;
END;
/

2. Set statistics_level=TYPICAL; it was set to ALL earlier, so restore it:

alter session set statistics_level=TYPICAL; 

One function call takes 1.25 seconds:

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2);

 OBJECT_ID F_GET_CNT_BY_OWNER(OWNER)
---------- -------------------------
	 2		    19348992

Elapsed: 00:00:01.25

Six function calls take 7.75 seconds (close to 1.25*6=7.5):

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2,3,4,117,280,367);

 OBJECT_ID F_GET_CNT_BY_OWNER(OWNER)
---------- -------------------------
   2        19348992
   3        19348992
   4        19348992
 117        17408512
 280        17408512
 367        17408512

6 rows selected.

Elapsed: 00:00:07.75

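Moving the function call into a scalar subquery cuts the time from 7.75 seconds to 2.54 seconds (about 1.25*2=2.5), which means the function was called only 2 times: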

SQL> select object_id,(select f_get_cnt_by_owner(owner) from dual) from test02 where object_id in(2,3,4,117,280,367);

 OBJECT_ID (SELECTF_GET_CNT_BY_OWNER(OWNER)FROMDUAL)
---------- -----------------------------------------
   2            19348992
   3            19348992
   4            19348992
 117            17408512
 280            17408512
 367            17408512

6 rows selected.

Elapsed: 00:00:02.54
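The call count here is inferred from elapsed time. On Oracle it can also be counted directly with a package variable, sketched below. The package pkg_call_counter and the function f_get_cnt_by_owner_counted are made-up names for illustration. I would not reuse this on YashanDB without checking, because the extra assignment makes the function no longer pure SQL, which by the tests above changes how YashanDB treats it.

```sql
-- A spec-only package holding a session-level counter
CREATE OR REPLACE PACKAGE pkg_call_counter AS
  g_calls PLS_INTEGER := 0;
END pkg_call_counter;
/

-- Same logic as f_get_cnt_by_owner, plus one increment per call
CREATE OR REPLACE FUNCTION f_get_cnt_by_owner_counted(p_owner varchar2) RETURN int AS
  v_cnt int;
BEGIN
  pkg_call_counter.g_calls := pkg_call_counter.g_calls + 1;
  select count(*) into v_cnt from test01 where owner = p_owner;
  RETURN v_cnt;
END;
/

-- Reset the counter, run the query, then print how often the function ran
BEGIN
  pkg_call_counter.g_calls := 0;
END;
/

select object_id, (select f_get_cnt_by_owner_counted(owner) from dual) cnt
  from test02
 where object_id in (2, 3, 4, 117, 280, 367);

set serveroutput on
BEGIN
  dbms_output.put_line('function calls: ' || pkg_call_counter.g_calls);
END;
/
```

If the scalar subquery cache works as the timings suggest, the counter should read 2 for the wrapped call and 6 when the function is called directly in the SELECT list.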

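Conclusions:

1. YashanDB scalar subqueries are cached, which deserves praise; many Chinese domestic databases lack this feature.

2. When a user-defined function called in the SELECT list is pure SQL, the YashanDB optimizer automatically rewrites it as a scalar subquery. When it is not pure SQL, it has to be rewritten by hand into the scalar subquery form, just as in Oracle.

3. I hope YashanDB fixes the A-Time and Loops flaws soon, ideally by making Loops behave like Oracle's Starts.

4. In Oracle the FILTER algorithm is exactly the same as the scalar subquery algorithm. This article does not discuss FILTER caching, because the YashanDB optimizer handles FILTER very differently from Oracle and that needs a separate article.

5. SELECT (SELECT non-pure-SQL user function FROM DUAL) FROM ... on YashanDB probably needs some more optimization in the kernel.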
