Verifying whether YashanDB scalar subqueries have a cache

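Oracle's scalar subquery algorithm works as follows:

No matter how many rows the main table returns, the table inside the subquery is scanned once per distinct value of the join column, that is, COUNT(DISTINCT NVL(main-table join column, 0)) times. In other words, Oracle's scalar subquery has a cache.

Most Chinese domestic databases have not yet implemented a scalar subquery cache; their algorithm is still one scan of the subquery table per row returned by the main table.

The scalar subquery cache is very useful in extreme performance-tuning scenarios; I have used it to optimize several hundred SQL statements.

This summer (2025), while tuning SQL for a top-tier securities firm, I ran into a statement that took 3.7 seconds with about 700,000 logical reads and returned 4,294 rows after the GROUP BY. The requirement was to bring it under 1 second.

The SQL looked roughly like this (this is the securities industry: screenshots are watermarked and confidentiality rules apply, so I cannot show the full SQL or the execution plan; thank you for understanding).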

select ..., b.<one column>, d.<one column>
         from (select c_pa_code,
                       n_hldcst,
                       n_hldcst_locl,
                       c_ml_attr,
                       c_cury_code,
                       n_valrate,
                       c_sec_code,
                       c_port_code,
                       d_hold
                  from gzdb.vh_repurchase
                 where c_pa_code in
                       ('MRFSJRZC', 'MCHGJRZC', 'YSLX_ZQ', 'YFLX_ZQ', 'JZZB')) a
          left join gzdb.vb_port_baseinfo b --- b.c_port_code is unique
            on a.c_port_code = b.c_port_code
          left join gzdb.vb_security c
            on c.c_sec_code = a.c_sec_code
           and c.c_sec_var in ('CJ', 'HG')
          left join (select c_port_code,
                           d_biz,
                           n_hldmkv_locl,
                           row_number() over(partition by c_port_code, d_biz order by c_update_time desc) as r
                      from gzdb.vn_port_index
                     where c_idx_code = 'ZCJZ'
                       and c_port_class = 'NA') d
            on a.c_port_code = d.c_port_code
           and a.d_hold = d.d_biz
           and d.r = 1
         where a.d_hold = to_date(:endt, 'yyyy-mm-dd')) a
 group by ....

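After filtering, a returns 31,793 rows. The join-column predicates are pushed from a into d, so d is scanned 31,793 times. By the time a NL d completes, 3.5 seconds have accumulated, out of a total of 3.7 seconds for the whole statement.

a.c_port_code and a.d_hold contain many duplicate values. b and d each contribute only one column; b.c_port_code is unique, and d is partitioned by the join columns and then restricted to a single row.

The fix is to rewrite b and d as scalar subqueries (b is not the cause of the performance problem, so rewriting it is optional; d is what matters),

using the scalar subquery cache to cut the number of scans of d and of b.

I won't post the rewritten SQL. After the rewrite, logical reads dropped from about 700,000 to about 200,000 and the execution time fell below 1 second.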
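
Since the rewritten statement is not shown, here is a minimal sketch of the general shape such a rewrite could take. It is not the author's actual SQL. Assumptions: `c_port_name` is a hypothetical stand-in for the unnamed column taken from b, the column taken from d is assumed to be `n_hldmkv_locl`, and the join to c and the outer GROUP BY are left out for brevity.

```sql
-- Sketch only: b and d moved from LEFT JOINs into scalar subqueries.
select a.*,
       -- b.c_port_code is unique, so this returns at most one row.
       -- c_port_name is a hypothetical column name.
       (select b.c_port_name
          from gzdb.vb_port_baseinfo b
         where b.c_port_code = a.c_port_code) as b_col,
       -- Stands in for "row_number() ... r = 1": take the value from the row
       -- with the latest c_update_time for this (c_port_code, d_biz) pair.
       -- Assumes n_hldmkv_locl is the column wanted from d.
       (select max(d.n_hldmkv_locl) keep (dense_rank last order by d.c_update_time)
          from gzdb.vn_port_index d
         where d.c_idx_code = 'ZCJZ'
           and d.c_port_class = 'NA'
           and d.c_port_code = a.c_port_code
           and d.d_biz = a.d_hold) as d_col
  from (select c_pa_code, n_hldcst, n_hldcst_locl, c_ml_attr, c_cury_code,
               n_valrate, c_sec_code, c_port_code, d_hold
          from gzdb.vh_repurchase
         where c_pa_code in ('MRFSJRZC', 'MCHGJRZC', 'YSLX_ZQ', 'YFLX_ZQ', 'JZZB')
           and d_hold = to_date(:endt, 'yyyy-mm-dd')) a;
```

One difference to keep in mind: if several rows tie on the latest c_update_time, row_number() picks one of them arbitrarily, while this sketch returns the largest n_hldmkv_locl among the tied rows.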

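Notes:

1. If a.c_port_code and a.d_hold did not contain many duplicate values, rewriting to scalar subqueries would not improve performance.

2. If Oracle's scalar subquery had no cache, the only option would be to wrap d and b in functions and enable RESULT CACHE on those functions to reduce the number of scans, which is far more cumbersome.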
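
Note 1 can be checked before attempting the rewrite. The query below is a minimal sketch, reusing the table and filter from the statement above, that compares the number of driving rows with the number of distinct join-key combinations. The column aliases are my own naming.

```sql
-- driving_rows      : rows returned by a (31,793 in the case described above)
-- distinct_key_sets : distinct (c_port_code, d_hold) pairs, i.e. roughly how
--                     often a cached scalar subquery would really execute
select sum(cnt) as driving_rows,
       count(*) as distinct_key_sets
  from (select c_port_code, d_hold, count(*) as cnt
          from gzdb.vh_repurchase
         where c_pa_code in ('MRFSJRZC', 'MCHGJRZC', 'YSLX_ZQ', 'YFLX_ZQ', 'JZZB')
           and d_hold = to_date(:endt, 'yyyy-mm-dd')
         group by c_port_code, d_hold);
```

The further distinct_key_sets falls below driving_rows, the more the cache can save; if the two numbers are close, the rewrite is unlikely to help.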

For readers with less SQL tuning background, here is an example that demonstrates the scalar subquery cache on Oracle (11.2.0.4).

SQL> select *
  from (select owner, object_id
          from test02
         where owner = 'SYS'
         order by object_id)
 where rownum <= 3;

OWNER       OBJECT_ID
------------------------------ ----------
SYS         2
SYS         3
SYS         4

SQL> select *
  from (select owner, object_id
          from test02
         where owner = 'PUBLIC'
         order by object_id)
 where rownum <= 3;

OWNER       OBJECT_ID
------------------------------ ----------
PUBLIC              117
PUBLIC              280
PUBLIC              367

SQL> alter session set statistics_level=all;

Session altered.

SQL> select object_id, (select count(*) from test01 where owner = t2.owner) cnt 
  from test02 t2
 where object_id in (2, 3, 4, 117, 280, 367);

 OBJECT_ID        CNT
---------- ----------
         2   19348992
         3   19348992
         4   19348992
       117   17408512
       280   17408512
       367   17408512

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
SQL_ID  dcm994g6n6tq0, child number 3
-------------------------------------
select object_id, (select count(*) from test01 where owner = t2.owner)
cnt   from test02 t2  where object_id in (2, 3, 4, 117, 280, 367)

Plan hash value: 384367355

--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |      1 |        |      6 |00:00:00.01 |      15 |
|   1 |  SORT AGGREGATE              |               |      2 |      1 |      2 |00:00:05.60 |    1271K|
|*  2 |   TABLE ACCESS FULL          | TEST01        |      2 |    421K|     36M|00:00:04.52 |    1271K|
|   3 |  INLIST ITERATOR             |               |      1 |        |      6 |00:00:00.01 |      15 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TEST02        |      6 |      6 |      6 |00:00:00.01 |      15 |
|*  5 |    INDEX RANGE SCAN          | IDX_TEST02_ID |      6 |      6 |      6 |00:00:00.01 |       9 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"=:B1)
   5 - access(("OBJECT_ID"=2 OR "OBJECT_ID"=3 OR "OBJECT_ID"=4 OR "OBJECT_ID"=117 OR
        "OBJECT_ID"=280 OR "OBJECT_ID"=367))

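The main table TEST02 returns 6 rows after filtering and passes values through the correlation column OWNER to TEST01, the table inside the scalar subquery.

After filtering, the OWNER column of TEST02 has only 2 distinct values: SYS and PUBLIC.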
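
The expected scan count can be derived directly from the formula given at the start of the article. A minimal sketch against the same TEST02 table; the column aliases are my own, and '0' is used instead of 0 because OWNER is a character column:

```sql
-- main_rows               : rows the main table returns
-- expected_subquery_scans : COUNT(DISTINCT NVL(join column, '0')), the number
--                           of scans of TEST01 expected with a working cache
select count(*) as main_rows,
       count(distinct nvl(owner, '0')) as expected_subquery_scans
  from test02
 where object_id in (2, 3, 4, 117, 280, 367);
```

Given the outputs shown above, this should report 6 rows and 2 expected scans.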

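In the execution plan, Starts is the number of times an operation was executed. Id=2 has Starts=2, which shows that Oracle's scalar subquery has a cache; without it, TEST01 would have been scanned 6 times.

Of course the OWNER column of TEST01 ought to be indexed, as everyone knows; we deliberately skip that here.

Now switch the database to YashanDB 23.5.1.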

SQL> alter session set statistics_level=all;

Succeed.

SQL> set autot trace
SQL> select object_id, (select count(*) from test01 where owner = t2.owner) cnt
  from test02 t2
 where object_id in (2, 3, 4, 117, 280, 367);

Execution Plan                                                   
---------------------------------------------------------------- 
SQL hash value: 2923445856                                      
Optimizer: ADOPT_C                                              
                                                                
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type                 | Name                 | Owner      | E - Rows | A - Rows | Cost(%CPU)  | A - Time | Loops    | Memory   | Disk     | Partition info                 |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
|  0 | SELECT STATEMENT               |                      |            |          |         6|             |        90|         7|         0|         0|                                |
|  1 |  SUBQUERY                      | QUERY[1]             |            |          |         2|             |   7889087|         4|         0|         0|                                |
|  2 |   AGGREGATE                    |                      |            |         1|         2|  1151005( 0)|   7889079|         4|         0|         0|                                |
|* 3 |    TABLE ACCESS FULL           | TEST01               | SCOTT      |       446|  36757504|  1151005( 0)|   6349946|  36757506|         0|         0|                                |
|  4 |  TABLE ACCESS BY INDEX ROWID   | TEST02               | SCOTT      |         6|          |        1( 0)|          |          |          |          |                                |
|* 5 |   INDEX RANGE SCAN             | IDX_TEST02_ID        | SCOTT      |         6|         6|        1( 0)|        88|         7|         0|         0|                                |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
                                                                
Operation Information (identified by operation id):             
---------------------------------------------------             
                                                                
   1 - Subquery NDV info - NDV percentage: 0.500000, NDV Expression: ("T2"."OWNER"[OPTMZ-3])
   3 - Predicate : filter("TEST01"."OWNER"[OPTMZ-1] = "T2"."OWNER"[OPTMZ-3][OPTMZ-2])
   5 - Predicate : access("T2"."OBJECT_ID" IN (2[OPTMZ-0], 3[OPTMZ-0], 4[OPTMZ-0], 117[OPTMZ-0], 280[OPTMZ-0], 367[OPTMZ-0]))

Statistics
----------------------------------------------------------------------------------------------------
                    0 physical reads                                                  
              1299440 db block gets                                                   
                    0 consistent gets                                                 
                    0 redo size                                                       
                    0 recursive calls                                                 
                    0 bytes sent via SQL*Net to client                                
                    0 bytes received via SQL*Net from client                          
                    0 SQL*Net roundtrips to/from client                               
                    0 sorts (memory)                                                  
                    0 sorts (disk)                                                    
                    6 rows processed                                                  
                    0 bytes sent via PX                                               
                    0 block received                                                  

33 rows fetched.

Elapsed: 00:00:07.897

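YashanDB's A-Time does not appear to be formatted as hours:minutes:seconds the way Oracle's A-Time is; I hope this gets improved.

Id=3 shows Loops=36757506. What is going on? (With a cache, Loops for Id=3 should be 2; without one, it should be 6.) I hope this is changed to match Oracle exactly.

Since Loops is unreliable, how can we verify whether YashanDB's scalar subquery has a cache?

We can judge by the SQL execution time.

1. Create a function (note: I only use a function because Loops is unreliable; there is no other motive).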

CREATE OR REPLACE FUNCTION f_get_cnt_by_owner(p_owner varchar) RETURN int AS
  v_cnt int;
BEGIN
  select count(*) into v_cnt from test01 where owner = p_owner;
  RETURN v_cnt;
END;
/

2. Set statistics_level=TYPICAL, restoring it from the ALL that was set earlier.

alter session set statistics_level=TYPICAL; 

Call the function once: 2.7 seconds.

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2);

  OBJECT_ID F_GET_CNT_BY_OWNER(OWNER) 
----------- ------------------------- 
          2                  19348992

1 row fetched.

Elapsed: 00:00:02.707

Call the function 6 times: 5.7 seconds. Wait, what? This should take 2.7*6=16.2 seconds, so why 5.7? 5.7 seconds is roughly the cost of only 2 scans (2.7*2=5.4).

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2,3,4,117,280,367);

  OBJECT_ID F_GET_CNT_BY_OWNER(OWNER) 
----------- ------------------------- 
          2                  19348992
          3                  19348992
          4                  19348992
        117                  17408512
        280                  17408512
        367                  17408512

6 rows fetched.

Elapsed: 00:00:05.731

Look at the execution plan.

SQL> set autot trace
SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2,3,4,117,280,367);

Execution Plan                                                   
---------------------------------------------------------------- 
SQL hash value: 1080230727                                      
Optimizer: ADOPT_C                                              
                                                                
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type                 | Name                 | Owner      | E - Rows | A - Rows | Cost(%CPU)  | A - Time | Loops    | Memory   | Disk     | Partition info                 |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
|  0 | SELECT STATEMENT               |                      |            |          |          |             |          |          |          |          |                                |
|  1 |  SUBQUERY                      | QUERY[1]             |            |          |          |             |          |          |          |          |                                |
|  2 |   AGGREGATE                    |                      |            |         1|          |  1151005( 0)|          |          |          |          |                                |
|* 3 |    TABLE ACCESS FULL           | TEST01               | SCOTT      |       446|          |  1151005( 0)|          |          |          |          |                                |
|  4 |  TABLE ACCESS BY INDEX ROWID   | TEST02               | SCOTT      |         6|          |        1( 0)|          |          |          |          |                                |
|* 5 |   INDEX RANGE SCAN             | IDX_TEST02_ID        | SCOTT      |         6|          |        1( 0)|          |          |          |          |                                |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
                                                                
Operation Information (identified by operation id):             
---------------------------------------------------             
                                                                
   1 - Subquery NDV info - NDV percentage: 0.500000, NDV Expression: ("TEST02"."OWNER")
   3 - Predicate : filter("TEST01"."OWNER" = "TEST02"."OWNER")  
   5 - Predicate : access("TEST02"."OBJECT_ID" IN (2, 3, 4, 117, 280, 367))


Statistics
----------------------------------------------------------------------------------------------------

20 rows fetched.

Elapsed: 00:00:05.612

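It turns out that YashanDB converts the user-defined function call in the SELECT list directly into a scalar subquery, which also proves that YashanDB's scalar subquery has a cache.

In Oracle, a user-defined function called in the SELECT list has to be rewritten as SELECT (SELECT function FROM DUAL) FROM ...

or RESULT CACHE has to be enabled on the function, in order to reduce the number of function calls.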

SQL> select object_id,(select f_get_cnt_by_owner(owner) from dual) from test02 where object_id in(2,3,4,117,280,367);

  OBJECT_ID (SELECTF_GET_CNT_BY_OWNER(OWNER)FROMDUAL) 
----------- ----------------------------------------- 
          2                                  19348992
          3                                  19348992
          4                                  19348992
        117                                  17408512
        280                                  17408512
        367                                  17408512

6 rows fetched.

Elapsed: 00:00:05.803

Now modify the function slightly.

CREATE OR REPLACE FUNCTION f_get_cnt_by_owner(p_owner varchar) RETURN int AS
  v_cnt int;
BEGIN
  IF 1=1 THEN  --- added
  select count(*) into v_cnt from test01 where owner = p_owner;
  END IF;      --- added
  RETURN v_cnt;
END;
/

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2,3,4,117,280,367);

  OBJECT_ID F_GET_CNT_BY_OWNER(OWNER) 
----------- ------------------------- 
          2                  19348992
          3                  19348992
          4                  19348992
        117                  17408512
        280                  17408512
        367                  17408512

6 rows fetched.

Elapsed: 00:00:15.228

The statement takes 15.2 seconds. 2.7*6=16.2, which is close to 15.2, so the function was called 6 times.

SQL> select object_id,(select f_get_cnt_by_owner(owner) from dual) from test02 where object_id in(2,3,4,117,280,367);

  OBJECT_ID (SELECTF_GET_CNT_BY_OWNER(OWNER)FROMDUAL) 
----------- ----------------------------------------- 
          2                                  19348992
          3                                  19348992
          4                                  19348992
        117                                  17408512
        280                                  17408512
        367                                  17408512

6 rows fetched.

Elapsed: 00:00:09.910

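Rewriting the function call as SELECT (SELECT function FROM DUAL) FROM ... lowers the elapsed time from 15.2 seconds to 9.9 seconds.

The 9.9 seconds puzzles me: the time should fall between 2.7*2=5.4 and about 6 seconds, so why 9.9? I hope YashanDB optimizes this case in a later release.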
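
The timing argument can be made explicit by dividing each elapsed time on YashanDB by the single-call time of 2.707 seconds. This is only a rough estimate: it assumes every execution of the function costs about the same as the single call measured earlier. The column aliases are my own.

```sql
-- Estimated number of function executions = elapsed time / single-call time.
select round(5.731 / 2.707, 2)  as pure_sql_function,   -- direct call, original function
       round(15.228 / 2.707, 2) as if_wrapped_function, -- direct call, IF 1=1 version
       round(9.910 / 2.707, 2)  as if_wrapped_via_dual  -- (select f(...) from dual) form
  from dual;
```

The ratios come out at about 2.12, 5.63 and 3.66. The first two sit close to the 2 and 6 executions argued for above, while the third lands between them, which is the anomaly in question.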

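The tests above show that when a user-defined function called in the SELECT list is pure SQL, YashanDB automatically converts it into a scalar subquery.

If the function is not pure SQL, it cannot be converted, and the call still has to be rewritten by hand as SELECT (SELECT function FROM DUAL) FROM ...

Now switch back to Oracle 11.2.0.4.

1. Create the function, without the IF 1=1.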

CREATE OR REPLACE FUNCTION f_get_cnt_by_owner(p_owner varchar) RETURN int AS
  v_cnt int;
BEGIN
  select count(*) into v_cnt from test01 where owner = p_owner;
  RETURN v_cnt;
END;
/

2. Set statistics_level=TYPICAL, restoring it from the ALL that was set earlier.

alter session set statistics_level=TYPICAL; 

Call the function once: 1.25 seconds.

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2);

 OBJECT_ID F_GET_CNT_BY_OWNER(OWNER)
---------- -------------------------
         2                  19348992

Elapsed: 00:00:01.25

Call the function 6 times: 7.75 seconds.

SQL> select object_id,f_get_cnt_by_owner(owner) from test02 where object_id in(2,3,4,117,280,367);

 OBJECT_ID F_GET_CNT_BY_OWNER(OWNER)
---------- -------------------------
   2        19348992
   3        19348992
   4        19348992
 117        17408512
 280        17408512
 367        17408512

6 rows selected.

Elapsed: 00:00:07.75

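Moving the function call into a scalar subquery lowers the elapsed time from 7.75 seconds to 2.54 seconds, which means the function was called only 2 times.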

SQL> select object_id,(select f_get_cnt_by_owner(owner) from dual) from test02 where object_id in(2,3,4,117,280,367);

 OBJECT_ID (SELECTF_GET_CNT_BY_OWNER(OWNER)FROMDUAL)
---------- -----------------------------------------
   2            19348992
   3            19348992
   4            19348992
 117            17408512
 280            17408512
 367            17408512

6 rows selected.

Elapsed: 00:00:02.54
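
For completeness, the RESULT CACHE alternative mentioned earlier could look roughly like the sketch below on Oracle. It was not part of the tests above, so no timings are claimed. Assumptions: the name `f_get_cnt_by_owner_rc` is hypothetical, and the PL/SQL function result cache must be available and enabled on the instance (an Enterprise Edition feature, with RESULT_CACHE_MAX_SIZE greater than 0).

```sql
-- Hypothetical result-cached variant of f_get_cnt_by_owner.
-- With RESULT_CACHE, Oracle can reuse the stored result for an owner value
-- that was already computed instead of running the body again.
CREATE OR REPLACE FUNCTION f_get_cnt_by_owner_rc(p_owner VARCHAR2)
  RETURN NUMBER
  RESULT_CACHE
AS
  v_cnt NUMBER;
BEGIN
  SELECT COUNT(*) INTO v_cnt FROM test01 WHERE owner = p_owner;
  RETURN v_cnt;
END;
/

-- Usage: called directly in the SELECT list, no scalar subquery wrapper.
select object_id, f_get_cnt_by_owner_rc(owner)
  from test02
 where object_id in (2, 3, 4, 117, 280, 367);
```

Unlike the scalar subquery cache, which lives only for the duration of one statement execution, the function result cache is shared across sessions and is invalidated when TEST01 changes.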

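Conclusions:

1. YashanDB's scalar subquery has a cache, which deserves praise; many Chinese domestic databases lack this feature.

2. When a user-defined function called in the SELECT list is pure SQL, the YashanDB optimizer automatically rewrites it as a scalar subquery; when it is not pure SQL, it still has to be rewritten by hand into scalar subquery form, just as in Oracle.

3. I hope YashanDB fixes the A-Time and Loops flaws soon, ideally making Loops behave like Oracle's Starts.

4. In Oracle, the FILTER algorithm is exactly the same as the scalar subquery algorithm. This article does not cover the FILTER cache, because the YashanDB optimizer's FILTER behavior differs greatly from Oracle's and needs an article of its own.

5. SELECT (SELECT non-pure-SQL function FROM DUAL) FROM ... on YashanDB may need further optimization in the kernel.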
