记录一次 postgresql 优化案例( volatility 自定义函数无法并行查询 )

同事最近做个金融适配项目,找我看条SQL,告知ORACLE跑1分钟,PG要跑30分钟(其实并没有这么夸张)😎, 废话不说,贴慢SQL。

慢SQL(关键信息已经加密)

复制代码
explain analyze
SELECT 
    c_qxxxxaode,
    '2023-03-22 00:00:00' AS d_cdate,
    SUM(CASE WHEN l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60) THEN 1 ELSE 0 END) AS bt5ycusts,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36) THEN 1 ELSE 0 END) AS bt3to5ycusts,
    SUM(CASE WHEN l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36) THEN 1 ELSE 0 END) AS bt3ycusts,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12) THEN 1 ELSE 0 END) AS bt1yto3ycusts,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6) THEN 1 ELSE 0 END) AS bt6to12mcusts,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3) THEN 1 ELSE 0 END) AS bt3to6mcusts,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3) THEN 1 ELSE 0 END) AS btlose3mcusts,
    SUM(CASE WHEN l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60) THEN f_qwwvddvvzz ELSE 0 END) AS bt5yshares,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36) THEN f_qwwvddvvzz ELSE 0 END) AS bt3to5yshares,
    SUM(CASE WHEN l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36) THEN f_qwwvddvvzz ELSE 0 END) AS bt3yshares,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12) THEN f_qwwvddvvzz ELSE 0 END) AS bt1to3yshares,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6) THEN f_qwwvddvvzz ELSE 0 END) AS bt6to12mshares,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3) THEN f_qwwvddvvzz ELSE 0 END) AS bt3to6mshares,
    SUM(CASE WHEN l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3) THEN f_qwwvddvvzz ELSE 0 END) AS btlose3mshares,
    round(AVG(months_between('2023-03-22 00:00:00', l.f_qdqdqdq)), 2) AS avgmonth,
    round(AVG(nvl(f_qwwvddvvzz, 0)), 2) AS avgshares,
    COUNT(a_daccoxxz) AS custsum,
    SUM(f_qwwvddvvzz) AS sharessum
FROM 
    stcdlbxxxxx l
WHERE 
    nvl(l.f_qwwvddvvzz, 0) > 0 AND 
    l.f_qdqdqdq <= '2023-03-22 00:00:00' AND 
    l.a_daccoxxz <> '996000000000' AND 
    c_qxxxxaode IN (SELECT c_qxxxxaode FROM  xdsfffs WHERE c_raisetype = '1')
GROUP BY 
    c_qxxxxaode;

执行计划:

复制代码
HashAggregate  (cost=1043326.49..1043332.91 rows=151 width=424) (actual time=381246.429..381246.640 rows=150 loops=1)
  Group Key: l.c_qxxxxaode
  ->  Hash Semi Join  (cost=8.78..936347.95 rows=301348 width=38) (actual time=0.057..30237.230 rows=30056793 loops=1)
        Hash Cond: (l.c_qxxxxaode = tfundinfo.c_qxxxxaode)
        ->  Seq Scan on stcdlbxxxxx l  (cost=0.00..906618.70 rows=10044941 width=38) (actual time=0.008..25908.814 rows=30157190 loops=1)
"              Filter: ((NVL(f_qwwvddvvzz, '0'::numeric) > '0'::numeric) AND (f_qdqdqdq <= '2023-03-22 00:00:00'::timestamp without time zone) AND (a_daccoxxz <> '996000000000'::text))"
              Rows Removed by Filter: 4842810
        ->  Hash  (cost=6.91..6.91 rows=150 width=8) (actual time=0.046..0.047 rows=150 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 14kB
              ->  Index Only Scan using idx_tfundinfo_fundcode on  xdsfffs  (cost=0.28..6.91 rows=150 width=8) (actual time=0.021..0.037 rows=150 loops=1)
                    Index Cond: (c_raisetype = '1'::text)
                    Heap Fetches: 0
Planning Time: 0.512 ms
Execution Time: 381246.699 ms
复制代码
select count(1) from stcdlbxxxxx;
  count   
----------
 35000000
(1 row)

stcdlbxxxxx 表数据量 3500W,数据量不算大,这条SQL主要含义应该是对 fact 表做统计,一堆聚合函数,fact 表是数仓概念,如果不明白可以去百度。

这条SQL主要慢在HashAggregate这个节点上,Hash Semi Join 花了 30237.230毫秒(30秒),然后到上面 Group Key: l.c_qxxxxaode 分个组以后,HashAggregate 直接飙到 381246.640毫秒(6.3分钟)。

其实这条SQL最理想的状态是走 HashAggregate + parallel 的计划,但是优化器并没有这样做,我怀疑可能是SQL写法导致优化器没有走并行,没多想,直接改写了一版的SQL让同事去试试。****

改写版本SQL:(PG独占的语法,聚合函数新增 FILTER 属性,代替 case when 写法)

复制代码
explain analyze
SELECT  c_qxxxxaode,
        '2023-03-22 00:00:00' AS d_cdate,
        COUNT(*) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60)) AS bt5ycusts,
        COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3to5ycusts,
        COUNT(*) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3ycusts,
        COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12)) AS bt1yto3ycusts,
        COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6)) AS bt6to12mcusts,
        COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3)) AS bt3to6mcusts,
        COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3)) AS btlose3mcusts,
        SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60)) AS bt5yshares,
        SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3to5yshares,
        SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3yshares,
        SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12)) AS bt1to3yshares,
        SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6)) AS bt6to12mshares,
        SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3)) AS bt3to6mshares,
        SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3)) AS btlose3mshares,
        ROUND(AVG(months_between('2023-03-22 00:00:00', l.f_qdqdqdq)), 2) AS avgmonth,
        ROUND(AVG(NVL(f_qwwvddvvzz, 0)), 2) AS avgshares,
        COUNT(a_daccoxxz) AS custsum,
        SUM(f_qwwvddvvzz) AS sharessum
FROM stcdlbxxxxx l
WHERE NVL(l.f_qwwvddvvzz, 0) > 0
  AND l.f_qdqdqdq <= '2023-03-22 00:00:00'
  AND l.a_daccoxxz <> '996000000000'
  AND c_qxxxxaode IN (SELECT c_qxxxxaode FROM  xdsfffs WHERE c_raisetype = '1')
GROUP BY c_qxxxxaode;

改写后执行计划:

复制代码
HashAggregate  (cost=1043326.49..1043332.91 rows=151 width=424) (actual time=380246.621..380246.849 rows=150 loops=1)
  Group Key: l.c_qxxxxaode
  ->  Hash Semi Join  (cost=8.78..936347.95 rows=301348 width=38) (actual time=0.055..29983.463 rows=30056793 loops=1)
        Hash Cond: (l.c_qxxxxaode = tfundinfo.c_qxxxxaode)
        ->  Seq Scan on stcdlbxxxxx l  (cost=0.00..906618.70 rows=10044941 width=38) (actual time=0.008..25415.490 rows=30157190 loops=1)
"              Filter: ((NVL(f_qwwvddvvzz, '0'::numeric) > '0'::numeric) AND (f_qdqdqdq <= '2023-03-22 00:00:00'::timestamp without time zone) AND (a_daccoxxz <> '996000000000'::text))"
              Rows Removed by Filter: 4842810
        ->  Hash  (cost=6.91..6.91 rows=150 width=8) (actual time=0.043..0.044 rows=150 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 14kB
              ->  Index Only Scan using idx_tfundinfo_fundcode on  xdsfffs  (cost=0.28..6.91 rows=150 width=8) (actual time=0.018..0.035 rows=150 loops=1)
                    Index Cond: (c_raisetype = '1'::text)
                    Heap Fetches: 0
Planning Time: 0.533 ms
Execution Time: 380246.909 ms

当时我看到这个改写后执行计划就无语了😶,和原SQL的计划一毛一样,直接把我整懵逼,怀疑是否现场环境没开并行参数,但是被同事告知是已经设置了并行参数。

复制代码
qtbg=> show max_parallel_workers_per_gather 
qtbg-> ;
 max_parallel_workers_per_gather 
---------------------------------
 16
(1 row)

了解到情况以后开始研究SQL,发现使用了两个自定义函数,add_months,months_between(我们研发写的函数、为了兼容ORACLE),然后看了下这两个函数的信息。

复制代码
qtbg=> \df+ add_months
                                                                                                                       List of functions
 Schema |    Name    | Result data type |        Argument data types        | Type | Volatility | Parallel | Owner  | Security | Access privileges | Language |                                   Source code                                    | Description 
--------+------------+------------------+-----------------------------------+------+------------+----------+--------+----------+-------------------+----------+----------------------------------------------------------------------------------+-------------
 sys    | add_months | pg_catalog.date  | pg_catalog.date, boolean          | func | immutable  | safe     | system | invoker  |                   | c        | add_months_bool                                                                  | 
 sys    | add_months | pg_catalog.date  | pg_catalog.date, integer          | func | immutable  | safe     | system | invoker  |                   | c        | add_months                                                                       | 
 sys    | add_months | date             | timestamp with time zone, integer | func | immutable  | safe     | system | invoker  |                   | sql      |  select (add_months($1::pg_catalog.date, $2)  + $1::pg_catalog.time)::sys.date;  | 
(3 rows)

qtbg=> \df+ months_between
                                                                                                                                                           List of functions
 Schema |      Name      | Result data type |                Argument data types                 | Type | Volatility | Parallel | Owner  | Security | Access privileges | Language |                                                             Source code                                                             | Descri
ption 
--------+----------------+------------------+----------------------------------------------------+------+------------+----------+--------+----------+-------------------+----------+-------------------------------------------------------------------------------------------------------------------------------------+-------
------
 sys    | months_between | double precision | date, date                                         | func | volatile   | unsafe   | system | invoker  |                   | plsql    |                                                                                                                                    +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |   begin                                                                                                                            +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |    case when                                                                                                                       +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (last_day($1) = $1 and last_day($2) = $2)                                                                                     +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |     or                                                                                                                             +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(day from $1) = extract(day from $2))                                                                                 +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |    then                                                                                                                            +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |     return (select (extract(years from $1)::int * 12 - extract(years from $2)::int * 12)::float +                                  +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(month from $1)::int - extract(month from $2)::int)::float);                                                          +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |    else                                                                                                                            +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |     return (select (extract(years from $1)::int * 12 - extract(years from $2)::int * 12)::float +                                  +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(month from $1)::int - extract(month from $2)::int)::float +                                                          +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(day from $1)::int - extract(day from $2)::int)/31::float +                                                           +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(hour from $1)::int * 3600 + extract(minutes from $1)::int * 60 + extract(seconds from $1)::int)/(3600*24*31)::float -+| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(hour from $2)::int * 3600 + extract(minutes from $2)::int * 60 + extract(seconds from $2)::int)/(3600*24*31)::float);+| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |    end case;                                                                                                                       +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |   end;                                                                                                                             +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |                                                                                                                                     | 
 sys    | months_between | double precision | timestamp with time zone, timestamp with time zone | func | volatile   | unsafe   | system | invoker  |                   | plsql    |                                                                                                                                    +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |   begin                                                                                                                            +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |    case when                                                                                                                       +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (last_day($1) = $1 and last_day($2) = $2)                                                                                     +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |     or                                                                                                                             +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(day from $1) = extract(day from $2))                                                                                 +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |    then                                                                                                                            +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |     return (select (extract(years from $1)::int * 12 - extract(years from $2)::int * 12)::float +                                  +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(month from $1)::int - extract(month from $2)::int)::float);                                                          +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |    else                                                                                                                            +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |     return (select (extract(years from $1)::int * 12 - extract(years from $2)::int * 12)::float +                                  +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(month from $1)::int - extract(month from $2)::int)::float +                                                          +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(day from $1)::int - extract(day from $2)::int)/31::float +                                                           +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(hour from $1)::int * 3600 + extract(minutes from $1)::int * 60 + extract(seconds from $1)::int)/(3600*24*31)::float -+| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |      (extract(hour from $2)::int * 3600 + extract(minutes from $2)::int * 60 + extract(seconds from $2)::int)/(3600*24*31)::float);+| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |    end case;                                                                                                                       +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |   end;                                                                                                                             +| 
        |                |                  |                                                    |      |            |          |        |          |                   |          |                                                                                                                                     | 
(2 rows)

当时就给我整崩溃了,研发真的太坑啦😂😂😂。

months_between 函数居然还是 volatile(不稳定)状态,我一直以为研发写好的ORACLE 兼容函数都是 immutable 或者 是 stable 状态,所以刚开始也没往这方面想。

PG的自定义函数有三种状态,volatile、immutable 、stable 。

volatile 是不可以走并行的,这三种状态具体代表的是什么含义,如果不清楚的同学可以下去百度,这里就不废话了,我也怕解释不清楚。😁

和客户沟通以后,我写了一个 months_between1 函数代替原来的 months_between函数,返回结果小数点的精度可能和ORACLE有点区别,但是客户表示可以接受。

months_between1 函数:

复制代码
CREATE OR REPLACE FUNCTION months_between1(date1 DATE, date2 DATE) 
RETURNS FLOAT AS $$
DECLARE
    years_diff INT;
    months_diff INT;
    days_diff FLOAT;
BEGIN
    years_diff := EXTRACT(YEAR FROM date1) - EXTRACT(YEAR FROM date2);
    months_diff := EXTRACT(MONTH FROM date1) - EXTRACT(MONTH FROM date2);
    days_diff := EXTRACT(DAY FROM date1) - EXTRACT(DAY FROM date2);
    days_diff := days_diff / 30.0;  
    RETURN (years_diff * 12) + months_diff + days_diff;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

months_between1 设置可以并行:

复制代码
ALTER FUNCTION months_between1(date, date) PARALLEL SAFE;

months_between1 函数信息:

复制代码
\df+ months_between1                                           
                                                                                                                List of functions
 Schema |      Name       | Result data type |  Argument data types   | Type | Volatility | Parallel | Owner | Security | Access privileges | Language |                                Source code                                | Description 
--------+-----------------+------------------+------------------------+------+------------+----------+-------+----------+-------------------+----------+---------------------------------------------------------------------------+-------------
 scott  | months_between1 | double precision | date1 date, date2 date | func | immutable  | safe     | qtbg  | invoker  |                   | plpgsql  |                                                                          +| 
        |                 |                  |                        |      |            |          |       |          |                   |          | DECLARE                                                                  +| 
        |                 |                  |                        |      |            |          |       |          |                   |          |     years_diff INT;                                                      +| 
        |                 |                  |                        |      |            |          |       |          |                   |          |     months_diff INT;                                                     +| 
        |                 |                  |                        |      |            |          |       |          |                   |          |     days_diff FLOAT;                                                     +| 
        |                 |                  |                        |      |            |          |       |          |                   |          | BEGIN                                                                    +| 
        |                 |                  |                        |      |            |          |       |          |                   |          |     years_diff := EXTRACT(YEAR FROM date1) - EXTRACT(YEAR FROM date2);   +| 
        |                 |                  |                        |      |            |          |       |          |                   |          |     months_diff := EXTRACT(MONTH FROM date1) - EXTRACT(MONTH FROM date2);+| 
        |                 |                  |                        |      |            |          |       |          |                   |          |     days_diff := EXTRACT(DAY FROM date1) - EXTRACT(DAY FROM date2);      +| 
        |                 |                  |                        |      |            |          |       |          |                   |          |     days_diff := days_diff / 30.0;                                       +| 
        |                 |                  |                        |      |            |          |       |          |                   |          |     RETURN (years_diff * 12) + months_diff + days_diff;                  +| 
        |                 |                  |                        |      |            |          |       |          |                   |          | END;                                                                     +| 
        |                 |                  |                        |      |            |          |       |          |                   |          |                                                                           | 
(1 row)

最后执行SQL测试效率:

复制代码
 1 explain analyze
 2 SELECT  c_qxxxxaode,
 3         '2023-03-22 00:00:00' AS d_cdate,
 4         COUNT(*) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60)) AS bt5ycusts,
 5         COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3to5ycusts,
 6         COUNT(*) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3ycusts,
 7         COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12)) AS bt1yto3ycusts,
 8         COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6)) AS bt6to12mcusts,
 9         COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3)) AS bt3to6mcusts,
10         COUNT(*) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3)) AS btlose3mcusts,
11         SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -60)) AS bt5yshares,
12         SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -60) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3to5yshares,
13         SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -36)) AS bt3yshares,
14         SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -36) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -12)) AS bt1to3yshares,
15         SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -12) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -6)) AS bt6to12mshares,
16         SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -6) AND l.f_qdqdqdq <= add_months('2023-03-22 00:00:00', -3)) AS bt3to6mshares,
17         SUM(f_qwwvddvvzz) FILTER (WHERE l.f_qdqdqdq > add_months('2023-03-22 00:00:00', -3)) AS btlose3mshares,
18         ROUND(AVG(months_between1('2023-03-22 00:00:00', l.f_qdqdqdq)), 2) AS avgmonth,
19         ROUND(AVG(NVL(f_qwwvddvvzz, 0)), 2) AS avgshares,
20         COUNT(a_daccoxxz) AS custsum,
21         SUM(f_qwwvddvvzz) AS sharessum
22 FROM stcdlbxxxxx l
23 WHERE NVL(l.f_qwwvddvvzz, 0) > 0
24   AND l.f_qdqdqdq <= '2023-03-22 00:00:00'
25   AND l.a_daccoxxz <> '996000000000'
26   AND c_qxxxxaode IN (SELECT c_qxxxxaode FROM  xdsfffs WHERE c_raisetype = '1')
27 GROUP BY c_qxxxxaode;

执行计划:

复制代码
Finalize GroupAggregate  (cost=406085.34..424225.64 rows=151 width=424) (actual time=9533.754..16818.165 rows=150 loops=1)
  Group Key: l.c_qxxxxaode
  ->  Gather Merge  (cost=406085.34..424155.80 rows=906 width=392) (actual time=9484.778..16812.958 rows=1050 loops=1)
        Workers Planned: 6
        Workers Launched: 6
        ->  Partial GroupAggregate  (cost=405085.25..423045.59 rows=151 width=392) (actual time=9383.774..16482.619 rows=150 loops=7)
              Group Key: l.c_qxxxxaode
              ->  Sort  (cost=405085.25..405210.81 rows=50225 width=38) (actual time=9358.407..10328.135 rows=4293828 loops=7)
                    Sort Key: l.c_qxxxxaode
                    Sort Method: quicksort  Memory: 461463kB
                    Worker 0:  Sort Method: quicksort  Memory: 452754kB
                    Worker 1:  Sort Method: quicksort  Memory: 452822kB
                    Worker 2:  Sort Method: quicksort  Memory: 451891kB
                    Worker 3:  Sort Method: quicksort  Memory: 448327kB
                    Worker 4:  Sort Method: quicksort  Memory: 454387kB
                    Worker 5:  Sort Method: quicksort  Memory: 453623kB
                    ->  Hash Semi Join  (cost=8.78..401163.65 rows=50225 width=38) (actual time=0.219..4369.485 rows=4293828 loops=7)
                          Hash Cond: (l.c_qxxxxaode = tfundinfo.c_qxxxxaode)
                          ->  Parallel Seq Scan on stcdlbxxxxx l  (cost=0.00..396201.45 rows=1674157 width=38) (actual time=0.017..3948.943 rows=4308170 loops=7)
"                                Filter: ((NVL(f_qwwvddvvzz, '0'::numeric) > '0'::numeric) AND (f_qdqdqdq <= '2023-03-22 00:00:00'::timestamp without time zone) AND (a_daccoxxz <> '996000000000'::text))"
                                Rows Removed by Filter: 691830
                          ->  Hash  (cost=6.91..6.91 rows=150 width=8) (actual time=0.164..0.173 rows=150 loops=7)
                                Buckets: 1024  Batches: 1  Memory Usage: 14kB
                                ->  Index Only Scan using idx_tfundinfo_fundcode on  xdsfffs  (cost=0.28..6.91 rows=150 width=8) (actual time=0.072..0.151 rows=150 loops=7)
                                      Index Cond: (c_raisetype = '1'::text)
                                      Heap Fetches: 0
Planning Time: 0.588 ms
Execution Time: 16851.196 ms

最后可以看到原来 6分钟的SQL,现在已经 16.8秒就能跑出结果,研发真的太坑了。🤣🤣🤣🤣

其实这条SQL完全不用改成 (聚合函数新增 FILTER 属性)这种写法,原来的SQL只要把 months_between 函数替换成 months_between1,一样也可以走并行。

但是为了更好看,还是把这种新的写法提交给了客户。😁😁 😁 😁 😁

相关推荐
lzhlizihang4 分钟前
【Hive sql 面试题】求出各类型专利top 10申请人,以及对应的专利申请数(难)
大数据·hive·sql·面试题
威哥爱编程1 小时前
SQL Server 数据太多如何优化
数据库·sql·sqlserver
Mephisto.java2 小时前
【大数据学习 | kafka高级部分】kafka的kraft集群
大数据·sql·oracle·kafka·json·hbase
Mephisto.java2 小时前
【大数据学习 | kafka高级部分】kafka的文件存储原理
大数据·sql·oracle·kafka·json
数新网络6 小时前
《深入浅出Apache Spark》系列②:Spark SQL原理精髓全解析
大数据·sql·spark
师太,答应老衲吧8 小时前
SQL实战训练之,力扣:2020. 无流量的帐户数(递归)
数据库·sql·leetcode
NiNg_1_23411 小时前
高级 SQL 技巧详解
sql
全栈开发圈18 小时前
新书速览|Spark SQL大数据分析快速上手
sql·数据分析·spark
WeeJot嵌入式18 小时前
数据库----SQL
数据库·sql
矛取矛求18 小时前
高级sql技巧
数据库·sql·mysql