用SQL执行累计值汇总的几种方法

有这么一个需求，要对某个表的某列，按累计分组计数汇总输出。

比如：表t列a的数据如下：

复制代码

┌───┐ 
│ a │ 
├───┤ 
│ 1 │ 
│ 2 │ 
│ 3 │ 
│ 1 │ 
│ 4 │ 
└───┘

现在要把a>=1、a>=2、a>=3、a>=4的个数分别汇总输出，得到如下的结果：

复制代码

┌───┬──────┐ 
│ a │ cnt2 │ 
├───┼──────┤ 
│ 1 │ 5    │ 
│ 2 │ 3    │ 
│ 3 │ 2    │ 
│ 4 │ 1    │ 
└───┴──────┘

下面以duckdb数据库为例，把代码稍作修改，也能在postgresql上实现。

方法1：用case when分组和分析函数sum累计

先建立表t，

复制代码

create table t as 
(select * from values 
(1), 
(2), 
(3), 
(1), 
(4) 
t(a));

在postgresql中需要把values子句包含在一对小括号中，即：

复制代码

create table t as 
(select * from (values 
(1), 
(2), 
(3), 
(1), 
(4)) 
t(a));

然后输入以下查询：

复制代码

select case when a>=1 then 1 
when a>=2 then 2 
when a>=3 then 3 
when a>=4 then 4 
end x, 
count(1) cnt 
from t 
group by x;

得到：

复制代码

┌───┬─────┐ 
│ x │ cnt │ 
├───┼─────┤ 
│ 1 │ 5   │ 
└───┴─────┘

这不是我们需要的结果，因为case when 有短路的性质，满足前面某一个条件(case)的统计以后，就不再判断以后的条件。

这样，因为所有行都满足a>=1，所以，全部行都被统计到了x=1的分组，而其他分组没有再被统计。

容易想到，把条件从>=4到>=1倒序排列，可以求出满足每个分组的计数。

复制代码

select case 
when a>=4 then 4 
when a>=3 then 3 
when a>=2 then 2 
when a>=1 then 1 
end x, 
count(1) cnt 
from t 
group by x; 
  
┌───┬─────┐ 
│ x │ cnt │ 
├───┼─────┤ 
│ 1 │ 2   │ 
│ 2 │ 1   │ 
│ 3 │ 1   │ 
│ 4 │ 1   │ 
└───┴─────┘

这仍然不是我们要求的结果，因为case when 仍然存在短路问题，这种写法实际上隐含地滤掉了同时满足前一个条件的结果，比如这里的when a>=3，实际上是when a>=3 and a<4，不包含a>=4的结果，因为a>=4的行在前一个条件when a>=4已经被统计，就不会再次被统计。需要把这个结果再次累计汇总，才能得到要求的结果。

复制代码

with 
t2 as( 
select case 
when a>=4 then 4 
when a>=3 then 3 
when a>=2 then 2 
when a>=1 then 1 
end x, 
count(1) cnt 
from t 
group by x) 
select x,cnt,sum(cnt)over(order by x desc)cnt2 from t2 order by x; 
  
┌───┬─────┬──────┐ 
│ x │ cnt │ cnt2 │ 
├───┼─────┼──────┤ 
│ 1 │ 2   │ 5    │ 
│ 2 │ 1   │ 3    │ 
│ 3 │ 1   │ 2    │ 
│ 4 │ 1   │ 1    │ 
└───┴─────┴──────┘

为了明显起见，这个查询保留了原查询的cnt和通过sum(cnt)over(order by x desc)新汇总的cnt2两列，这个分析函数写法的含义是，按照x从大到小的顺序，即逆序（ desc）对cnt列累计求和，这样，cnt2列a>=4累计的结果就是原查询a>=4的cnt值，保持现状，a>=3累计的结果就是上一步a>=4累计值和a>=3的cnt值之和，a>=2累计的结果就是上一步a>=3累计值和a>=2的cnt值之和，以此类推。

所以，x和cnt2列就是所需的结果。

方法2：利用多个case when打分组标记，然后通过标记的存在性统计

先看打完标记的情况：

复制代码

select 
case when a>=1 then 1 else 'a' end || 
case when a>=2 then 2 else 'a' end || 
case when a>=3 then 3 else 'a' end || 
case when a>=4 then 4 else 'a' end 
x, 
count(1) cnt 
from t 
group by x; 
  
┌──────┬─────┐ 
│  x   │ cnt │ 
├──────┼─────┤ 
│ 1aaa │ 2   │ 
│ 12aa │ 1   │ 
│ 123a │ 1   │ 
│ 1234 │ 1   │ 
└──────┴─────┘

在这个结果中，x列现在包含一个字符串，其中用字符n标出了满足第n个条件，cnt列的的统计结果与前一种方法第一步的中间结果一致，然后针对这个结果用sum(case when)方法，凡是出现字符1的都被统计到1组，出现字符2的都被统计到2组，以此类推。

为了防止出现短路现象，用了一个包含1到4的临时表做笛卡尔积，把临时表中出现每个字符，都和上述x字符串进行比较，这样包含字符串1的第1_{4行都被统计到1组，包含字符串2的第2}4行都被统计到2组，以此类推，实现了累计的效果。

在postgresql中，对列的类型一致性要求更严格，需要把代码1、2、3、4用单引号括起来，表示字符类型。

查询语句和结果如下：

复制代码

with t2 as 
( 
select 
case when a>=1 then 1 else 'a' end || 
case when a>=2 then 2 else 'a' end || 
case when a>=3 then 3 else 'a' end || 
case when a>=4 then 4 else 'a' end 
x, 
count(1) cnt 
from t 
group by x) 
select a, sum(case when instr(x,a::varchar)>0 then cnt end)cnt2 from t2,values(1),(2),(3),(4) t3(a) 
group by a 
order by a; 
  
┌───┬──────┐ 
│ a │ cnt2 │ 
├───┼──────┤ 
│ 1 │ 5    │ 
│ 2 │ 3    │ 
│ 3 │ 2    │ 
│ 4 │ 1    │ 
└───┴──────┘

在postgresql中，要用strpos代替instr。

方法3：利用码表笛卡尔积对代码分组

在实现上述方法2的过程中，考虑到第一步把x值转变成代码，第二步从代码判断，能否合成一步呢？从而有如下的思路，先建立一张码表code，列出每个代码表示的上下限，然后就省去了打标记的步骤，直接根据x的值统计即可。

复制代码

with code(c,low,high) as( 
values 
(1,1,9999), 
(2,2,9999), 
(3,3,9999), 
(4,4,9999) 
) 
select c, count(case when a>=low and a<high then 1 end)cnt2 from t,code 
group by c 
order by c; 
  
┌───┬──────┐ 
│ c │ cnt2 │ 
├───┼──────┤ 
│ 1 │ 5    │ 
│ 2 │ 3    │ 
│ 3 │ 2    │ 
│ 4 │ 1    │ 
└───┴──────┘

上述语句中，上限9999是一个超出范围的大数，使得a<high永远为真，可根据实际需求调整，比如，下列查询得到不累计的计数：

复制代码

with code(c,low,high) as( 
values 
(1,1,2), 
(2,2,3), 
(3,3,4), 
(4,4,9999) 
) 
select c, count(case when a>=low and a<high then 1 end)cnt2 from t,code 
group by c 
order by c; 
┌───┬──────┐ 
│ c │ cnt2 │ 
├───┼──────┤ 
│ 1 │ 2    │ 
│ 2 │ 1    │ 
│ 3 │ 1    │ 
│ 4 │ 1    │ 
└───┴──────┘

方法4：用union all合并多个条件汇总结果

复制代码

select 1 x, count(1) cnt from t where a>=1 
union all 
select 2 x, count(1) cnt from t where a>=2 
union all 
select 3 x, count(1) cnt from t where a>=3 
union all 
select 4 x, count(1) cnt from t where a>=4 
; 
┌───┬─────┐ 
│ x │ cnt │ 
├───┼─────┤ 
│ 1 │ 5   │ 
│ 2 │ 3   │ 
│ 3 │ 2   │ 
│ 4 │ 1   │ 
└───┴─────┘

完全用手工统计每种条件的结果，然后把结果合并，是最容易的方法。