Hive CTEs (Common Table Expressions)

Contents

1. Syntax

2. Use cases

select statement

Chaining CTEs

union statement

insert into statement

create table as statement

Introduction

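A Common Table Expression (CTE) is a temporary result set derived from the query specified in a with clause, which immediately precedes the select or insert keyword. A CTE exists only within the statement that defines it. CTEs can be used in select, insert, and create table as select statements, among others.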

1. Syntax

[with CommonTableExpression [, CommonTableExpression ...]]
select
        column1,
        column2, ...
from table
[where condition]
[group by column]
[order by column]
[cluster by column | [distribute by column] [sort by column]]
[limit [offset,] rows];
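
As a minimal sketch of this syntax, the query below defines a single CTE and reads from it in the main select. The CTE name jan_orders is made up for illustration, and the sketch assumes a table t_order with columns oid, uid, otime, and a numeric oamount.

```sql
-- jan_orders is a hypothetical CTE name; it is visible only inside this statement
with jan_orders as (
    select
        oid,
        uid,
        oamount
    from t_order
    where date_format(otime, 'yyyy-MM') = '2018-01'
)
select
    uid,
    count(*)     as order_count,   -- orders per user in January 2018
    sum(oamount) as total_amount   -- assumes oamount is numeric
from jan_orders
group by uid
order by total_amount desc
limit 10;
```

The main select treats jan_orders like an ordinary table, which keeps the filtering logic separate from the aggregation.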

2. Use cases

select statement

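-- Find users who placed orders in January 2018 but none in February 2018
with tmp as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        -- rk numbers each user's orders within a month by time, so rk = 1 marks the first one
        -- (not used by the outer query below)
        row_number() over (partition by uid, date_format(otime, 'yyyy-MM') order by otime) as rk
    from t_order
)
select
    uid,
    -- number of orders the user placed in January 2018
    sum(if(dt = '2018-01', 1, 0)) as m1_count,
    -- number of orders the user placed in February 2018
    sum(if(dt = '2018-02', 1, 0)) as m2_count
from tmp
group by uid
having m1_count > 0 and m2_count = 0;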
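
The rk column computed in the CTE is not used by the outer query above. As a sketch of what its comment describes, the query below keeps only the rows with rk = 1, that is, each user's first order in each month. It assumes the same t_order table. A window-function result cannot be filtered in the where clause of the select that computes it, which is one reason to wrap it in a CTE.

```sql
with tmp as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        -- number each user's orders within a month, earliest first
        row_number() over (partition by uid, date_format(otime, 'yyyy-MM') order by otime) as rk
    from t_order
)
select
    uid,
    dt,
    oid,
    otime,
    oamount
from tmp
where rk = 1;  -- keep only the first order per user per month
```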

Chaining CTEs

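-- tmp2 reads from tmp1; the main select reads from tmp2
with tmp1 as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        -- rk numbers each user's orders within a month by time, so rk = 1 marks the first one
        -- (not used by the later queries)
        row_number() over (partition by uid, date_format(otime, 'yyyy-MM') order by otime) as rk
    from t_order
),
tmp2 as (
    select
        uid,
        -- number of orders the user placed in January 2018
        sum(if(dt = '2018-01', 1, 0)) as m1_count,
        -- number of orders the user placed in February 2018
        sum(if(dt = '2018-02', 1, 0)) as m2_count
    from tmp1
    group by uid
    having m1_count > 0
       and m2_count = 0
)
select * from tmp2 limit 1;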
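
A CTE in a chain can also be referenced by more than one later CTE. The sketch below is an alternative formulation, not the original author's query: the names monthly, jan, and feb are hypothetical, and it should return the same users (ordered in January 2018, nothing in February 2018) provided uid is never null.

```sql
with monthly as (
    -- one row per user per month with that month's order count
    select
        uid,
        date_format(otime, 'yyyy-MM') as dt,
        count(*) as order_count
    from t_order
    group by uid, date_format(otime, 'yyyy-MM')
),
jan as (
    select uid, order_count from monthly where dt = '2018-01'
),
feb as (
    select uid, order_count from monthly where dt = '2018-02'
)
select
    jan.uid,
    jan.order_count as m1_count
from jan
left join feb on jan.uid = feb.uid
where feb.uid is null;  -- no matching February row for this user
```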

union statement

with q1 as (select * from student where num = 95002),
     q2 as (select * from student where num = 95004)
select * from q1 union all select * from q2;

insert into statement

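-- tmp3 must already exist with columns compatible with tmp2
with tmp1 as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        -- rk numbers each user's orders within a month by time, so rk = 1 marks the first one
        -- (not used by the later queries)
        row_number() over (partition by uid, date_format(otime, 'yyyy-MM') order by otime) as rk
    from t_order
),
tmp2 as (
    select
        uid,
        -- number of orders the user placed in January 2018
        sum(if(dt = '2018-01', 1, 0)) as m1_count,
        -- number of orders the user placed in February 2018
        sum(if(dt = '2018-02', 1, 0)) as m2_count
    from tmp1
    group by uid
    having m1_count > 0
       and m2_count = 0
)
insert into tmp3
select * from tmp2 limit 10;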
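
The insert above only works if tmp3 already exists. The sketch below shows one way the target table might be declared, followed by an insert overwrite variant that replaces the table's contents instead of appending. The column types are assumptions (uid as string, the counts as bigint because sum over integers yields bigint in Hive); adjust them to the real t_order schema.

```sql
-- Assumed schema for the target table; the uid type is a guess
create table if not exists tmp3 (
    uid      string,
    m1_count bigint,
    m2_count bigint
);

with tmp1 as (
    select
        uid,
        date_format(otime, 'yyyy-MM') as dt
    from t_order
),
tmp2 as (
    select
        uid,
        sum(if(dt = '2018-01', 1, 0)) as m1_count,
        sum(if(dt = '2018-02', 1, 0)) as m2_count
    from tmp1
    group by uid
    having m1_count > 0
       and m2_count = 0
)
-- insert overwrite replaces existing rows in tmp3; insert into would append
insert overwrite table tmp3
select uid, m1_count, m2_count from tmp2;
```

Listing the columns explicitly in the final select makes the mapping onto tmp3 visible, since Hive matches inserted columns by position rather than by name.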

create table as statement

-- Create table tmp3 from 10 rows taken from the CTE tmp2
create table tmp3 as
with tmp1 as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        -- rk numbers each user's orders within a month by time, so rk = 1 marks the first one
        -- (not used by the later queries)
        row_number() over (partition by uid, date_format(otime, 'yyyy-MM') order by otime) as rk
    from t_order
),
tmp2 as (
    select
        uid,
        -- number of orders the user placed in January 2018
        sum(if(dt = '2018-01', 1, 0)) as m1_count,
        -- number of orders the user placed in February 2018
        sum(if(dt = '2018-02', 1, 0)) as m2_count
    from tmp1
    group by uid
    having m1_count > 0
       and m2_count = 0
)
select * from tmp2 limit 10;
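
Hive also accepts a with clause in create view as select, in the same position as in create table as select. The sketch below is illustrative only: the view name v_order_2018_01 and the CTE name jan_orders are hypothetical, and it assumes the same t_order table.

```sql
-- v_order_2018_01 is a hypothetical view name
create view if not exists v_order_2018_01 as
with jan_orders as (
    select
        oid,
        uid,
        otime,
        oamount
    from t_order
    where date_format(otime, 'yyyy-MM') = '2018-01'
)
select
    uid,
    count(*) as order_count  -- orders per user in January 2018
from jan_orders
group by uid;
```

Unlike create table as select, the view stores only the query definition, so the CTE is evaluated each time the view is queried.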