Hive的CTE 公共表达式

目录

1.语法

[2. 使用场景](#2. 使用场景)

select语句

[chaining CTEs 链式](#chaining CTEs 链式)

union语句

[insert into 语句](#insert into 语句)

[create table as 语句](#create table as 语句)

前言

Common Table Expressions(CTE):公共表达式是一个临时的结果集,该结果集是从with子句中指定的查询派生而来的,紧跟在select 或 insert关键字之前。CTE可以在 select,insert, create table as select 等语句中使用。

1.语法

sql 复制代码
[wtih CommonTableExpression]
select
        column1,
        column2, ...
from table 
[where 条件] 
[group by column]
[order by column] 
[cluster by column| [distribute by column] [sort by column] 
[limit [offset,] rows];

2. 使用场景

select语句

sql 复制代码
with tmp as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        ---计算rk的目的是为了获取记录中的第一条
        row_number() over (partition by uid,date_format(otime, 'yyyy-MM') order by otime) rk
    from t_order
)
 select
    uid,
    --每个用户一月份的订单数
    sum(if(dt = '2018-01', 1, 0)) as  m1_count,
    --每个用户二月份的订单数
    sum(if(dt = '2018-02', 1, 0)) as  m2_count
from tmp
 group by uid
 having m1_count >0 and m2_count=0;

chaining CTEs 链式

sql 复制代码
with tmp1 as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        ---计算rk的目的是为了获取记录中的第一条
        row_number() over (partition by uid,date_format(otime, 'yyyy-MM') order by otime) as rk
    from t_order
),
     tmp2 as
         (select
              uid,
              --每个用户一月份的订单数
              sum(if(dt = '2018-01', 1, 0)) as m1_count,
              --每个用户二月份的订单数
              sum(if(dt = '2018-02', 1, 0)) as m2_count
          from tmp1
          group by uid
          having m1_count > 0
             and m2_count = 0)
select * from tmp2 limit 1;

union语句

sql 复制代码
with q1 as (select * from student where num = 95002),
     q2 as (select * from student where num = 95004)
select * from q1 union all select * from q2;

insert into 语句

sql 复制代码
with tmp1 as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        ---计算rk的目的是为了获取记录中的第一条
        row_number() over (partition by uid,date_format(otime, 'yyyy-MM') order by otime) as rk
    from t_order
),
     tmp2 as
         (select
              uid,
              --每个用户一月份的订单数
              sum(if(dt = '2018-01', 1, 0)) as m1_count,
              --每个用户二月份的订单数
              sum(if(dt = '2018-02', 1, 0)) as m2_count
          from tmp1
          group by uid
          having m1_count > 0
             and m2_count = 0)

insert into tmp3
select * from tmp2 limit 10;

create table as 语句

sql 复制代码
--- 从tmp2 表中取10条数据,基于此创建表tmp3 
create table tmp3 as 
with tmp1 as (
    select
        oid,
        uid,
        otime,
        date_format(otime, 'yyyy-MM') as dt,
        oamount,
        ---计算rk的目的是为了获取记录中的第一条
        row_number() over (partition by uid,date_format(otime, 'yyyy-MM') order by otime) as rk
    from t_order
),
     tmp2 as
         (select
              uid,
              --每个用户一月份的订单数
              sum(if(dt = '2018-01', 1, 0)) as m1_count,
              --每个用户二月份的订单数
              sum(if(dt = '2018-02', 1, 0)) as m2_count
          from tmp1
          group by uid
          having m1_count > 0
             and m2_count = 0)
select * from tmp2 limit 10;
相关推荐
计艺回忆路1 小时前
Hive自定义函数(UDF)开发和应用流程
hive·自定义函数·udf
天翼云开发者社区21 小时前
数据治理的长效机制
大数据·数据仓库
王小王-1231 天前
基于Hadoop与LightFM的美妆推荐系统设计与实现
大数据·hive·hadoop·大数据美妆推荐系统·美妆商品用户行为·美妆电商
万能小锦鲤2 天前
《大数据技术原理与应用》实验报告七 熟悉 Spark 初级编程实践
hive·hadoop·ubuntu·flink·spark·vmware·实验报告
Leo.yuan2 天前
ETL还是ELT,大数据处理怎么选更靠谱?
大数据·数据库·数据仓库·信息可视化·etl
万能小锦鲤2 天前
《大数据技术原理与应用》实验报告五 熟悉 Hive 的基本操作
hive·hadoop·ubuntu·eclipse·vmware·实验报告·hiveql
張萠飛2 天前
flink sql如何对hive string类型的时间戳进行排序
hive·sql·flink
張萠飛2 天前
flink sql读hive catalog数据,将string类型的时间戳数据排序后写入kafka,如何保障写入kafka的数据是有序的
hive·sql·flink
isNotNullX3 天前
数据怎么分层?从ODS、DW、ADS三大层一一拆解!
大数据·开发语言·数据仓库·分布式·spark
随心............3 天前
hive的相关的优化
数据仓库·hive·hadoop