[Data Warehouse] III. Offline Data Warehouse (Hive-Based Warehouse System)

Chapter 1: Data Warehouse Layering

1.1 Why Layer the Warehouse

DIM: dimension (the dimension layer)

1. 2 数据集市与数据仓库概念

1.3 Naming Conventions

1.3.1 Table Naming

  • ODS layer tables are named ods_<table name>
  • DIM layer tables are named dim_<table name>
  • DWD layer tables are named dwd_<table name>
  • DWS layer tables are named dws_<table name>
  • DWT layer tables are named dwt_<table name>
  • ADS layer tables are named ads_<table name>
  • Temporary tables are named tmp_<table name>

1.3.2 Script Naming

  • <data source>_to_<target>_db.sh or <data source>_to_<target>_log.sh
  • User-behavior scripts take the log suffix; business-data scripts take the db suffix.
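As a quick sanity check of the convention, names can be derived mechanically. A minimal sketch (the endpoint names passed in are illustrative examples, not mandated by this document):

```shell
# Build a data-load script name following "<source>_to_<target>_<db|log>.sh".
# The endpoints ("mysql", "hdfs", "ods") are example values for illustration.
make_script_name() {
  local source=$1 target=$2 kind=$3   # kind: "db" = business data, "log" = user behavior
  printf '%s_to_%s_%s.sh' "$source" "$target" "$kind"
}

make_script_name mysql hdfs db; echo   # business-data import script
make_script_name hdfs ods log; echo    # user-behavior load script
```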

1.3.3 Table Column Types

  • Quantities: bigint
  • Amounts: decimal(16, 2), i.e. 16 significant digits with 2 decimal places
  • Strings (names, descriptions, etc.): string
  • Primary and foreign keys: string
  • Timestamps: bigint
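A bigint timestamp is simply epoch seconds, so converting between it and a readable date is a one-liner with GNU date (a sketch; the specific timestamp below is just an example):

```shell
# bigint timestamp <-> readable date, using GNU date in UTC for determinism
ts=$(date -u -d '2020-06-14 00:00:00 UTC' +%s)   # string -> epoch seconds (bigint)
echo "$ts"
date -u -d "@$ts" +%F                            # epoch seconds -> yyyy-MM-dd string
```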

Chapter 2: Data Warehouse Theory

2.1 Normalization Theory

2.1.1 Normal Forms

1) Definition

Data modeling must follow certain rules; in relational modeling, those rules are the normal forms.

2) Purpose

Normalization reduces data redundancy.

Why reduce redundancy?

(1) A decade or two ago disks were expensive, so storage had to be minimized.

(2) There were no distributed systems; everything ran on a single machine, where the only option was adding disks, and the number of disks was limited.

(3) With redundant data, a single logical change must be written to multiple tables, making data consistency hard to guarantee.

3) Drawback

The drawback of normalization is that retrieving data requires joining many tables back together.

4) Classification

The commonly recognized normal forms are: first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF).

2.1.2 Functional Dependencies

2.1.3 Distinguishing the Three Normal Forms

2.2 Relational Modeling vs. Dimensional Modeling

Relational modeling and dimensional modeling are the two main data warehouse modeling techniques. Relational modeling was championed by Bill Inmon, dimensional modeling by Ralph Kimball.

2.2.1 Relational Modeling

Relational modeling abstracts complex data into two concepts, entities and relationships, and expresses them in normalized form. As the relational-model diagram shows, the result is relatively scattered and fragmented, with many physical tables.

The relational model strictly follows third normal form (3NF): data redundancy is low and consistency is easy to guarantee. But because the data is spread across many tables, queries are relatively complex, and at big-data scale query performance is comparatively poor.

2.2.2 Dimensional Modeling

As the dimensional-model diagram shows, the model is comparatively clear and concise.

Dimensional modeling starts from the needs of data analysis and does not follow third normal form, so some redundancy exists. It is business-oriented, presenting the business as fact tables and dimension tables. The table structure is simple, so queries are simple and efficient.

2.3 Dimension Tables and Fact Tables (Key Concepts)

2.3.1 Dimension Tables

Dimension table: generally descriptive information about the facts. Each dimension table corresponds to an object or concept in the real world, e.g. users, products, dates, regions.

Characteristics of dimension tables:

  • Wide: many attributes, i.e. many columns
  • Far fewer rows than fact tables: typically under 100,000
  • Relatively static content: essentially lookup/code tables

A sample date dimension table:

| Date ID | Day of Week | Day of Year | Quarter | Holiday |
|------------|-------------|-------------|---------|----------------|
| 2020-01-01 | 2 | 1 | 1 | New Year's Day |
| 2020-01-02 | 3 | 2 | 1 | None |
| 2020-01-03 | 4 | 3 | 1 | None |
| 2020-01-04 | 5 | 4 | 1 | None |
| 2020-01-05 | 6 | 5 | 1 | None |
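Rows like the sample above can be generated mechanically with GNU date. Note the weekday numbering in the sample table is its own convention; the sketch below uses ISO numbering (%u, Monday = 1), so the weekday column will differ from the sample:

```shell
# Emit one date-dimension row per day: date_id, ISO day-of-week, day-of-year, quarter.
date_dim_row() {
  local d=$1 dow doy m q
  dow=$(date -d "$d" +%u)     # ISO weekday, Monday=1
  doy=$(date -d "$d" +%-j)    # day of year, no leading zeros
  m=$(date -d "$d" +%-m)
  q=$(( (m - 1) / 3 + 1 ))    # quarter derived from month
  printf '%s\t%s\t%s\t%s\n' "$d" "$dow" "$doy" "$q"
}

for i in 0 1 2; do
  date_dim_row "$(date -d "2020-01-01 + $i day" +%F)"
done
```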

2.3.2 Fact Tables

Each row of a fact table represents one business event (an order, a payment, a refund, a review, and so on). The term "fact" refers to the measures of the business event (countable occurrences, quantities, amounts, etc.). For example: on May 21, 2020, teacher Song spent 250 yuan at JD.com on a bottle of seal-and-ginseng pills. Dimensions: time, user, product, merchant. Facts: 250 yuan, one bottle.

Each fact table row contains additive numeric measures and foreign keys to the dimension tables, usually two or more foreign keys.

Characteristics of fact tables:

  • Very large
  • Relatively narrow: few columns (mainly foreign-key IDs and measures)
  • Change frequently, with many new rows added every day

1) Transaction fact tables

Each transaction or event, e.g. one sales-order record or one payment record, becomes one row in the fact table. Once a transaction is committed and the row inserted, the data is never modified; the table is updated incrementally.

2) Periodic snapshot fact tables

A periodic snapshot fact table does not keep every record, only data at fixed time intervals, such as daily or monthly sales, or month-end account balances.

Take the shopping cart: items are added and removed constantly, but what we usually care about is how many items it holds at the end of each day, which is what later analysis needs.

3) Accumulating snapshot fact tables

An accumulating snapshot fact table tracks the progress of a business process. For example, the warehouse may need to store the timestamp of each stage of an order's lifecycle, from order placement through packing, shipping, and delivery, to track its progress. As the business process advances, the fact table row is updated continually.

| Order ID | User ID | Order Time | Packing Time | Shipping Time | Delivery Time | Order Amount |
|----------|---------|------------|--------------|---------------|---------------|--------------|
| | | 3-8 | 3-8 | 3-9 | 3-10 | |

2.4 Types of Dimensional Models

Dimensional modeling further divides into three model shapes: the star schema, the snowflake schema, and the fact constellation (galaxy) schema.

2.5 Data Warehouse Modeling (Absolutely Critical)

2.5.1 ODS Layer

1) User behavior data on HDFS

2) Business data on HDFS

3) How should the user behavior data and business data on HDFS be handled?

(1) Keep the data exactly as received, with no modification, so the layer doubles as a backup.

(2) Compress the data to save disk space (e.g. 100 GB of raw data may compress to roughly 10 GB).

(3) Create partitioned tables to prevent later full-table scans.
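The roughly 10x compression figure depends on the data, but log lines are highly repetitive and compress well. A minimal sketch, using gzip as a stand-in for the LZO codec used on the cluster:

```shell
# Generate repetitive log-like text and compare raw vs. compressed sizes.
# gzip stands in here for the cluster's LZO codec; ratios will differ.
tmp=$(mktemp -d)
for i in $(seq 1 5000); do
  echo "2020-06-14 10:00:00 INFO page_view uid=$((i % 100)) page=home"
done > "$tmp/app.log"

raw=$(wc -c < "$tmp/app.log")
gzip -c "$tmp/app.log" > "$tmp/app.log.gz"
packed=$(wc -c < "$tmp/app.log.gz")

echo "raw=$raw compressed=$packed"
rm -r "$tmp"
```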

2.5.2 DIM and DWD Layers

The DIM and DWD layers are where the dimensional model is built. Each business process is generally modeled as a star schema; taken together, the models usually form a fact constellation.

Dimensional modeling generally follows four steps:

select the business process → declare the grain → identify the dimensions → identify the facts

(1) Select the business process

From the operational systems, pick the business processes of interest, such as ordering, payment, refunds, or logistics; each business process corresponds to one fact table.

(2) Declare the grain

The grain is the level of detail or aggregation at which data is stored in the warehouse.

Declaring the grain means precisely defining what one row of the fact table represents. Choose the finest grain possible, so the table can serve the widest range of future needs.

Typical grain declarations:

One row of the order fact table represents one line item (one product) of one order.

One row of the payment fact table represents one payment record.

(3) Identify the dimensions

Dimensions describe the business facts, answering questions such as "who, where, when".

The guiding principle for choosing dimensions: will later requirements analyze metrics along this dimension? For example, to answer "at what times are the most orders placed", "which regions place the most orders", and "which users place the most orders", the required dimensions are time, region, and user.

(4) Identify the facts

"Facts" here means the business measures (counts, quantities, amounts, anything that can be summed), such as order amount or number of orders.

In the DWD layer, modeling is driven by business process: for each concrete business process, build a detail-level fact table at the finest grain. Fact tables may be moderately widened (denormalized).

How fact tables join to dimension tables is flexible, but to support more complex needs, join in every dimension that can be joined.

| | Time | User | Region | Product | Coupon | Activity | Measures |
|---------------|------|------|--------|---------|--------|----------|---------------------------------------------------|
| Order | √ | √ | √ | | | | freight / discount amount / original amount / final amount |
| Order detail | √ | √ | √ | √ | √ | √ | quantity / discount amount / original amount / final amount |
| Payment | √ | √ | √ | | | | payment amount |
| Add to cart | √ | √ | | √ | | | quantity / amount |
| Favorite | √ | √ | | √ | | | count |
| Review | √ | √ | | √ | | | count |
| Order return | √ | √ | √ | √ | | | quantity / amount |
| Refund | √ | √ | √ | √ | | | quantity / amount |
| Coupon claim | √ | √ | | | √ | | count |

At this point the warehouse's dimensional modeling is complete. The DWD layer is driven by business process; the DWS, DWT, and ADS layers are driven by requirements and no longer tied to dimensional modeling. DWS and DWT both build wide tables, organized by subject area. A subject is an angle from which to view the data, and corresponds to a dimension table.

2.5.3 DWS and DWT Layers

The DWS and DWT layers are collectively known as the wide-table layers. Their design ideas are largely the same; the case below illustrates them.

1) The problem: two requirements, counting the orders per province and totaling the order amount per province.

2) Naive approach: both join the province table with the order table, group by province, and aggregate. The same data is computed twice, and in practice there are many more such overlapping needs.

So how can the design avoid the repeated computation?

For the scenario above, design one region wide table keyed by region ID, with columns such as order count, order amount, payment count, payment amount, and so on. All of these metrics are computed together and saved into the wide table, effectively eliminating duplicate computation.
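The one-pass idea can be sketched outside Hive with awk over a toy order fact (the two columns, province ID and amount, are hypothetical): both wide-table metrics come out of a single scan instead of two separate jobs.

```shell
# Toy order fact: "province_id amount" per row.
orders='1 100.00
1 50.00
2 30.00'

# One pass builds the per-province wide row: order_count and order_amount together.
echo "$orders" | awk '
  { cnt[$1]++; amt[$1] += $2 }
  END { for (p in cnt) printf "%s\t%d\t%.2f\n", p, cnt[p], amt[p] }
' | sort
```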

3) Summary:

(1) Which wide tables to build: one per dimension.

(2) What columns a wide table contains: the fact tables viewed from that dimension's angle, focusing on the aggregated measures.

(3) The difference between DWS and DWT: the DWS layer stores each subject's same-day aggregated behavior, e.g. each region's order count and order amount for the day; the DWT layer stores each subject's accumulated behavior, e.g. each region's order count and order amount over the last 7 (or 15, 30, 60) days.
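The DWS/DWT distinction can be sketched with awk over toy fact rows (hypothetical layout: date, region, amount): the first pass produces one row per region per day (DWS-style), the second accumulates each region over the whole window (DWT-style).

```shell
# Toy fact rows: "date region amount".
rows='2020-06-13 hubei 10
2020-06-14 hubei 20
2020-06-14 hunan 5'

# DWS-style: one aggregated row per region per day.
echo "$rows" | awk '{ day[$1" "$2] += $3 } END { for (k in day) print k, day[k] }' | sort

# DWT-style: one accumulated row per region over the whole window.
echo "$rows" | awk '{ total[$2] += $3 } END { for (r in total) print r, total[r] }' | sort
```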

2.5.4 ADS Layer

Analyzes the key metrics of each major subject area of the e-commerce system.

Chapter 3: Building the Warehouse Environment

3.1 Hive Environment Setup

3.1.1 Hive Execution Engines

Hive's execution engines: MapReduce (the default), Tez, and Spark.

Hive on Spark: Hive both stores the metadata and handles SQL parsing and optimization; the syntax is HQL; the execution engine is Spark, which runs the job as RDD operations.

Spark on Hive: Hive only stores the metadata; Spark handles SQL parsing and optimization; the syntax is Spark SQL; Spark executes with RDDs.

3.1.2 Hive on Spark Configuration

1) Compatibility notes

Note: Hive 3.1.2 and Spark 3.0.0 as downloaded from the official sites are not compatible by default, because Hive 3.1.2 was built against Spark 2.4.5. Hive 3.1.2 therefore has to be recompiled.

Build steps: download the Hive 3.1.2 source from the official site and change the Spark version referenced in the pom file to 3.0.0. If it compiles, package it and take the jars. If it fails, fix the methods named in the errors until the build succeeds, then package and take the jars.
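The pom edit itself can be scripted. A hedged sketch on a toy pom fragment (assuming the Spark version is held in a `<spark.version>` property, as in Hive's root pom.xml; the mvn command is shown as a comment, not run here):

```shell
# Toy pom fragment standing in for Hive 3.1.2's root pom.xml.
cat > pom-fragment.xml <<'EOF'
<properties>
  <spark.version>2.4.5</spark.version>
</properties>
EOF

# Bump the referenced Spark version to 3.0.0 before rebuilding.
sed -i 's|<spark.version>.*</spark.version>|<spark.version>3.0.0</spark.version>|' pom-fragment.xml
grep spark.version pom-fragment.xml

# Afterwards, rebuild and package (not run here):
# mvn clean package -DskipTests
```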

2) Deploy Spark on the Hive node

If Spark is already deployed, this step can be skipped, but check that the SPARK_HOME environment variable is configured correctly.

(1) Download the Spark binary from the official site:

Downloads | Apache Spark

(2) Upload and extract spark-3.0.0-bin-hadoop3.2.tgz

[seven@hadoop102 software]$ tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module/
[seven@hadoop102 software]$ mv /opt/module/spark-3.0.0-bin-hadoop3.2 /opt/module/spark

(3) Configure the SPARK_HOME environment variable

[seven@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh

Add the following content:

```bash
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
```

Source the file so it takes effect:

[seven@hadoop102 software]$ source /etc/profile.d/my_env.sh

3) Create a Spark configuration file in Hive

[seven@hadoop102 software]$ vim /opt/module/hive/conf/spark-defaults.conf

Add the following content (jobs will execute with these parameters):

```bash
spark.master                     yarn
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://hadoop102:8020/spark-history
spark.executor.memory            1g
spark.driver.memory              1g
```

Create the following HDFS path for storing history logs:

[seven@hadoop102 software]$ hadoop fs -mkdir /spark-history

4) Upload the pure Spark jars to HDFS

Note 1: the regular Spark 3.0.0 distribution is built with Hive 2.3.7 support, so using it directly conflicts with the installed Hive 3.1.2. The "pure" Spark jars contain no Hadoop or Hive dependencies and avoid the conflict.

Note 2: Hive jobs are ultimately executed by Spark, and Spark job resources are scheduled by Yarn, so a job may be assigned to any node in the cluster. The Spark dependencies therefore have to be uploaded to an HDFS path that every node can read.

(1) Upload and extract spark-3.0.0-bin-without-hadoop.tgz

[seven@hadoop102 software]$ tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz

(2) Upload the pure Spark jars to HDFS

[seven@hadoop102 software]$ hadoop fs -mkdir /spark-jars
[seven@hadoop102 software]$ hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars

5) Modify the hive-site.xml file

[seven@hadoop102 ~]$ vim /opt/module/hive/conf/hive-site.xml

Add the following content:

```XML
<!-- Location of the Spark dependencies on HDFS -->
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://hadoop102:8020/spark-jars/*</value>
</property>

<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>
```

3.1.3 Testing Hive on Spark

(1) Start the Hive client

[seven@hadoop102 hive]$ bin/hive

(2) Create a test table

hive (default)> create table student(id int, name string);

(3) Test via an insert

hive (default)> insert into table student values(1,'abc');

If the result looks like the following, the configuration succeeded:

![](https://i-blog.csdnimg.cn/direct/0efd2a1751374fb69707cf698aa00c34.png)

3.2 Yarn Configuration

3.2.1 Increasing the ApplicationMaster Resource Ratio

The capacity scheduler limits the resources that the ApplicationMasters running concurrently in each queue may occupy. The limit is set by the yarn.scheduler.capacity.maximum-am-resource-percent parameter, whose default is 0.1: the ApplicationMasters in a queue may use at most 10% of that queue's total resources. The goal is to prevent ApplicationMasters from occupying most of the resources and leaving the Map/Reduce tasks unable to run.

In production this parameter can be left at its default. In a learning environment, though, total cluster resources are small; if only 10% goes to ApplicationMasters, it may turn out that only one job can run at a time, because a single ApplicationMaster may already hit the 10% cap. So here the value can be raised appropriately.

(1) In the /opt/module/hadoop-3.1.3/etc/hadoop/capacity-scheduler.xml file on hadoop102, modify the following parameter value

[seven@hadoop102 hadoop]$ vim capacity-scheduler.xml

```XML
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.8</value>
</property>
```

Chapter 4: Building the Warehouse: ODS Layer

4.1 ODS Layer (User Behavior Data)

4.1.1 Creating the Log Table ods_log

1) Create the table

```sql
drop table if exists ods_log;
CREATE EXTERNAL TABLE ods_log (`line` string)
PARTITIONED BY (`dt` string) -- partitioned by date
STORED AS -- storage format: reads use LzoTextInputFormat
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_log' -- data location on HDFS
;
```

Hive's LZO compression is documented at: [LanguageManual LZO - Apache Hive - Apache Software Foundation](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO)

(2) Partition plan

![](https://i-blog.csdnimg.cn/direct/4700bf0042cf4e01bd4a8da1985e4ab8.png)

2) Load data

![](https://i-blog.csdnimg.cn/direct/78ab16ac1c744d319ebbcb86de65b9c1.png)

```bash
hive (gmall)> load data inpath '/origin_data/gmall/log/topic_log/2020-06-14' into table ods_log partition(dt='2020-06-14');
```

Note: dates are all formatted as yyyy-MM-dd, the date format Hive supports by default.

3) Create indexes for the LZO-compressed files

```bash
[seven@hadoop102 bin]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_log/dt=2020-06-14
```

4.1.2 Single vs. Double Quotes in Shell

1) Create a test.sh file in /home/seven/bin

[seven@hadoop102 bin]$ vim test.sh

Add the following content to the file:

```bash
#!/bin/bash
do_date=$1

echo '$do_date'
echo "$do_date"
echo "'$do_date'"
echo '"$do_date"'
echo `date`
```

2) Run it and inspect the output

```bash
[seven@hadoop102 bin]$ test.sh 2020-06-14
$do_date
2020-06-14
'2020-06-14'
"$do_date"
2020年 06月 18日 星期四 21:02:08 CST
```

3) Summary:

(1) Single quotes do not expand variables.
(2) Double quotes expand variables.
(3) Backquotes (`) execute the quoted command.
(4) Single quotes nested inside double quotes: the variable is expanded.
(5) Double quotes nested inside single quotes: the variable is not expanded.

4.1.3 ODS Log Table Data-Loading Script

1) Write the script

(1) Create the script in /home/seven/bin on hadoop102

[seven@hadoop102 bin]$ vim hdfs_to_ods_log.sh

Write the following content into the script:

```bash
#!/bin/bash

# Define a variable for easy modification
APP=gmall

# If a date was passed in, use it; otherwise use yesterday's date
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo ================== log date: $do_date ==================
sql="
load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log partition(dt='$do_date');
"
hive -e "$sql"

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/$APP/ods/ods_log/dt=$do_date
```

Note 1: [ -n value ] tests whether the value is non-empty: non-empty returns true, empty returns false. Caution: [ -n value ] does not expand its argument by itself; when using it, wrap the variable in double quotes (" ").

Note 2: for usage of the date command, see date --help.

(2) Grant execute permission

[seven@hadoop102 bin]$ chmod 777 hdfs_to_ods_log.sh

2) Use the script

(1) Run the script

[seven@hadoop102 module]$ hdfs_to_ods_log.sh 2020-06-14

(2) Check that the data was imported

4.2 ODS Layer (Business Data)

The partition plan for the ODS business tables is as follows:

![](https://i-blog.csdnimg.cn/direct/0e198a50bb024d58ab4c7e947d739110.png)

The loading approach for the ODS business tables is as follows:

![](https://i-blog.csdnimg.cn/direct/c0c07f9898244c60874e72571d24c299.png)

4.2.1 Activity Info Table

```sql
DROP TABLE IF EXISTS ods_activity_info;
CREATE EXTERNAL TABLE ods_activity_info(
    `id` STRING COMMENT '编号',
    `activity_name` STRING COMMENT '活动名称',
    `activity_type` STRING COMMENT '活动类型',
    `start_time` STRING COMMENT '开始时间',
    `end_time` STRING COMMENT '结束时间',
    `create_time` STRING COMMENT '创建时间'
) COMMENT '活动信息表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_info/';
```

4.2.2 Activity Rule Table

```sql
DROP TABLE IF EXISTS
ods_activity_rule; CREATE EXTERNAL TABLE ods_activity_rule( `id` STRING COMMENT '编号', `activity_id` STRING COMMENT '活动ID', `activity_type` STRING COMMENT '活动类型', `condition_amount` DECIMAL(16,2) COMMENT '满减金额', `condition_num` BIGINT COMMENT '满减件数', `benefit_amount` DECIMAL(16,2) COMMENT '优惠金额', `benefit_discount` DECIMAL(16,2) COMMENT '优惠折扣', `benefit_level` STRING COMMENT '优惠级别' ) COMMENT '活动规则表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_activity_rule/'; ``` #### **4** **.2.3** **一级品类表** ```sql DROP TABLE IF EXISTS ods_base_category1; CREATE EXTERNAL TABLE ods_base_category1( `id` STRING COMMENT 'id', `name` STRING COMMENT '名称' ) COMMENT '商品一级分类表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_base_category1/'; ``` #### **4.** **2** **.4** **二级品类表** ```sql DROP TABLE IF EXISTS ods_base_category2; CREATE EXTERNAL TABLE ods_base_category2( `id` STRING COMMENT ' id', `name` STRING COMMENT '名称', `category1_id` STRING COMMENT '一级品类id' ) COMMENT '商品二级分类表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_base_category2/'; ``` #### **4.** **2** **.5** **三级品类表** ```sql DROP TABLE IF EXISTS ods_base_category3; CREATE EXTERNAL TABLE ods_base_category3( `id` STRING COMMENT ' id', `name` STRING COMMENT '名称', `category2_id` STRING COMMENT '二级品类id' ) COMMENT '商品三级分类表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS 
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_base_category3/'; ``` #### **4.** **2** **.6** **编码字典表** ```sql DROP TABLE IF EXISTS ods_base_dic; CREATE EXTERNAL TABLE ods_base_dic( `dic_code` STRING COMMENT '编号', `dic_name` STRING COMMENT '编码名称', `parent_code` STRING COMMENT '父编码', `create_time` STRING COMMENT '创建日期', `operate_time` STRING COMMENT '操作日期' ) COMMENT '编码字典表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_base_dic/'; ``` #### **4.** **2** **.7** **省份表** ```sql DROP TABLE IF EXISTS ods_base_province; CREATE EXTERNAL TABLE ods_base_province ( `id` STRING COMMENT '编号', `name` STRING COMMENT '省份名称', `region_id` STRING COMMENT '地区ID', `area_code` STRING COMMENT '地区编码', `iso_code` STRING COMMENT 'ISO-3166编码,供可视化使用', `iso_3166_2` STRING COMMENT 'IOS-3166-2编码,供可视化使用' ) COMMENT '省份表' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_base_province/'; ``` #### **4.** **2** **.8** **地区表** ```sql DROP TABLE IF EXISTS ods_base_region; CREATE EXTERNAL TABLE ods_base_region ( `id` STRING COMMENT '编号', `region_name` STRING COMMENT '地区名称' ) COMMENT '地区表' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_base_region/'; ``` #### **4.** **2** **.9** **品牌表** ```sql DROP TABLE IF EXISTS ods_base_trademark; CREATE EXTERNAL TABLE ods_base_trademark ( `id` STRING COMMENT '编号', `tm_name` STRING COMMENT 
'品牌名称' ) COMMENT '品牌表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_base_trademark/'; ``` #### **4.** **2** **.10** **购物车表** ```sql DROP TABLE IF EXISTS ods_cart_info; CREATE EXTERNAL TABLE ods_cart_info( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户id', `sku_id` STRING COMMENT 'skuid', `cart_price` DECIMAL(16,2) COMMENT '放入购物车时价格', `sku_num` BIGINT COMMENT '数量', `sku_name` STRING COMMENT 'sku名称 (冗余)', `create_time` STRING COMMENT '创建时间', `operate_time` STRING COMMENT '修改时间', `is_ordered` STRING COMMENT '是否已经下单', `order_time` STRING COMMENT '下单时间', `source_type` STRING COMMENT '来源类型', `source_id` STRING COMMENT '来源编号' ) COMMENT '加购表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_cart_info/'; ``` #### **4.** **2** **.11** **评论表** ```sql DROP TABLE IF EXISTS ods_comment_info; CREATE EXTERNAL TABLE ods_comment_info( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户ID', `sku_id` STRING COMMENT '商品sku', `spu_id` STRING COMMENT '商品spu', `order_id` STRING COMMENT '订单ID', `appraise` STRING COMMENT '评价', `create_time` STRING COMMENT '评价时间' ) COMMENT '商品评论表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_comment_info/'; ``` #### **4.** **2** **.12** **优惠券信息表** ```sql DROP TABLE IF EXISTS ods_coupon_info; CREATE EXTERNAL TABLE ods_coupon_info( `id` STRING COMMENT '购物券编号', `coupon_name` STRING COMMENT '购物券名称', `coupon_type` STRING COMMENT '购物券类型 
1 现金券 2 折扣券 3 满减券 4 满件打折券',
    `condition_amount` DECIMAL(16,2) COMMENT '满额数',
    `condition_num` BIGINT COMMENT '满件数',
    `activity_id` STRING COMMENT '活动编号',
    `benefit_amount` DECIMAL(16,2) COMMENT '减金额',
    `benefit_discount` DECIMAL(16,2) COMMENT '折扣',
    `create_time` STRING COMMENT '创建时间',
    `range_type` STRING COMMENT '范围类型 1、商品 2、品类 3、品牌',
    `limit_num` BIGINT COMMENT '最多领用次数',
    `taken_count` BIGINT COMMENT '已领用次数',
    `start_time` STRING COMMENT '开始领取时间',
    `end_time` STRING COMMENT '结束领取时间',
    `operate_time` STRING COMMENT '修改时间',
    `expire_time` STRING COMMENT '过期时间'
) COMMENT '优惠券表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_info/';
```

4.2.13 Coupon Usage Table

```sql
DROP TABLE IF EXISTS ods_coupon_use;
CREATE EXTERNAL TABLE ods_coupon_use(
    `id` STRING COMMENT '编号',
    `coupon_id` STRING COMMENT '优惠券ID',
    `user_id` STRING COMMENT '用户ID',
    `order_id` STRING COMMENT '订单ID',
    `coupon_status` STRING COMMENT '优惠券状态',
    `get_time` STRING COMMENT '领取时间',
    `using_time` STRING COMMENT '使用时间(下单)',
    `used_time` STRING COMMENT '使用时间(支付)',
    `expire_time` STRING COMMENT '过期时间'
) COMMENT '优惠券领用表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_use/';
```

4.2.14 Favorites Table

```sql
DROP TABLE IF EXISTS ods_favor_info;
CREATE EXTERNAL TABLE ods_favor_info(
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户id',
    `sku_id` STRING COMMENT 'skuid',
    `spu_id` STRING COMMENT 'spuid',
    `is_cancel` STRING COMMENT '是否取消',
    `create_time` STRING COMMENT '收藏时间',
    `cancel_time` STRING COMMENT '取消时间'
) COMMENT '商品收藏表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_favor_info/'; ``` #### **4.** **2** **.15** **订单明细表** ```sql DROP TABLE IF EXISTS ods_order_detail; CREATE EXTERNAL TABLE ods_order_detail( `id` STRING COMMENT '编号', `order_id` STRING COMMENT '订单号', `sku_id` STRING COMMENT '商品id', `sku_name` STRING COMMENT '商品名称', `order_price` DECIMAL(16,2) COMMENT '商品价格', `sku_num` BIGINT COMMENT '商品数量', `create_time` STRING COMMENT '创建时间', `source_type` STRING COMMENT '来源类型', `source_id` STRING COMMENT '来源编号', `split_final_amount` DECIMAL(16,2) COMMENT '分摊最终金额', `split_activity_amount` DECIMAL(16,2) COMMENT '分摊活动优惠', `split_coupon_amount` DECIMAL(16,2) COMMENT '分摊优惠券优惠' ) COMMENT '订单详情表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_order_detail/'; ``` #### **4.** **2** **.16** **订单明细活动关联表** ```sql DROP TABLE IF EXISTS ods_order_detail_activity; CREATE EXTERNAL TABLE ods_order_detail_activity( `id` STRING COMMENT '编号', `order_id` STRING COMMENT '订单号', `order_detail_id` STRING COMMENT '订单明细id', `activity_id` STRING COMMENT '活动id', `activity_rule_id` STRING COMMENT '活动规则id', `sku_id` BIGINT COMMENT '商品id', `create_time` STRING COMMENT '创建时间' ) COMMENT '订单详情活动关联表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_order_detail_activity/'; ``` #### **4.** **2** **.17** **订单明细优惠券关联表** ```sql DROP TABLE IF EXISTS ods_order_detail_coupon; CREATE EXTERNAL TABLE ods_order_detail_coupon( `id` STRING COMMENT '编号', `order_id` STRING 
COMMENT '订单号',
    `order_detail_id` STRING COMMENT '订单明细id',
    `coupon_id` STRING COMMENT '优惠券id',
    `coupon_use_id` STRING COMMENT '优惠券领用记录id',
    `sku_id` STRING COMMENT '商品id',
    `create_time` STRING COMMENT '创建时间'
) COMMENT '订单明细优惠券关联表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon/';
```

4.2.18 Order Table

```sql
DROP TABLE IF EXISTS ods_order_info;
CREATE EXTERNAL TABLE ods_order_info (
    `id` STRING COMMENT '订单号',
    `final_amount` DECIMAL(16,2) COMMENT '订单最终金额',
    `order_status` STRING COMMENT '订单状态',
    `user_id` STRING COMMENT '用户id',
    `payment_way` STRING COMMENT '支付方式',
    `delivery_address` STRING COMMENT '送货地址',
    `out_trade_no` STRING COMMENT '支付流水号',
    `create_time` STRING COMMENT '创建时间',
    `operate_time` STRING COMMENT '操作时间',
    `expire_time` STRING COMMENT '过期时间',
    `tracking_no` STRING COMMENT '物流单编号',
    `province_id` STRING COMMENT '省份ID',
    `activity_reduce_amount` DECIMAL(16,2) COMMENT '活动减免金额',
    `coupon_reduce_amount` DECIMAL(16,2) COMMENT '优惠券减免金额',
    `original_amount` DECIMAL(16,2) COMMENT '订单原价金额',
    `feight_fee` DECIMAL(16,2) COMMENT '运费',
    `feight_fee_reduce` DECIMAL(16,2) COMMENT '运费减免'
) COMMENT '订单表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_info/';
```

4.2.19 Order Return Table

```sql
DROP TABLE IF EXISTS ods_order_refund_info;
CREATE EXTERNAL TABLE ods_order_refund_info(
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户ID',
    `order_id` STRING COMMENT '订单ID',
    `sku_id` STRING COMMENT '商品ID',
    `refund_type` STRING COMMENT '退单类型',
    `refund_num` BIGINT COMMENT '退单件数',
    `refund_amount` DECIMAL(16,2) COMMENT '退单金额',
`refund_reason_type` STRING COMMENT '退单原因类型', `refund_status` STRING COMMENT '退单状态',--退单状态应包含买家申请、卖家审核、卖家收货、退款完成等状态。此处未涉及到,故该表按增量处理 `create_time` STRING COMMENT '退单时间' ) COMMENT '退单表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_order_refund_info/'; ``` #### **4.** **2** **.20** **订单状态日志表** ```sql DROP TABLE IF EXISTS ods_order_status_log; CREATE EXTERNAL TABLE ods_order_status_log ( `id` STRING COMMENT '编号', `order_id` STRING COMMENT '订单ID', `order_status` STRING COMMENT '订单状态', `operate_time` STRING COMMENT '修改时间' ) COMMENT '订单状态表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_order_status_log/'; ``` #### **4.** **2** **.21** **支付表** ```sql DROP TABLE IF EXISTS ods_payment_info; CREATE EXTERNAL TABLE ods_payment_info( `id` STRING COMMENT '编号', `out_trade_no` STRING COMMENT '对外业务编号', `order_id` STRING COMMENT '订单编号', `user_id` STRING COMMENT '用户编号', `payment_type` STRING COMMENT '支付类型', `trade_no` STRING COMMENT '交易编号', `payment_amount` DECIMAL(16,2) COMMENT '支付金额', `subject` STRING COMMENT '交易内容', `payment_status` STRING COMMENT '支付状态', `create_time` STRING COMMENT '创建时间', `callback_time` STRING COMMENT '回调时间' ) COMMENT '支付流水表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_payment_info/'; ``` #### **4.** **2** **.22** **退款表** ```sql DROP TABLE IF EXISTS ods_refund_payment; CREATE EXTERNAL TABLE ods_refund_payment( `id` STRING COMMENT '编号', 
`out_trade_no` STRING COMMENT '对外业务编号',
    `order_id` STRING COMMENT '订单编号',
    `sku_id` STRING COMMENT 'SKU编号',
    `payment_type` STRING COMMENT '支付类型',
    `trade_no` STRING COMMENT '交易编号',
    `refund_amount` DECIMAL(16,2) COMMENT '退款金额',
    `subject` STRING COMMENT '交易内容',
    `refund_status` STRING COMMENT '退款状态',
    `create_time` STRING COMMENT '创建时间',
    `callback_time` STRING COMMENT '回调时间'
) COMMENT '退款表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_refund_payment/';
```

4.2.23 SKU Platform Attribute Table

```sql
DROP TABLE IF EXISTS ods_sku_attr_value;
CREATE EXTERNAL TABLE ods_sku_attr_value(
    `id` STRING COMMENT '编号',
    `attr_id` STRING COMMENT '平台属性ID',
    `value_id` STRING COMMENT '平台属性值ID',
    `sku_id` STRING COMMENT '商品ID',
    `attr_name` STRING COMMENT '平台属性名称',
    `value_name` STRING COMMENT '平台属性值名称'
) COMMENT 'sku平台属性表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_attr_value/';
```

4.2.24 Product (SKU) Table

```sql
DROP TABLE IF EXISTS ods_sku_info;
CREATE EXTERNAL TABLE ods_sku_info(
    `id` STRING COMMENT 'skuId',
    `spu_id` STRING COMMENT 'spuid',
    `price` DECIMAL(16,2) COMMENT '价格',
    `sku_name` STRING COMMENT '商品名称',
    `sku_desc` STRING COMMENT '商品描述',
    `weight` DECIMAL(16,2) COMMENT '重量',
    `tm_id` STRING COMMENT '品牌id',
    `category3_id` STRING COMMENT '品类id',
    `is_sale` STRING COMMENT '是否在售',
    `create_time` STRING COMMENT '创建时间'
) COMMENT 'SKU商品表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_info/'; ``` #### **4.** **2** **.25** **商品销售属性表** ```sql DROP TABLE IF EXISTS ods_sku_sale_attr_value; CREATE EXTERNAL TABLE ods_sku_sale_attr_value( `id` STRING COMMENT '编号', `sku_id` STRING COMMENT 'sku_id', `spu_id` STRING COMMENT 'spu_id', `sale_attr_value_id` STRING COMMENT '销售属性值id', `sale_attr_id` STRING COMMENT '销售属性id', `sale_attr_name` STRING COMMENT '销售属性名称', `sale_attr_value_name` STRING COMMENT '销售属性值名称' ) COMMENT 'sku销售属性名称' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value/'; ``` #### **4.** **2** **.26** **商品(SPU)表** ```sql DROP TABLE IF EXISTS ods_spu_info; CREATE EXTERNAL TABLE ods_spu_info( `id` STRING COMMENT 'spuid', `spu_name` STRING COMMENT 'spu名称', `category3_id` STRING COMMENT '品类id', `tm_id` STRING COMMENT '品牌id' ) COMMENT 'SPU商品表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_spu_info/'; ``` #### **4.** **2** **.27** **用户表** ```sql DROP TABLE IF EXISTS ods_user_info; CREATE EXTERNAL TABLE ods_user_info( `id` STRING COMMENT '用户id', `login_name` STRING COMMENT '用户名称', `nick_name` STRING COMMENT '用户昵称', `name` STRING COMMENT '用户姓名', `phone_num` STRING COMMENT '手机号码', `email` STRING COMMENT '邮箱', `user_level` STRING COMMENT '用户等级', `birthday` STRING COMMENT '生日', `gender` STRING COMMENT '性别', `create_time` STRING COMMENT '创建时间', `operate_time` STRING COMMENT '操作时间' ) COMMENT '用户表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/warehouse/gmall/ods/ods_user_info/'; ``` #### **4.** **2** **.28** **ODS层** **业务表首日数据装载脚本** ****1)编写脚本**** (1)在/home/seven/bin目录下创建脚本hdfs_to_ods_db_init.sh \[seven@hadoop102 bin\]$ vim hdfs_to_ods_db_init.sh 在脚本中填写如下内容 ```bash #!/bin/bash APP=gmall if [ -n "$2" ] ;then do_date=$2 else echo "请传入日期参数" exit fi ods_order_info=" load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');" ods_order_detail=" load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');" ods_sku_info=" load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');" ods_user_info=" load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');" ods_payment_info=" load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');" ods_base_category1=" load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');" ods_base_category2=" load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');" ods_base_category3=" load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date'); " ods_base_trademark=" load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date'); " ods_activity_info=" load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date'); " ods_cart_info=" load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE 
into table ${APP}.ods_cart_info partition(dt='$do_date'); " ods_comment_info=" load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date'); " ods_coupon_info=" load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date'); " ods_coupon_use=" load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date'); " ods_favor_info=" load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date'); " ods_order_refund_info=" load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date'); " ods_order_status_log=" load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date'); " ods_spu_info=" load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date'); " ods_activity_rule=" load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');" ods_base_dic=" load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date'); " ods_order_detail_activity=" load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date'); " ods_order_detail_coupon=" load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date'); " ods_refund_payment=" load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date'); " ods_sku_attr_value=" load data inpath 
'/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date'); " ods_sku_sale_attr_value=" load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date'); " ods_base_province=" load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;" ods_base_region=" load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;" case $1 in "ods_order_info"){ hive -e "$ods_order_info" };; "ods_order_detail"){ hive -e "$ods_order_detail" };; "ods_sku_info"){ hive -e "$ods_sku_info" };; "ods_user_info"){ hive -e "$ods_user_info" };; "ods_payment_info"){ hive -e "$ods_payment_info" };; "ods_base_category1"){ hive -e "$ods_base_category1" };; "ods_base_category2"){ hive -e "$ods_base_category2" };; "ods_base_category3"){ hive -e "$ods_base_category3" };; "ods_base_trademark"){ hive -e "$ods_base_trademark" };; "ods_activity_info"){ hive -e "$ods_activity_info" };; "ods_cart_info"){ hive -e "$ods_cart_info" };; "ods_comment_info"){ hive -e "$ods_comment_info" };; "ods_coupon_info"){ hive -e "$ods_coupon_info" };; "ods_coupon_use"){ hive -e "$ods_coupon_use" };; "ods_favor_info"){ hive -e "$ods_favor_info" };; "ods_order_refund_info"){ hive -e "$ods_order_refund_info" };; "ods_order_status_log"){ hive -e "$ods_order_status_log" };; "ods_spu_info"){ hive -e "$ods_spu_info" };; "ods_activity_rule"){ hive -e "$ods_activity_rule" };; "ods_base_dic"){ hive -e "$ods_base_dic" };; "ods_order_detail_activity"){ hive -e "$ods_order_detail_activity" };; "ods_order_detail_coupon"){ hive -e "$ods_order_detail_coupon" };; "ods_refund_payment"){ hive -e "$ods_refund_payment" };; "ods_sku_attr_value"){ hive -e "$ods_sku_attr_value" };; "ods_sku_sale_attr_value"){ hive -e "$ods_sku_sale_attr_value" };; "ods_base_province"){ hive -e "$ods_base_province" };; 
"ods_base_region"){ hive -e "$ods_base_region" };; "all"){ hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value$ods_base_province$ods_base_region" };; esac ``` (2)增加执行权限 \[seven@hadoop102 bin\]$ chmod +x hdfs_to_ods_db_init.sh ****2)脚本使用**** (1)执行脚本 \[seven@hadoop102 bin\]$ hdfs_to_ods_db_init.sh all 2020-06-14 (2)查看数据是否导入成功 #### **4.** **2** **.2** **9 ODS层** **业务表每日数据装载脚本** ****1)编写脚本**** (1)在/home/seven/bin目录下创建脚本hdfs_to_ods_db.sh \[seven@hadoop102 bin\]$ vim hdfs_to_ods_db.sh 在脚本中填写如下内容 ```bash #!/bin/bash APP=gmall # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天 if [ -n "$2" ] ;then do_date=$2 else do_date=`date -d "-1 day" +%F` fi ods_order_info=" load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');" ods_order_detail=" load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');" ods_sku_info=" load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');" ods_user_info=" load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');" ods_payment_info=" load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');" ods_base_category1=" load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');" ods_base_category2=" load data inpath 
'/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');" ods_base_category3=" load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date'); " ods_base_trademark=" load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date'); " ods_activity_info=" load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date'); " ods_cart_info=" load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date'); " ods_comment_info=" load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date'); " ods_coupon_info=" load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date'); " ods_coupon_use=" load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date'); " ods_favor_info=" load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date'); " ods_order_refund_info=" load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date'); " ods_order_status_log=" load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date'); " ods_spu_info=" load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date'); " ods_activity_rule=" load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');" ods_base_dic=" load 
data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date'); " ods_order_detail_activity=" load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date'); " ods_order_detail_coupon=" load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date'); " ods_refund_payment=" load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date'); " ods_sku_attr_value=" load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date'); " ods_sku_sale_attr_value=" load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date'); " ods_base_province=" load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;" ods_base_region=" load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;" case $1 in "ods_order_info"){ hive -e "$ods_order_info" };; "ods_order_detail"){ hive -e "$ods_order_detail" };; "ods_sku_info"){ hive -e "$ods_sku_info" };; "ods_user_info"){ hive -e "$ods_user_info" };; "ods_payment_info"){ hive -e "$ods_payment_info" };; "ods_base_category1"){ hive -e "$ods_base_category1" };; "ods_base_category2"){ hive -e "$ods_base_category2" };; "ods_base_category3"){ hive -e "$ods_base_category3" };; "ods_base_trademark"){ hive -e "$ods_base_trademark" };; "ods_activity_info"){ hive -e "$ods_activity_info" };; "ods_cart_info"){ hive -e "$ods_cart_info" };; "ods_comment_info"){ hive -e "$ods_comment_info" };; "ods_coupon_info"){ hive -e "$ods_coupon_info" };; "ods_coupon_use"){ hive -e "$ods_coupon_use" };; "ods_favor_info"){ hive -e 
"$ods_favor_info" };; "ods_order_refund_info"){ hive -e "$ods_order_refund_info" };; "ods_order_status_log"){ hive -e "$ods_order_status_log" };; "ods_spu_info"){ hive -e "$ods_spu_info" };; "ods_activity_rule"){ hive -e "$ods_activity_rule" };; "ods_base_dic"){ hive -e "$ods_base_dic" };; "ods_order_detail_activity"){ hive -e "$ods_order_detail_activity" };; "ods_order_detail_coupon"){ hive -e "$ods_order_detail_coupon" };; "ods_refund_payment"){ hive -e "$ods_refund_payment" };; "ods_sku_attr_value"){ hive -e "$ods_sku_attr_value" };; "ods_sku_sale_attr_value"){ hive -e "$ods_sku_sale_attr_value" };; "all"){ hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value" };; esac ``` (2)修改权限 \[seven@hadoop102 bin\]$ chmod +x hdfs_to_ods_db.sh ****2)脚本使用**** (1)执行脚本 \[seven@hadoop102 bin\]$ hdfs_to_ods_db.sh all 2020-06-14 (2)查看数据是否导入成功 ## **第5章 数仓搭建-DIM层** ### **5.1 商品维度表(全量)** ****1.建表语句**** ```sql DROP TABLE IF EXISTS dim_sku_info; CREATE EXTERNAL TABLE dim_sku_info ( `id` STRING COMMENT '商品id', `price` DECIMAL(16,2) COMMENT '商品价格', `sku_name` STRING COMMENT '商品名称', `sku_desc` STRING COMMENT '商品描述', `weight` DECIMAL(16,2) COMMENT '重量', `is_sale` BOOLEAN COMMENT '是否在售', `spu_id` STRING COMMENT 'spu编号', `spu_name` STRING COMMENT 'spu名称', `category3_id` STRING COMMENT '三级分类id', `category3_name` STRING COMMENT '三级分类名称', `category2_id` STRING COMMENT '二级分类id', `category2_name` STRING COMMENT '二级分类名称', `category1_id` STRING COMMENT '一级分类id', `category1_name` STRING COMMENT '一级分类名称', `tm_id` STRING COMMENT '品牌id', `tm_name` STRING COMMENT '品牌名称', `sku_attr_values` ARRAY> COMMENT 
'平台属性',
    `sku_sale_attr_values` ARRAY<STRUCT<sale_attr_id:STRING,sale_attr_value_id:STRING,sale_attr_name:STRING,sale_attr_value_name:STRING>> COMMENT '销售属性',
    `create_time` STRING COMMENT '创建时间'
) COMMENT '商品维度表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_sku_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

**2.分区规划**

![](https://i-blog.csdnimg.cn/direct/3cfdddcd4ed644be86004a8872be9ddf.png)

**3.数据装载**

![](https://i-blog.csdnimg.cn/direct/35c8bec67d614f90890bf6d24e4bde48.png)

**1)Hive读取索引文件问题**

(1)用两种方式分别查询数据有多少行

hive (gmall)\> select \* from ods_log;
Time taken: 0.706 seconds, Fetched: 2955 row(s)

hive (gmall)\> select count(\*) from ods_log;
2959

(2)两次查询结果不一致

原因:select \* from ods_log 不执行MR操作,直接采用 ods_log 建表语句中指定的 DeprecatedLzoTextInputFormat,能够识别 lzo.index 为索引文件,不会把索引内容当成数据。

select count(\*) from ods_log 会执行MR操作,输入先经过 hive.input.format,其默认值为 CombineHiveInputFormat。它会把索引文件当成小文件与数据文件合并,当做普通文件处理,于是索引内容也被计入行数。更严重的是,这还会导致LZO文件无法切片。

hive (gmall)\> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

解决办法:将 CombineHiveInputFormat 修改为 HiveInputFormat

hive (gmall)\> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

**2)首日装载**

```sql
with
sku as
(
    select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
    from ods_sku_info where dt='2020-06-14'
),
spu as
(
    select id, spu_name from ods_spu_info where dt='2020-06-14'
),
c3 as
(
    select id, name, category2_id from ods_base_category3 where dt='2020-06-14'
),
c2 as
(
    select id, name, category1_id from ods_base_category2 where dt='2020-06-14'
),
c1 as
(
    select id, name from ods_base_category1 where dt='2020-06-14'
),
tm as
(
    select id, tm_name from ods_base_trademark where dt='2020-06-14'
),
attr as
(
    select
        sku_id,
        collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
    from ods_sku_attr_value
    where dt='2020-06-14'
    group by sku_id
),
sale_attr as
(
    select
        sku_id,
collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs from ods_sku_sale_attr_value where dt='2020-06-14' group by sku_id ) insert overwrite table dim_sku_info partition(dt='2020-06-14') select sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name, sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name, attr.attrs, sale_attr.sale_attrs, sku.create_time from sku left join spu on sku.spu_id=spu.id left join c3 on sku.category3_id=c3.id left join c2 on c3.category2_id=c2.id left join c1 on c2.category1_id=c1.id left join tm on sku.tm_id=tm.id left join attr on sku.id=attr.sku_id left join sale_attr on sku.id=sale_attr.sku_id; ``` ****3)每日装载**** ```sql with sku as ( select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time from ods_sku_info where dt='2020-06-15' ), spu as ( select id, spu_name from ods_spu_info where dt='2020-06-15' ), c3 as ( select id, name, category2_id from ods_base_category3 where dt='2020-06-15' ), c2 as ( select id, name, category1_id from ods_base_category2 where dt='2020-06-15' ), c1 as ( select id, name from ods_base_category1 where dt='2020-06-15' ), tm as ( select id, tm_name from ods_base_trademark where dt='2020-06-15' ), attr as ( select sku_id, collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs from ods_sku_attr_value where dt='2020-06-15' group by sku_id ), sale_attr as ( select sku_id, collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs from ods_sku_sale_attr_value where dt='2020-06-15' group by sku_id ) insert overwrite table dim_sku_info partition(dt='2020-06-15') select sku.id, sku.price, 
sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name, sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name, attr.attrs, sale_attr.sale_attrs, sku.create_time from sku left join spu on sku.spu_id=spu.id left join c3 on sku.category3_id=c3.id left join c2 on c3.category2_id=c2.id left join c1 on c2.category1_id=c1.id left join tm on sku.tm_id=tm.id left join attr on sku.id=attr.sku_id left join sale_attr on sku.id=sale_attr.sku_id; ``` ### **5.** **2** **优惠券维度表(全量)** ****1.建表语句**** ```sql DROP TABLE IF EXISTS dim_coupon_info; CREATE EXTERNAL TABLE dim_coupon_info( `id` STRING COMMENT '购物券编号', `coupon_name` STRING COMMENT '购物券名称', `coupon_type` STRING COMMENT '购物券类型 1 现金券 2 折扣券 3 满减券 4 满件打折券', `condition_amount` DECIMAL(16,2) COMMENT '满额数', `condition_num` BIGINT COMMENT '满件数', `activity_id` STRING COMMENT '活动编号', `benefit_amount` DECIMAL(16,2) COMMENT '减金额', `benefit_discount` DECIMAL(16,2) COMMENT '折扣', `create_time` STRING COMMENT '创建时间', `range_type` STRING COMMENT '范围类型 1、商品 2、品类 3、品牌', `limit_num` BIGINT COMMENT '最多领取次数', `taken_count` BIGINT COMMENT '已领取次数', `start_time` STRING COMMENT '可以领取的开始日期', `end_time` STRING COMMENT '可以领取的结束日期', `operate_time` STRING COMMENT '修改时间', `expire_time` STRING COMMENT '过期时间' ) COMMENT '优惠券维度表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dim/dim_coupon_info/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` ****2**** ****.**** ****分区规划**** ![](https://i-blog.csdnimg.cn/direct/8b1a0fda63c847a2a2e3b49b0c5be468.png) ****3**** ****.数据装载**** ![](https://i-blog.csdnimg.cn/direct/2d7b267ee9554651aebeddf50c99b9b5.png) ****1)首日装载**** ```sql insert overwrite table dim_coupon_info partition(dt='2020-06-14') select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time from ods_coupon_info 
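-- 注:优惠券维度表按全量快照方式装载,每日装载与首日装载逻辑相同,仅分区日期不同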
where dt='2020-06-14'; ``` ****2)每日装载**** ```sql insert overwrite table dim_coupon_info partition(dt='2020-06-15') select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time from ods_coupon_info where dt='2020-06-15'; ``` ### **5.** **3** **活动维度表(全量)** ****1.建表语句**** ```sql DROP TABLE IF EXISTS dim_activity_rule_info; CREATE EXTERNAL TABLE dim_activity_rule_info( `activity_rule_id` STRING COMMENT '活动规则ID', `activity_id` STRING COMMENT '活动ID', `activity_name` STRING COMMENT '活动名称', `activity_type` STRING COMMENT '活动类型', `start_time` STRING COMMENT '开始时间', `end_time` STRING COMMENT '结束时间', `create_time` STRING COMMENT '创建时间', `condition_amount` DECIMAL(16,2) COMMENT '满减金额', `condition_num` BIGINT COMMENT '满减件数', `benefit_amount` DECIMAL(16,2) COMMENT '优惠金额', `benefit_discount` DECIMAL(16,2) COMMENT '优惠折扣', `benefit_level` STRING COMMENT '优惠级别' ) COMMENT '活动信息表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dim/dim_activity_rule_info/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` ****2**** ****.**** ****分区规划**** ![](https://i-blog.csdnimg.cn/direct/e71aad55856f465886adf5d5f5a9e95a.png) ****3**** ****.数据装载**** ![](https://i-blog.csdnimg.cn/direct/90995e8162f1438abb68dc8f67252f89.png) ****1)首日装载**** ```sql insert overwrite table dim_activity_rule_info partition(dt='2020-06-14') select ar.id, ar.activity_id, ai.activity_name, ar.activity_type, ai.start_time, ai.end_time, ai.create_time, ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level from ( select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level from ods_activity_rule where dt='2020-06-14' )ar left join ( select id, activity_name, start_time, end_time, create_time from ods_activity_info where dt='2020-06-14' )ai on 
ar.activity_id=ai.id;
```

**2)每日装载**

```sql
insert overwrite table dim_activity_rule_info partition(dt='2020-06-15')
select
    ar.id,
    ar.activity_id,
    ai.activity_name,
    ar.activity_type,
    ai.start_time,
    ai.end_time,
    ai.create_time,
    ar.condition_amount,
    ar.condition_num,
    ar.benefit_amount,
    ar.benefit_discount,
    ar.benefit_level
from
(
    select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level
    from ods_activity_rule where dt='2020-06-15'
)ar
left join
(
    select id, activity_name, start_time, end_time, create_time
    from ods_activity_info where dt='2020-06-15'
)ai on ar.activity_id=ai.id;
```

### **5.4 地区维度表(特殊)**

**1.建表语句**

```sql
DROP TABLE IF EXISTS dim_base_province;
CREATE EXTERNAL TABLE dim_base_province
(
    `id` STRING COMMENT 'id',
    `province_name` STRING COMMENT '省市名称',
    `area_code` STRING COMMENT '地区编码',
    `iso_code` STRING COMMENT 'ISO-3166编码,供可视化使用',
    `iso_3166_2` STRING COMMENT 'ISO-3166-2编码,供可视化使用',
    `region_id` STRING COMMENT '地区id',
    `region_name` STRING COMMENT '地区名称'
) COMMENT '地区维度表'
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_base_province/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

**2.数据装载**

地区维度表数据相对稳定,变化概率较低,故无需每日装载。

![](https://i-blog.csdnimg.cn/direct/00e7ffaeff814d6ba26e967503be9d1d.png)

```sql
insert overwrite table dim_base_province
select
    bp.id,
    bp.name,
    bp.area_code,
    bp.iso_code,
    bp.iso_3166_2,
    bp.region_id,
    br.region_name
from ods_base_province bp
join ods_base_region br on bp.region_id = br.id;
```

### **5.5 时间维度表(特殊)**

**1.建表语句**

```sql
DROP TABLE IF EXISTS dim_date_info;
CREATE EXTERNAL TABLE dim_date_info(
    `date_id` STRING COMMENT '日',
    `week_id` STRING COMMENT '周ID',
    `week_day` STRING COMMENT '周几',
    `day` STRING COMMENT '每月的第几天',
    `month` STRING COMMENT '第几月',
    `quarter` STRING COMMENT '第几季度',
    `year` STRING COMMENT '年',
    `is_workday` STRING COMMENT '是否是工作日',
    `holiday_id` STRING COMMENT '节假日'
) COMMENT '时间维度表'
STORED AS PARQUET
LOCATION
'/warehouse/gmall/dim/dim_date_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

**2.数据装载**

通常情况下,时间维度表的数据并不是来自于业务系统,而是手动写入。并且由于时间维度表数据的可预见性,无须每日导入,一般可一次性导入一年的数据。

1)创建临时表

```sql
DROP TABLE IF EXISTS tmp_dim_date_info;
CREATE EXTERNAL TABLE tmp_dim_date_info (
    `date_id` STRING COMMENT '日',
    `week_id` STRING COMMENT '周ID',
    `week_day` STRING COMMENT '周几',
    `day` STRING COMMENT '每月的第几天',
    `month` STRING COMMENT '第几月',
    `quarter` STRING COMMENT '第几季度',
    `year` STRING COMMENT '年',
    `is_workday` STRING COMMENT '是否是工作日',
    `holiday_id` STRING COMMENT '节假日'
) COMMENT '时间维度表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/tmp/tmp_dim_date_info/';
```

2)将数据文件上传到HDFS上临时表指定路径/warehouse/gmall/tmp/tmp_dim_date_info/

![](https://i-blog.csdnimg.cn/direct/479724b4dae846a6ba0f87a8b9047c8a.png)

其内容如下:

```bash
2020-01-01 1 3 1 1 1 2020 0 元旦
2020-01-02 1 4 2 1 1 2020 1 \N
2020-01-03 1 5 3 1 1 2020 1 \N
2020-01-04 1 6 4 1 1 2020 0 \N
```

3)执行以下语句将其导入时间维度表

insert overwrite table dim_date_info select \* from tmp_dim_date_info;

4)检查数据是否导入成功

select \* from dim_date_info;

### **5.6 用户维度表(拉链表)**

#### **5.6.1 拉链表概述**

**1)什么是拉链表**

![](https://i-blog.csdnimg.cn/direct/ac45d2f0e30e4d0aa21344b7c3cb24ed.png)

**2)为什么要做拉链表**

![](https://i-blog.csdnimg.cn/direct/f17c420e97084b739fa2ff9d6b8fb35b.png)

**3)如何使用拉链表**

![](https://i-blog.csdnimg.cn/direct/94468b2d163645669f94d438e8f9b5a3.png)

**4)拉链表形成过程**

![](https://i-blog.csdnimg.cn/direct/a59bbc4bc18540788ee17cc2b5d9d29e.png)

#### **5.6.2 制作拉链表**

**1.建表语句**

```sql
DROP TABLE IF EXISTS dim_user_info;
CREATE EXTERNAL TABLE dim_user_info(
    `id` STRING COMMENT '用户id',
    `login_name` STRING COMMENT '用户名称',
    `nick_name` STRING COMMENT '用户昵称',
    `name` STRING COMMENT '用户姓名',
    `phone_num` STRING COMMENT '手机号码',
    `email` STRING COMMENT '邮箱',
    `user_level` STRING COMMENT '用户等级',
    `birthday` STRING COMMENT '生日',
    `gender` STRING COMMENT '性别',
    `create_time` STRING COMMENT '创建时间',
    `operate_time` STRING
COMMENT '操作时间',
    `start_date` STRING COMMENT '开始日期',
    `end_date` STRING COMMENT '结束日期'
) COMMENT '用户表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dim/dim_user_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

**2.分区规划**

![](https://i-blog.csdnimg.cn/direct/6278b9dc52f6453ea0d284cace3935ad.png)

**3.数据装载**

![](https://i-blog.csdnimg.cn/direct/e2d97ffc6bf64690ab3312770195c00b.png)

![](https://i-blog.csdnimg.cn/direct/fe13e1aeeb004f65b59ab40a60df740a.png)

**1)首日装载**

拉链表首日装载需要进行初始化操作,具体工作是将截至初始化当日的全部历史用户一次性导入到拉链表中。目前ods_user_info表的第一个分区,即2020-06-14分区,保存的就是全部历史用户,故将该分区数据进行一定处理后导入拉链表的9999-99-99分区即可。

```sql
insert overwrite table dim_user_info partition(dt='9999-99-99')
select
    id,
    login_name,
    nick_name,
    md5(name),
    md5(phone_num),
    md5(email),
    user_level,
    birthday,
    gender,
    create_time,
    operate_time,
    '2020-06-14',
    '9999-99-99'
from ods_user_info
where dt='2020-06-14';
```

**2)每日装载**

**(1)实现思路**

![](https://i-blog.csdnimg.cn/direct/fdbd23e542d445498bac9348856c1374.png)

![](https://i-blog.csdnimg.cn/direct/e708ab6d627f4d90bb89c9af967e869b.png)

![](https://i-blog.csdnimg.cn/direct/0670c90577b54e09bd0f45b026e195e6.png)

**(2)sql编写**

```sql
with tmp as
(
    select
        old.id old_id,
        old.login_name old_login_name,
        old.nick_name old_nick_name,
        old.name old_name,
        old.phone_num old_phone_num,
        old.email old_email,
        old.user_level old_user_level,
        old.birthday old_birthday,
        old.gender old_gender,
        old.create_time old_create_time,
        old.operate_time old_operate_time,
        old.start_date old_start_date,
        old.end_date old_end_date,
        new.id new_id,
        new.login_name new_login_name,
        new.nick_name new_nick_name,
        new.name new_name,
        new.phone_num new_phone_num,
        new.email new_email,
        new.user_level new_user_level,
        new.birthday new_birthday,
        new.gender new_gender,
        new.create_time new_create_time,
        new.operate_time new_operate_time,
        new.start_date new_start_date,
        new.end_date new_end_date
    from
    (
        select
            id,
            login_name,
            nick_name,
            name,
phone_num, email, user_level, birthday, gender, create_time, operate_time, start_date, end_date from dim_user_info where dt='9999-99-99' )old full outer join ( select id, login_name, nick_name, md5(name) name, md5(phone_num) phone_num, md5(email) email, user_level, birthday, gender, create_time, operate_time, '2020-06-15' start_date, '9999-99-99' end_date from ods_user_info where dt='2020-06-15' )new on old.id=new.id ) insert overwrite table dim_user_info partition(dt) select nvl(new_id,old_id), nvl(new_login_name,old_login_name), nvl(new_nick_name,old_nick_name), nvl(new_name,old_name), nvl(new_phone_num,old_phone_num), nvl(new_email,old_email), nvl(new_user_level,old_user_level), nvl(new_birthday,old_birthday), nvl(new_gender,old_gender), nvl(new_create_time,old_create_time), nvl(new_operate_time,old_operate_time), nvl(new_start_date,old_start_date), nvl(new_end_date,old_end_date), nvl(new_end_date,old_end_date) dt from tmp union all select old_id, old_login_name, old_nick_name, old_name, old_phone_num, old_email, old_user_level, old_birthday, old_gender, old_create_time, old_operate_time, old_start_date, cast(date_add('2020-06-15',-1) as string), cast(date_add('2020-06-15',-1) as string) dt from tmp where new_id is not null and old_id is not null; ``` ### **5.7** **DIM层首日数据装载脚本** ****1)编写脚本**** (1)在/home/seven/bin目录下创建脚本ods_to_dim_db_init.sh \[seven@hadoop102 bin\]$ vim ods_to_dim_db_init.sh 在脚本中填写如下内容 ```bash #!/bin/bash APP=gmall if [ -n "$2" ] ;then do_date=$2 else echo "请传入日期参数" exit fi dim_user_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dim_user_info partition(dt='9999-99-99') select id, login_name, nick_name, md5(name), md5(phone_num), md5(email), user_level, birthday, gender, create_time, operate_time, '$do_date', '9999-99-99' from ${APP}.ods_user_info where dt='$do_date'; " dim_sku_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; with sku as ( select id, price, 
sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time from ${APP}.ods_sku_info where dt='$do_date' ), spu as ( select id, spu_name from ${APP}.ods_spu_info where dt='$do_date' ), c3 as ( select id, name, category2_id from ${APP}.ods_base_category3 where dt='$do_date' ), c2 as ( select id, name, category1_id from ${APP}.ods_base_category2 where dt='$do_date' ), c1 as ( select id, name from ${APP}.ods_base_category1 where dt='$do_date' ), tm as ( select id, tm_name from ${APP}.ods_base_trademark where dt='$do_date' ), attr as ( select sku_id, collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs from ${APP}.ods_sku_attr_value where dt='$do_date' group by sku_id ), sale_attr as ( select sku_id, collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs from ${APP}.ods_sku_sale_attr_value where dt='$do_date' group by sku_id ) insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date') select sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name, sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name, attr.attrs, sale_attr.sale_attrs, sku.create_time from sku left join spu on sku.spu_id=spu.id left join c3 on sku.category3_id=c3.id left join c2 on c3.category2_id=c2.id left join c1 on c2.category1_id=c1.id left join tm on sku.tm_id=tm.id left join attr on sku.id=attr.sku_id left join sale_attr on sku.id=sale_attr.sku_id; " dim_base_province=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dim_base_province select bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name from ${APP}.ods_base_province bp join ${APP}.ods_base_region br on bp.region_id = br.id; " dim_coupon_info=" set 
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date') select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time from ${APP}.ods_coupon_info where dt='$do_date'; " dim_activity_rule_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date') select ar.id, ar.activity_id, ai.activity_name, ar.activity_type, ai.start_time, ai.end_time, ai.create_time, ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level from ( select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level from ${APP}.ods_activity_rule where dt='$do_date' )ar left join ( select id, activity_name, start_time, end_time, create_time from ${APP}.ods_activity_info where dt='$do_date' )ai on ar.activity_id=ai.id; " case $1 in "dim_user_info"){ hive -e "$dim_user_info" };; "dim_sku_info"){ hive -e "$dim_sku_info" };; "dim_base_province"){ hive -e "$dim_base_province" };; "dim_coupon_info"){ hive -e "$dim_coupon_info" };; "dim_activity_rule_info"){ hive -e "$dim_activity_rule_info" };; "all"){ hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info$dim_base_province" };; esac ``` (2)增加执行权限 \[seven@hadoop102 bin\]$ chmod +x ods_to_dim_db_init.sh ****2)脚本使用**** (1)执行脚本 \[seven@hadoop102 bin\]$ ods_to_dim_db_init.sh all 2020-06-14 ****注意:该脚本不包含时间维度表的装载,时间维度表需手动装载数据,参考5**** ****.5**** ****节。**** (2)查看数据是否导入成功 ### **5.** **8** **DIM层每日数据装载脚本** ****1)编写脚本**** (1)在/home/seven/bin目录下创建脚本ods_to_dim_db.sh \[seven@hadoop102 bin\]$ vim ods_to_dim_db.sh 在脚本中填写如下内容 ```bash #!/bin/bash APP=gmall # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天 if [ -n "$2" ] ;then do_date=$2 else do_date=`date -d 
"-1 day" +%F` fi dim_user_info=" set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; with tmp as ( select old.id old_id, old.login_name old_login_name, old.nick_name old_nick_name, old.name old_name, old.phone_num old_phone_num, old.email old_email, old.user_level old_user_level, old.birthday old_birthday, old.gender old_gender, old.create_time old_create_time, old.operate_time old_operate_time, old.start_date old_start_date, old.end_date old_end_date, new.id new_id, new.login_name new_login_name, new.nick_name new_nick_name, new.name new_name, new.phone_num new_phone_num, new.email new_email, new.user_level new_user_level, new.birthday new_birthday, new.gender new_gender, new.create_time new_create_time, new.operate_time new_operate_time, new.start_date new_start_date, new.end_date new_end_date from ( select id, login_name, nick_name, name, phone_num, email, user_level, birthday, gender, create_time, operate_time, start_date, end_date from ${APP}.dim_user_info where dt='9999-99-99' and start_date<'$do_date' )old full outer join ( select id, login_name, nick_name, md5(name) name, md5(phone_num) phone_num, md5(email) email, user_level, birthday, gender, create_time, operate_time, '$do_date' start_date, '9999-99-99' end_date from ${APP}.ods_user_info where dt='$do_date' )new on old.id=new.id ) insert overwrite table ${APP}.dim_user_info partition(dt) select nvl(new_id,old_id), nvl(new_login_name,old_login_name), nvl(new_nick_name,old_nick_name), nvl(new_name,old_name), nvl(new_phone_num,old_phone_num), nvl(new_email,old_email), nvl(new_user_level,old_user_level), nvl(new_birthday,old_birthday), nvl(new_gender,old_gender), nvl(new_create_time,old_create_time), nvl(new_operate_time,old_operate_time), nvl(new_start_date,old_start_date), nvl(new_end_date,old_end_date), nvl(new_end_date,old_end_date) dt from tmp union all select old_id, old_login_name, old_nick_name, old_name, old_phone_num, old_email, 
old_user_level, old_birthday, old_gender, old_create_time, old_operate_time, old_start_date, cast(date_add('$do_date',-1) as string), cast(date_add('$do_date',-1) as string) dt from tmp where new_id is not null and old_id is not null; " dim_sku_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; with sku as ( select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time from ${APP}.ods_sku_info where dt='$do_date' ), spu as ( select id, spu_name from ${APP}.ods_spu_info where dt='$do_date' ), c3 as ( select id, name, category2_id from ${APP}.ods_base_category3 where dt='$do_date' ), c2 as ( select id, name, category1_id from ${APP}.ods_base_category2 where dt='$do_date' ), c1 as ( select id, name from ${APP}.ods_base_category1 where dt='$do_date' ), tm as ( select id, tm_name from ${APP}.ods_base_trademark where dt='$do_date' ), attr as ( select sku_id, collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs from ${APP}.ods_sku_attr_value where dt='$do_date' group by sku_id ), sale_attr as ( select sku_id, collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs from ${APP}.ods_sku_sale_attr_value where dt='$do_date' group by sku_id ) insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date') select sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale, sku.spu_id, spu.spu_name, sku.category3_id, c3.name, c3.category2_id, c2.name, c2.category1_id, c1.name, sku.tm_id, tm.tm_name, attr.attrs, sale_attr.sale_attrs, sku.create_time from sku left join spu on sku.spu_id=spu.id left join c3 on sku.category3_id=c3.id left join c2 on c3.category2_id=c2.id left join c1 on c2.category1_id=c1.id left join tm on sku.tm_id=tm.id left join attr on sku.id=attr.sku_id left join sale_attr on sku.id=sale_attr.sku_id; " 
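# 每段 HQL 开头都先执行 set hive.input.format=...HiveInputFormat,
# 原因见 5.1 节:默认的 CombineHiveInputFormat 会把 lzo.index 索引文件
# 当成小文件合并处理,导致统计结果有误且 LZO 文件无法切片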
dim_base_province=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dim_base_province select bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name from ${APP}.ods_base_province bp join ${APP}.ods_base_region br on bp.region_id = br.id; " dim_coupon_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date') select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time from ${APP}.ods_coupon_info where dt='$do_date'; " dim_activity_rule_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date') select ar.id, ar.activity_id, ai.activity_name, ar.activity_type, ai.start_time, ai.end_time, ai.create_time, ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level from ( select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level from ${APP}.ods_activity_rule where dt='$do_date' )ar left join ( select id, activity_name, start_time, end_time, create_time from ${APP}.ods_activity_info where dt='$do_date' )ai on ar.activity_id=ai.id; " case $1 in "dim_user_info"){ hive -e "$dim_user_info" };; "dim_sku_info"){ hive -e "$dim_sku_info" };; "dim_base_province"){ hive -e "$dim_base_province" };; "dim_coupon_info"){ hive -e "$dim_coupon_info" };; "dim_activity_rule_info"){ hive -e "$dim_activity_rule_info" };; "all"){ hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info" };; esac ``` (2)增加执行权限 \[seven@hadoop102 bin\]$ chmod +x ods_to_dim_db.sh ****2)脚本使用**** (1)执行脚本 \[seven@hadoop102 bin\]$ ods_to_dim_db.sh all 2020-06-14 (2)查看数据是否导入成功 ## **第6章 数仓搭建**
**- DWD层**

1)对用户行为数据解析。

2)对业务数据采用**维度模型**重新建模。

### **6.1 DWD层(用户行为日志)**

#### **6.1.1 日志解析思路**

**1)日志结构回顾**

(1)页面埋点日志

![](https://i-blog.csdnimg.cn/direct/54cab037411e468a8f3fa7bc11820a50.png)

(2)启动日志

![](https://i-blog.csdnimg.cn/direct/f6c2ed93183347fabbfa3139fe05ea2a.png)

**2)日志解析思路**

![](https://i-blog.csdnimg.cn/direct/960e6e969fa940f2956b64036576d105.png)

#### **6.1.2 get_json_object函数使用**

1)数据

[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]

2)取出第一个json对象

hive (gmall)>select get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]','$[0]');

结果是:{"name":"大郎","sex":"男","age":"25"}

3)取出第一个json的age字段的值

hive (gmall)>SELECT get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]',"$[0].age");

结果是:25
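get_json_object的取数逻辑可以用一小段Python近似模拟,便于理解JSONPath路径的含义(仅为示意性的简化版,只覆盖上文用到的`$[i]`与`$[i].field`两种路径;真实的Hive函数支持更丰富的路径语法,且返回值是字符串):

```python
import json

def get_json_object(s, path):
    """简化版 get_json_object:仅支持 '$[i]' 与 '$[i].field' 形式的路径(示意用,非 Hive 实现)"""
    data = json.loads(s)
    # '$[0].age' -> ['0', 'age'],按路径逐级取值
    for part in path.lstrip('$').replace(']', '').replace('[', '.').strip('.').split('.'):
        data = data[int(part)] if part.isdigit() else data[part]
    return data

arr = '[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]'
print(get_json_object(arr, '$[0].age'))   # 输出 25
```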

#### **6.1.3 启动日志表**

启动日志解析思路: 启动日志表中每行数据对应一个启动记录,一个启动记录应该包含日志中的公共信息和启动信息。先将所有包含start字段的日志过滤出来,然后使用get_json_object函数解析每个字段。

1)建表语句

DROP TABLE IF EXISTS dwd_start_log;
CREATE EXTERNAL TABLE dwd_start_log(
    `area_code` STRING COMMENT '地区编码',
    `brand` STRING COMMENT '手机品牌',
    `channel` STRING COMMENT '渠道',
    `is_new` STRING COMMENT '是否首次启动',
    `model` STRING COMMENT '手机型号',
    `mid_id` STRING COMMENT '设备id',
    `os` STRING COMMENT '操作系统',
    `user_id` STRING COMMENT '会员id',
    `version_code` STRING COMMENT 'app版本号',
    `entry` STRING COMMENT 'icon手机图标 notice 通知 install 安装后启动',
    `loading_time` BIGINT COMMENT '启动加载时间',
    `open_ad_id` STRING COMMENT '广告页ID ',
    `open_ad_ms` BIGINT COMMENT '广告总共播放时间',
    `open_ad_skip_ms` BIGINT COMMENT '用户跳过广告时点',
    `ts` BIGINT COMMENT '时间'
) COMMENT '启动日志表'
PARTITIONED BY (`dt` STRING) -- 按照时间创建分区
STORED AS PARQUET -- 采用parquet列式存储
LOCATION '/warehouse/gmall/dwd/dwd_start_log' -- 指定在HDFS上存储位置
TBLPROPERTIES('parquet.compression'='lzo') -- 采用LZO压缩
;

2)数据导入

hive (gmall)> 
insert overwrite table dwd_start_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.ts')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.start') is not null;
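上述SQL"先过滤、再逐字段解析"的思路,可以用一段Python脚本示意(日志结构取上文埋点日志的一个子集,字段名仅作示意性假设):

```python
import json

def parse_start_log(line):
    """模拟启动日志解析:无 start 字段的行被过滤掉(对应 SQL 中的 is not null 条件)"""
    log = json.loads(line)
    if 'start' not in log:
        return None
    common, start = log.get('common', {}), log['start']
    # 公共字段 + 启动字段,拼成启动日志表的一行(只列出部分字段示意)
    return {
        'mid_id': common.get('mid'),
        'brand': common.get('ba'),
        'entry': start.get('entry'),
        'loading_time': start.get('loading_time'),
        'ts': log.get('ts'),
    }

raw = '{"common":{"mid":"mid_1","ba":"Xiaomi"},"start":{"entry":"icon","loading_time":1200},"ts":1592064000000}'
print(parse_start_log(raw))
```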

3)查看数据

hive (gmall)> select * from dwd_start_log where dt='2020-06-14' limit 2;

#### **6.1.4 页面日志表**

页面日志解析思路: 页面日志表中每行数据对应一个页面访问记录,一个页面访问记录应该包含日志中的公共信息和页面信息。先将所有包含page字段的日志过滤出来,然后使用get_json_object函数解析每个字段。

1)建表语句

DROP TABLE IF EXISTS dwd_page_log;
CREATE EXTERNAL TABLE dwd_page_log(
    `area_code` STRING COMMENT '地区编码',
    `brand` STRING COMMENT '手机品牌',
    `channel` STRING COMMENT '渠道',
    `is_new` STRING COMMENT '是否首次启动',
    `model` STRING COMMENT '手机型号',
    `mid_id` STRING COMMENT '设备id',
    `os` STRING COMMENT '操作系统',
    `user_id` STRING COMMENT '会员id',
    `version_code` STRING COMMENT 'app版本号',
    `during_time` BIGINT COMMENT '持续时间毫秒',
    `page_item` STRING COMMENT '目标id ',
    `page_item_type` STRING COMMENT '目标类型',
    `last_page_id` STRING COMMENT '上页类型',
    `page_id` STRING COMMENT '页面ID ',
    `source_type` STRING COMMENT '来源类型',
    `ts` bigint
) COMMENT '页面日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_page_log'
TBLPROPERTIES('parquet.compression'='lzo');

2)数据导入

hive (gmall)>
insert overwrite table dwd_page_log partition(dt='2020-06-14')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts')
from ods_log
where dt='2020-06-14'
and get_json_object(line,'$.page') is not null;

3)查看数据

hive (gmall)> select * from dwd_page_log where dt='2020-06-14' limit 2;

#### **6.1.5 动作日志表**

动作日志解析思路: 动作日志表中每行数据对应用户的一个动作记录,一个动作记录应当包含公共信息、页面信息以及动作信息。先将包含action字段的日志过滤出来,然后通过UDTF函数,将action数组"炸开"(类似于explode函数的效果),然后使用get_json_object函数解析每个字段。
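数组"炸开"后的一对多展开效果,可用一段Python脚本示意:一行含n个action的日志被展开成n行,每行都带上公共字段(字段取子集,仅作说明;这里直接按已解析的数组处理,真实流程中explode_json_array接收的是JSON字符串):

```python
import json

def explode_actions(line):
    """模拟 lateral view explode_json_array:按 actions 数组把一行日志展开为多行"""
    log = json.loads(line)
    actions = log.get('actions')
    if not actions:  # 对应 SQL 中 get_json_object(line,'$.actions') is not null 的过滤
        return []
    common = log.get('common', {})
    # 每个 action 与公共字段组合成一行,即 lateral view 的一对多展开
    return [{'mid_id': common.get('mid'),
             'action_id': a.get('action_id'),
             'ts': a.get('ts')} for a in actions]

raw = '{"common":{"mid":"mid_1"},"actions":[{"action_id":"cart_add","ts":1},{"action_id":"favor_add","ts":2}]}'
rows = explode_actions(raw)
print(len(rows))   # 输出 2
```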

1)建表语句

DROP TABLE IF EXISTS dwd_action_log;
CREATE EXTERNAL TABLE dwd_action_log(
    `area_code` STRING COMMENT '地区编码',
    `brand` STRING COMMENT '手机品牌',
    `channel` STRING COMMENT '渠道',
    `is_new` STRING COMMENT '是否首次启动',
    `model` STRING COMMENT '手机型号',
    `mid_id` STRING COMMENT '设备id',
    `os` STRING COMMENT '操作系统',
    `user_id` STRING COMMENT '会员id',
    `version_code` STRING COMMENT 'app版本号',
    `during_time` BIGINT COMMENT '持续时间毫秒',
    `page_item` STRING COMMENT '目标id ',
    `page_item_type` STRING COMMENT '目标类型',
    `last_page_id` STRING COMMENT '上页类型',
    `page_id` STRING COMMENT '页面id ',
    `source_type` STRING COMMENT '来源类型',
    `action_id` STRING COMMENT '动作id',
    `item` STRING COMMENT '目标id ',
    `item_type` STRING COMMENT '目标类型',
    `ts` BIGINT COMMENT '时间'
) COMMENT '动作日志表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_action_log'
TBLPROPERTIES('parquet.compression'='lzo');

2)创建UDTF函数——设计思路

3)创建UDTF函数——编写代码

(1)创建一个maven工程hivefunction
(2)创建包名:com.seven.hive.udtf
(3)引入如下依赖

<dependencies>
    <!--添加hive依赖-->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
</dependencies>

(4)编码

package com.seven.hive.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.json.JSONArray;

import java.util.ArrayList;
import java.util.List;

public class ExplodeJSONArray extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {

        // 1 参数合法性检查
        if (argOIs.length != 1) {
            throw new UDFArgumentException("explode_json_array 只需要一个参数");
        }

        // 2 第一个参数必须为string
        //判断参数是否为基础数据类型
        if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("explode_json_array 只接受基础类型参数");
        }

        //将参数对象检查器强转为基础类型对象检查器
        PrimitiveObjectInspector argumentOI = (PrimitiveObjectInspector) argOIs[0];

        //判断参数是否为String类型
        if (argumentOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentException("explode_json_array 只接受string类型的参数");
        }

        // 3 定义返回值名称和类型
        List<String> fieldNames = new ArrayList<String>();
        List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();

        fieldNames.add("items");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] objects) throws HiveException {

        // 1 获取传入的数据
        String jsonArray = objects[0].toString();

        // 2 将string转换为json数组
        JSONArray actions = new JSONArray(jsonArray);

        // 3 循环一次,取出数组中的一个json,并写出
        for (int i = 0; i < actions.length(); i++) {

            String[] result = new String[1];
            result[0] = actions.getString(i);
            forward(result);
        }
    }

    @Override
    public void close() throws HiveException {

    }

}

4)创建函数

(1)打包

(2)将hivefunction-1.0-SNAPSHOT.jar上传到hadoop102的/opt/module,然后再将该jar包上传到HDFS的/user/hive/jars路径下

seven@hadoop102 module\]$ hadoop fs -mkdir -p /user/hive/jars \[seven@hadoop102 module\]$ hadoop fs -put hivefunction-1.0-SNAPSHOT.jar /user/hive/jars (3)创建永久函数与开发好的java class关联 create function explode_json_array as 'com.seven.hive.udtf.ExplodeJSONArray' using jar 'hdfs://hadoop102:8020/user/hive/jars/hivefunction-1.0-SNAPSHOT.jar'; (4)注意:如果修改了自定义函数重新生成jar包怎么处理?只需要替换HDFS路径上的旧jar包,然后重启Hive客户端即可。 ****5**** ****)数据导入**** ```sql insert overwrite table dwd_action_log partition(dt='2020-06-14') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.during_time'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(action,'$.action_id'), get_json_object(action,'$.item'), get_json_object(action,'$.item_type'), get_json_object(action,'$.ts') from ods_log lateral view explode_json_array(get_json_object(line,'$.actions')) tmp as action where dt='2020-06-14' and get_json_object(line,'$.actions') is not null; ``` ****3)查看数据**** select \* from dwd_action_log where dt='2020-06-14' limit 2; #### **6** **.1.6** **曝光日志表** ****曝光日志解析思路:**** 曝光日志表中每行数据对应一个曝光记录,一个曝光记录应当包含公共信息、页面信息以及曝光信息。先将包含display字段的日志过滤出来,然后通过UDTF函数,将display数组"炸开"(类似于explode函数的效果),然后使用get_json_object函数解析每个字段。 ![](https://i-blog.csdnimg.cn/direct/0a83ddb0cf2c4d8dafe1eab268848f7e.png) ****1)建表语句**** ```sql DROP TABLE IF EXISTS dwd_display_log; CREATE EXTERNAL TABLE dwd_display_log( `area_code` STRING COMMENT '地区编码', `brand` STRING COMMENT '手机品牌', `channel` STRING COMMENT '渠道', `is_new` STRING COMMENT '是否首次启动', `model` STRING COMMENT 
'手机型号', `mid_id` STRING COMMENT '设备id', `os` STRING COMMENT '操作系统', `user_id` STRING COMMENT '会员id', `version_code` STRING COMMENT 'app版本号', `during_time` BIGINT COMMENT '持续时间毫秒', `page_item` STRING COMMENT '目标id ', `page_item_type` STRING COMMENT '目标类型', `last_page_id` STRING COMMENT '上页类型', `page_id` STRING COMMENT '页面ID ', `source_type` STRING COMMENT '来源类型', `ts` BIGINT COMMENT '时间', `display_type` STRING COMMENT '曝光类型', `item` STRING COMMENT '曝光对象id ', `item_type` STRING COMMENT '曝光对象类型', `order` BIGINT COMMENT '曝光顺序', `pos_id` BIGINT COMMENT '曝光位置' ) COMMENT '曝光日志表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dwd/dwd_display_log' TBLPROPERTIES('parquet.compression'='lzo'); ``` ****2)数据导入**** ```sql insert overwrite table dwd_display_log partition(dt='2020-06-14') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.during_time'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(line,'$.ts'), get_json_object(display,'$.display_type'), get_json_object(display,'$.item'), get_json_object(display,'$.item_type'), get_json_object(display,'$.order'), get_json_object(display,'$.pos_id') from ods_log lateral view explode_json_array(get_json_object(line,'$.displays')) tmp as display where dt='2020-06-14' and get_json_object(line,'$.displays') is not null; ``` ****3)查看数据**** select \* from dwd_display_log where dt='2020-06-14' limit 2; #### **6.1.7 错误日志表** ****错误日志解析思路:****
错误日志表中每行数据对应一个错误记录,为方便定位错误,一个错误记录应当包含与之对应的公共信息、页面信息、曝光信息、动作信息、启动信息以及错误信息。先将包含err字段的日志过滤出来,然后使用get_json_object函数解析所有字段。 ![](https://i-blog.csdnimg.cn/direct/df0a39e6268a4ff495fb659c7d21f764.png) ****1)建表语句**** ```sql DROP TABLE IF EXISTS dwd_error_log; CREATE EXTERNAL TABLE dwd_error_log( `area_code` STRING COMMENT '地区编码', `brand` STRING COMMENT '手机品牌', `channel` STRING COMMENT '渠道', `is_new` STRING COMMENT '是否首次启动', `model` STRING COMMENT '手机型号', `mid_id` STRING COMMENT '设备id', `os` STRING COMMENT '操作系统', `user_id` STRING COMMENT '会员id', `version_code` STRING COMMENT 'app版本号', `page_item` STRING COMMENT '目标id ', `page_item_type` STRING COMMENT '目标类型', `last_page_id` STRING COMMENT '上页类型', `page_id` STRING COMMENT '页面ID ', `source_type` STRING COMMENT '来源类型', `entry` STRING COMMENT ' icon手机图标 notice 通知 install 安装后启动', `loading_time` STRING COMMENT '启动加载时间', `open_ad_id` STRING COMMENT '广告页ID ', `open_ad_ms` STRING COMMENT '广告总共播放时间', `open_ad_skip_ms` STRING COMMENT '用户跳过广告时点', `actions` STRING COMMENT '动作', `displays` STRING COMMENT '曝光', `ts` STRING COMMENT '时间', `error_code` STRING COMMENT '错误码', `msg` STRING COMMENT '错误信息' ) COMMENT '错误日志表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dwd/dwd_error_log' TBLPROPERTIES('parquet.compression'='lzo'); ``` ****说明:此处未对动作数组和曝光数组做处理,如需分析错误与单个动作或曝光的关联,可先使用explode_json_array函数将数组"炸开",再使用get_json_object函数获取具体字段。**** ****4)数据导入**** ```sql insert overwrite table dwd_error_log partition(dt='2020-06-14') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), 
get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(line,'$.start.entry'), get_json_object(line,'$.start.loading_time'), get_json_object(line,'$.start.open_ad_id'), get_json_object(line,'$.start.open_ad_ms'), get_json_object(line,'$.start.open_ad_skip_ms'), get_json_object(line,'$.actions'), get_json_object(line,'$.displays'), get_json_object(line,'$.ts'), get_json_object(line,'$.err.error_code'), get_json_object(line,'$.err.msg') from ods_log where dt='2020-06-14' and get_json_object(line,'$.err') is not null; ``` ****5**** ****)查看数据**** hive (gmall)\> select \* from dwd_error_log where dt='2020-06-14' limit 2; #### **6.1.** **8** **DWD层** **用户行为数据加载脚本** 1)编写脚本 (1)在hadoop102的/home/seven/bin目录下创建脚本 \[seven@hadoop102 bin\]$ vim ods_to_dwd_log.sh 在脚本中编写如下内容 ```sql #!/bin/bash APP=gmall # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天 if [ -n "$2" ] ;then do_date=$2 else do_date=`date -d "-1 day" +%F` fi dwd_start_log=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_start_log partition(dt='$do_date') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.start.entry'), get_json_object(line,'$.start.loading_time'), get_json_object(line,'$.start.open_ad_id'), get_json_object(line,'$.start.open_ad_ms'), get_json_object(line,'$.start.open_ad_skip_ms'), get_json_object(line,'$.ts') from ${APP}.ods_log where dt='$do_date' and get_json_object(line,'$.start') is not null;" dwd_page_log=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_page_log partition(dt='$do_date') select 
get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.during_time'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(line,'$.ts') from ${APP}.ods_log where dt='$do_date' and get_json_object(line,'$.page') is not null;" dwd_action_log=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_action_log partition(dt='$do_date') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.during_time'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(action,'$.action_id'), get_json_object(action,'$.item'), get_json_object(action,'$.item_type'), get_json_object(action,'$.ts') from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.actions')) tmp as action where dt='$do_date' and get_json_object(line,'$.actions') is not null;" dwd_display_log=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_display_log partition(dt='$do_date') select get_json_object(line,'$.common.ar'), 
get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.during_time'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(line,'$.ts'), get_json_object(display,'$.display_type'), get_json_object(display,'$.item'), get_json_object(display,'$.item_type'), get_json_object(display,'$.order'), get_json_object(display,'$.pos_id') from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.displays')) tmp as display where dt='$do_date' and get_json_object(line,'$.displays') is not null;" dwd_error_log=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_error_log partition(dt='$do_date') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(line,'$.start.entry'), get_json_object(line,'$.start.loading_time'), get_json_object(line,'$.start.open_ad_id'), get_json_object(line,'$.start.open_ad_ms'), get_json_object(line,'$.start.open_ad_skip_ms'), get_json_object(line,'$.actions'), get_json_object(line,'$.displays'), get_json_object(line,'$.ts'), 
get_json_object(line,'$.err.error_code'), get_json_object(line,'$.err.msg') from ${APP}.ods_log where dt='$do_date' and get_json_object(line,'$.err') is not null;" case $1 in dwd_start_log ) hive -e "$dwd_start_log" ;; dwd_page_log ) hive -e "$dwd_page_log" ;; dwd_action_log ) hive -e "$dwd_action_log" ;; dwd_display_log ) hive -e "$dwd_display_log" ;; dwd_error_log ) hive -e "$dwd_error_log" ;; all ) hive -e "$dwd_start_log$dwd_page_log$dwd_action_log$dwd_display_log$dwd_error_log" ;; esac ``` (2)增加脚本执行权限 \[seven@hadoop102 bin\]$ chmod 777 ods_to_dwd_log.sh ****2**** ****)脚本使用**** (1)执行脚本 \[seven@hadoop102 module\]$ ods_to_dwd_log.sh all 2020-06-14 (2)查询导入结果 ### **6** **.** **2** **DWD** **层(业务数据)** 业务数据方面DWD层的搭建主要注意点在于维度建模。 #### **6.2.1** **评价事实表(事务型事实表)** 1)建表语句 ```sql DROP TABLE IF EXISTS dwd_comment_info; CREATE EXTERNAL TABLE dwd_comment_info( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户ID', `sku_id` STRING COMMENT '商品sku', `spu_id` STRING COMMENT '商品spu', `order_id` STRING COMMENT '订单ID', `appraise` STRING COMMENT '评价(好评、中评、差评、默认评价)', `create_time` STRING COMMENT '评价时间' ) COMMENT '评价事实表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dwd/dwd_comment_info/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` 2)分区规划 ![](https://i-blog.csdnimg.cn/direct/2ce3683eeb94493f92e5d7975b54b869.png) 3)数据装载 ![](https://i-blog.csdnimg.cn/direct/f551352c250447c1a2c91540dd6e33c7.png) (1)首日装载 ```sql insert overwrite table dwd_comment_info partition (dt) select id, user_id, sku_id, spu_id, order_id, appraise, create_time, date_format(create_time,'yyyy-MM-dd') from ods_comment_info where dt='2020-06-14'; ``` (2)每日装载 ```sql insert overwrite table dwd_comment_info partition(dt='2020-06-15') select id, user_id, sku_id, spu_id, order_id, appraise, create_time from ods_comment_info where dt='2020-06-15'; ``` #### **6.2.2** **订单明细事实表(事务型事实表)** 1)建表语句 ```sql DROP TABLE IF EXISTS dwd_order_detail; CREATE EXTERNAL TABLE dwd_order_detail ( `id` STRING 
COMMENT '订单编号', `order_id` STRING COMMENT '订单号', `user_id` STRING COMMENT '用户id', `sku_id` STRING COMMENT 'sku商品id', `province_id` STRING COMMENT '省份ID', `activity_id` STRING COMMENT '活动ID', `activity_rule_id` STRING COMMENT '活动规则ID', `coupon_id` STRING COMMENT '优惠券ID', `create_time` STRING COMMENT '创建时间', `source_type` STRING COMMENT '来源类型', `source_id` STRING COMMENT '来源编号', `sku_num` BIGINT COMMENT '商品数量', `original_amount` DECIMAL(16,2) COMMENT '原始价格', `split_activity_amount` DECIMAL(16,2) COMMENT '活动优惠分摊', `split_coupon_amount` DECIMAL(16,2) COMMENT '优惠券优惠分摊', `split_final_amount` DECIMAL(16,2) COMMENT '最终价格分摊' ) COMMENT '订单明细事实表表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dwd/dwd_order_detail/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` 2)分区规划 ![](https://i-blog.csdnimg.cn/direct/d412f8a2ef574049b83d02315b536c96.png) 3)数据装载 ![](https://i-blog.csdnimg.cn/direct/5ee40910876f44ef9ca8f3e428c6a36e.png) (1)首日装载 ```sql insert overwrite table dwd_order_detail partition(dt) select od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num, od.order_price*od.sku_num, od.split_activity_amount, od.split_coupon_amount, od.split_final_amount, date_format(create_time,'yyyy-MM-dd') from ( select * from ods_order_detail where dt='2020-06-14' )od left join ( select id, user_id, province_id from ods_order_info where dt='2020-06-14' )oi on od.order_id=oi.id left join ( select order_detail_id, activity_id, activity_rule_id from ods_order_detail_activity where dt='2020-06-14' )oda on od.id=oda.order_detail_id left join ( select order_detail_id, coupon_id from ods_order_detail_coupon where dt='2020-06-14' )odc on od.id=odc.order_detail_id; ``` (2)每日装载 ```sql insert overwrite table dwd_order_detail partition(dt='2020-06-15') select od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, 
odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num, od.order_price*od.sku_num, od.split_activity_amount, od.split_coupon_amount, od.split_final_amount from ( select * from ods_order_detail where dt='2020-06-15' )od left join ( select id, user_id, province_id from ods_order_info where dt='2020-06-15' )oi on od.order_id=oi.id left join ( select order_detail_id, activity_id, activity_rule_id from ods_order_detail_activity where dt='2020-06-15' )oda on od.id=oda.order_detail_id left join ( select order_detail_id, coupon_id from ods_order_detail_coupon where dt='2020-06-15' )odc on od.id=odc.order_detail_id; ``` #### **6.2.3** **退** **单** **事实表(事务型事实表)** 1)建表语句 ```sql DROP TABLE IF EXISTS dwd_order_refund_info; CREATE EXTERNAL TABLE dwd_order_refund_info( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户ID', `order_id` STRING COMMENT '订单ID', `sku_id` STRING COMMENT '商品ID', `province_id` STRING COMMENT '地区ID', `refund_type` STRING COMMENT '退单类型', `refund_num` BIGINT COMMENT '退单件数', `refund_amount` DECIMAL(16,2) COMMENT '退单金额', `refund_reason_type` STRING COMMENT '退单原因类型', `create_time` STRING COMMENT '退单时间' ) COMMENT '退单事实表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dwd/dwd_order_refund_info/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` 2)分区规划 ![](https://i-blog.csdnimg.cn/direct/68b5d4abc2504f87899c7f8c182d23da.png) 3)数据装载 ![](https://i-blog.csdnimg.cn/direct/5815e201a81545aa9da24752d541fb73.png) (1)首日装载 ```sql insert overwrite table dwd_order_refund_info partition(dt) select ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time, date_format(ri.create_time,'yyyy-MM-dd') from ( select * from ods_order_refund_info where dt='2020-06-14' )ri left join ( select id,province_id from ods_order_info where dt='2020-06-14' )oi on ri.order_id=oi.id; ``` (2)每日装载 ```sql insert overwrite table dwd_order_refund_info 
partition(dt='2020-06-15') select ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time from ( select * from ods_order_refund_info where dt='2020-06-15' )ri left join ( select id,province_id from ods_order_info where dt='2020-06-15' )oi on ri.order_id=oi.id; ``` 3)查询加载结果 #### **6.2.** **4 加购事实表** **(** **周期型快照事实表,每日快照** **)** 1)建表语句 ```sql DROP TABLE IF EXISTS dwd_cart_info; CREATE EXTERNAL TABLE dwd_cart_info( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户ID', `sku_id` STRING COMMENT '商品ID', `source_type` STRING COMMENT '来源类型', `source_id` STRING COMMENT '来源编号', `cart_price` DECIMAL(16,2) COMMENT '加入购物车时的价格', `is_ordered` STRING COMMENT '是否已下单', `create_time` STRING COMMENT '创建时间', `operate_time` STRING COMMENT '修改时间', `order_time` STRING COMMENT '下单时间', `sku_num` BIGINT COMMENT '加购数量' ) COMMENT '加购事实表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dwd/dwd_cart_info/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` 2)分区规划 ![](https://i-blog.csdnimg.cn/direct/acde1b81a77f4a65868f593efd9c57e7.png) 3)数据装载 ![](https://i-blog.csdnimg.cn/direct/295e42302fe6443082b96ae7fede9cb7.png) (1)首日装载 ```sql insert overwrite table dwd_cart_info partition(dt='2020-06-14') select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num from ods_cart_info where dt='2020-06-14'; ``` (2)每日装载 ```sql insert overwrite table dwd_cart_info partition(dt='2020-06-15') select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num from ods_cart_info where dt='2020-06-15'; ``` #### **6.2.** **5 收藏事实表** **(** **周期型快照事实表,每日快照** **)** 1)建表语句 ```sql DROP TABLE IF EXISTS dwd_favor_info; CREATE EXTERNAL TABLE dwd_favor_info( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户id', `sku_id` STRING COMMENT 'skuid', `spu_id` STRING COMMENT 'spuid', 
`is_cancel` STRING COMMENT '是否取消', `create_time` STRING COMMENT '收藏时间', `cancel_time` STRING COMMENT '取消时间' ) COMMENT '收藏事实表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dwd/dwd_favor_info/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` 2)分区规划 ![](https://i-blog.csdnimg.cn/direct/9e34e0b24ba94214a5d92674475e7674.png) 3)数据装载 ![](https://i-blog.csdnimg.cn/direct/6cf38ef05f654b73b05e678eecc0f6a6.png) (1)首日装载 ```sql insert overwrite table dwd_favor_info partition(dt='2020-06-14') select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from ods_favor_info where dt='2020-06-14'; ``` (2)每日装载 ```sql insert overwrite table dwd_favor_info partition(dt='2020-06-15') select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from ods_favor_info where dt='2020-06-15'; ``` #### **6.2.** **6 优惠券领用事实表** **(** **累积型快照事实表** **)** 1)建表语句 ```sql DROP TABLE IF EXISTS dwd_coupon_use; CREATE EXTERNAL TABLE dwd_coupon_use( `id` STRING COMMENT '编号', `coupon_id` STRING COMMENT '优惠券ID', `user_id` STRING COMMENT 'userid', `order_id` STRING COMMENT '订单id', `coupon_status` STRING COMMENT '优惠券状态', `get_time` STRING COMMENT '领取时间', `using_time` STRING COMMENT '使用时间(下单)', `used_time` STRING COMMENT '使用时间(支付)', `expire_time` STRING COMMENT '过期时间' ) COMMENT '优惠券领用事实表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dwd/dwd_coupon_use/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` 2)分区规划 ![](https://i-blog.csdnimg.cn/direct/ef6ca9f0e16f4c4c816c085409c46ba0.png) 3)数据装载 ![](https://i-blog.csdnimg.cn/direct/6619c021c3e84b5c8428dd46bafe4cc6.png) (1)首日装载 ```sql insert overwrite table dwd_coupon_use partition(dt) select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time, coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99') from ods_coupon_use where dt='2020-06-14'; ``` (2)每日装载 a.装载逻辑 
![](https://i-blog.csdnimg.cn/direct/7924e4cea6b34dffb40d47a4f43c1921.png)

b.装载语句

```sql
insert overwrite table dwd_coupon_use partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.coupon_id,old.coupon_id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.coupon_status,old.coupon_status),
    nvl(new.get_time,old.get_time),
    nvl(new.using_time,old.using_time),
    nvl(new.used_time,old.used_time),
    nvl(new.expire_time,old.expire_time),
    coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time
    from dwd_coupon_use
    where dt='9999-99-99'
)old
full outer join
(
    select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time
    from ods_coupon_use
    where dt='2020-06-15'
)new
on old.id=new.id;
```

#### **6.2.7 支付事实表(累积型快照事实表)**

1)建表语句

```sql
DROP TABLE IF EXISTS dwd_payment_info;
CREATE EXTERNAL TABLE dwd_payment_info (
    `id` STRING COMMENT '编号',
    `order_id` STRING COMMENT '订单编号',
    `user_id` STRING COMMENT '用户编号',
    `province_id` STRING COMMENT '地区ID',
    `trade_no` STRING COMMENT '交易编号',
    `out_trade_no` STRING COMMENT '对外交易编号',
    `payment_type` STRING COMMENT '支付类型',
    `payment_amount` DECIMAL(16,2) COMMENT '支付金额',
    `payment_status` STRING COMMENT '支付状态',
    `create_time` STRING COMMENT '创建时间', --调用第三方支付接口的时间
    `callback_time` STRING COMMENT '完成时间' --支付完成时间,即支付成功回调时间
) COMMENT '支付事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_payment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

2)分区规划

![](https://i-blog.csdnimg.cn/direct/de707bc28517419da07ec38f0a4d377f.png)

3)数据装载

![](https://i-blog.csdnimg.cn/direct/c52f268a377e4684adc303d337638984.png)

(1)首日装载

```sql
insert overwrite table dwd_payment_info partition(dt)
select
    pi.id,
    pi.order_id,
    pi.user_id,
    oi.province_id,
    pi.trade_no,
    pi.out_trade_no,
    pi.payment_type,
    pi.payment_amount,
    pi.payment_status,
    pi.create_time,
    pi.callback_time,
    nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select * from ods_payment_info where dt='2020-06-14'
)pi
left join
(
    select id,province_id from ods_order_info where dt='2020-06-14'
)oi
on pi.order_id=oi.id;
```

(2)每日装载

```sql
insert overwrite table dwd_payment_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_id,old.order_id),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.payment_amount,old.payment_amount),
    nvl(new.payment_status,old.payment_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, order_id, user_id, province_id, trade_no, out_trade_no, payment_type, payment_amount, payment_status, create_time, callback_time
    from dwd_payment_info
    where dt='9999-99-99'
)old
full outer join
(
    select
        pi.id,
        pi.out_trade_no,
        pi.order_id,
        pi.user_id,
        oi.province_id,
        pi.payment_type,
        pi.trade_no,
        pi.payment_amount,
        pi.payment_status,
        pi.create_time,
        pi.callback_time
    from
    (
        select * from ods_payment_info where dt='2020-06-15'
    )pi
    left join
    (
        select id,province_id from ods_order_info where dt='2020-06-15'
    )oi
    on pi.order_id=oi.id
)new
on old.id=new.id;
```

#### **6.2.8 退款事实表(累积型快照事实表)**

1)建表语句

```sql
DROP TABLE IF EXISTS dwd_refund_payment;
CREATE EXTERNAL TABLE dwd_refund_payment (
    `id` STRING COMMENT '编号',
    `user_id` STRING COMMENT '用户ID',
    `order_id` STRING COMMENT '订单编号',
    `sku_id` STRING COMMENT 'SKU编号',
    `province_id` STRING COMMENT '地区ID',
    `trade_no` STRING COMMENT '交易编号',
    `out_trade_no` STRING COMMENT '对外交易编号',
    `payment_type` STRING COMMENT '支付类型',
    `refund_amount` DECIMAL(16,2) COMMENT '退款金额',
    `refund_status` STRING COMMENT '退款状态',
    `create_time` STRING COMMENT '创建时间', --调用第三方支付接口的时间
    `callback_time` STRING COMMENT '回调时间' --支付接口回调时间,即支付成功时间
) COMMENT '退款事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_refund_payment/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

2)分区规划

![](https://i-blog.csdnimg.cn/direct/f25ad7f785024cebb189f0c39571c32c.png)

3)数据装载

![](https://i-blog.csdnimg.cn/direct/178405325d7e4f82a7d1ff769a3b4298.png)

(1)首日装载

```sql
insert overwrite table dwd_refund_payment partition(dt)
select
    rp.id,
    user_id,
    order_id,
    sku_id,
    province_id,
    trade_no,
    out_trade_no,
    payment_type,
    refund_amount,
    refund_status,
    create_time,
    callback_time,
    nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, out_trade_no, order_id, sku_id, payment_type, trade_no, refund_amount, refund_status, create_time, callback_time
    from ods_refund_payment
    where dt='2020-06-14'
)rp
left join
(
    select id, user_id, province_id
    from ods_order_info
    where dt='2020-06-14'
)oi
on rp.order_id=oi.id;
```

(2)每日装载

```sql
insert overwrite table dwd_refund_payment partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.sku_id,old.sku_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.refund_amount,old.refund_amount),
    nvl(new.refund_status,old.refund_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time
    from dwd_refund_payment
    where dt='9999-99-99'
)old
full outer join
(
    select
        rp.id,
        user_id,
        order_id,
        sku_id,
        province_id,
        trade_no,
        out_trade_no,
        payment_type,
        refund_amount,
        refund_status,
        create_time,
        callback_time
    from
    (
        select
            id,
            out_trade_no,
            order_id,
            sku_id,
            payment_type,
            trade_no,
            refund_amount,
            refund_status,
            create_time,
            callback_time
        from ods_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select id, user_id, province_id
        from ods_order_info
        where dt='2020-06-15'
    )oi
    on rp.order_id=oi.id
)new
on old.id=new.id;
```

3)查询加载结果

#### **6.2.9 订单事实表(累积型快照事实表)**

1)建表语句

```sql
DROP TABLE IF EXISTS dwd_order_info;
CREATE EXTERNAL TABLE dwd_order_info(
    `id` STRING COMMENT '编号',
    `order_status` STRING COMMENT '订单状态',
    `user_id` STRING COMMENT '用户ID',
    `province_id` STRING COMMENT '地区ID',
    `payment_way` STRING COMMENT '支付方式',
    `delivery_address` STRING COMMENT '邮寄地址',
    `out_trade_no` STRING COMMENT '对外交易编号',
    `tracking_no` STRING COMMENT '物流单号',
    `create_time` STRING COMMENT '创建时间(未支付状态)',
    `payment_time` STRING COMMENT '支付时间(已支付状态)',
    `cancel_time` STRING COMMENT '取消时间(已取消状态)',
    `finish_time` STRING COMMENT '完成时间(已完成状态)',
    `refund_time` STRING COMMENT '退款时间(退款中状态)',
    `refund_finish_time` STRING COMMENT '退款完成时间(退款完成状态)',
    `expire_time` STRING COMMENT '过期时间',
    `feight_fee` DECIMAL(16,2) COMMENT '运费',
    `feight_fee_reduce` DECIMAL(16,2) COMMENT '运费减免',
    `activity_reduce_amount` DECIMAL(16,2) COMMENT '活动减免',
    `coupon_reduce_amount` DECIMAL(16,2) COMMENT '优惠券减免',
    `original_amount` DECIMAL(16,2) COMMENT '订单原始价格',
    `final_amount` DECIMAL(16,2) COMMENT '订单最终价格'
) COMMENT '订单事实表'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

2)分区规划

![](https://i-blog.csdnimg.cn/direct/446209aef41b43c698277b8b8c906488.png)

3)数据装载

![](https://i-blog.csdnimg.cn/direct/801d87f551ad4ab283c3c61f01ef4408.png)

![](https://i-blog.csdnimg.cn/direct/653ee758b8844a2f838fe8702b03fc19.png)

![](https://i-blog.csdnimg.cn/direct/c986694c4c2d467189751e43a4412777.png)

![](https://i-blog.csdnimg.cn/direct/9b897c4636dc451dbb10b89cc1ac492d.png)

(1)首日装载

```sql
insert overwrite table dwd_order_info partition(dt)
select
    oi.id,
    oi.order_status,
    oi.user_id,
    oi.province_id,
    oi.payment_way,
    oi.delivery_address,
    oi.out_trade_no,
    oi.tracking_no,
    oi.create_time,
    times.ts['1002'] payment_time,
    times.ts['1003'] cancel_time,
    times.ts['1004'] finish_time,
    times.ts['1005'] refund_time,
    times.ts['1006'] refund_finish_time,
    oi.expire_time,
    feight_fee,
    feight_fee_reduce,
    activity_reduce_amount,
    coupon_reduce_amount,
    original_amount,
    final_amount,
    case
        when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
        when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='2020-06-14' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
        when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
        when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select * from ods_order_info where dt='2020-06-14'
)oi
left join
(
    select
        order_id,
        str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
    from ods_order_status_log
    where dt='2020-06-14'
    group by order_id
)times
on oi.id=times.order_id;
```

(2)每日装载

```sql
insert overwrite table dwd_order_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_status,old.order_status),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.payment_way,old.payment_way),
    nvl(new.delivery_address,old.delivery_address),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.tracking_no,old.tracking_no),
    nvl(new.create_time,old.create_time),
    nvl(new.payment_time,old.payment_time),
    nvl(new.cancel_time,old.cancel_time),
    nvl(new.finish_time,old.finish_time),
    nvl(new.refund_time,old.refund_time),
    nvl(new.refund_finish_time,old.refund_finish_time),
    nvl(new.expire_time,old.expire_time),
    nvl(new.feight_fee,old.feight_fee),
    nvl(new.feight_fee_reduce,old.feight_fee_reduce),
    nvl(new.activity_reduce_amount,old.activity_reduce_amount),
    nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
    nvl(new.original_amount,old.original_amount),
    nvl(new.final_amount,old.final_amount),
    case
        when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
        when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='2020-06-15' and new.refund_time is null then '2020-06-15'
        when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
        when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select id, order_status, user_id, province_id, payment_way, delivery_address, out_trade_no, tracking_no, create_time, payment_time, cancel_time, finish_time, refund_time, refund_finish_time, expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount, original_amount, final_amount
    from dwd_order_info
    where dt='9999-99-99'
)old
full outer join
(
    select
        oi.id,
        oi.order_status,
        oi.user_id,
        oi.province_id,
        oi.payment_way,
        oi.delivery_address,
        oi.out_trade_no,
        oi.tracking_no,
        oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time,
        feight_fee,
        feight_fee_reduce,
        activity_reduce_amount,
        coupon_reduce_amount,
        original_amount,
        final_amount
    from
    (
        select * from ods_order_info where dt='2020-06-15'
    )oi
    left join
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ods_order_status_log
        where dt='2020-06-15'
        group by order_id
    )times
    on oi.id=times.order_id
)new
on old.id=new.id;
```

#### **6.2.10 DWD层业务数据首日装载脚本**

****1)编写脚本****

(1)在/home/seven/bin目录下创建脚本ods_to_dwd_db_init.sh

\[seven@hadoop102 bin\]$ vim ods_to_dwd_db_init.sh

在脚本中填写如下内容

```bash
#!/bin/bash
APP=gmall

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "请传入日期参数"
    exit
fi

dwd_order_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_order_info partition(dt) select oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way, oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time, times.ts['1002'] payment_time, times.ts['1003'] cancel_time, times.ts['1004'] finish_time, times.ts['1005'] refund_time, times.ts['1006'] refund_finish_time, oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount, original_amount, final_amount, case when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd') when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='$do_date' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7) when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd') when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd') else '9999-99-99' end from ( select * from ${APP}.ods_order_info where dt='$do_date' )oi left join ( select order_id, str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts from ${APP}.ods_order_status_log where dt='$do_date' group by order_id )times on oi.id=times.order_id;" dwd_order_detail=" set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_order_detail partition(dt) select od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num, od.order_price*od.sku_num, od.split_activity_amount, od.split_coupon_amount, od.split_final_amount, date_format(create_time,'yyyy-MM-dd') from ( select * from ${APP}.ods_order_detail where dt='$do_date' )od left join ( select id, user_id, province_id from ${APP}.ods_order_info where dt='$do_date' )oi on od.order_id=oi.id 
left join ( select order_detail_id, activity_id, activity_rule_id from ${APP}.ods_order_detail_activity where dt='$do_date' )oda on od.id=oda.order_detail_id left join ( select order_detail_id, coupon_id from ${APP}.ods_order_detail_coupon where dt='$do_date' )odc on od.id=odc.order_detail_id;" dwd_payment_info=" set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_payment_info partition(dt) select pi.id, pi.order_id, pi.user_id, oi.province_id, pi.trade_no, pi.out_trade_no, pi.payment_type, pi.payment_amount, pi.payment_status, pi.create_time, pi.callback_time, nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99') from ( select * from ${APP}.ods_payment_info where dt='$do_date' )pi left join ( select id,province_id from ${APP}.ods_order_info where dt='$do_date' )oi on pi.order_id=oi.id;" dwd_cart_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date') select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num from ${APP}.ods_cart_info where dt='$do_date';" dwd_comment_info=" set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_comment_info partition(dt) select id, user_id, sku_id, spu_id, order_id, appraise, create_time, date_format(create_time,'yyyy-MM-dd') from ${APP}.ods_comment_info where dt='$do_date'; " dwd_favor_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date') select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from ${APP}.ods_favor_info where dt='$do_date';" dwd_coupon_use=" set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert 
overwrite table ${APP}.dwd_coupon_use partition(dt) select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time, coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99') from ${APP}.ods_coupon_use where dt='$do_date';" dwd_order_refund_info=" set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_order_refund_info partition(dt) select ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time, date_format(ri.create_time,'yyyy-MM-dd') from ( select * from ${APP}.ods_order_refund_info where dt='$do_date' )ri left join ( select id,province_id from ${APP}.ods_order_info where dt='$do_date' )oi on ri.order_id=oi.id;" dwd_refund_payment=" set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_refund_payment partition(dt) select rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time, nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99') from ( select id, out_trade_no, order_id, sku_id, payment_type, trade_no, refund_amount, refund_status, create_time, callback_time from ${APP}.ods_refund_payment where dt='$do_date' )rp left join ( select id, user_id, province_id from ${APP}.ods_order_info where dt='$do_date' )oi on rp.order_id=oi.id;" case $1 in dwd_order_info ) hive -e "$dwd_order_info" ;; dwd_order_detail ) hive -e "$dwd_order_detail" ;; dwd_payment_info ) hive -e "$dwd_payment_info" ;; dwd_cart_info ) hive -e "$dwd_cart_info" ;; dwd_comment_info ) hive -e "$dwd_comment_info" ;; dwd_favor_info ) hive -e "$dwd_favor_info" ;; dwd_coupon_use ) hive -e "$dwd_coupon_use" ;; dwd_order_refund_info ) hive -e "$dwd_order_refund_info" ;; 
dwd_refund_payment ) hive -e "$dwd_refund_payment" ;; all ) hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment" ;; esac ``` (2)增加执行权限 \[seven@hadoop102 bin\]$ chmod +x ods_to_dwd_db_init.sh ****2)脚本使用**** (1)执行脚本 \[seven@hadoop102 bin\]$ ods_to_dwd_db_init.sh all 2020-06-14 (2)查看数据是否导入成功 #### **6.2.1** **1 DWD层** **业务数据每日装载脚本** ****1**** ****)编写脚本**** (1)在/home/seven/bin目录下创建脚本ods_to_dwd_db.sh \[seven@hadoop102 bin\]$ vim ods_to_dwd_db.sh 在脚本中填写如下内容 ```bash #!/bin/bash APP=gmall # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天 if [ -n "$2" ] ;then do_date=$2 else do_date=`date -d "-1 day" +%F` fi # 假设某累积型快照事实表,某天所有的业务记录全部完成,则会导致9999-99-99分区的数据未被覆盖,从而导致数据重复,该函数根据9999-99-99分区的数据的末次修改时间判断其是否被覆盖了,如果未被覆盖,就手动清理 clear_data(){ current_date=`date +%F` current_date_timestamp=`date -d "$current_date" +%s` last_modified_date=`hadoop fs -ls /warehouse/gmall/dwd/$1 | grep '9999-99-99' | awk '{print $6}'` last_modified_date_timestamp=`date -d "$last_modified_date" +%s` if [[ $last_modified_date_timestamp -lt $current_date_timestamp ]]; then echo "clear table $1 partition(dt=9999-99-99)" hadoop fs -rm -r -f /warehouse/gmall/dwd/$1/dt=9999-99-99/* fi } dwd_order_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table ${APP}.dwd_order_info partition(dt) select nvl(new.id,old.id), nvl(new.order_status,old.order_status), nvl(new.user_id,old.user_id), nvl(new.province_id,old.province_id), nvl(new.payment_way,old.payment_way), nvl(new.delivery_address,old.delivery_address), nvl(new.out_trade_no,old.out_trade_no), nvl(new.tracking_no,old.tracking_no), nvl(new.create_time,old.create_time), nvl(new.payment_time,old.payment_time), nvl(new.cancel_time,old.cancel_time), nvl(new.finish_time,old.finish_time), nvl(new.refund_time,old.refund_time), nvl(new.refund_finish_time,old.refund_finish_time), 
nvl(new.expire_time,old.expire_time), nvl(new.feight_fee,old.feight_fee), nvl(new.feight_fee_reduce,old.feight_fee_reduce), nvl(new.activity_reduce_amount,old.activity_reduce_amount), nvl(new.coupon_reduce_amount,old.coupon_reduce_amount), nvl(new.original_amount,old.original_amount), nvl(new.final_amount,old.final_amount), case when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd') when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='$do_date' and new.refund_time is null then '$do_date' when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd') when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd') else '9999-99-99' end from ( select id, order_status, user_id, province_id, payment_way, delivery_address, out_trade_no, tracking_no, create_time, payment_time, cancel_time, finish_time, refund_time, refund_finish_time, expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount, original_amount, final_amount from ${APP}.dwd_order_info where dt='9999-99-99' )old full outer join ( select oi.id, oi.order_status, oi.user_id, oi.province_id, oi.payment_way, oi.delivery_address, oi.out_trade_no, oi.tracking_no, oi.create_time, times.ts['1002'] payment_time, times.ts['1003'] cancel_time, times.ts['1004'] finish_time, times.ts['1005'] refund_time, times.ts['1006'] refund_finish_time, oi.expire_time, feight_fee, feight_fee_reduce, activity_reduce_amount, coupon_reduce_amount, original_amount, final_amount from ( select * from ${APP}.ods_order_info where dt='$do_date' )oi left join ( select order_id, str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts from ${APP}.ods_order_status_log where dt='$do_date' group by order_id )times on oi.id=times.order_id )new on old.id=new.id;" dwd_order_detail=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table 
${APP}.dwd_order_detail partition(dt='$do_date') select od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num, od.order_price*od.sku_num, od.split_activity_amount, od.split_coupon_amount, od.split_final_amount from ( select * from ${APP}.ods_order_detail where dt='$do_date' )od left join ( select id, user_id, province_id from ${APP}.ods_order_info where dt='$do_date' )oi on od.order_id=oi.id left join ( select order_detail_id, activity_id, activity_rule_id from ${APP}.ods_order_detail_activity where dt='$do_date' )oda on od.id=oda.order_detail_id left join ( select order_detail_id, coupon_id from ${APP}.ods_order_detail_coupon where dt='$do_date' )odc on od.id=odc.order_detail_id;" dwd_payment_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table ${APP}.dwd_payment_info partition(dt) select nvl(new.id,old.id), nvl(new.order_id,old.order_id), nvl(new.user_id,old.user_id), nvl(new.province_id,old.province_id), nvl(new.trade_no,old.trade_no), nvl(new.out_trade_no,old.out_trade_no), nvl(new.payment_type,old.payment_type), nvl(new.payment_amount,old.payment_amount), nvl(new.payment_status,old.payment_status), nvl(new.create_time,old.create_time), nvl(new.callback_time,old.callback_time), nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99') from ( select id, order_id, user_id, province_id, trade_no, out_trade_no, payment_type, payment_amount, payment_status, create_time, callback_time from ${APP}.dwd_payment_info where dt = '9999-99-99' )old full outer join ( select pi.id, pi.out_trade_no, pi.order_id, pi.user_id, oi.province_id, pi.payment_type, pi.trade_no, pi.payment_amount, pi.payment_status, pi.create_time, pi.callback_time from ( select * from ${APP}.ods_payment_info where dt='$do_date' )pi left join ( select id,province_id from 
${APP}.ods_order_info where dt='$do_date' )oi on pi.order_id=oi.id )new on old.id=new.id;" dwd_cart_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date') select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num from ${APP}.ods_cart_info where dt='$do_date';" dwd_comment_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_comment_info partition(dt='$do_date') select id, user_id, sku_id, spu_id, order_id, appraise, create_time from ${APP}.ods_comment_info where dt='$do_date';" dwd_favor_info=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date') select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from ${APP}.ods_favor_info where dt='$do_date';" dwd_coupon_use=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table ${APP}.dwd_coupon_use partition(dt) select nvl(new.id,old.id), nvl(new.coupon_id,old.coupon_id), nvl(new.user_id,old.user_id), nvl(new.order_id,old.order_id), nvl(new.coupon_status,old.coupon_status), nvl(new.get_time,old.get_time), nvl(new.using_time,old.using_time), nvl(new.used_time,old.used_time), nvl(new.expire_time,old.expire_time), coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99') from ( select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time from ${APP}.dwd_coupon_use where dt='9999-99-99' )old full outer join ( select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time from ${APP}.ods_coupon_use where dt='$do_date' )new on old.id=new.id;" dwd_order_refund_info=" set 
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; insert overwrite table ${APP}.dwd_order_refund_info partition(dt='$do_date') select ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time from ( select * from ${APP}.ods_order_refund_info where dt='$do_date' )ri left join ( select id,province_id from ${APP}.ods_order_info where dt='$do_date' )oi on ri.order_id=oi.id;" dwd_refund_payment=" set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table ${APP}.dwd_refund_payment partition(dt) select nvl(new.id,old.id), nvl(new.user_id,old.user_id), nvl(new.order_id,old.order_id), nvl(new.sku_id,old.sku_id), nvl(new.province_id,old.province_id), nvl(new.trade_no,old.trade_no), nvl(new.out_trade_no,old.out_trade_no), nvl(new.payment_type,old.payment_type), nvl(new.refund_amount,old.refund_amount), nvl(new.refund_status,old.refund_status), nvl(new.create_time,old.create_time), nvl(new.callback_time,old.callback_time), nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99') from ( select id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time from ${APP}.dwd_refund_payment where dt='9999-99-99' )old full outer join ( select rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time from ( select id, out_trade_no, order_id, sku_id, payment_type, trade_no, refund_amount, refund_status, create_time, callback_time from ${APP}.ods_refund_payment where dt='$do_date' )rp left join ( select id, user_id, province_id from ${APP}.ods_order_info where dt='$do_date' )oi on rp.order_id=oi.id )new on old.id=new.id;" case $1 in dwd_order_info ) hive -e "$dwd_order_info" clear_data dwd_order_info ;; dwd_order_detail ) 
        hive -e "$dwd_order_detail"
    ;;
    dwd_payment_info )
        hive -e "$dwd_payment_info"
        clear_data dwd_payment_info
    ;;
    dwd_cart_info )
        hive -e "$dwd_cart_info"
    ;;
    dwd_comment_info )
        hive -e "$dwd_comment_info"
    ;;
    dwd_favor_info )
        hive -e "$dwd_favor_info"
    ;;
    dwd_coupon_use )
        hive -e "$dwd_coupon_use"
        clear_data dwd_coupon_use
    ;;
    dwd_order_refund_info )
        hive -e "$dwd_order_refund_info"
    ;;
    dwd_refund_payment )
        hive -e "$dwd_refund_payment"
        clear_data dwd_refund_payment
    ;;
    all )
        hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
        clear_data dwd_order_info
        clear_data dwd_payment_info
        clear_data dwd_coupon_use
        clear_data dwd_refund_payment
    ;;
esac
```

(2)增加脚本执行权限

\[seven@hadoop102 bin\]$ chmod 777 ods_to_dwd_db.sh

****2)脚本使用****

(1)执行脚本

\[seven@hadoop102 bin\]$ ods_to_dwd_db.sh all 2020-06-14

(2)查看数据是否导入成功

## **第7章 数仓搭建-DWS层**

### **7.1 系统函数**

#### **7.1.1 nvl函数**

1)基本语法

NVL(表达式1,表达式2)

如果表达式1为空值,NVL返回值为表达式2的值,否则返回表达式1的值。该函数的目的是把一个空值(null)转换成一个实际的值。其表达式的值可以是数字型、字符型和日期型,但是表达式1和表达式2的数据类型必须为同一个类型。

2)案例实操

hive (gmall)\> select nvl(1,0);

1

hive (gmall)\> select nvl(null,"hello");

hello

#### **7.1.2 日期处理函数**

1)date_format函数(根据格式整理日期)

hive (gmall)\> select date_format('2020-06-14','yyyy-MM');

2020-06

2)date_add函数(加减日期)

hive (gmall)\> select date_add('2020-06-14',-1);

2020-06-13

hive (gmall)\> select date_add('2020-06-14',1);

2020-06-15

3)next_day函数

(1)取当前天的下一个周一

hive (gmall)\> select next_day('2020-06-14','MO');

2020-06-15

说明:星期一到星期日的英文(Monday、Tuesday、Wednesday、Thursday、Friday、Saturday、Sunday)

(2)取当前周的周一

hive (gmall)\> select date_add(next_day('2020-06-14','MO'),-7);

2020-06-08

4)last_day函数(求当月最后一天日期)

hive (gmall)\> select last_day('2020-06-14');

2020-06-30

#### **7.1.3 复杂数据类型定义**

1)map结构数据定义

`map<string,string>`

2)array结构数据定义

`array<string>`

3)struct结构数据定义

`struct<id:int,name:string,age:int>`

4)struct和array嵌套定义

`array<struct<id:int,name:string,age:int>>`

### **7.2 DWS层**
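上面 7.1 节的系统函数在 DWD 层的累积型快照装载和下面 DWS 层的装载 SQL 中被大量使用。为便于对照阅读,下面用一段 Python 草图模拟 nvl、coalesce、str_to_map、collect_set 几个函数的语义(仅是帮助理解的示意,并非 Hive 的实现;函数名刻意与 Hive 同名以便对照):

```python
# 用纯 Python 模拟几个 Hive 函数的语义(仅示意,非 Hive 实现)

def nvl(expr1, expr2):
    """NVL(表达式1,表达式2):表达式1为null(None)时返回表达式2,否则返回表达式1"""
    return expr2 if expr1 is None else expr1

def coalesce(*exprs):
    """返回参数列表中第一个非null的值,全部为null时返回null"""
    for e in exprs:
        if e is not None:
            return e
    return None

def str_to_map(text, delimiter1=',', delimiter2='='):
    """str_to_map('1001=a,1002=b', ',', '=') -> {'1001': 'a', '1002': 'b'}"""
    return dict(kv.split(delimiter2, 1) for kv in text.split(delimiter1))

def collect_set(values):
    """聚合为去重集合(Hive 返回数组,此处用排序列表表示以便比较)"""
    return sorted(set(values))

if __name__ == '__main__':
    print(nvl(None, 'hello'))                 # hello,对应 select nvl(null,"hello")
    print(coalesce(None, None, '9999-99-99')) # 末次取值,对应累积型快照的分区字段计算
    ts = str_to_map('1002=2020-06-14 10:05:00,1003=2020-06-14 10:30:00')
    print(ts['1002'])                         # 对应订单事实表装载中的 times.ts['1002'] payment_time
    print(collect_set(['Android', 'iOS', 'Android']))
```

累积型快照事实表的每日装载正是基于这组语义:full outer join 后对每一列取 `nvl(new.col, old.col)`,即新数据优先、旧数据兜底。
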
![](https://i-blog.csdnimg.cn/direct/8f7d2a496a174596968b452d825573b4.png)

![](https://i-blog.csdnimg.cn/direct/6e5160f72de4447ca1b5cd7dd78d1f13.png)

![](https://i-blog.csdnimg.cn/direct/c5200ea178c944f1990ba095b6f5bc72.png)

![](https://i-blog.csdnimg.cn/direct/461673017c9e4a6cb779aa091c6ef778.png)

#### **7.2.1 访客主题**

1)建表语句

```sql
DROP TABLE IF EXISTS dws_visitor_action_daycount;
CREATE EXTERNAL TABLE dws_visitor_action_daycount
(
    `mid_id` STRING COMMENT '设备id',
    `brand` STRING COMMENT '设备品牌',
    `model` STRING COMMENT '设备型号',
    `is_new` STRING COMMENT '是否首次访问',
    `channel` ARRAY<STRING> COMMENT '渠道',
    `os` ARRAY<STRING> COMMENT '操作系统',
    `area_code` ARRAY<STRING> COMMENT '地区ID',
    `version_code` ARRAY<STRING> COMMENT '应用版本',
    `visit_count` BIGINT COMMENT '访问次数',
    `page_stats` ARRAY<STRUCT<page_id:STRING,page_count:BIGINT,during_time:BIGINT>> COMMENT '页面访问统计'
) COMMENT '每日设备行为表'
PARTITIONED BY(`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_visitor_action_daycount'
TBLPROPERTIES ("parquet.compression"="lzo");
```

2)数据装载

```sql
insert overwrite table dws_visitor_action_daycount partition(dt='2020-06-14')
select
    t1.mid_id,
    t1.brand,
    t1.model,
    t1.is_new,
    t1.channel,
    t1.os,
    t1.area_code,
    t1.version_code,
    t1.visit_count,
    t3.page_stats
from
(
    select
        mid_id,
        brand,
        model,
        --ods_page_log中,同一天内,同一设备的is_new字段,可能全部为1,可能全部为0,也可能部分为0,部分为1(卸载重装),故做该处理
        if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
        collect_set(channel) channel,
        collect_set(os) os,
        collect_set(area_code) area_code,
        collect_set(version_code) version_code,
        sum(if(last_page_id is null,1,0)) visit_count
    from dwd_page_log
    where dt='2020-06-14'
    and last_page_id is null
    group by mid_id,model,brand
)t1
join
(
    select
        mid_id,
        brand,
        model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
    from
    (
        select
            mid_id,
            brand,
            model,
            page_id,
            count(*) page_count,
            sum(during_time) during_time
        from dwd_page_log
        where dt='2020-06-14'
        group by mid_id,model,brand,page_id
    )t2
    group by mid_id,model,brand
)t3
on t1.mid_id=t3.mid_id and
t1.brand=t3.brand and t1.model=t3.model;
```

3)查询加载结果

#### **7.2.2 用户主题**

1)建表语句

```sql
DROP TABLE IF EXISTS dws_user_action_daycount;
CREATE EXTERNAL TABLE dws_user_action_daycount
(
    `user_id` STRING COMMENT '用户id',
    `login_count` BIGINT COMMENT '登录次数',
    `cart_count` BIGINT COMMENT '加入购物车次数',
    `favor_count` BIGINT COMMENT '收藏次数',
    `order_count` BIGINT COMMENT '下单次数',
    `order_activity_count` BIGINT COMMENT '订单参与活动次数',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT '订单减免金额(活动)',
    `order_coupon_count` BIGINT COMMENT '订单用券次数',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT '订单减免金额(优惠券)',
    `order_original_amount` DECIMAL(16,2) COMMENT '订单原始金额',
    `order_final_amount` DECIMAL(16,2) COMMENT '订单总金额',
    `payment_count` BIGINT COMMENT '支付次数',
    `payment_amount` DECIMAL(16,2) COMMENT '支付金额',
    `refund_order_count` BIGINT COMMENT '退单次数',
    `refund_order_num` BIGINT COMMENT '退单件数',
    `refund_order_amount` DECIMAL(16,2) COMMENT '退单金额',
    `refund_payment_count` BIGINT COMMENT '退款次数',
    `refund_payment_num` BIGINT COMMENT '退款件数',
    `refund_payment_amount` DECIMAL(16,2) COMMENT '退款金额',
    `coupon_get_count` BIGINT COMMENT '优惠券领取次数',
    `coupon_using_count` BIGINT COMMENT '优惠券使用(下单)次数',
    `coupon_used_count` BIGINT COMMENT '优惠券使用(支付)次数',
    `appraise_good_count` BIGINT COMMENT '好评数',
    `appraise_mid_count` BIGINT COMMENT '中评数',
    `appraise_bad_count` BIGINT COMMENT '差评数',
    `appraise_default_count` BIGINT COMMENT '默认评价数',
    `order_detail_stats` ARRAY<STRUCT<sku_id:STRING,sku_num:BIGINT,order_count:BIGINT,activity_reduce_amount:DECIMAL(16,2),coupon_reduce_amount:DECIMAL(16,2),original_amount:DECIMAL(16,2),final_amount:DECIMAL(16,2)>> COMMENT '下单明细统计'
) COMMENT '每日用户行为'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_user_action_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

2)数据装载

(1)首日装载

```sql
with
tmp_login as
(
    select
        dt,
        user_id,
        count(*) login_count
    from dwd_page_log
    where user_id is not null
    and last_page_id is null
    group by dt,user_id
),
tmp_cf as
(
    select
        dt,
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where user_id is not null
    and action_id in
('cart_add','favor_add') group by dt,user_id ), tmp_order as ( select date_format(create_time,'yyyy-MM-dd') dt, user_id, count(*) order_count, sum(if(activity_reduce_amount>0,1,0)) order_activity_count, sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count, sum(activity_reduce_amount) order_activity_reduce_amount, sum(coupon_reduce_amount) order_coupon_reduce_amount, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount from dwd_order_info group by date_format(create_time,'yyyy-MM-dd'),user_id ), tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, user_id, count(*) payment_count, sum(payment_amount) payment_amount from dwd_payment_info group by date_format(callback_time,'yyyy-MM-dd'),user_id ), tmp_ri as ( select date_format(create_time,'yyyy-MM-dd') dt, user_id, count(*) refund_order_count, sum(refund_num) refund_order_num, sum(refund_amount) refund_order_amount from dwd_order_refund_info group by date_format(create_time,'yyyy-MM-dd'),user_id ), tmp_rp as ( select date_format(callback_time,'yyyy-MM-dd') dt, rp.user_id, count(*) refund_payment_count, sum(ri.refund_num) refund_payment_num, sum(rp.refund_amount) refund_payment_amount from ( select user_id, order_id, sku_id, refund_amount, callback_time from dwd_refund_payment )rp left join ( select user_id, order_id, sku_id, refund_num from dwd_order_refund_info )ri on rp.order_id=ri.order_id and rp.sku_id=rp.sku_id group by date_format(callback_time,'yyyy-MM-dd'),rp.user_id ), tmp_coupon as ( select coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt, coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id, nvl(coupon_get_count,0) coupon_get_count, nvl(coupon_using_count,0) coupon_using_count, nvl(coupon_used_count,0) coupon_used_count from ( select date_format(get_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_get_count from dwd_coupon_use where get_time is not null group by user_id,date_format(get_time,'yyyy-MM-dd') )coupon_get full outer join ( 
select date_format(using_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_using_count from dwd_coupon_use where using_time is not null group by user_id,date_format(using_time,'yyyy-MM-dd') )coupon_using on coupon_get.dt=coupon_using.dt and coupon_get.user_id=coupon_using.user_id full outer join ( select date_format(used_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_used_count from dwd_coupon_use where used_time is not null group by user_id,date_format(used_time,'yyyy-MM-dd') )coupon_used on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id ), tmp_comment as ( select date_format(create_time,'yyyy-MM-dd') dt, user_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count from dwd_comment_info group by date_format(create_time,'yyyy-MM-dd'),user_id ), tmp_od as ( select dt, user_id, collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats from ( select date_format(create_time,'yyyy-MM-dd') dt, user_id, sku_id, sum(sku_num) sku_num, count(*) order_count, cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount, cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount, cast(sum(original_amount) as decimal(16,2)) original_amount, cast(sum(split_final_amount) as decimal(16,2)) final_amount from dwd_order_detail group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id )t1 group by dt,user_id ) insert overwrite table dws_user_action_daycount partition(dt) select coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id), 
nvl(login_count,0), nvl(cart_count,0), nvl(favor_count,0), nvl(order_count,0), nvl(order_activity_count,0), nvl(order_activity_reduce_amount,0), nvl(order_coupon_count,0), nvl(order_coupon_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0), nvl(payment_count,0), nvl(payment_amount,0), nvl(refund_order_count,0), nvl(refund_order_num,0), nvl(refund_order_amount,0), nvl(refund_payment_count,0), nvl(refund_payment_num,0), nvl(refund_payment_amount,0), nvl(coupon_get_count,0), nvl(coupon_using_count,0), nvl(coupon_used_count,0), nvl(appraise_good_count,0), nvl(appraise_mid_count,0), nvl(appraise_bad_count,0), nvl(appraise_default_count,0), order_detail_stats, coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt) from tmp_login full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id and tmp_login.dt=tmp_cf.dt full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt full outer join tmp_coupon on 
coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt; ``` (2)每日装载 ```sql with tmp_login as ( select user_id, count(*) login_count from dwd_page_log where dt='2020-06-15' and user_id is not null and last_page_id is null group by user_id ), tmp_cf as ( select user_id, sum(if(action_id='cart_add',1,0)) cart_count, sum(if(action_id='favor_add',1,0)) favor_count from dwd_action_log where dt='2020-06-15' and user_id is not null and action_id in ('cart_add','favor_add') group by user_id ), tmp_order as ( select user_id, count(*) order_count, sum(if(activity_reduce_amount>0,1,0)) order_activity_count, sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count, sum(activity_reduce_amount) order_activity_reduce_amount, sum(coupon_reduce_amount) order_coupon_reduce_amount, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount from dwd_order_info where (dt='2020-06-15' or dt='9999-99-99') and date_format(create_time,'yyyy-MM-dd')='2020-06-15' group by user_id ), tmp_pay as ( select user_id, count(*) payment_count, sum(payment_amount) payment_amount from dwd_payment_info where dt='2020-06-15' group by user_id ), tmp_ri as ( select user_id, count(*) refund_order_count, sum(refund_num) refund_order_num, sum(refund_amount) refund_order_amount from dwd_order_refund_info where dt='2020-06-15' group by user_id ), tmp_rp as ( select rp.user_id, count(*) refund_payment_count, sum(ri.refund_num) refund_payment_num, sum(rp.refund_amount) refund_payment_amount from 
( select user_id, order_id, sku_id, refund_amount from dwd_refund_payment where dt='2020-06-15' )rp left join ( select user_id, order_id, sku_id, refund_num from dwd_order_refund_info where dt>=date_add('2020-06-15',-15) )ri on rp.order_id=ri.order_id and rp.sku_id=rp.sku_id group by rp.user_id ), tmp_coupon as ( select user_id, sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_get_count, sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_using_count, sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_used_count from dwd_coupon_use where (dt='2020-06-15' or dt='9999-99-99') and (date_format(get_time, 'yyyy-MM-dd') = '2020-06-15' or date_format(using_time,'yyyy-MM-dd')='2020-06-15' or date_format(used_time,'yyyy-MM-dd')='2020-06-15') group by user_id ), tmp_comment as ( select user_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count from dwd_comment_info where dt='2020-06-15' group by user_id ), tmp_od as ( select user_id, collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats from ( select user_id, sku_id, sum(sku_num) sku_num, count(*) order_count, cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount, cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount, cast(sum(original_amount) as decimal(16,2)) original_amount, cast(sum(split_final_amount) as decimal(16,2)) final_amount from dwd_order_detail where dt='2020-06-15' group by user_id,sku_id )t1 group by user_id ) insert overwrite table dws_user_action_daycount partition(dt='2020-06-15') select 
coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id), nvl(login_count,0), nvl(cart_count,0), nvl(favor_count,0), nvl(order_count,0), nvl(order_activity_count,0), nvl(order_activity_reduce_amount,0), nvl(order_coupon_count,0), nvl(order_coupon_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0), nvl(payment_count,0), nvl(payment_amount,0), nvl(refund_order_count,0), nvl(refund_order_num,0), nvl(refund_order_amount,0), nvl(refund_payment_count,0), nvl(refund_payment_num,0), nvl(refund_payment_amount,0), nvl(coupon_get_count,0), nvl(coupon_using_count,0), nvl(coupon_used_count,0), nvl(appraise_good_count,0), nvl(appraise_mid_count,0), nvl(appraise_bad_count,0), nvl(appraise_default_count,0), order_detail_stats from tmp_login full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id; ``` 3)查询加载结果 #### **7.2.3** **商品主题** 1)建表语句 ```sql DROP TABLE IF EXISTS dws_sku_action_daycount; CREATE 
EXTERNAL TABLE dws_sku_action_daycount
(
    `sku_id` STRING COMMENT 'sku id',
    `order_count` BIGINT COMMENT 'times ordered',
    `order_num` BIGINT COMMENT 'items ordered',
    `order_activity_count` BIGINT COMMENT 'times ordered under an activity',
    `order_coupon_count` BIGINT COMMENT 'times ordered with a coupon',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount (activity)',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount (coupon)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'ordered original amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'ordered final amount',
    `payment_count` BIGINT COMMENT 'times paid',
    `payment_num` BIGINT COMMENT 'items paid',
    `payment_amount` DECIMAL(16,2) COMMENT 'paid amount',
    `refund_order_count` BIGINT COMMENT 'refund-request count',
    `refund_order_num` BIGINT COMMENT 'refund-request item count',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'refund-request amount',
    `refund_payment_count` BIGINT COMMENT 'refund-payment count',
    `refund_payment_num` BIGINT COMMENT 'refund-payment item count',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'refund-payment amount',
    `cart_count` BIGINT COMMENT 'times added to cart',
    `favor_count` BIGINT COMMENT 'times favorited',
    `appraise_good_count` BIGINT COMMENT 'good reviews',
    `appraise_mid_count` BIGINT COMMENT 'neutral reviews',
    `appraise_bad_count` BIGINT COMMENT 'bad reviews',
    `appraise_default_count` BIGINT COMMENT 'default reviews'
) COMMENT 'Daily product behavior'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_sku_action_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

2) Data loading

(1) Initial (first-day) load

```sql
with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from dwd_order_detail od
    join
    (
        select order_id, callback_time
        from dwd_payment_info
        where callback_time is not null
    )pi
    on pi.order_id=od.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),sku_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select order_id, sku_id, refund_amount, callback_time
        from dwd_refund_payment
    )rp
    left join
    (
        select order_id, sku_id, refund_num
        from dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id
),
tmp_cf as
(
    select
        dt,
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where action_id in ('cart_add','favor_add')
    group by dt,item
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt)
select
    sku_id,
    sum(order_count), sum(order_num),
    sum(order_activity_count), sum(order_coupon_count),
    sum(order_activity_reduce_amount), sum(order_coupon_reduce_amount),
    sum(order_original_amount), sum(order_final_amount),
    sum(payment_count), sum(payment_num), sum(payment_amount),
    sum(refund_order_count), sum(refund_order_num), sum(refund_order_amount),
    sum(refund_payment_count), sum(refund_payment_num), sum(refund_payment_amount),
    sum(cart_count), sum(favor_count),
    sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count),
    dt
from
(
    select
        dt, sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount,
        order_original_amount, order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_order
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        payment_count, payment_num, payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_pay
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_ri
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_rp
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        cart_count, favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_cf
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from tmp_comment
)t1
group by dt,sku_id;
```

(2) Daily load

```sql
with
tmp_order as
(
    select
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    group by sku_id
),
tmp_pay as
(
    select
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and order_id in
    (
        select order_id
        from dwd_payment_info
        where dt='2020-06-15'
    )
    group by sku_id
),
tmp_ri as
(
    select
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by sku_id
),
tmp_rp as
(
    select
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select order_id, sku_id, refund_amount
        from dwd_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select order_id, sku_id, refund_num
        from dwd_order_refund_info
        where dt>=date_add('2020-06-15',-15)
    )ri
    on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id
    group by rp.sku_id
),
tmp_cf as
(
    select
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where dt='2020-06-15'
    and action_id in ('cart_add','favor_add')
    group by item
),
tmp_comment as
(
    select
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    where dt='2020-06-15'
    group by sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt='2020-06-15')
select
    sku_id,
    sum(order_count), sum(order_num),
    sum(order_activity_count), sum(order_coupon_count),
    sum(order_activity_reduce_amount), sum(order_coupon_reduce_amount),
    sum(order_original_amount), sum(order_final_amount),
    sum(payment_count), sum(payment_num), sum(payment_amount),
    sum(refund_order_count), sum(refund_order_num), sum(refund_order_amount),
    sum(refund_payment_count), sum(refund_payment_num), sum(refund_payment_amount),
    sum(cart_count), sum(favor_count),
    sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count)
from
(
    select
        sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount,
        order_original_amount, order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_order
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        payment_count, payment_num, payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_pay
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_ri
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_rp
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        cart_count, favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_cf
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from tmp_comment
)t1
group by sku_id;
```

3) Query the loaded results

#### 7.2.4 Coupon Topic

1) Table creation statement

```sql
DROP TABLE IF EXISTS dws_coupon_info_daycount;
CREATE EXTERNAL TABLE dws_coupon_info_daycount(
    `coupon_id` STRING COMMENT 'coupon id',
    `get_count` BIGINT COMMENT 'times claimed',
    `order_count` BIGINT COMMENT 'times used (order placed)',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'order discount amount with coupon',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original amount of coupon orders',
    `order_final_amount` DECIMAL(16,2) COMMENT 'final amount of coupon orders',
    `payment_count` BIGINT COMMENT 'times used (paid)',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'payment discount amount with coupon',
    `payment_amount` DECIMAL(16,2) COMMENT 'total paid amount with coupon',
    `expire_count` BIGINT COMMENT 'times expired'
) COMMENT 'Daily coupon stats'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_coupon_info_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
```

2) Data loading

(1) Initial (first-day) load

```sql
with
tmp_cu as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_exprie.dt) dt,
coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_exprie.coupon_id) coupon_id, nvl(get_count,0) get_count, nvl(order_count,0) order_count, nvl(payment_count,0) payment_count, nvl(expire_count,0) expire_count from ( select date_format(get_time,'yyyy-MM-dd') dt, coupon_id, count(*) get_count from dwd_coupon_use group by date_format(get_time,'yyyy-MM-dd'),coupon_id )coupon_get full outer join ( select date_format(using_time,'yyyy-MM-dd') dt, coupon_id, count(*) order_count from dwd_coupon_use where using_time is not null group by date_format(using_time,'yyyy-MM-dd'),coupon_id )coupon_using on coupon_get.dt=coupon_using.dt and coupon_get.coupon_id=coupon_using.coupon_id full outer join ( select date_format(used_time,'yyyy-MM-dd') dt, coupon_id, count(*) payment_count from dwd_coupon_use where used_time is not null group by date_format(used_time,'yyyy-MM-dd'),coupon_id )coupon_used on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id full outer join ( select date_format(expire_time,'yyyy-MM-dd') dt, coupon_id, count(*) expire_count from dwd_coupon_use where expire_time is not null group by date_format(expire_time,'yyyy-MM-dd'),coupon_id )coupon_exprie on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_exprie.dt and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_exprie.coupon_id ), tmp_order as ( select date_format(create_time,'yyyy-MM-dd') dt, coupon_id, sum(split_coupon_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from dwd_order_detail where coupon_id is not null group by date_format(create_time,'yyyy-MM-dd'),coupon_id ), tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, coupon_id, sum(split_coupon_amount) payment_reduce_amount, sum(split_final_amount) payment_amount from ( select order_id, coupon_id, split_coupon_amount, 
split_final_amount from dwd_order_detail where coupon_id is not null )od join ( select order_id, callback_time from dwd_payment_info )pi on od.order_id=pi.order_id group by date_format(callback_time,'yyyy-MM-dd'),coupon_id ) insert overwrite table dws_coupon_info_daycount partition(dt) select coupon_id, sum(get_count), sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount), sum(expire_count), dt from ( select dt, coupon_id, get_count, order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, 0 payment_reduce_amount, 0 payment_amount, expire_count from tmp_cu union all select dt, coupon_id, 0 get_count, 0 order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount, 0 expire_count from tmp_order union all select dt, coupon_id, 0 get_count, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, payment_reduce_amount, payment_amount, 0 expire_count from tmp_pay )t1 group by dt,coupon_id; ``` (2)每日装载 ```sql with tmp_cu as ( select coupon_id, sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) get_count, sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) order_count, sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) payment_count, sum(if(date_format(expire_time,'yyyy-MM-dd')='2020-06-15',1,0)) expire_count from dwd_coupon_use where dt='9999-99-99' or dt='2020-06-15' group by coupon_id ), tmp_order as ( select coupon_id, sum(split_coupon_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from dwd_order_detail where dt='2020-06-15' and coupon_id is not null group by coupon_id ), tmp_pay as ( select coupon_id, sum(split_coupon_amount) payment_reduce_amount, sum(split_final_amount) payment_amount from 
dwd_order_detail where (dt='2020-06-15' or dt=date_add('2020-06-15',-1)) and coupon_id is not null and order_id in ( select order_id from dwd_payment_info where dt='2020-06-15' ) group by coupon_id ) insert overwrite table dws_coupon_info_daycount partition(dt='2020-06-15') select coupon_id, sum(get_count), sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount), sum(expire_count) from ( select coupon_id, get_count, order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, 0 payment_reduce_amount, 0 payment_amount, expire_count from tmp_cu union all select coupon_id, 0 get_count, 0 order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount, 0 expire_count from tmp_order union all select coupon_id, 0 get_count, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, payment_reduce_amount, payment_amount, 0 expire_count from tmp_pay )t1 group by coupon_id; ``` 3)查询加载结果 #### **7.2.5** **活动主题** 1)建表语句 ```sql DROP TABLE IF EXISTS dws_activity_info_daycount; CREATE EXTERNAL TABLE dws_activity_info_daycount( `activity_rule_id` STRING COMMENT '活动规则ID', `activity_id` STRING COMMENT '活动ID', `order_count` BIGINT COMMENT '参与某活动某规则下单次数', `order_reduce_amount` DECIMAL(16,2) COMMENT '参与某活动某规则下单减免金额', `order_original_amount` DECIMAL(16,2) COMMENT '参与某活动某规则下单原始金额', `order_final_amount` DECIMAL(16,2) COMMENT '参与某活动某规则下单最终金额', `payment_count` BIGINT COMMENT '参与某活动某规则支付次数', `payment_reduce_amount` DECIMAL(16,2) COMMENT '参与某活动某规则支付减免金额', `payment_amount` DECIMAL(16,2) COMMENT '参与某活动某规则支付金额' ) COMMENT '每日活动统计' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dws/dws_activity_info_daycount/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` 2)数据装载 (1)首日装载 ```sql with tmp_order as ( select 
date_format(create_time,'yyyy-MM-dd') dt, activity_rule_id, activity_id, count(*) order_count, sum(split_activity_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from dwd_order_detail where activity_id is not null group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id ), tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, activity_rule_id, activity_id, count(*) payment_count, sum(split_activity_amount) payment_reduce_amount, sum(split_final_amount) payment_amount from ( select activity_rule_id, activity_id, order_id, split_activity_amount, split_final_amount from dwd_order_detail where activity_id is not null )od join ( select order_id, callback_time from dwd_payment_info )pi on od.order_id=pi.order_id group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id ) insert overwrite table dws_activity_info_daycount partition(dt) select activity_rule_id, activity_id, sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount), dt from ( select dt, activity_rule_id, activity_id, order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount from tmp_order union all select dt, activity_rule_id, activity_id, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, payment_reduce_amount, payment_amount from tmp_pay )t1 group by dt,activity_rule_id,activity_id; ``` (2)每日装载 ```sql with tmp_order as ( select activity_rule_id, activity_id, count(*) order_count, sum(split_activity_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from dwd_order_detail where dt='2020-06-15' and activity_id is not null group by activity_rule_id,activity_id ), tmp_pay as ( select activity_rule_id, 
activity_id, count(*) payment_count, sum(split_activity_amount) payment_reduce_amount, sum(split_final_amount) payment_amount from dwd_order_detail where (dt='2020-06-15' or dt=date_add('2020-06-15',-1)) and activity_id is not null and order_id in ( select order_id from dwd_payment_info where dt='2020-06-15' ) group by activity_rule_id,activity_id ) insert overwrite table dws_activity_info_daycount partition(dt='2020-06-15') select activity_rule_id, activity_id, sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount) from ( select activity_rule_id, activity_id, order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount from tmp_order union all select activity_rule_id, activity_id, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, payment_reduce_amount, payment_amount from tmp_pay )t1 group by activity_rule_id,activity_id; ``` 3)查询加载结果 #### **7.2.6** **地区主题** 1)建表语句 ```sql DROP TABLE IF EXISTS dws_area_stats_daycount; CREATE EXTERNAL TABLE dws_area_stats_daycount( `province_id` STRING COMMENT '地区编号', `visit_count` BIGINT COMMENT '访问次数', `login_count` BIGINT COMMENT '登录次数', `visitor_count` BIGINT COMMENT '访客人数', `user_count` BIGINT COMMENT '用户人数', `order_count` BIGINT COMMENT '下单次数', `order_original_amount` DECIMAL(16,2) COMMENT '下单原始金额', `order_final_amount` DECIMAL(16,2) COMMENT '下单最终金额', `payment_count` BIGINT COMMENT '支付次数', `payment_amount` DECIMAL(16,2) COMMENT '支付金额', `refund_order_count` BIGINT COMMENT '退单次数', `refund_order_amount` DECIMAL(16,2) COMMENT '退单金额', `refund_payment_count` BIGINT COMMENT '退款次数', `refund_payment_amount` DECIMAL(16,2) COMMENT '退款金额' ) COMMENT '每日地区统计表' PARTITIONED BY (`dt` STRING) STORED AS PARQUET LOCATION '/warehouse/gmall/dws/dws_area_stats_daycount/' TBLPROPERTIES ("parquet.compression"="lzo"); ``` 
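7.2 各主题的装载 SQL 反复使用同一个套路:每个来源(下单、支付、退款等)只产出自己的度量,其余度量一律补 0,再 union all 后按维度键 group by 求 sum,从而用一次聚合代替多路 full outer join。下面用一段示意性的 Python 模拟这一"补零 union"的效果(非教程配套代码,其中 activity_1 等键名与度量名均为举例):

```python
# 模拟 DWS 装载中"补零列 + union all + group by 汇总"的套路(示意,非仓库脚本)
from collections import defaultdict

METRICS = ["order_count", "payment_count"]

def zero_fill(rows, metric):
    """把某一来源的 (维度键, 值) 行补齐成全度量结构,
    未提供的度量补 0(对应 SQL 里的 `0 payment_count` 等占位列)"""
    out = []
    for key, value in rows:
        padded = {m: 0 for m in METRICS}
        padded[metric] = value
        out.append((key, padded))
    return out

def union_group_by(*sources):
    """相当于 union all 之后按维度键 group by,对每个度量做 sum"""
    acc = defaultdict(lambda: {m: 0 for m in METRICS})
    for source in sources:
        for key, padded in source:
            for m in METRICS:
                acc[key][m] += padded[m]
    return dict(acc)

# 两个"来源表":下单次数与支付次数
tmp_order = [("activity_1", 3)]
tmp_pay = [("activity_1", 2), ("activity_2", 1)]
result = union_group_by(zero_fill(tmp_order, "order_count"),
                        zero_fill(tmp_pay, "payment_count"))
print(result)
```

可以看到,某来源中不存在的维度键(如只有支付、没有下单的 activity_2)在结果里对应度量自然为 0,这正是补零 union 能替代 full outer join + nvl 的原因。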
2)数据装载
(1)首日装载
```sql
with tmp_vu as ( select dt, id province_id, visit_count, login_count, visitor_count, user_count from ( select dt, area_code,
count(*) visit_count,--访客访问次数
count(user_id) login_count,--用户访问次数,等价于sum(if(user_id is not null,1,0))
count(distinct(mid_id)) visitor_count,--访客人数
count(distinct(user_id)) user_count--用户人数
from dwd_page_log where last_page_id is null group by dt,area_code )tmp left join dim_base_province area on tmp.area_code=area.area_code ),
tmp_order as ( select date_format(create_time,'yyyy-MM-dd') dt, province_id, count(*) order_count, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount from dwd_order_info group by date_format(create_time,'yyyy-MM-dd'),province_id ),
tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, province_id, count(*) payment_count, sum(payment_amount) payment_amount from dwd_payment_info group by date_format(callback_time,'yyyy-MM-dd'),province_id ),
tmp_ro as ( select date_format(create_time,'yyyy-MM-dd') dt, province_id, count(*) refund_order_count, sum(refund_amount) refund_order_amount from dwd_order_refund_info group by date_format(create_time,'yyyy-MM-dd'),province_id ),
tmp_rp as ( select date_format(callback_time,'yyyy-MM-dd') dt, province_id, count(*) refund_payment_count, sum(refund_amount) refund_payment_amount from dwd_refund_payment group by date_format(callback_time,'yyyy-MM-dd'),province_id )
insert overwrite table dws_area_stats_daycount partition(dt) select province_id, sum(visit_count), sum(login_count), sum(visitor_count), sum(user_count), sum(order_count), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_amount), sum(refund_order_count), sum(refund_order_amount), sum(refund_payment_count), sum(refund_payment_amount), dt from (
select dt, province_id, visit_count, login_count, visitor_count, user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0
refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_vu union all
select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, order_count, order_original_amount, order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_order union all
select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, payment_count, payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_pay union all
select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, refund_order_count, refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_ro union all
select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, refund_payment_count, refund_payment_amount from tmp_rp )t1 group by dt,province_id;
```
(2)每日装载
```sql
with tmp_vu as ( select id province_id, visit_count, login_count, visitor_count, user_count from ( select area_code,
count(*) visit_count,--访客访问次数
count(user_id) login_count,--用户访问次数,等价于sum(if(user_id is not null,1,0))
count(distinct(mid_id)) visitor_count,--访客人数
count(distinct(user_id)) user_count--用户人数
from dwd_page_log where dt='2020-06-15' and last_page_id is null group by area_code )tmp left join dim_base_province area on tmp.area_code=area.area_code ),
tmp_order as ( select province_id, count(*) order_count, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount from dwd_order_info where (dt='2020-06-15' or dt='9999-99-99') and
date_format(create_time,'yyyy-MM-dd')='2020-06-15' group by province_id ), tmp_pay as ( select province_id, count(*) payment_count, sum(payment_amount) payment_amount from dwd_payment_info where dt='2020-06-15' group by province_id ), tmp_ro as ( select province_id, count(*) refund_order_count, sum(refund_amount) refund_order_amount from dwd_order_refund_info where dt='2020-06-15' group by province_id ), tmp_rp as ( select province_id, count(*) refund_payment_count, sum(refund_amount) refund_payment_amount from dwd_refund_payment where dt='2020-06-15' group by province_id ) insert overwrite table dws_area_stats_daycount partition(dt='2020-06-15') select province_id, sum(visit_count), sum(login_count), sum(visitor_count), sum(user_count), sum(order_count), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_amount), sum(refund_order_count), sum(refund_order_amount), sum(refund_payment_count), sum(refund_payment_amount) from ( select province_id, visit_count, login_count, visitor_count, user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_vu union all select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, order_count, order_original_amount, order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_order union all select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, payment_count, payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_pay union all select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 
payment_amount, refund_order_count, refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_ro union all select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, refund_payment_count, refund_payment_amount from tmp_rp )t1 group by province_id; ``` 3)查询加载结果 #### **7.2.7 DWS层** **首日数据装载脚本** ****1)编写脚本**** (1)在/home/seven/bin目录下创建脚本dwd_to_dws_init.sh ```sql #!/bin/bash APP=gmall if [ -n "$2" ] ;then do_date=$2 else echo "请传入日期参数" exit fi dws_visitor_action_daycount=" insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date') select t1.mid_id, t1.brand, t1.model, t1.is_new, t1.channel, t1.os, t1.area_code, t1.version_code, t1.visit_count, t3.page_stats from ( select mid_id, brand, model, if(array_contains(collect_set(is_new),'0'),'0','1') is_new,--ods_page_log中,同一天内,同一设备的is_new字段,可能全部为1,可能全部为0,也可能部分为0,部分为1(卸载重装),故做该处理 collect_set(channel) channel, collect_set(os) os, collect_set(area_code) area_code, collect_set(version_code) version_code, sum(if(last_page_id is null,1,0)) visit_count from ${APP}.dwd_page_log where dt='$do_date' and last_page_id is null group by mid_id,model,brand )t1 join ( select mid_id, brand, model, collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats from ( select mid_id, brand, model, page_id, count(*) page_count, sum(during_time) during_time from ${APP}.dwd_page_log where dt='$do_date' group by mid_id,model,brand,page_id )t2 group by mid_id,model,brand )t3 on t1.mid_id=t3.mid_id and t1.brand=t3.brand and t1.model=t3.model; " dws_area_stats_daycount=" set hive.exec.dynamic.partition.mode=nonstrict; with tmp_vu as ( select dt, id province_id, visit_count, login_count, visitor_count, user_count from ( select dt, area_code, count(*) visit_count,--访客访问次数 count(user_id) 
login_count,--用户访问次数,等价于sum(if(user_id is not null,1,0)) count(distinct(mid_id)) visitor_count,--访客人数 count(distinct(user_id)) user_count--用户人数 from ${APP}.dwd_page_log where last_page_id is null group by dt,area_code )tmp left join ${APP}.dim_base_province area on tmp.area_code=area.area_code ), tmp_order as ( select date_format(create_time,'yyyy-MM-dd') dt, province_id, count(*) order_count, sum(original_amount) order_original_amount, sum(final_amount) order_final_amount from ${APP}.dwd_order_info group by date_format(create_time,'yyyy-MM-dd'),province_id ), tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, province_id, count(*) payment_count, sum(payment_amount) payment_amount from ${APP}.dwd_payment_info group by date_format(callback_time,'yyyy-MM-dd'),province_id ), tmp_ro as ( select date_format(create_time,'yyyy-MM-dd') dt, province_id, count(*) refund_order_count, sum(refund_amount) refund_order_amount from ${APP}.dwd_order_refund_info group by date_format(create_time,'yyyy-MM-dd'),province_id ), tmp_rp as ( select date_format(callback_time,'yyyy-MM-dd') dt, province_id, count(*) refund_payment_count, sum(refund_amount) refund_payment_amount from ${APP}.dwd_refund_payment group by date_format(callback_time,'yyyy-MM-dd'),province_id ) insert overwrite table ${APP}.dws_area_stats_daycount partition(dt) select province_id, sum(visit_count), sum(login_count), sum(visitor_count), sum(user_count), sum(order_count), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_amount), sum(refund_order_count), sum(refund_order_amount), sum(refund_payment_count), sum(refund_payment_amount), dt from ( select dt, province_id, visit_count, login_count, visitor_count, user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_vu union all select dt, province_id, 0 visit_count, 0 
login_count, 0 visitor_count, 0 user_count, order_count, order_original_amount, order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_order union all select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, payment_count, payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_pay union all select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, refund_order_count, refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount from tmp_ro union all select dt, province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, refund_payment_count, refund_payment_amount from tmp_rp )t1 group by dt,province_id; " dws_user_action_daycount=" set hive.exec.dynamic.partition.mode=nonstrict; with tmp_login as ( select dt, user_id, count(*) login_count from ${APP}.dwd_page_log where user_id is not null and last_page_id is null group by dt,user_id ), tmp_cf as ( select dt, user_id, sum(if(action_id='cart_add',1,0)) cart_count, sum(if(action_id='favor_add',1,0)) favor_count from ${APP}.dwd_action_log where user_id is not null and action_id in ('cart_add','favor_add') group by dt,user_id ), tmp_order as ( select date_format(create_time,'yyyy-MM-dd') dt, user_id, count(*) order_count, sum(if(activity_reduce_amount>0,1,0)) order_activity_count, sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count, sum(activity_reduce_amount) order_activity_reduce_amount, sum(coupon_reduce_amount) order_coupon_reduce_amount, sum(original_amount) order_original_amount, 
sum(final_amount) order_final_amount from ${APP}.dwd_order_info group by date_format(create_time,'yyyy-MM-dd'),user_id ),
tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, user_id, count(*) payment_count, sum(payment_amount) payment_amount from ${APP}.dwd_payment_info group by date_format(callback_time,'yyyy-MM-dd'),user_id ),
tmp_ri as ( select date_format(create_time,'yyyy-MM-dd') dt, user_id, count(*) refund_order_count, sum(refund_num) refund_order_num, sum(refund_amount) refund_order_amount from ${APP}.dwd_order_refund_info group by date_format(create_time,'yyyy-MM-dd'),user_id ),
tmp_rp as ( select date_format(callback_time,'yyyy-MM-dd') dt, rp.user_id, count(*) refund_payment_count, sum(ri.refund_num) refund_payment_num, sum(rp.refund_amount) refund_payment_amount from ( select user_id, order_id, sku_id, refund_amount, callback_time from ${APP}.dwd_refund_payment )rp left join ( select user_id, order_id, sku_id, refund_num from ${APP}.dwd_order_refund_info )ri on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id group by date_format(callback_time,'yyyy-MM-dd'),rp.user_id ),
tmp_coupon as ( select coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt, coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id, nvl(coupon_get_count,0) coupon_get_count, nvl(coupon_using_count,0) coupon_using_count, nvl(coupon_used_count,0) coupon_used_count from ( select date_format(get_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_get_count from ${APP}.dwd_coupon_use where get_time is not null group by user_id,date_format(get_time,'yyyy-MM-dd') )coupon_get full outer join ( select date_format(using_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_using_count from ${APP}.dwd_coupon_use where using_time is not null group by user_id,date_format(using_time,'yyyy-MM-dd') )coupon_using on coupon_get.dt=coupon_using.dt and coupon_get.user_id=coupon_using.user_id full outer join ( select date_format(used_time,'yyyy-MM-dd') dt, user_id, count(*)
coupon_used_count from ${APP}.dwd_coupon_use where used_time is not null group by user_id,date_format(used_time,'yyyy-MM-dd') )coupon_used on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id ), tmp_comment as ( select date_format(create_time,'yyyy-MM-dd') dt, user_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count from ${APP}.dwd_comment_info group by date_format(create_time,'yyyy-MM-dd'),user_id ), tmp_od as ( select dt, user_id, collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats from ( select date_format(create_time,'yyyy-MM-dd') dt, user_id, sku_id, sum(sku_num) sku_num, count(*) order_count, cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount, cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount, cast(sum(original_amount) as decimal(16,2)) original_amount, cast(sum(split_final_amount) as decimal(16,2)) final_amount from ${APP}.dwd_order_detail group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id )t1 group by dt,user_id ) insert overwrite table ${APP}.dws_user_action_daycount partition(dt) select coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id), nvl(login_count,0), nvl(cart_count,0), nvl(favor_count,0), nvl(order_count,0), nvl(order_activity_count,0), nvl(order_activity_reduce_amount,0), nvl(order_coupon_count,0), nvl(order_coupon_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0), nvl(payment_count,0), nvl(payment_amount,0), nvl(refund_order_count,0), 
nvl(refund_order_num,0), nvl(refund_order_amount,0), nvl(refund_payment_count,0), nvl(refund_payment_num,0), nvl(refund_payment_amount,0), nvl(coupon_get_count,0), nvl(coupon_using_count,0), nvl(coupon_used_count,0), nvl(appraise_good_count,0), nvl(appraise_mid_count,0), nvl(appraise_bad_count,0), nvl(appraise_default_count,0), order_detail_stats, coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt) from tmp_login full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id and tmp_login.dt=tmp_cf.dt full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt full outer join tmp_od on 
coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt; " dws_activity_info_daycount=" set hive.exec.dynamic.partition.mode=nonstrict; with tmp_order as ( select date_format(create_time,'yyyy-MM-dd') dt, activity_rule_id, activity_id, count(*) order_count, sum(split_activity_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from ${APP}.dwd_order_detail where activity_id is not null group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id ), tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, activity_rule_id, activity_id, count(*) payment_count, sum(split_activity_amount) payment_reduce_amount, sum(split_final_amount) payment_amount from ( select activity_rule_id, activity_id, order_id, split_activity_amount, split_final_amount from ${APP}.dwd_order_detail where activity_id is not null )od join ( select order_id, callback_time from ${APP}.dwd_payment_info )pi on od.order_id=pi.order_id group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id ) insert overwrite table ${APP}.dws_activity_info_daycount partition(dt) select activity_rule_id, activity_id, sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount), dt from ( select dt, activity_rule_id, activity_id, order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount from tmp_order union all select dt, activity_rule_id, activity_id, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, payment_reduce_amount, payment_amount from tmp_pay )t1 group by 
dt,activity_rule_id,activity_id;" dws_sku_action_daycount=" set hive.exec.dynamic.partition.mode=nonstrict; with tmp_order as ( select date_format(create_time,'yyyy-MM-dd') dt, sku_id, count(*) order_count, sum(sku_num) order_num, sum(if(split_activity_amount>0,1,0)) order_activity_count, sum(if(split_coupon_amount>0,1,0)) order_coupon_count, sum(split_activity_amount) order_activity_reduce_amount, sum(split_coupon_amount) order_coupon_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from ${APP}.dwd_order_detail group by date_format(create_time,'yyyy-MM-dd'),sku_id ), tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, sku_id, count(*) payment_count, sum(sku_num) payment_num, sum(split_final_amount) payment_amount from ${APP}.dwd_order_detail od join ( select order_id, callback_time from ${APP}.dwd_payment_info where callback_time is not null )pi on pi.order_id=od.order_id group by date_format(callback_time,'yyyy-MM-dd'),sku_id ), tmp_ri as ( select date_format(create_time,'yyyy-MM-dd') dt, sku_id, count(*) refund_order_count, sum(refund_num) refund_order_num, sum(refund_amount) refund_order_amount from ${APP}.dwd_order_refund_info group by date_format(create_time,'yyyy-MM-dd'),sku_id ), tmp_rp as ( select date_format(callback_time,'yyyy-MM-dd') dt, rp.sku_id, count(*) refund_payment_count, sum(ri.refund_num) refund_payment_num, sum(refund_amount) refund_payment_amount from ( select order_id, sku_id, refund_amount, callback_time from ${APP}.dwd_refund_payment )rp left join ( select order_id, sku_id, refund_num from ${APP}.dwd_order_refund_info )ri on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id ), tmp_cf as ( select dt, item sku_id, sum(if(action_id='cart_add',1,0)) cart_count, sum(if(action_id='favor_add',1,0)) favor_count from ${APP}.dwd_action_log where action_id in ('cart_add','favor_add') group by dt,item ), tmp_comment as ( select 
date_format(create_time,'yyyy-MM-dd') dt, sku_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count from ${APP}.dwd_comment_info group by date_format(create_time,'yyyy-MM-dd'),sku_id ) insert overwrite table ${APP}.dws_sku_action_daycount partition(dt) select sku_id, sum(order_count), sum(order_num), sum(order_activity_count), sum(order_coupon_count), sum(order_activity_reduce_amount), sum(order_coupon_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_num), sum(payment_amount), sum(refund_order_count), sum(refund_order_num), sum(refund_order_amount), sum(refund_payment_count), sum(refund_payment_num), sum(refund_payment_amount), sum(cart_count), sum(favor_count), sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count), dt from ( select dt, sku_id, order_count, order_num, order_activity_count, order_coupon_count, order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_order union all select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, payment_num, payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 
appraise_default_count from tmp_pay union all select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, refund_order_count, refund_order_num, refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_ri union all select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, refund_payment_count, refund_payment_num, refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_rp union all select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, cart_count, favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_cf union all select dt, sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 
refund_payment_amount, 0 cart_count, 0 favor_count, appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count from tmp_comment )t1 group by dt,sku_id;" dws_coupon_info_daycount=" set hive.exec.dynamic.partition.mode=nonstrict; with tmp_cu as ( select coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_exprie.dt) dt, coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_exprie.coupon_id) coupon_id, nvl(get_count,0) get_count, nvl(order_count,0) order_count, nvl(payment_count,0) payment_count, nvl(expire_count,0) expire_count from ( select date_format(get_time,'yyyy-MM-dd') dt, coupon_id, count(*) get_count from ${APP}.dwd_coupon_use group by date_format(get_time,'yyyy-MM-dd'),coupon_id )coupon_get full outer join ( select date_format(using_time,'yyyy-MM-dd') dt, coupon_id, count(*) order_count from ${APP}.dwd_coupon_use where using_time is not null group by date_format(using_time,'yyyy-MM-dd'),coupon_id )coupon_using on coupon_get.dt=coupon_using.dt and coupon_get.coupon_id=coupon_using.coupon_id full outer join ( select date_format(used_time,'yyyy-MM-dd') dt, coupon_id, count(*) payment_count from ${APP}.dwd_coupon_use where used_time is not null group by date_format(used_time,'yyyy-MM-dd'),coupon_id )coupon_used on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id full outer join ( select date_format(expire_time,'yyyy-MM-dd') dt, coupon_id, count(*) expire_count from ${APP}.dwd_coupon_use where expire_time is not null group by date_format(expire_time,'yyyy-MM-dd'),coupon_id )coupon_exprie on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_exprie.dt and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_exprie.coupon_id ), tmp_order as ( select date_format(create_time,'yyyy-MM-dd') dt, coupon_id, sum(split_coupon_amount) order_reduce_amount, sum(original_amount) order_original_amount, 
sum(split_final_amount) order_final_amount from ${APP}.dwd_order_detail where coupon_id is not null group by date_format(create_time,'yyyy-MM-dd'),coupon_id ), tmp_pay as ( select date_format(callback_time,'yyyy-MM-dd') dt, coupon_id, sum(split_coupon_amount) payment_reduce_amount, sum(split_final_amount) payment_amount from ( select order_id, coupon_id, split_coupon_amount, split_final_amount from ${APP}.dwd_order_detail where coupon_id is not null )od join ( select order_id, callback_time from ${APP}.dwd_payment_info )pi on od.order_id=pi.order_id group by date_format(callback_time,'yyyy-MM-dd'),coupon_id ) insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt) select coupon_id, sum(get_count), sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount), sum(expire_count), dt from ( select dt, coupon_id, get_count, order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, 0 payment_reduce_amount, 0 payment_amount, expire_count from tmp_cu union all select dt, coupon_id, 0 get_count, 0 order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount, 0 expire_count from tmp_order union all select dt, coupon_id, 0 get_count, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, payment_reduce_amount, payment_amount, 0 expire_count from tmp_pay )t1 group by dt,coupon_id; " case $1 in "dws_visitor_action_daycount" ) hive -e "$dws_visitor_action_daycount" ;; "dws_user_action_daycount" ) hive -e "$dws_user_action_daycount" ;; "dws_activity_info_daycount" ) hive -e "$dws_activity_info_daycount" ;; "dws_area_stats_daycount" ) hive -e "$dws_area_stats_daycount" ;; "dws_sku_action_daycount" ) hive -e "$dws_sku_action_daycount" ;; "dws_coupon_info_daycount" ) hive -e "$dws_coupon_info_daycount" ;; 
"all" ) hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount" ;; esac ``` (2)增加执行权限 \[seven@hadoop102 bin\]$ chmod +x dwd_to_dws_init.sh ****2)脚本使用**** (1)执行脚本 \[seven@hadoop102 bin\]$ dwd_to_dws_init.sh all 2020-06-14 (2)查看数据是否导入成功 #### **7.2.8 DWS层** **每日数据装载脚本** 1)编写脚本 (1)在/home/seven/bin目录下创建脚本dwd_to_dws.sh ```sql #!/bin/bash APP=gmall # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天 if [ -n "$2" ] ;then do_date=$2 else do_date=`date -d "-1 day" +%F` fi dws_visitor_action_daycount="insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date') select t1.mid_id, t1.brand, t1.model, t1.is_new, t1.channel, t1.os, t1.area_code, t1.version_code, t1.visit_count, t3.page_stats from ( select mid_id, brand, model, if(array_contains(collect_set(is_new),'0'),'0','1') is_new,--ods_page_log中,同一天内,同一设备的is_new字段,可能全部为1,可能全部为0,也可能部分为0,部分为1(卸载重装),故做该处理 collect_set(channel) channel, collect_set(os) os, collect_set(area_code) area_code, collect_set(version_code) version_code, sum(if(last_page_id is null,1,0)) visit_count from ${APP}.dwd_page_log where dt='$do_date' and last_page_id is null group by mid_id,model,brand )t1 join ( select mid_id, brand, model, collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats from ( select mid_id, brand, model, page_id, count(*) page_count, sum(during_time) during_time from ${APP}.dwd_page_log where dt='$do_date' group by mid_id,model,brand,page_id )t2 group by mid_id,model,brand )t3 on t1.mid_id=t3.mid_id and t1.brand=t3.brand and t1.model=t3.model;" dws_user_action_daycount=" with tmp_login as ( select user_id, count(*) login_count from ${APP}.dwd_page_log where dt='$do_date' and user_id is not null and last_page_id is null group by user_id ), tmp_cf as ( select user_id, sum(if(action_id='cart_add',1,0)) cart_count, sum(if(action_id='favor_add',1,0)) favor_count from 
${APP}.dwd_action_log
    where dt='$do_date'
    and user_id is not null
    and action_id in ('cart_add','favor_add')
    group by user_id
),
tmp_order as
(
    select
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    where (dt='$do_date' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
),
tmp_pay as
(
    select
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    where dt='$do_date'
    group by user_id
),
tmp_ri as
(
    select
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by user_id
),
tmp_rp as
(
    select
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select
            user_id,
            order_id,
            sku_id,
            refund_amount
        from ${APP}.dwd_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select
            user_id,
            order_id,
            sku_id,
            refund_num
        from ${APP}.dwd_order_refund_info
        where dt>=date_add('$do_date',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.user_id
),
tmp_coupon as
(
    select
        user_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_using_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_used_count
    from ${APP}.dwd_coupon_use
    where (dt='$do_date' or dt='9999-99-99')
    and (date_format(get_time,'yyyy-MM-dd')='$do_date'
    or date_format(using_time,'yyyy-MM-dd')='$do_date'
    or date_format(used_time,'yyyy-MM-dd')='$do_date')
    group by user_id
),
tmp_comment as
(
    select
        user_id,
sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count from ${APP}.dwd_comment_info where dt='$do_date' group by user_id ), tmp_od as ( select user_id, collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats from ( select user_id, sku_id, sum(sku_num) sku_num, count(*) order_count, cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount, cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount, cast(sum(original_amount) as decimal(16,2)) original_amount, cast(sum(split_final_amount) as decimal(16,2)) final_amount from ${APP}.dwd_order_detail where dt='$do_date' group by user_id,sku_id )t1 group by user_id ) insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date') select coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id), nvl(login_count,0), nvl(cart_count,0), nvl(favor_count,0), nvl(order_count,0), nvl(order_activity_count,0), nvl(order_activity_reduce_amount,0), nvl(order_coupon_count,0), nvl(order_coupon_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0), nvl(payment_count,0), nvl(payment_amount,0), nvl(refund_order_count,0), nvl(refund_order_num,0), nvl(refund_order_amount,0), nvl(refund_payment_count,0), nvl(refund_payment_num,0), nvl(refund_payment_amount,0), nvl(coupon_get_count,0), nvl(coupon_using_count,0), nvl(coupon_used_count,0), nvl(appraise_good_count,0), nvl(appraise_mid_count,0), nvl(appraise_bad_count,0), nvl(appraise_default_count,0), order_detail_stats from tmp_login full outer join tmp_cf on 
tmp_login.user_id=tmp_cf.user_id full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id; " dws_activity_info_daycount=" with tmp_order as ( select activity_rule_id, activity_id, count(*) order_count, sum(split_activity_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from ${APP}.dwd_order_detail where dt='$do_date' and activity_id is not null group by activity_rule_id,activity_id ), tmp_pay as ( select activity_rule_id, activity_id, count(*) payment_count, sum(split_activity_amount) payment_reduce_amount, sum(split_final_amount) payment_amount from ${APP}.dwd_order_detail where (dt='$do_date' or dt=date_add('$do_date',-1)) and activity_id is not null and order_id in ( select order_id from ${APP}.dwd_payment_info where dt='$do_date' ) group by activity_rule_id,activity_id ) insert overwrite table ${APP}.dws_activity_info_daycount partition(dt='$do_date') select activity_rule_id, activity_id, sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), 
sum(payment_reduce_amount), sum(payment_amount) from ( select activity_rule_id, activity_id, order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 payment_amount from tmp_order union all select activity_rule_id, activity_id, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, payment_reduce_amount, payment_amount from tmp_pay )t1 group by activity_rule_id,activity_id;" dws_sku_action_daycount=" with tmp_order as ( select sku_id, count(*) order_count, sum(sku_num) order_num, sum(if(split_activity_amount>0,1,0)) order_activity_count, sum(if(split_coupon_amount>0,1,0)) order_coupon_count, sum(split_activity_amount) order_activity_reduce_amount, sum(split_coupon_amount) order_coupon_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from ${APP}.dwd_order_detail where dt='$do_date' group by sku_id ), tmp_pay as ( select sku_id, count(*) payment_count, sum(sku_num) payment_num, sum(split_final_amount) payment_amount from ${APP}.dwd_order_detail where (dt='$do_date' or dt=date_add('$do_date',-1)) and order_id in ( select order_id from ${APP}.dwd_payment_info where dt='$do_date' ) group by sku_id ), tmp_ri as ( select sku_id, count(*) refund_order_count, sum(refund_num) refund_order_num, sum(refund_amount) refund_order_amount from ${APP}.dwd_order_refund_info where dt='$do_date' group by sku_id ), tmp_rp as ( select rp.sku_id, count(*) refund_payment_count, sum(ri.refund_num) refund_payment_num, sum(refund_amount) refund_payment_amount from ( select order_id, sku_id, refund_amount from ${APP}.dwd_refund_payment where dt='$do_date' )rp left join ( select order_id, sku_id, refund_num from ${APP}.dwd_order_refund_info where dt>=date_add('$do_date',-15) )ri on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id group by rp.sku_id ), tmp_cf as ( select item sku_id, sum(if(action_id='cart_add',1,0)) cart_count, 
sum(if(action_id='favor_add',1,0)) favor_count from ${APP}.dwd_action_log where dt='$do_date' and action_id in ('cart_add','favor_add') group by item ), tmp_comment as ( select sku_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count from ${APP}.dwd_comment_info where dt='$do_date' group by sku_id ) insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date') select sku_id, sum(order_count), sum(order_num), sum(order_activity_count), sum(order_coupon_count), sum(order_activity_reduce_amount), sum(order_coupon_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_num), sum(payment_amount), sum(refund_order_count), sum(refund_order_num), sum(refund_order_amount), sum(refund_payment_count), sum(refund_payment_num), sum(refund_payment_amount), sum(cart_count), sum(favor_count), sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count) from ( select sku_id, order_count, order_num, order_activity_count, order_coupon_count, order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_order union all select sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, payment_num, payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 
refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_pay union all select sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, refund_order_count, refund_order_num, refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_ri union all select sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, refund_payment_count, refund_payment_num, refund_payment_amount, 0 cart_count, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_rp union all select sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, cart_count, favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_cf union all select sku_id, 0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count, 0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 
refund_order_count, 0 refund_order_num, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount, 0 cart_count, 0 favor_count, appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count from tmp_comment )t1 group by sku_id;" dws_coupon_info_daycount=" with tmp_cu as ( select coupon_id, sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) get_count, sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) order_count, sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) payment_count, sum(if(date_format(expire_time,'yyyy-MM-dd')='$do_date',1,0)) expire_count from ${APP}.dwd_coupon_use where dt='9999-99-99' or dt='$do_date' group by coupon_id ), tmp_order as ( select coupon_id, sum(split_coupon_amount) order_reduce_amount, sum(original_amount) order_original_amount, sum(split_final_amount) order_final_amount from ${APP}.dwd_order_detail where dt='$do_date' and coupon_id is not null group by coupon_id ), tmp_pay as ( select coupon_id, sum(split_coupon_amount) payment_reduce_amount, sum(split_final_amount) payment_amount from ${APP}.dwd_order_detail where (dt='$do_date' or dt=date_add('$do_date',-1)) and coupon_id is not null and order_id in ( select order_id from ${APP}.dwd_payment_info where dt='$do_date' ) group by coupon_id ) insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt='$do_date') select coupon_id, sum(get_count), sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount), sum(payment_count), sum(payment_reduce_amount), sum(payment_amount), sum(expire_count) from ( select coupon_id, get_count, order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, payment_count, 0 payment_reduce_amount, 0 payment_amount, expire_count from tmp_cu union all select coupon_id, 0 get_count, 0 order_count, order_reduce_amount, order_original_amount, order_final_amount, 0 payment_count, 0 payment_reduce_amount, 0 
payment_amount, 0 expire_count
    from tmp_order
    union all
    select coupon_id, 0 get_count, 0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount, 0 payment_count, payment_reduce_amount, payment_amount, 0 expire_count
    from tmp_pay
)t1
group by coupon_id;"

dws_area_stats_daycount="
with
tmp_vu as
(
    select
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            area_code,
            count(*) visit_count,                  -- number of visitor visits
            count(user_id) login_count,            -- number of logged-in user visits, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count, -- number of distinct visitors
            count(distinct(user_id)) user_count    -- number of distinct users
        from ${APP}.dwd_page_log
        where dt='$do_date'
        and last_page_id is null
        group by area_code
    )tmp
    left join ${APP}.dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    where (dt='$do_date' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by province_id
),
tmp_pay as
(
    select
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    where dt='$do_date'
    group by province_id
),
tmp_ro as
(
    select
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by province_id
),
tmp_rp as
(
    select
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from ${APP}.dwd_refund_payment
    where dt='$do_date'
    group by province_id
)
insert overwrite table ${APP}.dws_area_stats_daycount partition(dt='$do_date')
select
    province_id,
    sum(visit_count),
    sum(login_count),
    sum(visitor_count),
    sum(user_count),
    sum(order_count),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_amount)
from
(
    select
        province_id,
visit_count, login_count, visitor_count, user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
    from tmp_vu
    union all
    select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, order_count, order_original_amount, order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
    from tmp_order
    union all
    select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, payment_count, payment_amount, 0 refund_order_count, 0 refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
    from tmp_pay
    union all
    select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, refund_order_count, refund_order_amount, 0 refund_payment_count, 0 refund_payment_amount
    from tmp_ro
    union all
    select province_id, 0 visit_count, 0 login_count, 0 visitor_count, 0 user_count, 0 order_count, 0 order_original_amount, 0 order_final_amount, 0 payment_count, 0 payment_amount, 0 refund_order_count, 0 refund_order_amount, refund_payment_count, refund_payment_amount
    from tmp_rp
)t1
group by province_id;"

case $1 in
"dws_visitor_action_daycount" )
    hive -e "$dws_visitor_action_daycount"
;;
"dws_user_action_daycount" )
    hive -e "$dws_user_action_daycount"
;;
"dws_activity_info_daycount" )
    hive -e "$dws_activity_info_daycount"
;;
"dws_area_stats_daycount" )
    hive -e "$dws_area_stats_daycount"
;;
"dws_sku_action_daycount" )
    hive -e "$dws_sku_action_daycount"
;;
"dws_coupon_info_daycount" )
    hive -e "$dws_coupon_info_daycount"
;;
"all" )
    hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount"
;;
esac
```

(2) Grant execute permission

\[seven@hadoop102 bin\]$ chmod +x dwd_to_dws.sh

****2) Script usage****

(1) Run the script

\[seven@hadoop102 bin\]$ dwd_to_dws.sh all 2020-06-14

(2) Check whether the data was imported successfully
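Step (2) above does not spell out a concrete check. A minimal spot-check, assuming the walkthrough's gmall database and the 2020-06-14 run date, is to read back the partition that was just written:

```sql
-- Spot-check one of the loaded DWS tables for the target partition
-- (table names and date taken from this walkthrough; adjust to your run).
select * from gmall.dws_user_action_daycount where dt='2020-06-14' limit 5;

-- A per-table row count is often enough to confirm the load ran:
select count(*) from gmall.dws_sku_action_daycount where dt='2020-06-14';
```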
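All of the daycount statements in the script share one pattern: each metric source (orders, payments, refunds, comments, and so on) is aggregated separately, padded with literal 0 columns so every branch has the same schema, combined with union all, and finally summed with a group by on the key. A minimal sketch of the pattern, using hypothetical tables t_order and t_pay:

```sql
-- Sketch of the zero-padding + union all + group by pattern
-- used by every dws_*_daycount statement (hypothetical table names).
select
    sku_id,
    sum(order_count) order_count,
    sum(payment_count) payment_count
from
(
    select sku_id, count(*) order_count, 0 payment_count
    from t_order group by sku_id
    union all
    select sku_id, 0 order_count, count(*) payment_count
    from t_pay group by sku_id
)t1
group by sku_id;
```

This avoids chaining many full outer joins; the dws_user_action_daycount statement shows the join-based alternative, where each additional source needs another coalesce over all previously joined keys.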
