Big Data Technology: Hands-On Project, Advertising Data Warehouse (Part 5)

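[Chapter 9: Ad Data Warehouse DIM Layer](#第9章广告数仓DIM层)

[9.1 Ad Information Dimension Table](#9.1 广告信息维度表)

[9.2 Platform Information Dimension Table](#9.2 平台信息维度表)

[9.3 Data Loading Script](#9.3 数据装载脚本)

[Chapter 10: Ad Data Warehouse DWD Layer](#第10章广告数仓DWD层)

[10.1 Ad Event Fact Table](#10.1 广告事件事实表)

[10.1.1 Table Creation Statement](#10.1.1 建表语句)

[10.1.2 Data Loading](#10.1.2 数据装载)

[10.1.2.1 Initial Log Parsing](#10.1.2.1 初步解析日志)

[10.1.2.2 Parsing IP and UA](#10.1.2.2 解析IP和UA)

[10.1.2.3 Flagging Invalid Traffic](#10.1.2.3 标注无效流量)

[10.2 Data Loading Script](#10.2 数据装载脚本)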

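Chapter 9: Ad Data Warehouse DIM Layer

Key design points for the DIM layer:

(1) The DIM layer is designed according to dimensional modeling theory and stores the dimension tables of the dimensional model.

(2) DIM-layer data is stored in ORC columnar format with Snappy compression.

(3) DIM-layer tables are named dim_<table name>_<full|zip>, where the suffix marks the table as a full-snapshot table (full) or a zipper table (zip).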
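The naming rule in point (3) can be checked mechanically. The following is a minimal sketch and not part of the original project: the function name `is_valid_dim_name`, the sample name `dim_user_zip`, and the assumption that the table-name part uses only lowercase letters, digits, and underscores are all illustrative.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when a table name follows the
# dim_<table name>_<full|zip> convention, and fails (exit 1) otherwise.
# The character set allowed in <table name> is an assumption.
is_valid_dim_name() {
    printf '%s\n' "$1" | grep -Eq '^dim_[a-z0-9]+(_[a-z0-9]+)*_(full|zip)$'
}

# dim_user_zip is a made-up name used only to exercise the zip suffix.
for t in dim_ads_info_full dim_platform_info_full dim_user_zip ods_ads_info_full; do
    if is_valid_dim_name "$t"; then
        echo "$t: ok"
    else
        echo "$t: does not follow the DIM naming convention"
    fi
done
```

A check like this could run before new DDL is committed, so that a misnamed table is caught before it reaches the warehouse.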

9.1 Ad Information Dimension Table

1) Table creation statement

drop table if exists dim_ads_info_full;
create external table if not exists dim_ads_info_full
(
    ad_id         string comment 'ad ID',
    ad_name       string comment 'ad name',
    product_id    string comment 'advertised product ID',
    product_name  string comment 'advertised product name',
    product_price decimal(16, 2) comment 'advertised product price',
    material_id   string comment 'material ID',
    material_url  string comment 'material URL',
    group_id      string comment 'ad group ID'
) PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/ad/dim/dim_ads_info_full'
    TBLPROPERTIES ('orc.compress' = 'snappy');

2) Load data

insert overwrite table dim_ads_info_full partition (dt='2023-01-07')
select
    ad.id,
    ad_name,
    product_id,
    name,
    price,
    material_id,
    material_url,
    group_id
from
(
    select
        id,
        ad_name,
        product_id,
        material_id,
        group_id,
        material_url
    from ods_ads_info_full
    where dt = '2023-01-07'
) ad
left join
(
    select
        id,
        name,
        price
    from ods_product_info_full
    where dt = '2023-01-07'
) pro
on ad.product_id = pro.id;
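Because the load uses a left join, an ad whose product_id has no match in ods_product_info_full is still written, with NULL in product_name and product_price. One way to spot such rows after loading is a simple count. The sketch below is an illustration under assumptions: the function name `build_unmatched_check_sql` is hypothetical, the tables are assumed to live in the session's current database, and a NULL product_name could also come from a matched product that simply has no name. The query is sent to Hive only when a `hive` client is on the PATH; otherwise the SQL is printed.

```shell
#!/bin/bash
# Hypothetical helper: prints a HiveQL query that counts rows in one
# partition of dim_ads_info_full where the product lookup left
# product_name empty (NULL).
build_unmatched_check_sql() {
    dt="$1"
    cat <<EOF
select count(*) as unmatched_ads
from dim_ads_info_full
where dt = '${dt}'
  and product_name is null;
EOF
}

sql=$(build_unmatched_check_sql 2023-01-07)

# Run the check through the Hive CLI if present; otherwise show the SQL.
if command -v hive >/dev/null 2>&1; then
    hive -e "$sql"
else
    echo "$sql"
fi
```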

9.2 Platform Information Dimension Table

1) Table creation statement

drop table if exists dim_platform_info_full;
create external table if not exists dim_platform_info_full
(
    id               STRING comment 'platform ID',
    platform_name_en STRING comment 'platform name (English)',
    platform_name_zh STRING comment 'platform name (Chinese)'
) PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/ad/dim/dim_platform_info_full'
    TBLPROPERTIES ('orc.compress' = 'snappy');

2) Load data

insert overwrite table dim_platform_info_full partition (dt = '2023-01-07')
select
    id,
    platform_name_en,
    platform_name_zh
from ods_platform_info_full
where dt = '2023-01-07';

9.3 Data Loading Script

1) Create ad_ods_to_dim.sh in the /home/atguigu/bin directory on hadoop102
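The script body is not shown at this point, so the following is only a sketch of what such a script might look like, not the project's actual file. It wraps the two load statements from sections 9.1 and 9.2 and parameterizes the partition date. The function names `resolve_date` and `build_dim_sql`, the default of "yesterday" when no date is passed, the reliance on GNU `date`, and the fallback of printing the SQL when no `hive` client is on the PATH are all assumptions.

```shell
#!/bin/bash
# Sketch of ad_ods_to_dim.sh; the structure below is an assumption.

# Use the date passed as the first argument, or fall back to
# yesterday's date (requires GNU date for the -d option).
resolve_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d '-1 day' +%F
    fi
}

# Print the DIM load statements from sections 9.1 and 9.2 for one date.
build_dim_sql() {
    dt="$1"
    cat <<EOF
insert overwrite table dim_ads_info_full partition (dt = '${dt}')
select
    ad.id, ad_name, product_id, name, price, material_id, material_url, group_id
from (
    select id, ad_name, product_id, material_id, group_id, material_url
    from ods_ads_info_full
    where dt = '${dt}'
) ad
left join (
    select id, name, price
    from ods_product_info_full
    where dt = '${dt}'
) pro
on ad.product_id = pro.id;

insert overwrite table dim_platform_info_full partition (dt = '${dt}')
select id, platform_name_en, platform_name_zh
from ods_platform_info_full
where dt = '${dt}';
EOF
}

do_date=$(resolve_date "${1:-}")
sql=$(build_dim_sql "$do_date")

# Run through the Hive CLI when it is available; otherwise print the SQL.
if command -v hive >/dev/null 2>&1; then
    hive -e "$sql"
else
    echo "$sql"
fi
```

Invoked as `ad_ods_to_dim.sh 2023-01-07`, this sketch targets that partition; with no argument it targets yesterday's date. Keeping the SQL in one function makes it easy to print and review the statements before running them against the warehouse.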
