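The source system uploads one CSV data file per day to a designated directory on the data platform, and the data platform uses Hive tables for the ETL work.
First, create an external partitioned table: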
sql
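-- External table over the daily CSV drops; each dt=YYYYMMDD subdirectory is one partition.
-- In Hive DDL the location clause must come before tblproperties.
-- skip.header.line.count=1 skips the header row of each CSV file.
create external table tmp_lease_contract
(
contract_id string,
vin string,
amount float
)
partitioned by (dt string)
row format delimited
fields terminated by ","
stored as textfile
location "/dmp/tmp/sales/lease_contract"
tblproperties ('skip.header.line.count'='1');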
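Each day's data is placed, following the naming convention, in the matching subdirectory such as ./dt=20250718. Then register the partition so that Hive can see it: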
sql
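-- Register the new day's directory as a partition in the metastore.
alter table tmp_lease_contract add if not exists partition(dt='20250718');
-- Check that the day's rows are readable.
select * from tmp_lease_contract where dt='20250718';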
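
Once the day's partition is queryable, a downstream ETL step would typically copy it from the temporary external table into a managed table. The following is only a minimal sketch: the target table name `dwd_lease_contract` and the choice of ORC storage are assumptions made for illustration and do not come from the original setup.

```sql
-- Hypothetical target table (name and ORC format are assumptions).
create table if not exists dwd_lease_contract
(
contract_id string,
vin string,
amount float
)
partitioned by (dt string)
stored as orc;

-- Load one day. insert overwrite replaces only the named partition,
-- so rerunning the job for the same date does not duplicate rows.
insert overwrite table dwd_lease_contract partition (dt='20250718')
select contract_id, vin, amount
from tmp_lease_contract
where dt='20250718';
```

Using a static partition value here keeps each daily run scoped to a single date, which matches the one-file-per-day upload pattern.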
An example of the directory layout:
/dmp/tmp/sales/lease_contract/
|-- dt=20250716
| |-- lease_contract_20250716.csv
|-- dt=20250715
| |-- lease_contract_20250715.csv
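
When several `dt=...` directories like the ones above already exist but have not been registered one by one, Hive can also discover them from the directory names. This is a sketch of that alternative, assuming the directories follow the `dt=YYYYMMDD` naming exactly and that the metastore is allowed to scan the table location:

```sql
-- Scan the table location and register any dt=... directories
-- that are not yet known to the metastore.
msck repair table tmp_lease_contract;

-- List the registered partitions to confirm the result.
show partitions tmp_lease_contract;
```

For a single new day, the explicit `alter table ... add partition` statement is usually cheaper, since `msck repair table` walks the whole table directory.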