Hive的基本操作（创建与修改）

必备知识

数据类型

基本类型

类型	写法
字符	char, varchar, string✔
整数	tinyint, smallint, int✔, bigint✔
小数	float, double, numeric(m,n), decimal(m,n)✔
布尔值	boolean✔
时间	date✔, timestamp✔

复杂类型(集合类型)

复制代码

1、数组：array<T>	
	面向用户提供的原始数据做结构化映射
	样例： [] / |156,1778,42,138|	=> 描述同一个维度数据
	
2、键值对：map<K,V>		
	样例： |LogicJava:88,mysql:89|
	
3、结构体：struct<name1:value1,name2:value2,....>
	样例： 类json格式【以{}开头结尾，且结构稳定】 => 结构化数据

【创建】表操作

一：hive建表【基本语法】

语法组成

复制代码

组成一：建表 = 基本格式 + 行格式 + 额外处理
组成二：上传数据

*基本格式

sql 复制代码

create table if not exists TABLE_NAME(
    FIELD_NAME DATA_TYPE,
    FIELD_NAME DATA_TYPE,
    ....
)[comment '描述备注']

*行格式

形式一：row format delimited

sql 复制代码

1、应用场景：面向文本，非结构化与半结构化数据

2、模拟数据：
	123,张三,16853210211116,true,26238.5,阅读;跑步;唱歌,java:98;mysql:54,province:南京;city:江宁
	
3、案例演示：
	create table if not exists TABLE_NAME(
		id int,
		name string,
		time bigint,
		isPartyMember boolean,
		hobby array<string>,
		scores map<string,int>,
		address struct<province:string,city:string>
	)
	row format delimited
	fields terminated by ','
	collection items terminated by ';'
	map keys terminated by ':'
	lines terminated by '\n'
	
4、讲解：
	fields terminated by ','				列分隔符【字段: id,name...】
	collection items terminated by ';'		集合项内部间的分隔符
	map keys terminated by ':'				键值对[map]分隔符
	lines terminated by '\n'				行分隔符【默认，一般可以省略】

形式二：row format serde 'CLASS_PATH'

sql 复制代码

1、应用场景：面向结构化数据，即：结构清晰的数据

2、CLASS_PATH有以下几种选择：
	选择一：CSV【简单类型】
		数据呈现：
			"1","2","Football"
			"2","2","Soccer"
			"3","2","Baseball & Softball"
        代码：
        	create table if not exists TABLE_NAME(
                id string,
                page string,
                word string
            )
         	row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
         	with serdeproperties(
                'separatorChar'=',',
                'quoteChar'='"',
                'escapeChar'='\\'
            )
            
	选择二：regex【正则】
		数据呈现：
			123,张三,16853210211116,true,26238.5,阅读;跑步;唱歌,java:98;mysql:54,province:南京;city:江宁
		代码：
			create table if not exists TABLE_NAME(
                id int,
                name string,
                time bigint,
                isPartyMember boolean,
                hobby array<string>,
                scores map<string,int>,
                address struct<province:string,city:string>
            )
            row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
			with serdeproperties(
            	'input.regex'='^(//d+),(.*?),(//d+),(true|false),(\\d+\\.?\\d+?)$'
            )
            
	选择三：JsonSerDe
		数据呈现：
			{"name":"henry","age":22,"gender":"male","phone":"18014499655"}
		代码：
			create table if not exists json(
                name string,
                age int,
                gender string,
                phone string
            )
            row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'

*额外处理

sql 复制代码

1、store【存储】
	基本语法：stored as '存储格式'
		存储格式：textfile✔,orc,parquet,sequencefile,...
	案例：
		stored as textfile
		
2、tblproperties【表属性】(通用)：
	案例【实际情况具体分析】：
		tblproperties(
			'skip.header.line.count'='1'			【跳过表头，即：第一行】
			...
		)

*上传数据入表

sql 复制代码

方法一【不建议用】：
	hdfs dfs -put employee.txt /hive312/warehouse/yb12211.db/inner_table_employee
	
方法二【有校验过程】：✔
	需知：
    	local ：表示数据在虚拟机本地
        	缺少local ：表示数据在hdfs上
		overload ：覆盖
			缺少overload ：追加
	第一种【本地虚拟机】：
		load data local inpath '/root/file/employee.txt'
		overwrite into table yb12211.inner_table_employee;
	第二种【hdfs】：
		load data inpath '/hive_data/hive_cha01/employee/employee.txt'
		overwrite into table yb12211.inner_table_employee;

方法三【只用于【外部表】】：✔
	基本格式：location 'hdfs中存放文件的【目录】的路径'			外部挂载

针对性实践操作

sql 复制代码

案例一：
	/*
	1|henry|1.81|1995-03-18|江苏,南京,玄武,北京东路68号|logicjava:88,javaoop:76,mysql:80,ssm:82|beauty,money,joke
	2|arill|1.59|1996-7-30|安徽,芜湖,南山,西湖东路68号|logicjava:79,javaoop:58,mysql:65,ssm:85|beauty,power,sleeping
	3|mary|1.72|1995-09-02|山东,青岛,长虹,天山东路68
	*/
	drop table if exists students;
	create table if not exists students(
		number int,
		name string,
		height decimal(3,2),
		birthday date,
		house struct<province:string,city:string,district:string,street:string>,
		scores map<string,int>,
		hobby array<string>
	)
	row format delimited
	fields terminated by "|"
	collection items terminated by ","
	map keys terminated by ":"
	stored as textfile;
	
	load data inpath '/zhou/students.txt'
	overwrite into table zhou.students;

案例二：
	/*
	user_id,auction_id,cat_id,cat1,property,buy_mount,day
786295544,41098319944,50014866,50022520,21458:86755362;13023209:3593274;10984217:21985;122217965:3227750;21477:28695579;22061:30912;122217803:3230095,2,123434123
	*/
	drop table if exists sam_mum_baby_trade;
	create external table if not exists sam_mum_baby_trade(
		user_id bigint,
		auction_id bigint,
		cat_id bigint,
		cat1 bigint,
		property map<bigint,bigint>,
		buy_mount int,
		day bigint
	)
	row format delimited
	fields terminated by ","
	collection items terminated by ";"
	map keys terminated by ":"
		stored as textfile
	tblproperties (
		'skip.header.line.count'='1'
		);
	load data inpath '/zhou/sam_mum_baby_trade.csv'
	into table zhou.sam_mum_baby_trade;

案例三:
	/*
	"1","2","Football"
	"2","2","Soccer"
	"3","2","Baseball & Softball"
	*/
	drop table if exists categories;
	create table if not exists categories(
		id string,
		page string,
		word string
	)
	row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
	with serdeproperties(
		'separatorChar'=',',
		'quoteChar'='"',
		'escapeChar'='\\'
	)
	stored as textfile;
	load data inpath '/zhou/categories.csv'
	overwrite into table zhou.categories;
	select * from categories;

案例四：
	/*
	{"name":"henry","age":22,"gender":"male","phone":"18014499655"}
	*/
	//Json
	drop table if exists json;
	create table if not exists json(
		name string,
		age int,
		gender string,
		phone string
	)
	row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
	stored as textfile;
	load data inpath '/zhou/json.log'
	overwrite into table zhou.json;

案例五：
	/*
	125;男;2015-9-7 1:52:22;1521.84
	883;男;2014-9-18 5:24:42;6391.45
	652;女;2014-5-4 5:56:45;9603.79
	*/
	create external table if not exists test1w(
		user_id int,
		user_gender string,
		order_time timestamp,
		order_amount decimal(6,2)
	)
	row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
	with serdeproperties(
		'input.regex'='(\\d+);(.*?);(\\d{4}-\\d{1,2}-\\d{1,2} \\d{1,2}:\\d{1,2}:\\d{1,2});(\\d+\.?\\d+?)'
	)
	stored as textfile
	location '/zhou/test1w';
	select * from test1w;

二：hive建表【高阶语法】

1：CTAS

复制代码

【本质】：在原有表的基础上查询并创建新表
基本语法：
	create table if not exists NEW_TABLE_NAME as select ... from OLD_TABLE_NAME ...
案例：
	原有的表：hive_ext_regex_test1w
	语句：
    	create table if not exists hive_ext_test_before2015 as
    	select * from hive_ext_regex_test1w
    	where year(order_time)<=2015;

2：CTE

sql 复制代码

【本质】：对表进行层层筛选，最终形成新表
基本语法：
	as with....select...
案例：
	场景：2015年之前的所有数据 以及 2015年之后男性5个以上订单数或5w以上订单总额的订单数据。
	原有的表：hive_ext_regex_test1w
	语句：
		create table hive_test_before2015_and_male_over5or5w as
		with
		before2015 as (
			select * from hive_ext_regex_test1w
			where year(order_time)<=2015
		),
		agg_male_over5or5w as (
			select user_id
			from hive_ext_regex_test1w
			where year(order_time)>2015 and user_gender='男'
			group by user_id
			having count(*)>=5 or sum(order_amount)>=50000
		),
		male_over5or5w as (
			select A.*
			from  hive_ext_regex_test1w A
			inner join agg_male_over5or5w B
			on year(A.order_time)>2015 and A.user_id=B.user_id
		)
		select * from before2015
		union all							【注意：union all => 将表并在一起且不去重】
		select * from male_over5or5w;

3：CTL

sql 复制代码

【本质】：复制原表的表结构
基本语法：
	create table NEW_TABLE_NAME like OLD_TABLE_NAME;
案例：
	create table hive_test1w_like like hive_ext_regex_test1w;

【修改】表操作

提前需知

复制代码

1、查看表字段基本信息：
		desc 表名;

2、查看表字段详细信息：
		desc formatted 表名;	=> 由此可查看表中可修改的属性
		
3、查看建表流程：
		show create 表名;

基本语法

复制代码

alter table TABLE_NAME
	rename to NEW_NAME;
	set tblproperties('key'='value')		-- 修改表属性：包括各种分隔符,SerDe,...
	ser fileformat FORMAT;					-- 修改文件格式
	change old_name new_name TYPE;			-- 修改字段名
	column(field_name TYPE)					-- 添加列