Hive Serde

Hive Serde

目的:

​ Hive Serde用来做序列化和反序列化,构建在数据存储和执行引擎之间,对两者实现解耦。

应用场景:

​ 1、hive主要用来存储结构化数据,如果结构化数据存储的格式嵌套比较复杂的时候,可以使用serde的方式,利用正则表达式匹配的方法来读取数据,例如,表字段如下:id,name,map<string,array<map<string,string>>>

​ 2、当读取数据的时候,数据的某些特殊格式不希望显示在数据中,如:

192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -

不希望数据显示的时候包含[]或者"",此时可以考虑使用serde的方式

语法规则:

复制代码
		row_format
		: DELIMITED 
          [FIELDS TERMINATED BY char [ESCAPED BY char]] 
          [COLLECTION ITEMS TERMINATED BY char] 
          [MAP KEYS TERMINATED BY char] 
          [LINES TERMINATED BY char] 
		: SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, 										            property_name=property_value, ...)]

应用案例:

​ 1、数据文件

复制代码
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -

​ 2、基本操作:

sql 复制代码
--创建表
	CREATE TABLE logtbl (
	    host STRING,
	    identity STRING,
	    t_user STRING,
	    time STRING,
	    request STRING,
	    referer STRING,
	    agent STRING)
	  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
	  WITH SERDEPROPERTIES (
	    "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[0-		9]*)"
	  )
  STORED AS TEXTFILE;
--加载数据
	load data local inpath '/root/data/log' into table logtbl;
--查询操作
	select * from logtbl;
--数据显示如下(不包含[]和")
192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-upper.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-nav.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /asf-logo.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-button.png HTTP/1.1	304	-
192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-middle.png HTTP/1.1	304	-
	
相关推荐
weixin_3077791324 分钟前
Clickhouse统计指定表中各字段的空值、空字符串或零值比例
运维·数据仓库·clickhouse
viperrrrrrrrrr76 小时前
大数据学习(132)-HIve数据分析
大数据·hive·学习
社恐码农18 小时前
Hive开窗函数的进阶SQL案例
hive·hadoop·sql
Leo.yuan19 小时前
数据湖是什么?数据湖和数据仓库的区别是什么?
大数据·运维·数据仓库·人工智能·信息可视化
weixin_307779131 天前
Linux下GCC和C++实现统计Clickhouse数据仓库指定表中各字段的空值、空字符串或零值比例
linux·运维·c++·数据仓库·clickhouse
RestCloud2 天前
如何通过ETLCloud实现跨系统数据同步?
数据库·数据仓库·mysql·etl·数据处理·数据同步·集成平台
行云流水行云流水2 天前
数据库、数据仓库、数据中台、数据湖相关概念
数据库·数据仓库
IvanCodes2 天前
七、Sqoop Job:简化与自动化数据迁移任务及免密执行
大数据·数据库·hadoop·sqoop
冬至喵喵2 天前
【hive】函数集锦:窗口函数、列转行、日期函数
大数据·数据仓库·hive·hadoop
Theodore_10222 天前
大数据(2) 大数据处理架构Hadoop
大数据·服务器·hadoop·分布式·ubuntu·架构