Garbled Chinese in Hive SELECT output

First, to be clear: this garbling has nothing to do with the latin1/utf8 encoding of the MySQL metastore database or its tables.

Straight to the case.

Sometimes we need to add a custom constant column whose value is sometimes Chinese characters and sometimes letters, and that is when we ran into this: the Chinese came back garbled.

Honestly, it was painful to look at. Who could put up with that?

Selecting the literal on its own worked with no problem at all.

So what was going on? After some checking I found a spot with something like `"境内" as col` that was not garbled, so I began to suspect the if() function was at play, but for a while I had no idea why.
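For orientation, here is a sketch of the shape of the query that misbehaved. The table and column names are lifted from the EXPLAIN output further down; the exact select list and the if() expression are my guesses, not the original SQL:

```sql
-- Hypothetical reconstruction: a Chinese constant produced through if().
-- With hive.vectorized.execution.enabled=true, this literal came back garbled.
SELECT t.bank_code,
       t1.bank_short_name,
       if(t1.bank_onshore_flag = '1', '境内', t1.bank_name) AS region
FROM   dwdmdata.dm_ce_f_portrait_credit_line t
JOIN   dwapsdata.dw_conf_ce_bank_dict_v      t1
  ON   t.bank_code = t1.bank_code
GROUP  BY t.bank_code, t1.bank_short_name,
       if(t1.bank_onshore_flag = '1', '境内', t1.bank_name);
```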

After more testing: concat("境内") and concat_ws("", "境内") did not help, but concat_ws("", array("境内")) did. Still with no idea where to start, I pulled out the big gun: EXPLAIN.
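The probes described above, collected in one place (some_table is a placeholder name; run them in the full query context, since the literal selected on its own was always fine):

```sql
SELECT concat('境内')               AS c1,  -- still garbled
       concat_ws('', '境内')        AS c2,  -- still garbled
       concat_ws('', array('境内')) AS c3   -- displayed correctly
FROM   some_table;

-- Then compare the plans of the working and broken variants:
EXPLAIN SELECT concat_ws('', array('境内')) FROM some_table;
EXPLAIN SELECT concat('境内')               FROM some_table;
```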

The plan for the working variant:

Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 3 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2
      File Output Operator [FS_14]
        Select Operator [SEL_13] (rows=105 width=273)
          Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"]
          Group By Operator [GBY_12] (rows=105 width=273)
            Output:["_col0","_col1","_col2","_col3","_col4"],keys:KEY._col0, KEY._col1, KEY._col2, KEY._col3, KEY._col4
          <-Map 1 [SIMPLE_EDGE] vectorized
            SHUFFLE [RS_28]
              PartitionCols:_col0, _col1, _col2, _col3, _col4
              Group By Operator [GBY_27] (rows=211 width=273)
                Output:["_col0","_col1","_col2","_col3","_col4"],keys:_col1, _col2, _col3, _col4, _col5
                Map Join Operator [MAPJOIN_26] (rows=211 width=273)
                  Conds:SEL_25._col0=RS_23._col0(Inner),Output:["_col1","_col2","_col3","_col4","_col5"]
                <-Map 3 [BROADCAST_EDGE] vectorized
                  BROADCAST [RS_23]
                    PartitionCols:_col0
                    Select Operator [SEL_22] (rows=1 width=736)
                      Output:["_col0","_col1","_col2","_col3"]
                      Filter Operator [FIL_21] (rows=1 width=736)
                        predicate:bank_code is not null
                        TableScan [TS_3] (rows=1 width=736)
                          dwapsdata@dw_conf_ce_bank_dict_v,t1,Tbl:COMPLETE,Col:NONE,Output:["bank_code","bank_name","bank_short_name","bank_onshore_flag"]
                <-Select Operator [SEL_25] (rows=192 width=273)
                    Output:["_col0","_col1"]
                    Filter Operator [FIL_24] (rows=192 width=273)
                      predicate:bank_code is not null
                      TableScan [TS_0] (rows=192 width=273)
                        dwdmdata@dm_ce_f_portrait_credit_line,t,Tbl:COMPLETE,Col:COMPLETE,Output:["bank_code"]

The plan for the broken variant:

Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 3 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized
      File Output Operator [FS_31]
        Select Operator [SEL_30] (rows=105 width=273)
          Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"]
          Group By Operator [GBY_29] (rows=105 width=273)
            Output:["_col0","_col1","_col2","_col3","_col4"],keys:KEY._col0, KEY._col1, KEY._col2, KEY._col3, KEY._col4
          <-Map 1 [SIMPLE_EDGE] vectorized
            SHUFFLE [RS_28]
              PartitionCols:_col0, _col1, _col2, _col3, _col4
              Group By Operator [GBY_27] (rows=211 width=273)
                Output:["_col0","_col1","_col2","_col3","_col4"],keys:_col1, _col2, _col3, _col4, _col5
                Map Join Operator [MAPJOIN_26] (rows=211 width=273)
                  Conds:SEL_25._col0=RS_23._col0(Inner),Output:["_col1","_col2","_col3","_col4","_col5"]
                <-Map 3 [BROADCAST_EDGE] vectorized
                  BROADCAST [RS_23]
                    PartitionCols:_col0
                    Select Operator [SEL_22] (rows=1 width=736)
                      Output:["_col0","_col1","_col2","_col3"]
                      Filter Operator [FIL_21] (rows=1 width=736)
                        predicate:bank_code is not null
                        TableScan [TS_3] (rows=1 width=736)
                          dwapsdata@dw_conf_ce_bank_dict_v,t1,Tbl:COMPLETE,Col:NONE,Output:["bank_code","bank_name","bank_short_name","bank_onshore_flag"]
                <-Select Operator [SEL_25] (rows=192 width=273)
                    Output:["_col0","_col1"]
                    Filter Operator [FIL_24] (rows=192 width=273)
                      predicate:bank_code is not null
                      TableScan [TS_0] (rows=192 width=273)
                        dwdmdata@dm_ce_f_portrait_credit_line,t,Tbl:COMPLETE,Col:COMPLETE,Output:["bank_code"]

Comparing the two, the moment the word "vectorized" appeared I knew what was going on: in the broken plan Reducer 2 runs vectorized, while in the working plan it does not.

I had already been burned by vectorization once before: hive decimal bug, nvl(decimal,1)=0 (cclovezbf's blog on CSDN).

I have yet to see a single benefit from this parameter, but it has handed me a pile of bugs.

set hive.vectorized.execution.enabled=false; — this alone fixes the garbled Chinese!

There are other ways around it, but they are as ugly as concat_ws("", array("...")), so I will not go into them.
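For completeness, the ugly per-literal workaround means wrapping every affected Chinese constant; a hypothetical example (table and column names are illustrative only):

```sql
-- Per-literal workaround that leaves vectorization enabled:
-- wrap each Chinese constant in concat_ws('', array(...)).
SELECT if(bank_onshore_flag = '1',
          concat_ws('', array('境内')),   -- survives vectorized execution
          bank_name) AS region
FROM   some_table;
```

Doing this for every literal in a large script scales badly, which is why the session-level set is the better fix.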
