Hive with Tez: unable to load AWS credentials from any provider in the chain

Environment

hadoop 3.1.0

hive-3.1.3

tez 0.9.1

Problem description

s3a URIs can be accessed correctly from the Hadoop command line, and I can create an external table and run queries like these:

create external table mytable(a string, b string) location 's3a://mybucket/myfolder/';
select * from mytable limit 20;

Both statements run correctly, but

select count(*) from mytable;

fails with the following log:

INFO  : Compiling command(queryId=root_20230919030746_7b38e3c8-8429-4d45-8a01-343bd26d8f6e): select count(*) from lyb0
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=root_20230919030746_7b38e3c8-8429-4d45-8a01-343bd26d8f6e); Time taken: 0.257 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20230919030746_7b38e3c8-8429-4d45-8a01-343bd26d8f6e): select count(*) from lyb0
INFO  : Query ID = root_20230919030746_7b38e3c8-8429-4d45-8a01-343bd26d8f6e
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: root_20230919030746_7b38e3c8-8429-4d45-8a01-343bd26d8f6e
INFO  : Session is already open
INFO  : Dag name: select count(*) from lyb0 (Stage-1)
INFO  : Status: Running (Executing on YARN cluster with App id application_1695092793092_0001)


----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1            container  INITIALIZING     -1          0        0       -1       0       0  
Reducer 2        container        INITED      1          0        0        1       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 9.55 s     
----------------------------------------------------------------------------------------------
ERROR : Status: Failed

ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1695092793092_0001_3_00, diagnostics=[Vertex vertex_1695092793092_0001_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: lyb0 initializer failed, vertex=vertex_1695092793092_0001_3_00 [Map 1], org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on hivesql: com.amazonaws.AmazonClientException: No AWS Credentials provided by SimpleAWSCredentialsProvider : org.apache.hadoop.fs.s3a.CredentialInitializationException: Access key or secret key is unset: No AWS Credentials provided by SimpleAWSCredentialsProvider : org.apache.hadoop.fs.s3a.CredentialInitializationException: Access key or secret key is unset
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
......
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.amazonaws.AmazonClientException: No AWS Credentials provided by SimpleAWSCredentialsProvider : org.apache.hadoop.fs.s3a.CredentialInitializationException: Access key or secret key is unset
	at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:139)
......
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
	... 31 more
Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: Access key or secret key is unset
	at org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.getCredentials(SimpleAWSCredentialsProvider.java:75)
	at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:117)
	... 45 more
]

ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1695092793092_0001_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1695092793092_0001_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]

ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1695092793092_0001_3_00, diagnostics=[Vertex vertex_1695092793092_0001_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: lyb0 initializer failed, vertex=vertex_1695092793092_0001_3_00 [Map 1], org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on hivesql: com.amazonaws.AmazonClientException: No AWS Credentials provided by SimpleAWSCredentialsProvider : org.apache.hadoop.fs.s3a.CredentialInitializationException: Access key or secret key is unset: No AWS Credentials provided by SimpleAWSCredentialsProvider : org.apache.hadoop.fs.s3a.CredentialInitializationException: Access key or secret key is unset
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
......

I tried adding all the fs.s3a properties from core-site.xml to tez-site.xml, and setting fs.s3a.access.key and fs.s3a.secret.key inside the Hive session, but the same error still occurs.

Solution

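Make sure tez.use.cluster.hadoop-libs is not set in tez-site.xml or, if it is set, that its value is false.

However, when it is set to false, Tez cannot run.

When it is set to true, I got the AWS credentials error, even though the credentials were set in every possible location and environment variable.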
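To check which value a given tez-site.xml actually carries, here is a minimal Python sketch (an illustration, not part of the original fix). It reads a Hadoop-style `<configuration>` file and treats a missing tez.use.cluster.hadoop-libs entry as false, which is the Tez default. The helper names and the sample file content are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(xml_text: str, name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the <value> of the Hadoop-style <property> named `name`, or `default` if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return default


def uses_cluster_hadoop_libs(tez_site_xml: str) -> bool:
    # An unset tez.use.cluster.hadoop-libs is treated as false (the Tez default).
    value = get_property(tez_site_xml, "tez.use.cluster.hadoop-libs", "false")
    return (value or "").lower() == "true"


# Hypothetical tez-site.xml content for illustration.
sample = """<configuration>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
</configuration>"""

print(uses_cluster_hadoop_libs(sample))              # True
print(uses_cluster_hadoop_libs("<configuration/>"))  # False
```

To inspect a real file, pass its contents, e.g. `uses_cluster_hadoop_libs(open(path).read())`, where `path` points at your own tez-site.xml.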

In the end I got it working by adding this property to hive-site.xml:

<property>
  <name>hive.conf.hidden.list</name>
  <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential</value>
</property>
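A plausible explanation for why this works (our reading, not stated in the original post): hive.conf.hidden.list names properties that Hive treats as secrets, and in Hive 3.x its default value also includes fs.s3a.access.key and fs.s3a.secret.key. The value above is essentially that default with the S3 key entries left out. Hive appears to blank hidden properties in the configuration it passes to the jobs it launches, so the Tez input initializer sees empty keys. By contrast, `select * ... limit 20` can be served by a fetch task inside HiveServer2 without launching a Tez job, which would explain why it succeeded. The Python sketch below only models that behavior; `strip_hidden`, `s3a_credentials`, and the placeholder credentials are hypothetical, not Hive or Hadoop APIs.

```python
from typing import Dict, Iterable, List, Tuple

# The hive.conf.hidden.list value from the hive-site.xml fix above.
FIXED_HIDDEN = ("javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,"
                "fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential")
# Hypothetical list that still hides the S3A keys (assumed to match Hive 3.x default behavior).
WITH_S3A_HIDDEN = FIXED_HIDDEN + ",fs.s3a.access.key,fs.s3a.secret.key"


def parse_hidden_list(value: str) -> List[str]:
    """Split a comma-separated hive.conf.hidden.list value into property names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def strip_hidden(conf: Dict[str, str], hidden: Iterable[str]) -> Dict[str, str]:
    """Return a copy of `conf` with every hidden property blanked (a model, not Hive's code)."""
    stripped = dict(conf)
    for key in hidden:
        if key in stripped:
            stripped[key] = ""
    return stripped


def s3a_credentials(conf: Dict[str, str]) -> Tuple[str, str]:
    """Fail the way the log does when either S3A key is missing or empty."""
    access = conf.get("fs.s3a.access.key", "")
    secret = conf.get("fs.s3a.secret.key", "")
    if not access or not secret:
        raise ValueError("Access key or secret key is unset")
    return access, secret


# Placeholder credentials for illustration only.
conf = {"fs.s3a.access.key": "EXAMPLE_ACCESS", "fs.s3a.secret.key": "EXAMPLE_SECRET"}

# With the fixed list, the keys survive into the job configuration.
print(s3a_credentials(strip_hidden(conf, parse_hidden_list(FIXED_HIDDEN))))
# With the S3A keys hidden, the job sees empty values and fails.
try:
    s3a_credentials(strip_hidden(conf, parse_hidden_list(WITH_S3A_HIDDEN)))
except ValueError as err:
    print(err)  # Access key or secret key is unset
```

The trade-off is visible in the model: removing the keys from the hidden list lets them reach the job, but it also means Hive no longer treats them as secrets.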

This is the correct solution. Note, however, that your S3 secret key is now exposed in various log files. Some of the known files are:

Hive -> <HIVE_HOME>/logs/<user>/webhcat/webhcat.log.<date>

Hadoop -> ...

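If you have access to the source code, you can modify the method that writes these properties so that they no longer appear in the Hive logs.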
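If you cannot change the source, you can at least check whether a given log file already contains the keys. The Python sketch below assumes the values are logged as `fs.s3a.access.key=VALUE` or `fs.s3a.secret.key: VALUE`; that format is an assumption, and real log lines (for example XML configuration dumps) may need a different pattern. The function name and sample log text are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed log format: "fs.s3a.access.key=VALUE" or "fs.s3a.secret.key: VALUE".
# Adapt the pattern if your logs print the configuration differently.
KEY_PATTERN = re.compile(r"(fs\.s3a\.(?:access|secret)\.key)\s*[=:]\s*\S+")


def find_exposed_keys(log_text: str) -> List[Tuple[int, str]]:
    """Return (line number, property name) for each line that seems to print an S3A key value."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for match in KEY_PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical excerpt of a log file.
sample_log = (
    "INFO conf fs.s3a.access.key=EXAMPLE_ACCESS\n"
    "INFO unrelated line\n"
    "DEBUG fs.s3a.secret.key: EXAMPLE_SECRET\n"
)
print(find_exposed_keys(sample_log))
# [(1, 'fs.s3a.access.key'), (3, 'fs.s3a.secret.key')]
```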

References:

https://www.saoniuhuo.com/question/detail-2512416.html

https://www.saoniuhuo.com/question/detail-1939018.html
