Hive集成Iceberg碰到的问题

背景

想基于尚硅谷的数仓的环境(完成了hadoop组件和hive的安装)做实时流的开发,于是在集成Iceberg的时候,就复用了原始的环境,但是在成功创建iceberg表,对其插入数据,执行MR的过程中报错了。

报错情况

因为在之前数仓使用hive的时候,都是配置好Hive的环境变量,通过hive 的命令直接进入客户端进行增删改查并没有发现有什么问题。

于是在集成了iceberg的时候也采用了这种方式,并且配置文件也开启了iceberg的支持,但是执行MR就报错了。

新增配置文件

xml 复制代码
<property>
    <name>iceberg.engine.hive.enabled</name>
    <value>true</value>
</property>

报错信息

通过 yarn logs -applicatinId 查看到了具体报错,说是缺少某个类。

复制代码
2024-12-30 21:21:24,687 ERROR [CommitterEvent Processor #1] org.apache.hadoop.hive.metastore.RetryingHMSHandler: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:137)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:280)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:193)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:188)
	at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 50 more

往下翻,提示无法链接Hive的元数据信息

复制代码
2024-12-30 21:21:25,128 INFO [Thread-71] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Job commit failed: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:62)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:280)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:193)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:188)
	at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:org/datanucleus/NucleusContext)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:84)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:137)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	... 23 more
Caused by: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
	... 34 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 50 more

解决方案

最开始,以为是版本不适配,把hadoop升级从3.1.3 升级到了3.1.4,无济于事;而后,通过调整iceberg-runtime.jar的版本,也无济于事。

通过网上查阅相关的资料,发现别人在执行的过程中需要增加Hive的元数据链接的配置,并且开启hive的元数据信息 ,具体操作如下:

1、新增元数据链接配置信息

xml 复制代码
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop101:9083</value>
</property>

2、先在后台启动hive元数据信息

shell 复制代码
hive --service metastore &

最后,完美解决。

相关推荐
tsyjjOvO3 天前
SpringMVC 从入门到精通
数据仓库·hive·hadoop
Francek Chen3 天前
【大数据存储与管理】分布式数据库HBase:05 HBase运行机制
大数据·数据库·hadoop·分布式·hdfs·hbase
zzzzzwbetter3 天前
Hadoop完全分布式部署-Master的NameNode以及Slaver2的DataNode未启动
大数据·hadoop·分布式
weixin_449310843 天前
ETL转换和数据写入小满OKKICRM的技术细节
数据仓库·php·etl
IvanCodes3 天前
Hive IDE连接及UDF实战
ide·hive·hadoop
yumgpkpm3 天前
华为昇腾910B 开源软件GPUStack的介绍(Cloudera CDH、CDP)
人工智能·hadoop·elasticsearch·flink·kafka·企业微信·big data
lifewange4 天前
Hive数据库
数据库·hive·hadoop
五月天的尾巴5 天前
hive数据库模糊查询表名
hive·查询表名
蓝魔Y5 天前
hive—1.1、执行优化
hive
快乐非自愿5 天前
OpenClaw 生态适配:Hadoop/Hive 技能现状与企业级集成方案
大数据·hive·hadoop·分布式·openclaw