Hive集成Iceberg碰到的问题

背景

想基于尚硅谷的数仓的环境(完成了hadoop组件和hive的安装)做实时流的开发,于是在集成Iceberg的时候,就复用了原始的环境,但是在成功创建iceberg表,对其插入数据,执行MR的过程中报错了。

报错情况

因为在之前数仓使用hive的时候,都是配置好Hive的环境变量,通过hive 的命令直接进入客户端进行增删改查并没有发现有什么问题。

于是在集成了iceberg的时候也采用了这种方式,并且配置文件也开启了iceberg的支持,但是执行MR就报错了。

新增配置文件

xml 复制代码
<property>
    <name>iceberg.engine.hive.enabled</name>
    <value>true</value>
</property>

报错信息

通过 yarn logs -applicatinId 查看到了具体报错,说是缺少某个类。

复制代码
2024-12-30 21:21:24,687 ERROR [CommitterEvent Processor #1] org.apache.hadoop.hive.metastore.RetryingHMSHandler: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:137)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:280)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:193)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:188)
	at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 50 more

往下翻,提示无法链接Hive的元数据信息

复制代码
2024-12-30 21:21:25,128 INFO [Thread-71] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Job commit failed: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:62)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:280)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:193)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:188)
	at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:org/datanucleus/NucleusContext)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:84)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:137)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	... 23 more
Caused by: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
	... 34 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 50 more

解决方案

最开始,以为是版本不适配,把hadoop升级从3.1.3 升级到了3.1.4,无济于事;而后,通过调整iceberg-runtime.jar的版本,也无济于事。

通过网上查阅相关的资料,发现别人在执行的过程中需要增加Hive的元数据链接的配置,并且开启hive的元数据信息 ,具体操作如下:

1、新增元数据链接配置信息

xml 复制代码
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop101:9083</value>
</property>

2、先在后台启动hive元数据信息

shell 复制代码
hive --service metastore &

最后,完美解决。

相关推荐
奋斗的蛋黄4 小时前
HDFS与Yarn深入剖析
大数据·运维·hadoop
core5127 小时前
Hive实战(三)
数据仓库·hive·hadoop
BYSJMG7 小时前
计算机毕设推荐:基于Hadoop+Spark物联网网络安全数据分析系统 物联网威胁分析系统【源码+文档+调试】
大数据·hadoop·python·物联网·spark·django·课程设计
陈天伟教授7 小时前
Hadoop Windows客户端配置与实践指南
大数据·hadoop·windows
lifallen8 小时前
Hadoop MapOutputBuffer:Map高性能核心揭秘
java·大数据·数据结构·hadoop·算法·apache
程序员小羊!10 小时前
大数据电商流量分析项目实战:Hive 数据仓库(三)
大数据·数据仓库·hive
IT毕设梦工厂1 天前
大数据毕业设计选题推荐-基于大数据的国家医用消耗选品采集数据可视化分析系统-Hadoop-Spark-数据可视化-BigData
大数据·hadoop·信息可视化·spark·毕业设计·数据可视化·bigdata
core5121 天前
Hive实战(一)
数据仓库·hive·hadoop·架构·实战·配置·场景
智海观潮1 天前
Spark SQL解析查询parquet格式Hive表获取分区字段和查询条件
hive·sql·spark
isfox1 天前
Hadoop简介:分布式系统的基石与核心架构详解
hadoop