Hive集成Iceberg碰到的问题

背景

想基于尚硅谷的数仓的环境(完成了hadoop组件和hive的安装)做实时流的开发,于是在集成Iceberg的时候,就复用了原始的环境,但是在成功创建iceberg表,对其插入数据,执行MR的过程中报错了。

报错情况

因为在之前数仓使用hive的时候,都是配置好Hive的环境变量,通过hive 的命令直接进入客户端进行增删改查并没有发现有什么问题。

于是在集成了iceberg的时候也采用了这种方式,并且配置文件也开启了iceberg的支持,但是执行MR就报错了。

新增配置文件

xml 复制代码
<property>
    <name>iceberg.engine.hive.enabled</name>
    <value>true</value>
</property>

报错信息

通过 yarn logs -applicatinId 查看到了具体报错,说是缺少某个类。

复制代码
2024-12-30 21:21:24,687 ERROR [CommitterEvent Processor #1] org.apache.hadoop.hive.metastore.RetryingHMSHandler: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:137)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:280)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:193)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:188)
	at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 50 more

往下翻,提示无法链接Hive的元数据信息

复制代码
2024-12-30 21:21:25,128 INFO [Thread-71] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Job commit failed: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:62)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
	at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
	at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:280)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:193)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:188)
	at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:org/datanucleus/NucleusContext)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:84)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:137)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
	at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
	at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
	... 23 more
Caused by: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
	... 34 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 50 more

解决方案

最开始,以为是版本不适配,把hadoop升级从3.1.3 升级到了3.1.4,无济于事;而后,通过调整iceberg-runtime.jar的版本,也无济于事。

通过网上查阅相关的资料,发现别人在执行的过程中需要增加Hive的元数据链接的配置,并且开启hive的元数据信息 ,具体操作如下:

1、新增元数据链接配置信息

xml 复制代码
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop101:9083</value>
</property>

2、先在后台启动hive元数据信息

shell 复制代码
hive --service metastore &

最后,完美解决。

相关推荐
亲爱的非洲野猪1 天前
SpringBoot启动流程深度剖析:从@SpringBootApplication到Servlet容器就绪
hive·spring boot·servlet
星火开发设计1 天前
深入浅出HDFS:分布式文件系统核心原理与实践解析
大数据·数据库·hadoop·学习·hdfs·分布式数据库·知识
`林中水滴`1 天前
Hive系列:Hive 整合 HBase
hive·hbase
Hello.Reader1 天前
Hadoop Formats 在 Flink 里复用 Hadoop InputFormat(flink-hadoop-compatibility)
大数据·hadoop·flink
s***87271 天前
TCP/IP协议栈深度解析技术文章大纲
hive·spring boot
橙露1 天前
大数据分析入门:Hadoop 生态系统与 Python 结合的分布式数据处理实践
hadoop·分布式·数据分析
CoookeCola1 天前
从人脸检测到音频偏移:基于SyncNet的音视频偏移计算与人脸轨迹追踪技术解析
数据仓库·人工智能·目标检测·计算机视觉·数据挖掘
zgl_200537792 天前
ZGLanguage 解析SQL数据血缘 之 Python + Echarts 显示SQL结构图
大数据·数据库·数据仓库·hadoop·sql·代码规范·源代码管理
飞Link2 天前
【Sqoop】Sqoop 使用教程:从原理到实战的完整指南
数据库·hadoop·sqoop
SelectDB技术团队2 天前
驾驭 CPU 与编译器:Apache Doris 实现极致性能的底层逻辑
数据库·数据仓库·人工智能·sql·apache