Background
I wanted to do real-time streaming development on top of the Shangguigu (尚硅谷) data warehouse environment (Hadoop components and Hive already installed), so I reused that environment when integrating Iceberg. Creating the Iceberg table succeeded, but inserting data into it failed while the MapReduce job was running.
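For context, the failing workload was roughly of the following shape. This is a minimal sketch, not the exact DDL that was used; the table name and columns are placeholders, and the INSERT is what launches the MapReduce job that fails.

```shell
# Rough shape of the failing workload (table name and columns are placeholders).
# The CREATE TABLE succeeds; the INSERT kicks off the MapReduce job that fails at commit time.
hive -e "
CREATE TABLE iceberg_demo (id BIGINT, name STRING)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

INSERT INTO iceberg_demo VALUES (1, 'test');
"
```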
The error
Previously, when the data warehouse used Hive, the Hive environment variables were configured and I worked in the client directly via the hive command; all the usual create/read/update/delete operations ran without any problem. So I took the same approach after integrating Iceberg, and also enabled Iceberg support in the configuration file, but the MapReduce job failed.
Configuration added
```xml
<property>
    <name>iceberg.engine.hive.enabled</name>
    <value>true</value>
</property>
```
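One quick way to confirm the client actually picks this flag up is to print its effective value from a Hive session; a simple sanity check, assuming the configuration file (typically hive-site.xml) has been updated and the client restarted.

```shell
# Prints the effective value inside a Hive session;
# it should report iceberg.engine.hive.enabled=true once the config change is in place.
hive -e "SET iceberg.engine.hive.enabled;"
```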
Error message
The specific error was found via yarn logs -applicationId; it complains that a certain class is missing.
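Pulling the aggregated container logs looks roughly like this; the application id below is a placeholder, so substitute the one printed by the failed INSERT (or listed by yarn application -list).

```shell
# The application id is a placeholder; use the one from the failed job.
yarn logs -applicationId application_1700000000000_0001 > app.log
grep -n "ERROR" app.log | head
```

The relevant part of the log: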
2024-12-30 21:21:24,687 ERROR [CommitterEvent Processor #1] org.apache.hadoop.hive.metastore.RetryingHMSHandler: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:137)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:280)
at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:193)
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:188)
at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 50 more
Scrolling further down, the log says it failed to connect to the Hive Metastore:
2024-12-30 21:21:25,128 INFO [Thread-71] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Job commit failed: org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:62)
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:32)
at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:118)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:49)
at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:181)
at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)
at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:77)
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:93)
at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:280)
at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJob$2(HiveIcebergOutputCommitter.java:193)
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405)
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitJob(HiveIcebergOutputCommitter.java:188)
at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:org/datanucleus/NucleusContext)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:84)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:137)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:60)
at org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:73)
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:53)
... 23 more
Caused by: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hive.metastore.utils.JavaUtils.getClass(JavaUtils.java:52)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
... 34 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 50 more
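Both traces point in the same direction: org.datanucleus.NucleusContext belongs to DataNucleus, the ORM layer the Hive Metastore uses to talk to its backing database. A plausible reading of the failure is that, with no hive.metastore.uris configured, the Iceberg output committer running inside the MapReduce ApplicationMaster tries to start an embedded metastore instead of connecting to a remote one, and the DataNucleus jars it needs for that are not on the job's classpath. A quick diagnostic sketch, assuming $HIVE_HOME points at the local Hive installation:

```shell
# DataNucleus ships under the Hive installation, not on the Hadoop/MapReduce classpath.
ls "$HIVE_HOME/lib" | grep -i datanucleus

# The MapReduce job only sees the Hadoop classpath plus whatever the job itself ships.
hadoop classpath | tr ':' '\n' | grep -i datanucleus || echo "no DataNucleus on the Hadoop classpath"
```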
Solution
At first I assumed a version mismatch and upgraded Hadoop from 3.1.3 to 3.1.4, to no avail; adjusting the version of iceberg-runtime.jar did not help either.
After searching through related material online, I found that others needed to add the Hive Metastore connection configuration and start the Hive Metastore service before running such jobs. The concrete steps:
1. Add the metastore connection configuration
```xml
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop101:9083</value>
</property>
```
2. Start the Hive Metastore service in the background first
```shell
hive --service metastore &
```
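Before re-running the insert, it is worth confirming the service is actually up. A quick check, assuming the default thrift port 9083 from the hive.metastore.uris setting and that it is run on hadoop101:

```shell
# Started via `hive --service metastore`, the process typically shows up as RunJar in jps.
jps -ml | grep -i -e metastore -e runjar

# Confirm the thrift port from hive.metastore.uris is listening.
ss -lntp 2>/dev/null | grep 9083
```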
With that, the problem was completely resolved.