Exception and Fix When Integrating Hudi 0.10.0 with Hive: java.lang.ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat
Exception Details
When connecting to Hive 3.1.2 with the Hive CLI and querying the Hive table mapped from a Hudi table, the following exception shows up:
hive (flk_hive)> select * from status_h2h limit 10;
22/10/24 15:22:07 INFO conf.HiveConf: Using the default value passed in for log id: 0f8a42a6-8195-413a-90dc-a31f7f96f1f0
22/10/24 15:22:07 INFO session.SessionState: Updating thread name to 0f8a42a6-8195-413a-90dc-a31f7f96f1f0 main
22/10/24 15:22:07 INFO ql.Driver: Compiling command(queryId=hadoop_20221024152207_133658b2-28c5-4a69-9b2f-b4b2ce99994a): select * from status_h2h limit 10
22/10/24 15:22:07 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
22/10/24 15:22:07 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
22/10/24 15:22:07 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
22/10/24 15:22:07 INFO parse.SemanticAnalyzer: Get metadata for source tables
FAILED: RuntimeException java.lang.ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat
22/10/24 15:22:08 ERROR ql.Driver: FAILED: RuntimeException java.lang.ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat
at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:324)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2191)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2075)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12033)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12129)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11676)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:321)
... 24 more
Root Cause Analysis
Judging from the message Caused by: java.lang.ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat,
the exception is most likely caused by a jar missing from Hive's classpath.
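As a quick sanity check on this diagnosis, it helps to look at which InputFormat class the table's metadata declares and whether any Hudi jar is visible to Hive at all. This is only a sketch: the database and table names are taken from the failing query above, and the lib/auxlib locations depend on the installation.
# 1) Which InputFormat class does the Hive table point at?
hive -e "SHOW CREATE TABLE flk_hive.status_h2h" | grep -A 1 -i "INPUTFORMAT"
# 2) Is any Hudi bundle on Hive's classpath (lib or auxlib)?
ls "$HIVE_HOME"/lib "$HIVE_HOME"/auxlib 2>/dev/null | grep -i hudi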
The Hudi 0.10.0 Hive integration documentation (see the documentation link) says that the corresponding hudi-hadoop-mr-bundle
jar has to be placed under $HIVE_HOME/auxlib.
If the auxlib directory does not exist yet, create it first (and mind ownership and permissions).
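Creating the directory is simple; the sketch below assumes a $HIVE_HOME layout like the CDH one shown later in this post, and the hive:hive owner is an assumption that should be replaced with whatever user actually runs the Metastore and HiveServer2.
mkdir -p "$HIVE_HOME/auxlib"
# make sure the Hive service user can read the directory and the jars inside it
chown hive:hive "$HIVE_HOME/auxlib"
chmod 755 "$HIVE_HOME/auxlib"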
Note that integrating Hudi with Hive 3.1.2 requires recompiling the Hudi bundles, passing the -Pflink-bundle-shade-hive3
profile at build time.
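Before rebuilding, it is worth confirming the exact Hive and Hadoop versions running on the cluster, since they determine the -Dhadoop.version value and which Hive shading profile to use; both are standard CLI checks.
hive --version
hadoop version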
Solution
- Change the Hive version in Hudi's top-level pom.xml to match the Hive version on the cluster:
  <hive.groupid>org.apache.hive</hive.groupid>
  <hive.version>3.1.2</hive.version>
  <hive.exec.classifier>core</hive.exec.classifier>
- Rebuild the Hudi project with the new build flags:
  mvn clean install -DskipTests -Dmaven.test.skip=true -DskipITs -Dcheckstyle.skip=true -Drat.skip=true -Dhadoop.version=3.0.0-cdh6.3.2 -Pflink-bundle-shade-hive3 -Dscala-2.12 -Pspark-shade-unbundle-avro
- Move the freshly built hudi-hadoop-mr-bundle-0.10.0.jar into $HIVE_HOME/auxlib:
  [root@p0-tklcdh-nn03 auxlib]# pwd
  /opt/cloudera/parcels/CDH/lib/hive/auxlib
  [root@p0-tklcdh-nn03 auxlib]# ls -l
  total 16892
  -rw-r--r-- 1 appadmin appadmin 17294810 Oct 24 16:41 hudi-hadoop-mr-bundle-0.10.0.jar
Important: make sure the build flags and the Hive version are specified correctly. Otherwise the resulting jar differs in size (and contents), and all kinds of strange problems can follow; a quick way to double-check the deployed jar is shown right after this list.
# Size of hudi-hadoop-mr-bundle-0.10.0.jar built against Hive 2
[hadoop@p0-tklfrna-tklrna-device02 auxlib]$ ls -l ../../../jars/hudi-hadoop-mr-bundle-0.10.0.jar
-rw-r--r-- 1 root root 17289727 Mar 28 2022 ../../../jars/hudi-hadoop-mr-bundle-0.10.0.jar
# Size of hudi-hadoop-mr-bundle-0.10.0.jar built against Hive 3
[root@p0-tklcdh-nn03 auxlib]# ls -l
total 16892
-rw-r--r-- 1 appadmin appadmin 17294810 Oct 24 16:41 hudi-hadoop-mr-bundle-0.10.0.jar
- Restart the Hive Metastore and HiveServer2 so that they load the newly added jar.
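Before (or right after) the restart, it can be reassuring to verify that the jar sitting in auxlib really contains the class Hive failed to load. The snippet below is only a sketch: the auxlib path is the CDH one from the session above, and the manual start commands apply only if Hive is not managed by Cloudera Manager (on a CM-managed cluster, restart the Hive services from the CM UI instead).
# confirm the missing class is actually inside the deployed bundle
unzip -l /opt/cloudera/parcels/CDH/lib/hive/auxlib/hudi-hadoop-mr-bundle-0.10.0.jar | grep HoodieParquetInputFormat
# manual restart example (non-CM deployments only): stop the old processes first, then
nohup hive --service metastore   > /tmp/hive-metastore.log  2>&1 &
nohup hive --service hiveserver2 > /tmp/hiveserver2.log     2>&1 &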
Exception Resolved
With the Hudi dependency in place under $HIVE_HOME/auxlib and the Hive Metastore and HiveServer2 restarted, running the same query from the Hive CLI now succeeds:
22/10/24 17:12:39 INFO ql.Driver: Executing command(queryId=root_20221024171238_96b1962b-b9b4-44b2-a554-2c7055fdf253): select * from status_h2h limit 10
22/10/24 17:12:39 INFO ql.Driver: Completed executing command(queryId=root_20221024171238_96b1962b-b9b4-44b2-a554-2c7055fdf253); Time taken: 0.001 seconds
OK
22/10/24 17:12:39 INFO ql.Driver: OK
22/10/24 17:12:39 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
22/10/24 17:12:39 INFO utils.HoodieInputFormatUtils: Reading hoodie metadata from path hdfs://10.132.62.2/hudi/flk_hudi/status_hudi
22/10/24 17:12:39 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://10.132.62.2/hudi/flk_hudi/status_hudi
22/10/24 17:12:39 INFO table.HoodieTableConfig: Loading table properties from hdfs://10.132.62.2/hudi/flk_hudi/status_hudi/.hoodie/hoodie.properties
22/10/24 17:12:39 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://10.132.62.2/hudi/flk_hudi/status_hudi
22/10/24 17:12:39 INFO utils.HoodieInputFormatUtils: Found a total of 1 groups
22/10/24 17:12:40 INFO timeline.HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20221024171048577__commit__INFLIGHT]}
22/10/24 17:12:40 INFO view.FileSystemViewManager: Creating InMemory based view for basePath hdfs://10.132.62.2/hudi/flk_hudi/status_hudi
22/10/24 17:12:40 INFO view.AbstractTableFileSystemView: Took 4 ms to read 0 instants, 0 replaced file groups
22/10/24 17:12:40 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
22/10/24 17:12:40 INFO view.AbstractTableFileSystemView: Building file system view for partition ()
22/10/24 17:12:40 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=10297, NumFileGroups=16, FileGroupsCreationTime=449, StoreTimeTaken=2
22/10/24 17:12:40 INFO utils.HoodieInputFormatUtils: Total paths to process after hoodie filter 16
22/10/24 17:12:40 INFO view.AbstractTableFileSystemView: Took 1 ms to read 0 instants, 0 replaced file groups
22/10/24 17:12:40 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
22/10/24 17:12:40 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
22/10/24 17:12:41 INFO hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 16680 records.
22/10/24 17:12:41 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
22/10/24 17:12:41 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
22/10/24 17:12:41 INFO compress.CodecPool: Got brand-new decompressor [.gz]
22/10/24 17:12:41 INFO hadoop.InternalParquetRecordReader: block read in memory in 42 ms. row count = 16680
Done