Enabling Kerberos-Authenticated Hive Access in Apache Hudi via Source Code Modification

A Hudi 0.10.0 Kerberos-support adaptation document

About This Document

This document explains how to add Kerberos authentication support on top of Hudi 0.10.0.

Main contributions:

  1. Extended the Hudi source tree currently in use with Kerberos support; the change spans 12 files, roughly 20 code sites, and about 200 lines of code in total;
  2. Added Kerberos-authenticated Hive table synchronization to the Hudi 0.10.0 source code while preserving all existing custom features;
  3. Summarized the general workflow for this kind of adaptation and illustrated it with a concrete example.

The overall approach and steps are as follows:

  1. Add Kerberos authentication as described in the blog post 将hudi同步到配置kerberos的hive3 ("Syncing Hudi to a Kerberos-enabled Hive 3", https://cloud.tencent.com/developer/article/1949358) and verify its feasibility
  2. Compare, organize, and modify the code on our local branch against the not-yet-merged upstream Hudi PR Hudi-2402 (https://github.com/apache/hudi/pull/3771) and xiaozhch5's branch (https://github.com/xiaozhch5/hudi/tree/0.10.1-release-hive3-kerberos-enabled)
  3. Add Kerberos support and the corresponding configuration options for syncing Hive tables from Hudi tables (a usage sketch follows this list)
  4. Add the dependencies and configuration entries required by the modified code to the pom files
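
To make the target end state concrete, here is a minimal, hypothetical sketch of driving the standalone HiveSyncTool with the new Kerberos options once the patch is applied. The field names follow the diffs later in this document; the database, table, paths, principal, and keytab values are placeholders, and the "hms" sync mode is an assumption about the deployment:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hudi.hive.HiveSyncConfig;
    import org.apache.hudi.hive.HiveSyncTool;

    public class KerberosHiveSyncExample {
      public static void main(String[] args) throws Exception {
        HiveSyncConfig cfg = new HiveSyncConfig();
        cfg.databaseName = "default";
        cfg.tableName = "hudi_demo";                   // placeholder table name
        cfg.basePath = "hdfs:///warehouse/hudi_demo";  // placeholder Hudi table path
        cfg.syncMode = "hms";                          // sync through the metastore
        // Kerberos options introduced by this patch
        cfg.enableKerberos = true;
        cfg.krb5Conf = "/etc/krb5.conf";
        cfg.principal = "hive/_HOST@EXAMPLE.COM";      // placeholder metastore principal
        cfg.keytabName = "hive/_HOST@EXAMPLE.COM";     // principal to log in as (passed to loginUserFromKeytab)
        cfg.keytabFile = "/etc/security/keytabs/hive.service.keytab"; // placeholder keytab path

        Configuration hadoopConf = new Configuration();
        FileSystem fs = FileSystem.get(hadoopConf);
        HiveConf hiveConf = new HiveConf();
        hiveConf.addResource(hadoopConf);

        // The patched constructor performs the UGI keytab login when enableKerberos is set
        new HiveSyncTool(cfg, hiveConf, fs).syncHoodieTable();
      }
    }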

Branch Overview

Below is the branch being modified. We inspect the commits to see how far the custom branch has diverged from the Hudi 0.10.1 release on the commit timeline.

The current HEAD commit hash is 4c65ca544, while the Hudi 0.10.1 release commit hash is 84fb390e4.

commit 4c65ca544b91e828462419bbc12e116bfe1dbc2c (origin/0.10.1-release-hive3-kerberos-enabled)
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Wed Mar 2 00:15:05 2022 +0800

    Add the krb5.conf file path option, defaulting to /etc/krb5.conf

commit 116352beb2e028357d0ffca385dd2f11a9cef72b
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Tue Mar 1 23:30:08 2022 +0800

    Add the build command

commit ffc26256ba4cbb52ea653551ea88d109bc26e315
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Tue Mar 1 23:14:21 2022 +0800

    Adapt the build to HDP 3.1.4 and fix missing-package errors

commit fbc53aa29e63dc5b097a3014d05f6b82cfcf2a70
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Tue Mar 1 22:20:03 2022 +0800

    [MINOR] Remove org.apache.directory.api.util.Strings import

commit 05fee3608d17abbd0217818a6bf02e4ead8f6de8
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Tue Mar 1 21:07:34 2022 +0800

    Add Flink engine support for syncing Hudi to a Kerberos-enabled Hive 3 metastore; only the Flink 1.13 engine is covered, other engines are unchanged

commit 84fb390e42cbbb72d1aaf4cf8f44cd6fba049595 (tag: release-0.10.1, origin/release-0.10.1)
Author: sivabalan <n.siva.b@gmail.com>
Date:   Tue Jan 25 20:15:31 2022 -0500

    [MINOR] Update release version to reflect published version 0.10.1

Comparing the Changes Between the Two Versions

Having located the two commits, we use git diff to compare the code and write the result to a file.

The exact command is git diff 4c65ca544 84fb390e4 >> commit.diff. Note the argument order: the custom branch (4c65ca544) is diffed against the older release commit (84fb390e4), so in the output below the custom additions appear as deletion (-) lines.

diff --git a/compile-command.sh b/compile-command.sh
deleted file mode 100644
index c5536c86c..000000000
--- a/compile-command.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-mvn clean install -DskipTests \
--Dhadoop.version=3.1.1.3.1.4.0-315 \
--Dhive.version=3.1.0.3.1.4.0-315 \
--Dscala.version=2.12.10 \
--Dscala.binary.version=2.12 \
--Dspark.version=3.0.1 \
--Dflink.version=1.13.5 \
--Pflink-bundle-shade-hive3 \
--Pspark3
\ No newline at end of file
diff --git a/hudi-aws/pom.xml b/hudi-aws/pom.xml
index 8c7f6dc73..d853690c0 100644
--- a/hudi-aws/pom.xml
+++ b/hudi-aws/pom.xml
@@ -116,12 +116,6 @@
             <artifactId>mockito-junit-jupiter</artifactId>
             <scope>test</scope>
         </dependency>
-
-        <dependency>
-            <groupId>com.google.code.findbugs</groupId>
-            <artifactId>jsr305</artifactId>
-            <version>3.0.0</version>
-        </dependency>
     </dependencies>
 
     <build>
diff --git a/hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java b/hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
index b7d6adf38..1968ef422 100644
--- a/hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
+++ b/hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
@@ -19,6 +19,7 @@
 
 package org.apache.hudi.common.testutils;
 
+import org.apache.directory.api.util.Strings;
 import org.apache.hudi.avro.model.HoodieCleanMetadata;
 import org.apache.hudi.avro.model.HoodieCleanerPlan;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
@@ -72,8 +73,6 @@ public class FileCreateUtils {
 
   private static final String WRITE_TOKEN = "1-0-1";
   private static final String BASE_FILE_EXTENSION = HoodieTableConfig.BASE_FILE_FORMAT.defaultValue().getFileExtension();
-  /** An empty byte array */
-  public static final byte[] EMPTY_BYTES = new byte[0];
 
   public static String baseFileName(String instantTime, String fileId) {
     return baseFileName(instantTime, fileId, BASE_FILE_EXTENSION);
@@ -222,7 +221,7 @@ public class FileCreateUtils {
   }
 
   public static void createCleanFile(String basePath, String instantTime, HoodieCleanMetadata metadata, boolean isEmpty) throws IOException {
-    createMetaFile(basePath, instantTime, HoodieTimeline.CLEAN_EXTENSION, isEmpty ? EMPTY_BYTES : serializeCleanMetadata(metadata).get());
+    createMetaFile(basePath, instantTime, HoodieTimeline.CLEAN_EXTENSION, isEmpty ? Strings.EMPTY_BYTES : serializeCleanMetadata(metadata).get());
   }
 
   public static void createRequestedCleanFile(String basePath, String instantTime, HoodieCleanerPlan cleanerPlan) throws IOException {
@@ -230,7 +229,7 @@ public class FileCreateUtils {
   }
 
   public static void createRequestedCleanFile(String basePath, String instantTime, HoodieCleanerPlan cleanerPlan, boolean isEmpty) throws IOException {
-    createMetaFile(basePath, instantTime, HoodieTimeline.REQUESTED_CLEAN_EXTENSION, isEmpty ? EMPTY_BYTES : serializeCleanerPlan(cleanerPlan).get());
+    createMetaFile(basePath, instantTime, HoodieTimeline.REQUESTED_CLEAN_EXTENSION, isEmpty ? Strings.EMPTY_BYTES : serializeCleanerPlan(cleanerPlan).get());
   }
 
   public static void createInflightCleanFile(String basePath, String instantTime, HoodieCleanerPlan cleanerPlan) throws IOException {
@@ -238,7 +237,7 @@ public class FileCreateUtils {
   }
 
   public static void createInflightCleanFile(String basePath, String instantTime, HoodieCleanerPlan cleanerPlan, boolean isEmpty) throws IOException {
-    createMetaFile(basePath, instantTime, HoodieTimeline.INFLIGHT_CLEAN_EXTENSION, isEmpty ? EMPTY_BYTES : serializeCleanerPlan(cleanerPlan).get());
+    createMetaFile(basePath, instantTime, HoodieTimeline.INFLIGHT_CLEAN_EXTENSION, isEmpty ? Strings.EMPTY_BYTES : serializeCleanerPlan(cleanerPlan).get());
   }
 
   public static void createRequestedRollbackFile(String basePath, String instantTime, HoodieRollbackPlan plan) throws IOException {
@@ -250,7 +249,7 @@ public class FileCreateUtils {
   }
 
   public static void createRollbackFile(String basePath, String instantTime, HoodieRollbackMetadata hoodieRollbackMetadata, boolean isEmpty) throws IOException {
-    createMetaFile(basePath, instantTime, HoodieTimeline.ROLLBACK_EXTENSION, isEmpty ? EMPTY_BYTES : serializeRollbackMetadata(hoodieRollbackMetadata).get());
+    createMetaFile(basePath, instantTime, HoodieTimeline.ROLLBACK_EXTENSION, isEmpty ? Strings.EMPTY_BYTES : serializeRollbackMetadata(hoodieRollbackMetadata).get());
   }
 
   public static void createRestoreFile(String basePath, String instantTime, HoodieRestoreMetadata hoodieRestoreMetadata) throws IOException {
diff --git a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
index 621aea3d2..77c3f15e5 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
@@ -653,36 +653,6 @@ public class FlinkOptions extends HoodieConfig {
       .withDescription("INT64 with original type TIMESTAMP_MICROS is converted to hive timestamp type.\n"
           + "Disabled by default for backward compatibility.");
 
-  public static final ConfigOption<Boolean> HIVE_SYNC_KERBEROS_ENABLE = ConfigOptions
-      .key("hive_sync.kerberos.enable")
-      .booleanType()
-      .defaultValue(false)
-      .withDescription("Whether hive is configured with kerberos");
-
-  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KRB5CONF = ConfigOptions
-      .key("hive_sync.kerberos.krb5.conf")
-      .stringType()
-      .defaultValue("")
-      .withDescription("kerberos krb5.conf file path");
-
-  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_PRINCIPAL = ConfigOptions
-      .key("hive_sync.kerberos.principal")
-      .stringType()
-      .defaultValue("")
-      .withDescription("hive metastore kerberos principal");
-
-  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KEYTAB_FILE = ConfigOptions
-      .key("hive_sync.kerberos.keytab.file")
-      .stringType()
-      .defaultValue("")
-      .withDescription("Hive metastore keytab file path");
-
-  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KEYTAB_NAME = ConfigOptions
-      .key("hive_sync.kerberos.keytab.name")
-      .stringType()
-      .defaultValue("")
-      .withDescription("Hive metastore keytab file name");
-
   // -------------------------------------------------------------------------
   //  Utilities
   // -------------------------------------------------------------------------
diff --git a/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java b/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
index bedc20f9b..1c051c8cd 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
@@ -86,11 +86,6 @@ public class HiveSyncContext {
     hiveSyncConfig.skipROSuffix = conf.getBoolean(FlinkOptions.HIVE_SYNC_SKIP_RO_SUFFIX);
     hiveSyncConfig.assumeDatePartitioning = conf.getBoolean(FlinkOptions.HIVE_SYNC_ASSUME_DATE_PARTITION);
     hiveSyncConfig.withOperationField = conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
-    hiveSyncConfig.enableKerberos = conf.getBoolean(FlinkOptions.HIVE_SYNC_KERBEROS_ENABLE);
-    hiveSyncConfig.krb5Conf = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KRB5CONF);
-    hiveSyncConfig.principal = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_PRINCIPAL);
-    hiveSyncConfig.keytabFile = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KEYTAB_FILE);
-    hiveSyncConfig.keytabName = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KEYTAB_NAME);
     return hiveSyncConfig;
   }
 }
diff --git a/hudi-hadoop-mr/pom.xml b/hudi-hadoop-mr/pom.xml
index ef0ea945a..7283d74f0 100644
--- a/hudi-hadoop-mr/pom.xml
+++ b/hudi-hadoop-mr/pom.xml
@@ -67,17 +67,6 @@
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-jdbc</artifactId>
-      <exclusions>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
index 2701820b8..9b6385120 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
@@ -123,21 +123,6 @@ public class HiveSyncConfig implements Serializable {
   @Parameter(names = {"--conditional-sync"}, description = "If true, only sync on conditions like schema change or partition change.")
   public Boolean isConditionalSync = false;
 
-  @Parameter(names = {"--enable-kerberos"}, description = "Whether hive configs kerberos")
-  public Boolean enableKerberos = false;
-
-  @Parameter(names = {"--krb5-conf"}, description = "krb5.conf file path")
-  public String krb5Conf = "/etc/krb5.conf";
-
-  @Parameter(names = {"--principal"}, description = "hive metastore principal")
-  public String principal = "hive/_HOST@EXAMPLE.COM";
-
-  @Parameter(names = {"--keytab-file"}, description = "hive metastore keytab file path")
-  public String keytabFile;
-
-  @Parameter(names = {"--keytab-name"}, description = "hive metastore keytab name")
-  public String keytabName;
-
   // enhance the similar function in child class
   public static HiveSyncConfig copy(HiveSyncConfig cfg) {
     HiveSyncConfig newConfig = new HiveSyncConfig();
@@ -162,11 +147,6 @@ public class HiveSyncConfig implements Serializable {
     newConfig.sparkSchemaLengthThreshold = cfg.sparkSchemaLengthThreshold;
     newConfig.withOperationField = cfg.withOperationField;
     newConfig.isConditionalSync = cfg.isConditionalSync;
-    newConfig.enableKerberos = cfg.enableKerberos;
-    newConfig.krb5Conf = cfg.krb5Conf;
-    newConfig.principal = cfg.principal;
-    newConfig.keytabFile = cfg.keytabFile;
-    newConfig.keytabName = cfg.keytabName;
     return newConfig;
   }
 
@@ -199,11 +179,6 @@ public class HiveSyncConfig implements Serializable {
       + ", sparkSchemaLengthThreshold=" + sparkSchemaLengthThreshold
       + ", withOperationField=" + withOperationField
       + ", isConditionalSync=" + isConditionalSync
-      + ", enableKerberos=" + enableKerberos
-      + ", krb5Conf=" + krb5Conf
-      + ", principal=" + principal
-      + ", keytabFile=" + keytabFile
-      + ", keytabName=" + keytabName
       + '}';
   }
 }
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
index 56553f1ed..b37b28ed2 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
@@ -23,7 +23,6 @@ import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.model.HoodieFileFormat;
 import org.apache.hudi.common.model.HoodieTableType;
@@ -44,7 +43,6 @@ import org.apache.parquet.schema.MessageType;
 import org.apache.parquet.schema.PrimitiveType;
 import org.apache.parquet.schema.Type;
 
-import java.io.IOException;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
@@ -77,20 +75,8 @@ public class HiveSyncTool extends AbstractSyncTool {
     super(configuration.getAllProperties(), fs);
 
     try {
-      if (cfg.enableKerberos) {
-        System.setProperty("java.security.krb5.conf", cfg.krb5Conf);
-        Configuration conf = new Configuration();
-        conf.set("hadoop.security.authentication", "kerberos");
-        conf.set("kerberos.principal", cfg.principal);
-        UserGroupInformation.setConfiguration(conf);
-        UserGroupInformation.loginUserFromKeytab(cfg.keytabName, cfg.keytabFile);
-        configuration.set(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL.varname, "true");
-        configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname, cfg.principal);
-        configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_KEYTAB_FILE.varname, cfg.keytabFile);
-      }
-
       this.hoodieHiveClient = new HoodieHiveClient(cfg, configuration, fs);
-    } catch (RuntimeException | IOException e) {
+    } catch (RuntimeException e) {
       if (cfg.ignoreExceptions) {
         LOG.error("Got runtime exception when hive syncing, but continuing as ignoreExceptions config is set ", e);
       } else {
diff --git a/hudi-utilities/pom.xml b/hudi-utilities/pom.xml
index ad32458d2..474e0499d 100644
--- a/hudi-utilities/pom.xml
+++ b/hudi-utilities/pom.xml
@@ -352,18 +352,8 @@
           <groupId>org.eclipse.jetty.orbit</groupId>
           <artifactId>javax.servlet</artifactId>
         </exclusion>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
       </exclusions>
     </dependency>
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-service</artifactId>
diff --git a/packaging/hudi-flink-bundle/pom.xml b/packaging/hudi-flink-bundle/pom.xml
index c4ee23017..640c71d68 100644
--- a/packaging/hudi-flink-bundle/pom.xml
+++ b/packaging/hudi-flink-bundle/pom.xml
@@ -138,7 +138,6 @@
                   <include>org.apache.hive:hive-service-rpc</include>
                   <include>org.apache.hive:hive-exec</include>
                   <include>org.apache.hive:hive-metastore</include>
-                  <include>org.apache.hive:hive-standalone-metastore</include>
                   <include>org.apache.hive:hive-jdbc</include>
                   <include>org.datanucleus:datanucleus-core</include>
                   <include>org.datanucleus:datanucleus-api-jdo</include>
@@ -445,7 +444,6 @@
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${hive.version}</version>
-      <scope>${flink.bundle.hive.scope}</scope>
     </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
@@ -489,17 +487,8 @@
           <groupId>org.eclipse.jetty</groupId>
           <artifactId>*</artifactId>
         </exclusion>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
       </exclusions>
     </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-common</artifactId>
@@ -694,12 +683,6 @@
           <version>${hive.version}</version>
           <scope>${flink.bundle.hive.scope}</scope>
         </dependency>
-        <dependency>
-          <groupId>org.apache.hive</groupId>
-          <artifactId>hive-standalone-metastore</artifactId>
-          <version>${hive.version}</version>
-          <scope>${flink.bundle.hive.scope}</scope>
-        </dependency>
       </dependencies>
     </profile>
   </profiles>
diff --git a/packaging/hudi-integ-test-bundle/pom.xml b/packaging/hudi-integ-test-bundle/pom.xml
index ee2605de3..30704c8c9 100644
--- a/packaging/hudi-integ-test-bundle/pom.xml
+++ b/packaging/hudi-integ-test-bundle/pom.xml
@@ -408,10 +408,6 @@
           <groupId>org.pentaho</groupId>
           <artifactId>*</artifactId>
         </exclusion>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
       </exclusions>
     </dependency>
 
@@ -428,19 +424,9 @@
           <groupId>javax.servlet</groupId>
           <artifactId>servlet-api</artifactId>
         </exclusion>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
       </exclusions>
     </dependency>
 
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
-
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-common</artifactId>
diff --git a/packaging/hudi-kafka-connect-bundle/pom.xml b/packaging/hudi-kafka-connect-bundle/pom.xml
index d2cc84df7..bf395a411 100644
--- a/packaging/hudi-kafka-connect-bundle/pom.xml
+++ b/packaging/hudi-kafka-connect-bundle/pom.xml
@@ -306,17 +306,6 @@
             <artifactId>hive-jdbc</artifactId>
             <version>${hive.version}</version>
             <scope>${utilities.bundle.hive.scope}</scope>
-            <exclusions>
-                <exclusion>
-                    <groupId>org.apache.hadoop</groupId>
-                    <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-                </exclusion>
-            </exclusions>
-        </dependency>
-        <dependency>
-            <groupId>org.apache.hadoop</groupId>
-            <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-            <version>${hadoop.version}</version>
         </dependency>
 
         <dependency>
diff --git a/packaging/hudi-spark-bundle/pom.xml b/packaging/hudi-spark-bundle/pom.xml
index 44f424540..d8d1a1d2d 100644
--- a/packaging/hudi-spark-bundle/pom.xml
+++ b/packaging/hudi-spark-bundle/pom.xml
@@ -289,17 +289,6 @@
       <artifactId>hive-jdbc</artifactId>
       <version>${hive.version}</version>
       <scope>${spark.bundle.hive.scope}</scope>
-      <exclusions>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
     </dependency>
 
     <dependency>
diff --git a/packaging/hudi-utilities-bundle/pom.xml b/packaging/hudi-utilities-bundle/pom.xml
index 9384c4f01..360e8c7f1 100644
--- a/packaging/hudi-utilities-bundle/pom.xml
+++ b/packaging/hudi-utilities-bundle/pom.xml
@@ -308,18 +308,6 @@
       <artifactId>hive-jdbc</artifactId>
       <version>${hive.version}</version>
       <scope>${utilities.bundle.hive.scope}</scope>
-      <exclusions>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
     </dependency>
 
     <dependency>
diff --git a/pom.xml b/pom.xml
index 36aed4785..470f7db2d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1164,10 +1164,6 @@
       <id>confluent</id>
       <url>https://packages.confluent.io/maven/</url>
     </repository>
-    <repository>
-      <id>hdp</id>
-      <url>https://repo.hortonworks.com/content/repositories/releases/</url>
-    </repository>
   </repositories>
 
   <profiles>

Code Reading and Understanding

For each Java file and each class or method modified above, read the code closely to understand its overall structure and logic, and understand the purpose of every change.

Applying the Changes to the Hudi Version in Use

We extended the Hudi source tree currently in use with Kerberos support; the change spans 12 files, roughly 20 code sites, and about 200 lines of code in total.

The resulting changes are as follows:

diff --git a/README.md b/README.md
index 2b32591..f31070b 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,17 @@
 
 # Development Log
 
+## November/28th/2022
+
+1. Added Kerberos authentication as described in the blog post [将hudi同步到配置kerberos的hive3](https://cloud.tencent.com/developer/article/1949358) and verified its feasibility
+2. Compared, organized, and modified the code on the local branch against the not-yet-merged upstream Hudi PR [Hudi-2402](https://github.com/apache/hudi/pull/3771) and [xiaozhch5's branch](https://github.com/xiaozhch5/hudi/tree/0.10.1-release-hive3-kerberos-enabled)
+3. Added Kerberos support and the corresponding configuration options for syncing Hive tables from Hudi tables
+4. Added the dependencies and configuration entries required by the modified code to the pom files
+
+Ps: The build command used this time is `mvn clean install -^DskipTests -^Dcheckstyle.skip=true -^Dmaven.test.skip=true -^DskipITs -^Dhadoop.version=3.0.0-cdh6.3.2 -^Dhive.version=3.1.2 -^Dscala.version=2.12.10 -^Dscala.binary.version=2.12 -^Dflink.version=1.13.2 -^Pflink-bundle-shade-hive3`
+
+// Ps: The build command used this time is `mvn clean install -^DskipTests -^Dmaven.test.skip=true -^DskipITs -^Dcheckstyle.skip=true -^Drat.skip=true -^Dhadoop.version=3.0.0-cdh6.3.2  -^Pflink-bundle-shade-hive2 -^Dscala-2.12 -^Pspark-shade-unbundle-avro`
+
 ## August/2nd/2022
 
 1. Modified the main execution flow of the Flink data sink in Hudi, and added a number of class fields, class methods, and helper functions to support the new features;
diff --git a/hudi-aws/pom.xml b/hudi-aws/pom.xml
index 34114fc..636b29c 100644
--- a/hudi-aws/pom.xml
+++ b/hudi-aws/pom.xml
@@ -116,6 +116,13 @@
             <artifactId>mockito-junit-jupiter</artifactId>
             <scope>test</scope>
         </dependency>
+
+        <dependency>
+            <groupId>com.google.code.findbugs</groupId>
+            <artifactId>jsr305</artifactId>
+            <version>3.0.0</version>
+        </dependency>
+
     </dependencies>
 
     <build>
diff --git a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
index e704a34..413e9ed 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
@@ -653,6 +653,40 @@ public class FlinkOptions extends HoodieConfig {
       .withDescription("INT64 with original type TIMESTAMP_MICROS is converted to hive timestamp type.\n"
           + "Disabled by default for backward compatibility.");
 
+  // ------------------------------------------------------------------------
+  //  Kerberos Related Options
+  // ------------------------------------------------------------------------
+
+  public static final ConfigOption<Boolean> HIVE_SYNC_KERBEROS_ENABLE = ConfigOptions
+          .key("hive_sync.kerberos.enable")
+          .booleanType()
+          .defaultValue(false)
+          .withDescription("Whether hive is configured with kerberos");
+
+  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KRB5CONF = ConfigOptions
+          .key("hive_sync.kerberos.krb5.conf")
+          .stringType()
+          .defaultValue("")
+          .withDescription("kerberos krb5.conf file path");
+
+  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_PRINCIPAL = ConfigOptions
+          .key("hive_sync.kerberos.principal")
+          .stringType()
+          .defaultValue("")
+          .withDescription("hive metastore kerberos principal");
+
+  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KEYTAB_FILE = ConfigOptions
+          .key("hive_sync.kerberos.keytab.file")
+          .stringType()
+          .defaultValue("")
+          .withDescription("Hive metastore keytab file path");
+
+  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KEYTAB_NAME = ConfigOptions
+          .key("hive_sync.kerberos.keytab.name")
+          .stringType()
+          .defaultValue("")
+          .withDescription("Hive metastore keytab file name");
+
   // ------------------------------------------------------------------------
   //  Custom Flush related logic
   // ------------------------------------------------------------------------
diff --git a/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java b/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
index 1c051c8..a1e1da3 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
@@ -86,6 +86,13 @@ public class HiveSyncContext {
     hiveSyncConfig.skipROSuffix = conf.getBoolean(FlinkOptions.HIVE_SYNC_SKIP_RO_SUFFIX);
     hiveSyncConfig.assumeDatePartitioning = conf.getBoolean(FlinkOptions.HIVE_SYNC_ASSUME_DATE_PARTITION);
     hiveSyncConfig.withOperationField = conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    // Kerberos Related Configurations
+    hiveSyncConfig.enableKerberos = conf.getBoolean(FlinkOptions.HIVE_SYNC_KERBEROS_ENABLE);
+    hiveSyncConfig.krb5Conf = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KRB5CONF);
+    hiveSyncConfig.principal = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_PRINCIPAL);
+    hiveSyncConfig.keytabFile = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KEYTAB_FILE);
+    hiveSyncConfig.keytabName = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KEYTAB_NAME);
+    // Kerberos Configs END
     return hiveSyncConfig;
   }
 }
diff --git a/hudi-hadoop-mr/pom.xml b/hudi-hadoop-mr/pom.xml
index df2a23b..e61dbd4 100644
--- a/hudi-hadoop-mr/pom.xml
+++ b/hudi-hadoop-mr/pom.xml
@@ -67,6 +67,17 @@
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-jdbc</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
index 9b63851..624300f 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
@@ -123,6 +123,22 @@ public class HiveSyncConfig implements Serializable {
   @Parameter(names = {"--conditional-sync"}, description = "If true, only sync on conditions like schema change or partition change.")
   public Boolean isConditionalSync = false;
 
+  // Kerberos Related Configuration
+  @Parameter(names = {"--enable-kerberos"}, description = "Whether hive configs kerberos")
+  public Boolean enableKerberos = false;
+
+  @Parameter(names = {"--krb5-conf"}, description = "krb5.conf file path")
+  public String krb5Conf = "/etc/krb5.conf";
+
+  @Parameter(names = {"--principal"}, description = "hive metastore principal")
+  public String principal = "hive/_HOST@EXAMPLE.COM";
+
+  @Parameter(names = {"--keytab-file"}, description = "hive metastore keytab file path")
+  public String keytabFile;
+
+  @Parameter(names = {"--keytab-name"}, description = "hive metastore keytab name")
+  public String keytabName;
+
   // enhance the similar function in child class
   public static HiveSyncConfig copy(HiveSyncConfig cfg) {
     HiveSyncConfig newConfig = new HiveSyncConfig();
@@ -147,6 +163,13 @@ public class HiveSyncConfig implements Serializable {
     newConfig.sparkSchemaLengthThreshold = cfg.sparkSchemaLengthThreshold;
     newConfig.withOperationField = cfg.withOperationField;
     newConfig.isConditionalSync = cfg.isConditionalSync;
+    // Kerberos Related Configs
+    newConfig.enableKerberos = cfg.enableKerberos;
+    newConfig.krb5Conf = cfg.krb5Conf;
+    newConfig.principal = cfg.principal;
+    newConfig.keytabFile = cfg.keytabFile;
+    newConfig.keytabName = cfg.keytabName;
+    // Kerberos Related Configs END
     return newConfig;
   }
 
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
index 3bbaee1..2fa0e86 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
@@ -38,6 +38,7 @@ import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
 import org.apache.parquet.schema.GroupType;
@@ -45,6 +46,7 @@ import org.apache.parquet.schema.MessageType;
 import org.apache.parquet.schema.PrimitiveType;
 import org.apache.parquet.schema.Type;
 
+import java.io.IOException;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
@@ -77,8 +79,23 @@ public class HiveSyncTool extends AbstractSyncTool {
     super(configuration.getAllProperties(), fs);
 
     try {
+
+      // Start Kerberos Processing Logic
+      if (cfg.enableKerberos) {
+        System.setProperty("java.security.krb5.conf", cfg.krb5Conf);
+        Configuration conf = new Configuration();
+        conf.set("hadoop.security.authentication", "kerberos");
+        conf.set("kerberos.principal", cfg.principal);
+        UserGroupInformation.setConfiguration(conf);
+        UserGroupInformation.loginUserFromKeytab(cfg.keytabName, cfg.keytabFile);
+        configuration.set(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL.varname, "true");
+        configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname, cfg.principal);
+        configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_KEYTAB_FILE.varname, cfg.keytabFile);
+      }
+
       this.hoodieHiveClient = new HoodieHiveClient(cfg, configuration, fs);
-    } catch (RuntimeException e) {
+    } catch (RuntimeException | IOException e) {
+      // Support IOException e
       if (cfg.ignoreExceptions) {
         LOG.error("Got runtime exception when hive syncing, but continuing as ignoreExceptions config is set ", e);
       } else {
diff --git a/hudi-utilities/pom.xml b/hudi-utilities/pom.xml
index 470ad47..5b95ffb 100644
--- a/hudi-utilities/pom.xml
+++ b/hudi-utilities/pom.xml
@@ -352,8 +352,19 @@
           <groupId>org.eclipse.jetty.orbit</groupId>
           <artifactId>javax.servlet</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
       </exclusions>
     </dependency>
+
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-service</artifactId>
diff --git a/packaging/hudi-flink-bundle/pom.xml b/packaging/hudi-flink-bundle/pom.xml
index fc8d183..27b52d3 100644
--- a/packaging/hudi-flink-bundle/pom.xml
+++ b/packaging/hudi-flink-bundle/pom.xml
@@ -139,6 +139,7 @@
                   <include>org.apache.hive:hive-service-rpc</include>
                   <include>org.apache.hive:hive-exec</include>
                   <include>org.apache.hive:hive-metastore</include>
+                  <include>org.apache.hive:hive-standalone-metastore</include>
                   <include>org.apache.hive:hive-jdbc</include>
                   <include>org.datanucleus:datanucleus-core</include>
                   <include>org.datanucleus:datanucleus-api-jdo</include>
@@ -442,6 +443,7 @@
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${hive.version}</version>
+      <scope>${flink.bundle.hive.scope}</scope>
       <exclusions>
         <exclusion>
           <groupId>javax.mail</groupId>
@@ -503,8 +505,17 @@
           <groupId>org.eclipse.jetty</groupId>
           <artifactId>*</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-common</artifactId>
@@ -706,6 +717,12 @@
           <version>${hive.version}</version>
           <scope>${flink.bundle.hive.scope}</scope>
         </dependency>
+        <dependency>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-standalone-metastore</artifactId>
+          <version>${hive.version}</version>
+          <scope>${flink.bundle.hive.scope}</scope>
+        </dependency>
       </dependencies>
     </profile>
   </profiles>
diff --git a/packaging/hudi-kafka-connect-bundle/pom.xml b/packaging/hudi-kafka-connect-bundle/pom.xml
index d5f90db..8d1e1a4 100644
--- a/packaging/hudi-kafka-connect-bundle/pom.xml
+++ b/packaging/hudi-kafka-connect-bundle/pom.xml
@@ -306,6 +306,18 @@
             <artifactId>hive-jdbc</artifactId>
             <version>${hive.version}</version>
             <scope>${utilities.bundle.hive.scope}</scope>
+            <exclusions>
+                <exclusion>
+                    <groupId>org.apache.hadoop</groupId>
+                    <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+                </exclusion>
+            </exclusions>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+            <version>${hadoop.version}</version>
         </dependency>
 
         <dependency>
diff --git a/packaging/hudi-spark-bundle/pom.xml b/packaging/hudi-spark-bundle/pom.xml
index 3544e31..8dd216f 100644
--- a/packaging/hudi-spark-bundle/pom.xml
+++ b/packaging/hudi-spark-bundle/pom.xml
@@ -293,6 +293,18 @@
       <artifactId>hive-jdbc</artifactId>
       <version>${hive.version}</version>
       <scope>${spark.bundle.hive.scope}</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
 
     <dependency>
diff --git a/packaging/hudi-utilities-bundle/pom.xml b/packaging/hudi-utilities-bundle/pom.xml
index a3da0a8..d5e944a 100644
--- a/packaging/hudi-utilities-bundle/pom.xml
+++ b/packaging/hudi-utilities-bundle/pom.xml
@@ -312,6 +312,18 @@
       <artifactId>hive-jdbc</artifactId>
       <version>${hive.version}</version>
       <scope>${utilities.bundle.hive.scope}</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
 
     <dependency>
diff --git a/pom.xml b/pom.xml
index 58f6130..ff760bb 100644
--- a/pom.xml
+++ b/pom.xml
@@ -44,20 +44,20 @@
     <module>hudi-timeline-service</module>
     <module>hudi-utilities</module>
     <module>hudi-sync</module>
-    <!--<module>packaging/hudi-hadoop-mr-bundle</module>-->
-    <!--<module>packaging/hudi-hive-sync-bundle</module>-->
-    <!--<module>packaging/hudi-spark-bundle</module>-->
-    <!--<module>packaging/hudi-presto-bundle</module>-->
-    <!--<module>packaging/hudi-utilities-bundle</module>-->
-    <!--<module>packaging/hudi-timeline-server-bundle</module>-->
-    <!--<module>docker/hoodie/hadoop</module>-->
-    <!--<module>hudi-integ-test</module>-->
-    <!--<module>packaging/hudi-integ-test-bundle</module>-->
-    <!--<module>hudi-examples</module>-->
+    <module>packaging/hudi-hadoop-mr-bundle</module>
+    <module>packaging/hudi-hive-sync-bundle</module>
+    <module>packaging/hudi-spark-bundle</module>
+    <module>packaging/hudi-presto-bundle</module>
+    <module>packaging/hudi-utilities-bundle</module>
+    <module>packaging/hudi-timeline-server-bundle</module>
+    <module>docker/hoodie/hadoop</module>
+<!--    <module>hudi-integ-test</module>-->
+<!--    <module>packaging/hudi-integ-test-bundle</module>-->
+    <module>hudi-examples</module>
     <module>hudi-flink</module>
     <module>hudi-kafka-connect</module>
     <module>packaging/hudi-flink-bundle</module>
-    <!--<module>packaging/hudi-kafka-connect-bundle</module>-->
+    <module>packaging/hudi-kafka-connect-bundle</module>
   </modules>
 
   <licenses>
@@ -1084,6 +1084,10 @@
       <id>confluent</id>
       <url>https://packages.confluent.io/maven/</url>
     </repository>
+    <repository>
+      <id>hdp</id>
+      <url>https://repo.hortonworks.com/content/repositories/releases/</url>
+    </repository>
   </repositories>
 
   <profiles>
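
The functional core of the whole change is the UserGroupInformation login sequence added to the HiveSyncTool constructor above. As a standalone illustration, here is a minimal sketch of that sequence, assuming hadoop-common and hive-common are on the classpath; the principal and keytab path are placeholders:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginSketch {
      public static void login(HiveConf hiveConf) throws IOException {
        // Point the JVM at the Kerberos client configuration
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");

        // Tell Hadoop to authenticate via Kerberos, then perform the keytab login
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
            "hive/_HOST@EXAMPLE.COM",                      // placeholder principal
            "/etc/security/keytabs/hive.service.keytab");  // placeholder keytab path

        // Enable SASL on the metastore client so the Thrift calls are authenticated
        hiveConf.set(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL.varname, "true");
        hiveConf.set(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname,
            "hive/_HOST@EXAMPLE.COM");
        hiveConf.set(HiveConf.ConfVars.METASTORE_KERBEROS_KEYTAB_FILE.varname,
            "/etc/security/keytabs/hive.service.keytab");
      }
    }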

Build Command

Build with the following command, then place the resulting jars in the appropriate locations on the cluster.

The build command is:

mvn clean install -^DskipTests -^Dcheckstyle.skip=true -^Dmaven.test.skip=true -^DskipITs -^Dhadoop.version=3.0.0-cdh6.3.2 -^Dhive.version=3.1.2 -^Dscala.version=2.12.10 -^Dscala.binary.version=2.12 -^Dflink.version=1.13.2 -^Pflink-bundle-shade-hive3

The jars relevant to Hudi, Flink, and Hive are the following three:

packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.10.0.jar
packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.10.0.jar
packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.12-0.10.0.jar

Environment Deployment

Place the jars in the following locations in the cluster environment:

  1. Place hudi-hive-sync-bundle-0.10.0.jar and hudi-hadoop-mr-bundle-0.10.0.jar in $HIVE_HOME/auxlib

    cd $HIVE_HOME/auxlib
    ls ./
    hudi-hive-sync-bundle-0.10.0.jar
    hudi-hadoop-mr-bundle-0.10.0.jar

  2. Place hudi-flink-bundle_2.12-0.10.0.jar in $FLINK_HOME/lib

    cd $FLINK_HOME/lib
    ls ./
    ...
    hudi-flink-bundle_2.12-0.10.0.jar
    ...
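
With the jars deployed, the new options can be exercised from a Flink 1.13 job. Below is a hedged end-to-end sketch using the Table API; the schema, table path, metastore URI, principal, and keytab are placeholder values, the non-Kerberos hive_sync.* keys are standard Hudi 0.10 Flink options, and the hive_sync.kerberos.* keys are exactly the options added by this patch:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class HudiKerberosSinkExample {
      public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Declare a Hudi sink table whose Hive sync authenticates via Kerberos
        tEnv.executeSql(
            "CREATE TABLE hudi_demo (\n"
                + "  id INT PRIMARY KEY NOT ENFORCED,\n"
                + "  name STRING,\n"
                + "  ts TIMESTAMP(3)\n"
                + ") WITH (\n"
                + "  'connector' = 'hudi',\n"
                + "  'path' = 'hdfs:///warehouse/hudi_demo',\n"  // placeholder path
                + "  'table.type' = 'MERGE_ON_READ',\n"
                + "  'hive_sync.enable' = 'true',\n"
                + "  'hive_sync.mode' = 'hms',\n"
                + "  'hive_sync.metastore.uris' = 'thrift://metastore-host:9083',\n" // placeholder
                + "  'hive_sync.db' = 'default',\n"
                + "  'hive_sync.table' = 'hudi_demo',\n"
                // Kerberos options added by this patch
                + "  'hive_sync.kerberos.enable' = 'true',\n"
                + "  'hive_sync.kerberos.krb5.conf' = '/etc/krb5.conf',\n"
                + "  'hive_sync.kerberos.principal' = 'hive/_HOST@EXAMPLE.COM',\n"
                + "  'hive_sync.kerberos.keytab.name' = 'hive/_HOST@EXAMPLE.COM',\n"
                + "  'hive_sync.kerberos.keytab.file' = '/etc/security/keytabs/hive.service.keytab'\n"
                + ")");

        tEnv.executeSql("INSERT INTO hudi_demo VALUES (1, 'alice', CURRENT_TIMESTAMP)");
      }
    }

When hive_sync.kerberos.enable is true, HiveSyncContext copies these values into HiveSyncConfig, and HiveSyncTool performs the keytab login before connecting to the metastore, as shown in the diffs above.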
