Big Data Environment Setup: Compiling Hive

Compiling Hive 3.1.3

Component versions:
hadoop-3.3.2
hive-3.1.3
spark-3.3.4, Scala 2.12.15 (spark-3.3.4 depends on hadoop-3.3.2)

1. Why recompile

1.1 Guava dependency conflict

The conflict shows up in the Hive log; grab the last 200 lines to inspect it:

shell
tail -200 /tmp/root/hive.log > /home/log/hive-200.log

Hive's GitHub repository:

https://github.com/apache/hive

Check its guava dependency:

https://github.com/apache/hive/blob/rel/release-3.1.3/pom.xml

<guava.version>19.0</guava.version>

Hadoop's GitHub repository:

https://github.com/apache/hadoop

Check its guava dependency:

https://github.com/apache/hadoop/blob/rel/release-3.3.2/hadoop-project/pom.xml

<guava.version>27.0-jre</guava.version>
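
If Hive and Hadoop are already unpacked on the machine, the mismatch is easy to confirm from their lib directories (a minimal check; HIVE_HOME and HADOOP_HOME are assumed to point at the installs):

shell
# guava jar bundled with Hive 3.1.3 (expected: guava-19.0.jar)
ls $HIVE_HOME/lib | grep guava
# guava jar bundled with Hadoop 3.3.2 (expected: guava-27.0-jre.jar)
ls $HADOOP_HOME/share/hadoop/common/lib | grep guava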

1.2 StatsTask error when running with the MetaStore enabled

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
MapReduce Jobs Launched: 

1.3 Spark version too old

Hive 3.1.3 supports Spark 2.3.0 by default. That version is too old to take advantage of many newer, more efficient features, so it is replaced with spark-3.3.4 (the highest Spark version compatible with Hadoop 3.3.2).

xml
<spark.version>2.3.0</spark.version>

2. Environment setup

2.1 JDK installation

JDK 1.8 is already installed:

shell
(base) [root@bigdata01 opt]# java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)

2.2 Maven installation

Download the 3.6.3 binary package from https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/

shell
(base) [root@bigdata01 ~]# cd /opt
(base) [root@bigdata01 opt]# tar -zxvf apache-maven-3.6.3-bin.tar.gz
(base) [root@bigdata01 opt]# mv apache-maven-3.6.3 maven
(base) [root@bigdata01 opt]# vim /etc/profile
# add MAVEN_HOME
export MAVEN_HOME=/opt/maven
export PATH=$MAVEN_HOME/bin:$PATH
(base) [root@bigdata01 opt]# source /etc/profile

Verify that Maven is installed correctly:

shell
(base) [root@bigdata01 opt]# mvn -version
Apache Maven 3.6.3 ()
Maven home: /opt/maven
Java version: 1.8.0_301, vendor: Oracle Corporation, runtime: /opt/jdk/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "4.18.0-365.el8.x86_64", arch: "amd64", family: "unix"

Configure the repository mirror to use the Aliyun public repository:

vim /opt/maven/conf/settings.xml

xml
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>/repo</localRepository>
  <mirrors>
	<mirror>
		<id>aliyunmaven</id>
		<mirrorOf>central</mirrorOf>
		<name>阿里云公共仓库</name>
		<url>https://maven.aliyun.com/repository/public</url>
	</mirror>
  </mirrors>
</settings>
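
To confirm that the mirror is actually picked up, one option is to force Maven to fetch a small artifact and watch where it downloads from (the junit coordinates here are arbitrary and only serve as a test download):

shell
# the download URLs in the output should start with https://maven.aliyun.com/repository/public
mvn -U dependency:get -Dartifact=junit:junit:4.13.2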

2.3 Setting up the graphical desktop

Use a CentOS install with a graphical desktop, and remove the extra JDKs it bundles to avoid version conflicts. This step matters: skipping it leads to odd errors later, such as "Fatal error compiling: 无效的目标发行版: 1.11" (invalid target release: 1.11). The error complains about a nonexistent Java 1.11, and upgrading to Java 11 does not make it go away.

Find the extra installed JDKs:

shell
(base) [root@bigdata01 ~]# yum list installed |grep jdk
copy-jdk-configs.noarch                            4.0-2.el8                                                  @appstream        
java-1.8.0-openjdk.x86_64                          1:1.8.0.362.b08-3.el8                                      @appstream        
java-1.8.0-openjdk-devel.x86_64                    1:1.8.0.362.b08-3.el8                                      @appstream        
java-1.8.0-openjdk-headless.x86_64                 1:1.8.0.362.b08-3.el8                                      @appstream

Remove the extra JDKs:

shell
(base) [root@bigdata01 ~]# yum remove -y copy-jdk-configs.noarch java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64 java-1.8.0-openjdk-headless.x86_64

Verify the remaining JDK:

shell
(base) [root@bigdata01 ~]# java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
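
A quick way to double-check that nothing is left over (the /opt/jdk path matches the runtime shown by mvn -version above):

shell
# no openjdk packages should remain in the list
yum list installed | grep jdk
# the java on PATH should resolve into the Oracle JDK under /opt/jdk
which java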

2.4 Git installation

Install the third-party repositories:

shell
(base) [root@bigdata01 opt]# yum install https://repo.ius.io/ius-release-el7.rpm https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Install Git:

shell
(base) [root@bigdata01 opt]# yum install -y git

Check the Git version:

shell
(base) [root@bigdata01 ~]# git -v
git version 2.43.0

2.5 IDEA installation

Download the Linux edition from https://download.jetbrains.com.cn/idea/ideaIU-2021.1.3.tar.gz

shell
(base) [root@bigdata01 opt]# tar -zxvf ideaIU-2021.1.3.tar.gz

Start IDEA; launching the graphical interface has to be done from the desktop session inside VMware:

shell
cd /opt/idea-IU-211.7628.21
./bin/idea.sh

Use the 30-day trial here.

Configure Maven in IDEA; settings.xml already points at the Aliyun public repository.

Create a desktop shortcut for IDEA (bluetooth-sendto.desktop is just an arbitrary file to copy from; any .desktop file will do):

shell
(base) [root@bigdata01 bin]# cd /usr/share/applications
(base) [root@bigdata01 applications]# cp bluetooth-sendto.desktop idea.desktop
(base) [root@bigdata01 applications]# vim idea.desktop
# delete the original contents and replace them with the following
[Desktop Entry]
Name=idea
Exec=sh /opt/idea-IU-211.7628.21/bin/idea.sh
Terminal=false
Type=Application
Icon=/opt/idea-IU-211.7628.21/bin/idea.png
Comment=idea
Categories=Application;
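
A couple of optional sanity checks on the new launcher (desktop-file-validate comes from the desktop-file-utils package and may not be installed on every system):

shell
# keep permissions consistent with the other launchers in this directory
chmod 644 idea.desktop
# validate the entry if desktop-file-utils is available
desktop-file-validate idea.desktop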

3. Pulling the Hive source

Use "Get from VCS" to pull the Hive source; the whole clone takes about an hour.

Set the URL to https://gitee.com/apache/hive.git and choose a local directory.

After choosing "Trust Project", pay attention to the import settings and fill them in as shown (including -Xmx2048m); otherwise JDK version mismatches easily cause errors.

Check out the hive-3.1.3 release,

and create a working branch named slash-hive-3.1.3.
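
For reference, the same steps can be done on the command line instead of through IDEA (a rough equivalent; URL and branch name as above, target directory matching the paths used later):

shell
git clone https://gitee.com/apache/hive.git /home/slash/hive
cd /home/slash/hive
# check out the 3.1.3 release tag and branch off it
git checkout -b slash-hive-3.1.3 rel/release-3.1.3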

4. Compiling the Hive source

4.1 Environment test

1. Test method: run a full build

Open https://hive.apache.org/development/gettingstarted/ and follow the Getting Started Guide.

Then open https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-BuildingHivefromSource, "Building Hive from Source".

That page gives the build command. Run it in the Terminal; a successful run takes about 7 minutes:

shell
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true

2. Problems and solutions

💥 Problem 1: pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar cannot be downloaded

[ERROR] Failed to execute goal on project hive-upgrade-acid: Could not resolve dependencies for project org.apache.hive:hive-upgrade-acid:jar:3.1.3 Downloading from conjars: http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom

pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar cannot be downloaded. This is a known issue: Pentaho's Maven repository server has been shut down permanently, so the dependency can no longer be fetched from that repository.

Solution: first change /opt/maven/conf/settings.xml as follows

xml
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>/repo</localRepository>
  <mirrors>
	<mirror>
		<id>aliyunmaven</id>
		<mirrorOf>*</mirrorOf>
		<name>spring-plugin</name>
		<url>https://maven.aliyun.com/repository/spring-plugin</url>
	</mirror>
	<mirror>
		<id>aliyunmaven</id>
		<mirrorOf>central</mirrorOf>
		<name>阿里云公共仓库</name>
		<url>https://maven.aliyun.com/repository/public</url>
	</mirror>
  </mirrors>
</settings>

Once the /org/pentaho/ artifacts have been downloaded successfully, change the file back!

💥 Problem 2: the Aliyun mirror is not being used

[ERROR] Failed to execute goal on project hive-upgrade-acid: Could not resolve dependencies for project org.apache.hive:hive-upgrade-acid:jar:3.1.3 Downloading from conjars: https://maven.glassfish.org/content/groups/glassfish/asm/asm/3.1/asm-3.1.jar

asm-3.1.jar cannot be downloaded. The Aliyun mirror configured earlier was not being used. Overwrite the entire /opt/maven/conf/settings.xml with the following:

xml
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>/repo</localRepository>
  <mirrors>
	<mirror>
		<id>aliyunmaven</id>
		<mirrorOf>central</mirrorOf>
		<name>阿里云公共仓库</name>
		<url>https://maven.aliyun.com/repository/public</url>
	</mirror>
  </mirrors>
</settings>

After this change, restart and the Aliyun mirror is used successfully.

💥 Problem 3: JDK version conflict because <2. Environment setup> was not done thoroughly

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile (default-compile) on project hive-upgrade-acid: Fatal error compiling: 无效的目标发行版: 1.11 -> [Help 1]

This error means the JDKs are conflicting. Even though java -version reports 1.8, a desktop install plus IDEA can leave several JDKs on the system if they are not cleaned up. The fix is to redo <2. Environment setup> and <3. Pulling the Hive source>.

4.2 Fixing the Guava version conflict

1. Changes

Change guava.version in pom.xml to 27.0-jre.

xml
<!-- original version -->
<guava.version>19.0</guava.version>
<!-- after the change -->
<guava.version>27.0-jre</guava.version>

After changing the version, rebuild with mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true

The result is written to /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz
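
Instead of editing the property by hand, it can also be located and changed from the shell (a sketch; back up pom.xml before running the in-place edit):

shell
cd /home/slash/hive
# find the property in the root pom.xml
grep -n "<guava.version>" pom.xml
# change it in place
sed -i 's|<guava.version>19.0</guava.version>|<guava.version>27.0-jre</guava.version>|' pom.xml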

2. Problems and solutions

💥 Problem 1: Futures.addCallback() takes 3 arguments in Guava 27.0-jre but 2 in 19.0
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-llap-common: Compilation failure: Compilation failure: 
[ERROR] /home/slash/hive/llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java:[173,16] 无法将类 com.google.common.util.concurrent.Futures中的方法 addCallback应用到给定类型;
[ERROR]   需要: com.google.common.util.concurrent.ListenableFuture<V>,com.google.common.util.concurrent.FutureCallback<? super V>,java.util.concurrent.Executor
[ERROR]   找到: com.google.common.util.concurrent.ListenableFuture<U>,org.apache.hadoop.hive.llap.AsyncPbRpcProxy.ResponseCallback<U>
[ERROR]   原因: 无法推断类型变量 V
[ERROR]     (实际参数列表和形式参数列表长度不同)

Fix Futures.addCallback() by adding MoreExecutors.directExecutor() as a third argument. There are roughly 15 such call sites, and the fix is the same in each.

java
// original
@VisibleForTesting
      <T extends Message , U extends Message> void submitToExecutor(
          CallableRequest<T, U> request, LlapNodeId nodeId) {
        ListenableFuture<U> future = executor.submit(request);
        Futures.addCallback(future, new ResponseCallback<U>(
            request.getCallback(), nodeId, this));
      }

// after the change
@VisibleForTesting
      <T extends Message , U extends Message> void submitToExecutor(
          CallableRequest<T, U> request, LlapNodeId nodeId) {
        ListenableFuture<U> future = executor.submit(request);
        Futures.addCallback(future, new ResponseCallback<U>(
            request.getCallback(), nodeId, this), MoreExecutors.directExecutor());
      }

If a "cannot find symbol: MoreExecutors" error shows up along the way, add the import manually; the exact import line can be copied from another file that already uses it.
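
To locate all of the call sites that still need the extra argument, a simple grep over the source tree works (roughly the 15 places mentioned above):

shell
# list files still calling Futures.addCallback
grep -rl "Futures.addCallback(" --include="*.java" /home/slash/hive
# each fixed file also needs: import com.google.common.util.concurrent.MoreExecutors;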

💥 Problem 2: Iterators.emptyIterator() is obsolete (no longer accessible)
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-druid-handler: Compilation failure
[ERROR] /home/slash/hive/druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidScanQueryRecordReader.java:[46,61] <T>emptyIterator()在com.google.common.collect.Iterators中不是公共的; 无法从外部程序包中对其进行访问

Replace the emptyIterator() call in DruidScanQueryRecordReader:

java
// org.apache.hadoop.hive.druid.serde.DruidScanQueryRecordReader
// original code
  private Iterator<List<Object>> compactedValues = Iterators.emptyIterator();
// after the change
  private Iterator<List<Object>> compactedValues = ImmutableSet.<List<Object>>of().iterator();

4.3 StatsTask error after enabling the MetaStore

1. Changes

shell
# error message
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
MapReduce Jobs Launched: 

# error log /tmp/root/hive.log
exec.StatsTask: Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException

See https://issues.apache.org/jira/browse/HIVE-19316 for the analysis.

In IDEA, use Cherry-pick to merge the "StatsTask fails due to ClassCastException" patch into the current branch.

After applying the patch, rebuild with mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true

The result is written to /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz

2. Problems and solutions

💥 Problem 1: cherry-pick fails

Cherry-pick failed

3d21bc38 HIVE-19316: StatsTask fails due to ClassCastException (Jaume Marhuenda, reviewed by Jesus Camacho Rodriguez)

Committer identity unknown

*** Please tell me who you are.

Run

git config --global user.email "you@example.com"

git config --global user.name "Your Name"

to set your account's default identity.

Omit --global to set the identity only in this repository.

unable to auto-detect email address (got 'root@bigdata01.(none)')

Git needs the committer identity to be set before the cherry-pick can be committed.

Cherry-pick failed

3d21bc38 HIVE-19316: StatsTask fails due to ClassCastException (Jaume Marhuenda, reviewed by Jesus Camacho Rodriguez)
error: your local changes would be overwritten by cherry-pick.
hint: commit your changes or stash them to proceed.
cherry-pick failed

The working tree already contains uncommitted changes, and Git refuses to cherry-pick while they are present. Set the identity, commit the changes, and then retry the cherry-pick.

shell
# set the committer identity
git config --global user.email "you@example.com"
git config --global user.name "slash"

# stage and commit the local changes
git add .
git commit -m "resolve conflict  guava"
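
With the identity set and the local changes committed, the cherry-pick can also be retried from the terminal (commit id taken from the failure message above):

shell
git cherry-pick 3d21bc38
# if conflicts come up, resolve them, then
git add .
git cherry-pick --continue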

4.4 Spark compatibility

1. Changes

Update spark.version, scala.binary.version, and scala.version in pom.xml:

xml
<!-- original -->
<spark.version>2.3.0</spark.version>
<scala.binary.version>2.11</scala.binary.version>
<scala.version>2.11.8</scala.version>

<!-- after the change -->
<spark.version>3.3.4</spark.version>
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.15</scala.version>

Exclude some of the Hadoop artifacts pulled in through Spark. Hive 3.1.3's pom depends on hadoop 3.1.0, while spark-3.3.4 depends on hadoop 3.3.2; with the exclusions in place, the hadoop dependency in Hive's pom does not need to change. A quick way to verify the exclusions is sketched after the snippet.

xml
<!-- after the change -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<exclusions>
    <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-api</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-runtime</artifactId>
    </exclusion>
</exclusions>
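
One way to verify that the exclusions take effect is to print the dependency tree of the module declaring the spark-core dependency and filter for Hadoop artifacts (a sketch only; spark-client is assumed here, and sibling Hive modules must already be resolvable, e.g. after a prior mvn install):

shell
mvn dependency:tree -pl spark-client -Dincludes=org.apache.hadoop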

2. Problems and solutions

💥 Problem 1: obsolete Spark APIs used in SparkCounter need replacing
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-spark-client: Compilation failure: Compilation failure: 
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[22,24] 找不到符号
[ERROR]   符号:   类 Accumulator
[ERROR]   位置: 程序包 org.apache.spark
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[23,24] 找不到符号
[ERROR]   符号:   类 AccumulatorParam
[ERROR]   位置: 程序包 org.apache.spark
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[30,11] 找不到符号
[ERROR]   符号:   类 Accumulator
[ERROR]   位置: 类 org.apache.hive.spark.counter.SparkCounter
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[91,41] 找不到符号
[ERROR]   符号:   类 AccumulatorParam
[ERROR]   位置: 类 org.apache.hive.spark.counter.SparkCounter

Remove the methods that are no longer usable and adjust the related code; the final file looks like this:

java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 * <p/>
 * http://www.apache.org/licenses/LICENSE-2.0
 * <p/>
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hive.spark.counter;

import java.io.Serializable;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;

public class SparkCounter implements Serializable {

  private String name;
  private String displayName;
  private LongAccumulator accumulator;

  // Values of accumulators can only be read on the SparkContext side. This field is used when
  // creating a snapshot to be sent to the RSC client.
  private long accumValue;

  public SparkCounter() {
    // For serialization.
  }

  private SparkCounter(
      String name,
      String displayName,
      long value) {
    this.name = name;
    this.displayName = displayName;
    this.accumValue = value;
  }

  public SparkCounter(
    String name,
    String displayName,
    String groupName,
    long initValue,
    JavaSparkContext sparkContext) {

    this.name = name;
    this.displayName = displayName;
    String accumulatorName = groupName + "_" + name;
    this.accumulator = sparkContext.sc().longAccumulator(accumulatorName);
    this.accumulator.setValue(initValue);
  }

  public long getValue() {
    if (accumulator != null) {
      return accumulator.value();
    } else {
      return accumValue;
    }
  }

  public void increment(long incr) {
    accumulator.add(incr);
  }

  public String getName() {
    return name;
  }

  public String getDisplayName() {
    return displayName;
  }

  public void setDisplayName(String displayName) {
    this.displayName = displayName;
  }

  SparkCounter snapshot() {
    return new SparkCounter(name, displayName, accumulator.value());
  }

}
💥 Problem 2: obsolete methods used in ShuffleWriteMetrics need replacing
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-spark-client: Compilation failure: Compilation failure: 
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java:[50,39] 找不到符号
[ERROR]   符号:   方法 shuffleBytesWritten()
[ERROR]   位置: 类 org.apache.spark.executor.ShuffleWriteMetrics
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java:[51,36] 找不到符号
[ERROR]   符号:   方法 shuffleWriteTime()
[ERROR]   位置: 类 org.apache.spark.executor.ShuffleWriteMetrics

Update the affected constructor:

java
// original
public ShuffleWriteMetrics(TaskMetrics metrics) {
    this(metrics.shuffleWriteMetrics().shuffleBytesWritten(),
      metrics.shuffleWriteMetrics().shuffleWriteTime());
  }

// after the change
public ShuffleWriteMetrics(TaskMetrics metrics) {
    this(metrics.shuffleWriteMetrics().bytesWritten(),
      metrics.shuffleWriteMetrics().writeTime());
  }
💥 Problem 3: an obsolete package reference in TestStatsUtils needs replacing
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:testCompile (default-testCompile) on project hive-exec: Compilation failure
[ERROR] /home/slash/hive/ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUtils.java:[34,39] 程序包org.spark_project.guava.collect不存在

Update the import:

java
// original
import org.spark_project.guava.collect.Sets;

// after the change
import org.sparkproject.guava.collect.Sets;

4.5 Successful build

mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true

Hive 3.1.3 built against spark-3.3.4 and hadoop-3.3.2 now compiles successfully; the result is written to /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz
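
As a final sanity check, the bundled guava and spark jars can be listed straight from the tarball before deploying it (the /opt destination below is just an example):

shell
# confirm the rebuilt package ships the expected guava and spark jars
tar -tzf /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz | grep -E "guava|spark" | head
# unpack on the target machine
tar -zxvf /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz -C /opt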


Disclaimer: the information in this article is not guaranteed to be accurate or complete. The content and opinions are for reference only and do not constitute business advice. Feel free to bookmark and share, but please do not repost; any resemblance to other material is coincidental.
