Domestic Platform Adaptation for the Kunpeng ARM Environment: Hive on Tez Single-Node Deployment in Practice

This article is a step-by-step guide to a single-node deployment on a Kunpeng (ARM) server that already has Java installed: starting from downloading Hadoop, it walks through configuring environment variables, YARN, and HDFS, then installing and configuring Hive (with a MySQL metastore) and a standalone Tez engine, and finally verifying that the services run. The key steps are the configuration files and service startup.

Preparation

Download the required packages
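
This article uses Hadoop 3.2.0, Hive 3.1.2, and Tez 0.10.0. A download sketch, assuming the Apache archive as the source; the final rename is an assumption so the unpacked directory matches the tez-0.10.0 path used later in this guide:

```shell
# Fetch the release tarballs used throughout this guide
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
wget https://archive.apache.org/dist/tez/0.10.0/apache-tez-0.10.0-bin.tar.gz
# Unpack under the install prefix used by the rest of this article
tar -xzf hadoop-3.2.0.tar.gz -C /home/kpnre/software/
tar -xzf apache-hive-3.1.2-bin.tar.gz -C /home/kpnre/software/
tar -xzf apache-tez-0.10.0-bin.tar.gz -C /home/kpnre/software/
# Rename so the directory matches the tez-0.10.0 paths used below (assumption)
mv /home/kpnre/software/apache-tez-0.10.0-bin /home/kpnre/software/tez-0.10.0
```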

Environment requirements

Hardware requirements

Minimum configuration: any CPU, one memory module of any size, and one disk of any size.

Size the actual configuration to your application workload.

Operating system and software requirements

In this single-node setup, one machine serves as both the management node (Server) and the compute/storage node.

| Item | Version | Notes |
| --- | --- | --- |
| Operating system | openEuler 22.03 LTS SP1 | - |
| JDK | BiSheng JDK 1.8 (BiSheng JDK 1.8.0_342 preferred) | openEuler 22.03 LTS SP1 is incompatible with BiSheng JDK 1.8.0_262; replace it with BiSheng JDK 1.8.0_342 |

Deploying Hadoop

Environment variables

```shell
#hadoop
export HADOOP_HOME=/home/kpnre/software/hadoop-3.2.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
#java
export JAVA_HOME=/home/kpnre/software/bisheng-jdk1.8.0_452
export PATH=$PATH:$JAVA_HOME/bin
```
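
The exports take effect only after the profile is reloaded. A minimal sketch, assuming the lines above were appended to ~/.bashrc:

```shell
# Reload the profile so the hadoop and java binaries resolve in this shell
source ~/.bashrc
# Quick sanity check that the variables resolve
echo $HADOOP_HOME && hadoop version
```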

Configuration files

hadoop-3.2.0/etc/hadoop/hdfs-site.xml

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>10.x.x.29:9870</value>
    </property>
    <property>
        <name>hadoop.proxyuser.kpnre.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.kpnre.groups</name>
        <value>*</value>
    </property>
</configuration>
```
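
The original does not list core-site.xml, but a working single-node setup needs at least fs.defaultFS there (it is also what ${fs.defaultFS} in tez-site.xml below expands to), and the hadoop.proxyuser.* settings above are conventionally read from core-site.xml rather than hdfs-site.xml. A minimal sketch written as a shell heredoc; the hdfs://10.x.x.29:9000 value is an assumption, so substitute your actual NameNode RPC address:

```shell
# Hypothetical minimal core-site.xml matching this article's paths; adjust host and port
cat > /home/kpnre/software/hadoop-3.2.0/etc/hadoop/core-site.xml <<'EOF'
<configuration>
    <!-- Default filesystem; tez.lib.uris resolves ${fs.defaultFS} against this (port assumed) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.x.x.29:9000</value>
    </property>
    <!-- Impersonation settings for HiveServer2; usually placed here, not in hdfs-site.xml -->
    <property>
        <name>hadoop.proxyuser.kpnre.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.kpnre.groups</name>
        <value>*</value>
    </property>
</configuration>
EOF
```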

hadoop-3.2.0/etc/hadoop/yarn-site.xml

```xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>10.x.x.29</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>16384</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <!-- Enable YARN log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log retention time (in seconds) -->
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>16384</value>
    </property>
</configuration>
```

Starting Hadoop
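
On a fresh deployment, HDFS must be formatted once before the daemons are brought up; the original omits this step, but start-all.sh cannot start the NameNode on an unformatted volume. A minimal sketch using the standard Hadoop CLI:

```shell
# One-time only: initialize the NameNode metadata directory
hdfs namenode -format
```

With HDFS formatted, start all daemons: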

```shell
start-all.sh
```

Verification

Open the Hadoop web UI in a browser. With Hadoop 3.x the NameNode UI listens on port 9870 (matching dfs.namenode.http-address above, not the old 2.x default 50070), so the URL format is "http://server1:9870".
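
Beyond the web UI, the daemons can be checked from the command line; a quick sketch using standard Hadoop tooling:

```shell
# A single-node setup should show NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager
jps
# HDFS should report one live DataNode
hdfs dfsadmin -report
# The NodeManager should be registered with YARN
yarn node -list
```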

Deploying Hive

Prerequisites

MySQL is installed.

  • Copy the mysql-connector-java JAR into Hive's lib directory.
    ```shell
    cp /usr/share/java/mysql-connector-java-5.1.48.jar /home/kpnre/software/apache-hive-3.1.2-bin/lib \
      && cd /home/kpnre/software/apache-hive-3.1.2-bin/lib \
      && sudo chown hadoop:hadoop mysql-connector-java-5.1.48.jar
    ```
  • Create the Hive metastore database in MySQL.
    ```sql
    create database hive character set utf8;
    grant all on hive.* to 'root'@'hive' identified by 'passwd';
    flush privileges;
    ```
  • Initialize the metastore schema (a quick verification sketch follows this list).
    ```shell
    cd /home/kpnre/software/apache-hive-3.1.2-bin
    bin/schematool -dbType mysql -initSchema
    ```
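
If the initialization succeeded, the metastore tables now exist in MySQL. A quick check, assuming the root credentials created above:

```shell
# Should list metastore tables such as VERSION, DBS and TBLS
mysql -u root -p -e 'use hive; show tables;'
```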

Environment variables

```shell
export HIVE_HOME=/home/kpnre/software/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
```

Configuration files

apache-hive-3.1.2-bin/conf/hive-site.xml

```xml
<configuration>
  <!-- Hive Execution Parameters -->
  <property>
    <name>hive.exec.script.wrapper</name>
    <value/>
    <description/>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>******</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.ds.connection.url.hook</name>
    <value/>
    <description>Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used</description>
  </property>
  <property>
    <name>javax.jdo.option.Multithreaded</name>
    <value>true</value>
    <description>Set this to true if multiple threads access metastore through JDO concurrently.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://10.x.x.29:3306/hive?useSSL=false</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>hive.server2.thrift.client.connect.retry.limit</name>
    <value>10</value>
    <description>Number of retries while opening a connection to HiveServer2</description>
  </property>
  <property>
    <name>tez.queue.name</name>
    <value>default</value> <!-- set to your YARN queue name -->
  </property>
  <property>
    <name>tez.session.am.dag.submit.timeout.secs</name>
    <value>0</value> <!-- set according to your resource requirements -->
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
</configuration>
```
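
Hive also expects its scratch and warehouse directories to exist in HDFS and be group-writable. The original skips this step; the commands below follow the standard Apache Hive setup and assume the default warehouse path:

```shell
# Create the default Hive scratch and warehouse directories in HDFS
hdfs dfs -mkdir -p /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
```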

Starting Hive

```shell
hive --service metastore &
# HiveServer2 must also be running for the port check and beeline verification below
hive --service hiveserver2 &
```

Verification

  1. Check the port. HiveServer2 listens on port 10000 by default.

    ```shell
    netstat -anp | grep 10000
    ```

    Output like the following means the service started successfully.

    ```
    tcp6 0 0 :::10000 :::* LISTEN 27800/java
    ```
  2. Connect with beeline from server1.

    ```shell
    beeline -u jdbc:hive2://server1:10000 -n root
    ```

    The session echoes the following.

    ```
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Connecting to jdbc:hive2://server1:10000
    Connected to: Apache Hive (version 3.1.0)
    Driver: Hive JDBC (version 3.1.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Beeline version 3.1.0 by Apache Hive
    0: jdbc:hive2://server1:10000>
    ```
  3. List the existing databases.

    ```sql
    show databases;
    ```

    Output like the following means Hive started successfully.

    ```
    0: jdbc:hive2://server1:10000> show databases;
    INFO : Compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
    INFO : Concurrency mode is disabled, not creating a lock manager
    INFO : Semantic Analysis Completed (retrial = false)
    INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
    INFO : Completed compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.903 seconds
    INFO : Concurrency mode is disabled, not creating a lock manager
    INFO : Executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
    INFO : Starting task [Stage-0:DDL] in serial mode
    INFO : Completed executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.029 seconds
    INFO : OK
    INFO : Concurrency mode is disabled, not creating a lock manager
    +----------------+
    | database_name  |
    +----------------+
    | default        |
    +----------------+
    1 row selected (1.248 seconds)
    ```

Deploying Tez

Configuration files

hadoop-3.2.0/etc/hadoop/hadoop-env.sh

```shell
# TEZ_CONF_DIR must point at the conf directory (it goes on the classpath), not the xml file itself
export TEZ_CONF_DIR=/home/kpnre/software/tez-0.10.0/conf
export TEZ_JARS=/home/kpnre/software/tez-0.10.0
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
```

tez-0.10.0/conf/tez-site.xml

```xml
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Make sure the path and file name below match your environment -->
<property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez/tez.tar.gz</value>
</property>
<property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>false</value>
</property>
<property>
    <name>tez.am.resource.cpu.vcores</name>
    <value>1</value>
</property>
<property>
    <name>tez.task.resource.memory.mb</name>
    <value>2048</value>
</property>
<property>
    <name>tez.task.resource.cpu.vcores</name>
    <value>1</value>
</property>
<property>
    <name>hive.tez.container.size</name>
    <value>2048</value>
</property>
<property>
    <name>tez.am.resource.memory.mb</name>
    <value>2048</value>
</property>
</configuration>
```
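
tez.lib.uris above points at an HDFS path, so the Tez runtime archive must be uploaded before the first query runs; this step is missing from the original. A sketch assuming the binary Tez distribution ships the archive as share/tez.tar.gz (adjust if your build lays it out differently):

```shell
# Upload the Tez runtime archive to the location named in tez.lib.uris
hdfs dfs -mkdir -p /tez
hdfs dfs -put /home/kpnre/software/tez-0.10.0/share/tez.tar.gz /tez/tez.tar.gz
hdfs dfs -ls /tez
```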

Verification

  1. Switch to the Tez engine.

    ```sql
    set hive.execution.engine=tez;
    ```
  2. Create the student table.

    ```sql
    create table student(id int, name string);
    ```
  3. Insert data into the table.

    ```sql
    insert into student values(1,"zhangsan");
    insert into student values(2,"lisi");
    ```
  4. Query the student table. If the inserted rows come back, the installation succeeded (a scripted variant of this check follows the list).

    ```sql
    select * from student;
    ```
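
The whole check can also be run non-interactively through beeline's -e option, reusing the connection settings from the Hive verification above:

```shell
beeline -u jdbc:hive2://server1:10000 -n root -e 'select * from student;'
```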

References

Hadoop + Hive single-node deployment, Kunpeng BoostKit big data enablement kit

Problems encountered and solutions

Domestic Platform Adaptation for the Kunpeng ARM Environment: Hive on Tez Single-Node Deployment Problem Summary
