I. Environment and Software Preparation
Note: servers are referred to by hostname throughout; substitute IP addresses if you prefer.
Environment
Server | Components |
---|---|
Master | NameNode, DataNode, NodeManager, ResourceManager, Hive, Hive metastore, HiveServer2, MySQL |
Secondary | SecondaryNameNode, DataNode, NodeManager |
Datanode | DataNode, NodeManager, beeline client for Hive |
1. Java 1.8
Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
linux$:] cd /soft
linux$:] tar -zxvf jdk-8u321-linux-x64.tar.gz
linux$:] cp -r jdk1.8.0_321 /usr/bin/jdk
linux$:] vi /etc/profile
export JAVA_HOME=/usr/bin/jdk # copy of the extracted directory jdk1.8.0_321
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib
linux$:] source /etc/profile
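After reloading the profile it can help to confirm that JAVA_HOME really points at a JDK. The following is a minimal sketch; the function name `check_java_home` is made up for illustration and is not part of any tool.

```shell
# Return 0 if the given directory looks like a usable JDK home,
# i.e. it contains an executable bin/java; return 1 otherwise.
check_java_home() {
    dir="$1"
    if [ -x "$dir/bin/java" ]; then
        echo "OK: $dir/bin/java found"
    else
        echo "MISSING: $dir/bin/java" >&2
        return 1
    fi
}

# Example (path taken from the steps above):
# check_java_home /usr/bin/jdk && java -version
```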
2. Rsync: present by default on CentOS
3. Install zstd, openssl, autoconf, automake, libtool, and ca-certificates
linux$:] yum -y install zstd
linux$:] yum -y install openssl-devel autoconf automake libtool ca-certificates
4. ISA-L
Download: https://github.com/intel/isa-l
linux$:] cd /soft
linux$:] unzip master.zip
linux$:] cd isa-l-master
linux$:] ./autogen.sh
linux$:] ./configure
linux$:] make && make install && make -f Makefile.unx
Other make targets, all optional (each is described on its own line):
make check : create and run tests
make tests : create additional unit tests
make perfs : create included performance tests
make ex : build examples
make other : build other utilities such as compression file tests
make doc : build API manual
5. nasm and yasm
yasm
linux$:] curl -O -L http://www.tortall.net/projects/yasm/releases/yasm-1.3.0.tar.gz
linux$:] tar -zxvf yasm-1.3.0.tar.gz
linux$:] cd yasm-1.3.0
linux$:] ./configure;make -j 8;make install
nasm
linux$:] wget http://www.nasm.us/pub/nasm/releasebuilds/2.14.02/nasm-2.14.02.tar.xz
linux$:] tar xf nasm-2.14.02.tar.xz
linux$:] cd nasm-2.14.02
linux$:] ./configure;make -j 8;make install
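6. SSH
linux$:] ssh-keygen -t rsa
Copy the key to every host so that all hosts can reach one another without a password; each host also needs its key copied to itself.
linux$:] ssh-copy-id -i ~/.ssh/id_rsa.pub root@IP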
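Since the same ssh-copy-id command has to be repeated for every host, a small loop can generate the list. This is a sketch only: the function name `print_copy_id_cmds` is hypothetical, and it prints the commands rather than running them so they can be reviewed first.

```shell
# Print one ssh-copy-id command per host.
# First argument: remote user. Remaining arguments: host names or IPs.
print_copy_id_cmds() {
    user="$1"; shift
    for host in "$@"; do
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub ${user}@${host}"
    done
}

# Example: review the commands, then pipe them to sh to execute.
# print_copy_id_cmds root Master Secondary Datanode
# print_copy_id_cmds root Master Secondary Datanode | sh
```

Run it on each of the three hosts, including the local host name in the list, to cover the host-to-itself case.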
7. Hadoop
Official site: https://hadoop.apache.org/
Navigate: [Getting started] => [Download] => [Apache Download Mirrors] => [HTTP]
linux$:] cd /soft
linux$:] wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
linux$:] tar -zxvf hadoop-3.3.1.tar.gz
linux$:] mv hadoop-3.3.1 hadoop
8. Linux environment variables
linux$:] vi /etc/hosts
<IP address> Master
<IP address> Secondary
<IP address> Datanode
linux$:] vi /etc/profile
export JAVA_HOME=/usr/bin/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib
export HADOOP_HOME=/soft/hadoop # Hadoop install path
export PATH=$HADOOP_HOME/bin:$PATH # Hadoop client commands such as hdfs
export PATH=$HADOOP_HOME/sbin:$PATH # Hadoop start/stop scripts
export HIVE_HOME=/soft/hive
export PATH=$PATH:$HIVE_HOME/bin
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
linux$:] source /etc/profile
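Every later step relies on the three host names resolving, so it may be worth checking /etc/hosts on each machine. A minimal sketch, with `missing_hosts` as a made-up helper name:

```shell
# Print every host name from the argument list that has no entry
# in the given hosts file. Prints nothing when all are present.
missing_hosts() {
    hosts_file="$1"; shift
    for name in "$@"; do
        # Compare against each name field (field 2 onward), skipping comment lines.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Example:
# missing_hosts /etc/hosts Master Secondary Datanode
```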
9. Hadoop configuration files
Configuration file
linux$:] vi /soft/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/bin/jdk
Configuration file [lets start-all.sh/stop-all.sh start or stop all of the machines listed below with one command]
linux$:] vi /soft/hadoop/etc/hadoop/workers
Master
Secondary
Datanode
Configuration file
linux$:] vi /soft/hadoop/etc/hadoop/core-site.xml
<configuration>
<!-- HDFS access address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<!-- storage path for Hadoop runtime temporary files -->
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
<!-- Hadoop service-level authorization -->
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<!-- Hadoop proxy user: allowed hosts; the OS user here is root, change as needed -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<!-- Hadoop proxy user: allowed groups; the OS group here is root, change as needed -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
Configuration file
linux$:] vi /soft/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<!-- local Linux path where the NameNode stores its metadata -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop/namenodedata</value>
</property>
<!-- block size -->
<property>
<name>dfs.blocksize</name>
<value>256M</value>
</property>
<!-- number of NameNode handler threads serving requests from DataNodes -->
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<!-- local Linux storage path for the DataNode -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop/datanodedata</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- file listing the machines to exclude when HDFS starts -->
<property>
<name>dfs.hosts.exclude</name>
<value>/soft/hadoop/etc/hadoop/workers.exclude</value>
</property>
<!-- host of the SecondaryNameNode; if unset, it defaults to the same host as the NameNode -->
<property>
<name>dfs.secondary.http.address</name>
<value>Secondary:50070</value>
</property>
<!-- HDFS permission checking -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
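To get a feel for what a dfs.blocksize of 256M means in practice, the arithmetic can be sketched in shell. The helper names `to_bytes` and `block_count` are illustrative only; the suffixes are treated as binary units (1M = 1024 * 1024 bytes).

```shell
# Convert a size such as 256M or 1G into bytes (binary units).
to_bytes() {
    num=${1%[kKmMgG]}
    case "$1" in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Number of HDFS blocks a file of the given size occupies (ceiling division).
# Usage: block_count FILE_SIZE BLOCK_SIZE
block_count() {
    file_bytes=$(to_bytes "$1")
    block_bytes=$(to_bytes "$2")
    echo $(( (file_bytes + block_bytes - 1) / block_bytes ))
}

# Example: a 1G file with 256M blocks
# block_count 1G 256M
```

With dfs.replication set to 3, each of those blocks is stored three times across the DataNodes.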
Configuration file
linux$:] vi /soft/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>125</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx512M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx512M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>125</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/hadoop/hislog</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/hadoop/hisloging</value>
</property>
</configuration>
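In MapReduce on YARN the JVM heap given in *.java.opts has to fit inside the container size given in *.memory.mb. With the values listed above, mapreduce.map.memory.mb is 125 while mapreduce.map.java.opts is -Xmx512M, so a check like the following sketch would flag the map setting. The function names are hypothetical, and only heap sizes written with an M suffix are handled.

```shell
# Extract the -Xmx value in MB from a java.opts string such as "-Xmx512M".
# Prints nothing if no -Xmx<N>M option is present.
xmx_mb() {
    echo "$1" | sed -n 's/.*-Xmx\([0-9][0-9]*\)[mM].*/\1/p'
}

# Report whether the JVM heap fits inside the container.
# Usage: check_heap_fits CONTAINER_MB JAVA_OPTS
check_heap_fits() {
    container_mb="$1"
    heap_mb=$(xmx_mb "$2")
    if [ -z "$heap_mb" ]; then
        echo "skip: no -Xmx<N>M found in '$2'"
        return 2
    fi
    if [ "$heap_mb" -le "$container_mb" ]; then
        echo "ok: heap ${heap_mb}MB <= container ${container_mb}MB"
    else
        echo "warn: heap ${heap_mb}MB > container ${container_mb}MB"
        return 1
    fi
}

# Example with the map values from mapred-site.xml:
# check_heap_fits 125 "-Xmx512M"
```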
Configuration file
linux$:] vi /soft/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.admin.acl</name>
<value>*</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>Master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>Master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>Master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>Master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>Master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>4</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>125</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop/temppackage</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/soft/hadoop/etc/hadoop/workers.exclude</value>
</property>
</configuration>
II. Starting the Hadoop Cluster
$HADOOP_HOME/bin/hdfs namenode -format
start-all.sh
$HADOOP_HOME/bin/yarn --daemon start proxyserver
$HADOOP_HOME/bin/mapred --daemon start historyserver
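Once the daemons are started, `jps` on each host should list the components from the table at the top. The sketch below compares `jps` output against an expected list; `missing_daemons` is a made-up helper name.

```shell
# Read `jps` output on stdin ("<pid> <name>" per line) and print each
# expected daemon name that does not appear in it.
missing_daemons() {
    jps_out=$(cat)
    for daemon in "$@"; do
        if ! printf '%s\n' "$jps_out" |
            awk -v d="$daemon" '$2 == d { f = 1 } END { exit !f }'; then
            echo "$daemon"
        fi
    done
}

# Example on Master:
# jps | missing_daemons NameNode DataNode NodeManager ResourceManager
# Example on Secondary:
# jps | missing_daemons SecondaryNameNode DataNode NodeManager
```

No output means every expected daemon is running on that host.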
III. Web UI Access
HDFS
http://Master:9870/
YARN
http://Master:8088/
IV. Installing Hive
1. Installing MySQL
linux$:] touch /etc/yum.repos.d/mysql.repo
linux$:] cat >/etc/yum.repos.d/mysql.repo <<EOF
[mysql57-community]
name=MySQL 5.7 Community Server
baseurl=https://mirrors.cloud.tencent.com/mysql/yum/mysql-5.7-community-el7-x86_64/
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql
EOF
linux$:] yum clean all
linux$:] yum makecache
linux$:] yum -y install mysql-community-server
linux$:] systemctl start mysqld
linux$:] systemctl enable mysqld
linux$:] grep "temporary password is generated" /var/log/mysqld.log
linux$:] mysql -uroot -p
For MySQL versions after 5.7.6, initialize the account password with the following commands
SQL>ALTER USER USER() IDENTIFIED BY 'Twcx@2023';
SQL>FLUSH PRIVILEGES;
linux$:] systemctl restart mysqld
linux$:] systemctl enable mysqld
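The grep step above prints the whole log line; the temporary password is its last field. A small sketch for pulling out just that field follows. The function name is hypothetical, and it assumes the password is the last whitespace-separated token on the line, as in the MySQL 5.7 log message "A temporary password is generated for root@localhost: ...".

```shell
# Print the last field of the most recent "temporary password" line
# in the given MySQL log file.
mysql_temp_password() {
    grep 'temporary password' "$1" | tail -n 1 | awk '{ print $NF }'
}

# Example:
# mysql_temp_password /var/log/mysqld.log
```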
2. Installing Hive
linux$:] cd /soft
linux$:] wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
linux$:] tar -zxvf apache-hive-3.1.3-bin.tar.gz
linux$:] mv apache-hive-3.1.3-bin hive
linux$:] cd /soft/hive/conf
linux$:] mv hive-env.sh.template hive-env.sh
linux$:] echo '' > hive-env.sh
linux$:] mv hive-default.xml.template hive-site.xml
linux$:] echo '' > hive-site.xml
Resolve the jar conflict between Hadoop and Hive
linux$:] cd /soft/hive/lib
linux$:] rm -rf guava-19.0.jar
linux$:] cp /soft/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./
Add the MySQL driver that the metastore connection depends on
Driver download page:
https://dev.mysql.com/downloads/connector/j/
Driver for MySQL 8.0:
linux$:] wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.11.tar.gz
linux$:] tar -zxvf mysql-connector-java-8.0.11.tar.gz
linux$:] cd mysql-connector-java-8.0.11
linux$:] cp mysql-connector-java-8.0.11.jar /soft/hive/lib/
Driver for MySQL 5.7 [the one used here]:
linux$:] wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/6.0.6/mysql-connector-java-6.0.6.jar
linux$:] cp mysql-connector-java-6.0.6.jar /soft/hive/lib/
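After swapping the guava jar it is easy to end up with two versions in the lib directory. The following sketch lists whatever guava versions are present; `guava_versions` is an illustrative name.

```shell
# List the guava versions found in a lib directory, one per line,
# derived from file names of the form guava-<version>.jar.
guava_versions() {
    for jar in "$1"/guava-*.jar; do
        [ -e "$jar" ] || continue   # glob matched nothing
        basename "$jar" .jar | sed 's/^guava-//'
    done
}

# Example: both directories should report the same single version.
# guava_versions /soft/hive/lib
# guava_versions /soft/hadoop/share/hadoop/common/lib
```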
3. Configuring Hive
Configuration file
linux$:] vi /soft/hive/conf/hive-env.sh
export HADOOP_HOME=/soft/hadoop
export HIVE_CONF_DIR=/soft/hive/conf
export HIVE_AUX_JARS_PATH=/soft/hive/lib
Logging configuration; the level can be changed to DEBUG for debugging
linux$:] cd /soft/hive/conf
linux$:] cp hive-log4j2.properties.template hive-log4j2.properties
linux$:] vi hive-log4j2.properties
property.hive.log.dir = /user/hive/log
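Configuration file:
Note: even if a character set is specified in the MySQL connection settings, the database created during initialization still uses the latin1 character set.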
linux$:] vi /soft/hive/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://Master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>pyroot</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Twcx@2023</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://Master:9083</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>Master</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>beeline.hs2.connection.user</name>
<value>root</value>
</property>
<property>
<name>beeline.hs2.connection.password</name>
<value>root</value>
</property>
</configuration>
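XML treats a bare & as the start of an entity, so a JDBC URL with more than one parameter has to be written with &amp; inside a <value> element. A minimal sketch for escaping a value before pasting it into hive-site.xml, with `xml_escape` as a made-up helper name:

```shell
# Escape the characters that are special in XML element content.
# The ampersand is handled first so that the entities produced for
# < and > are not escaped a second time.
xml_escape() {
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Example:
# xml_escape 'jdbc:mysql://Master:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
```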
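4. Starting Hive
Notes:
Command-line clients:
bin/hive: a shell client; not recommended.
bin/beeline: strongly recommended. It is a JDBC client that works in both embedded and remote mode. It connects to HiveServer2, which goes through the metastore to reach Hive's metadata in MySQL.
HiveServer2 supports concurrent access from multiple clients as well as authentication, and is designed to give better support to open API clients such as JDBC and ODBC.
Restart HDFS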
linux$:] stop-all.sh
linux$:] start-all.sh
Initialize the Hive metadata in MySQL
linux$:] schematool -dbType mysql -initSchema # initialize the schema
Check that MySQL now contains the hive database and its 74 tables
linux$:] mysql -uroot -p
SQL> show databases;
SQL> use hive
SQL> show tables;
Start the metastore
linux$:] mkdir -p /soft/hive/metastorelog
linux$:] cd /soft/hive/metastorelog
linux$:] nohup hive --service metastore --hiveconf hive.root.logger=DEBUG,console &
Start HiveServer2
linux$:] mkdir -p /soft/hive/hiveserver2log
linux$:] cd /soft/hive/hiveserver2log
linux$:] nohup $HIVE_HOME/bin/hive --service hiveserver2 &
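By default nohup appends output to nohup.out in the current directory, which is why the steps above change into a dedicated log directory first. One possible alternative is to redirect each service to a named log file. The sketch below only prints such a command for review; `hive_service_cmd` and the log file naming are assumptions, not part of Hive.

```shell
# Print a start command for a Hive service that writes stdout and
# stderr to <log_dir>/<service>.log instead of nohup.out.
# Usage: hive_service_cmd SERVICE LOG_DIR
hive_service_cmd() {
    service="$1"
    log_dir="$2"
    echo "nohup hive --service ${service} > ${log_dir}/${service}.log 2>&1 &"
}

# Example: review, then pipe to sh to execute.
# hive_service_cmd metastore /soft/hive/metastorelog
# hive_service_cmd hiveserver2 /soft/hive/hiveserver2log
```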
5. Testing the metastore and HiveServer2 remotely [a client can be set up on the Datanode host]
Install Hive
linux$:] cd /soft
linux$:] wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
linux$:] tar -zxvf apache-hive-3.1.3-bin.tar.gz
linux$:] mv apache-hive-3.1.3-bin hive
Resolve the jar conflict between Hadoop and Hive
linux$:] cd /soft/hive/lib
linux$:] rm -rf guava-19.0.jar
linux$:] cp /soft/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./
Deploy the driver (not needed for a remote client)
linux$:] wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/6.0.6/mysql-connector-java-6.0.6.jar
linux$:] cp mysql-connector-java-6.0.6.jar /soft/hive/lib/
Configure Hive
Configuration file
linux$:] vi /soft/hive/conf/hive-env.sh
export HADOOP_HOME=/soft/hadoop
export HIVE_CONF_DIR=/soft/hive/conf
export HIVE_AUX_JARS_PATH=/soft/hive/lib
Configuration file
linux$:] vi /soft/hive/conf/hive-site.xml
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://Master:9083</value>
</property>
</configuration>
Test the metastore: with no host or port in the URL, beeline goes to the metastore's exposed port 9083 by default
linux$:] beeline -u jdbc:hive2://
> show databases;
Test HiveServer2: port 10000 is the port that HiveServer2 exposes
linux$:] beeline -u jdbc:hive2://Master:10000
> show databases;
Other tests:
On Windows, download DBeaver and connect through port 10000. The default account is hive; leave the password empty or enter hive.
6. Web UI access
http://Master:10002/