Big Data Technology - Hadoop (1): Installing and Configuring a Hadoop Cluster

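Contents

I. Preparation

1. Install the JDK (run on every node)

2. Update the host configuration (run on every node)

3. Configure passwordless SSH login (run on every node)

II. Install Hadoop (run on every node)

III. Cluster startup configuration (run on every node)

1. core-site.xml

2. hdfs-site.xml

3. yarn-site.xml

4. mapred-site.xml

5. workers

IV. Start and test the cluster (run on every node)

1. Configure the Java environment

2. Specify root as the startup user

3. Startup

3.1 If the cluster is being started for the first time

3.2 Start HDFS on the hadoop1 node

3.3 Start YARN on hadoop2, the node configured as ResourceManager

3.4 View the HDFS NameNode

3.5 View the YARN ResourceManager

4. Testing

4.1 Test

4.2 File storage path

4.3 Count the words in the text

V. Hadoop scripts

1. Startup script hadoop.sh

2. Process-listing script jpsall.sh

3. Copy to the other servers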


I. Preparation

| | hadoop1 | hadoop2 | hadoop3 |
|------|-------------------|-----------------------------|----------------------------|
| IP | 192.168.139.176 | 192.168.139.214 | 192.168.139.215 |
| HDFS | NameNode DataNode | DataNode | SecondaryNameNode DataNode |
| YARN | NodeManager | ResourceManager NodeManager | NodeManager |

1. Install the JDK (run on every node)

bash
tar -zxf jdk-8u431-linux-x64.tar.gz
mv jdk1.8.0_431 /usr/local/java

# Create the environment file under /etc/profile.d
vim /etc/profile.d/java_env.sh

# Add the following environment variables
# java
JAVA_HOME=/usr/local/java
JRE_HOME=/usr/local/java/jre
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME JRE_HOME CLASSPATH

# Reload the profile
source /etc/profile
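
After reloading the profile it is worth confirming that JAVA_HOME really points at a JDK. The following is a minimal sketch; the function name `check_java_home` is hypothetical and not part of the JDK or Hadoop.

```shell
# check_java_home [DIR]: succeed only if DIR (default: $JAVA_HOME) looks like a JDK.
# Hypothetical helper: it only checks that bin/java exists and is executable.
check_java_home() {
    dir="${1:-${JAVA_HOME:-}}"
    if [ -n "$dir" ] && [ -x "$dir/bin/java" ]; then
        echo "OK: $dir/bin/java found"
    else
        echo "ERROR: no executable java under '$dir'" >&2
        return 1
    fi
}
```

For example, `check_java_home /usr/local/java && java -version` should print the JDK version on each node.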

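2. Update the host configuration (run on every node)

bash
vim /etc/hosts

192.168.139.176 hadoop1
192.168.139.214 hadoop2
192.168.139.215 hadoop3

# Set the hostname (use the matching name on each node)
vim /etc/hostname
hadoop1

Note: also update the hosts file on your local machine, because the web UIs below are addressed by hostname. If you skip this, use the IP addresses instead.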
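
A quick way to confirm that a hosts file contains all three nodes is sketched below. The helper name `missing_hosts` is hypothetical; it only looks at hostname fields on non-comment lines.

```shell
# missing_hosts FILE HOST...: print every HOST that has no entry in FILE.
# Hypothetical helper: compares HOST against fields 2..N of each non-comment line.
missing_hosts() {
    file="$1"
    shift
    for h in "$@"; do
        awk -v h="$h" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$file" || echo "$h"
    done
}
```

Running `missing_hosts /etc/hosts hadoop1 hadoop2 hadoop3` prints nothing when every node is listed.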

3. Configure passwordless SSH login (run on every node)

bash
# Generate a key pair
ssh-keygen -t rsa

# Copy the public key to every node (including this one)
ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3
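
Once the keys are copied, the login can be checked without typing a password. This is a sketch under one assumption: the `SSH_BIN` variable and both function names are made up here so the check can be dry-run; `BatchMode` and `ConnectTimeout` are standard OpenSSH options.

```shell
# SSH_BIN is an assumption of this sketch: override it to dry-run the check.
SSH_BIN="${SSH_BIN:-ssh}"

# can_ssh HOST: succeed if HOST accepts key-based login without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_ssh() {
    "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# check_cluster_ssh HOST...: report every host, return non-zero if any fails.
check_cluster_ssh() {
    rc=0
    for h in "$@"; do
        if can_ssh "$h"; then
            echo "$h: ok"
        else
            echo "$h: passwordless login failed"
            rc=1
        fi
    done
    return $rc
}
```

Run `check_cluster_ssh hadoop1 hadoop2 hadoop3` on each node; the start scripts later rely on this login working in every direction they use.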

II. Install Hadoop (run on every node)

bash
tar -zxf hadoop-3.4.0.tar.gz
mv hadoop-3.4.0 /usr/local/

# Create the environment file under /etc/profile.d
vim /etc/profile.d/hadoop_env.sh

# Add the following
# hadoop
export HADOOP_HOME=/usr/local/hadoop-3.4.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

# Reload the profile, then check the version
source /etc/profile
hadoop version

III. Cluster startup configuration (run on every node)

Edit the following files in the **/usr/local/hadoop-3.4.0/etc/hadoop** directory.

1. core-site.xml

XML
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <!-- Address of the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop1:8020</value>
    </property>

    <!-- Directory where Hadoop stores its data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-3.4.0/data</value>
    </property>

    <!-- Static user for the HDFS web UI is root; create a dedicated user in production -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>

</configuration>

2. hdfs-site.xml

XML
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop1:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop3:9868</value>
    </property>

</configuration>

3. yarn-site.xml

XML
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- Use the MapReduce shuffle auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Host of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop2</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- URL of the log server: the JobHistory web address configured in mapred-site.xml -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop1:19888/jobhistory/logs</value>
    </property>
    <!-- Keep aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
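
The retention value is expressed in seconds, so 7 days becomes 7 x 24 x 60 x 60. A small sketch of that arithmetic, with a hypothetical helper name, in case a different retention period is wanted:

```shell
# days_to_seconds N: convert a retention period in days to seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7    # prints 604800
```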

4. mapred-site.xml

XML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>
    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>
</configuration>
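
Because the same files have to be edited on every node, it helps to read a value back and compare it across machines. The sketch below is a deliberately naive parser: `get_site_value` is a hypothetical helper that assumes each `<name>` and `<value>` tag sits on its own line, as in the files above. On a working installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolved.

```shell
# get_site_value FILE NAME: print the <value> that follows <name>NAME</name>.
# Naive by design: assumes one tag per line and no nested markup.
get_site_value() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { want = 1; next }
        want && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}
```

For example, `get_site_value /usr/local/hadoop-3.4.0/etc/hadoop/core-site.xml fs.defaultFS` should print the same address on all three nodes.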

5. workers

hadoop1
hadoop2
hadoop3

Note: lines in this file must not end with trailing spaces, and the file must not contain blank lines.
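
The two rules for the workers file are easy to break by accident in an editor, so a check like the following can be run after saving. `check_workers` is a hypothetical helper, not a Hadoop command.

```shell
# check_workers FILE: fail if FILE has blank lines or trailing whitespace.
# Offending lines are printed with their line numbers.
check_workers() {
    if grep -nE '^[[:space:]]*$|[[:space:]]+$' "$1"; then
        echo "ERROR: blank line or trailing whitespace in $1" >&2
        return 1
    fi
    echo "OK: $1"
}
```

Usage: `check_workers /usr/local/hadoop-3.4.0/etc/hadoop/workers`.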

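IV. Start and test the cluster (run on every node)

1. Configure the Java environment

bash
# Edit /usr/local/hadoop-3.4.0/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/local/java

2. Specify root as the startup user

bash
# Add the following to start-dfs.sh and stop-dfs.sh (under sbin), above the function definitions

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

# Add the following to start-yarn.sh and stop-yarn.sh, above the function definitions
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

Note: by default Hadoop does not allow its daemons to be started as root. In production, create a dedicated group and user and grant that user root privileges.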
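
An alternative that leaves the sbin scripts untouched is to export the same user variables from hadoop-env.sh, which the Hadoop 3 launch scripts source. This is a sketch under that assumption, not the method used in this article; verify it on your own version before relying on it.

```shell
# Candidate additions to /usr/local/hadoop-3.4.0/etc/hadoop/hadoop-env.sh
# (assumption: used instead of editing sbin/start-*.sh and sbin/stop-*.sh).
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```

Keeping the settings in one configuration file also means they survive a reinstall of the scripts and are copied along with the rest of etc/hadoop.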

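3. Startup

3.1 If the cluster is being started for the first time

Format the NameNode on the hadoop1 node.

Note: formatting the NameNode generates a new cluster ID. If the DataNodes still hold the old ID, the NameNode and DataNode IDs no longer match and the cluster cannot find its existing data. If the cluster reports errors while running and the NameNode has to be formatted again, first stop the NameNode and DataNode processes, delete the data and logs directories on every machine, and only then format.

bash
hdfs namenode -format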
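
The cluster ID mentioned above is recorded as a `clusterID=` line in the VERSION files that the NameNode and each DataNode keep under their storage directories. The sketch below compares two such files; both function names are hypothetical, and the example paths assume the `hadoop.tmp.dir` configured earlier together with the default `dfs/name` and `dfs/data` subdirectories.

```shell
# cluster_id VERSION_FILE: print the clusterID recorded in a Hadoop VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# same_cluster NN_VERSION DN_VERSION: succeed if both files carry the same clusterID.
same_cluster() {
    nn=$(cluster_id "$1")
    dn=$(cluster_id "$2")
    [ -n "$nn" ] && [ "$nn" = "$dn" ]
}
```

On hadoop1, which runs both roles, `same_cluster /usr/local/hadoop-3.4.0/data/dfs/name/current/VERSION /usr/local/hadoop-3.4.0/data/dfs/data/current/VERSION` failing after a reformat is the mismatch described in the note.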

3.2 Start HDFS on the hadoop1 node

bash
/usr/local/hadoop-3.4.0/sbin/start-dfs.sh

3.3 Start YARN on hadoop2, the node configured as ResourceManager

bash
/usr/local/hadoop-3.4.0/sbin/start-yarn.sh
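
After both start scripts have run, each node should show the daemons assigned to it in the role table at the top of this article. A sketch that compares `jps` output against that table follows; both function names are hypothetical.

```shell
# expected_daemons HOST: print the daemons the role table assigns to HOST, one per line.
expected_daemons() {
    case "$1" in
        hadoop1) printf '%s\n' NameNode DataNode NodeManager ;;
        hadoop2) printf '%s\n' DataNode ResourceManager NodeManager ;;
        hadoop3) printf '%s\n' SecondaryNameNode DataNode NodeManager ;;
        *) return 1 ;;
    esac
}

# missing_daemons HOST: read `jps` output on stdin, print expected daemons not found in it.
# jps prints "<pid> <name>", so the second field is matched exactly.
missing_daemons() {
    jps_out=$(cat)
    for d in $(expected_daemons "$1"); do
        echo "$jps_out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' || echo "$d"
    done
}
```

On hadoop1, for instance, `jps | missing_daemons hadoop1` prints nothing when everything is up.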

3.4 View the HDFS NameNode

http://192.168.139.176:9870/

3.5 View the YARN ResourceManager

http://192.168.139.214:8088

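4. Testing

4.1 Test

bash
# Create a directory in HDFS
hadoop fs -mkdir /input

# Create a local file with some sample text (an empty file would give an empty word count)
echo "hello hadoop hello world" > text.txt

# Upload the file
hadoop fs -put text.txt /input

# Remove any previous output directory (the job fails if /output already exists)
hadoop fs -rm -r -f /output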

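4.2 File storage path

On a DataNode the uploaded blocks are stored under a path like the one below. The BP-... block pool directory name is specific to each cluster.

bash
/usr/local/hadoop-3.4.0/data/dfs/data/current/BP-511066843-192.168.139.176-1734965488199/current/finalized/subdir0/subdir0

4.3 Count the words in the text

bash
hadoop jar /usr/local/hadoop-3.4.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount /input /output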
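
To sanity-check the job's result, the same counts can be computed locally with standard tools and compared with the job output, which with a single reducer is typically `/output/part-r-00000`. The helper name `local_wordcount` is hypothetical; like the example job, it splits on whitespace and prints `word<TAB>count` sorted by word.

```shell
# local_wordcount: read text on stdin, print "word<TAB>count" sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}
```

Compare `local_wordcount < text.txt` with `hadoop fs -cat /output/part-r-00000`.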

V. Hadoop scripts

1. Startup script hadoop.sh

bash
#!/bin/bash

# Require exactly one action: start or stop
if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit 1
fi

case $1 in
"start")
        echo " =================== Starting the Hadoop cluster ==================="

        echo " --------------- Starting hdfs ---------------"
        ssh hadoop1 "/usr/local/hadoop-3.4.0/sbin/start-dfs.sh"
        echo " --------------- Starting yarn ---------------"
        ssh hadoop2 "/usr/local/hadoop-3.4.0/sbin/start-yarn.sh"
        echo " --------------- Starting historyserver ---------------"
        ssh hadoop1 "/usr/local/hadoop-3.4.0/bin/mapred --daemon start historyserver"
;;
"stop")
        echo " =================== Stopping the Hadoop cluster ==================="

        echo " --------------- Stopping historyserver ---------------"
        ssh hadoop1 "/usr/local/hadoop-3.4.0/bin/mapred --daemon stop historyserver"
        echo " --------------- Stopping yarn ---------------"
        ssh hadoop2 "/usr/local/hadoop-3.4.0/sbin/stop-yarn.sh"
        echo " --------------- Stopping hdfs ---------------"
        ssh hadoop1 "/usr/local/hadoop-3.4.0/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
    exit 1
;;
esac

bash
# Make the script executable
chmod +x hadoop.sh

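2. Process-listing script jpsall.sh

bash
#!/bin/bash

for host in hadoop1 hadoop2 hadoop3
do
        echo "=============== $host ==============="
        # Full path to jps: a non-interactive ssh session may not load /etc/profile.d
        ssh "$host" /usr/local/java/bin/jps
done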

3. Copy to the other servers

bash
# Run on hadoop1 from the directory that contains the two scripts
scp hadoop.sh jpsall.sh root@hadoop2:/usr/local/hadoop-3.4.0/

scp hadoop.sh jpsall.sh root@hadoop3:/usr/local/hadoop-3.4.0/
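
With more than two target nodes, the copy step reads more easily as a loop. The following is a sketch: `push_scripts` and the `SCP_BIN` override are hypothetical names introduced here so the loop can be dry-run, and it assumes it is started from the directory that holds hadoop.sh and jpsall.sh.

```shell
# SCP_BIN is an assumption of this sketch: set it to "echo" to preview the commands.
SCP_BIN="${SCP_BIN:-scp}"

# push_scripts DEST_DIR HOST...: copy hadoop.sh and jpsall.sh to DEST_DIR on each HOST.
# Stops at the first host that fails.
push_scripts() {
    dest="$1"
    shift
    for h in "$@"; do
        "$SCP_BIN" hadoop.sh jpsall.sh "root@$h:$dest/" || return 1
    done
}
```

Usage: `push_scripts /usr/local/hadoop-3.4.0 hadoop2 hadoop3`.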