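Hadoop Cluster Deployment Guide

Prerequisites

  • Three virtual machines running CentOS 7, with the hostnames node1, node2, and node3; the root password on each is root
  • All three VMs must already have the preliminary setup done: JDK installed, passwordless SSH configured, the firewall disabled, and hostname mappings in place

Add the following entries to /etc/hosts on all three VMs (these are also the VMs' IP addresses):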

192.168.88.131 node1
192.168.88.132 node2
192.168.88.133 node3
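
Since every node needs the same three mappings, a small helper can add them without creating duplicates when it is run again. This is a minimal sketch and not part of the original steps: the function names and the HOSTS_FILE override variable are hypothetical, introduced so the logic can be tried on a copy of the file first.

```shell
# Append "ip hostname" to the hosts file unless the hostname is already mapped.
# HOSTS_FILE is a hypothetical override; it defaults to the real /etc/hosts.
add_host_entry() {
  local ip="$1" name="$2" file="${HOSTS_FILE:-/etc/hosts}"
  # Look for the hostname as a whole whitespace-delimited field
  if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Add the three mappings used throughout this guide.
add_cluster_hosts() {
  add_host_entry 192.168.88.131 node1
  add_host_entry 192.168.88.132 node2
  add_host_entry 192.168.88.133 node3
}
```

Running `HOSTS_FILE=/tmp/hosts.copy add_cluster_hosts` writes to a scratch file; running `add_cluster_hosts` as root edits /etc/hosts itself.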

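In VMware, allocate:

  1. 4 GB of memory or more to node1
  2. 2 GB of memory or more to each of node2 and node3

Big data software is designed to run as a cluster, that is, across a group of servers working together.

Because we are simulating the cluster with several virtual machines on a single computer, expect considerable memory pressure on the host.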
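
To confirm from inside a VM that the memory setting took effect, the total can be read from /proc/meminfo. The sketch below assumes a Linux guest; the function names and the optional file argument are hypothetical and exist only to make the check easy to try.

```shell
# Print total memory in MiB, read from a meminfo-style file.
# The file argument is a hypothetical testing hook; it defaults to /proc/meminfo.
mem_total_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed if total memory is at least the given number of MiB.
has_min_memory() {
  local required_mib="$1" file="${2:-/proc/meminfo}"
  [ "$(mem_total_mib "$file")" -ge "$required_mib" ]
}
```

MemTotal is usually somewhat lower than the size configured in VMware because the kernel reserves part of it, so a check such as `has_min_memory 3500` on node1 leaves a margin below 4096.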

Roles are assigned as follows:

  1. node1: NameNode, DataNode, ResourceManager, NodeManager, HistoryServer, WebProxyServer, QuorumPeerMain
  2. node2: DataNode, NodeManager, QuorumPeerMain
  3. node3: DataNode, NodeManager, QuorumPeerMain
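
The role table can be written down as a lookup and compared against jps output once the cluster is running. This is a sketch under a few assumptions: the function names are hypothetical, jps is assumed to list the history server as JobHistoryServer and the web proxy as WebAppProxyServer, and QuorumPeerMain is left out because it is a ZooKeeper process rather than a Hadoop daemon.

```shell
# Print the Hadoop daemons expected on a node, one per line.
expected_daemons() {
  case "$1" in
    node1) printf '%s\n' NameNode DataNode ResourceManager NodeManager JobHistoryServer WebAppProxyServer ;;
    node2|node3) printf '%s\n' DataNode NodeManager ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Read jps output on stdin and print every expected daemon that is missing.
missing_daemons() {
  local node="$1" jps_output daemon
  jps_output="$(cat)"
  for daemon in $(expected_daemons "$node"); do
    # jps prints "<pid> <MainClass>"; compare the second field exactly
    printf '%s\n' "$jps_output" \
      | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
      || printf '%s\n' "$daemon"
  done
}
```

For example, `jps | missing_daemons node1` prints nothing when every expected daemon on node1 is up.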

Hadoop Cluster Deployment

  1. Download the Hadoop package, extract it, and create a symlink

    # 1. Download
    wget http://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
    
    # 2. Extract
    # Make sure the directory /export/server exists
    tar -zxvf hadoop-3.3.0.tar.gz -C /export/server/
    
    # 3. Create the symlink
    ln -s /export/server/hadoop-3.3.0 /export/server/hadoop
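  2. Edit the configuration file: hadoop-env.sh

    Hadoop has many configuration files to edit, so work through them carefully.

    cd into /export/server/hadoop/etc/hadoop; all the configuration files are in this directory.

    Edit hadoop-env.sh.

    This file sets the environment variables that Hadoop uses.

    Variables set here apply only to Hadoop's own processes while they run.

    To make them available in every login shell, add them to /etc/profile as well.

    # Add at the top of the file:
    # Java installation path
    export JAVA_HOME=/export/server/jdk
    # Hadoop installation path
    export HADOOP_HOME=/export/server/hadoop
    # Path to the HDFS configuration files
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    # Path to the YARN configuration files
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    # YARN log directory
    export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
    # HDFS log directory
    export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
    
    # Users that the Hadoop daemons are started as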
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
    export YARN_PROXYSERVER_USER=root
  3. Edit the configuration file: core-site.xml

    Clear the file and replace its contents with the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:8020</value>
        <description></description>
      </property>
    
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
        <description></description>
      </property>
    </configuration>
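
After editing a site file, it helps to read a value back and check for typos. The helper below is a rough sketch using awk; the function name is hypothetical, and it assumes each `<name>` and `<value>` element sits on its own line, as in the file above. On a working installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolves.

```shell
# Print the <value> of a named property from a Hadoop *-site.xml file.
# Assumes <name> and <value> are each on their own line.
get_site_property() {
  local file="$1" key="$2"
  awk -v key="$key" '
    index($0, "<name>" key "</name>") { in_prop = 1; next }
    in_prop && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
      print; exit
    }
    /<\/property>/ { in_prop = 0 }
  ' "$file"
}
```

For example, `get_site_property /export/server/hadoop/etc/hadoop/core-site.xml fs.defaultFS` should print the NameNode address configured above.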
  4. Configure hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>dfs.datanode.data.dir.perm</name>
            <value>700</value>
        </property>
    
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/nn</value>
        <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
      </property>
    
      <property>
        <name>dfs.namenode.hosts</name>
        <value>node1,node2,node3</value>
        <description>List of permitted DataNodes.</description>
      </property>
    
      <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
        <description></description>
      </property>
    
    
      <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
        <description></description>
      </property>
    
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/dn</value>
      </property>
    </configuration>
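
The dfs.blocksize value above is given in bytes. As a quick worked example, the sketch below converts it to MiB and estimates how many blocks a file of a given size occupies; the function names are hypothetical.

```shell
BLOCK_SIZE=268435456   # dfs.blocksize from hdfs-site.xml, in bytes

# Convert a byte count to whole MiB.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Number of HDFS blocks a file of the given byte size occupies (ceiling division).
blocks_for_file() {
  local size="$1"
  echo $(( (size + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}
```

`bytes_to_mib "$BLOCK_SIZE"` gives 256, so each block holds up to 256 MiB, and a 1 GiB file (`blocks_for_file 1073741824`) is split into 4 blocks.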
  5. Configure mapred-env.sh

    # Add the following environment variables at the top of the file
    export JAVA_HOME=/export/server/jdk
    export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
    export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
  6. Configure mapred-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description></description>
      </property>
    
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node1:10020</value>
        <description></description>
      </property>
    
    
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node1:19888</value>
        <description></description>
      </property>
    
    
      <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/data/mr-history/tmp</value>
        <description></description>
      </property>
    
    
      <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/data/mr-history/done</value>
        <description></description>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
      </property>
    
      <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
      </property>
    
      <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
      </property>
    </configuration>
  7. Configure yarn-env.sh

    # Add the following environment variables at the top of the file
    export JAVA_HOME=/export/server/jdk
    export HADOOP_HOME=/export/server/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
    export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
  8. Configure yarn-site.xml

    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://node1:19888/jobhistory/logs</value>
        <description></description>
    </property>
    
      <property>
        <name>yarn.web-proxy.address</name>
        <value>node1:8089</value>
        <description>proxy server hostname and port</description>
      </property>
    
    
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
        <description>Configuration to enable or disable log aggregation</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
        <description>HDFS directory that aggregated application logs are written to.</description>
      </property>
    
    
    <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
        <description></description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
        <description></description>
      </property>
    
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data/nm-local</value>
        <description>Comma-separated list of paths on the local filesystem where intermediate data is written.</description>
      </property>
    
    
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data/nm-log</value>
        <description>Comma-separated list of paths on the local filesystem where logs are written.</description>
      </property>
    
    
      <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
        <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled.</description>
      </property>
    
    
    
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that needs to be set for Map Reduce applications.</description>
      </property>
    </configuration>
  9. Edit the workers file

    # The complete contents of the file are as follows
    node1
    node2
    node3
  10. Distribute Hadoop to the other machines

    # Run on node1
    cd /export/server
    
    scp -r hadoop-3.3.0 node2:$(pwd)/
    scp -r hadoop-3.3.0 node3:$(pwd)/
  11. Run on node2 and node3

    # Create the symlink
    ln -s /export/server/hadoop-3.3.0 /export/server/hadoop
  12. Create the required directories

    • On node1:

      mkdir -p /data/nn
      mkdir -p /data/dn
      mkdir -p /data/nm-log
      mkdir -p /data/nm-local
    • On node2:

      mkdir -p /data/dn
      mkdir -p /data/nm-log
      mkdir -p /data/nm-local
    • On node3:

      mkdir -p /data/dn
      mkdir -p /data/nm-log
      mkdir -p /data/nm-local
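
The three directory lists differ only in /data/nn, which is needed on node1 alone because that is where the NameNode runs. A single helper can cover all nodes; the sketch below uses hypothetical function names and an optional path prefix that exists purely for dry runs.

```shell
# Print the local directories a node needs, one per line.
# Only node1 runs the NameNode, so only node1 gets /data/nn.
dirs_for_node() {
  if [ "$1" = "node1" ]; then
    echo /data/nn
  fi
  printf '%s\n' /data/dn /data/nm-log /data/nm-local
}

# Create the directories for a node under an optional prefix.
# The prefix is a hypothetical dry-run aid; leave it empty on the real machines.
create_node_dirs() {
  local node="$1" prefix="${2:-}" dir
  for dir in $(dirs_for_node "$node"); do
    mkdir -p "${prefix}${dir}"
  done
}
```

On each machine, `create_node_dirs "$(hostname)"` would create the right set, assuming hostname returns node1, node2, or node3.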
  13. Configure environment variables

    On node1, node2, and node3, append the following to /etc/profile:

    export HADOOP_HOME=/export/server/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

    Run source /etc/profile to apply the change.

  14. Format the NameNode (run on node1)

    hdfs namenode -format

    The hdfs command is a program in $HADOOP_HOME/bin.

    Because that directory was added to PATH, the command can be run from any location.

  15. Start the HDFS cluster (running this on node1 is enough)

    start-dfs.sh
    
    # To stop it, run
    stop-dfs.sh

    start-dfs.sh is a script in $HADOOP_HOME/sbin.

    Because that directory was added to PATH, the script can be run from any location.

  16. Start the YARN cluster (running this on node1 is enough)

    start-yarn.sh
    
    # To stop it, run
    stop-yarn.sh
  17. Start the history server

    mapred --daemon start historyserver
    
    # To stop it, replace start with stop
  18. Start the web proxy server

    yarn --daemon start proxyserver
    
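    # To stop it, replace start with stop

Verifying the Hadoop Cluster

  1. On node1, node2, and node3, run jps to confirm that all processes started successfully

  2. Verify HDFS by opening http://node1:9870 in a browser

    Create a file named test.txt with any content, then run:

    hadoop fs -put test.txt /test.txt
    
    hadoop fs -cat /test.txt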
  3. Verify YARN by opening http://node1:8088 in a browser

    Run:

    # Create words.txt with the following content
    cat > words.txt <<'EOF'
    example osc hadoop
    osc hadoop hadoop
    osc hadoop
    EOF
    
    # Upload the file to HDFS
    hadoop fs -put words.txt /words.txt
    
    # Run the following job to check that YARN works
    hadoop jar /export/server/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount -Dmapred.job.queue.name=root.root /words.txt /output

If the job appears in the web UI and finishes without errors, the cluster has been deployed successfully!
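
Beyond checking the web UI, the job's result can be compared with a word count computed locally. The helper below is a sketch with a hypothetical function name; it prints one word per line followed by a tab and its count, sorted by word, which is the layout the wordcount example writes.

```shell
# Count whitespace-separated words in a local file.
# Output: "word<TAB>count", sorted by word in byte order.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}
```

For the words.txt used above, `local_wordcount words.txt` yields example 1, hadoop 4, and osc 3. The job's own result can be read with `hadoop fs -cat /output/part-r-00000`, assuming it ran with a single reducer, whose output file is conventionally named part-r-00000.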
