Ubuntu 26.04: Setting Up a Fully Distributed Hadoop 3.5.0 Cluster

lihaozhe01 lihaozhe02 lihaozhe03
192.168.10.101 192.168.10.102 192.168.10.103
zookeeper zookeeper zookeeper
namenode namenode
resource manager resource manager
journalnode journalnode journalnode
datanode datanode datanode
nodemanager nodemanager nodemanager
job history server
job log job log job log

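1. Preparation

Node.js, Python, Scala, and Maven are installed for other clusters and for the development environment. If you only need the Hadoop cluster, you can skip them.

1.1 Upgrade the operating system and packages

1.1.1 Back up the apt source configuration file
sudo mv /etc/apt/sources.list.d/ubuntu.sources /etc/apt/sources.list.d/ubuntu.sources.bak
1.1.2 Write the apt source configuration file
sudo vim /etc/apt/sources.list.d/ubuntu.sources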

Contents:

Types: deb
URIs: https://mirrors.aliyun.com/ubuntu
Suites: resolute resolute-updates resolute-backports
Components: main universe restricted multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg

Types: deb
URIs: https://mirrors.aliyun.com/ubuntu
Suites: resolute-security
Components: main universe restricted multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
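
The two stanzas above differ only in their Suites line, so they could be generated from one template. A minimal sketch, assuming a hypothetical helper named render_ubuntu_sources (it is not part of apt) and the mirror and codename used above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a deb822 sources file for a mirror and a release codename.
render_ubuntu_sources() {
  local mirror="$1" codename="$2"
  local suites
  # First stanza: release, updates and backports. Second stanza: security.
  for suites in "$codename $codename-updates $codename-backports" "$codename-security"; do
    printf 'Types: deb\n'
    printf 'URIs: %s\n' "$mirror"
    printf 'Suites: %s\n' "$suites"
    printf 'Components: main universe restricted multiverse\n'
    printf 'Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg\n\n'
  done
}

# Preview only; nothing is written to /etc.
render_ubuntu_sources https://mirrors.aliyun.com/ubuntu resolute
```

Piping the output through sudo tee /etc/apt/sources.list.d/ubuntu.sources would write the file instead of previewing it.
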
1.1.3 Refresh the apt package index
sudo apt update
1.1.4 Upgrade packages
sudo apt -y dist-upgrade

1.2 Set the time zone

sudo timedatectl set-timezone Asia/Shanghai

1.3 Set the hostname

Run the matching command on each node:

sudo hostnamectl set-hostname lihaozhe01
sudo hostnamectl set-hostname lihaozhe02
sudo hostnamectl set-hostname lihaozhe03

1.4 Set a static IP address

lihaozhe01

sudo tee /etc/netplan/00-installer-config.yaml > /dev/null << 'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    ens32:
      dhcp4: no
      addresses:
        - 192.168.10.101/24
      routes:
        - to: default
          via: 192.168.10.2
      nameservers:
        addresses:
          - 192.168.10.2
EOF
sudo chmod 600 /etc/netplan/00-installer-config.yaml
sudo netplan apply

lihaozhe02

sudo tee /etc/netplan/00-installer-config.yaml > /dev/null << 'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    ens32:
      dhcp4: no
      addresses:
        - 192.168.10.102/24
      routes:
        - to: default
          via: 192.168.10.2
      nameservers:
        addresses:
          - 192.168.10.2
EOF
sudo chmod 600 /etc/netplan/00-installer-config.yaml
sudo netplan apply

lihaozhe03

sudo tee /etc/netplan/00-installer-config.yaml > /dev/null << 'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    ens32:
      dhcp4: no
      addresses:
        - 192.168.10.103/24
      routes:
        - to: default
          via: 192.168.10.2
      nameservers:
        addresses:
          - 192.168.10.2
EOF
sudo chmod 600 /etc/netplan/00-installer-config.yaml
sudo netplan apply
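
The three netplan files differ only in the last octet of the address, so a small generator avoids copy-and-paste mistakes. A minimal sketch, assuming a hypothetical function named render_netplan; the interface name ens32 and the gateway 192.168.10.2 are copied from the files above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the netplan YAML for one node.
# Argument: last octet of the node address (101, 102 or 103).
render_netplan() {
  local octet="$1"
  cat <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    ens32:
      dhcp4: no
      addresses:
        - 192.168.10.${octet}/24
      routes:
        - to: default
          via: 192.168.10.2
      nameservers:
        addresses:
          - 192.168.10.2
EOF
}

# Preview the file for lihaozhe02 without touching /etc/netplan.
render_netplan 102
```

To apply it, the output could be piped to sudo tee /etc/netplan/00-installer-config.yaml, followed by the same chmod and netplan apply steps.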

1.5 Disable the firewall

Ubuntu manages its firewall with ufw rather than firewalld, and it does not enable SELinux by default, so disabling ufw is sufficient:

sudo ufw disable
sudo ufw status

1.6 Edit the hosts file

sudo vim /etc/hosts

Add the following entries:

192.168.10.101	lihaozhe01
192.168.10.102	lihaozhe02
192.168.10.103	lihaozhe03
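
Editing /etc/hosts by hand on three machines invites duplicates or omissions. A minimal sketch of an idempotent alternative, assuming a hypothetical function named missing_host_entries that only reports which of the three entries a hosts file still lacks:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cluster entries that a hosts file does not contain yet.
# Argument: path of the hosts file to inspect (defaults to /etc/hosts).
missing_host_entries() {
  local hosts_file="${1:-/etc/hosts}"
  local entry name
  for entry in "192.168.10.101 lihaozhe01" "192.168.10.102 lihaozhe02" "192.168.10.103 lihaozhe03"; do
    name="${entry#* }"   # strip the address, keep the hostname
    # Match the hostname as a whole whitespace-delimited field on a non-comment line.
    if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$hosts_file"; then
      printf '%s\n' "$entry"
    fi
  done
}

# Read-only check against the live file.
missing_host_entries /etc/hosts
```

Running missing_host_entries | sudo tee -a /etc/hosts would append only the entries that are missing, so the command is safe to repeat.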

1.7 Install the software and configure environment variables

Create the download directory
mkdir -p ~/opt
Install Python
  1. Install Python

    sudo apt -y install python3 python3-venv python3-pip
  2. Configure a pip mirror in mainland China

    pip config set global.index-url https://mirrors.aliyun.com/pypi/simple
Install Node.js
  1. Download

    wget -P ~/opt https://nodejs.org/dist/v24.18.0/node-v24.18.0-linux-x64.tar.xz
  2. Extract and rename the directory

    tar -xvf ~/opt/node-v24.18.0-linux-x64.tar.xz -C ~/opt
    mv ~/opt/node-v24.18.0-linux-x64 ~/opt/node-v24
  3. Configure environment variables by appending to ~/.profile:

    export NODE_HOME=$HOME/opt/node-v24
    export PATH=$PATH:$NODE_HOME/bin
  4. Load the environment variables

    source ~/.profile
  5. Set the npm registry mirror

    npm config set registry https://registry.npmmirror.com
  6. Switch registries quickly with npx nrm

    npx nrm use taobao
  7. Globally install or upgrade npm, pnpm, cnpm, and yarn

    npm install -g npm
    npm install -g pnpm
    npm install -g cnpm
    npm config set allow-scripts=yarn --location=user
    npm install -g yarn
  8. Set the registry mirror for pnpm and yarn

    pnpm config set registry https://registry.npmmirror.com
    yarn config set registry https://registry.npmmirror.com
Install Java
  1. Download

    wget -P ~/opt https://download.oracle.com/java/25/latest/jdk-25_linux-x64_bin.tar.gz
  2. Extract and rename the directory

    tar -zxvf ~/opt/jdk-25_linux-x64_bin.tar.gz -C ~/opt
    # The extracted directory name depends on the current patch release; adjust it if yours differs
    mv ~/opt/jdk-25.0.3 ~/opt/jdk-25
  3. Configure environment variables by appending to ~/.profile:

    export JAVA_HOME=$HOME/opt/jdk-25
    export PATH=$PATH:$JAVA_HOME/bin
  4. Load the environment variables

    source ~/.profile
Install Maven
  1. Download

    wget -P ~/opt https://dlcdn.apache.org/maven/maven-3/3.9.16/binaries/apache-maven-3.9.16-bin.tar.gz
  2. Extract and rename the directory

    tar -zxvf ~/opt/apache-maven-3.9.16-bin.tar.gz -C ~/opt
    mv ~/opt/apache-maven-3.9.16 ~/opt/maven
  3. Configure environment variables by appending to ~/.profile:

    export MAVEN_HOME=$HOME/opt/maven
    export PATH=$PATH:$MAVEN_HOME/bin
  4. Edit the configuration file

    $HOME/opt/maven/conf/settings.xml

    <?xml version="1.0" encoding="UTF-8"?>
    
    <!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
    -->
    
    <!--
     | This is the configuration file for Maven. It can be specified at two levels:
     |
     |  1. User Level. This settings.xml file provides configuration for a single user,
     |                 and is normally provided in ${user.home}/.m2/settings.xml.
     |
     |                 NOTE: This location can be overridden with the CLI option:
     |
     |                 -s /path/to/user/settings.xml
     |
     |  2. Global Level. This settings.xml file provides configuration for all Maven
     |                 users on a machine (assuming they're all using the same Maven
     |                 installation). It's normally provided in
     |                 ${maven.conf}/settings.xml.
     |
     |                 NOTE: This location can be overridden with the CLI option:
     |
     |                 -gs /path/to/global/settings.xml
     |
     | The sections in this sample file are intended to give you a running start at
     | getting the most out of your Maven installation. Where appropriate, the default
     | values (values used when the setting is not specified) are provided.
     |
     |-->
    <settings xmlns="http://maven.apache.org/SETTINGS/1.2.0"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.2.0 https://maven.apache.org/xsd/settings-1.2.0.xsd">
      <!-- localRepository
       | The path to the local repository maven will use to store artifacts.
       |
       | Default: ${user.home}/.m2/repository
      <localRepository>/path/to/local/repo</localRepository>
      -->
    
      <!-- interactiveMode
       | This will determine whether maven prompts you when it needs input. If set to false,
       | maven will use a sensible default value, perhaps based on some other setting, for
       | the parameter in question.
       |
       | Default: true
      <interactiveMode>true</interactiveMode>
      -->
    
      <!-- offline
       | Determines whether maven should attempt to connect to the network when executing a build.
       | This will have an effect on artifact downloads, artifact deployment, and others.
       |
       | Default: false
      <offline>false</offline>
      -->
    
      <!-- pluginGroups
       | This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.
       | when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers
       | "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.
       |-->
      <pluginGroups>
        <!-- pluginGroup
         | Specifies a further group identifier to use for plugin lookup.
        <pluginGroup>com.your.plugins</pluginGroup>
        -->
      </pluginGroups>
    
      <!-- TODO Since when can proxies be selected as depicted? -->
      <!-- proxies
       | This is a list of proxies which can be used on this machine to connect to the network.
       | Unless otherwise specified (by system property or command-line switch), the first proxy
       | specification in this list marked as active will be used.
       |-->
      <proxies>
        <!-- proxy
         | Specification for one proxy, to be used in connecting to the network.
         |
        <proxy>
          <id>optional</id>
          <active>true</active>
          <protocol>http</protocol>
          <username>proxyuser</username>
          <password>proxypass</password>
          <host>proxy.host.net</host>
          <port>80</port>
          <nonProxyHosts>local.net|some.host.com</nonProxyHosts>
        </proxy>
        -->
      </proxies>
    
      <!-- servers
       | This is a list of authentication profiles, keyed by the server-id used within the system.
       | Authentication profiles can be used whenever maven must make a connection to a remote server.
       |-->
      <servers>
        <!-- server
         | Specifies the authentication information to use when connecting to a particular server, identified by
         | a unique name within the system (referred to by the 'id' attribute below).
         |
         | NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are
         |       used together.
         |
        <server>
          <id>deploymentRepo</id>
          <username>repouser</username>
          <password>repopwd</password>
        </server>
        -->
    
        <!-- Another sample, using keys to authenticate.
        <server>
          <id>siteServer</id>
          <privateKey>/path/to/private/key</privateKey>
          <passphrase>optional; leave empty if not used.</passphrase>
        </server>
        -->
      </servers>
    
      <!-- mirrors
       | This is a list of mirrors to be used in downloading artifacts from remote repositories.
       |
       | It works like this: a POM may declare a repository to use in resolving certain artifacts.
       | However, this repository may have problems with heavy traffic at times, so people have mirrored
       | it to several places.
       |
       | That repository definition will have a unique id, so we can create a mirror reference for that
       | repository, to be used as an alternate download site. The mirror site will be the preferred
       | server for that repository.
       |-->
      <mirrors>
        <!-- mirror
         | Specifies a repository mirror site to use instead of a given repository. The repository that
         | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
         | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
         |
        <mirror>
          <id>mirrorId</id>
          <mirrorOf>repositoryId</mirrorOf>
          <name>Human Readable Name for this Mirror.</name>
          <url>http://my.repository.com/repo/path</url>
        </mirror>
         -->
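        <!-- Aliyun mirror of Maven Central -->
        <mirror>
          <!-- Unique id of the mirror, used internally by Maven; any value works as long as it is not duplicated -->
          <id>aliyunmaven</id>
          <!-- Redirect every request for Maven's built-in "central" repository (repo1.maven.org) to Aliyun -->
          <mirrorOf>central</mirrorOf>
          <!-- Human-readable name, shown in listings and error messages -->
          <name>Aliyun public repository</name>
          <!-- Mirror URL, written as plain text inside the element -->
          <url>https://maven.aliyun.com/repository/public</url>
        </mirror>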
      </mirrors>
    
      <!-- profiles
       | This is a list of profiles which can be activated in a variety of ways, and which can modify
       | the build process. Profiles provided in the settings.xml are intended to provide local machine-
       | specific paths and repository locations which allow the build to work in the local environment.
       |
       | For example, if you have an integration testing plugin - like cactus - that needs to know where
       | your Tomcat instance is installed, you can provide a variable here such that the variable is
       | dereferenced during the build process to configure the cactus plugin.
       |
       | As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles
       | section of this document (settings.xml) - will be discussed later. Another way essentially
       | relies on the detection of a property, either matching a particular value for the property,
       | or merely testing its existence. Profiles can also be activated by JDK version prefix, where a
       | value of '1.4' might activate a profile when the build is executed on a JDK version of '1.4.2_07'.
       | Finally, the list of active profiles can be specified directly from the command line.
       |
       | NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact
       |       repositories, plugin repositories, and free-form properties to be used as configuration
       |       variables for plugins in the POM.
       |
       |-->
      <profiles>
        <!-- profile
         | Specifies a set of introductions to the build process, to be activated using one or more of the
         | mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>
         | or the command line, profiles have to have an ID that is unique.
         |
         | An encouraged best practice for profile identification is to use a consistent naming convention
         | for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 'user-brett', etc.
         | This will make it more intuitive to understand what the set of introduced profiles is attempting
         | to accomplish, particularly when you only have a list of profile id's for debug.
         |
         | This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.
        <profile>
          <id>jdk-1.4</id>
    
          <activation>
            <jdk>1.4</jdk>
          </activation>
    
          <repositories>
            <repository>
              <id>jdk14</id>
              <name>Repository for JDK 1.4 builds</name>
              <url>http://www.myhost.com/maven/jdk14</url>
              <layout>default</layout>
              <snapshotPolicy>always</snapshotPolicy>
            </repository>
          </repositories>
        </profile>
        -->
    
        <!--
         | Here is another profile, activated by the property 'target-env' with a value of 'dev', which
         | provides a specific path to the Tomcat instance. To use this, your plugin configuration might
         | hypothetically look like:
         |
         | ...
         | <plugin>
         |   <groupId>org.myco.myplugins</groupId>
         |   <artifactId>myplugin</artifactId>
         |
         |   <configuration>
         |     <tomcatLocation>${tomcatPath}</tomcatLocation>
         |   </configuration>
         | </plugin>
         | ...
         |
         | NOTE: If you just wanted to inject this configuration whenever someone set 'target-env' to
         |       anything, you could just leave off the <value/> inside the activation-property.
         |
        <profile>
          <id>env-dev</id>
    
          <activation>
            <property>
              <name>target-env</name>
              <value>dev</value>
            </property>
          </activation>
    
          <properties>
            <tomcatPath>/path/to/tomcat/instance</tomcatPath>
          </properties>
        </profile>
        -->
        <profile>     
          <id>jdk-25</id>   
          <activation>        
            <activeByDefault>true</activeByDefault>    
            <jdk>25</jdk>      
          </activation>  
          <properties>  
            <maven.compiler.source>25</maven.compiler.source> 
            <maven.compiler.target>25</maven.compiler.target> 
            <maven.compiler.compilerVersion>25</maven.compiler.compilerVersion>   
            <maven.compiler.encoding>utf-8</maven.compiler.encoding>
            <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>utf-8</project.reporting.outputEncoding>
            <maven.test.failure.ignore>true</maven.test.failure.ignore>
            <maven.test.skip>true</maven.test.skip>
          </properties>
        </profile>
        <profile>     
          <id>jdk-21</id>   
          <activation>        
            <activeByDefault>true</activeByDefault>    
            <jdk>21</jdk>      
          </activation>  
          <properties>  
            <maven.compiler.source>21</maven.compiler.source> 
            <maven.compiler.target>21</maven.compiler.target> 
            <maven.compiler.compilerVersion>21</maven.compiler.compilerVersion>   
            <maven.compiler.encoding>utf-8</maven.compiler.encoding>
            <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>utf-8</project.reporting.outputEncoding>
            <maven.test.failure.ignore>true</maven.test.failure.ignore>
            <maven.test.skip>true</maven.test.skip>
          </properties>
        </profile>
        <profile>     
          <id>jdk-17</id>   
          <activation>        
            <activeByDefault>true</activeByDefault>    
            <jdk>17</jdk>      
          </activation>  
          <properties>  
            <maven.compiler.source>17</maven.compiler.source> 
            <maven.compiler.target>17</maven.compiler.target> 
            <maven.compiler.compilerVersion>17</maven.compiler.compilerVersion>   
            <maven.compiler.encoding>utf-8</maven.compiler.encoding>
            <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>utf-8</project.reporting.outputEncoding>
            <maven.test.failure.ignore>true</maven.test.failure.ignore>
            <maven.test.skip>true</maven.test.skip>
          </properties>
        </profile>
        <profile>     
          <id>jdk-8</id>   
          <activation>        
            <activeByDefault>true</activeByDefault>    
            <jdk>8</jdk>      
          </activation>  
          <properties>  
            <maven.compiler.source>8</maven.compiler.source> 
            <maven.compiler.target>8</maven.compiler.target> 
            <maven.compiler.compilerVersion>8</maven.compiler.compilerVersion>   
            <maven.compiler.encoding>utf-8</maven.compiler.encoding>
            <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>utf-8</project.reporting.outputEncoding>
            <maven.test.failure.ignore>true</maven.test.failure.ignore>
            <maven.test.skip>true</maven.test.skip>
          </properties>
        </profile>
      </profiles>
    
      <!-- activeProfiles
       | List of profiles that are active for all builds.
       |
      <activeProfiles>
        <activeProfile>alwaysActiveProfile</activeProfile>
        <activeProfile>anotherAlwaysActiveProfile</activeProfile>
      </activeProfiles>
      -->
    </settings>
  5. Copy the settings to the current user's Maven configuration

    mkdir -p $HOME/.m2
    cp -v $HOME/opt/maven/conf/settings.xml $HOME/.m2
  6. Load the environment variables

    source ~/.profile
Install Scala
  1. Download

    wget -P ~/opt https://github.com/scala/scala/releases/download/v2.13.18/scala-2.13.18.tgz
  2. Extract and rename the directory

    tar -zxvf ~/opt/scala-2.13.18.tgz -C ~/opt
    mv ~/opt/scala-2.13.18 ~/opt/scala-2
  3. Configure environment variables by appending to ~/.profile:

    export SCALA_HOME=$HOME/opt/scala-2
    export PATH=$PATH:$SCALA_HOME/bin
  4. Load the environment variables

    source ~/.profile
Install ZooKeeper
  1. Download

    wget -P ~/opt https://dlcdn.apache.org/zookeeper/zookeeper-3.9.5/apache-zookeeper-3.9.5-bin.tar.gz
  2. Extract and rename the directory

    tar -zxvf ~/opt/apache-zookeeper-3.9.5-bin.tar.gz -C ~/opt
    mv ~/opt/apache-zookeeper-3.9.5-bin ~/opt/zookeeper-3
  3. Configure environment variables by appending to ~/.profile:

    export ZOOKEEPER_HOME=$HOME/opt/zookeeper-3
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
  4. Load the environment variables

    source ~/.profile
Install Hadoop
  1. Download

    wget -P ~/opt https://dlcdn.apache.org/hadoop/common/hadoop-3.5.0/hadoop-3.5.0.tar.gz
  2. Extract and rename the directory

    tar -zxvf ~/opt/hadoop-3.5.0.tar.gz -C ~/opt
    mv ~/opt/hadoop-3.5.0 ~/opt/hadoop-3
  3. Configure environment variables by appending to ~/.profile:

    export HDFS_NAMENODE_USER=lhz
    export HDFS_SECONDARYNAMENODE_USER=lhz
    export HDFS_DATANODE_USER=lhz
    export HDFS_ZKFC_USER=lhz
    export HDFS_JOURNALNODE_USER=lhz
    export HADOOP_SHELL_EXECNAME=lhz
    
    export YARN_RESOURCEMANAGER_USER=lhz
    export YARN_NODEMANAGER_USER=lhz
    
    export HADOOP_HOME=$HOME/opt/hadoop-3
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
    
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  4. Load the environment variables

    source ~/.profile

1.8 Complete environment variables

export NODE_HOME=$HOME/opt/node-v24

export JAVA_HOME=$HOME/opt/jdk-25

export MAVEN_HOME=$HOME/opt/maven

export SCALA_HOME=$HOME/opt/scala-2

export ZOOKEEPER_HOME=$HOME/opt/zookeeper-3

export HDFS_NAMENODE_USER=lhz
export HDFS_SECONDARYNAMENODE_USER=lhz
export HDFS_DATANODE_USER=lhz
export HDFS_ZKFC_USER=lhz
export HDFS_JOURNALNODE_USER=lhz
export HADOOP_SHELL_EXECNAME=lhz

export YARN_RESOURCEMANAGER_USER=lhz
export YARN_NODEMANAGER_USER=lhz

export HADOOP_HOME=$HOME/opt/hadoop-3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

export PATH=$PATH:$NODE_HOME/bin:$JAVA_HOME/bin:$MAVEN_HOME/bin:$SCALA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
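
After sourcing ~/.profile, it can help to confirm that every *_HOME variable points at a directory that exists, because a mismatch between a mv target and an exported path otherwise only shows up later as a "command not found" error. A minimal sketch, assuming a hypothetical function named check_home_dirs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every named variable that is unset or points at a missing directory.
# Prints one line per problem and returns non-zero if any problem was found.
check_home_dirs() {
  local var dir status=0
  for var in "$@"; do
    dir="${!var:-}"   # indirect expansion: the value of the variable whose name is in $var
    if [ -z "$dir" ]; then
      printf '%s is not set\n' "$var"
      status=1
    elif [ ! -d "$dir" ]; then
      printf '%s points to a missing directory: %s\n' "$var" "$dir"
      status=1
    fi
  done
  return "$status"
}

# Example call; "|| true" keeps a failed check from aborting the calling shell.
check_home_dirs NODE_HOME JAVA_HOME MAVEN_HOME SCALA_HOME ZOOKEEPER_HOME HADOOP_HOME || true
```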

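2. ZooKeeper

2.1 Edit the configuration file

cd $ZOOKEEPER_HOME/conf
vim zoo.cfg

Annotated version (not recommended for use):

# Basic time unit (tick): 2000 ms = 2 s
tickTime=2000
# Time limit for followers to connect to and sync with the leader at startup: 10 ticks = 20 s
initLimit=10
# Time limit for a request/response exchange during normal sync: 5 ticks = 10 s
syncLimit=5
# Where in-memory snapshots are stored
dataDir=/home/lhz/data/zookeeper-3/data
# Where transaction logs are stored
dataLogDir=/home/lhz/data/zookeeper-3/datalog
# Client port of this ZooKeeper node
clientPort=2181
# Maximum concurrent connections from one client (identified by IP) to one node; default 60
maxClientCnxns=1000
# Keep 7 snapshot files in dataDir; default 3
autopurge.snapRetainCount=7
# Purge interval in hours; 0 (the default) disables automatic purging
autopurge.purgeInterval=1
# Minimum session timeout a client may negotiate; default 2 ticks
minSessionTimeout=4000
# Maximum session timeout a client may negotiate; default 20 ticks (40 s with this tickTime)
maxSessionTimeout=300000
# ZooKeeper 3.5 and later start the AdminServer by default on port 8080; move it to 9001
admin.serverPort=9001
# Cluster members
server.1=lihaozhe01:2888:3888
server.2=lihaozhe02:2888:3888
server.3=lihaozhe03:2888:3888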

Clean version (recommended):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/lhz/data/zookeeper-3/data
dataLogDir=/home/lhz/data/zookeeper-3/datalog
clientPort=2181
maxClientCnxns=1000
autopurge.snapRetainCount=7
autopurge.purgeInterval=1
minSessionTimeout=4000
maxSessionTimeout=300000
admin.serverPort=9001
server.1=lihaozhe01:2888:3888
server.2=lihaozhe02:2888:3888
server.3=lihaozhe03:2888:3888

2.2 After saving, create the directories named in the configuration

Run on every server:

mkdir -p /home/lhz/data/zookeeper-3/data /home/lhz/data/zookeeper-3/datalog

2.3 myid

lihaozhe01

echo 1 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid

lihaozhe02

echo 2 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid

lihaozhe03

echo 3 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid
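
Each myid must match the N of that host's server.N line in zoo.cfg, so the value can be looked up rather than typed. A minimal sketch, assuming a hypothetical function named myid_for_host and short hostnames without dots, as used in this cluster:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N of the "server.N=host:port:port" line that names a host.
# Arguments: path to zoo.cfg, hostname to look up.
# Exits non-zero when the host is not listed.
myid_for_host() {
  local cfg="$1" host="$2"
  # Splitting on ".", "=" and ":" turns "server.2=lihaozhe02:2888:3888"
  # into the fields: server, 2, lihaozhe02, 2888, 3888.
  awk -F'[.=:]' -v host="$host" '
    $1 == "server" && $3 == host { print $2; found = 1; exit }
    END { exit !found }
  ' "$cfg"
}
```

On a node, myid_for_host "$ZOOKEEPER_HOME/conf/zoo.cfg" "$(hostname)" > /home/lhz/data/zookeeper-3/data/myid would then write the right value on every machine with the same command.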

2.4 Create a systemd unit so ZooKeeper starts at boot

Create a unit file named zookeeper.service under /etc/systemd/system/.

Note: do this on every server.

cd /etc/systemd/system
sudo vim zookeeper.service

Contents:

[Unit]
Description=zookeeper
After=syslog.target network.target

[Service]
Type=forking
Environment=ZOO_LOG_DIR=/home/lhz/data/zookeeper-3/datalog
Environment=JAVA_HOME=/home/lhz/opt/jdk-25
ExecStart=/home/lhz/opt/zookeeper-3/bin/zkServer.sh start
ExecStop=/home/lhz/opt/zookeeper-3/bin/zkServer.sh stop
Restart=always
User=lhz
Group=lhz

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
# Run the following commands only after every host has been configured
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
sudo systemctl status zookeeper
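
systemctl status only shows that the process is running; whether the node has joined the ensemble is reported by zkServer.sh status in a line beginning with "Mode:". A minimal sketch for pulling that role out of the output, assuming a hypothetical filter named zk_mode:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "zkServer.sh status" output on stdin and print only the role
# (for example leader or follower). Prints nothing when no "Mode:" line is present.
zk_mode() {
  sed -n 's/^Mode:[[:space:]]*//p'
}
```

Typical use on a node would be zkServer.sh status 2>/dev/null | zk_mode. A healthy three-node ensemble reports one leader and two followers.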

3. Hadoop

Edit the configuration files

cd  $HADOOP_HOME/etc/hadoop
  • hadoop-env.sh
  • core-site.xml
  • hdfs-site.xml
  • workers
  • mapred-site.xml
  • yarn-site.xml

Append to the end of hadoop-env.sh

# =========================================================================== #
#  User-defined environment variables
# =========================================================================== #

# --- Java ---
export JAVA_HOME=/home/lhz/opt/jdk-25
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

# --- General ---
export HADOOP_LOG_DIR=/home/lhz/data/hadoop/logs

# --- Users that run the HDFS daemons (the start scripts require these when launched as root) ---
export HDFS_NAMENODE_USER=lhz
export HDFS_SECONDARYNAMENODE_USER=lhz
export HDFS_DATANODE_USER=lhz
export HDFS_ZKFC_USER=lhz
export HDFS_JOURNALNODE_USER=lhz
export HADOOP_SHELL_EXECNAME=lhz

# --- Users that run the YARN daemons ---
export YARN_RESOURCEMANAGER_USER=lhz
export YARN_NODEMANAGER_USER=lhz
core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- ======== File system ======== -->

  <!-- Default file system: the logical HA nameservice from dfs.nameservices in hdfs-site.xml.
       A logical nameservice URI must not carry a port; the NameNode RPC addresses are set per NameNode in hdfs-site.xml. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://lihaozhe</value>
  </property>

  <!-- ======== Local directories ======== -->

  <!-- Base directory for Hadoop's local data; the default NameNode and DataNode storage paths live under it -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/lhz/data/hadoop</value>
  </property>

  <!-- ZooKeeper ensemble used by the failover controllers for automatic failover -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>lihaozhe01:2181,lihaozhe02:2181,lihaozhe03:2181</value>
  </property>

  <!-- ======== Security and permissions ======== -->

  <!-- Disable HDFS permission checks (a simplification for test environments) -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>

  <!-- Static web UI user: the user name that requests to the HDFS/MapReduce web pages are treated as -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>lhz</value>
  </property>

  <!-- ======== Proxy user (needed for Hive doAs) ======== -->

  <!-- Hosts from which user lhz may impersonate other users: * means any host -->
  <property>
    <name>hadoop.proxyuser.lhz.hosts</name>
    <value>*</value>
  </property>

  <!-- Groups whose members user lhz may impersonate: * means all groups -->
  <property>
    <name>hadoop.proxyuser.lhz.groups</name>
    <value>*</value>
  </property>

  <!-- Users that user lhz may impersonate: * means all users -->
  <property>
    <name>hadoop.proxyuser.lhz.users</name>
    <value>*</value>
  </property>
</configuration>
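
Since the same files have to be identical on all three nodes, a quick way to read back a single property is useful when comparing them. A minimal sketch, assuming a hypothetical function named site_prop; it is a line-based filter rather than a real XML parser and relies on each name and value element sitting on its own line, as in the files in this guide:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of a named property from a Hadoop *-site.xml file.
# Arguments: path to the XML file, property name.
# Prints nothing when the property is absent.
site_prop() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /<name>/ {
      # Remember the most recent property name.
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == key {
      # The first value line after the matching name belongs to that property.
      value = $0
      sub(/.*<value>/, "", value)
      sub(/<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$file"
}
```

For example, site_prop "$HADOOP_CONF_DIR/core-site.xml" ha.zookeeper.quorum prints the quorum string. On a node where Hadoop is installed, hdfs getconf -confKey ha.zookeeper.quorum shows the value Hadoop itself resolves.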

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>lihaozhe</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.lihaozhe</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.lihaozhe.nn1</name>
    <value>lihaozhe01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.lihaozhe.nn2</name>
    <value>lihaozhe02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.lihaozhe.nn1</name>
    <value>lihaozhe01:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.lihaozhe.nn2</name>
    <value>lihaozhe02:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://lihaozhe01:8485;lihaozhe02:8485;lihaozhe03:8485/lihaozhe</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.lihaozhe</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/lhz/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/lhz/data/hadoop/journalnode/data</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.safemode.threshold.pct</name>
    <value>1</value>
  </property>
</configuration>
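
The per-NameNode property names in this file follow a fixed pattern: every id listed in dfs.ha.namenodes.<nameservice> needs its own rpc-address and http-address entry. A minimal sketch that spells out the expected names, assuming a hypothetical function named ha_namenode_keys:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the per-NameNode property names an HA nameservice needs.
# Arguments: nameservice id, comma-separated NameNode ids.
ha_namenode_keys() {
  local ns="$1" ids="$2" id
  local IFS=','   # split the id list on commas for the loop below
  for id in $ids; do
    printf 'dfs.namenode.rpc-address.%s.%s\n' "$ns" "$id"
    printf 'dfs.namenode.http-address.%s.%s\n' "$ns" "$id"
  done
}

# Names expected for the nameservice and NameNode ids configured above.
ha_namenode_keys lihaozhe nn1,nn2
```

Comparing this list with the names actually present in hdfs-site.xml catches a misspelled nameservice or NameNode id before the NameNodes are started.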

workers

lihaozhe01
lihaozhe02
lihaozhe03
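
The workers file is also a convenient host list for your own maintenance commands. A minimal sketch, assuming a hypothetical function named for_each_worker and passwordless SSH between the nodes; the RUNNER variable is an invention of this sketch so that the loop can be tried with echo before it touches any host:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per host listed in a workers file.
# Arguments: path to the workers file, then the command and its arguments.
# RUNNER is the program that receives "<host> <command...>"; it defaults to ssh.
# Blank lines and lines starting with # are skipped.
for_each_worker() {
  local workers_file="$1"
  shift
  local host
  # "|| [ -n "$host" ]" also handles a last line that has no trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in
      ''|'#'*) continue ;;
    esac
    # Redirect stdin so the runner (e.g. ssh) cannot swallow the rest of the host list.
    "${RUNNER:-ssh}" "$host" "$@" < /dev/null
  done < "$workers_file"
}
```

A dry run such as RUNNER=echo for_each_worker "$HADOOP_CONF_DIR/workers" jps prints the host and command pairs without connecting anywhere; dropping RUNNER=echo runs the command over ssh.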

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- ======== 执行框架 ======== -->

  <!-- 执行引擎: 使用 YARN 作为资源管理器 -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <!-- MR 应用 classpath: YARN 容器加载 MR 相关 JAR 的路径 -->
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>

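  <!-- ======== Task memory ======== -->

  <!-- Memory requested for each Map task container (MB); this is the container size, not the JVM heap, and must not exceed yarn.scheduler.maximum-allocation-mb -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>

  <!-- Memory requested for each Reduce task container (MB) -->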
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>

  <!-- ======== JobHistory service ======== -->

  <!-- JobHistory Server RPC listen address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>lihaozhe01:10020</value>
  </property>

  <!-- JobHistory Web UI address (open http://lihaozhe01:19888 in a browser) -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>lihaozhe01:19888</value>
  </property>
</configuration>

yarn-site.xml

xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- ======== YARN daemon settings ======== -->

  <!-- Absolute path of the Java executable used by YARN containers -->
  <property>
    <name>yarn.nodemanager.java.path</name>
    <value>/home/lhz/opt/jdk-25/bin/java</value>
  </property>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>lihaozhe01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>lihaozhe02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>lihaozhe01:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>lihaozhe02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>lihaozhe01:2181,lihaozhe02:2181,lihaozhe03:2181</value>
  </property>

  <!-- ======== NodeManager environment variables ======== -->

  <!-- Environment whitelist: variables that containers may inherit from the NodeManager -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>

  <!-- Shuffle auxiliary service: required for the MapReduce shuffle of intermediate data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <!-- ======== MapReduce container environment variables (MapReduce properties, usually kept in mapred-site.xml) ======== -->

  <!-- Environment variables for the ApplicationMaster -->
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>JAVA_HOME=/home/lhz/opt/jdk-25,HADOOP_MAPRED_HOME=/home/lhz/opt/hadoop-3</value>
  </property>

  <!-- ApplicationMaster JVM options (-Xmx) -->
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx1024m</value>
  </property>

  <!-- Environment variables for Map task containers -->
  <property>
    <name>mapreduce.map.env</name>
    <value>JAVA_HOME=/home/lhz/opt/jdk-25,HADOOP_MAPRED_HOME=/home/lhz/opt/hadoop-3</value>
  </property>

  <!-- Environment variables for Reduce task containers -->
  <property>
    <name>mapreduce.reduce.env</name>
    <value>JAVA_HOME=/home/lhz/opt/jdk-25,HADOOP_MAPRED_HOME=/home/lhz/opt/hadoop-3</value>
  </property>

  <!-- ======== Resource scheduling (memory) ======== -->

  <!-- Total physical memory the NodeManager may hand out to containers (MB) -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>

  <!-- Smallest memory allocation the scheduler grants to a single container (MB) -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>

  <!-- Largest memory allocation a single container may request (MB) -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>

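  <!-- ======== Memory checks (disabled for a test environment) ======== -->

  <!-- Disable the physical-memory limit check so YARN does not kill containers during testing -->
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>

  <!-- Disable the virtual-memory limit check -->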
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>

  <!-- ======== Log aggregation ======== -->

  <!-- Enable log aggregation: container logs are uploaded to HDFS after the application finishes -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <!-- Log retention in seconds; 604800 = 7 days -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>

  <!-- JobHistory Server URL used to view aggregated logs; the host must match mapreduce.jobhistory.webapp.address -->
  <property>
    <name>yarn.log.server.url</name>
    <value>http://lihaozhe01:19888/jobhistory/logs</value>
  </property>
</configuration>
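
The memory values above interact: a 1024 MB map or reduce request is carved out of the 4096 MB each NodeManager offers. The following is a minimal sketch of that arithmetic, assuming requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb (the exact rounding depends on the scheduler in use). The function names normalize_request and containers_per_node are made up for this illustration and are not Hadoop commands.

```shell
#!/bin/sh
# Values copied from yarn-site.xml above (MB).
NM_MEMORY_MB=4096      # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB=1024      # yarn.scheduler.minimum-allocation-mb
MAX_ALLOC_MB=4096      # yarn.scheduler.maximum-allocation-mb

# Round a request up to the next multiple of the minimum allocation.
# ASSUMPTION: this models the common default behaviour, not every scheduler.
# Returns 1 (and prints nothing) if the result exceeds the maximum.
normalize_request() {
  req=$1
  size=$(( (req + MIN_ALLOC_MB - 1) / MIN_ALLOC_MB * MIN_ALLOC_MB ))
  if [ "$size" -lt "$MIN_ALLOC_MB" ]; then
    size=$MIN_ALLOC_MB
  fi
  if [ "$size" -gt "$MAX_ALLOC_MB" ]; then
    return 1
  fi
  echo "$size"
}

# How many containers of a given request size fit on one NodeManager.
containers_per_node() {
  size=$(normalize_request "$1") || return 1
  echo $(( NM_MEMORY_MB / size ))
}

containers_per_node 1024   # mapreduce.map.memory.mb from mapred-site.xml
normalize_request 1500     # a request that is not a multiple of the minimum
```

With the values in this guide each node has room for four 1024 MB containers, and one of them is taken by the ApplicationMaster while a job runs.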

4. Configure passwordless SSH login

Generate a local key pair and append the public key to the authorized_keys file on every node

bash
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
bash
ssh-copy-id lhz@lihaozhe01
bash
ssh-copy-id lhz@lihaozhe02
bash
ssh-copy-id lhz@lihaozhe03

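5. Distribute the keys

Copy the whole ~/.ssh directory, private key included, so that every node can log in to the others without a password. The sshfence method in hdfs-site.xml relies on /home/lhz/.ssh/id_rsa being present on both NameNodes.

bash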
scp -r ~/.ssh/ lhz@lihaozhe02:~/
scp -r ~/.ssh/ lhz@lihaozhe03:~/
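
Before moving on it is worth confirming that login really works without a password, because sshfence and the start scripts fail in confusing ways otherwise. A minimal sketch, assuming the three hostnames and the lhz user from this guide; ssh_check_cmd and check_all are illustrative helper names, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hosts taken from the workers file in this guide.
HOSTS="lihaozhe01 lihaozhe02 lihaozhe03"

# Build the ssh command that tests one host. BatchMode=yes makes ssh
# fail immediately instead of prompting for a password.
ssh_check_cmd() {
  echo "ssh -o BatchMode=yes -o ConnectTimeout=5 lhz@$1 true"
}

# Check every host; with DRY_RUN=1 (the default) only print the commands.
check_all() {
  for h in $HOSTS; do
    cmd=$(ssh_check_cmd "$h")
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"
    elif $cmd; then
      echo "$h: ok"
    else
      echo "$h: passwordless login FAILED"
    fi
  done
}

check_all
```

Run it with DRY_RUN=0 on each of the three nodes in turn, since the NameNodes must be able to reach each other and not only be reachable from lihaozhe01.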

6. Distribute and activate the environment variables

bash
scp -r ~/.profile lhz@lihaozhe02:~/
scp -r ~/.profile lhz@lihaozhe03:~/

Run the following command on every node

bash
source ~/.profile

7. Start ZooKeeper

7.1 myid

lihaozhe01

bash
echo 1 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid

lihaozhe02

bash
echo 2 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid

lihaozhe03

bash
echo 3 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid
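
Typing a different number on each node is easy to get wrong. Since the IDs here follow the numeric suffix of the hostname, the value can be derived instead. This is a sketch under the assumption that the server.N entries in zoo.cfg use the same numbering; myid_for_host is a hypothetical helper written for this example.

```shell
#!/bin/sh
# Derive the ZooKeeper myid from a hostname, e.g. lihaozhe02 -> 2.
myid_for_host() {
  # Drop everything before the first digit, then drop leading zeros.
  digits=$(printf '%s' "$1" | sed 's/^[^0-9]*//')
  id=$(printf '%s' "$digits" | sed 's/^0*//')
  case $id in
    ''|*[!0-9]*)
      echo "cannot derive myid from '$1'" >&2
      return 1
      ;;
  esac
  echo "$id"
}

myid_for_host lihaozhe01
```

On a node, `myid_for_host "$(hostname)" > /home/lhz/data/zookeeper-3/data/myid` would then write the same file as the manual echo commands.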

7.2 Start the service

Run the following commands on every node

bash
sudo systemctl daemon-reload
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
systemctl status zookeeper

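7.3 Verify

jps should list QuorumPeerMain on every node, and zkServer.sh status should report Mode: leader on one node and Mode: follower on the other two.

bash
jps
bash
zkServer.sh status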
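
The status check can be scripted by picking the Mode line out of the zkServer.sh status output. A small sketch; zk_mode and one_leader are illustrative names, and the sample input below stands in for real command output.

```shell
#!/bin/sh
# Extract the role (leader, follower, standalone) from
# `zkServer.sh status` output supplied on stdin.
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Given one role per line on stdin, succeed only if exactly one is leader.
one_leader() {
  [ "$(grep -c '^leader$')" = "1" ]
}

# Sample input standing in for the output on one node.
printf 'ZooKeeper JMX enabled by default\nMode: follower\n' | zk_mode
```

On a live node the call would be `zkServer.sh status | zk_mode`; collecting the three results and piping them to one_leader confirms the ensemble elected a single leader.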

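8. Hadoop initialization

1.	Start ZooKeeper on all three nodes (skip this if it is already running from step 7): zkServer.sh start
2.	Start the three JournalNodes:
	hdfs --daemon start journalnode (hadoop-daemon.sh start journalnode still works in Hadoop 3 but is deprecated)
3.	Format HDFS on one of the NameNodes: hdfs namenode -format
4.	Copy the freshly formatted metadata to the other NameNode:
    a)	Start the NameNode that was just formatted:
    	hdfs --daemon start namenode (or the deprecated hadoop-daemon.sh start namenode)
    b)	On the NameNode that was not formatted, run: hdfs namenode -bootstrapStandby
    c)	Start the second NameNode:
    	hdfs --daemon start namenode (or the deprecated hadoop-daemon.sh start namenode)
5.	On one of the NameNodes, initialize the HA state in ZooKeeper: hdfs zkfc -formatZK
6.	Stop the daemons started above: stop-dfs.sh
7.	Start everything: start-all.sh
8.	Start the ResourceManager nodes:
	yarn --daemon start resourcemanager (or the deprecated yarn-daemon.sh start resourcemanager), or start-yarn.sh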
http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.5.0.tar

Step 8 is not needed, because start-all.sh in step 7 already starts YARN.

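9. Start the history server

Run this on lihaozhe01, the host configured in mapreduce.jobhistory.address.

mapred --daemon start historyserver

Steps 10, 11 and 12 do not need to be run

10. Safe mode

hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave

11. List the NameNodes and query their state

hdfs getconf -namenodes
hdfs haadmin -getServiceState lihaozhe01

haadmin expects a NameNode ID from dfs.ha.namenodes.lihaozhe; substitute that ID if it is not lihaozhe01.

12. Force a state transition

hdfs haadmin -transitionToActive --forcemanual lihaozhe01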
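
To query every NameNode without hard-coding its ID, the IDs can be read back from the configuration. A sketch assuming the nameservice is lihaozhe, as in hdfs-site.xml; split_ids and nn_states are hypothetical helper names, and nn1,nn2 in the usage note are placeholder IDs.

```shell
#!/bin/sh
NAMESERVICE=lihaozhe   # nameservice name used in hdfs-site.xml

# Turn a comma-separated ID list into one ID per line, trimming spaces.
split_ids() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//' | grep -v '^$'
}

# Print "<id>: <state>" for every configured NameNode ID.
nn_states() {
  ids=$(hdfs getconf -confKey "dfs.ha.namenodes.$NAMESERVICE") || return 1
  for id in $(split_ids "$ids"); do
    echo "$id: $(hdfs haadmin -getServiceState "$id" 2>/dev/null)"
  done
}

# Only query the cluster when the hdfs command is available.
if command -v hdfs >/dev/null 2>&1; then
  nn_states
fi
```

For example, `split_ids "nn1,nn2"` prints the two IDs on separate lines. A healthy HA pair reports one active and one standby NameNode.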

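Important:

bash
# Before shutting down, stop the services in order
stop-yarn.sh
stop-dfs.sh
# After booting, start the services in order
start-dfs.sh
start-yarn.sh

Or

bash
# Stop the services before shutting down
stop-all.sh
# Start the services after booting
start-all.sh
bash
# After starting or stopping, check with jps that the processes are as expected before doing anything else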
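
The jps check can be automated by comparing its output with the daemons a node is supposed to run. A sketch only: missing_daemons is an illustrative helper, and the expected list for lihaozhe01 is inferred from the role table at the top of this guide plus DFSZKFailoverController, which runs on NameNode hosts when automatic failover is enabled.

```shell
#!/bin/sh
# Print every expected daemon that is absent from the jps output on stdin.
# jps prints "<pid> <MainClass>" on each line.
missing_daemons() {
  running=$(awk '{print $2}')
  for d in "$@"; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
  done
}

# ASSUMPTION: daemons expected on lihaozhe01 after start-all.sh.
EXPECTED_01="QuorumPeerMain NameNode DFSZKFailoverController JournalNode DataNode ResourceManager NodeManager JobHistoryServer"

# Only run against the live JVM list when jps is available.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons $EXPECTED_01
fi
```

Empty output means every expected process is up; any name printed is a daemon that still needs to be started or investigated.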