Building a Fully Distributed Hadoop 3.5.0 Cluster on Ubuntu 26.04
| lihaozhe01 | lihaozhe02 | lihaozhe03 |
|---|---|---|
| 192.168.10.101 | 192.168.10.102 | 192.168.10.103 |
| zookeeper | zookeeper | zookeeper |
| namenode | namenode | |
| resource manager | resource manager | |
| journalnode | journalnode | journalnode |
| datanode | datanode | datanode |
| nodemanager | nodemanager | nodemanager |
| job history server | | |
| job log | job log | job log |
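1. Preparation
Node.js, Python, Scala, and Maven are installed for other clusters and for the development environment; skip them if you only need the Hadoop cluster.
1.1 Upgrade the operating system and software
1.1.1 Back up the apt source configuration file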
bash
sudo mv /etc/apt/sources.list.d/ubuntu.sources /etc/apt/sources.list.d/ubuntu.sources.bak
1.1.2 Write the apt source configuration file
bash
sudo vim /etc/apt/sources.list.d/ubuntu.sources
Contents:
bash
Types: deb
URIs: https://mirrors.aliyun.com/ubuntu
Suites: resolute resolute-updates resolute-backports
Components: main universe restricted multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg

Types: deb
URIs: https://mirrors.aliyun.com/ubuntu
Suites: resolute-security
Components: main universe restricted multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
1.1.3 Update the apt package index
bash
sudo apt update
1.1.4 Upgrade the installed packages
bash
sudo apt -y dist-upgrade
1.2 Set the time zone
bash
sudo timedatectl set-timezone Asia/Shanghai
1.3 Set the hostname
Run the matching command on each node.
bash
sudo hostnamectl set-hostname lihaozhe01
bash
sudo hostnamectl set-hostname lihaozhe02
bash
sudo hostnamectl set-hostname lihaozhe03
1.4 Set the IP address
lihaozhe01
bash
sudo tee /etc/netplan/00-installer-config.yaml > /dev/null << 'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    ens32:
      dhcp4: no
      addresses:
        - 192.168.10.101/24
      routes:
        - to: default
          via: 192.168.10.2
      nameservers:
        addresses:
          - 192.168.10.2
EOF
bash
sudo chmod 600 /etc/netplan/00-installer-config.yaml
bash
sudo netplan apply
lihaozhe02
bash
sudo tee /etc/netplan/00-installer-config.yaml > /dev/null << 'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    ens32:
      dhcp4: no
      addresses:
        - 192.168.10.102/24
      routes:
        - to: default
          via: 192.168.10.2
      nameservers:
        addresses:
          - 192.168.10.2
EOF
bash
sudo chmod 600 /etc/netplan/00-installer-config.yaml
bash
sudo netplan apply
lihaozhe03
bash
sudo tee /etc/netplan/00-installer-config.yaml > /dev/null << 'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    ens32:
      dhcp4: no
      addresses:
        - 192.168.10.103/24
      routes:
        - to: default
          via: 192.168.10.2
      nameservers:
        addresses:
          - 192.168.10.2
EOF
bash
sudo chmod 600 /etc/netplan/00-installer-config.yaml
bash
sudo netplan apply
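The three netplan files differ only in the host address, so they can be rendered from one template. The following is a minimal sketch, not part of the original procedure: `render_netplan` is a hypothetical helper name, and the interface `ens32`, the gateway, and the DNS server are copied from the files above.

```shell
# Print a netplan file for one node; only the last octet of the address varies.
# render_netplan is a hypothetical helper introduced for illustration.
render_netplan() {
  host_octet="$1"   # e.g. 101 for lihaozhe01
  cat <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    ens32:
      dhcp4: no
      addresses:
        - 192.168.10.${host_octet}/24
      routes:
        - to: default
          via: 192.168.10.2
      nameservers:
        addresses:
          - 192.168.10.2
EOF
}

# Preview the file for lihaozhe02 without touching /etc/netplan.
render_netplan 102
```

On a node, the output could be piped to `sudo tee /etc/netplan/00-installer-config.yaml`, followed by the same `chmod 600` and `sudo netplan apply` steps.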
1.5 Disable the firewall
Ubuntu ships ufw rather than firewalld and does not enable SELinux by default, so disabling ufw is sufficient.
bash
sudo ufw disable
1.6 Edit the hosts file
bash
sudo vim /etc/hosts
Add the following entries:
bash
192.168.10.101 lihaozhe01
192.168.10.102 lihaozhe02
192.168.10.103 lihaozhe03
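Because the same three entries must exist on every node, it can help to add them idempotently instead of editing by hand. This is a sketch under stated assumptions: `add_host_entry` is a hypothetical helper, and the demo writes to a temporary file rather than to `/etc/hosts`.

```shell
# Append "IP hostname" to a hosts-style file only if the hostname is not mapped yet.
# add_host_entry is a hypothetical helper introduced for illustration.
add_host_entry() {
  hosts_file="$1"; ip="$2"; name="$3"
  if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  fi
}

# Demo against a temporary file.
demo_hosts="$(mktemp)"
for n in 1 2 3; do
  add_host_entry "$demo_hosts" "192.168.10.10${n}" "lihaozhe0${n}"
done
add_host_entry "$demo_hosts" 192.168.10.101 lihaozhe01   # repeated call adds nothing
cat "$demo_hosts"
rm -f "$demo_hosts"
```

Writing to the real `/etc/hosts` requires root privileges.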
1.7 Download the software and configure environment variables
Create the software download directory
bash
mkdir -p ~/opt
Install Python
- Install Python
bash
sudo apt -y install python3 python3-venv python3-pip
- Configure a pip mirror hosted in China
bash
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple
Install Node.js
- Download
bash
wget -P ~/opt https://nodejs.org/dist/v24.18.0/node-v24.18.0-linux-x64.tar.xz
- Extract and rename the directory
bash
tar -xvf ~/opt/node-v24.18.0-linux-x64.tar.xz -C ~/opt
mv ~/opt/node-v24.18.0-linux-x64 ~/opt/node-v24
- Configure the environment variables by appending to ~/.profile:
bash
export NODE_HOME=$HOME/opt/node-v24
export PATH=$PATH:$NODE_HOME/bin
- Apply the environment variables
bash
source ~/.profile
- Set an npm registry mirror hosted in China
bash
npm config set registry https://registry.npmmirror.com
- Switch registries quickly with npx nrm
bash
npx nrm use taobao
- Globally install or upgrade npm, pnpm, cnpm, and yarn
bash
npm install -g npm
npm install -g pnpm
npm install -g cnpm
bash
npm config set allow-scripts=yarn --location=user
npm install -g yarn
- Set a registry mirror hosted in China for pnpm and yarn
bash
pnpm config set registry https://registry.npmmirror.com
bash
yarn config set registry https://registry.npmmirror.com
Install Java
- Download
bash
wget -P ~/opt https://download.oracle.com/java/25/latest/jdk-25_linux-x64_bin.tar.gz
- Extract and rename the directory
bash
tar -zxvf ~/opt/jdk-25_linux-x64_bin.tar.gz -C ~/opt
mv ~/opt/jdk-25.0.3 ~/opt/jdk-25
- Configure the environment variables by appending to ~/.profile:
bash
export JAVA_HOME=$HOME/opt/jdk-25
export PATH=$PATH:$JAVA_HOME/bin
- Apply the environment variables
bash
source ~/.profile
Install Maven
- Download
bash
wget -P ~/opt https://dlcdn.apache.org/maven/maven-3/3.9.16/binaries/apache-maven-3.9.16-bin.tar.gz
- Extract and rename the directory
bash
tar -zxvf ~/opt/apache-maven-3.9.16-bin.tar.gz -C ~/opt
mv ~/opt/apache-maven-3.9.16 ~/opt/maven
- Configure the environment variables by appending to ~/.profile:
bash
export MAVEN_HOME=$HOME/opt/maven
export PATH=$PATH:$MAVEN_HOME/bin
- Edit the configuration file
$HOME/opt/maven/conf/settings.xml
xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!--
 | This is the configuration file for Maven. It can be specified at two levels:
 |
 |  1. User Level. This settings.xml file provides configuration for a single user,
 |                 and is normally provided in ${user.home}/.m2/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -s /path/to/user/settings.xml
 |
 |  2. Global Level. This settings.xml file provides configuration for all Maven
 |                 users on a machine (assuming they're all using the same Maven
 |                 installation). It's normally provided in
 |                 ${maven.conf}/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -gs /path/to/global/settings.xml
 |
 | The sections in this sample file are intended to give you a running start at
 | getting the most out of your Maven installation. Where appropriate, the default
 | values (values used when the setting is not specified) are provided.
 |
 |-->
<settings xmlns="http://maven.apache.org/SETTINGS/1.2.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.2.0 https://maven.apache.org/xsd/settings-1.2.0.xsd">
  <!-- localRepository
   | The path to the local repository maven will use to store artifacts.
   |
   | Default: ${user.home}/.m2/repository
  <localRepository>/path/to/local/repo</localRepository>
  -->
  <!-- interactiveMode
   | This will determine whether maven prompts you when it needs input. If set to false,
   | maven will use a sensible default value, perhaps based on some other setting, for
   | the parameter in question.
   |
   | Default: true
  <interactiveMode>true</interactiveMode>
  -->
  <!-- offline
   | Determines whether maven should attempt to connect to the network when executing a build.
   | This will have an effect on artifact downloads, artifact deployment, and others.
   |
   | Default: false
  <offline>false</offline>
  -->
  <!-- pluginGroups
   | This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.
   | when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers
   | "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.
   |-->
  <pluginGroups>
    <!-- pluginGroup
     | Specifies a further group identifier to use for plugin lookup.
    <pluginGroup>com.your.plugins</pluginGroup>
    -->
  </pluginGroups>
  <!-- TODO Since when can proxies be selected as depicted? -->
  <!-- proxies
   | This is a list of proxies which can be used on this machine to connect to the network.
   | Unless otherwise specified (by system property or command-line switch), the first proxy
   | specification in this list marked as active will be used.
   |-->
  <proxies>
    <!-- proxy
     | Specification for one proxy, to be used in connecting to the network.
     |
    <proxy>
      <id>optional</id>
      <active>true</active>
      <protocol>http</protocol>
      <username>proxyuser</username>
      <password>proxypass</password>
      <host>proxy.host.net</host>
      <port>80</port>
      <nonProxyHosts>local.net|some.host.com</nonProxyHosts>
    </proxy>
    -->
  </proxies>
  <!-- servers
   | This is a list of authentication profiles, keyed by the server-id used within the system.
   | Authentication profiles can be used whenever maven must make a connection to a remote server.
   |-->
  <servers>
    <!-- server
     | Specifies the authentication information to use when connecting to a particular server, identified by
     | a unique name within the system (referred to by the 'id' attribute below).
     |
     | NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are
     |       used together.
     |
    <server>
      <id>deploymentRepo</id>
      <username>repouser</username>
      <password>repopwd</password>
    </server>
    -->
    <!-- Another sample, using keys to authenticate.
    <server>
      <id>siteServer</id>
      <privateKey>/path/to/private/key</privateKey>
      <passphrase>optional; leave empty if not used.</passphrase>
    </server>
    -->
  </servers>
  <!-- mirrors
   | This is a list of mirrors to be used in downloading artifacts from remote repositories.
   |
   | It works like this: a POM may declare a repository to use in resolving certain artifacts.
   | However, this repository may have problems with heavy traffic at times, so people have mirrored
   | it to several places.
   |
   | That repository definition will have a unique id, so we can create a mirror reference for that
   | repository, to be used as an alternate download site. The mirror site will be the preferred
   | server for that repository.
   |-->
  <mirrors>
    <!-- mirror
     | Specifies a repository mirror site to use instead of a given repository. The repository that
     | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
     | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
     |
    <mirror>
      <id>mirrorId</id>
      <mirrorOf>repositoryId</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://my.repository.com/repo/path</url>
    </mirror>
    -->
    <!-- Aliyun mirror of the central repository -->
    <mirror>
      <!-- Unique mirror id used internally by Maven; any value works as long as it is not duplicated -->
      <id>aliyunmaven</id>
      <!-- Redirect Maven's built-in "central" repository (repo1.maven.org) to Aliyun -->
      <mirrorOf>central</mirrorOf>
      <!-- Human-readable name shown in listings and error messages -->
      <name>Aliyun public repository</name>
      <!-- Actual mirror URL; write it as plain text, without nested <url> tags -->
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
  <!-- profiles
   | This is a list of profiles which can be activated in a variety of ways, and which can modify
   | the build process. Profiles provided in the settings.xml are intended to provide local machine-
   | specific paths and repository locations which allow the build to work in the local environment.
   |
   | For example, if you have an integration testing plugin - like cactus - that needs to know where
   | your Tomcat instance is installed, you can provide a variable here such that the variable is
   | dereferenced during the build process to configure the cactus plugin.
   |
   | As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles
   | section of this document (settings.xml) - will be discussed later. Another way essentially
   | relies on the detection of a property, either matching a particular value for the property,
   | or merely testing its existence. Profiles can also be activated by JDK version prefix, where a
   | value of '1.4' might activate a profile when the build is executed on a JDK version of '1.4.2_07'.
   | Finally, the list of active profiles can be specified directly from the command line.
   |
   | NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact
   |       repositories, plugin repositories, and free-form properties to be used as configuration
   |       variables for plugins in the POM.
   |
   |-->
  <profiles>
    <!-- profile
     | Specifies a set of introductions to the build process, to be activated using one or more of the
     | mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>
     | or the command line, profiles have to have an ID that is unique.
     |
     | An encouraged best practice for profile identification is to use a consistent naming convention
     | for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 'user-brett', etc.
     | This will make it more intuitive to understand what the set of introduced profiles is attempting
     | to accomplish, particularly when you only have a list of profile id's for debug.
     |
     | This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.
    <profile>
      <id>jdk-1.4</id>
      <activation>
        <jdk>1.4</jdk>
      </activation>
      <repositories>
        <repository>
          <id>jdk14</id>
          <name>Repository for JDK 1.4 builds</name>
          <url>http://www.myhost.com/maven/jdk14</url>
          <layout>default</layout>
          <snapshotPolicy>always</snapshotPolicy>
        </repository>
      </repositories>
    </profile>
    -->
    <!--
     | Here is another profile, activated by the property 'target-env' with a value of 'dev', which
     | provides a specific path to the Tomcat instance. To use this, your plugin configuration might
     | hypothetically look like:
     |
     | ...
     | <plugin>
     |   <groupId>org.myco.myplugins</groupId>
     |   <artifactId>myplugin</artifactId>
     |
     |   <configuration>
     |     <tomcatLocation>${tomcatPath}</tomcatLocation>
     |   </configuration>
     | </plugin>
     | ...
     |
     | NOTE: If you just wanted to inject this configuration whenever someone set 'target-env' to
     |       anything, you could just leave off the <value/> inside the activation-property.
     |
    <profile>
      <id>env-dev</id>
      <activation>
        <property>
          <name>target-env</name>
          <value>dev</value>
        </property>
      </activation>
      <properties>
        <tomcatPath>/path/to/tomcat/instance</tomcatPath>
      </properties>
    </profile>
    -->
    <profile>
      <id>jdk-25</id>
      <activation>
        <activeByDefault>true</activeByDefault>
        <jdk>25</jdk>
      </activation>
      <properties>
        <maven.compiler.source>25</maven.compiler.source>
        <maven.compiler.target>25</maven.compiler.target>
        <maven.compiler.compilerVersion>25</maven.compiler.compilerVersion>
        <maven.compiler.encoding>utf-8</maven.compiler.encoding>
        <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>utf-8</project.reporting.outputEncoding>
        <maven.test.failure.ignore>true</maven.test.failure.ignore>
        <maven.test.skip>true</maven.test.skip>
      </properties>
    </profile>
    <profile>
      <id>jdk-21</id>
      <activation>
        <activeByDefault>true</activeByDefault>
        <jdk>21</jdk>
      </activation>
      <properties>
        <maven.compiler.source>21</maven.compiler.source>
        <maven.compiler.target>21</maven.compiler.target>
        <maven.compiler.compilerVersion>21</maven.compiler.compilerVersion>
        <maven.compiler.encoding>utf-8</maven.compiler.encoding>
        <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>utf-8</project.reporting.outputEncoding>
        <maven.test.failure.ignore>true</maven.test.failure.ignore>
        <maven.test.skip>true</maven.test.skip>
      </properties>
    </profile>
    <profile>
      <id>jdk-17</id>
      <activation>
        <activeByDefault>true</activeByDefault>
        <jdk>17</jdk>
      </activation>
      <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <maven.compiler.compilerVersion>17</maven.compiler.compilerVersion>
        <maven.compiler.encoding>utf-8</maven.compiler.encoding>
        <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>utf-8</project.reporting.outputEncoding>
        <maven.test.failure.ignore>true</maven.test.failure.ignore>
        <maven.test.skip>true</maven.test.skip>
      </properties>
    </profile>
    <profile>
      <id>jdk-8</id>
      <activation>
        <activeByDefault>true</activeByDefault>
        <jdk>8</jdk>
      </activation>
      <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <maven.compiler.compilerVersion>8</maven.compiler.compilerVersion>
        <maven.compiler.encoding>utf-8</maven.compiler.encoding>
        <project.build.sourceEncoding>utf-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>utf-8</project.reporting.outputEncoding>
        <maven.test.failure.ignore>true</maven.test.failure.ignore>
        <maven.test.skip>true</maven.test.skip>
      </properties>
    </profile>
  </profiles>
  <!-- activeProfiles
   | List of profiles that are active for all builds.
   |
  <activeProfiles>
    <activeProfile>alwaysActiveProfile</activeProfile>
    <activeProfile>anotherAlwaysActiveProfile</activeProfile>
  </activeProfiles>
  -->
</settings>
- Copy the settings to the current user's Maven configuration
bash
mkdir -p $HOME/.m2
bash
cp -v $HOME/opt/maven/conf/settings.xml $HOME/.m2
- Apply the environment variables
bash
source ~/.profile
Install Scala
- Download
bash
wget -P ~/opt https://github.com/scala/scala/releases/download/v2.13.18/scala-2.13.18.tgz
- Extract and rename the directory
bash
tar -zxvf ~/opt/scala-2.13.18.tgz -C ~/opt
mv ~/opt/scala-2.13.18 ~/opt/scala-2
- Configure the environment variables by appending to ~/.profile:
bash
export SCALA_HOME=$HOME/opt/scala-2
export PATH=$PATH:$SCALA_HOME/bin
- Apply the environment variables
bash
source ~/.profile
Install ZooKeeper
- Download
bash
wget -P ~/opt https://dlcdn.apache.org/zookeeper/zookeeper-3.9.5/apache-zookeeper-3.9.5-bin.tar.gz
- Extract and rename the directory (the -bin archive unpacks to a directory that keeps the -bin suffix)
bash
tar -zxvf ~/opt/apache-zookeeper-3.9.5-bin.tar.gz -C ~/opt
mv ~/opt/apache-zookeeper-3.9.5-bin ~/opt/zookeeper-3
- Configure the environment variables by appending to ~/.profile:
bash
export ZOOKEEPER_HOME=$HOME/opt/zookeeper-3
export PATH=$PATH:$ZOOKEEPER_HOME/bin
- Apply the environment variables
bash
source ~/.profile
Install Hadoop
- Download
bash
wget -P ~/opt https://dlcdn.apache.org/hadoop/common/hadoop-3.5.0/hadoop-3.5.0.tar.gz
- Extract and rename the directory
bash
tar -zxvf ~/opt/hadoop-3.5.0.tar.gz -C ~/opt
mv ~/opt/hadoop-3.5.0 ~/opt/hadoop-3
- Configure the environment variables by appending to ~/.profile:
bash
export HDFS_NAMENODE_USER=lhz
export HDFS_SECONDARYNAMENODE_USER=lhz
export HDFS_DATANODE_USER=lhz
export HDFS_ZKFC_USER=lhz
export HDFS_JOURNALNODE_USER=lhz
export HADOOP_SHELL_EXECNAME=lhz
export YARN_RESOURCEMANAGER_USER=lhz
export YARN_NODEMANAGER_USER=lhz
export HADOOP_HOME=$HOME/opt/hadoop-3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- Apply the environment variables
bash
source ~/.profile
1.8 Complete set of environment variables
bash
export NODE_HOME=$HOME/opt/node-v24
export JAVA_HOME=$HOME/opt/jdk-25
export MAVEN_HOME=$HOME/opt/maven
export SCALA_HOME=$HOME/opt/scala-2
export ZOOKEEPER_HOME=$HOME/opt/zookeeper-3
export HDFS_NAMENODE_USER=lhz
export HDFS_SECONDARYNAMENODE_USER=lhz
export HDFS_DATANODE_USER=lhz
export HDFS_ZKFC_USER=lhz
export HDFS_JOURNALNODE_USER=lhz
export HADOOP_SHELL_EXECNAME=lhz
export YARN_RESOURCEMANAGER_USER=lhz
export YARN_NODEMANAGER_USER=lhz
export HADOOP_HOME=$HOME/opt/hadoop-3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$PATH:$NODE_HOME/bin:$JAVA_HOME/bin:$MAVEN_HOME/bin:$SCALA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
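Each `*_HOME` variable only works if the directory it names exists, and a mistyped rename in the earlier steps is easy to miss. A quick check can be sketched as follows; `missing_dirs` is a hypothetical helper, and the paths are the rename targets used in section 1.7.

```shell
# Print every argument that is not an existing directory.
# missing_dirs is a hypothetical helper introduced for illustration.
missing_dirs() {
  for d in "$@"; do
    [ -d "$d" ] || printf '%s\n' "$d"
  done
}

# Lists whichever installation directories are absent; no output means all exist.
missing_dirs "$HOME/opt/node-v24" "$HOME/opt/jdk-25" "$HOME/opt/maven" \
             "$HOME/opt/scala-2" "$HOME/opt/zookeeper-3" "$HOME/opt/hadoop-3"
```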
2. zookeeper
2.1 Edit the configuration file
bash
cd $ZOOKEEPER_HOME/conf
bash
vim zoo.cfg
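Annotated version (not recommended)
bash
# Length of one tick, the basic time unit: 2000 ms = 2 s
tickTime=2000
# Time limit for followers to connect and sync with the leader at startup: 10 ticks = 20 s
initLimit=10
# Time limit for a request/response round trip during normal sync: 5 ticks = 10 s
syncLimit=5
# Where in-memory snapshots are stored
dataDir=/home/lhz/data/zookeeper-3/data
# Where transaction logs are stored
dataLogDir=/home/lhz/data/zookeeper-3/datalog
# Client port of this ZooKeeper node
clientPort=2181
# Maximum concurrent connections from one client (identified by IP) to one node; default 60
maxClientCnxns=1000
# Keep 7 snapshot files in dataDir; default 3
autopurge.snapRetainCount=7
# Interval of the snapshot purge task in hours; 0 (the default) disables purging
autopurge.purgeInterval=1
# Minimum session timeout a client may negotiate; default 2 ticks
minSessionTimeout=4000
# Maximum session timeout a client may negotiate; default 20 ticks = 40 s
maxSessionTimeout=300000
# ZooKeeper 3.5 and later start the AdminServer by default on port 8080; move it to 9001
admin.serverPort=9001
# Cluster members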
server.1=lihaozhe01:2888:3888
server.2=lihaozhe02:2888:3888
server.3=lihaozhe03:2888:3888
Clean version (recommended)
bash
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/lhz/data/zookeeper-3/data
dataLogDir=/home/lhz/data/zookeeper-3/datalog
clientPort=2181
maxClientCnxns=1000
autopurge.snapRetainCount=7
autopurge.purgeInterval=1
minSessionTimeout=4000
maxSessionTimeout=300000
admin.serverPort=9001
server.1=lihaozhe01:2888:3888
server.2=lihaozhe02:2888:3888
server.3=lihaozhe03:2888:3888
2.2 After saving, create the directories named in the configuration file
Run on every server
bash
mkdir -p /home/lhz/data/zookeeper-3/data /home/lhz/data/zookeeper-3/datalog
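The directories must match `dataDir` and `dataLogDir` in zoo.cfg exactly, so one way to avoid a mismatch is to read the paths from the file instead of retyping them. This is a sketch: `zk_dirs` is a hypothetical helper, and the demo uses a temporary file holding only the two relevant settings.

```shell
# Print the dataDir and dataLogDir values from a zoo.cfg-style file, one per line.
# zk_dirs is a hypothetical helper introduced for illustration.
zk_dirs() {
  sed -n -e 's/^dataDir=//p' -e 's/^dataLogDir=//p' "$1"
}

# Demo against a throwaway file.
demo_cfg="$(mktemp)"
cat > "$demo_cfg" <<'EOF'
dataDir=/home/lhz/data/zookeeper-3/data
dataLogDir=/home/lhz/data/zookeeper-3/datalog
EOF
zk_dirs "$demo_cfg"
rm -f "$demo_cfg"

# On a real node (assumes ZOOKEEPER_HOME is set as in section 1.8):
#   zk_dirs "$ZOOKEEPER_HOME/conf/zoo.cfg" | xargs mkdir -p
```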
2.3 myid
lihaozhe01
bash
echo 1 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid
lihaozhe02
bash
echo 2 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid
lihaozhe03
bash
echo 3 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid
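The number written to `myid` has to equal the N of the `server.N` line that names the same host in zoo.cfg. The lookup can be sketched like this; `myid_for_host` is a hypothetical helper, and the demo file repeats the three `server.N` lines from section 2.1.

```shell
# Print the N of the "server.N=<host>:..." line that names the given host.
# myid_for_host is a hypothetical helper introduced for illustration.
myid_for_host() {
  sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p" "$1"
}

demo_cfg="$(mktemp)"
cat > "$demo_cfg" <<'EOF'
server.1=lihaozhe01:2888:3888
server.2=lihaozhe02:2888:3888
server.3=lihaozhe03:2888:3888
EOF
myid_for_host "$demo_cfg" lihaozhe02   # prints 2
rm -f "$demo_cfg"
```

On a node, the result for `"$(hostname)"` could be redirected into the `myid` file instead of typing the number by hand.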
2.4 Write a systemd unit to start ZooKeeper at boot
Create a unit file named zookeeper.service under /etc/systemd/system/
Note: write it on every server
bash
cd /etc/systemd/system
sudo vim zookeeper.service
Contents:
bash
[Unit]
Description=zookeeper
After=syslog.target network.target
[Service]
Type=forking
Environment=ZOO_LOG_DIR=/home/lhz/data/zookeeper-3/datalog
Environment=JAVA_HOME=/home/lhz/opt/jdk-25
ExecStart=/home/lhz/opt/zookeeper-3/bin/zkServer.sh start
ExecStop=/home/lhz/opt/zookeeper-3/bin/zkServer.sh stop
Restart=always
User=lhz
Group=lhz
[Install]
WantedBy=multi-user.target
bash
sudo systemctl daemon-reload
# Run the following commands only after every host has been configured
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
sudo systemctl status zookeeper
3. hadoop
Edit the configuration files
bash
cd $HADOOP_HOME/etc/hadoop
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- workers
- mapred-site.xml
- yarn-site.xml
Append to the end of hadoop-env.sh
bash
# =========================================================================== #
# User-defined environment variables
# =========================================================================== #
# --- Java ---
export JAVA_HOME=/home/lhz/opt/jdk-25
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
# --- General ---
export HADOOP_LOG_DIR=/home/lhz/data/hadoop/logs
# --- Users that run the HDFS daemons (must be set when Kerberos is not used) ---
export HDFS_NAMENODE_USER=lhz
export HDFS_SECONDARYNAMENODE_USER=lhz
export HDFS_DATANODE_USER=lhz
export HDFS_ZKFC_USER=lhz
export HDFS_JOURNALNODE_USER=lhz
export HADOOP_SHELL_EXECNAME=lhz
# --- Users that run the YARN daemons ---
export YARN_RESOURCEMANAGER_USER=lhz
export YARN_NODEMANAGER_USER=lhz
core-site.xml
xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
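<!-- ======== File system ======== -->
<!-- Default file system: the logical HA nameservice from dfs.nameservices. A logical nameservice takes no port; the NameNode RPC addresses (port 8020) are set per NameNode in hdfs-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://lihaozhe</value>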
</property>
<!-- ======== Local directories ======== -->
<!-- Hadoop runtime temporary directory (root for NameNode/DataNode metadata, logs, pid files, and so on) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/lhz/data/hadoop</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>lihaozhe01:2181,lihaozhe02:2181,lihaozhe03:2181</value>
</property>
<!-- ======== Security and permissions ======== -->
<!-- Disable HDFS permission checks (simplifies work in a test environment) -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- Web UI static user: the user name used when browsing the HDFS/MapReduce web pages -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>lhz</value>
</property>
<!-- ======== Proxy user (required by Hive doAs) ======== -->
<!-- Hosts from which user lhz may impersonate others: * means all nodes -->
<property>
<name>hadoop.proxyuser.lhz.hosts</name>
<value>*</value>
</property>
<!-- Groups whose members user lhz may impersonate: * means all groups -->
<property>
<name>hadoop.proxyuser.lhz.groups</name>
<value>*</value>
</property>
<!-- Users that user lhz may impersonate: * means all users -->
<property>
<name>hadoop.proxyuser.lhz.users</name>
<value>*</value>
</property>
</configuration>
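With HA enabled, `fs.defaultFS` should name the logical nameservice without a port. On a configured node `hdfs getconf -confKey fs.defaultFS` reports the effective value; the sketch below only illustrates the check offline. `get_prop` and `is_logical_uri` are hypothetical helpers, the parser assumes each `<name>` and `<value>` element sits on its own line, and the demo file is a throwaway sample.

```shell
# Print the <value> of the named property from a Hadoop *-site.xml file.
# Assumes <name>...</name> and <value>...</value> each fit on a single line.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # <value> is 7 characters and </value> is 8, so strip 15 in total
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Succeed only for "hdfs://<nameservice>" with no port.
is_logical_uri() {
  case "$1" in
    hdfs://*:*) return 1 ;;
    hdfs://?*)  return 0 ;;
    *)          return 1 ;;
  esac
}

demo_xml="$(mktemp)"
cat > "$demo_xml" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://lihaozhe</value>
</property>
</configuration>
EOF
fs="$(get_prop "$demo_xml" fs.defaultFS)"
if is_logical_uri "$fs"; then echo "$fs is a logical nameservice URI"; fi
rm -f "$demo_xml"
```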
hdfs-site.xml
xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>lihaozhe</value>
</property>
<property>
<name>dfs.ha.namenodes.lihaozhe</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.lihaozhe.nn1</name>
<value>lihaozhe01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.lihaozhe.nn2</name>
<value>lihaozhe02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.lihaozhe.nn1</name>
<value>lihaozhe01:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.lihaozhe.nn2</name>
<value>lihaozhe02:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://lihaozhe01:8485;lihaozhe02:8485;lihaozhe03:8485/lihaozhe</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.lihaozhe</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/lhz/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/lhz/data/hadoop/journalnode/data</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.safemode.threshold.pct</name>
<value>1</value>
</property>
</configuration>
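The `qjournal://` URI in `dfs.namenode.shared.edits.dir` lists the JournalNodes, and an edit is committed only when a majority of them acknowledge it. The sketch below splits the URI and computes how many JournalNode failures the quorum survives; `journal_nodes` and `tolerated_failures` are hypothetical helper names.

```shell
# Split a qjournal:// URI into one "host:port" per line.
journal_nodes() {
  rest="${1#qjournal://}"                          # drop the scheme
  printf '%s\n' "${rest%%/*}" | tr ';' '\n'        # drop "/<journal id>", split on ";"
}

# A majority quorum of n nodes survives (n - 1) / 2 failures (integer division).
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

uri='qjournal://lihaozhe01:8485;lihaozhe02:8485;lihaozhe03:8485/lihaozhe'
journal_nodes "$uri"
n="$(journal_nodes "$uri" | wc -l | tr -d ' ')"
tolerated_failures "$n"   # prints 1 for the three JournalNodes above
```

The same arithmetic explains why the ZooKeeper ensemble and the JournalNode set both use an odd number of nodes: a fourth node would not raise the number of tolerated failures.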
workers
bash
lihaozhe01
lihaozhe02
lihaozhe03
mapred-site.xml
xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- ======== Execution framework ======== -->
<!-- Execution engine: run MapReduce jobs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MR application classpath: where YARN containers load the MapReduce JARs from -->
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
<!-- ======== Task memory ======== -->
<!-- Memory requested for each map task container (MB); must not exceed yarn.scheduler.maximum-allocation-mb -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
</property>
<!-- Memory requested for each reduce task container (MB) -->
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
</property>
<!-- ======== JobHistory service ======== -->
<!-- JobHistory Server RPC listen address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>lihaozhe01:10020</value>
</property>
<!-- JobHistory web UI address (open http://lihaozhe01:19888 in a browser) -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>lihaozhe01:19888</value>
</property>
</configuration>
yarn-site.xml
xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- ======== YARN daemon configuration ======== -->
<!-- Absolute path of the Java executable used by YARN containers -->
<property>
<name>yarn.nodemanager.java.path</name>
<value>/home/lhz/opt/jdk-25/bin/java</value>
</property>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>lihaozhe01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>lihaozhe02</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>lihaozhe01:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>lihaozhe02:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>lihaozhe01:2181,lihaozhe02:2181,lihaozhe03:2181</value>
</property>
<!-- ======== NodeManager environment variables ======== -->
<!-- Environment whitelist: variables that containers may inherit from the NodeManager -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<!-- Shuffle auxiliary service: required for the MapReduce shuffle of intermediate data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ======== MapReduce container environment variables ======== -->
<!-- Environment variables for the ApplicationMaster JVM -->
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>JAVA_HOME=/home/lhz/opt/jdk-25,HADOOP_MAPRED_HOME=/home/lhz/opt/hadoop-3</value>
</property>
<!-- ApplicationMaster JVM options (-Xmx sets the maximum heap) -->
<property>
<name>yarn.app.mapreduce.am.command-opts</name>
<value>-Xmx1024m</value>
</property>
<!-- Environment variables for map task containers -->
<property>
<name>mapreduce.map.env</name>
<value>JAVA_HOME=/home/lhz/opt/jdk-25,HADOOP_MAPRED_HOME=/home/lhz/opt/hadoop-3</value>
</property>
<!-- Environment variables for reduce task containers -->
<property>
<name>mapreduce.reduce.env</name>
<value>JAVA_HOME=/home/lhz/opt/jdk-25,HADOOP_MAPRED_HOME=/home/lhz/opt/hadoop-3</value>
</property>
<!-- ======== Resource scheduling (memory) ======== -->
<!-- Total physical memory (MB) the NodeManager may hand out to containers -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<!-- Minimum memory (MB) a single container may request from the scheduler -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<!-- Maximum memory (MB) a single container may request from the scheduler -->
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<!-- ======== Memory checks (disabled for a test environment) ======== -->
<!-- Disable the physical memory limit check so YARN does not kill containers during testing -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Disable the virtual memory limit check -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!-- ======== Log aggregation ======== -->
<!-- Enable log aggregation: container logs are uploaded to HDFS when the application finishes -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log retention in seconds: 604800 = 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<!-- JobHistory Server URL used to look up aggregated logs -->
<property>
<name>yarn.log.server.url</name>
<value>http://lihaozhe01:19888/jobhistory/logs</value>
</property>
</configuration>
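The memory settings above imply a simple capacity bound, worked through below as a sketch. `containers_per_node` and `seconds_to_days` are hypothetical helper names; the inputs are the values of `yarn.nodemanager.resource.memory-mb`, `mapreduce.map.memory.mb`, and `yarn.log-aggregation.retain-seconds` from the configuration files.

```shell
# Upper bound on equally sized containers that fit on one NodeManager.
containers_per_node() {
  echo $(( $1 / $2 ))
}

# Convert a retention period in seconds to whole days.
seconds_to_days() {
  echo $(( $1 / 86400 ))
}

containers_per_node 4096 1024   # prints 4
seconds_to_days 604800          # prints 7
```

The first figure is an upper bound: the ApplicationMaster of each job also occupies a container, so fewer task containers are available in practice.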
4. Configure passwordless SSH login
Create a local key pair and add the public key to the authorized keys file on each node
bash
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
bash
ssh-copy-id lhz@lihaozhe01
bash
ssh-copy-id lhz@lihaozhe02
bash
ssh-copy-id lhz@lihaozhe03
5. Distribute the keys
bash
scp -r ~/.ssh/ lhz@lihaozhe02:~/
scp -r ~/.ssh/ lhz@lihaozhe03:~/
6. Distribute and apply the environment variables
bash
scp -r ~/.profile lhz@lihaozhe02:~/
scp -r ~/.profile lhz@lihaozhe03:~/
Run the following command on each node
bash
source ~/.profile
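When the same file goes to several nodes, the copy commands can be generated from a host list instead of being typed one by one. The following sketch only prints the commands so they can be reviewed first; `plan_distribution` is a hypothetical helper, and the user `lhz` and the host names are the ones used throughout this guide.

```shell
# Print (without running) one scp command per target host.
# plan_distribution is a hypothetical helper introduced for illustration.
plan_distribution() {
  file="$1"; shift
  for host in "$@"; do
    printf 'scp %s lhz@%s:~/\n' "$file" "$host"
  done
}

plan_distribution "$HOME/.profile" lihaozhe02 lihaozhe03
```

Piping the output to `sh` would execute the copies once the list looks right.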
7. Start ZooKeeper
7.1 myid
lihaozhe01
bash
echo 1 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid
lihaozhe02
bash
echo 2 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid
lihaozhe03
bash
echo 3 > /home/lhz/data/zookeeper-3/data/myid
more /home/lhz/data/zookeeper-3/data/myid
7.2 Start the service
Run the following commands on each node
bash
sudo systemctl daemon-reload
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
sudo systemctl status zookeeper
7.3 Verify
bash
jps
bash
zkServer.sh status
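`zkServer.sh status` reports each node's role on a line that starts with `Mode:`, and a healthy three-node ensemble has exactly one leader and two followers. The check can be sketched as below; `has_single_leader` is a hypothetical helper, and the `printf` input stands in for the `Mode:` lines collected from the three nodes.

```shell
# Read one "Mode: ..." line per node on stdin; succeed only with exactly one leader.
# has_single_leader is a hypothetical helper introduced for illustration.
has_single_leader() {
  leaders="$(grep -c '^Mode: leader' || true)"
  [ "$leaders" -eq 1 ]
}

# Illustrative input, not captured from a real cluster.
if printf 'Mode: follower\nMode: leader\nMode: follower\n' | has_single_leader; then
  echo "ensemble has exactly one leader"
fi
```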
8. Hadoop initialization
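bash
1. Start ZooKeeper on all three nodes: zkServer.sh start
2. Start the JournalNode on all three nodes:
hadoop-daemon.sh start journalnode (deprecated in Hadoop 3), or hdfs --daemon start journalnode
3. Format HDFS on one of the NameNodes: hdfs namenode -format
4. Copy the freshly formatted metadata to the other NameNode
a) Start the NameNode that was just formatted:
hadoop-daemon.sh start namenode (deprecated in Hadoop 3), or hdfs --daemon start namenode
b) On the NameNode that was not formatted, run: hdfs namenode -bootstrapStandby
c) Start the second NameNode:
hadoop-daemon.sh start namenode (deprecated in Hadoop 3), or hdfs --daemon start namenode
5. On one of the NameNodes, initialize the HA state in ZooKeeper: hdfs zkfc -formatZK
6. Stop the daemons started above: stop-dfs.sh
7. Start everything: start-all.sh
8. Start the ResourceManager
yarn-daemon.sh start resourcemanager (deprecated in Hadoop 3), or start-yarn.sh
http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.5.0.tar
Step 8 is not needed, because start-all.sh in step 7 already starts YARN
9. Start the JobHistory server
mapred --daemon start historyserver
Steps 10, 11, and 12 are optional
10. Safe mode
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave
11. List the NameNodes and query their state (haadmin takes the NameNode ID from dfs.ha.namenodes.lihaozhe, not the host name)
hdfs getconf -namenodes
hdfs haadmin -getServiceState nn1
12. Force a state transition
hdfs haadmin -transitionToActive --forcemanual nn1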
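The first-start sequence in steps 2 to 9 is order-sensitive, so it can help to see it as an ordered list of "node: command" pairs. The sketch below only prints that plan. `ha_first_start_plan` is a hypothetical helper, and picking lihaozhe01 as the NameNode that gets formatted is an assumption; the steps above allow either NameNode. ZooKeeper (step 1) is expected to be running already.

```shell
# Print the one-time HA bootstrap sequence as "node: command" lines, in order.
# ha_first_start_plan is a hypothetical helper introduced for illustration.
ha_first_start_plan() {
  for h in lihaozhe01 lihaozhe02 lihaozhe03; do
    echo "$h: hdfs --daemon start journalnode"       # step 2
  done
  echo "lihaozhe01: hdfs namenode -format"            # step 3 (assumed node)
  echo "lihaozhe01: hdfs --daemon start namenode"     # step 4a
  echo "lihaozhe02: hdfs namenode -bootstrapStandby"  # step 4b
  echo "lihaozhe02: hdfs --daemon start namenode"     # step 4c
  echo "lihaozhe01: hdfs zkfc -formatZK"              # step 5
  echo "lihaozhe01: stop-dfs.sh"                      # step 6
  echo "lihaozhe01: start-all.sh"                     # step 7
  echo "lihaozhe01: mapred --daemon start historyserver"  # step 9
}

ha_first_start_plan
```

Formatting and `-formatZK` belong to the first start only; later restarts use just the start and stop scripts.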
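Important:
bash
# Before shutting down, stop the services in order
stop-yarn.sh
stop-dfs.sh
# After booting, start the services in order
start-dfs.sh
start-yarn.sh
Or
bash
# Before shutting down, stop the services
stop-all.sh
# After booting, start the services
start-all.sh
bash
# After starting or stopping, use jps to confirm the processes are as expected before doing anything else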