Hadoop Distributed Cluster Setup (openEuler, 1 Master + 2 Workers)

I. Cluster Architecture Planning

1. Node roles

| Hostname | IP address (example) | Role | Core processes |
|----------|----------------------|------|----------------|
| master | 192.168.1.100 | Master (NameNode + RM) | NameNode, ResourceManager |
| slave1 | 192.168.1.101 | Worker (DataNode + NM) | DataNode, NodeManager |
| slave2 | 192.168.1.102 | Worker (DataNode + NM) | DataNode, NodeManager |

2. Environment requirements (must be met on every node)

  • Operating system: openEuler 22.03 LTS
  • Lab platform: Tianshu (天枢) integrated virtual simulation platform
  • Network: all three nodes can reach each other (firewall disabled or required ports opened), hostname resolution working
  • Dependencies: JDK 8 (same version on every node)
  • Admin user: a regular user named hadoopd on every node (used instead of root, for better security)

II. Prerequisite Setup on All Nodes

1. Base system configuration (run on all 3 nodes; the initial steps require root)

(1) Create the hadoopd admin user and grant sudo

```bash
# Create the hadoopd user (enter a password of your choice when prompted and remember it)
useradd hadoopd
passwd hadoopd
# Grant hadoopd passwordless sudo (avoids typing a password for every system command)
echo "hadoopd ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
# Verify the sudo grant (switch to the user; sudo should work without a password prompt)
su - hadoopd
sudo ls /root   # no error means the grant succeeded
```
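A slightly safer variant, if you would rather not append to /etc/sudoers directly, is a drop-in file whose syntax visudo can validate; a minimal sketch (run as root):

```bash
# Sketch: put the rule in a drop-in file instead of editing /etc/sudoers itself,
# then let visudo check all sudoers files for syntax errors.
echo "hadoopd ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/hadoopd
chmod 440 /etc/sudoers.d/hadoopd
visudo -c
```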

(2) Set the hostname (run on the matching node, as hadoopd with sudo)

  • On master:

```bash
sudo hostnamectl set-hostname master
```

  • On slave1:

```bash
sudo hostnamectl set-hostname slave1
```

  • On slave2:

```bash
sudo hostnamectl set-hostname slave2
```

(3) Configure hostname resolution (run on all 3 nodes, as hadoopd)

```bash
# Install vim (if not already present)
sudo dnf install -y vim
# Edit the hosts file
sudo vim /etc/hosts
```

Add the following (replace with your actual IPs):

```
192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2
```

Verify resolution:

```bash
ping master -c 2 && ping slave1 -c 2 && ping slave2 -c 2   # no packet loss means resolution and connectivity are fine
```

(4) Disable the firewall and SELinux (run on all 3 nodes, as hadoopd with sudo)

```bash
# Disable the firewall (permanently, so it does not come back after a reboot)
sudo systemctl stop firewalld
sudo systemctl disable firewalld
# Disable SELinux (immediately and permanently)
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```
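To confirm both changes took effect, a quick check (assuming the stock firewalld and SELinux tooling on openEuler):

```bash
# firewalld should report "inactive"; getenforce should report "Permissive" now
# and "Disabled" after the next reboot.
sudo systemctl is-active firewalld
getenforce
```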

(5) Install base dependencies (run on all 3 nodes, as hadoopd with sudo)

```bash
sudo dnf update -y
sudo dnf install -y openssh-server openssh-clients rsync which wget
```

2. Install JDK 8 (run on all 3 nodes, as hadoopd with sudo)

(1) Install the JDK

```bash
sudo dnf install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
```

(2) Configure the JAVA_HOME environment variable (system-wide, available to all users)

```bash
sudo vim /etc/profile
```

Add:

```bash
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$PATH:$JAVA_HOME/bin
```

Apply and verify (as hadoopd):

```bash
source /etc/profile
java -version   # output containing 1.8.x means it is correct
```
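On some builds /usr/lib/jvm/java-1.8.0-openjdk is only a symlink to a versioned directory; if tools such as javac cannot be found later, a quick way to locate the real JDK directory (a sketch, the exact path varies by build) is:

```bash
# Resolve the actual JDK directory behind the javac binary on PATH;
# JAVA_HOME should point at the directory above its bin/ subdirectory.
readlink -f "$(which javac)" | sed 's|/bin/javac||'
```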

3. Configure passwordless SSH (key point: from hadoopd on master to hadoopd on every node)

(1) Start the SSH service (run on all 3 nodes, as hadoopd with sudo)

```bash
sudo systemctl start sshd
sudo systemctl enable sshd
```

(2) Generate a key pair as hadoopd on master (master only)

```bash
# Switch to hadoopd (if currently root)
su - hadoopd
# Generate an RSA key pair (press Enter through the prompts, no passphrase)
ssh-keygen -t rsa -P ""   # keys are stored under ~hadoopd/.ssh/
```

(3) Set up passwordless login to every node (master only, as hadoopd)

```bash
# Passwordless login to master itself
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Fix key file permissions (must be exact, otherwise passwordless SSH fails)
chmod 700 ~/.ssh                    # .ssh directory accessible only by the owner
chmod 600 ~/.ssh/id_rsa             # private key readable/writable only by the owner
chmod 644 ~/.ssh/id_rsa.pub         # public key writable by the owner, readable by others
chmod 600 ~/.ssh/authorized_keys    # authorized_keys readable/writable only by the owner
# Passwordless login to hadoopd on slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopd@slave1
# Passwordless login to hadoopd on slave2
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoopd@slave2
```

If ~/.ssh/authorized_keys does not exist, create an empty one first:

touch ~/.ssh/authorized_keys

Then append the client's public key to it; for example, for the local public key:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

(4) Verify passwordless login (master only, as hadoopd)

```bash
ssh hadoopd@master   # should log in without a password prompt
ssh hadoopd@slave1   # should log in without a password prompt
ssh hadoopd@slave2   # should log in without a password prompt
```

(If you are still prompted for a password, re-check the authorized_keys permissions: chmod 600 ~/.ssh/authorized_keys)
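To check all three hosts in one pass without getting stuck at an interactive prompt, a small loop such as the following can help (a sketch; BatchMode makes ssh fail immediately instead of asking for a password):

```bash
# Each line should print the remote hostname; "Permission denied" means
# passwordless login to that host is not working yet.
for h in master slave1 slave2; do
  ssh -o BatchMode=yes "hadoopd@$h" hostname
done
```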

III. Hadoop Installation and Configuration (done as hadoopd on master, then synced to the workers)

1. Install Hadoop on master (as hadoopd)

(1) Download and extract Hadoop

```bash
# Download Hadoop 3.3.6 (a stable release)
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
# Extract into /usr/local/ (needs sudo)
sudo tar -zxvf hadoop-3.3.6.tar.gz -C /usr/local/
# Rename to a shorter directory name
sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop
# Make hadoopd the owner of the tree (important: avoids permission problems later)
sudo chown -R hadoopd:hadoopd /usr/local/hadoop
```
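Optionally, verify the download before extracting; Apache publishes a .sha512 file next to the tarball on the archive site (a sketch; compare the locally computed digest with the published one):

```bash
# Fetch the published checksum and compute the local one; the two digests must match.
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum hadoop-3.3.6.tar.gz
cat hadoop-3.3.6.tar.gz.sha512
```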

(2) Configure the Hadoop environment variables (per-user configuration, for hadoopd only)

```bash
# Edit the user's environment file
vim ~/.bashrc
```

Add:

```bash
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
```

Apply and verify:

```bash
source ~/.bashrc
hadoop version   # output showing Hadoop 3.3.6 means success
```

2. Edit the core configuration files (as hadoopd on master; path: $HADOOP_HOME/etc/hadoop/)

(1) Edit hadoop-env.sh (set the JDK path and running user)

```bash
vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
```

Add/modify:

```bash
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export HADOOP_PID_DIR=$HADOOP_HOME/pids
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HADOOP_USER_NAME=hadoopd   # run Hadoop as the hadoopd user
```

(2) Edit core-site.xml (core communication settings; hadoopd replaces root)

```bash
vim $HADOOP_HOME/etc/hadoop/core-site.xml
```

Add inside <configuration>:

```xml
<property>
  <!-- NameNode address (master hostname + port) -->
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <!-- Hadoop temp directory (custom path so /tmp cleanup does not wipe it) -->
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/tmp</value>
</property>
<property>
  <!-- Let the hadoopd user operate Hadoop (replaces root) -->
  <name>hadoop.http.staticuser.user</name>
  <value>hadoopd</value>
</property>
<property>
  <!-- Proxy-user settings to avoid cross-node HDFS permission problems -->
  <name>hadoop.proxyuser.hadoopd.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoopd.groups</name>
  <value>*</value>
</property>
```

(3) Edit hdfs-site.xml (HDFS storage settings)

```bash
vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
```

Add:

```xml
<property>
  <!-- Replication factor: 2 workers, so use 2 to avoid under-replication -->
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <!-- NameNode metadata directory -->
  <name>dfs.namenode.name.dir</name>
  <value>/usr/local/hadoop/hdfs/name</value>
</property>
<property>
  <!-- DataNode data directory -->
  <name>dfs.datanode.data.dir</name>
  <value>/usr/local/hadoop/hdfs/data</value>
</property>
<property>
  <!-- Enable WebHDFS -->
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Disable HDFS permission checks (test environment; keep enabled in production) -->
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
<property>
  <!-- NameNode web UI address (default port 9870) -->
  <name>dfs.namenode.http-address</name>
  <value>master:9870</value>
</property>
```

(4) Edit mapred-site.xml (MapReduce settings)

Edit the file:

```bash
vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
```

Add:

```xml
<property>
  <!-- Run MapReduce on YARN -->
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <!-- JobHistory server address (on master) -->
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <!-- JobHistory web UI address -->
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
<property>
  <!-- Memory per map task (adjust to the node's RAM) -->
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <!-- Memory per reduce task -->
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
```

(5) Edit yarn-site.xml (YARN resource management settings)

```bash
vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
```

Add:

```xml
<property>
  <!-- ResourceManager address (on master) -->
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <!-- ResourceManager scheduler address -->
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <!-- ResourceManager resource-tracker address -->
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <!-- NodeManager auxiliary service (MapReduce shuffle) -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <!-- YARN web UI address -->
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
<property>
  <!-- Disable physical/virtual memory checks (avoids containers being killed on small VMs) -->
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <!-- Memory available to YARN per node (total RAM minus system overhead; e.g. 3072 MB on a 4 GB node) -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>
</property>
<property>
  <!-- Minimum memory allocated per container (1024 MB) -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```

(6) Edit workers (the list of worker nodes)

```bash
vim $HADOOP_HOME/etc/hadoop/workers
```

Delete the default content and add the worker hostnames, one per line:

```
slave1
slave2
```
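If you prefer not to edit the file interactively, the same result can be had with a one-liner (a sketch; it simply overwrites the file):

```bash
# Overwrite the workers file with the two worker hostnames, one per line.
printf 'slave1\nslave2\n' > $HADOOP_HOME/etc/hadoop/workers
cat $HADOOP_HOME/etc/hadoop/workers   # confirm the contents
```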

3. Sync Hadoop to the worker nodes (as hadoopd on master)

(1) Sync the installation directory (passwordless SSH is already in place)

```bash
# /usr/local/ is a system directory that only root can write to by default, so the hadoopd
# user cannot copy straight into it on the workers. Transfer to hadoopd's home directory
# first, then move the tree with sudo (no need to loosen permissions on system directories).

# Sync to slave1: copy into hadoopd's home directory (writable by hadoopd by default)
scp -r /usr/local/hadoop hadoopd@slave1:~
# Log in to slave1 and move the tree into place (sudo is passwordless for hadoopd)
ssh hadoopd@slave1
sudo mv ~/hadoop /usr/local/
exit

# Sync to slave2 (same steps as for slave1)
```
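Because rsync was installed earlier and sudo is passwordless for hadoopd, the copy-then-move steps can also be collapsed into a loop run from master (a sketch, assuming the NOPASSWD sudo rule from section II is in place on the workers):

```bash
# For each worker: copy the Hadoop tree into hadoopd's home directory,
# then move it into /usr/local/ with sudo on the remote side.
for h in slave1 slave2; do
  rsync -a /usr/local/hadoop "hadoopd@$h:~/"
  ssh "hadoopd@$h" "sudo mv ~/hadoop /usr/local/"
done
```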

(2) Sync the environment variables (as hadoopd on slave1 and slave2)

On slave1 and slave2, run:

```bash
# Edit the user's environment file
vim ~/.bashrc
```

Add the same Hadoop environment variables as on master:

```bash
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
```

Apply and verify:

```bash
source ~/.bashrc
hadoop version   # output showing 3.3.6 means the sync succeeded
```

4. Create the data directories (run as hadoopd on all 3 nodes)

Create the directories referenced by the configuration files (hadoopd owns them, so it has full access):

```bash
# Create the temp, data, PID and log directories
mkdir -p $HADOOP_HOME/tmp
mkdir -p $HADOOP_HOME/hdfs/name
mkdir -p $HADOOP_HOME/hdfs/data
mkdir -p $HADOOP_HOME/pids
mkdir -p $HADOOP_HOME/logs
# Make sure hadoopd can read, write and traverse the tree
chmod -R 755 $HADOOP_HOME/
```
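Since passwordless SSH to every node is already configured, the same layout can be created from master in one pass (a sketch; it assumes HADOOP_HOME is /usr/local/hadoop on every node):

```bash
# Create the directory layout on all three nodes from master.
for h in master slave1 slave2; do
  ssh "hadoopd@$h" "mkdir -p /usr/local/hadoop/{tmp,pids,logs,hdfs/name,hdfs/data} && chmod -R 755 /usr/local/hadoop"
done
```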

IV. Cluster Startup and Verification

1. Format HDFS (required before the first start; run only once, and only on master)

```bash
hdfs namenode -format
```
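Formatting a NameNode that already holds data wipes its metadata, so before re-running the command it is worth checking whether the name directory is already initialised (a quick check, based on the dfs.namenode.name.dir path configured above):

```bash
# If this file exists, the NameNode has already been formatted; do not format again
# unless you intend to wipe the cluster's metadata.
ls /usr/local/hadoop/hdfs/name/current/VERSION
```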

2. Start the cluster services (master only)

(1) Start all services with one command

```bash
start-all.sh
```

3. Verify cluster status (from any node)

(1) Web UI check (open in a browser, replacing master with its actual IP): HDFS at http://master:9870 and YARN at http://master:8088

  • HDFS UI, "Datanodes" tab: shows both slave1 and slave2, with status "In Service"
  • HDFS UI, "Summary": replication factor 2, free space looks normal
  • YARN UI, "Nodes": shows 2 active nodes (slave1, slave2)
  • YARN UI, "Cluster Metrics": memory and CPU resources look normal
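A command-line cross-check (a sketch run from master; jps ships with the JDK, and the two report commands are standard Hadoop CLI):

```bash
# Expected processes: NameNode, SecondaryNameNode and ResourceManager on master;
# DataNode and NodeManager on each worker.
for h in master slave1 slave2; do
  echo "== $h =="
  ssh "hadoopd@$h" jps
done
hdfs dfsadmin -report   # should list 2 live DataNodes
yarn node -list         # should list 2 RUNNING NodeManagers
```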

V. Production Hardening Recommendations

  • Tighten permissions: remove hadoopd's blanket passwordless sudo and allow only the commands it actually needs (e.g. systemctl)
  • Log rotation: configure Hadoop log rotation ($HADOOP_HOME/etc/hadoop/log4j.properties) so logs cannot fill the disk
  • Data backup: back up NameNode metadata regularly (hdfs dfsadmin -saveNamespace), with the backup directory owned by hadoopd
  • Resource tuning: adjust the memory and CPU settings in yarn-site.xml and mapred-site.xml to match the node hardware
  • Fine-grained firewall rules: open only the required ports (9000, 9870, 8088, 19888, 10020) and block everything else (see the sketch below)
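A minimal firewalld sketch for that last point, assuming firewalld is re-enabled and these are the only ports the cluster needs to expose (internal RPC ports such as 8030-8032 may also be needed between nodes):

```bash
# Re-enable the firewall and open only the Hadoop service ports listed above.
sudo systemctl enable --now firewalld
for p in 9000 9870 8088 19888 10020; do
  sudo firewall-cmd --permanent --add-port=${p}/tcp
done
sudo firewall-cmd --reload
sudo firewall-cmd --list-ports   # confirm the open ports
```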