Preface
A first step into big data: setting up Hadoop.
Two Alibaba Cloud servers: one ECS instance and one Simple Application Server.
OS: the ECS runs Alibaba Cloud Linux 3.2104 LTS 64-bit, which can be treated as CentOS; the lightweight server runs Ubuntu 22.04.
The Hadoop cluster is one master and two workers: the master is the Ubuntu machine, the workers are the Ubuntu and CentOS machines, and the daemons are started as a non-root user.
This post was written from memory after the fact, so there may be mistakes and omissions; corrections are welcome.
1. Edit hosts and hostname
```bash
nano /etc/hosts
```
```
ip1 master
ip2 slave
```
Edit this on both machines. On each machine, use its own private (internal) IP for its own entry and the other machine's public IP for the other entry.
Then use hostnamectl to set each machine's hostname to its mapped name. The mapped names themselves are up to you; the config files later in this post use paul as the master's name.
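As a sketch, the two steps above look like the following (the IPs are placeholders; the hostnamectl line needs root, so it is left commented here and the hosts entries go to a demo file):

```bash
# Entries to append to /etc/hosts (written to a demo file here).
# On each machine: its own PRIVATE IP for itself, the peer's PUBLIC IP for the peer.
cat >> hosts.demo <<'EOF'
172.16.0.10 master
203.0.113.5 slave
EOF
grep -c 'master\|slave' hosts.demo
# sudo hostnamectl set-hostname master   # run on the master; use its mapped name
```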
2. Create hadoopuser
Ubuntu
Open a terminal, create a new user (here named hadoopuser), and add it to the sudo group:
```bash
sudo adduser hadoopuser
sudo usermod -aG sudo hadoopuser
```
Set a password for the new user:
```bash
sudo passwd hadoopuser
```
Switch to the new user:
```bash
su - hadoopuser
```
CentOS
On CentOS the sudo group may not exist by default: it is optional, and some installs do not create it. If adding a user to the sudo group fails because the group does not exist, create the group and add the user as follows.
Create the sudo group:
```bash
sudo groupadd sudo
```
Add the user to the sudo group (assuming the username is hadoopuser):
```bash
sudo usermod -aG sudo hadoopuser
```
Verify the user is in the sudo group:
```bash
groups hadoopuser
```
The output should include sudo.
Make sure the sudo group is granted rights in /etc/sudoers:
Edit /etc/sudoers (ideally with visudo, which validates the syntax before saving) and make sure it contains a line like:
```
%sudo ALL=(ALL:ALL) ALL
```
This means members of the sudo group may run sudo commands.
Test the sudo rights:
Switch to hadoopuser and try a sudo command, for example:
```bash
su - hadoopuser
sudo whoami
```
If everything is set up correctly, it prints root.
Finally, switch to the new user:
```bash
su - hadoopuser
```
3. Install the JDK and Hadoop
1. Install the JDK
CentOS
Install with yum; take care to install the right version:
```bash
sudo yum install java-1.8.0-openjdk-devel.x86_64
```
```bash
nano ~/.bashrc
```
Why ~/.bashrc rather than /etc/profile? Because Hadoop launches its daemons over SSH in non-interactive shells, which read ~/.bashrc but not /etc/profile.
Add the following (adjust the path to your actual JDK directory):
```bash
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-2.0.1.1.al8.x86_64/jre
export CLASSPATH=.:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
export PATH JAVA_HOME CLASSPATH
```
Then reload the file and check that the install worked:
```bash
source ~/.bashrc
echo $JAVA_HOME
java -version
```
Ubuntu
Update the package list:
```bash
sudo apt-get update
```
Install openjdk-8-jdk:
```bash
sudo apt-get install openjdk-8-jdk
```
Check the Java version to confirm the install:
```bash
java -version
```
Set the environment variables:
```bash
nano ~/.bashrc
```
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64  # change to your own JDK directory
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
```
Then reload the file and check that the install worked:
```bash
source ~/.bashrc
echo $JAVA_HOME
java -version
```
2. Install Hadoop
Ubuntu
```bash
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -zvxf hadoop-3.3.6.tar.gz
nano ~/.bashrc
```
```bash
export HADOOP_HOME=/home/hadoopuser/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/home/hadoopuser/hadoop-3.3.6/lib/native/"
```
Check the variables yourself with echo.
CentOS
The steps are identical to the Ubuntu ones above.
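A quick sanity check after sourcing ~/.bashrc; the export below just mirrors the line added above so the snippet is self-contained:

```bash
# Mirrors the HADOOP_HOME line written to ~/.bashrc above
export HADOOP_HOME=/home/hadoopuser/hadoop-3.3.6
echo "$HADOOP_HOME"
# On the server, `hadoop version` should now also print the 3.3.6 build info.
```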
4. Hadoop configuration
Most of these files live in /home/hadoopuser/hadoop-3.3.6/etc/hadoop.
hadoop-env.sh
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
export HDFS_NAMENODE_USER=hadoopuser
export HDFS_DATANODE_USER=hadoopuser
export HDFS_SECONDARYNAMENODE_USER=hadoopuser
export YARN_RESOURCEMANAGER_USER=hadoopuser
export YARN_NODEMANAGER_USER=hadoopuser
```
core-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <!-- URI of the NameNode's HDFS filesystem -->
    <name>fs.defaultFS</name>
    <value>hdfs://paul:9000</value>
  </property>
  <property>
    <!-- Directory where the cluster stores temporary files -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopuser/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoopuser</value>
  </property>
</configuration>
```
hdfs-site.xml (license header omitted below)
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- NameNode RPC address -->
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>paul:9000</value>
  </property>
  <!-- SecondaryNameNode HTTP (web) address -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>paul:50090</value>
  </property>
  <!-- Where the NameNode stores its metadata -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoopuser/hadoop/hdfs/name</value>
  </property>
  <!-- Where DataNodes store blocks on local disk -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoopuser/hadoop/hdfs/data</value>
  </property>
  <!-- Block replication factor; the default is 3, and it should not exceed the number of DataNodes -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```
mapred-site.xml (license header omitted below)
```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Default execution mode for MapReduce jobs: yarn (cluster mode) or local -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- MapReduce JobHistory server address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>paul:10020</value>
  </property>
  <!-- MapReduce JobHistory server web address -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>paul:19888</value>
  </property>
</configuration>
```
yarn-site.xml (license header omitted below)
```xml
<?xml version="1.0"?>
<configuration>
  <!-- Auxiliary service used by MapReduce for shuffle; empty by default, mapreduce_shuffle is the recommended value -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- ResourceManager address -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>paul</value>
  </property>
  <!-- Environment variables inherited by containers -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Log server address -->
  <property>
    <name>yarn.log.server.url</name>
    <value>http://paul:19888/jobhistory/logs</value>
  </property>
  <!-- Keep aggregated logs for 7 days -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
```
workers
```
paul
slave
```
When starting as a non-root user there is a ten-second pause. If you are comfortable with shell, you can open start-all.sh and stop-all.sh under sbin and delete the part that waits.
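A hedged sketch of that edit: in Hadoop 3 both scripts print a warning and then sleep before proceeding, so deleting the sleep line skips the pause. The demo below operates on a dummy file; on the server, point the same sed at $HADOOP_HOME/sbin/start-all.sh and stop-all.sh, after checking that your copies really contain a plain "sleep 10" line.

```bash
# Create a dummy script that mimics the warning + wait structure
printf 'echo WARNING\nsleep 10\necho starting daemons\n' > start-all.demo.sh
# Delete the line that is exactly "sleep 10"
sed -i '/^sleep 10$/d' start-all.demo.sh
cat start-all.demo.sh
```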
Pack and compress the configured Hadoop directory, then copy it to the worker servers.
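One way to do the pack-and-ship step. The demo tars a scratch directory so it can run anywhere; the commented lines show the real transfer (the hostname slave is the mapping from /etc/hosts, so adjust to yours):

```bash
# Demo: pack a directory tree the same way you would pack hadoop-3.3.6
mkdir -p demo/hadoop-3.3.6/etc/hadoop
tar -czf demo.tar.gz -C demo hadoop-3.3.6
tar -tzf demo.tar.gz | head -n 1
# Real thing, run from /home/hadoopuser on the master:
#   tar -czf hadoop-3.3.6.tar.gz hadoop-3.3.6
#   scp hadoop-3.3.6.tar.gz hadoopuser@slave:/home/hadoopuser/
# and on each worker: tar -xzf hadoop-3.3.6.tar.gz
```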
On the worker servers, hadoop-env.sh needs its JAVA_HOME changed to that machine's JDK path:
```bash
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-2.0.1.1.al8.x86_64/jre
```
5. Startup
1. Permissions
Give hadoopuser ownership of the relevant directories:
```bash
sudo chown -R hadoopuser:hadoopuser /home/hadoopuser/hadoop-3.3.6
sudo chmod -R 700 /home/hadoopuser/hadoop-3.3.6
sudo chown -R hadoopuser:hadoopuser /home/hadoopuser/hadoop
sudo chmod -R 700 /home/hadoopuser/hadoop
```
2. Format the NameNode
```bash
hdfs namenode -format
```
(`hadoop namenode -format` also works, but is deprecated in Hadoop 3.)
3. Start
```bash
./sbin/start-all.sh
./bin/mapred --daemon start historyserver
```
4. Check the result
```bash
jps
```
```bash
./bin/hdfs dfsadmin -report
```
Then open http://<master-ip>:9870 in a browser for the NameNode web UI.
5. Stop
```bash
./sbin/stop-all.sh
```
Summary
With the cluster up, the next step is calling Hadoop's API from Java; that part will be covered in the next post.
Pitfalls
Cluster won't start
JAVA_HOME
Remember to set the JAVA_HOME environment variable, and set it in ~/.bashrc.
Master/worker format (clusterID) mismatch
If the cluster keeps refusing to start, manually delete the files under Hadoop's data paths: in this post that is /home/hadoopuser/hadoop, and usually also the leftovers under /tmp. Then format again.
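As a sketch, the cleanup for the data paths configured earlier in this post (a stale clusterID in a DataNode's VERSION file, left over from an earlier format, is a common cause of workers not joining); run it as hadoopuser on each affected machine:

```bash
# Data paths from the core-site.xml / hdfs-site.xml settings above
HDFS_DIRS=/home/hadoopuser/hadoop/hdfs   # dfs.namenode.name.dir / dfs.datanode.data.dir
TMP_DIR=/home/hadoopuser/hadoop/tmp      # hadoop.tmp.dir
rm -rf "$HDFS_DIRS" "$TMP_DIR"
# then on the master: hdfs namenode -format
```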
No NameNode
Check whether the ports are open; firewalls (and, on cloud servers, the console's security-group rules) are a common culprit.
Check for the master/worker format mismatch described above.
And when in doubt, reboot and try again.
After starting as root
Don't start the cluster as root, especially on a cloud server; mine got riddled with the kswapd0 crypto-mining malware.