2024 World Vocational Skills Competition: Building a Hadoop Big Data Platform (Container Environment)

Task A: Big Data Platform Setup (Container Environment) (15 points)

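Environment notes:

See each task's server-side notes for the server login address.
Additional notes: the host machine can be reached over SSH using the Asbru tool or any SSH client;
The installation packages are in the host's /opt directory; install the ones you need and ignore the rest;
All commands used in the tasks must use absolute paths;
To enter the Master node:
docker exec -it master /bin/bash
To enter the Slave1 node:
docker exec -it slave1 /bin/bash
To enter the Slave2 node:
docker exec -it slave2 /bin/bash
The root password on all three container nodes is 123456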

Subtask 1: Fully Distributed Hadoop Installation and Configuration

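Scoring criteria
Key knowledge and skill points                                     Points
JDK extraction and installation                                    1
JDK environment variable configuration                             1
Hosts configuration and distribution to the three nodes            1
Hadoop extraction, installation, and environment initialization    2
Hadoop cluster startup and verification                            2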

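This task must be completed as the root user. Hadoop needs some prerequisite environment setup before it can be installed. Commands must use absolute paths. The specific requirements are as follows: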

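Task 1

1. From the host's /opt directory, copy hadoop-3.1.3.tar.gz and jdk-8u212-linux-x64.tar.gz to /opt/software in the Master container (create the path if it does not exist). Extract the JDK package on the Master node into /opt/module (create the path if it does not exist). Copy the JDK extraction command and paste it under the corresponding task number in [Release\任务A提交结果.docx] (the Task A submission document) on the client desktop;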

Connect to the host from a terminal
Check that the containers are running

docker ps -a

Connect to master

docker exec -it master /bin/bash

Check whether /opt/software exists
[root@master ~]# ls /opt/
module  software
Copy the packages from the host into the container
[root@Bigdata ~]# docker cp /opt/jdk-8u212-linux-x64.tar.gz master:/opt/software

Successfully copied 195MB to master:/opt/software

[root@Bigdata ~]# docker cp /opt/hadoop-3.1.3.tar.gz master:/opt/software

Successfully copied 338MB to master:/opt/software

Extract the JDK package on the Master node into /opt/module
tar zxvf /opt/software/jdk-8u212-linux-x64.tar.gz -C /opt/module/
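
The task statement asks for /opt/software and /opt/module to be created if they are missing; in this environment the ls check shows both already exist. A minimal sketch of a defensive version, where the helper name ensure_dirs is hypothetical and not part of the original walkthrough, could be run inside the master container before the copy and extraction steps:

```shell
# Hypothetical helper: create each directory only if it is missing.
# mkdir -p succeeds silently for directories that already exist.
ensure_dirs() {
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
    done
}

# Inside the master container this would be:
#   ensure_dirs /opt/software /opt/module
```

Creating the target first matters for docker cp: when the destination path does not exist, docker cp saves the source file under that name instead of placing it inside a directory.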
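
Task 2

2. Edit the /etc/profile file in the container to set the JDK environment variables and make them take effect. Then run "java -version" and "javac" on the Master node, and paste a screenshot of each command's output under the corresponding task number in [Release\任务A提交结果.docx] on the client desktop;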

Rename the JDK directory
mv /opt/module/jdk1.8.0_212 /opt/module/java
Append the environment variables to the end of /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/java

#PATH
export PATH=$PATH:$JAVA_HOME/bin
Apply the environment variables
source /etc/profile
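
The edit to /etc/profile can also be scripted rather than typed in vi. The following is only a sketch under stated assumptions: the helper name append_java_env is hypothetical, and the guard simply checks whether an "export JAVA_HOME=" line is already present so that repeated runs do not add duplicates.

```shell
# Hypothetical helper: append the JDK variables to a profile file once.
# The grep guard keeps repeated runs from adding duplicate lines.
append_java_env() {
    profile="$1"
    if ! grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        cat >> "$profile" <<'EOF'

#JAVA_HOME
export JAVA_HOME=/opt/module/java

#PATH
export PATH=$PATH:$JAVA_HOME/bin
EOF
    fi
}

# On the master node this would be:
#   append_java_env /etc/profile
# followed by the source /etc/profile step.
```

The quoted heredoc delimiter ('EOF') keeps $PATH and $JAVA_HOME literal in the file, so they are expanded when the profile is sourced rather than when the helper runs.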
Run java -version
java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b10, mixed mode)
Run javac
Usage: javac <options> <source files>
where possible options include:
  -g                         Generate all debugging info
  -g:none                    Generate no debugging info
  -g:{lines,vars,source}     Generate only some debugging info
  -nowarn                    Generate no warnings
  -verbose                   Output messages about what the compiler is doing
  -deprecation               Output source locations where deprecated APIs are used
  -classpath <path>          Specify where to find user class files and annotation processors
  -cp <path>                 Specify where to find user class files and annotation processors
  -sourcepath <path>         Specify where to find input source files
  -bootclasspath <path>      Override location of bootstrap class files
  -extdirs <dirs>            Override location of installed extensions
  -endorseddirs <dirs>       Override location of endorsed standards path
  -proc:{none,only}          Control whether annotation processing and/or compilation is done.
  -processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
  -processorpath <path>      Specify where to find annotation processors
  -parameters                Generate metadata for reflection on method parameters
  -d <directory>             Specify where to place generated class files
  -s <directory>             Specify where to place generated source files
  -h <directory>             Specify where to place generated native header files
  -implicit:{none,class}     Specify whether or not to generate class files for implicitly referenced files
  -encoding <encoding>       Specify character encoding used by source files
  -source <release>          Provide source compatibility with specified release
  -target <release>          Generate class files for specific VM version
  -profile <profile>         Check that API used is available in the specified profile
  -version                   Version information
  -help                      Print a synopsis of standard options
  -Akey[=value]              Options to pass to annotation processors
  -X                         Print a synopsis of nonstandard options
  -J<flag>                   Pass <flag> directly to the runtime system
  -Werror                    Terminate compilation if warnings occur
  @<filename>                Read options and filenames from file
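
Task 3

3. Complete the hosts configuration, naming the three nodes master, slave1, and slave2, and set up passwordless SSH login. Using scp with absolute paths, copy the extracted JDK installation from Master to the slave1 and slave2 nodes (create the path if it does not exist), and configure the environment variables on slave1 and slave2. Copy all of the scp commands used to copy the JDK and paste them under the corresponding task number in [Release\任务A提交结果.docx] on the client desktop;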

Enter each of the three containers and check its IP address

Run ifconfig

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.102  netmask 255.255.255.0  broadcast 192.168.100.255
        ether 00:50:56:80:3e:d7  txqueuelen 0  (Ethernet)
        RX packets 6934  bytes 575269 (561.7 KiB)
        RX errors 0  dropped 334  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
Edit the /etc/hosts file: vi /etc/hosts

Append the following at the end

192.168.100.102 master
192.168.100.103 slave1
192.168.100.104 slave2
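
As an alternative to editing the file by hand, the three entries can be appended with a small script. This is a sketch only: the helper name add_host is hypothetical, and it skips an entry when the exact "IP hostname" line is already present.

```shell
# Hypothetical helper: append "IP hostname" to a hosts file unless that
# exact line is already there (grep -x matches whole lines, -F literally).
add_host() {
    hosts_file="$1"; ip="$2"; name="$3"
    entry="$ip $name"
    grep -qxF "$entry" "$hosts_file" 2>/dev/null || printf '%s\n' "$entry" >> "$hosts_file"
}

# On the master node this would be:
#   add_host /etc/hosts 192.168.100.102 master
#   add_host /etc/hosts 192.168.100.103 slave1
#   add_host /etc/hosts 192.168.100.104 slave2
```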
Set up passwordless login

Run ssh-keygen and press Enter at each prompt

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:a5KqXjGa6r1CO1pe9cG9bR3Pp2om6BstpOB9l6SW24E root@master
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|                 |
|                 |
|       . .       |
|    + . S o   .  |
| . + * = O + . + |
|. = + = E.* o . +|
| B.o . =.*.oo  ..|
|=o*+o  .+..+...  |
+----[SHA256]-----+
Copy the public key to each node
ssh-copy-id master

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"

The authenticity of host 'master (192.168.100.102)' can't be established.

RSA key fingerprint is SHA256:nuf/qVhd2k6k0u5t7GvhylqRi+4xMC3MNGmJKJNXipo.

RSA key fingerprint is MD5:e6:fd:96:10:3f:2c:9a:68:40:cc:d7:7c:e2:ee:6e:67.

Are you sure you want to continue connecting (yes/no)? yes

/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

root@master's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'master'"

and check to make sure that only the key(s) you wanted were added.

ssh-copy-id slave1

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"

The authenticity of host 'slave1 (192.168.100.103)' can't be established.

RSA key fingerprint is SHA256:nuf/qVhd2k6k0u5t7GvhylqRi+4xMC3MNGmJKJNXipo.

RSA key fingerprint is MD5:e6:fd:96:10:3f:2c:9a:68:40:cc:d7:7c:e2:ee:6e:67.

Are you sure you want to continue connecting (yes/no)? yes

/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

root@slave1's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'slave1'"

and check to make sure that only the key(s) you wanted were added.

ssh-copy-id slave2

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"

The authenticity of host 'slave2 (192.168.100.104)' can't be established.

RSA key fingerprint is SHA256:nuf/qVhd2k6k0u5t7GvhylqRi+4xMC3MNGmJKJNXipo.

RSA key fingerprint is MD5:e6:fd:96:10:3f:2c:9a:68:40:cc:d7:7c:e2:ee:6e:67.

Are you sure you want to continue connecting (yes/no)? yes

/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

root@slave2's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'slave2'"

and check to make sure that only the key(s) you wanted were added.

Copy the hosts file to the slave machines
scp /etc/hosts slave1:/etc/

hosts 100% 219 218.0KB/s 00:00

scp /etc/hosts slave2:/etc/

hosts 100% 219 218.0KB/s 00:00

Set up passwordless login on the two slave machines as well
ssh-keygen
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
Copy the JDK to slave1 and slave2
scp -rq /opt/module/java slave1:/opt/module
scp -rq /opt/module/java slave2:/opt/module
Copy the environment variables
scp /etc/profile slave1:/etc/profile
scp /etc/profile slave2:/etc/profile
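
The task says the target path should be created if it does not exist, and the per-node copies repeat the same command. A sketch of a loop that covers both is shown here; the helper name distribute and the DRY_RUN switch are hypothetical conveniences, not part of the original walkthrough. The task still asks for the literal scp commands to be submitted, so the explicit commands remain what goes into the answer document.

```shell
# Hypothetical helper: for every listed node, make sure the destination
# directory exists, then copy the source path into it with scp.
# With DRY_RUN=1 the commands are only printed, not executed.
distribute() {
    src="$1"; dest="$2"; shift 2
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh $node mkdir -p $dest"
            echo "scp -rq $src $node:$dest"
        else
            ssh "$node" mkdir -p "$dest" || return 1
            scp -rq "$src" "$node:$dest" || return 1
        fi
    done
}

# Preview the commands without copying anything:
#   DRY_RUN=1 distribute /opt/module/java /opt/module slave1 slave2
```

Creating the directory first is more than a formality: if /opt/module were missing on a slave, scp -r would copy the contents of java to a new /opt/module instead of to /opt/module/java.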
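
Task 4

4. On Master, extract Hadoop into the /opt/module directory (create the path if it does not exist) and distribute the extracted directory to slave1 and slave2. The master, slave1, and slave2 nodes all act as DataNodes. Configure the environment, then initialize the Hadoop environment by formatting the NameNode. Paste the initialization command and a screenshot of the result (the last 20 lines of the initialization log are enough) under the corresponding task number in [Release\任务A提交结果.docx] on the client desktop;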

Extract Hadoop
[root@master ~]# tar zxvf /opt/software/hadoop-3.1.3.tar.gz -C /opt/module/
Rename the directory
[root@master ~]# mv /opt/module/hadoop-3.1.3 /opt/module/hadoop
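Configure the environment: vi /etc/profile
Add the following lines (if the PATH line from Task 2 is still in the file, replace it with this one so that $JAVA_HOME/bin is not appended twice)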
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop

#PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply the environment
source /etc/profile
Configure the Hadoop files
core-site.xml
 vi /opt/module/hadoop/etc/hadoop/core-site.xml
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>

    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop/data</value>
    </property>
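
The property fragments shown for core-site.xml (and for the other *-site.xml files) belong inside each file's top-level <configuration> element. As a sketch, with the helper name write_core_site being hypothetical, the complete file could be generated like this. Note that it overwrites the target file, which is reasonable here only because the stock core-site.xml contains no properties of its own.

```shell
# Hypothetical helper: write a complete core-site.xml to the given path,
# wrapping the two properties in the required <configuration> root.
write_core_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>

    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop/data</value>
    </property>
</configuration>
EOF
}

# On the master node this would be:
#   write_core_site /opt/module/hadoop/etc/hadoop/core-site.xml
```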
hdfs-site.xml
 vi /opt/module/hadoop/etc/hadoop/hdfs-site.xml
    <!-- NameNode web UI address -->
	<property>
        <name>dfs.namenode.http-address</name>
        <value>master:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave2:9868</value>
    </property>
yarn-site.xml
vi /opt/module/hadoop/etc/hadoop/yarn-site.xml
    <!-- Use mapreduce_shuffle as the NodeManager auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>

    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
mapred-site.xml
vi /opt/module/hadoop/etc/hadoop/mapred-site.xml
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
hadoop-env.sh
vi /opt/module/hadoop/etc/hadoop/hadoop-env.sh

Append the following lines to the end of this file
export JAVA_HOME=/opt/module/java
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
workers
vi /opt/module/hadoop/etc/hadoop/workers

Delete the localhost entry and enter the following instead
master
slave1
slave2
Distribute to the cluster
Distribute the environment variables
[root@master ~]# scp /etc/profile slave1:/etc/
[root@master ~]# scp /etc/profile slave2:/etc/
Distribute Hadoop
[root@master ~]# scp -rq /opt/module/hadoop slave1:/opt/module/
[root@master ~]# scp -rq /opt/module/hadoop slave2:/opt/module/
Initialize the Hadoop environment by formatting the NameNode
[root@master ~]# hdfs namenode -format
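
The task only asks for the last 20 lines of the initialization log. One way to capture just those lines is sketched below; the helper name last20 is hypothetical. It merges standard error into standard output before piping, on the assumption that the format log is written to the console's error stream, as Hadoop's default console logging typically is.

```shell
# Hypothetical helper: run a command and keep only the last 20 lines of
# its combined standard output and standard error.
last20() {
    "$@" 2>&1 | tail -n 20
}

# On the master node this would be:
#   last20 hdfs namenode -format
```

This suits a first-time format. If the storage directory has already been formatted, the command asks for confirmation, and the pipe would hide that prompt until the command exits.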
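
Task 5

5. Start the Hadoop cluster (both HDFS and YARN), use the jps command to view the Java processes on the Master node and the slave1 node, and paste a screenshot of the jps command and its output under the corresponding task number in [Release\任务A提交结果.docx] on the client desktop.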

Start the Hadoop cluster
[root@master ~]# start-all.sh 

Or

[root@master ~]#  /opt/module/hadoop/sbin/start-all.sh 
Use the jps command to view the Java processes on the Master node and the slave1 node
[root@master ~]# jps
[root@slave1 ~]# jps
12161 Jps
11112 NodeManager
10846 DataNode
[root@slave2 ~]# jps
10451 DataNode
10821 NodeManager
10582 SecondaryNameNode
11944 Jps
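
The jps listings can be checked against the daemons each node is expected to run. The sketch below uses a hypothetical helper, missing_daemons, that reads jps output on standard input and prints any expected daemon that is absent. The master list in the usage comment is inferred from the configuration above (fs.defaultFS and yarn.resourcemanager.hostname point at master, and master is listed in workers); it is not taken from captured output.

```shell
# Hypothetical helper: read jps output on stdin and print every expected
# daemon name (given as arguments) that does not appear as a process name.
# Matching on the whole second column keeps NameNode from matching
# SecondaryNameNode.
missing_daemons() {
    jps_out=$(cat)
    for daemon in "$@"; do
        printf '%s\n' "$jps_out" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Assumed expectations, derived from the configuration files:
#   master:  jps | missing_daemons NameNode DataNode ResourceManager NodeManager
#   slave1:  jps | missing_daemons DataNode NodeManager
#   slave2:  jps | missing_daemons DataNode NodeManager SecondaryNameNode
```

No output means every expected daemon was found.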

Check that the web UIs are working (the NameNode UI is at master:9870, as configured in hdfs-site.xml)
