Hadoop: Distributed Storage and Computing Framework Explained

Hadoop development hands-on course: https://www.borimooc.com/course/1004.htm

Hadoop is a framework for the distributed storage and distributed computation of massive data sets.

  • Hadoop has three core components:
  1. MapReduce: distributed computation over massive data sets, split into a map phase, a shuffle phase, and a reduce phase
  2. HDFS: a distributed file system suited to storing massive data sets
  3. YARN: the cluster resource scheduler
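The data flow through the three MapReduce phases can be mimicked with a classic Unix word-count pipeline: `tr` plays the map phase (emit one word per line), `sort` plays the shuffle phase (bring identical keys together), and `uniq -c` plays the reduce phase (count each group). This is only an illustration of the idea, not how Hadoop actually executes jobs:

```shell
# map: split the input into one word per line
# shuffle: sort so identical words become adjacent
# reduce: count each group of identical words
printf 'hello world\nhello hadoop\n' | tr -s ' ' '\n' | sort | uniq -c
```

The output lists each distinct word with its count (hadoop 1, hello 2, world 1), which is exactly the result a MapReduce word-count job would produce.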

Before using Hadoop, you first need to understand its installation modes.

  • Hadoop can be installed in three modes:
  1. Standalone mode
  2. Pseudo-Distributed mode
  3. Fully distributed mode (Cluster mode)

They are described below:

  1. Standalone mode: this is Hadoop's default mode. When Hadoop is first extracted from its source package it knows nothing about the hardware environment, so it conservatively chooses a minimal configuration and runs entirely on the local machine. Because it does not need to interact with any other node, it uses neither HDFS nor any Hadoop daemons. Standalone mode requires no services to be started and is generally used only for debugging.
  2. Pseudo-distributed mode: a special case of fully distributed mode in which all of Hadoop's daemon processes run on a single node. It is used to debug the code of distributed Hadoop programs and to verify that they execute correctly.

# 1 User Permissions

All components of the Hadoop cluster are installed as the hd user (uploading, extracting, configuring, starting, stopping, and so on).

root is the Linux superuser and should generally not be used. During cluster installation, the root user is only needed when editing the /etc/profile file.

When installing the Hadoop cluster, always double-check which user you are currently logged in as.
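Besides reading the shell prompt, a quick way to confirm the current user is:

```shell
whoami        # prints the name of the current user, e.g. hd
id -un        # same information via the id command
echo "$HOME"  # the current user's home directory, e.g. /home/hd
```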

```shell
[root@localhost ~]   # current user is root; "localhost" is the hostname; ~ is root's home directory (/root)
[hd@localhost ~]     # current user is hd; ~ is hd's home directory (/home/hd)
```

```Plain Text
root    superuser (for modifying system files)
hd      regular user (full read/write access under /home/hd/*)
```

# 2 Get the Machine's IP Address

```shell
[root@localhost ~]# ifconfig
eno16777736: flags=4163  mtu 1500
        inet 192.168.126.128  netmask 255.255.255.0  broadcast 192.168.126.255
```

If the command for viewing the IP address is missing, install net-tools:

```shell
[hd@localhost root]$ su root
Password:
[root@localhost ~]# yum install -y net-tools
```

## 3 Set the NIC to a Static IP Address

Method 1:

```shell
# switch to the root user
[hd@bogon Desktop]$ su root
Password:
# edit the NIC configuration
[root@bogon Desktop]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# change just these entries:
BOOTPROTO="static"       # changed
ONBOOT="yes"             # changed
IPADDR=192.168.245.20    # changed
# restart the network service
[root@bogon Desktop]# service network restart
# log in again and verify the IP address
[root@bogon Desktop]# ifconfig
```

Method 2:

Set the VM's network connection mode to NAT and look up your gateway in the virtualization software.

Run the `nmtui` command in a terminal to open the NetworkManager text UI:

```shell
[root@local ~]# nmtui
```

Enter the gateway you looked up in the Gateway field, your IP address in the Address field, and 8.8.8.8 as the DNS server. Save the settings and restart the network service.

Ping from the local machine to make sure the network is reachable, then connect remotely with an SSH client.

## 4 Install the Java Environment on Linux

Because Hadoop is written in Java, running a Hadoop cluster likewise depends on a Java environment, so the JDK must be installed and configured before installing the cluster.

### 4.1 Remove the Java Environment Bundled with Linux

```shell
[hd@localhost ~]$ su root
Password:
[root@localhost hd]# yum remove -y java*
```

### 4.2 Upload the Java Package

```shell
[root@localhost hd]# su hd
[hd@localhost ~]$ pwd
/home/hd
[hd@localhost ~]$ mkdir apps    # upload into this directory
[hd@localhost ~]$ cd apps/
[hd@localhost apps]$            # upload the package here
[hd@localhost apps]$ ll
total 178952
-rw-rw-r--. 1 hd hd 183246769 Apr 26  2018 jdk-8u121-linux-x64.tar.gz
```

### 4.3 Extract the Java Package

```shell
# extract
[hd@localhost apps]$ tar -zxvf jdk-8u121-linux-x64.tar.gz
[hd@localhost apps]$ ll
total 178956
drwxr-xr-x. 8 hd hd      4096 Dec 12  2016 jdk1.8.0_121
-rw-rw-r--. 1 hd hd 183246769 Apr 26  2018 jdk-8u121-linux-x64.tar.gz
# rename the directory
[hd@localhost apps]$ mv jdk1.8.0_121/ java
[hd@localhost apps]$ ll
total 178956
drwxr-xr-x. 8 hd hd      4096 Dec 12  2016 java
-rw-rw-r--. 1 hd hd 183246769 Apr 26  2018 jdk-8u121-linux-x64.tar.gz
```

### 4.4 Configure the Java Environment

```shell
[hd@localhost apps]$ su root
Password:
[root@localhost apps]# cd java/
[root@localhost java]# pwd
/home/hd/apps/java
[root@localhost java]# vi /etc/profile
```

Using the vi editor, add the Java environment variables to /etc/profile:

```Plain Text
export JAVA_HOME=/home/hd/apps/java
export PATH=$PATH:$JAVA_HOME/bin
```

Reload the system environment:

```shell
[root@localhost java]# source /etc/profile
[root@localhost java]# java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
```

### 4.5 Configure Java on the Second and Third Machines

Use the scp remote-copy command: `scp [[user@]host1:]file1 [[user@]host2:]file2`

1. Copy the java directory from the first machine to the second machine:

```shell
[root@localhost apps]# su hd
[hd@localhost apps]$ scp -r java hd@192.168.126.129:/home/hd/apps/
```

2. Copy the profile file from the first machine to the second machine:

```shell
[hd@localhost apps]$ su root
Password:
[root@localhost apps]# scp /etc/profile root@192.168.126.129:/etc/
The authenticity of host '192.168.126.129 (192.168.126.129)' can't be established.
ECDSA key fingerprint is fb:0a:7a:9f:9a:bc:4f:ff:66:29:1d:1d:b9:a0:35:d1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.126.129' (ECDSA) to the list of known hosts.
root@192.168.126.129's password:
profile                                      100% 1820     1.8KB/s   00:00
```

3. Load the profile on the second machine:

```shell
[hd@localhost apps]$ source /etc/profile
[hd@localhost apps]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
```

Repeat the steps above on the third machine.

## 5 Preparation Before Installing Hadoop

### 5.1 Change the Hostnames

1. First machine: master
2. Second machine: slave01
3. Third machine: slave02

```shell
[hd@localhost ~]$ hostnamectl set-hostname master
==== AUTHENTICATING FOR org.freedesktop.hostname1.set-static-hostname ===
Authentication is required to set the statically configured local host name, as well as the pretty host name.
Authenticating as: root
Password:
==== AUTHENTICATION COMPLETE ===
```

```shell
[hd@localhost ~]$ hostnamectl set-hostname slave01
```

```shell
[hd@localhost ~]$ hostnamectl set-hostname slave02
```

Method 2:

```shell
[root@localhost ~]# nmtui
```

### 5.2 Edit the /etc/hosts File

```shell
[hd@master ~]$ su root
Password:
[root@master hd]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.126.128 master
192.168.126.129 slave01
192.168.126.130 slave02
```

Sync the file to the second and third machines:

```shell
# second machine
[root@master hd]# scp /etc/hosts root@slave01:/etc/
# third machine
[root@master hd]# scp /etc/hosts root@slave02:/etc/
```

### 5.3 Disable the Firewall

On CentOS 7, firewalld is managed with systemctl (the older `service` command is only a compatibility wrapper):

```shell
systemctl start firewalld      # start
systemctl status firewalld     # check status
systemctl stop firewalld       # stop
systemctl disable firewalld    # disable at boot
systemctl restart firewalld    # restart
```

### 5.4 Passwordless Login

Machines that need passwordless login:

```Plain Text
machine ----> machine (passwordless login)
master  ----> slave01
master  ----> slave02
master  ----> master
```

#### 5.4.1 Generate the Key Pair

```shell
[hd@master ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hd/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hd/.ssh/id_rsa.
Your public key has been saved in /home/hd/.ssh/id_rsa.pub.
The key fingerprint is:
ef:ff:98:6c:a4:66:ca:66:a0:cd:a4:da:75:9c:c0:9f hd@slave02
The key's randomart image is:
+--[ RSA 2048]----+
|   (randomart)   |
+-----------------+
```

#### 5.4.2 Copy the Key to Each Machine That Needs Passwordless Login

```shell
[hd@master ~]$ ssh-copy-id slave02
The authenticity of host 'slave02 (192.168.126.130)' can't be established.
ECDSA key fingerprint is 09:57:a3:56:3b:5f:f0:01:55:0e:42:f3:4c:43:3d:d5.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hd@slave02's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave02'"
and check to make sure that only the key(s) you wanted were added.
```

#### 5.4.3 Test Passwordless Login

```shell
[hd@master ~]$ ssh slave02
```

## 6 Hadoop Installation

### 6.1 Upload the Hadoop Installation Package

Upload hadoop-2.8.1.tar.gz to /home/hd/apps, the same way the JDK was uploaded in section 4.2.

### 6.2 Extract the Installation Package

```shell
[hd@master apps]$ su hd
Password:
[hd@master apps]$ pwd
/home/hd/apps
[hd@master apps]$ tar -zxvf hadoop-2.8.1.tar.gz
```

### 6.3 Rename the Directory

```shell
[hd@master apps]$ mv hadoop-2.8.1 hadoop
[hd@master apps]$ ll
```

### 6.4 Edit the Hadoop Configuration Files

#### 6.4.1 Edit hadoop-env.sh

The file lives under hadoop/etc/hadoop/. At the end of the file (press "G" in vi to jump to the end of the document), add:

```shell
export JAVA_HOME=/home/hd/apps/java
```

#### 6.4.2 Edit core-site.xml
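The post is cut off at this point. For reference, a minimal core-site.xml for this kind of setup typically sets `fs.defaultFS` to the NameNode's address. The hostname `master` follows section 5.1; the port 9000 and the `hadoop.tmp.dir` path are assumptions for illustration, not values from the original text:

```xml
<configuration>
    <!-- Address of the HDFS NameNode; "master" per section 5.1, port 9000 assumed -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Base directory for Hadoop's working data; path under the hd user is an assumption -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hd/apps/hadoop/tmpdata</value>
    </property>
</configuration>
```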
