Hadoop Big Data Applications: Expanding an HDFS Cluster with a New Node

Contents

I. Experiment

1. Environment

2. Expanding the HDFS cluster with a new node

II. Problems

1. rsync synchronization error


I. Experiment

1. Environment

(1) Hosts

Table 1. Hosts

| Host | Roles | Software | Version | IP | Notes |
|--------|-------------------------------------------------------------|--------|-------|----------------|----|
| hadoop | NameNode (deployed), SecondaryNameNode (deployed), ResourceManager (deployed) | hadoop | 2.7.7 | 192.168.204.50 | |
| node01 | DataNode (deployed), NodeManager (deployed) | hadoop | 2.7.7 | 192.168.204.51 | |
| node02 | DataNode (deployed), NodeManager (deployed) | hadoop | 2.7.7 | 192.168.204.52 | |
| node03 | DataNode (deployed), NodeManager (deployed) | hadoop | 2.7.7 | 192.168.204.53 | |
| node04 | DataNode | hadoop | 2.7.7 | 192.168.204.54 | |

(2) Check the running daemons with jps

On the hadoop node:

[root@hadoop hadoop]# jps

Run the same check on node01, node02, and node03.

(3) List the nodes

[root@hadoop hadoop]# ./bin/yarn node -list
24/03/14 13:40:21 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.204.50:8032
Total Nodes:3
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
    node01:40551                RUNNING       node01:8042                                  0
    node02:46073                RUNNING       node02:8042                                  0
    node03:40601                RUNNING       node03:8042                                  0

2. Expanding the HDFS cluster with a new node

(1) Check the IP address

The new node's address is 192.168.204.54.

[root@localhost ~]# ip addr

(2) Security mechanism (SELinux)

Check the status:

[root@localhost ~]# sestatus

Disable it:

[root@localhost ~]# vim /etc/selinux/config
......
SELINUX=disabled
......

Check again (the change takes effect only after a reboot):

[root@localhost ~]# sestatus
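
The change to /etc/selinux/config can also be made without opening an editor. The following is a minimal sketch, assuming the file contains the usual single `SELINUX=` line; the helper name `set_selinux_mode` is invented here for illustration and is not part of the original walkthrough. `setenforce 0` is the standard way to switch a running, SELinux-enabled system to permissive mode until the reboot.

```bash
# Hypothetical helper (not from the original article): rewrite the SELINUX=
# line of a config file in place. Requires GNU sed for the -i option.
# Usage: set_selinux_mode <config-file> <enforcing|permissive|disabled>
set_selinux_mode() {
    local file=$1 mode=$2
    # Only the SELINUX= line matches; SELINUXTYPE= is left untouched.
    sed -i "s/^SELINUX=.*/SELINUX=${mode}/" "$file"
}

# On the new node one would run, as root:
#   set_selinux_mode /etc/selinux/config disabled
#   setenforce 0    # permissive until the next reboot
```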

(3) Firewall

Stop and mask it:

[root@localhost ~]# systemctl stop firewalld
[root@localhost ~]# systemctl mask firewalld

(4) Install Java

[root@localhost ~]# yum install -y java-1.8.0-openjdk-devel.x86_64

Verify:

[root@localhost ~]# jps

(5) Change the hostname

[root@localhost ~]# hostnamectl set-hostname node04
[root@localhost ~]# bash

(6) Set up passwordless SSH login

[root@hadoop ~]# cd /root/.ssh/
[root@hadoop .ssh]# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
[root@hadoop .ssh]# ssh-copy-id -i id_rsa.pub 192.168.204.54

Verify:

[root@hadoop .ssh]# ssh 192.168.204.54

(7) Map hostnames to IP addresses (on the hadoop node)

[root@hadoop ~]# vim /etc/hosts
......
192.168.204.50 hadoop
192.168.204.51 node01
192.168.204.52 node02
192.168.204.53 node03
192.168.204.54 node04
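
Before distributing the file, it can help to confirm that every cluster hostname actually has an entry. The following is a small sketch; `missing_hosts` is a hypothetical helper introduced for illustration, not something the walkthrough itself uses.

```bash
# Hypothetical helper: print each hostname that has no entry in a hosts file.
# Usage: missing_hosts <hosts-file> <name>...
missing_hosts() {
    local file=$1 name
    shift
    for name in "$@"; do
        # Look for the name as a whole field after the IP column,
        # ignoring lines that are commented out.
        awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$file" || echo "$name"
    done
}

# Example: missing_hosts /etc/hosts hadoop node01 node02 node03 node04
```

No output means all the listed names are present.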

(8) Synchronize the hosts file

[root@hadoop ~]# rsync -av /etc/hosts node01:/etc/
sending incremental file list
hosts

sent 359 bytes  received 41 bytes  266.67 bytes/sec
total size is 269  speedup is 0.67
[root@hadoop ~]# rsync -av /etc/hosts node02:/etc/
sending incremental file list
hosts

sent 359 bytes  received 41 bytes  800.00 bytes/sec
total size is 269  speedup is 0.67
[root@hadoop ~]# rsync -av /etc/hosts node03:/etc/
sending incremental file list
hosts

sent 359 bytes  received 41 bytes  800.00 bytes/sec
total size is 269  speedup is 0.67
[root@hadoop ~]# rsync -av /etc/hosts node04:/etc/
Warning: Permanently added 'node04' (ECDSA) to the list of known hosts.
sending incremental file list
hosts

sent 359 bytes  received 41 bytes  266.67 bytes/sec
total size is 269  speedup is 0.67

(9) Synchronize the Hadoop files

[root@hadoop ~]# rsync -aXSH --delete /usr/local/hadoop node04:/usr/local/

(10) Clear the logs (on node04)

[root@node04 ~]# cd /usr/local/hadoop/
[root@node04 hadoop]# ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
[root@node04 hadoop]# cd logs/
[root@node04 logs]# ls
hadoop-root-namenode-hadoop.log    hadoop-root-secondarynamenode-hadoop.log    SecurityAuth-root.audit
hadoop-root-namenode-hadoop.out    hadoop-root-secondarynamenode-hadoop.out    yarn-root-resourcemanager-hadoop.log
hadoop-root-namenode-hadoop.out.1  hadoop-root-secondarynamenode-hadoop.out.1  yarn-root-resourcemanager-hadoop.out
[root@node04 logs]# rm -f *
[root@node04 logs]# ls

(11) View the slaves file (on the hadoop node)

[root@hadoop ~]# cd /usr/local/hadoop/etc/hadoop/
[root@hadoop hadoop]# cat slaves

(12) Add node04 to the slaves file

[root@hadoop hadoop]# vim slaves
node01
node02
node03
node04
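
The same edit can be scripted so that running it twice does not add a duplicate line. This is a sketch under the assumption that the slaves file holds one hostname per line and ends with a newline; `add_slave` is a hypothetical helper name.

```bash
# Hypothetical helper: append a hostname to the slaves file only if it is
# not already listed, so repeated runs do not create duplicates.
# Usage: add_slave <slaves-file> <hostname>
add_slave() {
    local file=$1 host=$2
    # -x matches the whole line, -F treats the hostname as a literal string.
    grep -qxF "$host" "$file" || echo "$host" >> "$file"
}

# Example: add_slave /usr/local/hadoop/etc/hadoop/slaves node04
```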

(13) Synchronize the configuration to all hosts

[root@hadoop hadoop]# rsync -aXSH --delete /usr/local/hadoop/etc node01:/usr/local/hadoop/
[root@hadoop hadoop]# rsync -aXSH --delete /usr/local/hadoop/etc node02:/usr/local/hadoop/
[root@hadoop hadoop]# rsync -aXSH --delete /usr/local/hadoop/etc node03:/usr/local/hadoop/
[root@hadoop hadoop]# rsync -aXSH --delete /usr/local/hadoop/etc node04:/usr/local/hadoop/
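
The four near-identical commands can be folded into a loop over the slaves file. The following is a sketch under stated assumptions: `sync_conf` and the `DRY_RUN` switch are illustrative names, and the rsync options and paths are the ones used above.

```bash
# Hypothetical helper: push the Hadoop etc directory to every host listed in
# a slaves file. With DRY_RUN=1 the rsync commands are printed, not executed.
# Usage: sync_conf <slaves-file>
sync_conf() {
    local slaves=$1 host
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue      # skip blank lines
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "rsync -aXSH --delete /usr/local/hadoop/etc ${host}:/usr/local/hadoop/"
        else
            # </dev/null keeps ssh from swallowing the rest of the host list.
            rsync -aXSH --delete /usr/local/hadoop/etc "${host}:/usr/local/hadoop/" < /dev/null
        fi
    done < "$slaves"
}

# Example: DRY_RUN=1 sync_conf /usr/local/hadoop/etc/hadoop/slaves
```

Running it once with `DRY_RUN=1` shows exactly which hosts would be touched before anything is copied.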

(14) Start the service (on node04)

[root@node04 hadoop]# ./sbin/hadoop-daemon.sh start datanode

Check the result with jps.

(15) Verify (on the hadoop node)

Check the report: Live datanodes should now show 4 nodes.

[root@hadoop hadoop]# ./bin/hdfs dfsadmin -report
Configured Capacity: 822126559232 (765.67 GB)
Present Capacity: 798787727360 (743.93 GB)
DFS Remaining: 798786990080 (743.93 GB)
DFS Used: 737280 (720 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (4):

Name: 192.168.204.54:50010 (node04)
Hostname: node04
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 5658746880 (5.27 GB)
DFS Remaining: 199872888832 (186.15 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.25%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 15:00:23 CST 2024


Name: 192.168.204.53:50010 (node03)
Hostname: node03
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 266240 (260 KB)
Non DFS Used: 5621547008 (5.24 GB)
DFS Remaining: 199909826560 (186.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.26%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 15:00:24 CST 2024


Name: 192.168.204.51:50010 (node01)
Hostname: node01
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 180224 (176 KB)
Non DFS Used: 6029209600 (5.62 GB)
DFS Remaining: 199502249984 (185.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 15:00:22 CST 2024


Name: 192.168.204.52:50010 (node02)
Hostname: node02
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 286720 (280 KB)
Non DFS Used: 6029328384 (5.62 GB)
DFS Remaining: 199502024704 (185.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 15:00:25 CST 2024
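
For a scripted check, the node count can be pulled out of the report instead of being read by eye. This is a minimal sketch that relies only on the `Live datanodes (N):` line shown in the output above; `live_datanodes` is a hypothetical helper name.

```bash
# Hypothetical helper: read an "hdfs dfsadmin -report" listing on stdin and
# print the number from its "Live datanodes (N):" line.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Example, run from the Hadoop directory on the hadoop node:
#   ./bin/hdfs dfsadmin -report | live_datanodes
```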

(16) View the available commands

The option for setting the balancer bandwidth is -setBalancerBandwidth.

[root@hadoop hadoop]# ./bin/hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
        [-report [-live] [-dead] [-decommissioning]]
        [-safemode <enter | leave | get | wait>]
        [-saveNamespace]
        [-rollEdits]
        [-restoreFailedStorage true|false|check]
        [-refreshNodes]
        [-setQuota <quota> <dirname>...<dirname>]
        [-clrQuota <dirname>...<dirname>]
        [-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
        [-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
        [-finalizeUpgrade]
        [-rollingUpgrade [<query|prepare|finalize>]]
        [-refreshServiceAcl]
        [-refreshUserToGroupsMappings]
        [-refreshSuperUserGroupsConfiguration]
        [-refreshCallQueue]
        [-refresh <host:ipc_port> <key> [arg1..argn]
        [-reconfig <datanode|...> <host:ipc_port> <start|status>]
        [-printTopology]
        [-refreshNamenodes datanode_host:ipc_port]
        [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
        [-setBalancerBandwidth <bandwidth in bytes per second>]
        [-fetchImage <local directory>]
        [-allowSnapshot <snapshotDir>]
        [-disallowSnapshot <snapshotDir>]
        [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
        [-getDatanodeInfo <datanode_host:ipc_port>]
        [-metasave filename]
        [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
        [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

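(17) Set the bandwidth and balance the data

The value is given in bytes per second. Appending 000 to a number gives kilobytes and appending 000000 gives megabytes, so 500 followed by 000000 (500000000) means 500 MB per second.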

[root@hadoop hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 500000000
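
Counting zeros by hand is easy to get wrong, so the conversion can be left to the shell. This is a small sketch using decimal megabytes, as the step above does; `mb_to_bytes` is a hypothetical helper name.

```bash
# Hypothetical helper: convert decimal megabytes per second to the
# bytes-per-second value that -setBalancerBandwidth expects.
mb_to_bytes() {
    echo $(( $1 * 1000 * 1000 ))
}

# Example: ./bin/hdfs dfsadmin -setBalancerBandwidth "$(mb_to_bytes 500)"
```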

Run the balancer script:

[root@hadoop hadoop]# ./sbin/start-balancer.sh

(18) Check the status

DFS Used shows how much space HDFS is using, for the cluster as a whole and for each node.

[root@hadoop hadoop]# ./bin/hdfs dfsadmin -report
Configured Capacity: 822126559232 (765.67 GB)
Present Capacity: 798788423680 (743.93 GB)
DFS Remaining: 798787682304 (743.93 GB)
DFS Used: 741376 (724 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (4):

Name: 192.168.204.54:50010 (node04)
Hostname: node04
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 5658730496 (5.27 GB)
DFS Remaining: 199872901120 (186.15 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.25%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 15:16:33 CST 2024


Name: 192.168.204.53:50010 (node03)
Hostname: node03
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 266240 (260 KB)
Non DFS Used: 5620936704 (5.23 GB)
DFS Remaining: 199910436864 (186.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.27%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 15:16:33 CST 2024


Name: 192.168.204.51:50010 (node01)
Hostname: node01
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 180224 (176 KB)
Non DFS Used: 6029176832 (5.62 GB)
DFS Remaining: 199502282752 (185.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 15:16:34 CST 2024


Name: 192.168.204.52:50010 (node02)
Hostname: node02
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 286720 (280 KB)
Non DFS Used: 6029291520 (5.62 GB)
DFS Remaining: 199502061568 (185.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 15:16:34 CST 2024
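
To compare nodes at a glance, the per-node DFS Used figures can be extracted from the report. This is a sketch that assumes the report layout shown above, where each datanode section has a `Hostname:` line followed by a `DFS Used:` line; `dfs_used_per_node` is a hypothetical helper name.

```bash
# Hypothetical helper: read an "hdfs dfsadmin -report" listing on stdin and
# print "<hostname> <DFS used bytes>" for each datanode section.
dfs_used_per_node() {
    awk '
        /^Hostname:/                { host = $2 }
        # The cluster-wide "DFS Used:" line comes before any Hostname line,
        # so it is skipped; "DFS Used%:" does not match this pattern.
        /^DFS Used:/ && host != ""  { print host, $3; host = "" }
    '
}

# Example: ./bin/hdfs dfsadmin -report | dfs_used_per_node
```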

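II. Problems

1. rsync synchronization error

(1) Error

(2) Cause

The target hostname in the rsync command was wrong.

(3) Solution

Correct the target hostname in the rsync command.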

[root@hadoop ~]# rsync -av /etc/hosts node01:/etc/