Processing and Analysis of Telecom Customer Service Data on the Hadoop Platform (3) Project Development: Building a Fully Distributed Hadoop Cluster --- Task 7: Format and Start the Hadoop Cluster

Task Description

The task is to format and start the Hadoop cluster, and to fix any bugs that may come up along the way.

Task Guidance

Before the Hadoop cluster can be started, the metadata must be formatted on the NameNode; HDFS and YARN can only be started after the format succeeds.

The steps to format and start the Hadoop cluster are as follows:

  1. Format the Hadoop metadata on the NameNode (master1); this only needs to be run once, the first time the cluster is started

  2. Start the HDFS cluster

  3. Start the YARN cluster

Task Implementation

  1. Format the cluster's NameNode (run on master1)

    [root@master1 ~]# hdfs namenode -format

The output is as follows:

[root@master1 ~]# hdfs namenode -format
23/10/18 08:57:10 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master1/192.168.3.129
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.10.1
STARTUP_MSG:   classpath = ...略
STARTUP_MSG:   build = https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816; compiled by 'centos' on 2020-09-14T13:17Z
STARTUP_MSG:   java = 1.8.0_181
************************************************************/
23/10/18 08:57:10 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
23/10/18 08:57:10 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-c67c639d-7eec-459d-9b60-3d5e696ccce8
23/10/18 08:57:10 INFO namenode.FSEditLog: Edit logging is async:true
23/10/18 08:57:10 INFO namenode.FSNamesystem: KeyProvider: null
23/10/18 08:57:10 INFO namenode.FSNamesystem: fsLock is fair: true
23/10/18 08:57:10 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
23/10/18 08:57:10 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
23/10/18 08:57:10 INFO namenode.FSNamesystem: supergroup          = supergroup
23/10/18 08:57:10 INFO namenode.FSNamesystem: isPermissionEnabled = false
23/10/18 08:57:10 INFO namenode.FSNamesystem: HA Enabled: false
23/10/18 08:57:10 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
23/10/18 08:57:10 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
23/10/18 08:57:10 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
23/10/18 08:57:10 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
23/10/18 08:57:10 INFO blockmanagement.BlockManager: The block deletion will start around 2023 Oct 18 08:57:10
23/10/18 08:57:10 INFO util.GSet: Computing capacity for map BlocksMap
23/10/18 08:57:10 INFO util.GSet: VM type       = 64-bit
23/10/18 08:57:10 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
23/10/18 08:57:10 INFO util.GSet: capacity      = 2^21 = 2097152 entries
23/10/18 08:57:10 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
23/10/18 08:57:10 WARN conf.Configuration: No unit for dfs.heartbeat.interval(3) assuming SECONDS
23/10/18 08:57:10 WARN conf.Configuration: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
23/10/18 08:57:10 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
23/10/18 08:57:10 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
23/10/18 08:57:10 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
23/10/18 08:57:10 INFO blockmanagement.BlockManager: defaultReplication         = 2
23/10/18 08:57:10 INFO blockmanagement.BlockManager: maxReplication             = 512
23/10/18 08:57:10 INFO blockmanagement.BlockManager: minReplication             = 1
23/10/18 08:57:10 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
23/10/18 08:57:10 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
23/10/18 08:57:10 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
23/10/18 08:57:10 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
23/10/18 08:57:10 INFO namenode.FSNamesystem: Append Enabled: true
23/10/18 08:57:10 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
23/10/18 08:57:10 INFO util.GSet: Computing capacity for map INodeMap
23/10/18 08:57:10 INFO util.GSet: VM type       = 64-bit
23/10/18 08:57:10 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
23/10/18 08:57:10 INFO util.GSet: capacity      = 2^20 = 1048576 entries
23/10/18 08:57:10 INFO namenode.FSDirectory: ACLs enabled? false
23/10/18 08:57:10 INFO namenode.FSDirectory: XAttrs enabled? true
23/10/18 08:57:10 INFO namenode.NameNode: Caching file names occurring more than 10 times
23/10/18 08:57:10 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
23/10/18 08:57:10 INFO util.GSet: Computing capacity for map cachedBlocks
23/10/18 08:57:10 INFO util.GSet: VM type       = 64-bit
23/10/18 08:57:10 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
23/10/18 08:57:10 INFO util.GSet: capacity      = 2^18 = 262144 entries
23/10/18 08:57:10 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
23/10/18 08:57:10 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
23/10/18 08:57:10 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
23/10/18 08:57:10 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
23/10/18 08:57:10 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
23/10/18 08:57:10 INFO util.GSet: Computing capacity for map NameNodeRetryCache
23/10/18 08:57:10 INFO util.GSet: VM type       = 64-bit
23/10/18 08:57:10 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
23/10/18 08:57:10 INFO util.GSet: capacity      = 2^15 = 32768 entries
23/10/18 08:57:10 INFO namenode.FSImage: Allocated new BlockPoolId: BP-894844368-192.168.3.129-1697619430610
23/10/18 08:57:10 INFO common.Storage: Storage directory /opt/app/hadoop_path/hdfs/name has been successfully formatted.
23/10/18 08:57:10 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/app/hadoop_path/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
23/10/18 08:57:10 INFO namenode.FSImageFormatProtobuf: Image file /opt/app/hadoop_path/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 322 bytes saved in 0 seconds .
23/10/18 08:57:10 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
23/10/18 08:57:10 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
23/10/18 08:57:10 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master1/192.168.3.129
************************************************************/

If no error messages appear in the log output, the NameNode has been formatted successfully.
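
If the format fails, or ever has to be re-run, note a common pitfall: re-formatting assigns a new clusterID, and DataNodes that still hold data stamped with the old clusterID will refuse to start (typically with an "Incompatible clusterIDs" error in the DataNode log). A minimal recovery sketch is shown below; it wipes all HDFS data, so it is only appropriate on a fresh or disposable cluster, and the DataNode data directory path is an assumption here (check dfs.datanode.data.dir in hdfs-site.xml on slave1 and slave2):

# Stop HDFS first if it is running (on master1)
[root@master1 ~]# stop-dfs.sh
# Clear the old NameNode metadata (the metadata path used in this project)
[root@master1 ~]# rm -rf /opt/app/hadoop_path/hdfs/name/*
# On each DataNode, clear the old block data; this path is assumed, use your own dfs.datanode.data.dir value
[root@slave1 ~]# rm -rf /opt/app/hadoop_path/hdfs/data/*
[root@slave2 ~]# rm -rf /opt/app/hadoop_path/hdfs/data/*
# Format the NameNode again (on master1)
[root@master1 ~]# hdfs namenode -format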

After a successful format, a 【current】 folder is created in the NameNode's metadata directory; it holds the HDFS metadata files, as shown below:

[root@master1 name]# cd /opt/app/hadoop_path/hdfs/name
[root@master1 name]# ll
total 0
drwxr-xr-x 2 root root 112 Oct 18 08:57 current
[root@master1 name]# ll current/
total 16
-rw-r--r-- 1 root root 322 Oct 18 08:57 fsimage_0000000000000000000
-rw-r--r-- 1 root root  62 Oct 18 08:57 fsimage_0000000000000000000.md5
-rw-r--r-- 1 root root   2 Oct 18 08:57 seen_txid
-rw-r--r-- 1 root root 215 Oct 18 08:57 VERSION
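
For reference, the VERSION file in the same directory records the identity of the freshly formatted namespace. A quick way to inspect it is shown below; the values are placeholders except for clusterID and blockpoolID, which should match the ones printed in the format log above:

[root@master1 name]# cat current/VERSION
namespaceID=...
clusterID=CID-c67c639d-7eec-459d-9b60-3d5e696ccce8
cTime=...
storageType=NAME_NODE
blockpoolID=BP-894844368-192.168.3.129-1697619430610
layoutVersion=...

If the cluster is ever re-formatted, this clusterID changes, which is why stale DataNode directories have to be cleared first (see the note above).
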
  2. Start the HDFS cluster (run on master1)

    [root@master1 ~]# start-dfs.sh

  3. Start the YARN cluster (run on master1)

    [root@master1 ~]# start-yarn.sh

Check the daemons in the cluster

After everything has started, run the 【jps】 command on each server to list the running daemons. The expected processes on each node are:

master1: NameNode, ResourceManager
slave1:  DataNode, NodeManager
slave2:  DataNode, NodeManager, SecondaryNameNode
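
As an optional sanity check (a minimal sketch; the ports below are the Hadoop 2.x defaults and assume nothing was overridden in the configuration), you can confirm from master1 that both DataNodes and both NodeManagers have registered with the cluster:

[root@master1 ~]# hdfs dfsadmin -report    # should report 2 live DataNodes (slave1, slave2)
[root@master1 ~]# yarn node -list          # should list 2 NodeManagers in RUNNING state

The NameNode web UI should also be reachable at http://master1:50070 and the ResourceManager web UI at http://master1:8088.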