一、DataNode中的数据情况
数据位置
bash
/opt/module/hadoop-3.1.3/data/dfs/data/current/BP-823420375-192.168.31.102-1714395693863/current/finalized/subdir0/subdir0
块信息
每个块信息,由两个文件保存,xxx.meta保存的是数据长度、校验和、时间戳,另外一个是保存真实数据内容的。
二、工作流程
DataNode
和NameNode
的工作流程比较简单,主要内容是:DN向NN注册
,DN定时向NN汇报状态
1、周期上报的相关默认配置
hdfs-default.xml
上报NN时间配置
xml
<property>
<name>dfs.blockreport.intervalMsec</name>
<value>21600000</value>
<description>Determines block reporting interval in milliseconds.</description>
</property>
DN自查时间配置
xml
<property>
<name>dfs.datanode.directoryscan.interval</name>
<value>21600s</value>
<description>Interval in seconds for Datanode to scan data directories and
reconcile the difference between blocks in memory and on the disk.
Support multiple time unit suffix(case insensitive), as described
in dfs.heartbeat.interval.
</description>
</property>
一般,这两个配置时间是一样的,工作流程是,DN先自查,然后,将结果汇报给NN
2、超时时长相关默认配置
计算公式
心跳时间配置
xml
<property>
<name>dfs.heartbeat.interval</name>
<value>3s</value>
<description>
Determines datanode heartbeat interval in seconds.
Can use the following suffix (case insensitive):
ms(millis), s(sec), m(min), h(hour), d(day)
to specify the time (such as 2s, 2m, 1h, etc.).
Or provide complete number in seconds (such as 30 for 30 seconds).
</description>
</property>
心跳检测时间参数
xml
<property>
<name>dfs.namenode.heartbeat.recheck-interval</name>
<value>300000</value>
<description>
This time decides the interval to check for expired datanodes.
With this value and dfs.heartbeat.interval, the interval of
deciding the datanode is stale or not is also calculated.
The unit of this configuration is millisecond.
</description>
</property>