HDFS Heterogeneous Storage Explained

HDFS heterogeneous storage types

Hot, warm, cold, and frozen data
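- Companies and organizations usually hold a large amount of historical data that occupies expensive storage space. In a typical usage pattern, newly arrived data is used heavily by applications, so it is labeled "hot" data. Over time, the stored data is accessed a few times a week instead of a few times a day, and it is then considered "warm" data. Over the following weeks and months, usage drops further and the data becomes "cold". If data is used very rarely, for example queried once or twice a year, a fourth category can be created based on its age, and this set of rarely queried old data is called "frozen" data.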
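- Hadoop allows data that is not hot or active to be placed on cheaper storage for archival or cold storage. By setting storage policies, older data can be moved from expensive high-performance storage to lower-cost storage devices. A policy governs where newly written blocks go; blocks that already exist are migrated to match the policy with the hdfs mover tool.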
- Hadoop 2.5 and later support storage policies, under which HDFS data can be stored not only on conventional disks (the default) but also on SSDs (solid-state drives).

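Heterogeneous storage is a feature introduced in Hadoop 2.6.0 that lets you choose storage media according to their different read/write characteristics.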

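Take hot and cold data as an example: cold data can be kept on high-capacity media with modest read/write performance, such as mechanical hard drives, while hot data can be stored on SSDs.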

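HDFS distinguishes the following storage types:

- RAM_DISK (memory)
- SSD (solid-state drive)
- DISK (mechanical hard disk; used by default)
- ARCHIVE (high-density storage media for archived historical data)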
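To connect these storage types to the policies used later, the sketch below models how several built-in policies spread a file's replicas across types. It is a simplification, not HDFS source code: each policy lists storage types in order, and the last listed type is repeated for any remaining replicas. The table (HOT: DISK; WARM: DISK then ARCHIVE; COLD: ARCHIVE; ONE_SSD: SSD then DISK; ALL_SSD: SSD) covers built-in policies; LAZY_PERSIST is left out because RAM_DISK writes are handled separately by the DataNode. The function name placement is hypothetical, and bash 4+ is assumed for associative arrays.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not HDFS code) of how built-in storage
# policies map replicas to storage types. Requires bash 4+.
declare -A POLICY_TYPES=(
  [HOT]="DISK"
  [WARM]="DISK ARCHIVE"
  [COLD]="ARCHIVE"
  [ONE_SSD]="SSD DISK"
  [ALL_SSD]="SSD"
)

# placement POLICY REPLICATION: print one storage type per replica.
placement() {
  local policy=$1 replication=$2
  local listed=${POLICY_TYPES[$policy]:-}
  if [[ -z $listed ]]; then
    echo "unknown policy: $policy" >&2
    return 1
  fi
  local -a types out=()
  read -r -a types <<< "$listed"
  local i
  for (( i = 0; i < replication; i++ )); do
    if (( i < ${#types[@]} )); then
      out+=("${types[i]}")
    else
      # Fewer listed types than replicas: repeat the last listed type.
      out+=("${types[${#types[@]}-1]}")
    fi
  done
  echo "${out[*]}"
}

placement WARM 3      # prints: DISK ARCHIVE ARCHIVE
placement ONE_SSD 3   # prints: SSD DISK DISK
```

On a real cluster, HDFS may fall back to other storage types when the preferred ones are unavailable, so actual placements can differ from this model.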

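Unset a storage policy
hdfs storagepolicies -unsetStoragePolicy -path <path>
After the unset command runs, the storage policy of the nearest ancestor directory applies; if no ancestor has a policy, the default storage policy (HOT) applies.
Get a storage policy
hdfs storagepolicies -getStoragePolicy -path <path>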
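The inheritance rule described for -unsetStoragePolicy can be modeled as a walk up the directory tree. This is a sketch, not HDFS code: the EXPLICIT table holds hypothetical policies set directly on paths, effective_policy is a hypothetical helper, and bash 4+ is assumed.

```shell
#!/usr/bin/env bash
# Model of storage-policy inheritance (a sketch, not HDFS code). Requires bash 4+.
# EXPLICIT holds hypothetical policies set directly on paths.
declare -A EXPLICIT=(
  [/data/hdfs-test]="COLD"
  [/data/hdfs-test/data_phase/hot]="HOT"
)
DEFAULT_POLICY=HOT   # used when no path on the way up has a policy

# effective_policy PATH: return the policy of PATH or of its nearest ancestor.
effective_policy() {
  local p=$1
  while true; do
    if [[ -n ${EXPLICIT[$p]:-} ]]; then
      echo "${EXPLICIT[$p]}"
      return 0
    fi
    [[ $p == / ]] && break
    p=$(dirname "$p")
  done
  echo "$DEFAULT_POLICY"
}

effective_policy /data/hdfs-test/data_phase/hot/profile   # prints: HOT
unset 'EXPLICIT[/data/hdfs-test/data_phase/hot]'           # like -unsetStoragePolicy
effective_policy /data/hdfs-test/data_phase/hot/profile   # prints: COLD
effective_policy /tmp/other                               # prints: HOT (default)
```

After the unset, the file inherits COLD from /data/hdfs-test, which is the nearest ancestor that still has an explicit policy.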

To make fuller use of storage resources, data can be stored in three phases: hot, warm, and cold. The plan is as follows:

step3: Create the test directories

hdfs dfs -mkdir -p /data/hdfs-test/data_phase/hot
hdfs dfs -mkdir -p /data/hdfs-test/data_phase/warm
hdfs dfs -mkdir -p /data/hdfs-test/data_phase/cold

step4: Set the storage policy of each of the three directories

hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/hot -policy HOT
hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/warm -policy WARM
hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/cold -policy COLD
step5: Check the storage policies of the three directories

hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/hot
hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/warm
hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/cold
step6: Upload a file to test heterogeneous storage

hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/hot
hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/warm
hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/cold
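Steps 3 to 6 repeat the same four commands for each phase, so they can be folded into one loop. This is a sketch that assumes the hdfs client is on PATH; the helper names run and setup_phase are hypothetical, and DRY_RUN defaults to 1 here so the script only prints the commands until you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: steps 3-6 for all three phases in one loop.
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}
BASE=/data/hdfs-test/data_phase

run() {
  if [[ $DRY_RUN == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_phase DIR POLICY: create DIR, apply POLICY, show it, upload a test file.
setup_phase() {
  local dir=$1 policy=$2
  run hdfs dfs -mkdir -p "$BASE/$dir"
  run hdfs storagepolicies -setStoragePolicy -path "$BASE/$dir" -policy "$policy"
  run hdfs storagepolicies -getStoragePolicy -path "$BASE/$dir"
  run hdfs dfs -put /etc/profile "$BASE/$dir"
}

for pair in hot:HOT warm:WARM cold:COLD; do
  setup_phase "${pair%%:*}" "${pair##*:}"
done
```

Setting the policy before the put matters: the uploaded file is written under the directory's policy from the start, so no mover run is needed for this test.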
step7: Check the block locations of files under different storage policies
hdfs fsck /data/hdfs-test/data_phase/hot/profile -files -blocks -locations
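To read step 7's output at a glance, the sketch below counts replica locations per storage type in fsck output read from stdin. It assumes each location is printed as DatanodeInfoWithStorage[host:port,storageID,TYPE], which is how recent Hadoop versions show it; adjust the grep pattern if your version prints locations differently. count_storage_types is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count replica locations per storage type in `hdfs fsck ... -locations` output.
# Assumes locations look like DatanodeInfoWithStorage[host:port,storageID,TYPE].
count_storage_types() {
  grep -o 'DatanodeInfoWithStorage\[[^]]*]' \
    | sed -E 's/.*,([A-Z_]+)]$/\1/' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

Usage: hdfs fsck /data/hdfs-test/data_phase/warm/profile -files -blocks -locations | count_storage_types. Each output line is a storage type followed by the number of replica locations on it.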

Mount tmpfs at /mnt/dn-tmpfs/ and limit its memory usage to 1 GB.
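A minimal sketch of the mount, assuming a Linux host and root privileges to actually run it. The helper names run and mount_ramdisk are hypothetical, and DRY_RUN defaults to 1 so the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: mount a 1 GB tmpfs at /mnt/dn-tmpfs/ (needs root to really run).
# DRY_RUN=1 (the default here) prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}
MOUNT_POINT=/mnt/dn-tmpfs
SIZE=1g

run() {
  if [[ $DRY_RUN == 1 ]]; then echo "$*"; else "$@"; fi
}

mount_ramdisk() {
  run mkdir -p "$MOUNT_POINT"
  # size= caps how much memory the tmpfs may use.
  run mount -t tmpfs -o size="$SIZE" tmpfs "$MOUNT_POINT"
}

mount_ramdisk
```

A mount made this way does not survive a reboot; a matching /etc/fstab entry would be needed for that.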

step2: Configure the in-memory storage medium

Add the RAM disk prepared on the machine to dfs.datanode.data.dir and prefix it with the RAM_DISK tag.
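As an illustration, the sketch below prints an hdfs-site.xml property that lists the tmpfs mount as RAM_DISK next to a disk directory. The disk path /data/dfs/dn is a hypothetical placeholder for your existing DataNode directory, and ramdisk_property is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print a dfs.datanode.data.dir property that tags the tmpfs as RAM_DISK.
# DISK_DIR is a hypothetical placeholder for the existing disk directory.
DISK_DIR=${DISK_DIR:-/data/dfs/dn}
RAM_DIR=${RAM_DIR:-/mnt/dn-tmpfs}

ramdisk_property() {
  cat <<EOF
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]${DISK_DIR},[RAM_DISK]${RAM_DIR}</value>
</property>
EOF
}

ramdisk_property
```

Directories without a tag are treated as DISK, so the [DISK] prefix is optional; it is written out here only to make the mapping explicit.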
step3: Tune the related parameters

dfs.storage.policy.enabled

Whether heterogeneous storage (storage policies) is enabled; the default is true (enabled).

dfs.datanode.max.locked.memory

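The amount of memory, in bytes, that the DataNode uses to keep block replicas in memory. It defaults to 0, which disables in-memory caching. A value that is too small reduces the total number of blocks that can be held in memory, while a value larger than the maximum memory the DataNode can sustain causes some in-memory blocks to be evicted.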
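Because the value is given in bytes, a small helper avoids arithmetic mistakes; the DataNode also checks it against the memlock ulimit of its user and refuses to start if the configured value is larger. This sketch assumes bash, whose ulimit -l reports the limit in KiB; mib_to_bytes and fits_memlock are hypothetical helpers, and it should be run as the DataNode user to check the right limit.

```shell
#!/usr/bin/env bash
# Sketch: convert MiB to the byte value for dfs.datanode.max.locked.memory
# and compare it with the memlock ulimit of the current user.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# fits_memlock BYTES: succeed if BYTES fits under `ulimit -l` (reported in KiB).
fits_memlock() {
  local bytes=$1 limit
  limit=$(ulimit -l)
  [[ $limit == unlimited ]] && return 0
  (( bytes <= limit * 1024 ))
}

want=$(mib_to_bytes 1024)   # 1 GiB = 1073741824 bytes, matching the 1 GB tmpfs
if fits_memlock "$want"; then
  echo "dfs.datanode.max.locked.memory=$want fits within ulimit -l"
else
  echo "memlock ulimit is too low for $want bytes; raise it for the DataNode user" >&2
fi
```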
step4: Set the storage policy on a directory

hdfs storagepolicies -setStoragePolicy -path <path> -policy LAZY_PERSIST
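To try the in-memory tier end to end, the sketch below applies LAZY_PERSIST to a directory, writes a file with hdfs dfs -put -l (which asks the DataNode to persist the file to disk lazily and forces a replication factor of 1), and checks its block locations. The HDFS directory /data/hdfs-test/ram_disk and the helper names are hypothetical; DRY_RUN defaults to 1 so the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: set LAZY_PERSIST on a hypothetical directory and write a test file.
# DRY_RUN=1 (the default here) prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}
RAM_DIR_HDFS=/data/hdfs-test/ram_disk

run() {
  if [[ $DRY_RUN == 1 ]]; then echo "$*"; else "$@"; fi
}

lazy_persist_test() {
  run hdfs dfs -mkdir -p "$RAM_DIR_HDFS"
  run hdfs storagepolicies -setStoragePolicy -path "$RAM_DIR_HDFS" -policy LAZY_PERSIST
  # -l: lazily persist to disk; forces replication factor 1.
  run hdfs dfs -put -l /etc/profile "$RAM_DIR_HDFS"
  run hdfs fsck "$RAM_DIR_HDFS/profile" -files -blocks -locations
}

lazy_persist_test
```

If the fsck output lists RAM_DISK for the block, the write landed on the tmpfs configured earlier.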