Hadoop 3.x New Features

| Feature | Hadoop 2.x | Hadoop 3.x |
| --- | --- | --- |
| Minimum required Java version | JDK 6 and above (JDK 7 from Hadoop 2.7 onward). | JDK 8 is the minimum runtime version, because many of the dependency libraries used by Hadoop 3.x require JDK 8. |
| Fault tolerance | Handled through replication (3 replicas), which costs extra storage and network bandwidth. | HDFS supports erasure coding, which provides fault tolerance with far less storage (0.5 parity + 1 data = 1.5x disk usage). |
| Storage scheme | 3x replication for data recovery, giving 200% storage overhead. For instance, 8 data blocks occupy 24 blocks of storage. | Erasure coding reduces the storage overhead to 50%. In the same case, 8 data blocks occupy only 12 blocks of storage. |
| Default port numbers | NameNode: 8020; DataNode: 50010; Secondary NameNode HTTP: 50090 (HTTPS: 50091) | NameNode: 9820 (in 3.0.0; the default returned to 8020 in 3.0.1); DataNode: 9866; Secondary NameNode HTTP: 9868 (HTTPS: 9869) |
| YARN Timeline Service | The Timeline Service introduced in Hadoop 2.x has scalability issues. | The Timeline Service has been reworked as ATS v2, which improves scalability and reliability. |
| Intra-DataNode balancing | The HDFS Balancer only balances data across DataNodes, so adding or replacing disks leaves the disks inside a single DataNode unevenly filled. | Intra-DataNode balancing has been introduced to even out the skew between disks that occurs when disks are added or replaced. |
| Number of NameNodes | HA supports a single standby NameNode (one active, one standby). Note that the standby is not the same thing as the Secondary NameNode. | HA supports more than two NameNodes (one active, multiple standbys). |
| Heap size | For Java and Hadoop tasks, the heap size has to be set through two similar properties: mapreduce.{map,reduce}.java.opts and mapreduce.{map,reduce}.memory.mb. | Only one of the two needs to be set: the heap size or the mapreduce.*.memory.mb value is derived automatically from the other. |
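
The storage and heap-size rows above come down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the 0.8 heap-to-container ratio is assumed to be the default of mapreduce.job.heap.memory-mb.ratio, and the rounding is a choice made here rather than Hadoop's exact behaviour.

```python
import math

# Storage overhead expressed as a fraction of the raw data size.
REPLICATION_3X = 2.0   # two extra copies of every block -> 200% overhead
ERASURE_CODING = 0.5   # 0.5 parity per 1 data -> 50% overhead


def stored_blocks(data_blocks: int, overhead: float) -> int:
    """Total blocks occupied on disk for a given number of data blocks."""
    return math.ceil(data_blocks * (1 + overhead))


def derive_heap_mb(container_mb: int, ratio: float = 0.8) -> int:
    """Illustrative heap size derived from mapreduce.*.memory.mb.

    The 0.8 ratio is an assumption for this sketch; rounding down is
    chosen here for simplicity.
    """
    return int(container_mb * ratio)


if __name__ == "__main__":
    print(stored_blocks(8, REPLICATION_3X))  # 24 blocks with 3x replication
    print(stored_blocks(8, ERASURE_CODING))  # 12 blocks with erasure coding
    print(derive_heap_mb(2048))              # heap for a 2048 MB container
```

With 8 data blocks this reproduces the 24 versus 12 figures from the table.
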
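HDFS HA logic
  1. Add JournalNodes to share state between the active and standby NameNodes: the active NameNode writes its edit log to them and the standbys read from them.
  2. Add a ZooKeeper cluster that decides which NameNode becomes active, configured through ha.zookeeper.quorum.
  3. Add a ZooKeeper Failover Controller (ZKFC) process on each NameNode machine. It monitors the local NameNode, takes part in the election, and switches the local NameNode between the active and standby states. In jps output it appears as DFSZKFailoverController (QuorumPeerMain is the ZooKeeper server process, not the ZKFC).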
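
As a rough illustration of how these three pieces show up in configuration, the sketch below builds the main HA-related properties and renders them in Hadoop's XML property format. The nameservice ID (mycluster), the NameNode IDs, and all host names are hypothetical; the ports assume the usual defaults of 8485 for JournalNodes and 2181 for ZooKeeper. A real cluster needs further settings (RPC addresses, fencing, and so on) that are omitted here.

```python
import xml.etree.ElementTree as ET

NAMESERVICE = "mycluster"                              # hypothetical nameservice ID
JOURNAL_NODES = ["jn1:8485", "jn2:8485", "jn3:8485"]   # hypothetical hosts
ZK_QUORUM = ["zk1:2181", "zk2:2181", "zk3:2181"]       # hypothetical hosts

# Properties that would go into hdfs-site.xml.
hdfs_site = {
    "dfs.nameservices": NAMESERVICE,
    f"dfs.ha.namenodes.{NAMESERVICE}": "nn1,nn2,nn3",  # one active, two standbys
    # Shared edit log: written by the active NameNode, read by the standbys.
    "dfs.namenode.shared.edits.dir":
        "qjournal://" + ";".join(JOURNAL_NODES) + "/" + NAMESERVICE,
    # Lets the ZKFC processes drive failover automatically.
    "dfs.ha.automatic-failover.enabled": "true",
}

# Property that would go into core-site.xml.
core_site = {"ha.zookeeper.quorum": ",".join(ZK_QUORUM)}


def to_hadoop_xml(props: dict) -> str:
    """Render a dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(hdfs_site))
    print(to_hadoop_xml(core_site))
```

Listing three NameNode IDs reflects the Hadoop 3.x case of one active NameNode with multiple standbys; on Hadoop 2.x the list would hold exactly two.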