hadoop3.x 新特性

hadoop3.x 新特性

Features Hadoop 2.x Hadoop 3.x
Minimum Required Java Version JDK 6 and above. JDK 8 is the minimum runtime version of JAVA required to run Hadoop 3.x as many dependency library files have been used from JDK 8.
Fault Tolerance Fault Tolerance is handled through replication leading to storage and network bandwidth overhead.(3个副本) Support for Erasure Coding(纠错码) in HDFS improves fault tolerance (0.5纠错码+1数据 = 1.5倍磁盘占用)
Storage Scheme Follows a 3x Replication Scheme for data recovery leading to 200% storage overhead. For instance, if there are 8 data blocks then a total of 24 blocks will occupy the storage space because of the 3x replication scheme. Storage overhead in Hadoop 3.0 is reduced to 50% with support for Erasure Coding. In this case, if here are 8 data blocks then a total of only 12 blocks will occupy the storage space.
Change in Port Numbers Hadoop HDFS NameNode -8020 Hadoop HDFS DataNode -50010 Secondary NameNode HTTP -50091 Hadoop HDFS NameNode -9820 Hadoop HDFS DataNode -9866 Secondary NameNode HTTP -9869
YARN Timeline Service YARN timeline service introduced in Hadoop 2.0 has some scalability issues. YARN Timeline service has been enhanced with ATS v2 which improves the scalability and reliability.
Intra DataNode Balancing HDFS Balancer in Hadoop 2.0 caused skew within a DataNode because of addition or replacement of disks. Intra DataNode Balancing has been introduced in Hadoop 3.0 to address the intra-DataNode skews which occur when disks are added or replaced.
Number of NameNodes Hadoop 2.0 introduced a secondary namenode as standby.(一主一备) Hadoop 3.0 supports 2 or more NameNodes.(一主多备)
Heap Size In Hadoop 2.0 , for Java and Hadoop tasks, the heap size needs to be set through two similar properties mapreduce.{map,reduce}.java. Opts and mapreduce.{map,reduce}.memory.mb In Hadoop 3.0, heap size or mapreduce.*.memory.mb is derived automatically.
hdfs HA 逻辑
  1. 增加用于主备之间信息共享推送的 JournalNode
  2. 增加用于选主决策的 zookeeper 集群:ha.zookeeper.quorum 配置
  3. 增加用于监控同机器上的 namenode,试图选举,切换本地 namenode 的 active,standby 状态的zookeeper failover controller(zkfc)进程:QuorumPeerMain
相关推荐
Macbethad几秒前
WPF 工业设备管理程序技术方案
java·大数据·hadoop
半夏知半秋2 分钟前
MongoDB 与 Elasticsearch 数据同步方案整理
大数据·数据库·mongodb·elasticsearch·搜索引擎
Cx330❀25 分钟前
Git 基础操作通关指南:版本回退、撤销修改与文件删除深度解析
大数据·运维·服务器·git·算法·搜索引擎·面试
武子康26 分钟前
大数据-178 Elasticsearch 7.3 Java 实战:索引与文档 CRUD 全流程示例
大数据·后端·elasticsearch
希艾席帝恩36 分钟前
从制造到“智造”:数字孪生驱动的工业革命
大数据·人工智能·数字孪生·数据可视化·数字化转型
EkihzniY1 小时前
OCR定制识别:解锁文字识别的无限可能
大数据·人工智能·ocr
jkyy20141 小时前
慢病全周期管理+数智化:重构药品零售的健康价值
大数据·人工智能·物联网·健康医疗
Sahadev_1 小时前
GitHub 一周热门项目速览 | 2025年12月08日
java·大数据·人工智能·安全·ai作画
渣渣盟1 小时前
Flink实时数据写入Redis实战
大数据·scala·apache
路边草随风1 小时前
java实现发布flink k8s application模式作业
java·大数据·flink·kubernetes