Hadoop 3.x New Features

| Feature | Hadoop 2.x | Hadoop 3.x |
| --- | --- | --- |
| Minimum required Java version | JDK 6 or later (JDK 7 or later from Hadoop 2.7 onward). | JDK 8 is the minimum runtime version, because many dependency libraries rely on JDK 8. |
| Fault tolerance | Handled through replication (3 replicas), which adds storage and network bandwidth overhead. | Erasure coding in HDFS provides fault tolerance with much less storage (0.5 parity + 1 data = 1.5x disk usage). |
| Storage scheme | 3x replication for data recovery, which means 200% storage overhead. For instance, 8 data blocks occupy 24 blocks of storage. | Erasure coding reduces storage overhead to 50%. In the same case, 8 data blocks occupy only 12 blocks of storage. |
| Default port numbers | HDFS NameNode: 8020; HDFS DataNode: 50010; Secondary NameNode HTTP: 50090 (HTTPS: 50091) | HDFS NameNode: 9820 (in 3.0.0; later 3.x releases went back to 8020); HDFS DataNode: 9866; Secondary NameNode HTTP: 9868 (HTTPS: 9869) |
| YARN Timeline Service | The Timeline Service introduced in Hadoop 2.x has scalability issues. | The Timeline Service is enhanced with ATS v2, which improves scalability and reliability. |
| Intra-DataNode balancing | The HDFS Balancer only balances data across DataNodes, so adding or replacing disks leaves data skewed within a DataNode. | Intra-DataNode balancing addresses the skew inside a DataNode that occurs when disks are added or replaced. |
| Number of NameNodes | HA supports a single standby NameNode (one active, one standby). | HA supports two or more standby NameNodes (one active, multiple standbys). |
| Heap size | For Java and Hadoop tasks, the heap size must be set through two related properties: mapreduce.{map,reduce}.java.opts and mapreduce.{map,reduce}.memory.mb. | If only one of the heap size or mapreduce.*.memory.mb is set, the other is derived automatically. |
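The storage figures in the comparison above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a Reed-Solomon layout with 6 data units and 3 parity units (the source states the 50% overhead but does not name a policy), and it uses the idealized ratio rather than modelling how HDFS actually allocates block groups. The function names are hypothetical.

```python
from fractions import Fraction


def replicated_blocks(data_blocks: int, replicas: int = 3) -> int:
    """Total blocks stored when every data block is kept `replicas` times."""
    return data_blocks * replicas


def erasure_coded_blocks(data_blocks: int, data_units: int = 6,
                         parity_units: int = 3) -> Fraction:
    """Idealized total under erasure coding.

    Assumption: every `data_units` data blocks carry `parity_units`
    parity blocks, so the total scales by (data + parity) / data.
    """
    return Fraction(data_blocks * (data_units + parity_units), data_units)


def overhead_percent(total_blocks, data_blocks: int) -> Fraction:
    """Extra storage relative to the raw data, as a percentage."""
    return (Fraction(total_blocks) - data_blocks) / data_blocks * 100


if __name__ == "__main__":
    data = 8
    for label, total in (
        ("3x replication", replicated_blocks(data)),
        ("erasure coding (6 data + 3 parity)", erasure_coded_blocks(data)),
    ):
        pct = overhead_percent(total, data)
        print(f"{label}: {float(total):g} blocks, {float(pct):g}% overhead")
```

With 8 data blocks this reproduces the totals quoted in the comparison: 24 blocks under 3x replication and 12 blocks under erasure coding.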
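HDFS HA logic
  1. Add JournalNodes, which share edit-log information between the active and standby NameNodes.
  2. Add a ZooKeeper cluster that decides which NameNode becomes active, configured through ha.zookeeper.quorum (the ZooKeeper server process is QuorumPeerMain).
  3. Add a ZooKeeper Failover Controller (zkfc) process that monitors the NameNode on the same machine, takes part in the election, and switches the local NameNode between the active and standby states (its process name is DFSZKFailoverController).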
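The three components above map onto a handful of configuration properties. As a rough sketch, the Python below builds those properties and renders them as Hadoop-style XML. The property names are the standard HDFS HA ones, but the nameservice ID, NameNode IDs, and host names are made up for illustration, and the helper functions are hypothetical. The result is not a complete HA configuration (fencing settings, for example, are left out).

```python
from xml.etree import ElementTree as ET


def ha_properties(nameservice, namenodes, journalnodes, zookeepers):
    """Return (hdfs_site, core_site) dicts for a QJM-based HA setup.

    namenodes:    dict of NameNode ID -> RPC "host:port"
    journalnodes: list of JournalNode "host:port" (8485 is the default port)
    zookeepers:   list of ZooKeeper "host:port" (2181 is the usual client port)
    """
    hdfs_site = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # The active NameNode writes edits to the JournalNodes; standbys read them.
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journalnodes) + "/" + nameservice,
        # Lets the zkfc processes fail over automatically.
        "dfs.ha.automatic-failover.enabled": "true",
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, address in namenodes.items():
        hdfs_site[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    core_site = {
        "fs.defaultFS": f"hdfs://{nameservice}",
        # The ZooKeeper ensemble used for the election.
        "ha.zookeeper.quorum": ",".join(zookeepers),
    }
    return hdfs_site, core_site


def to_xml(props):
    """Render a property dict in the <configuration> layout Hadoop reads."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical cluster: one nameservice with three NameNodes.
    hdfs_site, core_site = ha_properties(
        "mycluster",
        {"nn1": "nn1.example.com:8020",
         "nn2": "nn2.example.com:8020",
         "nn3": "nn3.example.com:8020"},
        [f"jn{i}.example.com:8485" for i in (1, 2, 3)],
        [f"zk{i}.example.com:2181" for i in (1, 2, 3)],
    )
    print(to_xml(hdfs_site))
    print(to_xml(core_site))
```

Listing three NameNode IDs only makes sense on Hadoop 3.x, which allows more than one standby. On Hadoop 2.x the same properties apply with exactly two IDs.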