hadoop3.x 新特性

hadoop3.x 新特性

Features Hadoop 2.x Hadoop 3.x
Minimum Required Java Version JDK 6 and above. JDK 8 is the minimum runtime version of JAVA required to run Hadoop 3.x as many dependency library files have been used from JDK 8.
Fault Tolerance Fault Tolerance is handled through replication leading to storage and network bandwidth overhead.(3个副本) Support for Erasure Coding(纠错码) in HDFS improves fault tolerance (0.5纠错码+1数据 = 1.5倍磁盘占用)
Storage Scheme Follows a 3x Replication Scheme for data recovery leading to 200% storage overhead. For instance, if there are 8 data blocks then a total of 24 blocks will occupy the storage space because of the 3x replication scheme. Storage overhead in Hadoop 3.0 is reduced to 50% with support for Erasure Coding. In this case, if here are 8 data blocks then a total of only 12 blocks will occupy the storage space.
Change in Port Numbers Hadoop HDFS NameNode -8020 Hadoop HDFS DataNode -50010 Secondary NameNode HTTP -50091 Hadoop HDFS NameNode -9820 Hadoop HDFS DataNode -9866 Secondary NameNode HTTP -9869
YARN Timeline Service YARN timeline service introduced in Hadoop 2.0 has some scalability issues. YARN Timeline service has been enhanced with ATS v2 which improves the scalability and reliability.
Intra DataNode Balancing HDFS Balancer in Hadoop 2.0 caused skew within a DataNode because of addition or replacement of disks. Intra DataNode Balancing has been introduced in Hadoop 3.0 to address the intra-DataNode skews which occur when disks are added or replaced.
Number of NameNodes Hadoop 2.0 introduced a secondary namenode as standby.(一主一备) Hadoop 3.0 supports 2 or more NameNodes.(一主多备)
Heap Size In Hadoop 2.0 , for Java and Hadoop tasks, the heap size needs to be set through two similar properties mapreduce.{map,reduce}.java. Opts and mapreduce.{map,reduce}.memory.mb In Hadoop 3.0, heap size or mapreduce.*.memory.mb is derived automatically.
hdfs HA 逻辑
  1. 增加用于主备之间信息共享推送的 JournalNode
  2. 增加用于选主决策的 zookeeper 集群:ha.zookeeper.quorum 配置
  3. 增加用于监控同机器上的 namenode,试图选举,切换本地 namenode 的 active,standby 状态的zookeeper failover controller(zkfc)进程:QuorumPeerMain
相关推荐
大树887 小时前
金刚石散热越强,管路越先见顶
大数据·运维·服务器·人工智能·ai
大志哥1237 小时前
ES和Logstash日志链路系统上线后遭遇切片爆炸(解决)
大数据·elasticsearch
果丁智能9 小时前
物联网智能锁赋能集中式住宿:身份核验与远程权限管控的全链路技术实践
大数据·人工智能·物联网·智能家居
王小王-1239 小时前
基于 Hive 的网易云音乐数据分析及可视化系统
hive·hadoop·数据分析·音乐数据分析·网易云音乐分析·hive音乐分析·hadoop网易云
ApacheSeaTunnel9 小时前
实战演示 | 基于 Apache SeaTunnel 与 Apache DolphinScheduler 实现 MySQL 到 Doris 离线定时增量同步
大数据·mysql·开源·doris·数据集成·seatunnel·数据同步
weixin_397574099 小时前
PDF复杂表格的1:1还原引擎:跨页表格自动拼接技术实战
大数据·人工智能·pdf
极光代码工作室10 小时前
基于数据仓库的电商数据分析平台
大数据·hadoop·python·spark·数据可视化
秋名山码民10 小时前
Graph RAG 深度解析:从向量检索到知识推理的技术演进
大数据·人工智能·rag
m0_3801671411 小时前
面向开发者的Top10加密货币数据API(2026年最新)
大数据·人工智能·区块链
yyxx41212311 小时前
上海企业如何选择专业的钉钉服务商
java·大数据·人工智能·钉钉