hadoop3.x 新特性

hadoop3.x 新特性

Features Hadoop 2.x Hadoop 3.x
Minimum Required Java Version JDK 6 and above. JDK 8 is the minimum runtime version of JAVA required to run Hadoop 3.x as many dependency library files have been used from JDK 8.
Fault Tolerance Fault Tolerance is handled through replication leading to storage and network bandwidth overhead.(3个副本) Support for Erasure Coding(纠错码) in HDFS improves fault tolerance (0.5纠错码+1数据 = 1.5倍磁盘占用)
Storage Scheme Follows a 3x Replication Scheme for data recovery leading to 200% storage overhead. For instance, if there are 8 data blocks then a total of 24 blocks will occupy the storage space because of the 3x replication scheme. Storage overhead in Hadoop 3.0 is reduced to 50% with support for Erasure Coding. In this case, if here are 8 data blocks then a total of only 12 blocks will occupy the storage space.
Change in Port Numbers Hadoop HDFS NameNode -8020 Hadoop HDFS DataNode -50010 Secondary NameNode HTTP -50091 Hadoop HDFS NameNode -9820 Hadoop HDFS DataNode -9866 Secondary NameNode HTTP -9869
YARN Timeline Service YARN timeline service introduced in Hadoop 2.0 has some scalability issues. YARN Timeline service has been enhanced with ATS v2 which improves the scalability and reliability.
Intra DataNode Balancing HDFS Balancer in Hadoop 2.0 caused skew within a DataNode because of addition or replacement of disks. Intra DataNode Balancing has been introduced in Hadoop 3.0 to address the intra-DataNode skews which occur when disks are added or replaced.
Number of NameNodes Hadoop 2.0 introduced a secondary namenode as standby.(一主一备) Hadoop 3.0 supports 2 or more NameNodes.(一主多备)
Heap Size In Hadoop 2.0 , for Java and Hadoop tasks, the heap size needs to be set through two similar properties mapreduce.{map,reduce}.java. Opts and mapreduce.{map,reduce}.memory.mb In Hadoop 3.0, heap size or mapreduce.*.memory.mb is derived automatically.
hdfs HA 逻辑
  1. 增加用于主备之间信息共享推送的 JournalNode
  2. 增加用于选主决策的 zookeeper 集群:ha.zookeeper.quorum 配置
  3. 增加用于监控同机器上的 namenode,试图选举,切换本地 namenode 的 active,standby 状态的zookeeper failover controller(zkfc)进程:QuorumPeerMain
相关推荐
B站计算机毕业设计超人16 小时前
计算机毕业设计Python+百度千问大模型微博舆情分析预测 微博情感分析可视化 大数据毕业设计(源码+LW文档+PPT+讲解)
大数据·hive·hadoop·python·毕业设计·知识图谱·课程设计
irizhao17 小时前
《高质量数据集 分类指南》解读(TC609-5-2025-03)由全国数据标准化技术委员会发布
大数据·人工智能
min18112345617 小时前
HR人力资源招聘配置流程图制作教程
大数据·网络·人工智能·架构·流程图·求职招聘
Elastic 中国社区官方博客17 小时前
使用 Elasticsearch 管理 agentic 记忆
大数据·数据库·人工智能·elasticsearch·搜索引擎·ai·全文检索
ask_baidu20 小时前
监控Source端Pg对Flink CDC的影响
java·大数据·postgresql·flink
早日退休!!!20 小时前
Roofline模型核心原理:延迟、吞吐与并发的底层逻辑
大数据·网络·数据库
说私域20 小时前
基于定制开发AI智能名片商城小程序的运营创新与资金效率提升研究
大数据·人工智能·小程序
edisao21 小时前
二。星链真正危险的地方,不在天上,而在网络底层
大数据·网络·人工智能·python·科技·机器学习
Python_Study202521 小时前
TOB机械制造企业获客困境与技术解决方案:从传统模式到数字化营销的架构升级
大数据·人工智能·架构