New Features in Hadoop 3.x

| Feature | Hadoop 2.x | Hadoop 3.x |
| --- | --- | --- |
| Minimum required Java version | JDK 6 and above (Hadoop 2.7 and later require JDK 7). | JDK 8 is the minimum runtime version, because Hadoop 3.x and many of its dependency libraries are built against JDK 8. |
| Fault tolerance | Handled through replication (3 replicas by default), which costs extra storage and network bandwidth. | Erasure coding in HDFS provides fault tolerance with far less storage: 1 unit of data plus 0.5 units of parity takes 1.5x the disk space. |
| Storage scheme | 3x replication for data recovery, which means 200% storage overhead. For instance, 8 data blocks occupy 24 blocks of storage. | With erasure coding, storage overhead drops to 50%. The same 8 data blocks occupy only 12 blocks of storage. |
| Default port numbers | NameNode: 8020; DataNode: 50010; Secondary NameNode HTTPS: 50091 | NameNode: 9820 (Hadoop 3.0.0; 3.0.1 and later reverted the NameNode RPC port to 8020); DataNode: 9866; Secondary NameNode HTTPS: 9869 |
| YARN Timeline Service | The Timeline Service introduced in Hadoop 2.x has scalability issues. | Timeline Service v2 (ATS v2) improves scalability and reliability. |
| Intra-DataNode balancing | The HDFS Balancer only balances data across DataNodes, so the disks inside a single DataNode become skewed when disks are added or replaced. | Intra-DataNode balancing addresses the skew between disks of one DataNode that occurs when disks are added or replaced. |
| Number of NameNodes | HA supports one active NameNode and one standby NameNode (one active, one standby). The Secondary NameNode only performs checkpoints and is not a standby. | HA supports more than two NameNodes (one active, multiple standbys). |
| Heap size | For Hadoop tasks, the heap size has to be set through two similar properties: mapreduce.{map,reduce}.java.opts and mapreduce.{map,reduce}.memory.mb. | The heap size no longer has to be set in both places: if only one of the Java heap option and mapreduce.*.memory.mb is set, the other is derived automatically. |
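The storage figures in the table can be checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this post, and the (6 data, 3 parity) layout is an assumption that gives the same 1 : 0.5 data-to-parity ratio quoted above. It uses the idealized ratio and ignores partially filled stripes.

```python
from fractions import Fraction


def replication_blocks(data_blocks, replicas=3):
    """Total stored blocks when every data block is kept `replicas` times."""
    return data_blocks * replicas


def erasure_coded_blocks(data_blocks, data_units=6, parity_units=3):
    """Idealized total stored blocks with erasure coding.

    Assumes every `data_units` data blocks add `parity_units` parity blocks
    (a hypothetical 6 + 3 layout) and ignores partially filled stripes.
    """
    return data_blocks * Fraction(data_units + parity_units, data_units)


def overhead_percent(total_blocks, data_blocks):
    """Extra storage relative to the raw data, in percent."""
    return (total_blocks - data_blocks) / data_blocks * 100


data = 8
replicated = replication_blocks(data)
coded = erasure_coded_blocks(data)

print(replicated, overhead_percent(replicated, data))
print(coded, overhead_percent(coded, data))
```

With 8 data blocks this reproduces the 24 blocks (200% overhead) and 12 blocks (50% overhead) from the table.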
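HDFS HA logic
  1. Add JournalNodes, which share (push) edit-log information between the active and standby NameNodes.
  2. Add a ZooKeeper cluster for leader election, configured through ha.zookeeper.quorum.
  3. Add a ZooKeeper Failover Controller (zkfc) process on each NameNode machine. It monitors the local NameNode, takes part in the election, and switches the local NameNode between the active and standby states. In jps output it shows up as DFSZKFailoverController; QuorumPeerMain is the ZooKeeper server process itself.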
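The three pieces above map onto a handful of configuration properties. The following is a minimal sketch, not a complete HA setup: the helper functions, the nameservice name mycluster, the NameNode IDs, and all host names are hypothetical, and the ports are the usual defaults (8020 for NameNode RPC, 8485 for JournalNode, 2181 for ZooKeeper). The property names themselves are the standard HDFS HA ones; the first dictionary belongs in hdfs-site.xml and the second in core-site.xml.

```python
from xml.sax.saxutils import escape


def ha_properties(nameservice, namenodes, journalnodes, zookeepers):
    """Build the core HA properties.

    namenodes: dict mapping a NameNode ID to its host (hypothetical values).
    journalnodes, zookeepers: lists of host names (hypothetical values).
    """
    journal_uri = (
        "qjournal://"
        + ";".join(f"{host}:8485" for host in journalnodes)
        + f"/{nameservice}"
    )
    hdfs_site = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Step 1: JournalNodes hold the shared edit log.
        "dfs.namenode.shared.edits.dir": journal_uri,
        # Step 3: let zkfc switch active/standby automatically.
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, host in namenodes.items():
        hdfs_site[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    # Step 2: ZooKeeper ensemble used for the election.
    core_site = {
        "ha.zookeeper.quorum": ",".join(f"{host}:2181" for host in zookeepers),
    }
    return hdfs_site, core_site


def to_xml(props):
    """Render a property dict in the Hadoop <configuration> file layout."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


hdfs_site, core_site = ha_properties(
    "mycluster",
    {"nn1": "host1", "nn2": "host2", "nn3": "host3"},  # one active, two standbys
    ["jn1", "jn2", "jn3"],
    ["zk1", "zk2", "zk3"],
)
print(to_xml(hdfs_site))
print(to_xml(core_site))
```

A real cluster needs more than this (for example a client failover proxy provider and a fencing method), so treat the output as a starting point rather than a working configuration.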