hadoop3.x 新特性

hadoop3.x 新特性

Features Hadoop 2.x Hadoop 3.x
Minimum Required Java Version JDK 6 and above. JDK 8 is the minimum runtime version of JAVA required to run Hadoop 3.x as many dependency library files have been used from JDK 8.
Fault Tolerance Fault Tolerance is handled through replication leading to storage and network bandwidth overhead.(3个副本) Support for Erasure Coding(纠错码) in HDFS improves fault tolerance (0.5纠错码+1数据 = 1.5倍磁盘占用)
Storage Scheme Follows a 3x Replication Scheme for data recovery leading to 200% storage overhead. For instance, if there are 8 data blocks then a total of 24 blocks will occupy the storage space because of the 3x replication scheme. Storage overhead in Hadoop 3.0 is reduced to 50% with support for Erasure Coding. In this case, if here are 8 data blocks then a total of only 12 blocks will occupy the storage space.
Change in Port Numbers Hadoop HDFS NameNode -8020 Hadoop HDFS DataNode -50010 Secondary NameNode HTTP -50091 Hadoop HDFS NameNode -9820 Hadoop HDFS DataNode -9866 Secondary NameNode HTTP -9869
YARN Timeline Service YARN timeline service introduced in Hadoop 2.0 has some scalability issues. YARN Timeline service has been enhanced with ATS v2 which improves the scalability and reliability.
Intra DataNode Balancing HDFS Balancer in Hadoop 2.0 caused skew within a DataNode because of addition or replacement of disks. Intra DataNode Balancing has been introduced in Hadoop 3.0 to address the intra-DataNode skews which occur when disks are added or replaced.
Number of NameNodes Hadoop 2.0 introduced a secondary namenode as standby.(一主一备) Hadoop 3.0 supports 2 or more NameNodes.(一主多备)
Heap Size In Hadoop 2.0 , for Java and Hadoop tasks, the heap size needs to be set through two similar properties mapreduce.{map,reduce}.java. Opts and mapreduce.{map,reduce}.memory.mb In Hadoop 3.0, heap size or mapreduce.*.memory.mb is derived automatically.
hdfs HA 逻辑
  1. 增加用于主备之间信息共享推送的 JournalNode
  2. 增加用于选主决策的 zookeeper 集群:ha.zookeeper.quorum 配置
  3. 增加用于监控同机器上的 namenode,试图选举,切换本地 namenode 的 active,standby 状态的zookeeper failover controller(zkfc)进程:QuorumPeerMain
相关推荐
llilian_161 小时前
IRIG-B码产生器立足用户痛点,提供精准授时解决方案
大数据·数据库·功能测试·单片机·嵌入式硬件·测试工具
Elastic 中国社区官方博客8 小时前
快速 vs. 准确:衡量量化向量搜索的召回率
大数据·人工智能·elasticsearch·搜索引擎·ai·全文检索
qq_381338508 小时前
【技术日报】2026-03-18 AI 领域重磅速递
大数据·人工智能
电商API&Tina12 小时前
【电商API接口】开发者一站式电商API接入说明
大数据·数据库·人工智能·云计算·json
武子康14 小时前
大数据-253 离线数仓 - Airflow 入门与任务调度实战:DAG、Operator、Executor 部署排错指南
大数据·后端·apache hive
guoji778816 小时前
2026年Gemini 3 Pro vs 豆包2.0深度评测:海外顶流与国产黑马谁更强?
大数据·人工智能·架构
TDengine (老段)16 小时前
TDengine IDMP 组态面板 —— 工具箱
大数据·数据库·时序数据库·tdengine·涛思数据
网络工程小王16 小时前
【大数据技术详解】——Kibana(学习笔记)
大数据·笔记·学习
zxsz_com_cn18 小时前
设备预测性维护方案设计的关键要素
大数据·人工智能
唐天下闻化18 小时前
连锁数字化改造8成翻车?三维避坑实录
大数据