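Preface
RDD dependencies sit at the core of Spark task scheduling. They determine how the DAG is built, where stages are cut, and whether a shuffle is triggered, which makes them a frequent big-data interview topic and the foundation for understanding Spark's execution flow.
This article works through a complete WordCount example to explain the logic behind narrow and wide dependencies, with Mermaid flowcharts alongside. It breaks down the stage-splitting rules and how tasks are generated, and ends with a review question, so that readers new to Spark can follow the distributed execution flow as well.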
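I. RDD Dependencies and DAG Basics
1. What is an RDD dependency
A transformation produces a new child RDD from one or more parent RDDs. The relationship that describes how data flows from parent to child is called an RDD dependency. Spark's DAGScheduler uses these dependencies to build the DAG (directed acyclic graph), split it into stages, and generate the tasks that are then dispatched to executors.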
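2. Definition of a DAG
DAG stands for directed acyclic graph:
- Directed: data flows in a fixed order; a parent RDD is computed before its child RDD;
- Acyclic: the data flow never loops back on itself; computation only moves forward, so circular dependencies cannot occur.
Taking the classic WordCount program as an example, the complete DAG chain is:
HDFS source file → textFile(RDD0) → flatMap(RDD1) → map(RDD2) → reduceByKey(RDD3) → saveAsTextFile(HDFS)
Here reduceByKey introduces a wide (shuffle) dependency, so the DAG is split into 2 stages.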
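To make the chain concrete, here is a minimal pure-Python sketch that mimics the same steps on in-memory partitions. It does not use Spark: the input lines, the reducer count, the `crc32`-based routing, and the helper name `word_count` are all assumptions made for illustration.

```python
from collections import defaultdict
from zlib import crc32


def word_count(partitions, num_reducers=2):
    """Toy model of textFile -> flatMap -> map -> reduceByKey on lists of lines."""
    # Narrow steps: each input partition is processed on its own.
    pairs_per_partition = []
    for lines in partitions:
        words = [w for line in lines for w in line.split()]   # flatMap
        pairs_per_partition.append([(w, 1) for w in words])   # map

    # Shuffle: route every (word, 1) pair to a reducer chosen by its key.
    buckets = [defaultdict(int) for _ in range(num_reducers)]
    for pairs in pairs_per_partition:
        for word, one in pairs:
            target = crc32(word.encode("utf-8")) % num_reducers
            buckets[target][word] += one                      # reduceByKey

    # After the shuffle, each reducer holds the complete count for its keys.
    return [dict(b) for b in buckets]


# Hypothetical input: two partitions of text lines.
input_partitions = [["a b a", "c"], ["b a"]]
result = word_count(input_partitions)
merged = {k: v for part in result for k, v in part.items()}
print(merged)
```

The first loop corresponds to the work before the shuffle, and the bucket loop corresponds to what reduceByKey needs: all pairs with the same word end up in the same place before they are summed.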
WordCount DAG (Mermaid flowchart)
[Figure: HDFS data source → RDD0 textFile → (flatMap) → RDD1 split into words → (map) → RDD2 (word, 1) → (shuffle, wide dependency) → RDD3 reduceByKey count → write result to HDFS. The steps before the shuffle form the ShuffleMapStage; the steps after it form the ResultStage.]
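II. The Two Core RDD Dependencies: Narrow and Wide
RDD dependencies come in two kinds. What separates them is how parent partitions map to child partitions.
2.1 Narrow dependency (NarrowDependency)
Definition
Each partition of the parent RDD is consumed by at most one partition of the child RDD. A child partition can be computed from a fixed, small set of parent partitions (for map-like operators, exactly one), so no data is redistributed across the cluster and no shuffle is triggered.
Typical operators
map, filter, flatMap, union, and other operators that transform each partition independently.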
Narrow dependency (Mermaid diagram)
[Figure: a parent RDD with parent partitions 1, 2, 3 and a child RDD with child partitions 1, 2, 3; each parent partition connects to exactly one child partition (narrow dependency).]
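A small pure-Python sketch can illustrate the one-to-one partition mapping. It models an RDD as a list of partitions; the helper names `narrow_map` and `narrow_filter` and the sample data are hypothetical and are not Spark APIs.

```python
def narrow_map(parent_partitions, fn):
    """Apply fn record by record; child partition i reads only parent partition i."""
    return [[fn(x) for x in part] for part in parent_partitions]


def narrow_filter(parent_partitions, pred):
    """Keep matching records; the partition layout is unchanged."""
    return [[x for x in part if pred(x)] for part in parent_partitions]


# Hypothetical 3-partition dataset.
parent = [[1, 2], [3, 4], [5, 6]]
child = narrow_filter(narrow_map(parent, lambda x: x * 10), lambda x: x > 20)
print(child)  # [[], [30, 40], [50, 60]]
```

Note that the first child partition ends up empty but still exists: the number and position of partitions are preserved, and no record ever moves to a different partition.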
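2.2 Wide dependency (shuffle dependency, ShuffleDependency)
Definition
A single partition of the parent RDD may send data to multiple partitions of the child RDD. Records have to be regrouped by key and moved between nodes, which always triggers a shuffle with disk and network I/O. Shuffles are one of the main sources of performance overhead in Spark.
Typical operators
reduceByKey, groupByKey, sortByKey, join on inputs that are not co-partitioned, and other repartitioning operators.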
Wide dependency (Mermaid diagram)
[Figure: a parent RDD with parent partitions 1 and 2 and a child RDD with child partitions 1 and 2; every parent partition connects to every child partition (wide dependency, shuffle).]
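The redistribution step can be sketched in plain Python as follows. This is a toy model, not Spark's shuffle implementation: `partition_for` uses `crc32` as a deterministic stand-in for a hash partitioner, and the function names and sample records are assumptions.

```python
from zlib import crc32


def partition_for(key, num_partitions):
    """Deterministic stand-in for a hash partitioner (crc32 is an assumption)."""
    return crc32(str(key).encode("utf-8")) % num_partitions


def shuffle(parent_partitions, num_child_partitions):
    """Redistribute (key, value) records so equal keys share a child partition."""
    children = [[] for _ in range(num_child_partitions)]
    # targets[i] records which child partitions parent partition i wrote to.
    targets = [set() for _ in parent_partitions]
    for i, part in enumerate(parent_partitions):
        for key, value in part:
            j = partition_for(key, num_child_partitions)
            children[j].append((key, value))
            targets[i].add(j)
    return children, targets


# Hypothetical 2-partition dataset of (word, 1) pairs.
parent = [[("a", 1), ("b", 1), ("c", 1), ("d", 1)], [("a", 1), ("c", 1)]]
children, targets = shuffle(parent, 2)
print(children)
print(targets)
```

Whenever a set in `targets` contains more than one index, that parent partition has fanned out to several child partitions, which is exactly the one-to-many mapping that defines a wide dependency.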
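Narrow vs. wide dependency: comparison table
| Aspect | Narrow dependency (NarrowDependency) | Wide dependency (ShuffleDependency) |
|---|---|---|
| Partition mapping | 1 parent partition → at most 1 child partition | 1 parent partition → multiple child partitions |
| Shuffle behavior | No shuffle; computed locally within one task | Always triggers a cross-node shuffle |
| Performance cost | Low; in-memory, pipelined transformation | High; both disk and network I/O |
| Typical operators | map, filter, flatMap, union | reduceByKey, groupByKey, sortByKey |
| Stage splitting | Does not break a stage | Starts a new stage wherever it occurs |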
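The partition-mapping row of the table can be turned into a tiny check. The sketch below is only an illustration of the rule: Spark itself decides the dependency type from the operator and its partitioner rather than by inspecting a mapping like this, and the function name `classify` and the sample mappings are hypothetical.

```python
def classify(parent_to_children):
    """Return 'narrow' if every parent partition feeds at most one child partition."""
    fan_out = max((len(c) for c in parent_to_children.values()), default=0)
    return "narrow" if fan_out <= 1 else "wide"


# Hypothetical partition mappings: parent index -> set of child indices.
map_like = {0: {0}, 1: {1}, 2: {2}}        # one-to-one, e.g. map or filter
coalesce_like = {0: {0}, 1: {0}, 2: {1}}   # many-to-one is still narrow
shuffle_like = {0: {0, 1}, 1: {0, 1}}      # one-to-many needs a shuffle

print(classify(map_like), classify(coalesce_like), classify(shuffle_like))
# narrow narrow wide
```

The `coalesce_like` case shows why the rule is phrased from the parent's side: several parent partitions may merge into one child partition without any shuffle, as long as no parent partition is split.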
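III. Splitting Stages Based on Dependencies
1. Core rule for stage splitting
Spark walks the DAG backward, from the final RDD toward the sources. Whenever it encounters a wide (shuffle) dependency, it cuts the graph there and starts a separate stage; consecutive narrow dependencies are all merged into the same stage.
Reason: with narrow dependencies, data stays within a partition, so the operators can be pipelined inside a single task. With a wide dependency, the downstream stage can only start after all upstream tasks have finished and written their shuffle output.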
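The backward walk can be sketched in a few lines of Python. This is a simplified model of the idea, not the DAGScheduler's code: the lineage is a hand-written dictionary, and `split_stages` and the `"narrow"`/`"wide"` labels are names chosen for this example.

```python
def split_stages(final_rdd, deps):
    """Walk the lineage backward from final_rdd and cut at every wide dependency.

    deps maps an RDD name to a list of (parent_name, kind) pairs, where kind
    is "narrow" or "wide". Returns a list of stages (lists of RDD names),
    with upstream stages first.
    """
    stages = []
    built = set()

    def build(last_rdd):
        if last_rdd in built:                 # this stage already exists
            return
        built.add(last_rdd)
        members, upstream, stack = [], [], [last_rdd]
        while stack:
            rdd = stack.pop()
            if rdd in members:
                continue
            members.append(rdd)
            for parent, kind in deps.get(rdd, []):
                if kind == "narrow":
                    stack.append(parent)      # stays in the same stage
                else:
                    upstream.append(parent)   # boundary: parent closes another stage
        for parent in upstream:
            build(parent)                     # upstream stages are emitted first
        stages.append(members[::-1])

    build(final_rdd)
    return stages


# The WordCount lineage from this article.
wordcount = {
    "RDD1": [("RDD0", "narrow")],
    "RDD2": [("RDD1", "narrow")],
    "RDD3": [("RDD2", "wide")],
}
print(split_stages("RDD3", wordcount))
# [['RDD0', 'RDD1', 'RDD2'], ['RDD3']]
```

For WordCount the walk starts at RDD3, hits the wide dependency on RDD2 immediately, and therefore produces one stage for RDD0 to RDD2 and another for RDD3.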
WordCount stage layout (Mermaid diagram)
[Figure: the WordCount job with 3 partitions, drawn as two stages. ShuffleMapStage: partition data b1, b2, b3 → RDD0 partitions 1 to 3 → (flatMap) → RDD1 partitions 1 to 3 → (map) → RDD2 partitions 1 to 3. ResultStage: RDD3 partitions 1 to 3 (reduce aggregation). Each of the 3 RDD2 partitions connects to each of the 3 RDD3 partitions through a shuffle (wide dependency), 9 edges in total.]
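2. How stages relate to tasks
- A stage contains a chain of RDDs linked by consecutive narrow dependencies;
- The number of tasks in a stage equals the number of partitions of the last RDD in that stage;
- Each task runs all of the stage's narrow operators back to back on one partition, so a partition's data passes through the whole flatMap → map chain inside a single task.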
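The three rules above can be sketched as follows. The partition counts, the task-name format, and the helpers `tasks_for_stage` and `run_task` are assumptions for illustration only; real Spark tasks are objects created by the scheduler, not strings.

```python
def tasks_for_stage(stage_rdds, num_partitions):
    """One task per partition of the last RDD in the stage."""
    last_rdd = stage_rdds[-1]
    return [f"{last_rdd}-partition-{i}" for i in range(num_partitions[last_rdd])]


def run_task(lines, operators):
    """Pipeline the stage's narrow operators over one partition, in order."""
    data = lines
    for op in operators:
        data = op(data)
    return data


def flat_map_words(lines):
    return [w for line in lines for w in line.split()]


def to_pairs(words):
    return [(w, 1) for w in words]


# Hypothetical partition counts for the WordCount lineage.
num_partitions = {"RDD0": 3, "RDD1": 3, "RDD2": 3, "RDD3": 3}
map_stage = ["RDD0", "RDD1", "RDD2"]   # textFile, flatMap, map
result_stage = ["RDD3"]                # reduceByKey

total = len(tasks_for_stage(map_stage, num_partitions)) + len(
    tasks_for_stage(result_stage, num_partitions)
)
print(total)                                               # 6
print(run_task(["a b", "a"], [flat_map_words, to_pairs]))  # [('a', 1), ('b', 1), ('a', 1)]
```

With 3 partitions on each side of the shuffle, the job has 3 + 3 = 6 tasks, and each map-side task applies flatMap and then map to its own partition without handing intermediate data to another task.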
WordCount task generation (Mermaid flowchart)
[Figure: Task1, Task2, Task3 each process the full narrow chain for partitions 1, 2, 3. Each of them sends shuffle output to Task4, Task5, Task6, which aggregate child partitions 1, 2, 3 (9 shuffle edges in total). The driver then collects the results.]
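IV. Summary of the Complete Execution Flow
- The program defines RDD transformations; they are lazy and only record the dependencies between RDDs;
- When an action is called, Spark builds the DAG from those dependencies and submits a job;
- The DAG is traversed backward and split into separate stages at each wide (shuffle) dependency;
- Each stage generates one task per partition of its last RDD;
- The scheduler first submits the tasks of the upstream ShuffleMapStage, and each task writes its shuffle output to local disk;
- Once all of them have finished, the downstream ResultStage starts; its tasks fetch the shuffle files and complete the final aggregation;
- The result of the action is returned to the driver or written to external storage.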
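The ordering constraint in this summary (upstream stages before downstream stages) can be sketched as a recursive submission routine. This is a toy model under stated assumptions: the stage graphs are hand-written, `submission_order` is a hypothetical helper, and the sketch lists stages one after another even though independent upstream stages may run concurrently in practice.

```python
def submission_order(final_stage, parents):
    """Return stage names in an order they can run: parents before children."""
    order = []

    def submit(stage):
        if stage in order:
            return
        for parent in parents.get(stage, []):
            submit(parent)            # missing upstream stages go first
        order.append(stage)

    submit(final_stage)
    return order


# Hypothetical stage graphs: the WordCount job and a job with two shuffle inputs.
wordcount_stages = {"ResultStage": ["ShuffleMapStage"]}
join_stages = {"ResultStage": ["MapStageA", "MapStageB"]}

print(submission_order("ResultStage", wordcount_stages))
# ['ShuffleMapStage', 'ResultStage']
print(submission_order("ResultStage", join_stages))
# ['MapStageA', 'MapStageB', 'ResultStage']
```

The recursion starts from the final stage, just like the backward traversal used for stage splitting, but the resulting execution order runs forward from the sources.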
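Review Question (a real interview question)
- What are the two kinds of RDD dependency in Spark? What characterizes each of them? How do they differ, and how are they related?
Reference answer
There are two kinds: narrow dependencies and wide dependencies.
- Narrow dependency: each parent partition feeds at most one child partition; there is no shuffle, consecutive narrow dependencies are merged into the same stage, and execution is efficient;
- Wide dependency: a parent partition distributes data to multiple child partitions; a shuffle with network transfer is required, and the dependency marks a stage boundary;
- Relationship: together they make up an RDD's full lineage, from which Spark builds the DAG and schedules tasks. The core difference lies in the partition mapping and in whether a shuffle occurs.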