Background
Apache DataFusion Comet is a vectorized execution project, open-sourced by Apple, that accelerates Spark. It adopts a Spark plugin + Protobuf + Arrow + DataFusion architecture, where:
- The Spark plugin layer uses the SparkPlugin API, which splits into a DriverPlugin and an ExecutorPlugin; Spark invokes them when the driver and the executors start up (see the sketch after this list).
- Protobuf serializes the Spark expressions and plans that are handed off to the native engine for execution, taking advantage of its small encoding size and fast serialization.
- Arrow enables efficient data exchange between Spark and the native engine (native execution results or Spark-produced data), mainly across JNI, exploiting Arrow IPC's columnar layout and zero-copy properties for inter-process data exchange.
- DataFusion is the vectorized execution engine implemented in native Rust on top of the Arrow memory format; Spark offloads the corresponding operators to it for execution.
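For orientation, here is a minimal sketch of the SparkPlugin mechanism Comet builds on. MySparkPlugin is a made-up name for illustration, not Comet's actual plugin class; it only shows the driver/executor split and the startup hooks:

```scala
import java.util.{Map => JMap}

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical plugin: Spark instantiates driverPlugin() on the driver and
// executorPlugin() on each executor when the corresponding JVM starts.
class MySparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      println("driver started") // runs once when the driver comes up
      java.util.Collections.emptyMap()
    }
  }

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit =
      println("executor started") // runs once per executor
  }
}
```

Such a plugin would be registered via the spark.plugins config; Comet's own plugin is registered the same way.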
This article is based on the latest code on the DataFusion Comet main branch as of January 13, 2026 (commit eef5f28a0727d9aef043fa2b87d6747ff68b827a), and focuses on the vectorization rule rewriting in CometSparkSessionExtensions.
CometExecRule Analysis
First, the code:
```scala
class CometSparkSessionExtensions
    extends (SparkSessionExtensions => Unit)
    with Logging
    with ShimCometSparkSessionExtensions {

  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectColumnar { session => CometScanColumnar(session) }
    extensions.injectColumnar { session => CometExecColumnar(session) }
    extensions.injectQueryStagePrepRule { session => CometScanRule(session) }
    extensions.injectQueryStagePrepRule { session => CometExecRule(session) }
  }

  case class CometScanColumnar(session: SparkSession) extends ColumnarRule {
    override def preColumnarTransitions: Rule[SparkPlan] = CometScanRule(session)
  }

  case class CometExecColumnar(session: SparkSession) extends ColumnarRule {
    override def preColumnarTransitions: Rule[SparkPlan] = CometExecRule(session)

    override def postColumnarTransitions: Rule[SparkPlan] =
      EliminateRedundantTransitions(session)
  }
}
```
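These rules only take effect if the extension class is registered with the session. A minimal sketch (org.apache.comet.CometSparkSessionExtensions is the class shown above; the rest of the session configuration is elided):

```scala
import org.apache.spark.sql.SparkSession

// Register Comet's extensions so that injectColumnar and
// injectQueryStagePrepRule above actually run during planning.
val spark = SparkSession.builder()
  .config("spark.sql.extensions", "org.apache.comet.CometSparkSessionExtensions")
  .getOrCreate()
```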
The key part here is the application of CometExecRule, which converts Spark physical plan nodes into Comet native ones. This post walks through that conversion from Spark physical plan to native plan:
```scala
val normalizedPlan = normalizePlan(plan)

val planWithJoinRewritten = if (CometConf.COMET_REPLACE_SMJ.get()) {
  normalizedPlan.transformUp { case p =>
    RewriteJoin.rewrite(p)
  }
} else {
  normalizedPlan
}

var newPlan = transform(planWithJoinRewritten)

// Remove placeholders
newPlan = newPlan.transform {
  case CometSinkPlaceHolder(_, _, s) => s
  case CometScanWrapper(_, s) => s
}

newPlan = newPlan.transform {
  case op: CometExec =>
    if (op.originalPlan.logicalLink.isEmpty) {
      op.unsetTagValue(SparkPlan.LOGICAL_PLAN_TAG)
      op.unsetTagValue(SparkPlan.LOGICAL_PLAN_INHERITED_TAG)
    } else {
      op.originalPlan.logicalLink.foreach(op.setLogicalLink)
    }
    op
  case op: CometShuffleExchangeExec =>
    if (op.originalPlan.logicalLink.isEmpty) {
      op.unsetTagValue(SparkPlan.LOGICAL_PLAN_TAG)
      op.unsetTagValue(SparkPlan.LOGICAL_PLAN_INHERITED_TAG)
    } else {
      op.originalPlan.logicalLink.foreach(op.setLogicalLink)
    }
    op
  case op: CometBroadcastExchangeExec =>
    if (op.originalPlan.logicalLink.isEmpty) {
      op.unsetTagValue(SparkPlan.LOGICAL_PLAN_TAG)
      op.unsetTagValue(SparkPlan.LOGICAL_PLAN_INHERITED_TAG)
    } else {
      op.originalPlan.logicalLink.foreach(op.setLogicalLink)
    }
    op
}

// Convert native execution block by linking consecutive native operators.
var firstNativeOp = true
newPlan.transformDown {
  case op: CometNativeExec =>
    val newPlan = if (firstNativeOp) {
      firstNativeOp = false
      op.convertBlock()
    } else {
      op
    }
    // If reaching leaf node, reset `firstNativeOp` to true
    // because it will start a new block in next iteration.
    if (op.children.isEmpty) {
      firstNativeOp = true
    }
    newPlan
  case op =>
    firstNativeOp = true
    op
}
```
Step 1: normalizePlan

Normalizes expressions, mainly to resolve consistency issues among Double.NaN, Float.NaN, 0.0f, 0.0d, -0.0f, and -0.0d.
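A quick plain-Scala illustration (not Comet code) of the inconsistency this normalization guards against: 0.0 and -0.0 compare equal yet have different bit patterns, while NaN never equals itself:

```scala
// 0.0 == -0.0 is true, but their raw bits differ, which breaks bit-based
// hashing/grouping unless a canonical form is chosen.
println(0.0d == -0.0d) // true
println(java.lang.Double.doubleToLongBits(0.0d) ==
  java.lang.Double.doubleToLongBits(-0.0d)) // false

// NaN != NaN, yet all NaN bit patterns should behave as a single key
// in joins and aggregations.
println(Double.NaN == Double.NaN) // false
```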
Step 2: RewriteJoin.rewrite

Optionally rewrites SortMergeJoin into ShuffledHashJoin, controlled by the config spark.comet.exec.replaceSortMergeJoin (default false).
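To try the rewrite, it is enough to flip the config on a session (assuming spark is a SparkSession with Comet enabled):

```scala
// Opt in to rewriting SortMergeJoin into ShuffledHashJoin (default: false).
spark.conf.set("spark.comet.exec.replaceSortMergeJoin", "true")
```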
Step 3: transform

This is where the actual SparkPlan-to-CometNativeExec conversion happens:

- In scanRule, a CometScanExec (when scan.scanImpl == CometConf.SCAN_NATIVE_DATAFUSION; judging by the name, this looks like it should be SCAN_NATIVE_COMET) is turned into a Protobuf-backed CometNativeScanExec.
- A CometBatchScanExec or CometScanExec is turned into a CometScanWrapper.
- Only when spark.comet.sparkToColumnar.enabled (default false) is on are the operators listed in spark.comet.sparkToColumnar.supportedOperatorList converted to columnar format; if shouldApplySparkToColumnar holds (i.e., the leaf node is columnar/vectorized), the node is converted to CometScanWrapper(CometSparkToColumnarExec, cometProto).
- If a child node is a broadcast and the broadcast's downstream is Comet native, it is wrapped as a CometSinkPlaceHolder(nativeOp, broadcast, CometBroadcastExchangeExec()) operator; a child that is not a broadcast is converted to Comet native directly.
- For a ShuffleExchangeExec, there are three cases:
  - HashPartitioning: supported if spark.comet.native.shuffle.partitioning.hash.enabled (default true) is on and the hash expressions are supported by Comet native (i.e., convertible to Protobuf);
  - SinglePartition: supported;
  - RangePartitioning: supported if spark.comet.native.shuffle.partitioning.range.enabled (default false) is on and the ordering expressions are supported by Comet native (convertible to Protobuf).

  If one of these holds and the child node is Comet native (CometNativeExec), the node is converted to a CometSinkPlaceHolder(native Protobuf, ShuffleExchangeExec, CometShuffleExchangeExec(CometNativeShuffle)) operator and returned directly. Otherwise, the rule checks whether the ShuffleExchangeExec supports columnar shuffle; if so, it is rewritten into a CometSinkPlaceHolder(native Protobuf, ShuffleExchangeExec, CometShuffleExchangeExec(CometColumnarShuffle)) operator.
- For the operators supported by this project (CometExecRule.nativeExecs): if all of the operator's children are Comet native and the operator's own switch is enabled (e.g. project.enabled), the corresponding Protobuf is built along with the matching CometNativeExec (e.g. CometProjectExec(Protobuf, ProjectExec, ...)).
The physical operators currently supported are:

```scala
val nativeExecs: Map[Class[_ <: SparkPlan], CometOperatorSerde[_]] = Map(
  classOf[ProjectExec] -> CometProjectExec,
  classOf[FilterExec] -> CometFilterExec,
  classOf[LocalLimitExec] -> CometLocalLimitExec,
  classOf[GlobalLimitExec] -> CometGlobalLimitExec,
  classOf[ExpandExec] -> CometExpandExec,
  classOf[GenerateExec] -> CometExplodeExec,
  classOf[HashAggregateExec] -> CometHashAggregateExec,
  classOf[ObjectHashAggregateExec] -> CometObjectHashAggregateExec,
  classOf[BroadcastHashJoinExec] -> CometBroadcastHashJoinExec,
  classOf[ShuffledHashJoinExec] -> CometHashJoinExec,
  classOf[SortMergeJoinExec] -> CometSortMergeJoinExec,
  classOf[SortExec] -> CometSortExec,
  classOf[LocalTableScanExec] -> CometLocalTableScanExec,
  classOf[WindowExec] -> CometWindowExec)

/**
 * Sinks that have a native plan of ScanExec.
 */
val sinks: Map[Class[_ <: SparkPlan], CometOperatorSerde[_]] = Map(
  classOf[CoalesceExec] -> CometCoalesceExec,
  classOf[CollectLimitExec] -> CometCollectLimitExec,
  classOf[TakeOrderedAndProjectExec] -> CometTakeOrderedAndProjectExec,
  classOf[UnionExec] -> CometUnionExec)
```
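The dispatch pattern behind this map is worth noting: the operator's runtime class is the lookup key, so supporting a new operator is just one more map entry. A toy model (illustrative only, not Comet's actual types) of the same class-keyed dispatch with fallback:

```scala
// Toy plan nodes standing in for SparkPlan / CometNativeExec.
sealed trait Plan
case class Project(child: Plan) extends Plan
case class Sample(child: Plan) extends Plan // pretend this one is unsupported
case class Native(original: Plan) extends Plan
case object Leaf extends Plan

// Class-keyed converter map, mirroring the shape of nativeExecs.
val converters: Map[Class[_ <: Plan], Plan => Plan] = Map(
  classOf[Project] -> ((p: Plan) => Native(p)))

// Look up by runtime class; unsupported operators fall back to themselves
// (Comet additionally tags the node with the fallback reason).
def convert(op: Plan): Plan =
  converters.get(op.getClass).fold(op)(_(op))

println(convert(Project(Leaf))) // Native(Project(Leaf))
println(convert(Sample(Leaf)))  // Sample(Leaf) -- fallback
```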
Any other, unsupported operator is returned unchanged, with the fallback reason tagged on it. The following diagrams (from the source comments) illustrate the conversion:
```
Scan                ======>    CometScan
 |                              |
Filter                         CometFilter
 |                              |
HashAggregate                  CometHashAggregate
 |                              |
Exchange                       CometExchange
 |                              |
HashAggregate                  CometHashAggregate
 |                              |
UnsupportedOperator            UnsupportedOperator
```

Native execution doesn't necessarily have to start from CometScan:
```
Scan                =======>   CometScan
 |                              |
UnsupportedOperator            UnsupportedOperator
 |                              |
HashAggregate                  HashAggregate
 |                              |
Exchange                       CometExchange
 |                              |
HashAggregate                  CometHashAggregate
 |                              |
UnsupportedOperator            UnsupportedOperator
```

A sink can also be a Comet operator other than CometExchange, for instance CometUnion:
```
Scan    Scan        =======>   CometScan    CometScan
 |       |                      |            |
Filter  Filter                 CometFilter  CometFilter
 |       |                      |            |
Union                          CometUnion
 |                              |
Project                        CometProject
```
Step 4

Remove the CometSinkPlaceHolder / CometScanWrapper placeholders, and call CometNativeExec.convertBlock to fill serializedPlanOpt with the Protobuf bytes, so that the plan can be executed on the native side.
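With Comet enabled, the effect of the whole pipeline can be observed via explain(). A hedged sketch, assuming spark is a Comet-enabled session (as configured earlier) and /tmp/t is a scratch path:

```scala
// Write a small Parquet file and read it back, since Parquet scans are
// Comet's primary offload path.
spark.range(100).selectExpr("id", "id * 2 AS x")
  .write.mode("overwrite").parquet("/tmp/t")

// When offload succeeds, the printed physical plan shows Comet operators
// (e.g. CometScan / CometFilter) in place of the Spark originals; operators
// that fell back keep their Spark names, with the reason tagged.
spark.read.parquet("/tmp/t").where("x > 10").explain()
```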