Spark DataFusion Comet Vectorized Rust Native: How Native Operator Metrics Are Passed to the Spark UI

Background

Apache DataFusion Comet is a vectorized execution project, originally open-sourced by Apple, that accelerates Spark.

The project is built on a Spark plugin + Protobuf + Arrow + DataFusion architecture.

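Specifically:

Spark plugin: built on the SparkPlugin interface, which consists of a DriverPlugin and an ExecutorPlugin that are invoked when the driver and each executor start.
Protobuf: serializes Spark expressions and plans so they can be handed to the native engine for execution, taking advantage of its compact encoding and fast (de)serialization.
Arrow: provides efficient data exchange between Spark and the native engine (results produced by native execution, or data produced by Spark). The JVM and the native library run in the same process, so columnar batches cross the JNI boundary through the Arrow C Data Interface, which shares buffers without copying.
DataFusion: a vectorized execution engine implemented in native Rust on top of the Arrow memory format; Comet offloads the corresponding Spark operators to it for execution.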

This article is based on the latest code on the datafusion-comet main branch as of January 13, 2026 (commit eef5f28a0727d9aef043fa2b87d6747ff68b827a).
It focuses on the implementation details of how Comet's Rust native layer updates metrics while executing DataFusion plans.

Metric updates in native Rust

Here, "metric update" means feeding the execution metrics of the DataFusion native physical plan back to the JVM. Straight to the code:

```rust
fn update_metrics(env: &mut JNIEnv, exec_context: &mut ExecutionContext) -> CometResult<()> {
    if let Some(native_query) = &exec_context.root_op {
        let metrics = exec_context.metrics.as_obj();
        update_comet_metric(env, metrics, native_query)
    } else {
        Ok(())
    }
}
```

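- If there is a root operator (root_op), obtain the JVM-side CometMetricNode instance via exec_context.metrics.as_obj(); otherwise there is nothing to update.
- Update the metrics by calling update_comet_metric: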
```rust
pub(crate) fn update_comet_metric(
    env: &mut JNIEnv,
    metric_node: &JObject,
    spark_plan: &Arc<SparkPlan>,
) -> Result<(), CometError> {
    if metric_node.is_null() {
        return Ok(());
    }

    let native_metric = to_native_metric_node(spark_plan);
    let jbytes = env.byte_array_from_slice(&native_metric?.encode_to_vec())?;

    unsafe { jni_call!(env, comet_metric_node(metric_node).set_all_from_bytes(&jbytes) -> ()) }
}
```
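- Recursively collect the metrics of child nodes

  to_native_metric_node collects the metrics of the DataFusion ExecutionPlan (or of its additional_native_plans, when present), then calls itself on each child of the plan and gathers the results into a single tree that mirrors the plan.

  Taking ScanExec as an example, it records the time spent fetching data and the number of rows fetched.
- Serialize to binary

  The metric tree is encoded with protobuf's encode_to_vec, which serializes a protobuf message struct directly into a `Vec<u8>`; JNIEnv.byte_array_from_slice then copies those bytes into a Java byte[].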
- Call the JNI method to hand the metrics to the JVM

  ```rust
  jni_call!(env, comet_metric_node(metric_node).set_all_from_bytes(&jbytes) -> ())
  ```

  The set_all_from_bytes invoked here is the method of the same name on the JVM-side CometMetricNode class:

  ```scala
  private def set_all(metricNode: Metric.NativeMetricNode): Unit = {
    metricNode.getMetricsMap.forEach((name, value) => {
      set(name, value)
    })
    metricNode.getChildrenList.asScala.zip(children).foreach { case (child, childNode) =>
      childNode.set_all(child)
    }
  }

  def set_all_from_bytes(bytes: Array[Byte]): Unit = {
    val metricNode = Metric.NativeMetricNode.parseFrom(bytes)
    set_all(metricNode)
  }
  ```

  NativeMetricNode.parseFrom decodes the protobuf bytes back into a NativeMetricNode, whose values are then assigned node by node to the corresponding CometMetricNode (children are paired with zip, so the two trees are matched by position). This is how DataFusion's native metrics reach the JVM.
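The round trip described in the list above boils down to two tree walks: one on the native side that builds a metric tree from the plan, and one on the JVM side that copies it onto a mirror tree by position. The following is a minimal std-only sketch of that shape, not Comet's code: PlanNode and JvmMetricNode are hypothetical stand-ins for a DataFusion ExecutionPlan and a CometMetricNode, the protobuf encode/decode step is left out, and ignoring metric names that have no matching slot is an assumption, since the body of set is not shown above.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a DataFusion operator: its metrics plus its children.
struct PlanNode {
    metrics: Vec<(&'static str, i64)>,
    children: Vec<PlanNode>,
}

/// Same shape as the protobuf NativeMetricNode: a name -> value map plus child nodes.
#[derive(Debug, Default)]
struct NativeMetricNode {
    metrics: BTreeMap<String, i64>,
    children: Vec<NativeMetricNode>,
}

/// Native side (like to_native_metric_node): collect this node's metrics,
/// then recurse into every child so the result mirrors the plan tree.
fn to_metric_node(plan: &PlanNode) -> NativeMetricNode {
    let mut node = NativeMetricNode::default();
    for (name, value) in &plan.metrics {
        node.metrics.insert(name.to_string(), *value);
    }
    node.children = plan.children.iter().map(to_metric_node).collect();
    node
}

/// Hypothetical stand-in for CometMetricNode: preregistered metric slots plus children.
struct JvmMetricNode {
    slots: BTreeMap<String, i64>,
    children: Vec<JvmMetricNode>,
}

impl JvmMetricNode {
    /// Like set_all: update only names that already have a slot (an assumption),
    /// then pair children by position with zip and recurse.
    fn set_all(&mut self, node: &NativeMetricNode) {
        for (name, value) in &node.metrics {
            if let Some(slot) = self.slots.get_mut(name) {
                *slot = *value;
            }
        }
        for (child, jvm_child) in node.children.iter().zip(self.children.iter_mut()) {
            jvm_child.set_all(child);
        }
    }
}
```

Because children are paired with zip, a difference in the number of children does not raise an error; the extra nodes on either side are simply left untouched. The native metric tree and the JVM-side CometMetricNode tree therefore have to be built with the same shape for every value to land on the right operator.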

Displaying Native Rust metrics

The baselineMetrics method of CometMetricNode looks like this:

```scala
def baselineMetrics(sc: SparkContext): Map[String, SQLMetric] = {
  Map(
    "output_rows" -> SQLMetrics.createMetric(sc, "number of output rows"),
    "elapsed_compute" -> SQLMetrics.createNanoTimingMetric(
      sc,
      "total time (in ms) spent in this operator"))
}
```

The keys output_rows and elapsed_compute correspond one-to-one to the metric names of DataFusion's native MetricValue:

```rust
pub enum MetricValue {
    /// Number of output rows produced: "output_rows" metric
    OutputRows(Count),
    /// Elapsed Compute Time: the wall clock time spent in "cpu
    /// intensive" work.
    ///
    /// This measurement represents, roughly:
    /// ```
    /// use std::time::Instant;
    /// let start = Instant::now();
    /// // ...CPU intensive work here...
    /// let elapsed_compute = (Instant::now() - start).as_nanos();
    /// ```
    ///
    /// Note 1: Does *not* include time other operators spend
    /// computing input.
    ///
    /// Note 2: *Does* includes time when the thread could have made
    /// progress but the OS did not schedule it (e.g. due to CPU
    /// contention), thus making this value different than the
    /// classical definition of "cpu_time", which is the time reported
    /// from `clock_gettime(CLOCK_THREAD_CPUTIME_ID, ..)`.
    ElapsedCompute(Time),
    /// Number of spills produced: "spill_count" metric
    SpillCount(Count),
    /// Total size of spilled bytes produced: "spilled_bytes" metric
    SpilledBytes(Count),
    /// Total size of output bytes produced: "output_bytes" metric
    OutputBytes(Count),
    /// Total size of spilled rows produced: "spilled_rows" metric
    SpilledRows(Count),
    /// Current memory used
    CurrentMemoryUsage(Gauge),
    // ...
}
```

These values are filled into the metric node inside to_native_metric_node:

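```rust
node_metrics
    .unwrap_or_default()
    .iter()
    .map(|m| m.value())
    // Key each metric by its MetricValue name (e.g. "output_rows");
    // as_usize() yields the raw value, in nanoseconds for time metrics.
    .map(|m| (m.name(), m.as_usize() as i64))
    .for_each(|(name, value)| {
        native_metric_node.metrics.insert(name.to_string(), value);
    });
```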
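To make the name matching concrete, here is a simplified, hypothetical MetricValue with only three variants and std types (Duration in place of DataFusion's Time), folded into a map with the same name()/as_usize() chain as the loop above. It is a sketch under those assumptions, not DataFusion's type; the point it illustrates is that elapsed_compute leaves the native side as a raw nanosecond count, which matches the JVM side registering it with createNanoTimingMetric.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Simplified, hypothetical stand-in for DataFusion's MetricValue.
enum MetricValue {
    OutputRows(usize),
    ElapsedCompute(Duration),
    SpillCount(usize),
}

impl MetricValue {
    /// Metric name; these strings are the keys the JVM side looks up.
    fn name(&self) -> &'static str {
        match self {
            MetricValue::OutputRows(_) => "output_rows",
            MetricValue::ElapsedCompute(_) => "elapsed_compute",
            MetricValue::SpillCount(_) => "spill_count",
        }
    }

    /// Raw value; time metrics are reported in nanoseconds.
    fn as_usize(&self) -> usize {
        match self {
            MetricValue::OutputRows(n) | MetricValue::SpillCount(n) => *n,
            MetricValue::ElapsedCompute(d) => d.as_nanos() as usize,
        }
    }
}

/// Same shape as the fold in to_native_metric_node: name -> value as i64.
fn to_metric_map(values: &[MetricValue]) -> HashMap<String, i64> {
    let mut map = HashMap::new();
    values
        .iter()
        .map(|m| (m.name(), m.as_usize() as i64))
        .for_each(|(name, value)| {
            map.insert(name.to_string(), value);
        });
    map
}
```

For example, ElapsedCompute(Duration::from_millis(3)) is stored under "elapsed_compute" as 3000000, and the conversion to milliseconds is left to the Spark side.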

On the JVM side, these baselineMetrics are bound to the corresponding CometNativeExec plan through its metrics field, for example:

```scala
abstract class CometNativeExec extends CometExec {
  // ...
  override lazy val metrics: Map[String, SQLMetric] =
    CometMetricNode.baselineMetrics(sparkContext)
  // ...
}
```

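These metrics are what the Spark UI displays for the operator. Each SQLMetric is an accumulator, so the values set on the executors travel back to the driver with the task results, and the SQL tab shows them on the corresponding plan node.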
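A nano-timing SQLMetric such as elapsed_compute is shown in milliseconds, aggregated over the tasks of the stage. The following simplified sketch assumes each task reports its own elapsed_compute in nanoseconds and shows the conversion as total, min and max; Spark's real formatting (in SQLMetrics) also includes the median, and NsTimingSummary and summarize are hypothetical names used only for illustration.

```rust
/// Hypothetical summary of one nano-timing metric across tasks, in milliseconds.
#[derive(Debug, PartialEq)]
struct NsTimingSummary {
    total_ms: u64,
    min_ms: u64,
    max_ms: u64,
}

/// Aggregates per-task nanosecond values and converts them to whole milliseconds.
/// Returns None when no task reported a value.
fn summarize(task_values_ns: &[u64]) -> Option<NsTimingSummary> {
    let to_ms = |ns: u64| ns / 1_000_000;
    let min = *task_values_ns.iter().min()?;
    let max = *task_values_ns.iter().max()?;
    // Sum in nanoseconds first, then convert once.
    let total: u64 = task_values_ns.iter().sum();
    Some(NsTimingSummary {
        total_ms: to_ms(total),
        min_ms: to_ms(min),
        max_ms: to_ms(max),
    })
}
```

This sketch sums in nanoseconds before converting: two tasks at 1.5 ms and 2.6 ms give a total of 4 ms, whereas truncating each task to whole milliseconds first would report 3 ms.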