DolphinDB User-Defined Aggregate Functions: A Detailed Guide to UDAFs

Contents

    • Abstract
    • 1. Overview of User-Defined Aggregate Functions
      • 1.1 What Is an Aggregate Function
      • 1.2 Why UDAFs Are Needed
    • 2. How Aggregate Functions Work
      • 2.1 The Map-Reduce Pattern
      • 2.2 State Management
    • 3. Creating a UDAF
      • 3.1 Defining with defg
      • 3.2 Using Map-Reduce
      • 3.3 A Complete UDAF Example
    • 4. Window Aggregation
      • 4.1 Cumulative Aggregation
      • 4.2 Sliding-Window Aggregation
      • 4.3 Time-Window Aggregation
    • 5. Distributed Aggregation
      • 5.1 How Distributed Aggregation Works
      • 5.2 Distributed Aggregation Example
      • 5.3 Partition-Level Aggregation Optimization
    • 6. Practical Examples
      • 6.1 Statistical Metrics
      • 6.2 Time-Series Aggregation
      • 6.3 Industrial Metrics
    • 7. Performance Optimization
      • 7.1 Vectorized Computation
      • 7.2 State Optimization
      • 7.3 Memory Optimization
    • 8. Summary
    • References

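Abstract

This article takes an in-depth look at developing user-defined aggregate functions (UDAFs) in DolphinDB. It starts with how aggregate functions work and how they manage state. It then covers the map-reduce pattern, window aggregation, distributed aggregation, and performance optimization. Code examples throughout help readers master the core techniques for writing custom aggregates.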


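1. Overview of User-Defined Aggregate Functions

1.1 What Is an Aggregate Function

An aggregate function combines many rows of data into a single result: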
[Figure: How aggregate functions work. Multiple rows → aggregate computation → single result. Aggregates are either built-in (SUM/AVG/MAX/MIN) or user-defined (UDAF).]

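1.2 Why UDAFs Are Needed

| Scenario | Description |
| --- | --- |
| Complex calculations | The built-in functions cannot express the required logic |
| Business logic | Aggregations specific to a particular business domain |
| Performance tuning | Implement a calculation in a way tuned for the specific workload |
| Distributed computing | Aggregate data spread across partitions or nodes |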

2. How Aggregate Functions Work

2.1 The Map-Reduce Pattern

[Figure: Map-Reduce. Data shards 1, 2, and 3 each go through Map; the intermediate results go through Reduce to produce the final result.]

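2.2 State Management

DolphinDB script
// An aggregate function has to maintain state while it consumes rows.
// For example, an average needs a running sum and a running count.

// Update step: fold one value into the (sum, count) state
def avgUpdate(state, v) {
    return [state[0] + v, state[1] + 1]
}

// Finalize step: turn the state into the result (\ is floating-point division)
def avgFinalize(state) {
    return state[0] \ state[1]
}

state = [0.0, 0.0]
for (v in 1..10) {
    state = avgUpdate(state, v)
}
avgFinalize(state)  // 5.5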
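The same update/finalize idea can be sketched in Python. Python is used here only because it runs anywhere; the class and method names are illustrative, not a DolphinDB API. The sketch adds a merge step, which lets states built independently on different chunks of data be combined. The map-reduce pattern relies on exactly that property:

```python
class AvgState:
    """Running state for an average: a sum and a count (illustrative only)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        # Fold one input row into the state
        self.total += value
        self.count += 1

    def merge(self, other: "AvgState") -> None:
        # Combine the state that was built on another chunk of data
        self.total += other.total
        self.count += other.count

    def finalize(self) -> float:
        # Turn the state into the final result; empty input yields NaN
        if self.count == 0:
            return float("nan")
        return self.total / self.count


def build_state(values):
    state = AvgState()
    for v in values:
        state.update(v)
    return state


# Two chunks are processed independently, then merged
left = build_state([1, 2, 3, 4])
right = build_state([5, 6, 7, 8, 9, 10])
left.merge(right)
print(left.finalize())  # 5.5
```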

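3. Creating a UDAF

3.1 Defining with defg

DolphinDB script
// Define an aggregate function with defg: it receives a whole column
// (or one group's slice of it) and must return a single scalar
defg mySum(x) {
    return sum(x)
}

// Usage
t = table(1..10 as value)
select mySum(value) from t  // 55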

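3.2 Using Map-Reduce

DolphinDB script
// Map step: compute partial results (sum and count) on one chunk of data
def myAvgMap(x) {
    return [sum(x), count(x)]
}

// Reduce step: combine the partial results and divide once at the end.
// \ is floating-point division; / between two integers would give 5 here
def myAvgReduce(mapResults) {
    totalSum = sum(mapResults[0])
    totalCount = sum(mapResults[1])
    return totalSum \ totalCount
}

// Register the aggregate function.
// Verify that addAggregator is available in your DolphinDB version;
// DolphinDB's documented general map-reduce function is mr(ds, mapFunc, reduceFunc, finalFunc)
addAggregator("myAvg", myAvgMap, myAvgReduce)

// Usage
t = table(1..10 as value)
select myAvg(value) from t  // 5.5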
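Here is a language-neutral sketch of the same map / reduce / final split, written in Python for illustration (the function names are hypothetical). It is shaped like the mapFunc, reduceFunc, and finalFunc arguments of DolphinDB's mr function. It shows why the partial state must be a (sum, count) pair rather than a per-chunk average: averaging chunk averages gives the wrong answer when chunks differ in size.

```python
from functools import reduce


def avg_map(shard):
    # Map: partial state (sum, count) for one shard
    return (sum(shard), len(shard))


def avg_reduce(a, b):
    # Reduce: combine two partial states element-wise
    return (a[0] + b[0], a[1] + b[1])


def avg_final(state):
    # Final: a single division at the very end
    total, count = state
    return total / count


shards = [[1, 2, 3], [4, 5, 6, 7, 8, 9, 10]]  # 1..10 split unevenly
partials = [avg_map(s) for s in shards]
print(avg_final(reduce(avg_reduce, partials)))  # 5.5

# Averaging the per-shard averages is wrong for unequal shards
naive = sum(sum(s) / len(s) for s in shards) / len(shards)
print(naive)  # 4.5
```

The naive result of 4.5 differs from the true mean of 5.5. The 3-row shard gets the same weight as the 7-row shard.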

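3.3 A Complete UDAF Example

DolphinDB script
// Weighted average: sum(value * weight) / sum(weight)
def weightedAvgMap(vals, weights) {
    return [sum(vals * weights), sum(weights)]
}

def weightedAvgReduce(mapResults) {
    totalWeightedSum = sum(mapResults[0])
    totalWeights = sum(mapResults[1])
    // \ keeps the fractional part when both sums are integers
    return totalWeightedSum \ totalWeights
}

// Register as in Section 3.2 (verify that addAggregator is available in your DolphinDB version)
addAggregator("weightedAvg", weightedAvgMap, weightedAvgReduce)

// Usage
t = table(
    1..10 as value,
    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2] as weight
)

// Expected: 95 / 15 ≈ 6.3333; the built-in wavg(value, weight) computes the same quantity
select weightedAvg(value, weight) from t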
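To cross-check the expected result, here is a minimal Python sketch of the weighted average split into two chunks. The chunk boundary is arbitrary and the function names are hypothetical. Because (weighted sum, weight total) pairs add component-wise, the result does not depend on where the data is split:

```python
from functools import reduce


def wavg_map(values, weights):
    # Map: partial (weighted sum, weight total) for one chunk
    return (sum(v * w for v, w in zip(values, weights)), sum(weights))


def wavg_reduce(a, b):
    # Reduce: add the partial pairs component-wise
    return (a[0] + b[0], a[1] + b[1])


def wavg_final(state):
    weighted_sum, weight_total = state
    return weighted_sum / weight_total


values = list(range(1, 11))
weights = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

# Split the same rows into two chunks, as a partitioned table might
chunks = [(values[:4], weights[:4]), (values[4:], weights[4:])]
state = reduce(wavg_reduce, [wavg_map(v, w) for v, w in chunks])
print(state)              # (95, 15)
print(wavg_final(state))  # 6.333333333333333
```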

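4. Window Aggregation

4.1 Cumulative Aggregation

DolphinDB script
// Cumulative average. The result has one value per row, so this is an
// ordinary vector function (def), not an aggregate (a defg must return a scalar)
def cumAvg(x) {
    return cumsum(x) \ cumcount(x)
}

// Usage: cum_avg is 1, 1.5, 2, ..., 5.5; the built-in cumavg gives the same result
t = table(1..10 as value)
select value, cumAvg(value) as cum_avg from t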
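For reference, here is a minimal Python sketch of the cumulative average: the running sum divided by the running count. DolphinDB's cumcount skips NULLs; this sketch instead assumes the input has no missing values.

```python
from itertools import accumulate


def cum_avg(values):
    # Running sum at each position divided by the number of values seen so far
    sums = accumulate(values)
    return [s / (i + 1) for i, s in enumerate(sums)]


print(cum_avg(range(1, 11)))
# [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
```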

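4.2 Sliding-Window Aggregation

DolphinDB script
// Sliding-window aggregation: standard deviation over the last winSize rows
def movingStd(x, winSize) {
    return mstd(x, winSize)
}

// Usage: the first 9 rows are NULL because a full 10-row window is not yet available
t = table(1..100 as value)
select value, movingStd(value, 10) as moving_std from t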
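Here is a plain-Python sketch of what the 10-row moving standard deviation computes. It assumes the sample standard deviation (n - 1 denominator, the same convention as std) and a None output until a full window is available. Any 10 consecutive integers have the same spread, so every full window gives the same value:

```python
import statistics


def moving_std(values, window):
    # Sample standard deviation over the trailing `window` rows;
    # positions without a full window yield None
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(statistics.stdev(values[i + 1 - window : i + 1]))
    return out


result = moving_std(list(range(1, 101)), 10)
print(result[:9])  # nine Nones
print(result[9])   # about 3.0277
```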

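4.3 Time-Window Aggregation

DolphinDB script
// Time-window aggregation.
// mavg counts its window in rows, not in time: the ts argument is not used,
// and a 10-row window matches 10 minutes here only because rows are exactly 1 minute apart.
// For irregular timestamps, use a time-based moving function (e.g. tmavg, if available in your version).
def timeWindowAvg(ts, value, winSize) {
    return mavg(value, winSize)
}

// Usage: one reading per minute (the .000 suffix makes the literal a TIMESTAMP, so + 60000 adds milliseconds)
t = table(
    2024.01.01T00:00:00.000 + 0..99 * 60000 as timestamp,
    20.0 + rand(10.0, 100) as temperature
)

select timestamp, temperature,
       timeWindowAvg(timestamp, temperature, 10) as avg_10
from t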
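A row-count window is not a time window. The difference shows in a hypothetical Python sketch of a trailing time-based average over the interval (t - window, t]. That boundary convention is an assumption of the sketch; check the documentation of any time-based function you use. The data below has a gap. At minute 10, a 3-row window would still include the readings from minutes 1 and 2. The 5-minute window uses only the minute-10 reading:

```python
from bisect import bisect_right


def time_window_avg(timestamps, values, window):
    # timestamps must be sorted ascending; row i averages rows with time in (t_i - window, t_i]
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    out = []
    for i, t in enumerate(timestamps):
        start = bisect_right(timestamps, t - window)  # first row inside the window
        out.append((prefix[i + 1] - prefix[start]) / (i + 1 - start))
    return out


# Irregular sampling: a gap between minute 2 and minute 10
minutes = [0, 1, 2, 10, 11]
temps = [20.0, 22.0, 24.0, 30.0, 32.0]
print(time_window_avg(minutes, temps, 5))
# [20.0, 21.0, 22.0, 30.0, 31.0]
```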

5. Distributed Aggregation

5.1 How Distributed Aggregation Works

[Figure: Distributed aggregation. Nodes 1, 2, and 3 each run Map; their intermediate results go through Reduce to produce the final result.]

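5.2 Distributed Aggregation Example

DolphinDB script
// Drop any previous copy so the script can be re-run
if (existsDatabase("dfs://agg_db")) {
    dropDatabase("dfs://agg_db")
}

// Create a distributed database VALUE-partitioned on device_id (one partition per device)
db = database("dfs://agg_db", VALUE, 1..100)
schema = table(1:0, `device_id`timestamp`value,
               [INT, TIMESTAMP, DOUBLE])
db.createPartitionedTable(schema, `sensor_data, `device_id)

// Insert 100,000 rows of test data
loadTable("dfs://agg_db", "sensor_data").append!(
    table(
        take(1..100, 100000) as device_id,
        take(now(), 100000) as timestamp,
        20.0 + rand(10.0, 100000) as value
    )
)

// Distributed aggregation
t = loadTable("dfs://agg_db", "sensor_data")

// Apply the custom aggregate myAvg from Section 3.2; because the table is
// partitioned on device_id, each group is computed inside a single partition
select device_id, myAvg(value) as avg_value
from t
group by device_id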
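In the example above each device lives in exactly one partition, so every group can be finished inside its partition. When the grouping column is not the partition column, a group spans several partitions and its partial states have to be merged. The following is a hypothetical Python sketch of that merge, not DolphinDB's internal implementation:

```python
from collections import defaultdict


def partition_partials(rows):
    # Map step on one partition: per-group [sum, count] of the value column
    partials = defaultdict(lambda: [0.0, 0])
    for group, value in rows:
        partials[group][0] += value
        partials[group][1] += 1
    return partials


def merge_partials(all_partials):
    # Reduce step: merge the per-group partial states from every partition
    merged = defaultdict(lambda: [0.0, 0])
    for partials in all_partials:
        for group, (s, c) in partials.items():
            merged[group][0] += s
            merged[group][1] += c
    # Final step: one division per group
    return {g: s / c for g, (s, c) in merged.items()}


# Hypothetical data: the same device appears in more than one partition
partitions = [
    [(1, 20.0), (1, 22.0), (2, 25.0)],
    [(1, 24.0), (2, 27.0), (2, 29.0)],
]
print(merge_partials(partition_partials(p) for p in partitions))
# {1: 22.0, 2: 27.0}
```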

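5.3 Partition-Level Aggregation Optimization

DolphinDB script
// Partition-level parallelism: with device_id as the partition column,
// each partition is aggregated in parallel and the per-partition results are combined
select device_id, avg(value) as avg_value
from t
group by device_id

// Partition pruning: because the filter is on the partition column,
// only the partitions for devices 1 to 10 are read
select avg(value) as avg_value
from t
where device_id in 1..10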
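Partition pruning can be illustrated with a hypothetical Python sketch in which each partition is a dictionary entry keyed by device_id. The filter is checked against partition keys first, and only the matching partitions are read:

```python
def prune_partitions(partition_keys, wanted):
    # Keep only the partitions whose key matches the filter
    wanted = set(wanted)
    return [k for k in partition_keys if k in wanted]


def avg_with_pruning(partitions, wanted):
    # partitions maps a partition key (device_id) to that partition's values
    total, count = 0.0, 0
    for key in prune_partitions(partitions.keys(), wanted):
        values = partitions[key]  # only the partitions that survive pruning are read
        total += sum(values)
        count += len(values)
    return total / count if count else None


# Hypothetical data: 100 partitions, device d has three readings equal to d
partitions = {d: [float(d)] * 3 for d in range(1, 101)}
print(len(prune_partitions(partitions.keys(), range(1, 11))))  # 10
print(avg_with_pruning(partitions, range(1, 11)))  # 5.5
```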

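6. Practical Examples

6.1 Statistical Metrics

DolphinDB script
// ========== Statistical aggregate functions ==========

// Coefficient of variation: standard deviation relative to the mean
defg cv(x) {
    return std(x) / avg(x)
}

// Skewness (moment formula using the sample standard deviation).
// pow is used for powers because ^ is bitwise XOR in DolphinDB; a built-in skew also exists
defg skewness(x) {
    n = count(x)
    m = avg(x)
    s = std(x)
    return sum(pow(x - m, 3)) / (n * pow(s, 3))
}

// Excess kurtosis (kurtosis minus 3, so normally distributed data gives about 0).
// Named excessKurtosis because kurtosis is already a built-in function
defg excessKurtosis(x) {
    n = count(x)
    m = avg(x)
    s = std(x)
    return sum(pow(x - m, 4)) / (n * pow(s, 4)) - 3
}

// Usage: uniform data on [20, 30) has skewness near 0 and excess kurtosis near -1.2
t = table(20.0 + rand(10.0, 1000) as value)
select cv(value) as cv,
       skewness(value) as skew,
       excessKurtosis(value) as kurt
from t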
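To sanity-check the formulas outside DolphinDB, here is a minimal Python version of the same three statistics. It uses the standard library's statistics module, with the sample standard deviation as in DolphinDB's std. Symmetric data gives a skewness of exactly 0, and data with a long right tail gives a positive value:

```python
import statistics


def cv(xs):
    # Coefficient of variation: sample standard deviation divided by the mean
    return statistics.stdev(xs) / statistics.mean(xs)


def skewness(xs):
    # Same moment formula as the defg version: sum((x - m)^3) / (n * s^3)
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)


def excess_kurtosis(xs):
    # sum((x - m)^4) / (n * s^4) - 3
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 10.0]
print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
```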

6.2 Time-Series Aggregation

DolphinDB script
// ========== Time-series aggregate functions ==========

// Least-squares slope of y against x.
// x should be a numeric index (e.g. elapsed seconds), not raw epoch milliseconds:
// squaring epoch milliseconds overflows a 64-bit integer
defg slope(x, y) {
    n = count(y)
    sumX = sum(x)
    sumY = sum(y)
    sumXY = sum(x * y)
    sumX2 = sum(x * x)

    return (n * sumXY - sumX * sumY) \ (n * sumX2 - sumX * sumX)
}

// Least-squares intercept: the fitted line passes through (avg(x), avg(y))
defg intercept(x, y) {
    return avg(y) - slope(x, y) * avg(x)
}

// Usage: y = 2x plus uniform noise on [-5, 5)
t = table(
    1..100 as time,
    2 * (1..100) + rand(10.0, 100) - 5.0 as value
)

// slope should come out close to 2 and intercept close to 0
select slope(time, value) as slope,
       intercept(time, value) as intercept
from t
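Here is a minimal Python cross-check of the least-squares fit on noise-free data y = 2x + 1. It uses the centered form sum((x - mx)(y - my)) / sum((x - mx)²). That form is algebraically equal to the sum-of-products formula above but keeps intermediate values small. The centered form is a design choice of this sketch, not of the original functions:

```python
def slope(xs, ys):
    # Centered least-squares slope: sum((x - mx) * (y - my)) / sum((x - mx) ** 2)
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def intercept(xs, ys):
    # The fitted line passes through (mean(x), mean(y))
    n = len(xs)
    return sum(ys) / n - slope(xs, ys) * sum(xs) / n


xs = list(range(1, 101))
ys = [2 * x + 1 for x in xs]  # noise-free line y = 2x + 1
print(slope(xs, ys), intercept(xs, ys))  # 2.0 1.0
```

If every x is the same, sxx is 0 and the slope is undefined; callers would need to guard against that case.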

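6.3 Industrial Metrics

DolphinDB script
// ========== Industrial aggregate functions ==========

// OEE (overall equipment effectiveness) = availability × performance × quality,
// averaged over the group and expressed as a percentage; inputs are fractions in [0, 1]
defg calculateOEE(availability, performance, quality) {
    return avg(availability * performance * quality) * 100
}

// Pass rate: percentage of measurements within [lowerLimit, upperLimit].
// \ avoids integer division, since sum(...) and count(...) are both integers
defg passRate(x, lowerLimit, upperLimit) {
    return sum((x >= lowerLimit) and (x <= upperLimit)) \ count(x) * 100
}

// Process capability index CPK: distance from the mean to the nearer specification limit, in units of 3σ
defg cpk(x, lowerLimit, upperLimit) {
    m = avg(x)
    s = std(x)
    cpu = (upperLimit - m) / (3 * s)
    cpl = (m - lowerLimit) / (3 * s)
    return min(cpu, cpl)
}

// Usage: measurements uniform on [95, 105) with specification limits 90 and 110
t = table(
    95.0 + rand(10.0, 1000) as measurement
)

// Every measurement is in spec, so pass_rate is 100; cpk is roughly 1.15
select passRate(measurement, 90, 110) as pass_rate,
       cpk(measurement, 90, 110) as cpk
from t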
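Here is a minimal Python sketch of the pass rate and CPK calculations on small, hand-made data sets, which are useful for checking results by hand. The second CPK call shows that a process mean shifted toward one limit lowers CPK even though the spread is unchanged:

```python
import statistics


def pass_rate(xs, lower, upper):
    # Percentage of measurements inside [lower, upper]
    passed = sum(1 for x in xs if lower <= x <= upper)
    return passed / len(xs) * 100


def cpk(xs, lower, upper):
    # Distance from the mean to the nearer spec limit, in units of 3 sample standard deviations
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return min((upper - m) / (3 * s), (m - lower) / (3 * s))


print(pass_rate([95.0, 100.0, 105.0, 120.0], 90, 110))  # 75.0
print(cpk([97.0, 100.0, 103.0], 90, 110))   # centered: 10 / 9, about 1.11
print(cpk([103.0, 106.0, 109.0], 90, 110))  # shifted toward 110: 4 / 9, about 0.44
```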

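7. Performance Optimization

7.1 Vectorized Computation

DolphinDB script
// Vectorized: sum processes the whole vector in a single built-in call
defg fastSum(x) {
    return sum(x)
}

// Avoid element-by-element loops: each iteration is interpreted
// one value at a time, which is much slower
defg slowSum(x) {
    total = 0
    for (v in x) {
        total += v
    }
    return total
}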

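7.2 State Optimization

DolphinDB script
// State optimization: keep only the minimal state (a sum and a count) and
// compute the result directly, without building intermediate vectors
defg optimizedAvg(x) {
    // \ gives a floating-point result; the built-in avg does the same job
    return sum(x) \ count(x)
}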

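7.3 Memory Optimization

DolphinDB script
// Memory optimization: return a scalar instead of materializing intermediate vectors.
// A defg function still receives each group's full column in memory; to process data
// incrementally without holding it all, use DolphinDB's streaming engines
// (for example, the reactive state engine)
defg streamingAgg(x) {
    return sum(x)
}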

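8. Summary

This article covered user-defined aggregate functions in DolphinDB:

  1. How aggregation works: the map-reduce pattern and state management
  2. How to create UDAFs: defining them with defg and registering map and reduce functions
  3. Window aggregation: cumulative, sliding-window, and time-window aggregation
  4. Distributed aggregation: how it works and partition-level optimization
  5. Practical applications: statistical, time-series, and industrial metrics
  6. Performance optimization: vectorization, state optimization, and memory optimization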

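Review Questions

  1. How would you design a distributed aggregate function?
  2. How does window aggregation differ from ordinary aggregation?
  3. How can you improve the performance of an aggregate function?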

References

