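Contents
- Abstract
- 1. Overview of User-Defined Aggregate Functions
  - [1.1 What Is an Aggregate Function](#1.1 What Is an Aggregate Function)
  - [1.2 Why UDAFs Are Needed](#1.2 Why UDAFs Are Needed)
- 2. How Aggregate Functions Work
  - [2.1 The Map-Reduce Pattern](#2.1 The Map-Reduce Pattern)
  - [2.2 State Management](#2.2 State Management)
- 3. Creating a UDAF
  - [3.1 Defining with defg](#3.1 Defining with defg)
  - [3.2 Using map-reduce](#3.2 Using map-reduce)
  - [3.3 A Complete UDAF Example](#3.3 A Complete UDAF Example)
- 4. Window Aggregation
  - [4.1 Cumulative Aggregation](#4.1 Cumulative Aggregation)
  - [4.2 Sliding-Window Aggregation](#4.2 Sliding-Window Aggregation)
  - [4.3 Time-Window Aggregation](#4.3 Time-Window Aggregation)
- 5. Distributed Aggregation
  - [5.1 How Distributed Aggregation Works](#5.1 How Distributed Aggregation Works)
  - [5.2 Distributed Aggregation Example](#5.2 Distributed Aggregation Example)
  - [5.3 Partition-Level Aggregation Optimization](#5.3 Partition-Level Aggregation Optimization)
- 6. Practical Examples
  - [6.1 Statistical Metrics](#6.1 Statistical Metrics)
  - [6.2 Time-Series Aggregation](#6.2 Time-Series Aggregation)
  - [6.3 Industrial Metrics](#6.3 Industrial Metrics)
- 7. Performance Optimization
  - [7.1 Vectorized Computation](#7.1 Vectorized Computation)
  - [7.2 State Optimization](#7.2 State Optimization)
  - [7.3 Memory Optimization](#7.3 Memory Optimization)
- 8. Summary
- References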
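Abstract
This article walks through the development of user-defined aggregate functions (UDAFs) in DolphinDB. It covers how aggregation works and how state is managed, the map-reduce pattern, window and distributed aggregation, and performance optimization, with code examples for each topic.
1. Overview of User-Defined Aggregate Functions
1.1 What Is an Aggregate Function
An aggregate function reduces multiple rows of data to a single result: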
[Figure: How aggregate functions work. Multiple rows of data pass through an aggregation step to produce a single result; the aggregation is either built-in (SUM/AVG/MAX/MIN) or user-defined (UDAF).]
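1.2 Why UDAFs Are Needed
| Scenario | Description |
|---|---|
| Complex calculations | Built-in functions cannot express the computation |
| Business logic | Aggregations specific to a business domain |
| Performance tuning | Hand-tuned implementations |
| Distributed computing | Aggregation across distributed partitions |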
2. How Aggregate Functions Work
2.1 The Map-Reduce Pattern
[Figure: Map-Reduce. Data shards 1, 2, and 3 → Map → intermediate results → Reduce → final result.]
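The map-reduce flow can be sketched in plain Python (a language-neutral illustration, not DolphinDB script). The function names `map_partial` and `reduce_partials` and the shard contents are hypothetical. Each shard emits a (sum, count) pair and the reduce step adds the pairs together; averaging the per-shard averages directly gives a different answer when the shards differ in size.

```python
from functools import reduce

def map_partial(shard):
    """Map step: summarize one shard as a (sum, count) pair."""
    return (sum(shard), len(shard))

def reduce_partials(a, b):
    """Reduce step: combine two partial results component-wise."""
    return (a[0] + b[0], a[1] + b[1])

# Three shards of unequal size (hypothetical data).
shards = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]

partials = [map_partial(s) for s in shards]
total_sum, total_count = reduce(reduce_partials, partials)
mean = total_sum / total_count

# Naive alternative: the average of per-shard averages.
naive = sum(sum(s) / len(s) for s in shards) / len(shards)

print(mean)   # 5.5
print(naive)  # differs from 5.5 because the shards have unequal sizes
```

This is why the intermediate result for an average has to carry both the sum and the count rather than the average itself.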
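2.2 State Management
python
// Pseudocode: an aggregate function has to maintain state.
// For example, computing an average requires a running sum and a count.
// State structure
class AggState {
    sum = 0
    count = 0
    // Fold one input value into the state
    def update(value) {
        sum += value
        count += 1
    }
    // Produce the final result from the accumulated state
    def finalize() {
        return sum / count
    }
}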
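As a runnable counterpart to the pseudocode, here is a minimal Python sketch of the same state object. The class name `AvgState` and the extra `merge` method are assumptions added for illustration; `merge` is what lets two independently built states be combined, which the distributed case relies on.

```python
from dataclasses import dataclass

@dataclass
class AvgState:
    """Running state for an average: a sum and a count."""
    total: float = 0.0
    count: int = 0

    def update(self, value):
        # Fold one input value into the state.
        self.total += value
        self.count += 1

    def merge(self, other):
        # Combine with a state built from a different chunk of data.
        self.total += other.total
        self.count += other.count

    def finalize(self):
        # Return None for an empty state instead of dividing by zero.
        return self.total / self.count if self.count else None

left, right = AvgState(), AvgState()
for v in [1, 2, 3, 4]:
    left.update(v)
for v in [5, 6, 7, 8, 9, 10]:
    right.update(v)

left.merge(right)
print(left.finalize())  # 5.5
```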
3. Creating a UDAF
3.1 Defining with defg
python
// Define an aggregate function with defg
defg mySum(x) {
    return sum(x)
}
// Usage
t = table(1..10 as value)
select mySum(value) from t // 55
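3.2 Using map-reduce
python
// Map-reduce aggregate function
// Map step: emit the partial sum and count for one chunk of data
def myAvgMap(x) {
    return [sum(x), count(x)]
}
// Reduce step: combine the partial sums and counts
def myAvgReduce(mapResults) {
    totalSum = sum(mapResults[0])
    totalCount = sum(mapResults[1])
    // "\" is floating-point division; "/" truncates when both operands are integers
    return totalSum \ totalCount
}
// Register the aggregate function. Verify that addAggregator is available in your
// DolphinDB version; the mapr statement is the documented way to declare a
// map-reduce implementation.
addAggregator("myAvg", myAvgMap, myAvgReduce)
// Usage
t = table(1..10 as value)
select myAvg(value) from t // 5.5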
3.3 A Complete UDAF Example
python
// Weighted average
// Map step: partial weighted sum and partial sum of weights
def weightedAvgMap(values, weights) {
    return [sum(values * weights), sum(weights)]
}
// Reduce step: combine the partial results
def weightedAvgReduce(mapResults) {
    totalWeightedSum = sum(mapResults[0])
    totalWeights = sum(mapResults[1])
    // "\" is floating-point division; "/" truncates when both operands are integers
    return totalWeightedSum \ totalWeights
}
addAggregator("weightedAvg", weightedAvgMap, weightedAvgReduce)
// Usage
t = table(
    1..10 as value,
    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2] as weight
)
select weightedAvg(value, weight) from t
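To check the arithmetic of the weighted-average example, the same map and reduce steps can be mirrored in Python. This is a sketch with hypothetical function names, not DolphinDB code. It uses the values and weights from the table above and confirms that splitting the rows into two chunks does not change the result.

```python
def weighted_avg_map(values, weights):
    """Map step: (weighted sum, sum of weights) for one chunk."""
    return (sum(v * w for v, w in zip(values, weights)), sum(weights))

def weighted_avg_reduce(partials):
    """Reduce step: add up the partial pairs, then divide once."""
    total_weighted = sum(p[0] for p in partials)
    total_weight = sum(p[1] for p in partials)
    return total_weighted / total_weight

values = list(range(1, 11))
weights = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

# Single chunk: weighted sum 95 over total weight 15.
whole = weighted_avg_reduce([weighted_avg_map(values, weights)])

# Two chunks of unequal size give the same answer.
split = weighted_avg_reduce([
    weighted_avg_map(values[:3], weights[:3]),
    weighted_avg_map(values[3:], weights[3:]),
])

print(whole, split)
```

Both results equal 95/15, roughly 6.33, which is higher than the unweighted mean of 5.5 because the larger values carry double weight.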
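4. Window Aggregation
4.1 Cumulative Aggregation
python
// Cumulative average: returns one value per row, so it is declared with def rather than defg
def cumAvg(x) {
    return cumsum(x) \ cumcount(x)
}
// Usage
t = table(1..10 as value)
select value, cumAvg(value) as cum_avg from t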
4.2 Sliding-Window Aggregation
python
// Sliding-window standard deviation over the last `window` rows
def movingStd(x, window) {
    return mstd(x, window)
}
// Usage
t = table(1..100 as value)
select value, movingStd(value, 10) as moving_std from t
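4.3 Time-Window Aggregation
python
// Time-window aggregation
// The timestamps below are evenly spaced one minute apart, so a window of
// 10 rows covers 10 minutes; mavg itself counts rows and does not read timestamp
def timeWindowAvg(timestamp, value, window) {
    return mavg(value, window)
}
// Usage
t = table(
    2024.01.01T00:00:00.000 + (0..99) * 60000 as timestamp,
    20.0 + rand(10.0, 100) as temperature
)
select timestamp, temperature,
    timeWindowAvg(timestamp, temperature, 10) as avg_10
from t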
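A row-count window and a time window only coincide when the samples are evenly spaced. The Python sketch below illustrates a window defined by elapsed time instead. The function name `time_window_avg` and the sample data are hypothetical, and this is not how DolphinDB implements its window functions.

```python
from collections import deque

def time_window_avg(timestamps, values, window_ms):
    """Average of the values whose timestamp lies in (t - window_ms, t]."""
    window = deque()   # (timestamp, value) pairs currently inside the window
    running = 0.0      # sum of the values in the window
    out = []
    for t, v in zip(timestamps, values):
        window.append((t, v))
        running += v
        # Evict entries that have fallen out of the time window.
        while window[0][0] <= t - window_ms:
            _, old_v = window.popleft()
            running -= old_v
        out.append(running / len(window))
    return out

# Millisecond timestamps; the last sample arrives after an 8-minute gap.
ts = [0, 60_000, 120_000, 600_000]
vals = [20.0, 22.0, 24.0, 30.0]

result = time_window_avg(ts, vals, 180_000)  # 3-minute window
print(result)  # [20.0, 21.0, 22.0, 30.0]
```

A 3-row window would report (22 + 24 + 30) / 3 for the last sample, while the 3-minute window contains only that sample and reports 30.0.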
5. Distributed Aggregation
5.1 How Distributed Aggregation Works
[Figure: Distributed aggregation. Nodes 1, 2, and 3 → Map → intermediate results → Reduce → final result.]
5.2 Distributed Aggregation Example
python
// Create a distributed table, value-partitioned by device_id
db = database("dfs://agg_db", VALUE, 1..100)
schema = table(1:0, `device_id`timestamp`value,
    [INT, TIMESTAMP, DOUBLE])
db.createPartitionedTable(schema, `sensor_data, `device_id)
// Insert data
loadTable("dfs://agg_db", "sensor_data").append!(
    table(
        take(1..100, 100000) as device_id,
        take(now(), 100000) as timestamp,
        20.0 + rand(10.0, 100000) as value
    )
)
// Distributed aggregation
t = loadTable("dfs://agg_db", "sensor_data")
// Use the custom aggregate function myAvg defined in section 3.2
select device_id, myAvg(value) as avg_value
from t
group by device_id
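The grouped query can be pictured as a per-node map step followed by a merge. The Python sketch below assumes the general case in which rows for the same device may sit on different nodes. The function names and the sample rows are hypothetical, and this is an illustration of the idea, not DolphinDB's execution plan.

```python
from collections import defaultdict

def map_partition(rows):
    """Per-node step: a [sum, count] pair for each device_id seen on this node."""
    acc = defaultdict(lambda: [0.0, 0])
    for device_id, value in rows:
        acc[device_id][0] += value
        acc[device_id][1] += 1
    return acc

def reduce_partitions(partials):
    """Merge the per-node pairs key by key, then divide once per key."""
    merged = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for device_id, (s, c) in part.items():
            merged[device_id][0] += s
            merged[device_id][1] += c
    return {device_id: s / c for device_id, (s, c) in merged.items()}

# (device_id, value) rows held by two hypothetical nodes.
node1 = [(1, 20.0), (2, 30.0), (1, 22.0)]
node2 = [(1, 24.0), (2, 26.0)]

result = reduce_partitions([map_partition(node1), map_partition(node2)])
print(result)  # {1: 22.0, 2: 28.0}
```

Only the small per-key pairs travel between nodes; the raw rows stay where they are stored.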
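5.3 Partition-Level Aggregation Optimization
python
// Partition-level aggregation: partitions are processed in parallel
select device_id, avg(value) as avg_value
from t
group by device_id
// Partition pruning: filtering on the partitioning column limits the scan to the matching partitions
select avg(value) as avg_value
from t
where device_id in 1..10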
6. Practical Examples
6.1 Statistical Metrics
python
// ========== Statistical aggregate functions ==========
// Coefficient of variation
defg cv(x) {
    return std(x) \ avg(x)
}
// Skewness (std is the sample standard deviation)
defg skewness(x) {
    n = count(x)
    m = avg(x)
    s = std(x)
    return sum(pow(x - m, 3)) \ (n * pow(s, 3))
}
// Excess kurtosis
defg kurtosis(x) {
    n = count(x)
    m = avg(x)
    s = std(x)
    return sum(pow(x - m, 4)) \ (n * pow(s, 4)) - 3
}
// Usage
t = table(20.0 + rand(10.0, 1000) as value)
select cv(value) as cv,
    skewness(value) as skew,
    kurtosis(value) as kurt
from t
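The skewness and kurtosis formulas are easy to sanity-check on a small symmetric data set. The Python sketch below uses population central moments (dividing by n throughout), which is an assumption that differs slightly from the snippet above, where `std` supplies the spread. The function names are hypothetical.

```python
import math

def central_moments(x):
    """Mean plus the 2nd, 3rd and 4th population central moments."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m, m2, m3, m4

def skewness(x):
    _, m2, m3, _ = central_moments(x)
    return m3 / m2 ** 1.5

def excess_kurtosis(x):
    _, m2, _, m4 = central_moments(x)
    return m4 / m2 ** 2 - 3

data = [1, 2, 3, 4, 5]  # symmetric around 3
skew = skewness(data)
kurt = excess_kurtosis(data)
print(skew, kurt)
```

For this data the skewness is 0 (the distribution is symmetric) and the excess kurtosis is about -1.3 (flatter than a normal distribution).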
6.2 Time-Series Aggregation
python
// ========== Time-series aggregate functions ==========
// Least-squares slope of value against timestamp
defg slope(timestamp, value) {
    n = count(value)
    sumX = sum(timestamp)
    sumY = sum(value)
    sumXY = sum(timestamp * value)
    sumX2 = sum(timestamp * timestamp)
    return (n * sumXY - sumX * sumY) \ (n * sumX2 - sumX * sumX)
}
// Least-squares intercept
defg intercept(timestamp, value) {
    avgX = avg(timestamp)
    avgY = avg(value)
    return avgY - slope(timestamp, value) * avgX
}
// Usage: value is roughly 2 * time plus noise in [-5, 5)
t = table(
    1..100 as time,
    2 * (1..100) + rand(10.0, 100) - 5.0 as value
)
select slope(time, value) as slope,
    intercept(time, value) as intercept
from t
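The slope formula depends only on five sums (n, Σx, Σy, Σxy, Σx²), and sums can be merged across chunks, so the regression fits the map-reduce pattern from section 2.1. The Python sketch below illustrates this with hypothetical function names and noise-free data for which the expected line is known.

```python
def stats(xs, ys):
    """Sufficient statistics for a least-squares line: n, Σx, Σy, Σxy, Σx²."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
    )

def merge(a, b):
    """Statistics of the combined data are the element-wise sums."""
    return tuple(p + q for p, q in zip(a, b))

def fit(s):
    """Slope and intercept from the merged statistics."""
    n, sx, sy, sxy, sx2 = s
    slope = (n * sxy - sx * sy) / (n * sx2 - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

xs = list(range(1, 101))
ys = [2 * x + 1 for x in xs]  # exact line y = 2x + 1, no noise

# Compute the statistics on two chunks, merge them, then fit once.
slope, intercept = fit(merge(stats(xs[:40], ys[:40]), stats(xs[40:], ys[40:])))
print(slope, intercept)  # 2.0 1.0
```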
6.3 Industrial Metrics
python
// ========== Industrial aggregate functions ==========
// Overall equipment effectiveness (OEE), as a percentage
defg calculateOEE(availability, performance, quality) {
    return avg(availability * performance * quality) * 100
}
// Pass rate: percentage of values inside [lowerLimit, upperLimit]
defg passRate(values, lowerLimit, upperLimit) {
    // "\" is floating-point division; "/" would truncate the integer ratio to 0 or 1
    return sum((values >= lowerLimit) && (values <= upperLimit)) \ count(values) * 100
}
// Process capability index (Cpk)
defg cpk(values, lowerLimit, upperLimit) {
    m = avg(values)
    s = std(values)
    cpu = (upperLimit - m) / (3 * s)
    cpl = (m - lowerLimit) / (3 * s)
    return min(cpu, cpl)
}
// Usage
t = table(
    95.0 + rand(10.0, 1000) as measurement
)
select passRate(measurement, 90, 110) as pass_rate,
    cpk(measurement, 90, 110) as cpk
from t
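A small worked example makes the pass-rate and Cpk formulas concrete. The Python sketch below mirrors the two functions with hypothetical names and five hand-picked measurements; `statistics.stdev` is the sample standard deviation.

```python
import math
import statistics

def pass_rate(values, lower, upper):
    """Percentage of values inside [lower, upper]."""
    inside = sum(1 for v in values if lower <= v <= upper)
    return inside / len(values) * 100

def cpk(values, lower, upper):
    """Distance from the mean to the nearer limit, in units of 3 sigma."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return min((upper - m) / (3 * s), (m - lower) / (3 * s))

measurements = [98.0, 99.0, 100.0, 101.0, 102.0]  # mean 100, sample variance 2.5

centered = cpk(measurements, 90, 110)   # limits symmetric around the mean
shifted = cpk(measurements, 97, 110)    # lower limit much closer to the mean
rate = pass_rate(measurements, 99, 110)

print(round(centered, 3), round(shifted, 3), rate)
```

With symmetric limits both sides give the same ratio (about 2.11). Moving the lower limit to 97 drops Cpk to about 0.63, because the index is always set by the nearer limit.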
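7. Performance Optimization
7.1 Vectorized Computation
python
// Prefer vectorized operations
defg fastSum(x) {
    return sum(x) // vectorized
}
// Avoid explicit loops
defg slowSum(x) {
    total = 0
    for (v in x) {
        total += v // element by element: not vectorized, slow
    }
    return total
}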
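7.2 State Optimization
python
// State optimization: keep intermediate results to a minimum
defg optimizedAvg(x) {
    // Compute directly without storing intermediate vectors
    // "\" is floating-point division; "/" truncates when both operands are integers
    return sum(x) \ count(x)
}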
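7.3 Memory Optimization
python
// Memory optimization: compute in a streaming fashion
defg streamingAgg(x) {
    // A running sum needs only one accumulator, not a copy of the data
    return sum(x)
}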
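8. Summary
This article covered user-defined aggregate functions in DolphinDB:
- How aggregation works: the map-reduce pattern and state management
- Creating UDAFs: defining with defg and registering a map-reduce implementation
- Window aggregation: cumulative, sliding-window, and time-window aggregation
- Distributed aggregation: how it works and partition-level optimization
- Practical examples: statistical metrics, time series, and industrial metrics
- Performance optimization: vectorization, state optimization, and memory optimization
Questions to consider:
- How would you design a distributed aggregate function?
- How does window aggregation differ from ordinary aggregation?
- How can the performance of an aggregate function be improved?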