DolphinDB Time-Series Queries: Time Windows and Aggregation

Contents

    • Abstract
    • 1. Overview of Time-Series Queries
      • 1.1 What is a time-series query
      • 1.2 Characteristics of time-series queries
      • 1.3 Use cases
    • 2. Time Window Functions
      • 2.1 The bar function: fixed windows
      • 2.2 dailyAlignedBar: aligned windows
      • 2.3 Types of time windows
    • 3. Sliding Window Calculations
      • 3.1 Sliding window aggregation
      • 3.2 Time-based sliding windows
      • 3.3 Cumulative calculations
    • 4. Window Functions in Detail
      • 4.1 ROWS windows
      • 4.2 RANGE windows
      • 4.3 Grouped windows
    • 5. Time Alignment
      • 5.1 Aligning multiple sources
      • 5.2 Resampling
      • 5.3 Interpolation and filling
    • 6. Time-Series Analysis
      • 6.1 Period-over-period comparison
      • 6.2 Time-series decomposition
      • 6.3 Time-series statistics
    • 7. Case Studies
      • 7.1 Device monitoring
      • 7.2 Production metrics
    • 8. Summary

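Abstract

This article explains time-series queries in DolphinDB. It covers time window functions, sliding-window and cumulative calculations, SQL window functions, time alignment, and basic time-series analysis, each illustrated with code examples.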


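1. Overview of Time-Series Queries

1.1 What is a time-series query

A time-series query is a query over data that is ordered by time. Its building blocks are time-series data, time windows, aggregation within each window, and analysis along the time dimension. The main query types are:

  • Time-window aggregation
  • Sliding-window calculation
  • Time alignment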

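1.2 Characteristics of time-series queries

Feature              Description
Time-ordered         Data is sorted by time
Window aggregation   Values are aggregated per time window
Sliding calculation  Statistics are computed over sliding windows
Time alignment       Multiple sources are aligned on a common time axis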

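1.3 Use cases

Scenario               Description
Monitoring and alerts  Real-time metric monitoring
Trend analysis         Analysis of historical trends
Anomaly detection      Detecting anomalies in a time series
Forecasting            Time-series prediction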

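2. Time Window Functions

2.1 The bar function: fixed windows

// Build a sample time series with one row per minute.
// The TIMESTAMP literal has millisecond precision, so 60000 means one minute.
t = table(
    1..1000 as id,
    2024.01.01T00:00:00.000 + (0..999) * 60000 as timestamp,
    20.0 + rand(10.0, 1000) as temperature,  // uniform in [20, 30)
    40.0 + rand(20.0, 1000) as humidity      // uniform in [40, 60)
)

// Aggregate by 1-hour window.
// The hour unit of a DURATION is an uppercase H; a lowercase h marks a SHORT integer.
select bar(timestamp, 1H) as hour,
       avg(temperature) as avg_temp,
       max(temperature) as max_temp,
       min(temperature) as min_temp,
       count(*) as cnt
from t
group by bar(timestamp, 1H)

// Aggregate by 10-minute window
select bar(timestamp, 10m) as window_10m,
       avg(temperature) as avg_temp,
       avg(humidity) as avg_humidity
from t
group by bar(timestamp, 10m)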
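
The bucketing idea behind a fixed window can be sketched outside DolphinDB. The Python below is only an illustration of the arithmetic (round each timestamp down to a multiple of the window width, then aggregate per bucket); `floor_to_window` is a hypothetical helper, not a DolphinDB API.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def floor_to_window(ts: datetime, width: timedelta) -> datetime:
    """Return the start of the fixed-width window that contains ts."""
    elapsed_in_window = (ts - EPOCH) % width  # time since the window started
    return ts - elapsed_in_window

# Group per-minute readings into 10-minute windows and average each window.
readings = [(datetime(2024, 1, 1, 0, m), 20.0 + m) for m in range(20)]
sums = {}
for ts, value in readings:
    start = floor_to_window(ts, timedelta(minutes=10))
    total, count = sums.get(start, (0.0, 0))
    sums[start] = (total + value, count + 1)
averages = {start: total / count for start, (total, count) in sums.items()}
```

Every reading falls into exactly one window, so the windows never overlap; this is the main difference from the sliding windows in section 3.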

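2.2 dailyAlignedBar: aligned windows

// Build data that spans several days, one row per minute.
// DATETIME has second precision, so one minute is 60.
t = table(
    2024.01.01T20:00:00 + (0..999) * 60 as timestamp,
    20.0 + rand(10.0, 1000) as temperature
)

// Daily windows that start at 08:00 each day.
// The window length is written as 24H because dailyAlignedBar takes
// sub-day DURATION units for its length argument.
select dailyAlignedBar(timestamp, 08:00:00, 24H) as day,
       avg(temperature) as avg_temp,
       max(temperature) as max_temp,
       min(temperature) as min_temp
from t
group by dailyAlignedBar(timestamp, 08:00:00, 24H)

// Shift-aligned windows (8-hour shifts starting at midnight)
select dailyAlignedBar(timestamp, 00:00:00, 8H) as shift,
       avg(temperature) as avg_temp
from t
group by dailyAlignedBar(timestamp, 00:00:00, 8H)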
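
To make "aligned to a time of day" concrete, here is a simplified Python model in which windows are counted from a daily anchor time instead of from the epoch. It is a sketch under the assumption of a single session per day; `aligned_bar` is a hypothetical helper and does not reproduce the optional parameters of the DolphinDB function.

```python
from datetime import datetime, time, timedelta

def aligned_bar(ts: datetime, offset: time, width: timedelta) -> datetime:
    """Start of the window containing ts, with windows anchored at `offset` each day."""
    anchor = datetime.combine(ts.date(), offset)
    # Floor division is negative before the anchor, which maps early-morning
    # readings to the window that started on the previous day.
    steps = (ts - anchor) // width
    return anchor + steps * width

# Daily windows starting at 08:00
evening = aligned_bar(datetime(2024, 1, 1, 20, 0), time(8, 0), timedelta(hours=24))
early = aligned_bar(datetime(2024, 1, 2, 3, 0), time(8, 0), timedelta(hours=24))

# 8-hour shifts starting at midnight
shift = aligned_bar(datetime(2024, 1, 1, 20, 0), time(0, 0), timedelta(hours=8))
```

With an 08:00 anchor, a reading at 03:00 on January 2 belongs to the window that opened at 08:00 on January 1, which is usually what a "production day" report expects.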

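2.3 Types of time windows

Function         Description            Typical use
bar              Fixed-size windows     Windows of any size
dailyAlignedBar  Day-aligned windows    Per-day or per-shift statistics
interval         Evenly spaced windows  Sampling at regular intervals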

3. Sliding Window Calculations

3.1 Sliding window aggregation

// Build a sample time series with one row per minute
t = table(
    1..100 as id,
    2024.01.01T00:00:00.000 + (0..99) * 60000 as timestamp,
    20.0 + rand(10.0, 100) as temperature
)

// Moving average over the last 10 rows (the current row included)
select id, timestamp, temperature,
       mavg(temperature, 10) as moving_avg_10
from t

// Moving maximum
select id, timestamp, temperature,
       mmax(temperature, 10) as moving_max_10
from t

// Moving minimum
select id, timestamp, temperature,
       mmin(temperature, 10) as moving_min_10
from t

// Moving standard deviation
select id, timestamp, temperature,
       mstd(temperature, 10) as moving_std_10
from t
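
A count-based sliding window can be sketched in a few lines of Python. This is an illustration of the idea rather than DolphinDB's implementation: the window always holds the last `window` rows, and rows that do not yet have a full window produce no value. `moving_avg` is a hypothetical name.

```python
from collections import deque

def moving_avg(values, window):
    """Average of the last `window` values; None until the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]      # the oldest value is about to be evicted
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out

smoothed = moving_avg([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```

Keeping a running total makes each step constant-time, which matters when the same calculation runs over millions of rows.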

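3.2 Time-based sliding windows

// Time-based sliding window: each row aggregates the rows whose timestamp
// falls within the preceding 10 minutes, rather than a fixed number of rows.
// t is the single-series table from section 3.1; for multi-device data,
// add "context by device_id" as shown in section 4.3.
select timestamp, temperature,
       tmavg(timestamp, temperature, 10m) as tmavg_10m,
       tmsum(timestamp, temperature, 10m) as tmsum_10m,
       tmcount(timestamp, temperature, 10m) as tmcount_10m
from t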
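
The difference between a row-count window and a time window shows up as soon as the data has gaps. The Python sketch below sums the values whose time lies in the interval (t - window, t]; the left-open, right-closed boundary is an assumption made for this illustration, and `time_window_sum` is a hypothetical helper.

```python
from collections import deque

def time_window_sum(times, values, window):
    """For each row, sum the values whose time lies in (t - window, t]."""
    buf = deque()
    total = 0.0
    out = []
    for t, v in zip(times, values):
        buf.append((t, v))
        total += v
        # Drop rows that have fallen out of the time window.
        while buf[0][0] <= t - window:
            total -= buf.popleft()[1]
        out.append(total)
    return out

# Times in minutes; note the gap between minute 2 and minute 10.
sums = time_window_sum([0, 1, 2, 10, 11], [1.0, 1.0, 1.0, 1.0, 1.0], 3)
```

After the gap the window contains a single row again, whereas a 3-row window would still be mixing in readings that are eight minutes old.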

3.3 Cumulative calculations

// Cumulative statistics from the first row up to the current row
select id, timestamp, temperature,
       cumsum(temperature) as cum_sum,
       cumavg(temperature) as cum_avg,
       cummax(temperature) as cum_max,
       cummin(temperature) as cum_min,
       cumcount(temperature) as cum_count
from t
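
A cumulative calculation is a window that starts at the first row and grows by one row each step. As a minimal illustration (plain Python, not DolphinDB), the running sum, maximum, minimum, and average of a short list look like this:

```python
from itertools import accumulate

values = [3.0, 1.0, 4.0, 1.0, 5.0]

cum_sum = list(accumulate(values))        # running total
cum_max = list(accumulate(values, max))   # largest value seen so far
cum_min = list(accumulate(values, min))   # smallest value seen so far
cum_avg = [s / (i + 1) for i, s in enumerate(cum_sum)]
```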

4. Window Functions in Detail

4.1 ROWS windows

// ROWS window: the frame is defined by a number of rows
select id, timestamp, temperature,
       avg(temperature) over (
           order by timestamp
           rows between 5 preceding and current row   // 6 rows: the current row and the 5 before it
       ) as avg_5_rows,
       max(temperature) over (
           order by timestamp
           rows between 10 preceding and 5 preceding  // rows 10 to 5 positions back
       ) as max_10_to_5
from t

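4.2 RANGE windows

// RANGE window: the frame is defined by a range of values of the order-by column.
// The offsets below assume a TIMESTAMP column, where the unit is milliseconds.
select id, timestamp, temperature,
       avg(temperature) over (
           order by timestamp
           range between 60000 preceding and current row  // within 1 minute
       ) as avg_1min,
       sum(temperature) over (
           order by timestamp
           range between 3600000 preceding and current row  // within 1 hour
       ) as sum_1hour
from t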
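
ROWS and RANGE frames give the same result on evenly spaced data and diverge when rows are irregular. The following Python sketch contrasts the two on a series with a gap; it follows the standard SQL reading of the two frame types, and `rows_avg` and `range_avg` are hypothetical helpers.

```python
def rows_avg(values, preceding):
    """Average over the current row and the `preceding` rows before it."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + 1]
        out.append(sum(frame) / len(frame))
    return out

def range_avg(keys, values, preceding):
    """Average over rows whose key lies in [key - preceding, key]."""
    out = []
    for k in keys:
        frame = [v for kk, v in zip(keys, values) if k - preceding <= kk <= k]
        out.append(sum(frame) / len(frame))
    return out

keys = [0, 1000, 2000, 90000]          # milliseconds; the last row arrives late
values = [10.0, 20.0, 30.0, 40.0]

by_rows = rows_avg(values, 1)           # previous row + current row
by_range = range_avg(keys, values, 60000)  # last 60 seconds
```

For the late row, the ROWS frame still averages in the reading from 88 seconds earlier, while the RANGE frame contains only the late row itself.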

4.3 Grouped windows

// Build data for 10 devices
t = table(
    take(1..10, 1000) as device_id,
    2024.01.01T00:00:00.000 + (0..999) * 60000 as timestamp,
    20.0 + rand(10.0, 1000) as temperature
)

// Sliding window computed separately for each device
select device_id, timestamp, temperature,
       mavg(temperature, 10) as moving_avg
from t
context by device_id

// Cumulative statistics computed separately for each device
select device_id, timestamp, temperature,
       cumsum(temperature) as cum_sum,
       cumavg(temperature) as cum_avg
from t
context by device_id
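
Grouped windows keep one window state per group and return one output row per input row, unlike group by, which collapses each group to a single row. A minimal Python sketch of that behaviour, with the hypothetical helper `grouped_moving_avg`:

```python
from collections import defaultdict, deque

def grouped_moving_avg(rows, window):
    """rows: (device_id, value) pairs in time order. One output row per input row."""
    buffers = defaultdict(lambda: deque(maxlen=window))  # one window per device
    out = []
    for device_id, value in rows:
        buf = buffers[device_id]
        buf.append(value)
        avg = sum(buf) / window if len(buf) == window else None
        out.append((device_id, value, avg))
    return out

rows = [(1, 10.0), (2, 100.0), (1, 20.0), (2, 200.0)]
result = grouped_moving_avg(rows, 2)
```

Readings from device 2 never enter the window of device 1, even though the rows are interleaved in the input.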

5. Time Alignment

5.1 Aligning multiple sources

// Two sources sampled at different rates
t1 = table(
    2024.01.01T00:00:00.000 + (0..99) * 60000 as timestamp,   // every minute
    20.0 + rand(10.0, 100) as temp_1min
)

t2 = table(
    2024.01.01T00:00:00.000 + (0..19) * 300000 as timestamp,  // every 5 minutes
    40.0 + rand(20.0, 20) as humidity_5min
)

// Align on 5-minute windows: downsample t1 first,
// then left join t2 on the window start time
agg = select avg(temp_1min) as avg_temp from t1 group by bar(timestamp, 5m) as window_5m
select window_5m, avg_temp, humidity_5min as humidity
from lj(agg, t2, `window_5m, `timestamp)
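
Window alignment downsamples the faster source. An alternative is to keep every high-frequency row and attach the most recent low-frequency reading to it, which is the idea behind an as-of join. The Python below is a sketch of that lookup only; `asof_align` is a hypothetical helper and assumes `right_times` is sorted.

```python
from bisect import bisect_right

def asof_align(left_times, right_times, right_values):
    """For each left time, take the latest right value at or before that time."""
    out = []
    for t in left_times:
        i = bisect_right(right_times, t)   # number of right rows with time <= t
        out.append(right_values[i - 1] if i else None)
    return out

# Minute-level temperature times against 5-minute humidity readings
humidity = asof_align([0, 1, 4, 5, 7], [0, 5], [40.0, 45.0])
```

Rows that precede the first low-frequency reading get no value, so the caller has to decide whether to drop them or fill them.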

5.2 Resampling

// Build high-frequency data
t = table(
    2024.01.01T00:00:00.000 + (0..999) * 6000 as timestamp,  // every 6 seconds
    20.0 + rand(10.0, 1000) as temperature
)

// Resample: 6 seconds → 1 minute
select bar(timestamp, 1m) as minute,
       first(temperature) as open,
       max(temperature) as high,
       min(temperature) as low,
       last(temperature) as close,
       count(*) as cnt
from t
group by bar(timestamp, 1m)

// Resample: 6 seconds → 5 minutes
select bar(timestamp, 5m) as minute_5,
       first(temperature) as open,
       max(temperature) as high,
       min(temperature) as low,
       last(temperature) as close
from t
group by bar(timestamp, 5m)
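
Downsampling to open/high/low/close depends on row order inside each window: open is the first value and close is the last. As an illustration in plain Python (the helper `resample_ohlc` is hypothetical, and the input is assumed to be sorted by time):

```python
def resample_ohlc(times, values, width):
    """Collapse (time, value) rows into one open/high/low/close bar per window."""
    bars = {}
    for t, v in zip(times, values):
        start = t - t % width              # start of the window containing t
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"open": v, "high": v, "low": v, "close": v, "cnt": 1}
        else:
            bar["high"] = max(bar["high"], v)
            bar["low"] = min(bar["low"], v)
            bar["close"] = v               # the latest value seen so far
            bar["cnt"] += 1
    return bars

# 20 readings, one every 6 seconds, resampled to 60-second bars
times = [6 * i for i in range(20)]
values = [float(i) for i in range(20)]
bars = resample_ohlc(times, values, 60)
```

If the input is not sorted by time, open and close silently become "first and last in storage order", so sorting before resampling is worth checking.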

5.3 Interpolation and filling

// Build data with missing values and irregular timestamps
t = table(
    2024.01.01T00:00:00.000 + [0, 1, 3, 5, 8, 10] * 60000 as timestamp,
    [25.0, 26.0, NULL, 27.0, NULL, 28.0] as temperature
)

// Forward fill: carry the last known value forward
select timestamp, temperature,
       ffill(temperature) as temp_ffill
from t

// Linear interpolation
select timestamp, temperature,
       interpolate(temperature, "linear") as temp_linear
from t
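
On irregular timestamps it matters whether linear interpolation is weighted by position or by elapsed time. The Python sketch below applies forward fill and a time-weighted linear interpolation to the same six readings (times in minutes). Both helpers are hypothetical and are not a description of how the DolphinDB functions are implemented.

```python
def forward_fill(values):
    """Replace each None with the most recent known value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def interpolate_by_time(times, values):
    """Fill each None on the straight line between its known neighbours in time."""
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    out = []
    for t, v in zip(times, values):
        if v is not None:
            out.append(v)
            continue
        before = [p for p in known if p[0] < t]
        after = [p for p in known if p[0] > t]
        if not before or not after:
            out.append(None)               # cannot interpolate at the edges
            continue
        (t0, v0), (t1, v1) = before[-1], after[0]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out

minutes = [0, 1, 3, 5, 8, 10]
temps = [25.0, 26.0, None, 27.0, None, 28.0]
filled = forward_fill(temps)
interpolated = interpolate_by_time(minutes, temps)
```

The gap at minute 8 sits three fifths of the way from minute 5 to minute 10, so the time-weighted value is about 27.6; an interpolation that only looks at row positions would give 27.5 instead.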

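6. Time-Series Analysis

6.1 Period-over-period comparison

// Build daily data
t = table(
    2024.01.01 + (0..99) as date,
    1000.0 + rand(1000.0, 100) as sales
)

// Day-over-day change (compared with the previous day)
select date, sales,
       prev(sales) as prev_sales,
       (sales - prev(sales)) / prev(sales) as dod_rate
from t

// Week-over-week change (compared with the same weekday one week earlier).
// move(sales, 7) shifts the column down by 7 rows.
select date, sales,
       move(sales, 7) as prev_week_sales,
       (sales - move(sales, 7)) / move(sales, 7) as wow_rate
from t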
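
Both comparisons are the same calculation with a different lag: divide the change since the row `lag` steps back by that earlier value. A minimal Python sketch, assuming one row per day with no missing dates; `pct_change` is a hypothetical helper.

```python
def pct_change(values, lag=1):
    """Relative change versus the value `lag` rows earlier; None where undefined."""
    out = []
    for i, v in enumerate(values):
        if i < lag or values[i - lag] == 0:
            out.append(None)   # no earlier row, or division by zero
        else:
            base = values[i - lag]
            out.append((v - base) / base)
    return out

day_over_day = pct_change([100.0, 110.0, 99.0])
two_day = pct_change([100.0, 110.0, 99.0], lag=2)
```

If dates can be missing, a row-based lag no longer means "one week earlier", and the comparison should be done by joining on date minus 7 days instead.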

6.2 Time-series decomposition

// Build data with a linear trend, a yearly seasonal component, and noise
t = table(
    1..365 as day,
    2024.01.01 + (0..364) as date,
    100 + 0.1 * (1..365) + 10 * sin(2 * pi * (1..365) / 365) + (rand(10.0, 365) - 5.0) as value
)

// Moving averages as trend estimates
select day, date, value,
       mavg(value, 7) as trend_7d,
       mavg(value, 30) as trend_30d
from t

// Remove the 7-day trend to expose the remaining fluctuation
select day, date, value,
       value - mavg(value, 7) as detrended
from t

6.3 Time-series statistics

// Summary statistics. Here t is a table with timestamp and temperature
// columns, such as the one built in section 3.1.
select 
    count(*) as total_records,
    min(timestamp) as start_time,
    max(timestamp) as end_time,
    max(timestamp) - min(timestamp) as duration,
    avg(temperature) as avg_temp,
    std(temperature) as std_temp,
    skew(temperature) as skew,
    kurtosis(temperature) as kurt
from t

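7. Case Studies

7.1 Device monitoring

// Build monitoring data: 100 devices, each with 100 per-minute readings.
// Sorting the repeated offsets gives every device one row per minute;
// without it, each device would get the same timestamp on all of its rows.
t = table(
    take(1..100, 10000) as device_id,
    2024.01.01T00:00:00.000 + sort(take(0..99, 10000)) * 60000 as timestamp,
    20.0 + rand(10.0, 10000) as temperature,
    40.0 + rand(20.0, 10000) as humidity,
    1000.0 + rand(20.0, 10000) as pressure
)

// Per-device monitoring: moving average, deviation from it, and moving standard deviation
select device_id, timestamp, temperature,
       mavg(temperature, 10) as moving_avg,
       temperature - mavg(temperature, 10) as deviation,
       mstd(temperature, 10) as moving_std
from t
context by device_id

// Per-device anomaly detection: more than 3 standard deviations from the moving average
select device_id, timestamp, temperature,
       mavg(temperature, 30) as avg_30,
       mstd(temperature, 30) as std_30,
       abs(temperature - mavg(temperature, 30)) > 3 * mstd(temperature, 30) as is_anomaly
from t
context by device_id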
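
One design choice in a 3-sigma rule is whether the reading being tested is part of its own baseline. When it is, an outlier inflates the mean and standard deviation it is compared against; with a sample standard deviation over n points, no point can deviate from the mean by more than (n - 1) / sqrt(n) standard deviations, which is below 3 for n = 10. The Python sketch below is a variant that scores each reading against the preceding window only. It is an illustration of that choice, not the query above, and `flag_anomalies` is a hypothetical helper.

```python
from statistics import mean, stdev

def flag_anomalies(values, window, k=3.0):
    """Flag values more than k sample standard deviations from the mean
    of the `window` readings that precede them."""
    flags = []
    for i, v in enumerate(values):
        if i < window:
            flags.append(None)             # not enough history yet
            continue
        history = values[i - window: i]    # preceding rows only
        m, s = mean(history), stdev(history)
        flags.append(abs(v - m) > k * s)
    return flags

readings = [20.0, 21.0, 20.0, 21.0, 20.0, 21.0, 35.0, 20.0]
flags = flag_anomalies(readings, 6)
```

The spike at 35.0 is flagged, and the normal reading that follows is not, although the spike is now part of its history and widens the band for the next few rows.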

7.2 Production metrics

// Build 90 days of production data
t = table(
    2024.01.01 + (0..89) as date,
    rand(800..1200, 90) as production,
    rand(50..100, 90) as efficiency,
    rand(1..5, 90) as defects
)

// Daily statistics: 7-day moving averages and running totals
select date, production, efficiency, defects,
       mavg(production, 7) as production_ma7,
       mavg(efficiency, 7) as efficiency_ma7,
       cumsum(production) as cum_production,
       cumsum(defects) as cum_defects
from t

// Weekly statistics (7-day windows)
select bar(date, 7d) as week,
       sum(production) as total_production,
       avg(efficiency) as avg_efficiency,
       sum(defects) as total_defects
from t
group by bar(date, 7d)

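8. Summary

This article covered time-series queries in DolphinDB:

  1. Time window functions: bar, dailyAlignedBar, interval
  2. Sliding window calculations: mavg, mmax, mmin, mstd, and the cumulative functions
  3. Window functions: ROWS windows, RANGE windows, grouped windows
  4. Time alignment: aligning multiple sources, resampling, interpolation and filling
  5. Time-series analysis: period-over-period comparison, trend decomposition, summary statistics
  6. Case studies: device monitoring, production analysis

Questions for further thought

  1. How do you choose an appropriate time window size?
  2. What is the difference between a sliding window and a fixed window?
  3. How should missing values in a time series be handled?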
