Contents
- Abstract
- 1. Overview of Time-Series Queries
  - 1.1 What Is a Time-Series Query
  - 1.2 Characteristics of Time-Series Queries
  - 1.3 Use Cases
- 2. Time-Window Functions
  - 2.1 The bar Function: Fixed Windows
  - 2.2 dailyAlignedBar: Aligned Windows
  - 2.3 Types of Time Windows
- 3. Sliding-Window Computation
  - 3.1 Sliding-Window Aggregation
  - 3.2 Time-Based Sliding Windows
  - 3.3 Cumulative Computation
- 4. Window Functions in Detail
  - 4.1 ROWS Windows
  - 4.2 RANGE Windows
  - 4.3 Grouped Windows
- 5. Time Alignment
  - 5.1 Aligning Multiple Sources
  - 5.2 Resampling
  - 5.3 Interpolation and Filling
- 6. Time-Series Analysis
  - 6.1 Period-over-Period Comparisons
  - 6.2 Time-Series Decomposition
  - 6.3 Time-Series Statistics
- 7. Practical Examples
  - 7.1 Device Monitoring
  - 7.2 Production Metrics
- 8. Summary
- References
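Abstract
This article walks through time-series query techniques in DolphinDB: time-window functions, sliding-window and cumulative computation, SQL window functions, time alignment, and basic time-series analysis. Each topic comes with code examples intended to help readers build working knowledge of time-series queries.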
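1. Overview of Time-Series Queries
1.1 What Is a Time-Series Query
A time-series query operates on data that is ordered by time. Its building blocks and the main query types are:
- Building blocks: time-series data, time windows, aggregation within each window, and analysis along the time dimension
- Query types: time-window aggregation, sliding-window computation, and time alignment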
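1.2 Characteristics of Time-Series Queries
| Characteristic | Description |
|---|---|
| Time-ordered | Rows are sorted by timestamp |
| Window aggregation | Rows are aggregated per time window |
| Sliding computation | Statistics are computed over a sliding window |
| Time alignment | Multiple sources are aligned on a common time axis |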
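1.3 Use Cases
| Use case | Description |
|---|---|
| Monitoring and alerting | Real-time metric monitoring |
| Trend analysis | Analysis of historical trends |
| Anomaly detection | Detecting anomalies in a time series |
| Forecasting | Time-series prediction |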
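2. Time-Window Functions
2.1 The bar Function: Fixed Windows
python
// Create sample time-series data.
// The .000 suffix makes the start a TIMESTAMP (millisecond precision), so adding 60000 advances one minute.
t = table(
    1..1000 as id,
    2024.01.01T00:00:00.000 + (0..999) * 60000 as timestamp, // one row per minute
    20.0 + rand(10.0, 1000) as temperature, // uniform in [20, 30)
    40.0 + rand(20.0, 1000) as humidity // uniform in [40, 60)
)
// Aggregate by 1-hour windows (the hour unit of a DURATION is an uppercase H)
select bar(timestamp, 1H) as hour,
    avg(temperature) as avg_temp,
    max(temperature) as max_temp,
    min(temperature) as min_temp,
    count(*) as cnt
from t
group by bar(timestamp, 1H)
// Aggregate by 10-minute windows
select bar(timestamp, 10m) as window_10m,
    avg(temperature) as avg_temp,
    avg(humidity) as avg_humidity
from t
group by bar(timestamp, 10m)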
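To make the bucketing concrete, the following is a minimal sketch in plain Python (not DolphinDB script) of what a fixed-window function like bar computes: each value is rounded down to a multiple of the interval, i.e. `x - x % interval`. The helper names `bar` and `hourly_avg` are illustrative, and timestamps are assumed to be integer milliseconds.

```python
from collections import defaultdict

HOUR_MS = 3_600_000

def bar(x, interval):
    """Round x down to the nearest multiple of interval (start of its bucket)."""
    return x - x % interval

def hourly_avg(rows):
    """rows: iterable of (timestamp_ms, value). Returns {bucket_start_ms: mean}."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[bar(ts, HOUR_MS)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# One reading per minute for two hours; the value equals the minute index.
rows = [(i * 60_000, float(i)) for i in range(120)]
result = hourly_avg(rows)
print(result)
```

Because the bucket start is derived from the value alone, windows are aligned to multiples of the interval rather than to the first row of the data.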
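2.2 dailyAlignedBar: Aligned Windows
python
// Create data that spans several days.
// The start is a DATETIME (second precision), so the step is 60 seconds.
// This keeps the precision of timestamp consistent with a SECOND offset such as 08:00:00.
t = table(
    2024.01.01T20:00:00 + (0..999) * 60 as timestamp, // one row per minute
    20.0 + rand(10.0, 1000) as temperature // uniform in [20, 30)
)
// Day-long windows that start at 08:00 instead of midnight.
// The window length is given in hours (24H); to my knowledge dailyAlignedBar does not accept day-or-longer DURATION units.
select dailyAlignedBar(timestamp, 08:00:00, 24H) as day,
    avg(temperature) as avg_temp,
    max(temperature) as max_temp,
    min(temperature) as min_temp
from t
group by dailyAlignedBar(timestamp, 08:00:00, 24H)
// Align to shifts: three 8-hour shifts per day, starting at midnight
select dailyAlignedBar(timestamp, 00:00:00, 8H) as shift,
    avg(temperature) as avg_temp
from t
group by dailyAlignedBar(timestamp, 00:00:00, 8H)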
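The idea behind offset-aligned windows can be sketched in plain Python (not DolphinDB script). The sketch only shows the anchoring arithmetic: windows of length `n` start at a fixed offset after midnight instead of at midnight. It does not try to reproduce every feature of dailyAlignedBar (session end times, for instance), and `aligned_bar` is a hypothetical helper working on integer seconds.

```python
DAY_S = 86_400
HOUR_S = 3_600

def aligned_bar(ts, offset, n):
    """Start of the n-second window containing ts, with windows anchored
    `offset` seconds after midnight. A time earlier than the offset falls
    into the window that began before it (possibly on the previous day)."""
    return ts - (ts - offset) % n

# Day-long windows starting at 08:00.
nine_am_day0 = 9 * HOUR_S
one_am_day1 = DAY_S + 1 * HOUR_S
nine_am_day1 = DAY_S + 9 * HOUR_S

print(aligned_bar(nine_am_day0, 8 * HOUR_S, DAY_S) // HOUR_S)  # hours since day 0 midnight
print(aligned_bar(one_am_day1, 8 * HOUR_S, DAY_S) // HOUR_S)
print(aligned_bar(nine_am_day1, 8 * HOUR_S, DAY_S) // HOUR_S)
```

With an 08:00 anchor, a reading at 01:00 on the second day lands in the window that opened at 08:00 on the first day, which is the behaviour one usually wants for overnight shifts.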
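2.3 Types of Time Windows
| Function | Description | Typical use |
|---|---|---|
| bar | Fixed-size windows aligned to multiples of the interval | Any fixed-width time window |
| dailyAlignedBar | Windows aligned to a daily start time | Per-day or per-shift statistics |
| interval | Equal-width windows in a group by clause, with a fill rule for empty windows | Evenly spaced sampling |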
3. Sliding-Window Computation
3.1 Sliding-Window Aggregation
python
// Create sample time-series data (TIMESTAMP, one row per minute)
t = table(
    1..100 as id,
    2024.01.01T00:00:00.000 + (0..99) * 60000 as timestamp,
    20.0 + rand(10.0, 100) as temperature // uniform in [20, 30)
)
// Moving average over the last 10 rows (current row included)
select id, timestamp, temperature,
    mavg(temperature, 10) as moving_avg_10
from t
// Moving maximum
select id, timestamp, temperature,
    mmax(temperature, 10) as moving_max_10
from t
// Moving minimum
select id, timestamp, temperature,
    mmin(temperature, 10) as moving_min_10
from t
// Moving standard deviation
select id, timestamp, temperature,
    mstd(temperature, 10) as moving_std_10
from t
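A row-count sliding window can be sketched in a few lines of plain Python (not DolphinDB script). The sketch assumes the convention that positions without a full window produce an empty result, which is how I understand integer-window moving functions to behave by default; `mavg` here is an illustrative re-implementation, not the library function.

```python
def mavg(values, window):
    """Trailing moving average over `window` rows, current row included.
    Positions that do not yet have a full window yield None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the row that left the window
        out.append(running / window if i >= window - 1 else None)
    return out

smoothed = mavg([1.0, 2.0, 3.0, 4.0, 5.0], 3)
print(smoothed)
```

Keeping a running sum makes each step O(1) instead of re-summing the whole window.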
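3.2 Time-Based Sliding Windows
python
// Time-based sliding windows: each window covers the 10 minutes ending at the current row,
// so the number of rows per window depends on the data instead of being fixed.
// The tm-functions take the time column as their first argument.
select timestamp, temperature,
    tmavg(timestamp, temperature, 10m) as tmavg_10m,
    tmsum(timestamp, temperature, 10m) as tmsum_10m,
    tmcount(timestamp, temperature, 10m) as tmcount_10m
from t
// When one table holds several devices, add "context by <device column>" so that each device gets its own windows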
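When the window is defined by elapsed time rather than by row count, the number of rows in each window varies with the sampling gaps. A minimal plain-Python sketch (not DolphinDB script) follows; it assumes a left-open, right-closed window `(t - window, t]` and time-ordered input, and `time_window_avg` is a hypothetical helper.

```python
from collections import deque

def time_window_avg(times, values, window):
    """For each row i, average the values whose time lies in
    (times[i] - window, times[i]]. `times` must be non-decreasing."""
    out = []
    buf = deque()  # (time, value) pairs currently inside the window
    total = 0.0
    for t, v in zip(times, values):
        buf.append((t, v))
        total += v
        while buf[0][0] <= t - window:  # evict rows that fell out of the window
            _, old_v = buf.popleft()
            total -= old_v
        out.append(total / len(buf))
    return out

times = [0, 1, 2, 10, 11]          # note the gap between 2 and 10
values = [1.0, 2.0, 3.0, 4.0, 5.0]
averages = time_window_avg(times, values, window=5)
print(averages)
```

After the gap, the window at time 10 contains only its own row, whereas a 3-row window would still have reached back to times 1 and 2.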
3.3 Cumulative Computation
python
// Cumulative (running) statistics from the first row up to the current row
select id, timestamp, temperature,
    cumsum(temperature) as cum_sum,
    cumavg(temperature) as cum_avg,
    cummax(temperature) as cum_max,
    cummin(temperature) as cum_min,
    cumcount(temperature) as cum_count
from t
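Cumulative functions are sliding windows whose left edge is pinned to the first row. The same running statistics can be written in plain Python (not DolphinDB script) with `itertools.accumulate`; the sample values are made up for illustration.

```python
from itertools import accumulate

values = [3.0, 1.0, 4.0, 1.0, 5.0]

cum_sum = list(accumulate(values))        # running total
cum_max = list(accumulate(values, max))   # running maximum
cum_min = list(accumulate(values, min))   # running minimum
cum_avg = [s / (i + 1) for i, s in enumerate(cum_sum)]  # running mean

print(cum_sum, cum_max, cum_min, cum_avg)
```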
4. Window Functions in Detail
4.1 ROWS Windows
python
// ROWS frame: the bounds are counted in rows relative to the current row
select id, timestamp, temperature,
    avg(temperature) over (
        order by timestamp
        rows between 5 preceding and current row
    ) as avg_5_rows,
    max(temperature) over (
        order by timestamp
        rows between 10 preceding and 5 preceding
    ) as max_10_to_5
from t
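4.2 RANGE Windows
python
// RANGE frame: the bounds are offsets on the order by value, not row counts.
// The offsets below are in the unit of the timestamp column and assume a millisecond TIMESTAMP.
select id, timestamp, temperature,
    avg(temperature) over (
        order by timestamp
        range between 60000 preceding and current row // previous 1 minute
    ) as avg_1min,
    sum(temperature) over (
        order by timestamp
        range between 3600000 preceding and current row // previous 1 hour
    ) as sum_1hour
from t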
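The difference between the two frame types only shows up when the ordering key is unevenly spaced. The following plain-Python sketch (not DolphinDB script) implements both frames the way standard SQL defines them, with inclusive bounds; `rows_avg` and `range_avg` are illustrative helpers and the data is made up.

```python
def rows_avg(values, preceding):
    """ROWS frame: the current row plus up to `preceding` rows before it."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + 1]
        out.append(sum(frame) / len(frame))
    return out

def range_avg(keys, values, preceding):
    """RANGE frame: every row whose key lies in [key - preceding, key]."""
    out = []
    for k in keys:
        frame = [v for kk, v in zip(keys, values) if k - preceding <= kk <= k]
        out.append(sum(frame) / len(frame))
    return out

keys = [0, 1, 2, 10, 11]            # ordering key with a gap
values = [1.0, 2.0, 3.0, 4.0, 5.0]

by_rows = rows_avg(values, 2)
by_range = range_avg(keys, values, 2)
print(by_rows)
print(by_range)
```

At key 10 the ROWS frame still averages the rows at keys 1 and 2, while the RANGE frame sees only the row at key 10. For sensor data with dropouts, the RANGE form is usually the one that matches "the last N minutes".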
4.3 Grouped Windows
python
// Create data for 10 devices
t = table(
    take(1..10, 1000) as device_id,
    2024.01.01T00:00:00.000 + (0..999) * 60000 as timestamp,
    20.0 + rand(10.0, 1000) as temperature // uniform in [20, 30)
)
// Sliding window per device: context by keeps every row and restarts the window for each group
select device_id, timestamp, temperature,
    mavg(temperature, 10) as moving_avg
from t
context by device_id
// Cumulative statistics per device
select device_id, timestamp, temperature,
    cumsum(temperature) as cum_sum,
    cumavg(temperature) as cum_avg
from t
context by device_id
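The contrast between a per-group window (one output per input row) and a plain group-by aggregate (one output per group) can be sketched in plain Python (not DolphinDB script). `cumsum_by` and `sum_by` are hypothetical helpers; the sketch keeps the input row order, which may differ from the row order a real query engine returns.

```python
from collections import defaultdict

def cumsum_by(keys, values):
    """Per-group running sum: one result per input row."""
    running = defaultdict(float)
    out = []
    for k, v in zip(keys, values):
        running[k] += v
        out.append(running[k])
    return out

def sum_by(keys, values):
    """Per-group total: one result per group."""
    totals = defaultdict(float)
    for k, v in zip(keys, values):
        totals[k] += v
    return dict(totals)

devices = [1, 2, 1, 2, 1]
temps = [10.0, 20.0, 11.0, 21.0, 12.0]

per_row = cumsum_by(devices, temps)
per_group = sum_by(devices, temps)
print(per_row)
print(per_group)
```

The per-row form is what sliding and cumulative functions need, because each row must see only the history of its own device.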
5. Time Alignment
5.1 Aligning Multiple Sources
python
// Two sources sampled at different frequencies
t1 = table(
    2024.01.01T00:00:00.000 + (0..99) * 60000 as timestamp, // every minute
    20.0 + rand(10.0, 100) as temp_1min
)
t2 = table(
    2024.01.01T00:00:00.000 + (0..19) * 300000 as timestamp, // every 5 minutes
    40.0 + rand(20.0, 20) as humidity_5min
)
// Step 1: bring the 1-minute source onto the 5-minute grid
t1_5m = select avg(temp_1min) as avg_temp from t1 group by bar(timestamp, 5m) as timestamp
// Step 2: left join the two sources on the aligned timestamp
select timestamp as window_5m, avg_temp, humidity_5min as humidity
from lj(t1_5m, t2, `timestamp)
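Aligning two sources usually comes down to two steps: reduce the faster source to the slower grid, then match rows on the shared bucket timestamp. A plain-Python sketch (not DolphinDB script) of that pattern follows; `align` is a hypothetical helper, timestamps are integer milliseconds, and a missing reading on the slow side is kept as `None`, as a left join would.

```python
from collections import defaultdict

FIVE_MIN_MS = 300_000

def align(fast, slow, interval=FIVE_MIN_MS):
    """fast, slow: lists of (timestamp_ms, value).
    Returns [(bucket_start, mean_of_fast, slow_value_or_None)]."""
    buckets = defaultdict(list)
    for ts, v in fast:
        buckets[ts - ts % interval].append(v)   # step 1: bucket the fast source
    slow_by_ts = dict(slow)
    return [                                    # step 2: left join on the bucket
        (b, sum(vs) / len(vs), slow_by_ts.get(b))
        for b, vs in sorted(buckets.items())
    ]

fast = [(i * 60_000, float(i)) for i in range(10)]  # minutes 0..9
slow = [(0, 50.0)]                                  # the second 5-minute reading is missing
aligned = align(fast, slow)
print(aligned)
```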
5.2 Resampling
python
// Create high-frequency data (TIMESTAMP, one row every 6 seconds)
t = table(
    2024.01.01T00:00:00.000 + (0..999) * 6000 as timestamp,
    20.0 + rand(10.0, 1000) as temperature // uniform in [20, 30)
)
// Downsample: 6 seconds → 1 minute, keeping first/max/min/last of each window
select bar(timestamp, 1m) as minute,
    first(temperature) as open,
    max(temperature) as high,
    min(temperature) as low,
    last(temperature) as close,
    count(*) as cnt
from t
group by bar(timestamp, 1m)
// Downsample: 6 seconds → 5 minutes
select bar(timestamp, 5m) as minute_5,
    first(temperature) as open,
    max(temperature) as high,
    min(temperature) as low,
    last(temperature) as close
from t
group by bar(timestamp, 5m)
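Downsampling to open/high/low/close values can be done in a single pass over time-ordered rows. The plain-Python sketch below (not DolphinDB script) shows the update rule; `ohlc` is an illustrative helper and the readings are made up.

```python
def ohlc(rows, interval):
    """rows: time-ordered (timestamp_ms, value) pairs.
    Returns {bucket_start: (open, high, low, close)}."""
    out = {}
    for ts, v in rows:
        b = ts - ts % interval
        if b not in out:
            out[b] = (v, v, v, v)                # first row of the bucket
        else:
            o, h, l, _ = out[b]
            out[b] = (o, max(h, v), min(l, v), v)  # close is always the latest row
    return out

rows = [(0, 3.0), (6_000, 5.0), (12_000, 1.0), (60_000, 4.0), (66_000, 2.0)]
bars = ohlc(rows, 60_000)
print(bars)
```

Open and close depend on row order, so the input must be sorted by time; high and low do not.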
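5.3 Interpolation and Filling
python
// Create data with missing values and uneven spacing (minutes 0, 1, 3, 5, 8, 10)
t = table(
    2024.01.01T00:00:00.000 + [0, 1, 3, 5, 8, 10] * 60000 as timestamp,
    [25.0, 26.0, NULL, 27.0, NULL, 28.0] as temperature
)
// Forward fill: carry the last known value forward
select timestamp, temperature,
    ffill(temperature) as temp_ffill
from t
// Linear interpolation between the neighbouring known values
select timestamp, temperature,
    interpolate(temperature, "linear") as temp_linear
from t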
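With unevenly spaced timestamps, linear interpolation gives different answers depending on whether it is weighted by row position or by time. The plain-Python sketch below (not DolphinDB script) shows forward fill and both interpolation variants on the same six readings; `ffill` and `linear_fill` are illustrative re-implementations, and which weighting a particular library call applies should be checked in its documentation.

```python
def ffill(values):
    """Replace each None with the most recent non-None value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def linear_fill(xs, values):
    """Fill None gaps by linear interpolation along coordinate xs.
    Leading and trailing gaps are left as None."""
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (xs[i] - xs[left]) / (xs[right] - xs[left])
            out[i] = values[left] + frac * (values[right] - values[left])
    return out

minutes = [0, 1, 3, 5, 8, 10]
temps = [25.0, 26.0, None, 27.0, None, 28.0]

filled = ffill(temps)
by_position = linear_fill(range(len(temps)), temps)  # treats rows as evenly spaced
by_time = linear_fill(minutes, temps)                # weights by elapsed minutes
print(filled, by_position, by_time)
```

The second gap sits at minute 8, three fifths of the way from minute 5 to minute 10, so the time-weighted value is about 27.6 while the position-weighted value is 27.5.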
6. Time-Series Analysis
6.1 Period-over-Period Comparisons
python
// Create daily data
t = table(
    2024.01.01 + 0..99 as date,
    1000.0 + rand(1000.0, 100) as sales // uniform in [1000, 2000)
)
// Sequential comparison: day over day
select date, sales,
    prev(sales) as prev_sales,
    (sales - prev(sales)) / prev(sales) as mom_rate
from t
// Same-period comparison: week over week (same weekday, 7 rows earlier)
select date, sales,
    move(sales, 7) as prev_week_sales,
    (sales - move(sales, 7)) / move(sales, 7) as wow_rate
from t
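Both comparisons are the same formula with a different lag: `(x[i] - x[i - lag]) / x[i - lag]`. A plain-Python sketch (not DolphinDB script) with made-up sales figures follows; `change_rate` is a hypothetical helper and it assumes one row per day with no missing days and no zero values in the denominator.

```python
def change_rate(values, lag):
    """Relative change against the value `lag` rows earlier.
    The first `lag` positions have no reference value and yield None."""
    return [
        None if i < lag else (values[i] - values[i - lag]) / values[i - lag]
        for i in range(len(values))
    ]

sales = [100.0, 110.0, 121.0, 100.0, 100.0, 100.0, 100.0, 150.0]

day_over_day = change_rate(sales, 1)
week_over_week = change_rate(sales, 7)
print(day_over_day)
print(week_over_week)
```

If days can be missing, a row-based lag no longer means "one week earlier", and the comparison should be done by joining on the date instead.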
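6.2 Time-Series Decomposition
python
// Create data with a trend and a seasonal component:
// linear trend + one sine cycle per year + uniform noise in [-5, 5)
t = table(
    1..365 as day,
    2024.01.01 + 0..364 as date,
    100 + 0.1 * (1..365) + 10 * sin(2 * pi * (1..365) / 365) + (rand(10.0, 365) - 5.0) as value
)
// Trend estimate: trailing moving averages over 7 and 30 days
select day, date, value,
    mavg(value, 7) as trend_7d,
    mavg(value, 30) as trend_30d
from t
// Detrending: what remains after subtracting the 7-day moving average is the short-term fluctuation
select day, date, value,
    value - mavg(value, 7) as detrended
from t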
6.3 Time-Series Statistics
python
// Summary statistics for a series.
// Assumes t has timestamp and temperature columns, for example the table from section 3.1.
select
    count(*) as total_records,
    min(timestamp) as start_time,
    max(timestamp) as end_time,
    max(timestamp) - min(timestamp) as duration,
    avg(temperature) as avg_temp,
    std(temperature) as std_temp,
    skew(temperature) as skew_temp,
    kurtosis(temperature) as kurt_temp
from t
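For readers who want to see what the shape statistics measure, here is a plain-Python sketch (not DolphinDB script) using the population (biased) central moments: skewness is m3 / m2^1.5 and kurtosis is m4 / m2^2 (non-excess, so a normal distribution gives 3). These definitions are an assumption for illustration; whether a given library function uses the biased or unbiased form, or reports excess kurtosis, should be checked in its documentation. `describe` is a hypothetical helper.

```python
import math

def describe(values):
    """Mean, sample standard deviation, and moment-based skewness and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "std_sample": math.sqrt(m2 * n / (n - 1)),
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```

A symmetric sample such as 1 to 5 has zero skewness, and its kurtosis of 1.7 is well below 3 because the values are spread evenly instead of clustering around the mean.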
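7. Practical Examples
7.1 Device Monitoring
python
// Create device monitoring data: 100 devices, each reporting once per minute for 100 minutes.
// div(0..9999, 100) yields 0 for rows 0-99, 1 for rows 100-199, and so on,
// so every device gets 100 distinct, increasing timestamps.
t = table(
    take(1..100, 10000) as device_id,
    2024.01.01T00:00:00.000 + div(0..9999, 100) * 60000 as timestamp,
    20.0 + rand(10.0, 10000) as temperature, // uniform in [20, 30)
    40.0 + rand(20.0, 10000) as humidity, // uniform in [40, 60)
    1000.0 + rand(20.0, 10000) as pressure // uniform in [1000, 1020)
)
// Per-device monitoring: moving average, deviation from it, and moving standard deviation
select device_id, timestamp, temperature,
    mavg(temperature, 10) as moving_avg,
    temperature - mavg(temperature, 10) as deviation,
    mstd(temperature, 10) as moving_std
from t
context by device_id
// Anomaly detection: flag readings more than 3 moving standard deviations away from the moving average
select device_id, timestamp, temperature,
    mavg(temperature, 30) as avg_30,
    mstd(temperature, 30) as std_30,
    abs(temperature - mavg(temperature, 30)) > 3 * mstd(temperature, 30) as is_anomaly
from t
context by device_id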
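One design point in a 3-sigma rule is whether the current reading is part of its own baseline. When it is, a single value can never deviate from the window mean by more than (n - 1) / sqrt(n) sample standard deviations (Samuelson's inequality): about 2.85 for n = 10, so a 3-sigma test on a 10-row window can never fire, and about 5.29 for n = 30. The plain-Python sketch below (not DolphinDB script) shows a variant that compares each reading with the statistics of the preceding rows only; `flag_anomalies` is a hypothetical helper and the readings are made up.

```python
import statistics

def flag_anomalies(values, window, k=3.0):
    """Flag values[i] when it deviates from the mean of the previous `window`
    values by more than k sample standard deviations. The current value is
    excluded from its own baseline."""
    flags = []
    for i, v in enumerate(values):
        if i < window:
            flags.append(False)  # not enough history yet
            continue
        history = values[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.stdev(history)
        flags.append(std > 0 and abs(v - mean) > k * std)
    return flags

readings = [25.0, 25.2, 24.8, 25.1, 24.9, 25.0, 25.1, 24.9, 25.2, 24.8, 40.0, 25.0]
flags = flag_anomalies(readings, window=10)
print(flags)
```

The spike at 40.0 is flagged, and the normal reading right after it is not, even though the spike has by then inflated the baseline's standard deviation.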
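7.2 Production Metrics
python
// Create 90 days of production data (integer samples drawn from the given ranges)
t = table(
    2024.01.01 + 0..89 as date,
    rand(800..1200, 90) as production,
    rand(50..100, 90) as efficiency,
    rand(1..5, 90) as defects
)
// Daily view: 7-day moving averages and running totals
select date, production, efficiency, defects,
    mavg(production, 7) as production_ma7,
    mavg(efficiency, 7) as efficiency_ma7,
    cumsum(production) as cum_production,
    cumsum(defects) as cum_defects
from t
// Weekly totals. bar aligns 7-day windows to multiples of 7 days from 1970.01.01,
// so these windows start on Thursdays rather than on calendar-week boundaries.
select bar(date, 7d) as week,
    sum(production) as total_production,
    avg(efficiency) as avg_efficiency,
    sum(defects) as total_defects
from t
group by bar(date, 7d)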
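Fixed 7-day buckets and calendar weeks are not the same thing. The plain-Python sketch below (not DolphinDB script) assumes that bucketing a date counts days from 1970-01-01 and rounds down to a multiple of 7, which puts every bucket start on a Thursday; `bar_days` and `week_start_monday` are illustrative helpers.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # a Thursday

def bar_days(d, n):
    """Round d down to a multiple of n days counted from the epoch."""
    days = (d - EPOCH).days
    return EPOCH + timedelta(days=days - days % n)

def week_start_monday(d):
    """Start of the calendar week containing d, with weeks beginning on Monday."""
    return d - timedelta(days=d.weekday())

d = date(2024, 1, 3)                    # a Wednesday
print(bar_days(d, 7))                   # epoch-aligned bucket start
print(week_start_monday(d))             # calendar-week start
```

If reports need Monday-to-Sunday weeks, the bucket has to be shifted by the weekday offset instead of relying on epoch alignment.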
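8. Summary
This article covered time-series queries in DolphinDB:
- Time-window functions: bar, dailyAlignedBar, interval
- Sliding-window computation: mavg, mmax, mmin, mstd
- Window functions: ROWS windows, RANGE windows, grouped windows
- Time alignment: multi-source alignment, resampling, interpolation and filling
- Time-series analysis: period-over-period comparisons, trend decomposition, summary statistics
- Practical examples: device monitoring, production analysis
Questions for reflection:
- How do you choose an appropriate time-window size?
- What is the difference between a sliding window and a fixed window?
- How should missing values in a time series be handled?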