Spatiotemporal Prediction using Deep Learning

Common tasks on time series data include classification, event detection, and anomaly detection, but the most dominant one is forecasting. Forecasting simply means predicting future information from present and past information. This is possible under the assumption that a time series is correlated with its own past values, that is, autocorrelated. In deep learning, this assumption underlies architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), both of which belong to the Recurrent Neural Network (RNN) family.

In practice, an RNN can forecast a sequence of future values from a sequence of current and past values. However, problems arise when each element of the sequence is itself spatial, such as a grid or an image. So the question comes up: can neural networks be applied to the prediction of spatiotemporal data?

Results of CNN and RNN Combination

Unlike an RNN, which captures temporal patterns, a Convolutional Neural Network (CNN) is very good at capturing spatial patterns. Its convolution layers extract spatial features and combine them into increasingly complex and abstract features.

Combining ideas from CNN and RNN gave rise to an architecture called ConvLSTM. Put simply, ConvLSTM integrates convolution into the LSTM architecture: the matrix multiplications inside the LSTM gates are replaced with convolutions, so the model learns spatial features and the temporal pattern together. This allows spatiotemporal data to be predicted over time. However, ConvLSTM is better suited to nowcasting, that is, forecasting over a short horizon.
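To make the idea concrete, the cell update can be written as a worked set of equations. This is a sketch of the commonly used formulation without peephole connections (the original ConvLSTM formulation also adds peephole terms to the gates). Here $*$ denotes convolution, $\circ$ the element-wise product, $X_t$ the input frame, $H_t$ the hidden state, and $C_t$ the cell state; the symbol names are assumptions chosen for illustration.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Compared with a standard LSTM, the only change is that the products with the weights become convolutions, so $H_t$ and $C_t$ keep their spatial layout as 3D tensors instead of being flattened to vectors.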

Example: Java Region Hourly Temperature Nowcasting

Several kinds of data can be used to demonstrate ConvLSTM, such as video data and climate data. In this article, climate data in the form of hourly temperature over the Java Island region of Indonesia is used for nowcasting. The objective is to build a model that predicts 12 hours of temperature data from the previous 12 hours.

Steps of Analysis

Normalization

The first step is normalization with a min-max scaler, which constrains the values to lie between 0 and 1. With the temperature data on this scale, each hourly grid can be treated like a frame of video data.
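As a minimal sketch of this step, min-max scaling can be written in a few lines of NumPy. The function names, the toy array, and its value range are assumptions for illustration only; the scaling constants are taken from whichever array is passed to `fit_min_max`.

```python
import numpy as np


def fit_min_max(data):
    """Return the (min, max) of an array, used as scaling constants."""
    return float(np.min(data)), float(np.max(data))


def min_max_scale(data, data_min, data_max):
    """Map values linearly so that data_min -> 0 and data_max -> 1."""
    return (data - data_min) / (data_max - data_min)


# Toy stand-in for hourly temperature grids: (time, rows, cols), in degrees Celsius.
rng = np.random.default_rng(0)
temps = rng.uniform(18.0, 35.0, size=(48, 4, 6))

t_min, t_max = fit_min_max(temps)
scaled = min_max_scale(temps, t_min, t_max)
print(scaled.min(), scaled.max())
```

Keeping `t_min` and `t_max` around matters, because the same constants are needed later to convert predictions back to degrees.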


Data Splitting

As with other machine learning models, the data is divided into train, validation, and test sets. The first 70% of the data is used for training, the next 15% for validation, and the remaining 15% for testing.
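A chronological split like this one can be sketched as follows. The helper name `chronological_split` and the toy series are hypothetical; the point is that the data is cut along the time axis in order, without shuffling, so the validation and test sets always lie after the training period.

```python
import numpy as np


def chronological_split(data, train_frac=0.70, val_frac=0.15):
    """Split along the time axis (axis 0) in order, without shuffling."""
    n = data.shape[0]
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return data[:train_end], data[train_end:val_end], data[val_end:]


# Toy series of 1000 hourly frames, each a 1 x 1 grid.
series = np.arange(1000).reshape(1000, 1, 1)
train, val, test = chronological_split(series)
print(train.shape[0], val.shape[0], test.shape[0])
```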

Reshaping

To use the ConvLSTM architecture, specifically ConvLSTM2D, the input and output data need to be converted into 5D tensors of shape (num_samples, num_timesteps, num_longitudes, num_latitudes, num_features). Here num_samples is the total number of samples, num_timesteps is the number of time steps in each sample, num_longitudes is the number of longitudes (the number of rows), num_latitudes is the number of latitudes (the number of columns), and num_features is the number of features or channels.

In this problem, the output data is shifted by one step relative to the input data. This means that if the input starts at time t, the output starts at time t+1. This is done because the resulting model performs better than one whose output is only a single spatial frame, that is, with a time dimension of one.
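The reshaping and the one-step shift can be sketched together. This assumes overlapping windows that advance one step at a time, which the article does not state; the helper name `make_shifted_windows` and the toy array are also hypothetical.

```python
import numpy as np


def make_shifted_windows(frames, num_timesteps=12):
    """Build (input, target) pairs where the target is the input shifted by one step.

    frames: array of shape (time, rows, cols, features).
    Returns x and y of shape (num_samples, num_timesteps, rows, cols, features).
    """
    num_samples = frames.shape[0] - num_timesteps
    x = np.stack([frames[i : i + num_timesteps] for i in range(num_samples)])
    y = np.stack([frames[i + 1 : i + 1 + num_timesteps] for i in range(num_samples)])
    return x, y


# Toy data: 20 hourly frames on a 3 x 4 grid with one feature.
# Every cell of frame k holds the value k, which makes the shift easy to check.
frames = np.arange(20, dtype=np.float32).reshape(20, 1, 1, 1) * np.ones(
    (1, 3, 4, 1), dtype=np.float32
)
x, y = make_shifted_windows(frames)
print(x.shape, y.shape)
```

In each sample, `y` contains the same frames as `x` except that the first frame is dropped and one new frame is appended at the end.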

Train The ConvLSTM

The architecture consists of 3 stacked ConvLSTM2D layers followed by a Conv3D output layer, with a Batch Normalization layer before each of them. The architecture in Python syntax can be seen below.

```python
from tensorflow import keras
from tensorflow.keras import layers

# num_longitudes, num_latitudes and num_features come from the reshaped data.
# The time dimension is None so that sequences of any length are accepted.
inp = layers.Input(shape=(None, num_longitudes, num_latitudes, num_features))

# Three stacked ConvLSTM2D layers, each preceded by batch normalization.
# return_sequences=True keeps the time dimension for the next layer.
x = layers.BatchNormalization()(inp)
x = layers.ConvLSTM2D(
    filters=16,
    kernel_size=(5, 5),
    padding="same",
    return_sequences=True,
    activation="relu",
)(x)
x = layers.BatchNormalization()(x)
x = layers.ConvLSTM2D(
    filters=32,
    kernel_size=(3, 3),
    padding="same",
    return_sequences=True,
    activation="relu",
)(x)
x = layers.BatchNormalization()(x)
x = layers.ConvLSTM2D(
    filters=32,
    kernel_size=(1, 1),
    padding="same",
    return_sequences=True,
    activation="relu",
)(x)
x = layers.BatchNormalization()(x)
# Output layer: one channel per frame, squashed to (0, 1) to match the scaled data.
x = layers.Conv3D(
    filters=1, kernel_size=(3, 3, 3), activation="sigmoid", padding="same"
)(x)

model = keras.models.Model(inp, x)
model.compile(
    loss=keras.losses.binary_crossentropy,
    optimizer=keras.optimizers.Adam(),
)
```

The hyperparameters of this model were obtained by hyperparameter tuning with the Hyperband method. The model was then trained, and the loss on each part of the data is as follows.

Prediction

Once the best model has been obtained, we use it to make a 12-hour prediction from the preceding 12 hours of data. The prediction is then compared with the actual values, or ground truth.
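The article does not show how the 12 predicted frames are produced. One common option with a shifted-by-one model is a rolling forecast, where each predicted frame is fed back as input for the next step. The sketch below illustrates that option only; `rolling_forecast` and `predict_fn` are hypothetical names, and a trivial "persistence" function stands in for the trained model.

```python
import numpy as np


def rolling_forecast(predict_fn, history, horizon=12):
    """Forecast `horizon` frames by feeding each prediction back as input.

    predict_fn: hypothetical callable mapping a batch of shape
        (1, timesteps, rows, cols, features) to an output of the same shape,
        where output step k is the forecast for input step k + 1.
    history: array of shape (timesteps, rows, cols, features).
    """
    window = history.copy()
    forecasts = []
    for _ in range(horizon):
        out = predict_fn(window[np.newaxis, ...])[0]
        next_frame = out[-1]  # forecast for the step right after the window
        forecasts.append(next_frame)
        # Slide the window: drop the oldest frame, append the new forecast.
        window = np.concatenate([window[1:], next_frame[np.newaxis, ...]], axis=0)
    return np.stack(forecasts)


def persistence(batch):
    """Stand-in model for illustration only: it repeats its input unchanged."""
    return batch


history = np.random.default_rng(1).uniform(0.0, 1.0, size=(12, 3, 4, 1))
forecast = rolling_forecast(persistence, history)
print(forecast.shape)
```

With a trained Keras model, `model.predict` could be passed as `predict_fn`. Because every step consumes earlier predictions, errors accumulate over the horizon, which is one reason this kind of model is used for short horizons.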

The results show that ConvLSTM performs reasonably well for short-period prediction (nowcasting). The predictions follow both the pattern and the values of the temperature, although they look blurry.

Restriction

Although ConvLSTM can predict spatiotemporal data in the short term, the architecture used here has a restriction: the predicted value cannot exceed the maximum or fall below the minimum of the training data. This is because the data is normalized with a min-max scaler and the output layer uses a sigmoid activation, so every prediction lies between 0 and 1 and maps back into the training range. As a result, if the data to be predicted contains anomalies beyond that range, the current architecture cannot capture them.
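The bound can be checked numerically with a small sketch. The training extremes below are hypothetical values chosen for illustration, and `inverse_min_max` is an assumed helper name, not part of the original code.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, the activation of the output layer."""
    return 1.0 / (1.0 + np.exp(-z))


def inverse_min_max(scaled, data_min, data_max):
    """Undo min-max scaling: 0 -> data_min, 1 -> data_max."""
    return scaled * (data_max - data_min) + data_min


# Hypothetical training-set extremes, in degrees Celsius.
train_min, train_max = 18.0, 35.0

# Even extreme pre-activation values stay inside the training range.
logits = np.array([-50.0, -2.0, 0.0, 2.0, 50.0])
temps = inverse_min_max(sigmoid(logits), train_min, train_max)
print(temps)
```

However large the pre-activation value becomes, the recovered temperature never leaves the interval between `train_min` and `train_max`.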

Conclusion

ConvLSTM is a method that can be used to predict spatiotemporal values over short time periods. It has many applications, one of which is in the field of climate, where the example in this article gave quite usable results.
