
Paper link: [2401.10134] Spatial-Temporal Large Language Model for Traffic Prediction
The English is typed entirely by hand! It summarizes and paraphrases the original paper, so unavoidable spelling and grammar mistakes may appear; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.
Table of Contents
[1. 心得](#1. 心得)
[2. 论文逐段精读](#2. 论文逐段精读)
[2.1. Abstract](#2.1. Abstract)
[2.2. Introduction](#2.2. Introduction)
[2.3. Related Work](#2.3. Related Work)
[2.3.1. Large Language Models for Time Series Analysis](#2.3.1. Large Language Models for Time Series Analysis)
[2.3.2. Traffic Prediction](#2.3.2. Traffic Prediction)
[2.4. Problem Definition](#2.4. Problem Definition)
[2.5. Methodology](#2.5. Methodology)
[2.5.1. Overview](#2.5.1. Overview)
[2.5.2. Spatial-Temporal Embedding and Fusion](#2.5.2. Spatial-Temporal Embedding and Fusion)
[2.5.3. Partially Frozen Attention (PFA) LLM](#2.5.3. Partially Frozen Attention (PFA) LLM)
[2.6. Experiments](#2.6. Experiments)
[2.6.1. Datasets](#2.6.1. Datasets)
[2.6.2. Baselines](#2.6.2. Baselines)
[2.6.3. Implementations](#2.6.3. Implementations)
[2.6.4. Evaluation Metrics](#2.6.4. Evaluation Metrics)
[2.6.5. Main Results](#2.6.5. Main Results)
[2.6.6. Performance of ST-LLM and Ablation Studies](#2.6.6. Performance of ST-LLM and Ablation Studies)
[2.6.7. Parameter Analysis](#2.6.7. Parameter Analysis)
[2.6.8. Inference Time Analysis](#2.6.8. Inference Time Analysis)
[2.6.9. Few-Shot Prediction](#2.6.9. Few-Shot Prediction)
[2.6.10. Zero-Shot Prediction](#2.6.10. Zero-Shot Prediction)
[2.7. Conclusion](#2.7. Conclusion)
[3. Reference](#3. Reference)
1. Takeaways
(1) Even though the paper I need to submit in a few days hasn't been started yet, here I am munching biscuits and writing reading notes. Sigh. Everyone runs so fast these days
(2) Compared with math-heavy papers, LLM papers go well with a cup of milk tea; the whole read is relaxed. This one is simply: three separate convolutions → fuse them together → LLM (with some modules partially unfrozen) → done
2. Section-by-Section Close Reading
2.1. Abstract
①They proposed the Spatial-Temporal Large Language Model (ST-LLM) to predict traffic (nothing especially noteworthy to record here; the abstract just introduces the method and says prior accuracy was low; see the figures below for the specifics)
2.2. Introduction
①Traditional CNNs and RNNs cannot capture complex, long-range spatial and temporal dependencies. GNNs are prone to overfitting, so researchers mainly use attention mechanisms.
②Existing traffic prediction methods mainly focus on temporal features rather than spatial ones
③For better long-term prediction, they proposed partially frozen attention (PFA)
2.3. Related Work
2.3.1. Large Language Models for Time Series Analysis
①The authors list TEMPO-GPT, TIME-LLM, OFA, TEST, and LLM-TIME, which all utilize temporal features only. GATGPT, in contrast, introduces spatial features but ignores temporal dependencies.
imputation (n.): attribution, imputing; in the time-series context, filling in missing values
2.3.2. Traffic Prediction
①Filtering is a common and classic method for processing traffic data
②Irregular city road networks make CNNs hard to apply for extracting spatial features
2.4. Problem Definition
①Input traffic data: $X \in \mathbb{R}^{T \times N \times C}$, where $T$ denotes the number of timesteps, $N$ denotes the number of spatial stations, and $C$ denotes the number of features
②Task: given historical traffic data $[X_{t-P+1}, \ldots, X_t] \in \mathbb{R}^{P \times N \times C}$ of $P$ timesteps only, learn a function $f$ with parameters $\theta$ to predict the future $S$ timesteps:
$$[X_{t-P+1}, \ldots, X_t] \xrightarrow{f_\theta} [X_{t+1}, \ldots, X_{t+S}]$$
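The sliding-window prediction task above can be sketched with numpy shapes (all sizes here are illustrative choices for the sketch, not the paper's settings):

```python
import numpy as np

# Illustrative sizes: T timesteps, N stations, C features (assumed, not from the paper)
T, N, C = 100, 266, 2
P, S = 12, 12  # historical window and prediction horizon (assumed values)

X = np.random.rand(T, N, C)  # full traffic series

# Build (history, future) training pairs with a sliding window
pairs = [(X[t - P:t], X[t:t + S]) for t in range(P, T - S + 1)]
hist, fut = pairs[0]
print(hist.shape, fut.shape)  # (12, 266, 2) (12, 266, 2)
```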
2.5. Methodology
2.5.1. Overview
①Overall framework of ST-LLM:

where the Spatial-Temporal Embedding layer extracts the token embedding $E_{token}$, the spatial embedding $E_S$, and the temporal embedding $E_T$ from the historical $P$ timesteps. The three are then fused into $H_F$. The PFA LLM freezes the first $F$ layers and keeps the last $U$ layers unfrozen, producing the output $H^{F+U}$. Lastly, a regression convolution converts it to the prediction $\hat{Y} \in \mathbb{R}^{S \times N \times C}$.
2.5.2. Spatial-Temporal Embedding and Fusion
①They get tokens by pointwise convolution: $E_{token} = \mathrm{PConv}(X_P) \in \mathbb{R}^{N \times D}$
②Applying linear layers to encode the input into a day embedding $E_{day}$ and a week embedding $E_{week}$:
$$E_{day} = W_{day} X^{day}, \qquad E_{week} = W_{week} X^{week}$$
where $W_{day}$ and $W_{week}$ are learnable parameters and the output is the temporal embedding $E_T = E_{day} + E_{week} \in \mathbb{R}^{N \times D}$
③They extract spatial correlations into a spatial embedding $E_S \in \mathbb{R}^{N \times D}$
④Fusion convolution: $H_F = \mathrm{FConv}(E_{token} \,\|\, E_S \,\|\, E_T)$
where $\|$ denotes concatenation and $H_F \in \mathbb{R}^{N \times 3D}$
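A minimal numpy sketch of the embedding-and-fusion step: a pointwise convolution over the feature axis is just a shared linear map applied per station, and fusion concatenates the three embeddings before mixing them. All shapes and weight names are illustrative, not the paper's implementation:

```python
import numpy as np

N, P, D = 4, 12, 8  # stations, historical steps, embedding dim (illustrative)
rng = np.random.default_rng(0)

X_hist = rng.random((N, P))      # flattened history per station
W_token = rng.random((P, D))     # pointwise conv == shared linear map per station
E_token = X_hist @ W_token       # (N, D) token embedding

E_T = rng.random((N, D))         # temporal (day + week) embedding
E_S = rng.random((N, D))         # spatial embedding

# Fusion convolution: concatenate along the feature axis, then mix
H_cat = np.concatenate([E_token, E_S, E_T], axis=-1)  # (N, 3D)
W_fuse = rng.random((3 * D, 3 * D))
H_F = H_cat @ W_fuse             # (N, 3D) fused embedding
print(H_F.shape)  # (4, 24)
```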
2.5.3. Partially Frozen Attention (PFA) LLM
①They freeze the first $F$ layers (including the multi-head attention and feed-forward layers), which contain important pretrained knowledge:
$$H^0 = H_F + PE$$
$$A^i = \mathrm{MHA}\big(\mathrm{LN}(H^{i-1})\big) + H^{i-1}, \qquad H^i = \mathrm{FFN}\big(\mathrm{LN}(A^i)\big) + A^i, \qquad i = 1, \ldots, F$$
where $H^0$ is the input to the LLM, $PE$ denotes learnable positional encoding, $A^i$ represents the intermediate representation of the $i$-th layer after applying the frozen multi-head attention (MHA) and the first unfrozen layer normalization (LN), $H^i$ symbolizes the final representation after applying the unfrozen LN and frozen feed-forward network (FFN), and:
$$\mathrm{MHA}(x) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$
②Unfreezing the multi-head attention in the last $U$ layers, the same computation runs for $i = F+1, \ldots, F+U$
③The final regression convolution (RConv): $\hat{Y} = \mathrm{RConv}(H^{F+U}) \in \mathbb{R}^{S \times N \times C}$
④Loss function:
$$\mathcal{L}(\theta) = \frac{1}{S} \sum_{s=1}^{S} \big| \hat{Y}_s - Y_s \big|$$
where $Y$ is the ground truth
⑤Algorithm:

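The partial-freezing scheme described above can be sketched in plain Python (PyTorch-style `requires_grad` flags are emulated with a dict; layer counts are illustrative, and the exact per-module choices follow my reading of the notes: LN trainable throughout, FFN frozen, attention unfrozen only in the last U layers):

```python
def pfa_trainable_flags(num_layers, U):
    """Return {(layer_idx, module): trainable?} for a GPT-2-style layer stack."""
    F = num_layers - U  # the first F layers are fully frozen
    flags = {}
    for i in range(num_layers):
        in_last_U = i >= F
        flags[(i, "mha")] = in_last_U  # attention unfrozen only in the last U layers
        flags[(i, "ffn")] = False      # feed-forward stays frozen everywhere
        flags[(i, "ln")] = True        # layer norm is always trainable
    return flags

# 6 layers total, last 2 partially unfrozen (illustrative numbers)
flags = pfa_trainable_flags(num_layers=6, U=2)
print(flags[(0, "mha")], flags[(5, "mha")], flags[(5, "ffn")])  # False True False
```

In a real PyTorch implementation the same selection would set `p.requires_grad = False` on the frozen modules' parameters before building the optimizer.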
2.6. Experiments
2.6.1. Datasets
①Statistics of datasets:

②NYCTaxi: includes 266 virtual stations and 4,368 timesteps (each timestep spans half an hour)
③CHBike: includes 250 sites and 4,368 timesteps (30-minute timesteps as well)
2.6.2. Baselines
①GNN based baselines: DCRNN, STGCN, GWN, AGCRN, STGNCDE, DGCRN
②Attention based model: ASTGCN, GMAN, ASTGNN
③LLMs: OFA, GATGPT, GCNGPT, LLAMA2
2.6.3. Implementations
①Data split: 6:2:2
②Historical and future timesteps:
③
④Learning rate: 0.001 with the Ranger21 optimizer for the LLM-based models; 0.001 with Adam for the GCN- and attention-based baselines
⑤LLM: GPT2 and LLAMA2 7B
⑥Layer: 6 for GPT2 and 8 for LLAMA2
⑦Epoch: 100
⑧Batch size: 64
2.6.4. Evaluation Metrics
①Metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Weighted Absolute Percentage Error (WAPE)
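The four metrics can be written directly in numpy (a minimal sketch; the masking of zero ground-truth values that traffic benchmarks usually apply for MAPE is omitted here):

```python
import numpy as np

def mae(y, yhat):  return np.mean(np.abs(yhat - y))
def rmse(y, yhat): return np.sqrt(np.mean((yhat - y) ** 2))
def mape(y, yhat): return np.mean(np.abs((yhat - y) / y)) * 100      # assumes y != 0
def wape(y, yhat): return np.sum(np.abs(yhat - y)) / np.sum(np.abs(y)) * 100

y    = np.array([10.0, 20.0, 30.0])
yhat = np.array([12.0, 18.0, 33.0])
print(mae(y, yhat))   # ≈ 2.333
print(wape(y, yhat))  # ≈ 11.667
```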
2.6.5. Main Results
①Performance table:

2.6.6. Performance of ST-LLM and Ablation Studies
①Module ablation:

②Frozen ablation:

2.6.7. Parameter Analysis
①Hyperparameter ablation:

2.6.8. Inference Time Analysis
①Inference time table:

2.6.9. Few-Shot Prediction
①10% samples few-shot learning:

2.6.10. Zero-Shot Prediction
①Performance:

2.7. Conclusion
~
3. Reference
@inproceedings{liu2024spatial,
title={Spatial-Temporal Large Language Model for Traffic Prediction},
author={Liu, Chenxi and Yang, Sun and Xu, Qianxiong and Li, Zhishuai and Long, Cheng and Li, Ziyue and Zhao, Rui},
booktitle={MDM},
year={2024}
}