[MDM 2024] Spatial-Temporal Large Language Model for Traffic Prediction

Paper: [2401.10134] Spatial-Temporal Large Language Model for Traffic Prediction

Code: GitHub - ChenxiLiu-HNU/ST-LLM: Official implementation of the paper "Spatial-Temporal Large Language Model for Traffic Prediction"

The English is all hand-typed, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid entirely; if you spot any, feel free to point them out in the comments. This post leans toward personal notes, so read accordingly.

Contents

[1. Takeaways](#1. Takeaways)

[2. Section-by-Section Reading](#2. Section-by-Section Reading)

[2.1. Abstract](#2.1. Abstract)

[2.2. Introduction](#2.2. Introduction)

[2.3. Related Work](#2.3. Related Work)

[2.3.1. Large Language Models for Time Series Analysis](#2.3.1. Large Language Models for Time Series Analysis)

[2.3.2. Traffic Prediction](#2.3.2. Traffic Prediction)

[2.4. Problem Definition](#2.4. Problem Definition)

[2.5. Methodology](#2.5. Methodology)

[2.5.1. Overview](#2.5.1. Overview)

[2.5.2. Spatial-Temporal Embedding and Fusion](#2.5.2. Spatial-Temporal Embedding and Fusion)

[2.5.3. Partially Frozen Attention (PFA) LLM](#2.5.3. Partially Frozen Attention (PFA) LLM)

[2.6. Experiments](#2.6. Experiments)

[2.6.1. Datasets](#2.6.1. Datasets)

[2.6.2. Baselines](#2.6.2. Baselines)

[2.6.3. Implementations](#2.6.3. Implementations)

[2.6.4. Evaluation Metrics](#2.6.4. Evaluation Metrics)

[2.6.5. Main Results](#2.6.5. Main Results)

[2.6.6. Performance of ST-LLM and Ablation Studies](#2.6.6. Performance of ST-LLM and Ablation Studies)

[2.6.7. Parameter Analysis](#2.6.7. Parameter Analysis)

[2.6.8. Inference Time Analysis](#2.6.8. Inference Time Analysis)

[2.6.9. Few-Shot Prediction](#2.6.9. Few-Shot Prediction)

[2.6.10. Zero-Shot Prediction](#2.6.10. Zero-Shot Prediction)

[2.7. Conclusion](#2.7. Conclusion)

[3. Reference](#3. Reference)


1. Takeaways

(1) The paper due for submission in a few days still hasn't been started, yet here I am munching biscuits and writing reading notes. Sigh. Everyone runs so fast these days.

(2) Unlike math-heavy papers, LLM papers go well with a cup of milk tea; the whole read is light and easy. This one boils down to: three separate convolutions → combine them → LLM (with some modules partially unfrozen) → done.

2. Section-by-Section Reading

2.1. Abstract

①They propose the Spatial-Temporal Large Language Model (ST-LLM) for traffic prediction (nothing especially notable to transcribe here; the abstract just introduces the method and argues that prior approaches lack accuracy, so see the framework figure below for specifics)

2.2. Introduction

①Traditional CNNs and RNNs cannot capture complex, long-range spatial and temporal dependencies. GNNs are prone to overfitting, so researchers mainly turn to attention mechanisms.

②Existing traffic prediction methods mainly focus on temporal features rather than spatial ones

③For better long-term prediction, they propose partially frozen attention (PFA)

2.3. Related Work

2.3.1. Large Language Models for Time Series Analysis

①They list TEMPO-GPT, TIME-LLM, OFA, TEST, and LLM-TIME, all of which utilize temporal features only. GATGPT, by contrast, introduces spatial features but ignores temporal dependencies.

Vocabulary note: imputation, i.e., filling in missing values (in the time-series context)

2.3.2. Traffic Prediction

①Filtering is a common, classic method for processing traffic data

②Irregular city road networks make CNNs hard to apply for extracting spatial features

2.4. Problem Definition

①Input traffic data: $X \in \mathbb{R}^{P \times N \times C}$, where $P$ denotes the number of timesteps, $N$ the number of spatial stations, and $C$ the number of features

②Task: given only the historical traffic data $X \in \mathbb{R}^{P \times N \times C}$ of the past $P$ time steps, learn a function $f_\theta$ with parameters $\theta$ that predicts the next $S$ timesteps: $Y = f_\theta(X) \in \mathbb{R}^{S \times N \times C}$
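As a toy illustration of the task's input/output shapes only (the linear map below is a stand-in I made up, not the paper's model; P and S are assumed to be 12), a naive predictor from P historical steps to S future steps could look like:

```python
import numpy as np

P, S, N, C = 12, 12, 266, 2   # assumed: history, horizon, stations (NYCTaxi has 266), features
rng = np.random.default_rng(0)

X_hist = rng.normal(size=(P, N, C))   # historical traffic data X
W = rng.normal(size=(S, P)) / P       # toy stand-in for the parameters theta

# A linear map over the time axis stands in for f_theta
Y_pred = np.einsum('sp,pnc->snc', W, X_hist)
print(Y_pred.shape)  # (12, 266, 2)
```

The point is only the shape contract: whatever $f_\theta$ is, it consumes $(P, N, C)$ and emits $(S, N, C)$.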

2.5. Methodology

2.5.1. Overview

①Overall framework of ST-LLM:

where the Spatial-Temporal Embedding layer extracts the token embedding $E_P$ of the historical timesteps, the spatial embedding $E_S$, and the temporal embedding $E_T$. The three are then fused into $H$. The first $F$ layers of the PFA LLM are frozen while the last $U$ layers are unfrozen, yielding the output representation $H'$. Lastly, a regression convolution converts $H'$ to the prediction $\hat{Y}$.

2.5.2. Spatial-Temporal Embedding and Fusion

①They get tokens by pointwise convolution:

②Linear layers encode the input's time-of-day and day-of-week indicators into embeddings $E_{d} = X_{d} W_{d}$ and $E_{w} = X_{w} W_{w}$

where $W_{d}$ and $W_{w}$ are learnable parameters and the combined output is the temporal embedding $E_T$

③They extract spatial correlations by:

④Fusion convolution: $H = \mathrm{FConv}(E_P \,\|\, E_S \,\|\, E_T)$

where $\|$ denotes concatenation of the three embeddings
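The embedding-and-fusion idea above can be sketched in PyTorch. This is a minimal sketch under my own assumptions (hidden size D, the one-hot widths 7 for day-of-week and 48 for the half-hour time-of-day slots, and all layer names are mine, not the official implementation): pointwise (1×1) convolutions produce the token and spatial embeddings, linear layers produce the temporal embedding, and a final pointwise convolution fuses them.

```python
import torch
import torch.nn as nn

P, N, C, D = 12, 266, 2, 64   # assumed: history, stations, features, hidden dim

class STEmbedFuse(nn.Module):
    """Sketch of spatial-temporal embedding + fusion (not the official code)."""
    def __init__(self):
        super().__init__()
        # pointwise conv over flattened time/feature channels -> token embedding E_P
        self.token_conv = nn.Conv1d(P * C, D, kernel_size=1)
        # linear layers for day-of-week / time-of-day one-hots -> temporal embedding E_T
        self.day_lin = nn.Linear(7, D)
        self.tod_lin = nn.Linear(48, D)
        # pointwise conv producing the spatial embedding E_S
        self.spatial_conv = nn.Conv1d(D, D, kernel_size=1)
        # fusion conv merges the concatenated embeddings into H
        self.fusion_conv = nn.Conv1d(3 * D, D, kernel_size=1)

    def forward(self, x, day, tod):
        # x: (B, P, N, C); day: (B, N, 7); tod: (B, N, 48)
        B = x.shape[0]
        e_p = self.token_conv(x.permute(0, 1, 3, 2).reshape(B, P * C, N))  # (B, D, N)
        e_t = (self.day_lin(day) + self.tod_lin(tod)).transpose(1, 2)      # (B, D, N)
        e_s = self.spatial_conv(e_p)                                       # (B, D, N)
        return self.fusion_conv(torch.cat([e_p, e_s, e_t], dim=1))        # (B, D, N)

m = STEmbedFuse()
h = m(torch.randn(4, P, N, C), torch.randn(4, N, 7), torch.randn(4, N, 48))
print(h.shape)  # torch.Size([4, 64, 266])
```

One token per station, with time folded into the channel dimension, matches the "three convolutions → combine" reading from the takeaways above.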

2.5.3. Partially Frozen Attention (PFA) LLM

①They freeze the first $F$ layers (including the multi-head attention and feed-forward sublayers), which contain important pretrained knowledge:

$$H^0 = H + PE, \qquad \tilde{H}^i = H^{i-1} + \mathrm{MHA}(\mathrm{LN}(H^{i-1})), \qquad H^i = \tilde{H}^i + \mathrm{FFN}(\mathrm{LN}(\tilde{H}^i)), \quad i = 1, \dots, F$$

where $PE$ denotes the learnable positional encoding, $\tilde{H}^i$ represents the intermediate representation of the $i$-th layer after applying the frozen multi-head attention (MHA) and the first unfrozen layer normalization (LN), and $H^i$ symbolizes the final representation after applying the unfrozen LN and frozen feed-forward network (FFN)

②The last $U$ layers are unfrozen so their multi-head attention can adapt to the traffic data:

③The final regression convolution (RConv):

④Loss function:

where $Y$ denotes the ground truth

⑤Algorithm:
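The freezing strategy described above can be sketched as follows. A toy transformer stack stands in for the pretrained backbone, and the exact rule (unfreeze LayerNorms everywhere, unfreeze attention only in the last U layers) is my reading of this section, not the official code:

```python
import torch
import torch.nn as nn

D, HEADS, LAYERS, U = 64, 4, 6, 2   # assumed: hidden dim, heads, total layers, unfrozen tail

# Toy stand-in for a pretrained LLM backbone (e.g. a 6-layer GPT2)
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=D, nhead=HEADS, batch_first=True)
    for _ in range(LAYERS)
)

# 1) Freeze everything (preserve pretrained knowledge)
for p in blocks.parameters():
    p.requires_grad = False

# 2) Unfreeze the layer norms in all layers (cheap adaptation to the new domain)
for blk in blocks:
    for p in blk.norm1.parameters():
        p.requires_grad = True
    for p in blk.norm2.parameters():
        p.requires_grad = True

# 3) Unfreeze multi-head attention in the last U layers only (the "partially frozen" part)
for blk in blocks[-U:]:
    for p in blk.self_attn.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in blocks.parameters() if p.requires_grad)
total = sum(p.numel() for p in blocks.parameters())
print(f"trainable: {trainable}/{total}")
```

Only a small fraction of the backbone ends up trainable, which is why the paper can fine-tune an LLM-scale model on modest traffic datasets.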

2.6. Experiments

2.6.1. Datasets

①Statistics of datasets:

②NYCTaxi: 266 virtual stations and 4,368 timesteps (each timestep spans half an hour)

③CHBike: 250 sites and 4,368 timesteps (30-minute intervals as well)

2.6.2. Baselines

①GNN-based baselines: DCRNN, STGCN, GWN, AGCRN, STGNCDE, DGCRN

②Attention-based baselines: ASTGCN, GMAN, ASTGNN

③LLM-based baselines: OFA, GATGPT, GCNGPT, LLAMA2

2.6.3. Implementations

①Data split: 6:2:2

②Historical and future timesteps:

④Learning rate: 0.001 with the Ranger21 optimizer for LLM-based models, and 0.001 with Adam for GCN- and attention-based models

⑤LLM: GPT2 and LLAMA2 7B

⑥Layers: 6 for GPT2 and 8 for LLAMA2

⑦Epoch: 100

⑧Batch size: 64

2.6.4. Evaluation Metrics

①Metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Weighted Absolute Percentage Error (WAPE)
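For reference, the four metrics can be written in a few lines of NumPy. These are the standard definitions; real traffic pipelines (possibly including the paper's) usually mask zero-valued targets in MAPE/WAPE, which this sketch ignores:

```python
import numpy as np

def mae(y, yhat):
    """Mean Absolute Error."""
    return np.abs(yhat - y).mean()

def rmse(y, yhat):
    """Root Mean Squared Error."""
    return np.sqrt(((yhat - y) ** 2).mean())

def mape(y, yhat):
    """Mean Absolute Percentage Error (%); assumes y has no zeros."""
    return (np.abs(yhat - y) / np.abs(y)).mean() * 100

def wape(y, yhat):
    """Weighted Absolute Percentage Error (%): total error over total volume."""
    return np.abs(yhat - y).sum() / np.abs(y).sum() * 100

y = np.array([2.0, 4.0, 8.0])
yhat = np.array([3.0, 4.0, 6.0])
print(mae(y, yhat), mape(y, yhat))  # 1.0 25.0
```

Note the difference in weighting: MAPE averages per-point relative errors (so small targets dominate), while WAPE normalizes the summed error by the summed volume, which is why it is popular for demand data like taxi and bike counts.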

2.6.5. Main Results

①Performance table:

2.6.6. Performance of ST-LLM and Ablation Studies

①Module ablation:

②Frozen ablation:

2.6.7. Parameter Analysis

①Hyperparameter ablation:

2.6.8. Inference Time Analysis

①Inference time table:

2.6.9. Few-Shot Prediction

①Few-shot learning with 10% of the training samples:

2.6.10. Zero-Shot Prediction

①Performance:

2.7. Conclusion

~

3. Reference

@inproceedings{liu2024spatial,
  title={Spatial-Temporal Large Language Model for Traffic Prediction},
  author={Liu, Chenxi and Yang, Sun and Xu, Qianxiong and Li, Zhishuai and Long, Cheng and Li, Ziyue and Zhao, Rui},
  booktitle={MDM},
  year={2024}
}
