Introduction to the TimeGPT Time Series Foundation Model

TimeGPT: A Comprehensive Guide to Time Series Forecasting

Introduction to TimeGPT

TimeGPT, developed by Nixtla, is a generative pre-trained Transformer model specialized for forecasting tasks. According to its authors, it was trained on the largest publicly available collection of time series to date (over 100 billion data points of financial, weather, energy, and web data), with the aim of democratizing time series analysis. It can identify patterns and predict future data points in seconds.

Principle Introduction

TimeGPT is presented by its authors as the first time series foundation model capable of zero-shot inference. The general idea is to pre-train the model on vast datasets from many domains and then generate forecasts on unseen data without any further training (zero-shot inference).

This approach relies on transfer learning, where the model leverages knowledge acquired during training to solve new tasks. This method is effective only when the model is sufficiently large and trained on extensive data.

To this end, the authors trained TimeGPT on over 100 billion data points from open-source time series data. The dataset spans diverse domains, including finance, economics, weather, web traffic, energy, and sales. The authors did not disclose the specific public sources used to assemble these 100 billion data points.

This diversity is crucial for the success of the foundation model, as it enables learning different temporal patterns, thus improving generalization. For example, weather data may exhibit daily seasonality (hotter during the day than at night) and yearly seasonality, while traffic data may show daily seasonality (more cars on the road during the day than at night) and weekly seasonality (more cars on weekdays than weekends).
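To make the idea of overlapping seasonalities concrete, here is a small sketch on synthetic data. The series, its amplitudes, and the noise level are all made up for illustration; the autocorrelation helper is a plain NumPy implementation, not something TimeGPT exposes.

```python
import numpy as np

# Eight weeks of hypothetical hourly observations.
hours = np.arange(24 * 7 * 8)
daily = 10 * np.sin(2 * np.pi * hours / 24)          # repeats every 24 hours
weekly = 5 * np.sin(2 * np.pi * hours / (24 * 7))    # repeats every 168 hours
rng = np.random.default_rng(42)
series = 100 + daily + weekly + rng.normal(0, 1, size=hours.size)

def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Lags that match a seasonal period give a strongly positive value;
# half a daily cycle (12 hours) gives a negative one.
for lag in (12, 24, 168):
    print(lag, round(autocorr(series, lag), 3))
```

A model that has seen many series with such structure can recognize the same periods in new data, which is the intuition behind training on diverse domains.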

To ensure model robustness and generalization, preprocessing was kept minimal. Only missing values were imputed, with the rest retained in their original form. The authors did not specify the imputation method; interpolation techniques such as linear, spline, or moving average are plausible candidates.
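Because the actual gap-filling method is not documented, the following is only an illustration of two of the techniques named above, using pandas on a made-up daily series. The values and the 3-point trailing window are assumptions, not TimeGPT's preprocessing.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two missing observations.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
y = pd.Series([10.0, np.nan, 14.0, 16.0, np.nan, 20.0], index=idx)

# Linear interpolation: each gap is placed on the straight line
# between its two observed neighbours.
y_linear = y.interpolate(method="linear")

# Moving-average fill: each gap is replaced by the mean of the observed
# values in a trailing 3-point window (NaNs are skipped).
trailing_mean = y.rolling(window=3, min_periods=1).mean()
y_moving_avg = y.fillna(trailing_mean)

print(pd.DataFrame({"raw": y, "linear": y_linear, "moving_avg": y_moving_avg}))
```

Linear interpolation uses information from both sides of a gap, while the trailing average only looks backward, so the two can disagree on trending data.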

The model was trained over multiple days, during which hyperparameters and learning rates were optimized. While the exact duration and GPU resources were not disclosed, we know the model was implemented in PyTorch, using the Adam optimizer and a learning rate decay strategy.

TimeGPT utilizes a Transformer architecture, specifically a full encoder-decoder structure. Inputs can include a window of historical data and a window of exogenous data, such as one-off events or other series. The encoder's attention mechanism learns different attributes from the input, which are then fed into the decoder to generate predictions until the user-specified forecast horizon is reached.

Notably, the authors implemented conformal prediction in TimeGPT, allowing the model to estimate prediction intervals based on historical errors.
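The idea of deriving intervals from historical errors can be sketched as follows. This is a simplified split-conformal style calculation under stated assumptions (symmetric bands, absolute errors from a held-out window, no finite-sample correction); the function name and the numbers are hypothetical and this is not Nixtla's implementation.

```python
import numpy as np

def conformal_interval(point_forecast, calibration_errors, level):
    """Return (lower, upper) bands so that roughly `level` percent of the
    past absolute errors would have fallen inside the band."""
    abs_errors = np.abs(np.asarray(calibration_errors, dtype=float))
    # Empirical quantile of the absolute errors at the requested level.
    q = np.quantile(abs_errors, level / 100.0)
    point_forecast = np.asarray(point_forecast, dtype=float)
    return point_forecast - q, point_forecast + q

# Hypothetical errors (actual minus forecast) observed on a held-out window.
errors = np.array([-3.0, 1.5, 0.5, -2.0, 4.0, -1.0, 2.5, -0.5])
forecast = np.array([45.0, 43.0, 42.0])

lo80, hi80 = conformal_interval(forecast, errors, level=80)
lo90, hi90 = conformal_interval(forecast, errors, level=90)
print("80% width:", hi80 - lo80)
print("90% width:", hi90 - lo90)
```

The band width depends only on how large past errors were, which is why no distributional assumption about the data is needed.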

TimeGPT Features

  • Pre-trained Model: Generates predictions without specific training, though fine-tuning is possible.
  • Exogenous Variables: Supports multivariate forecasting tasks by incorporating external variables.
  • Conformal Prediction: Estimates prediction intervals, enabling anomaly detection (e.g., flagging data points outside a 99% prediction interval as anomalies).

All these tasks can be achieved through zero-shot inference or minimal fine-tuning, marking a paradigm shift in time series forecasting.

At the time of writing, TimeGPT is accessible only through an API in closed beta. Because the authors did not specify which public datasets went into the 100 billion training points, evaluating it on well-known benchmark datasets (e.g., ETT or weather) would not be a fair test: the model may have seen them during training.

Preparation

Install Required Packages

Install the nixtlats package, which the examples in this guide use. Note that nixtlats is deprecated in favor of the newer nixtla package:

bash
!pip install nixtlats

Output:

Collecting nixtlats
  Downloading nixtlats-0.5.2-py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: httpx in /usr/local/lib/python3.11/dist-packages (from nixtlats) (0.28.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.2.2)
Requirement already satisfied: pydantic in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.11.4)
Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.32.3)
Requirement already satisfied: tenacity in /usr/local/lib/python3.11/dist-packages (from nixtlats) (9.1.2)
Collecting utilsforecast>=0.1.7 (from nixtlats)
  Downloading utilsforecast-0.2.12-py3-none-any.whl.metadata (7.6 kB)
...
Installing collected packages: utilsforecast, nixtlats
Successfully installed nixtlats-0.5.2 utilsforecast-0.2.12

Obtain API Token

Register at Nixtla's dashboard using an institutional email to obtain an API token. The free tier allows 1,000 API calls per month.
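Rather than pasting the token into a notebook, it can be read from the environment. A minimal sketch, assuming the token has been exported under the variable name NIXTLA_API_KEY (the name and the helper function are choices made for this example):

```python
import os

def load_api_token(env_var="NIXTLA_API_KEY"):
    """Read the API token from an environment variable.

    The variable name is an assumption for this sketch; use whatever
    name your own setup defines.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

The returned string can then be passed as the token argument when the client object is created, which keeps the secret out of shared notebooks and version control.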

Usage Tutorial

Univariate Forecasting

Load and Initialize

python
from nixtlats import TimeGPT
import pandas as pd

# Create the TimeGPT client; replace the placeholder with your own API token
timegpt = TimeGPT(token='YOUR_API_TOKEN')

Note: The nixtlats package and its TimeGPT class are deprecated; new projects should use the nixtla package and its NixtlaClient class instead.

Load Dataset

python
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')
df = df.sort_values(['unique_id', 'ds'])
df.head()

Output:

   unique_id                  ds      y
0        BE 2016-10-22 00:00:00  70.00
1        BE 2016-10-22 01:00:00  37.10
2        BE 2016-10-22 02:00:00  37.10
3        BE 2016-10-22 03:00:00  44.75
4        BE 2016-10-22 04:00:00  37.10

Visualize Data

python
timegpt.plot(df, time_col='ds', target_col='y')

This generates a plot of the time series data.

Forecast with Prediction Intervals

python
fcst_df = timegpt.forecast(df, h=24, level=[80, 90])
fcst_df.head()

Output:

   unique_id                  ds   TimeGPT  TimeGPT-lo-90  TimeGPT-lo-80  TimeGPT-hi-80  TimeGPT-hi-90
0        BE 2016-12-31 00:00:00  45.190582      33.011285      35.508618      54.872547      57.369880
1        BE 2016-12-31 01:00:00  43.244987      30.388532      35.376340      51.113635      56.101443
2        BE 2016-12-31 02:00:00  41.958897      29.285654      35.340688      48.577106      54.632139
3        BE 2016-12-31 03:00:00  39.796680      29.909487      32.327371      47.265990      49.683874
4        BE 2016-12-31 04:00:00  39.204865      30.731904      30.998638      47.411091      47.677825
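
Forecast with Different Parameters

python
# The frame loaded above uses 'ds' and 'y'; rename them so that the
# time_col/target_col arguments (and the output below) match
timegpt_fcst_pred_int_df = timegpt.forecast(
    df=df.rename(columns={'ds': 'timestamp', 'y': 'value'}), h=12, level=[80, 90, 99.7],
    time_col='timestamp', target_col='value',
)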
print(timegpt_fcst_pred_int_df.shape)
timegpt_fcst_pred_int_df.head()

Output:

(48, 9)
   unique_id           timestamp   TimeGPT  TimeGPT-lo-99.7  TimeGPT-lo-90  TimeGPT-lo-80  TimeGPT-hi-80  TimeGPT-hi-90  TimeGPT-hi-99.7
0        BE 2016-12-31 00:00:00  45.190453       28.008072      33.011395      35.508424      54.872481      57.369510       62.372833
1        BE 2016-12-31 01:00:00  43.244446       27.750938      30.387266      35.374624      51.114267      56.101625       58.737954
2        BE 2016-12-31 02:00:00  41.958389       25.092357      29.283794      35.340795      48.575984      54.632985       58.824421
3        BE 2016-12-31 03:00:00  39.796486       26.072040      29.910928      32.326250      47.266722      49.682044       53.520932
4        BE 2016-12-31 04:00:00  39.204536       18.367774      30.731239      30.998955      47.410118      47.677833       60.041299

Plot Forecasts

python
timegpt.plot(fcst_df, time_col='ds', target_col='TimeGPT')
timegpt.plot(df, fcst_df, level=[80, 90], max_insample_length=24 * 5)

Air Passengers Dataset

python
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()

Output:

   timestamp  value
0 1949-01-01    112
1 1949-02-01    118
2 1949-03-01    132
3 1949-04-01    129
4 1949-05-01    121

timegpt.plot(df, time_col='timestamp', target_col='value')

Forecast Air Passengers

python
timegpt_fcst_df = timegpt.forecast(df=df, h=12, freq='MS', time_col='timestamp', target_col='value')
timegpt_fcst_df.head()

Output:

   timestamp   TimeGPT
0 1961-01-01  437.837921
1 1961-02-01  426.062744
2 1961-03-01  463.116547
3 1961-04-01  478.244507
4 1961-05-01  505.646484

timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')

Long-Term Forecasting

python
timegpt_fcst_df = timegpt.forecast(df=df, h=36, time_col='timestamp', target_col='value', freq='MS')
timegpt_fcst_df.head()

Output:

WARNING:nixtlats.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
   timestamp   TimeGPT
0 1961-01-01  437.837921
1 1961-02-01  426.062744
2 1961-03-01  463.116547
3 1961-04-01  478.244507
4 1961-05-01  505.646484

timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')

Short-Term Forecasting

python
timegpt_fcst_df = timegpt.forecast(df=df, h=6, time_col='timestamp', target_col='value', freq='MS')
timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
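
Setting Frequency

The freq parameter is critical: it indicates the time unit between consecutive data points (for example, 'MS' for month start). If the timestamps are used as a DatetimeIndex, the frequency can be attached to the index directly. Note that the forecast call below omits freq, so the frequency has to be inferred from the timestamp column:

python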
df_time_index = df.set_index('timestamp')
df_time_index.index = pd.DatetimeIndex(df_time_index.index, freq='MS')
timegpt.forecast(df=df, h=36, time_col='timestamp', target_col='value').head()

Output:

WARNING:nixtlats.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
   timestamp   TimeGPT
0 1961-01-01  437.837921
1 1961-02-01  426.062744
2 1961-03-01  463.116547
3 1961-04-01  478.244507
4 1961-05-01  505.646484
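
Before relying on an inferred frequency, it can help to check what pandas itself infers from the timestamps. A small sketch, using a synthetic monthly index that mimics the air passengers dates (the index is generated here, not loaded from the dataset):

```python
import pandas as pd

# Monthly timestamps like those in the air passengers data:
# the first day of each month, starting in January 1949.
timestamps = pd.date_range("1949-01-01", periods=12, freq="MS")

# pandas infers a frequency alias from regularly spaced timestamps.
inferred = pd.infer_freq(timestamps)
print(inferred)

# Compare against the alias you intend to pass as freq.
expected = "MS"
print("frequency matches:", inferred == expected)
```

If the inferred alias differs from the one you expect, the series probably has gaps or duplicated timestamps that should be fixed before forecasting.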

Validate Token

python
timegpt.validate_token()

Output:

True

Anomaly Detection

Anomaly detection in time series data is critical in fields like finance, healthcare, security, and infrastructure. TimeGPT's detect_anomalies method automatically identifies anomalies by evaluating each observation's context within the time series, using a 99% prediction interval by default. Observations outside this interval are flagged as anomalies (labeled as 1 in the anomaly column).

python
import pandas as pd
from nixtlats import TimeGPT

pm_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv')
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D')
timegpt_anomalies_df.head()

Output:

   timestamp  anomaly  TimeGPT-lo-99  TimeGPT  TimeGPT-hi-99
0 2008-01-10        0       6.936009  8.224194       9.512378
1 2008-01-11        0       6.863336  8.151521       9.439705
2 2008-01-12        0       6.839064  8.127249       9.415433
3 2008-01-13        0       7.629072  8.917256      10.205441
4 2008-01-14        0       7.714111  9.002295      10.290480

timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')

Adjusting Anomaly Detection Threshold

Adjust the level parameter to modify the prediction interval:

python
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=90)
timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')

A higher level (e.g., 99.99) widens the interval, detecting fewer anomalies:

python
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=99.99)
timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')
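
The relationship between the level and the number of flagged points can be illustrated without calling the API. The sketch below uses made-up residuals and a Gaussian band purely for illustration; TimeGPT derives its intervals differently, and the function name is hypothetical.

```python
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical residuals (actual minus forecast): noise plus three spikes.
residuals = rng.normal(loc=0.0, scale=1.0, size=500)
residuals[[50, 200, 350]] += 6.0

def count_anomalies(residuals, level):
    """Count points outside a symmetric interval at the given level (in %)."""
    # Two-sided normal quantile for the requested coverage.
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    half_width = z * np.std(residuals)
    return int(np.sum(np.abs(residuals) > half_width))

for level in (90, 99, 99.99):
    print(level, count_anomalies(residuals, level))
```

As the level rises, the band widens and the count can only stay the same or fall, which is the behavior described above.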

Including Date Features

Incorporate date features for better anomaly detection:

python
from nixtlats.date_features import CountryHolidays

timegpt_anomalies_df_x = timegpt.detect_anomalies(
    pm_df, time_col='timestamp', target_col='value', freq='D', date_features=True, level=99.99,
)
timegpt.plot(pm_df, timegpt_anomalies_df_x, time_col='timestamp', target_col='value')

Forecasting with Exogenous Variables

Exogenous variables provide additional context that can improve predictions. For example, temperature data can enhance ice cream sales forecasts. Add exogenous variables as columns after the target column.
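
As an illustration of that layout, the sketch below builds day-of-week indicator columns for a made-up hourly history and for the future timestamps to be forecast. The frame, the helper name, and the day_0 to day_6 column names (Monday = 0) are assumptions chosen for this example.

```python
import pandas as pd

# Hypothetical hourly history for one series in long format.
df_hist = pd.DataFrame({
    "unique_id": "BE",
    "ds": pd.date_range("2016-10-22 00:00", periods=48, freq="60min"),
    "y": range(48),
})

def add_day_dummies(frame):
    """Append day_0 ... day_6 indicator columns (Monday=0) after the existing ones."""
    out = frame.copy()
    for d in range(7):
        out[f"day_{d}"] = (out["ds"].dt.dayofweek == d).astype(float)
    return out

df_hist = add_day_dummies(df_hist)

# The future frame carries the same exogenous columns for the next h
# timestamps, but no target column.
h = 24
future = pd.DataFrame({
    "unique_id": "BE",
    "ds": pd.date_range(df_hist["ds"].max() + pd.Timedelta(hours=1), periods=h, freq="60min"),
})
future = add_day_dummies(future)

print(df_hist.head())
print(future.head())
```

Calendar features like these are convenient exogenous variables because their future values are always known in advance.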

Example: Electricity Price Forecasting

python
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()

Output:

   unique_id                  ds      y  Exogenous1  Exogenous2  day_0  day_1  day_2  day_3  day_4  day_5  day_6
0        BE 2016-10-22 00:00:00  70.00     57253.0     49593.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
1        BE 2016-10-22 01:00:00  37.10     51887.0     46073.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
2        BE 2016-10-22 02:00:00  37.10     51896.0     44927.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
3        BE 2016-10-22 03:00:00  44.75     48428.0     44483.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
4        BE 2016-10-22 04:00:00  37.10     46721.0     44338.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0

Load future exogenous variables:

python
future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
print(future_ex_vars_df.shape)
future_ex_vars_df.head()

Output:

(96, 11)
   unique_id                  ds  Exogenous1  Exogenous2  day_0  day_1  day_2  day_3  day_4  day_5  day_6
0        BE 2016-12-31 00:00:00     70318.0     64108.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
1        BE 2016-12-31 01:00:00     67898.0     62492.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
2        BE 2016-12-31 02:00:00     68379.0     61571.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
3        BE 2016-12-31 03:00:00     64972.0     60381.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0
4        BE 2016-12-31 04:00:00     62900.0     60298.0    0.0    0.0    0.0    0.0    0.0    1.0    0.0

Forecast with exogenous variables:

python
timegpt_fcst_ex_vars_df = timegpt.forecast(df=df, X_df=future_ex_vars_df, h=24, level=[80, 90])
timegpt_fcst_ex_vars_df.head()

Output:

   unique_id                  ds   TimeGPT  TimeGPT-lo-90  TimeGPT-lo-80  TimeGPT-hi-80  TimeGPT-hi-90
0        BE 2016-12-31 00:00:00  51.633533      37.170360      41.667443      61.599622      66.096706
1        BE 2016-12-31 01:00:00  45.751707      31.324216      36.895095      54.608318      60.179197
2        BE 2016-12-31 02:00:00  39.651087      26.457148      33.045684      46.256490      52.845026
3        BE 2016-12-31 03:00:00  34.000518      20.566910      23.985966      44.015071      47.434127
4        BE 2016-12-31 04:00:00  33.785968      18.989039      24.427422      43.144514      48.582897

timegpt.plot(
    df[['unique_id', 'ds', 'y']],
    timegpt_fcst_ex_vars_df,
    max_insample_length=365,
    level=[80, 90],
)

Feature Importance

python
timegpt.weights_x.plot.barh(x='features', y='weights')

Adding Country Holidays

python
from nixtlats.date_features import CountryHolidays

timegpt_fcst_ex_vars_df = timegpt.forecast(
    df=df, X_df=future_ex_vars_df, h=24, level=[80, 90],
    date_features=[CountryHolidays(['US'])]
)
timegpt.weights_x.plot.barh(x='features', y='weights')

Multivariate Forecasting

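Install neuralforecast for comparison:

bash
!pip install neuralforecast

Perform rolling forecasts with TimeGPT. The loop assumes that df holds the full daily view-count series, that test is its last 168 rows (the first 1,213 rows form the initial training window), and that future_exog holds the exogenous columns (published, is_holiday) for the test rows:

python
# Replace the placeholder with your own API token
timegpt = TimeGPT(token='YOUR_API_TOKEN')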
timegpt_preds = []

for i in range(0, 162, 7):
    timegpt_preds_df = timegpt.forecast(
        df=df.iloc[:1213+i],
        X_df=future_exog[i:i+7],
        h=7,
        finetune_steps=50,
        id_col='unique_id',
        time_col='ds',
        target_col='y'
    )
    preds = timegpt_preds_df['TimeGPT']
    timegpt_preds.extend(preds)

len(timegpt_preds)

Output:

168

test['TimeGPT'] = timegpt_preds
test.tail(100)

Output:

      unique_id         ds     y  published  is_holiday     TimeGPT
1281         0 2023-07-05  1864        0.0          0  2479.581543
1282         0 2023-07-06  1706        0.0          0  2305.177979
1283         0 2023-07-07  1468        0.0          0  2074.042725
1284         0 2023-07-08   977        0.0          0   969.086182
1285         0 2023-07-09  1063        0.0          0  1131.523193
...        ...        ...   ...        ...        ...          ...
1376         0 2023-10-08   737        0.0          0   692.883728
1377         0 2023-10-09  1237        0.0          1  1092.072266
1378         0 2023-10-10  1755        1.0          0  1192.525146
1379         0 2023-10-11  3241        0.0          0   926.260010
1380         0 2023-10-12  2262        0.0          0   922.098145

test.to_csv('medium_views_test.csv', header=True, index=False)
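
The window arithmetic behind the rolling loop above can be checked in isolation. This sketch reuses the sizes that appear in the loop (1,213 training rows, 168 test rows, a 7-day horizon); the generator name is hypothetical and no API call is made.

```python
def rolling_windows(n_train, n_test, horizon):
    """Yield (train_end, test_start, test_end) positions for each forecast window."""
    for offset in range(0, n_test, horizon):
        # The history grows by one window at every step.
        train_end = n_train + offset
        yield train_end, offset, min(offset + horizon, n_test)

# Sizes taken from the loop above.
windows = list(rolling_windows(n_train=1213, n_test=168, horizon=7))

print(len(windows))   # number of forecast calls
print(windows[0])     # first window
print(windows[-1])    # last window
print(sum(end - start for _, start, end in windows))  # total predictions
```

Each window costs one API call, so a shorter horizon over the same test period means more calls against the monthly quota.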

Visualize Predictions

python
import matplotlib.pyplot as plt

published_dates = test[test['published'] == 1]
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(test['ds'], test['y'])
ax.plot(test['ds'], test['TimeGPT'], label='TimeGPT')
ax.scatter(published_dates['ds'], published_dates['y'], marker='o', color='red', label='New article')
ax.set_xlabel('Day')
ax.set_ylabel('Total views')
ax.legend(loc='best')
fig.autofmt_xdate()
plt.tight_layout()

Comparison with Other Models

Compare TimeGPT with N-BEATS, N-HiTS, and PatchTST:

python
from neuralforecast.models import NHITS, NBEATS, PatchTST
from neuralforecast import NeuralForecast

horizon = 7
models = [
    NHITS(h=horizon, input_size=5*horizon, max_steps=50),
    NBEATS(h=horizon, input_size=5*horizon, max_steps=50),
    PatchTST(h=horizon, input_size=5*horizon, max_steps=50)
]
nf = NeuralForecast(models=models, freq='D')
future_exog = test[['unique_id', 'published', 'is_holiday']]
preds_df = nf.cross_validation(df=df, static_df=future_exog, step_size=7, n_windows=24)
preds_df.head()

Output:

   unique_id         ds      cutoff       NHITS      NBEATS    PatchTST     y
0         0 2023-04-28 2023-04-27  1571.078491  1479.276855  1484.261597  1470
1         0 2023-04-29 2023-04-27  1208.617920  1015.555298  1099.190552  1004
2         0 2023-04-30 2023-04-27  1308.625122  1204.944092  1200.549072  1051
3         0 2023-05-01 2023-04-27  1811.447754  1830.838379  1797.462524  1333
4         0 2023-05-02 2023-04-27  1952.458862  1857.008911  1853.445679  1778

# Assign by position: preds_df and test carry different row indexes
preds_df['TimeGPT'] = test['TimeGPT'].values
preds_df.head()

Output:

   unique_id         ds      cutoff       NHITS      NBEATS    PatchTST     y     TimeGPT
0         0 2023-04-28 2023-04-27  1571.078491  1479.276855  1484.261597  1470  1351.176636
1         0 2023-04-29 2023-04-27  1208.617920  1015.555298  1099.190552  1004   962.566650
2         0 2023-04-30 2023-04-27  1308.625122  1204.944092  1200.549072  1051  1148.006470
3         0 2023-05-01 2023-04-27  1811.447754  1830.838379  1797.462524  1333  1856.734009
4         0 2023-05-02 2023-04-27  1952.458862  1857.008911  1853.445679  1778  1914.413452

Visualize Model Comparison

python
published_dates = test[test['published'] == 1]
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(preds_df['ds'], preds_df['y'], label='actual')
ax.plot(preds_df['ds'], preds_df['TimeGPT'], ls='--', label='TimeGPT')
ax.plot(preds_df['ds'], preds_df['NHITS'], ls=':', label='NHiTS')
ax.plot(preds_df['ds'], preds_df['NBEATS'], ls='-.', label='NBEATS')
ax.plot(preds_df['ds'], preds_df['PatchTST'], ls='-', label='PatchTST')
ax.scatter(published_dates['ds'], published_dates['y'], marker='o', color='red', label='New article')
ax.set_xlabel('Day')
ax.set_ylabel('Total views')
ax.legend(loc='best')
fig.autofmt_xdate()
plt.tight_layout()

Observation: N-HiTS predicts peaks not observed in reality, PatchTST often under-predicts, while TimeGPT closely aligns with actual data.

Evaluation

Evaluate model performance using Mean Absolute Error (MAE) and Mean Squared Error (MSE). Round predictions to integers for daily view counts:

python
from neuralforecast.losses.numpy import mae, mse

preds_df = preds_df.round({
    'NHITS': 0, 'NBEATS': 0, 'PatchTST': 0, 'TimeGPT': 0
})
preds_df.head()

Output:

   unique_id         ds      cutoff  NHITS  NBEATS  PatchTST     y  TimeGPT
0         0 2023-04-28 2023-04-27   1571    1479      1484  1470     1351
1         0 2023-04-29 2023-04-27   1209    1016      1099  1004      963
2         0 2023-04-30 2023-04-27   1309    1205      1201  1051     1148
3         0 2023-05-01 2023-04-27   1811    1831      1797  1333     1857
4         0 2023-05-02 2023-04-27   1952    1857      1853  1778     1914

data = {
    'N-HiTS': [mae(preds_df['NHITS'], preds_df['y']), mse(preds_df['NHITS'], preds_df['y'])],
    'N-BEATS': [mae(preds_df['NBEATS'], preds_df['y']), mse(preds_df['NBEATS'], preds_df['y'])],
    'PatchTST': [mae(preds_df['PatchTST'], preds_df['y']), mse(preds_df['PatchTST'], preds_df['y'])],
    'TimeGPT': [mae(preds_df['TimeGPT'], preds_df['y']), mse(preds_df['TimeGPT'], preds_df['y'])]
}
metrics_df = pd.DataFrame(data=data)
metrics_df.index = ['mae', 'mse']
metrics_df.style.highlight_min(color='lightgreen', axis=1)

Output:

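            N-HiTS        N-BEATS       PatchTST        TimeGPT
mae     300.125000     267.815476     269.113095     295.547619
mse  219075.267857  183030.005952  185470.077381  231426.928571

On this test set N-BEATS achieves the lowest MAE and MSE, with PatchTST close behind. TimeGPT ranks third on MAE and last on MSE, despite the visual impression from the plot.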
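
For readers who want to see what the two metrics compute, here is a plain NumPy version applied to the first five rows of the rounded comparison table above (actual views against TimeGPT). The functions are written for this example and are not the neuralforecast implementations.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average error size, in the target's own units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred):
    """Mean squared error: squaring makes large misses weigh much more."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# First five rows of the rounded predictions table (y and TimeGPT columns).
y_true = [1470, 1004, 1051, 1333, 1778]
y_pred = [1351, 963, 1148, 1857, 1914]

print("MAE:", mae(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
```

In these five rows a single miss of 524 views accounts for most of the MSE, which shows why a model can look reasonable on MAE and still rank poorly on MSE.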

Load Forecasting

Case Study 1

python
import warnings
from nixtlats import TimeGPT
import pandas as pd
warnings.filterwarnings('ignore')
# Replace the placeholder with your own API token
timegpt = TimeGPT(token='YOUR_API_TOKEN')

df = pd.read_csv('/content/drive/MyDrive/datasets/test2.csv')
df = df.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
df['ds'] = pd.to_datetime(df['date'] + ' ' + df['DATA_TIME'])
cols = ['unique_id', 'ds', 'y', 'DAY']
df = df[cols]
print(df.isnull().sum())
print(df.shape)
df.tail()

Output:

unique_id    0
ds           0
y            0
DAY          0
dtype: int64
(33408, 4)
       unique_id                  ds       y  DAY
33403 |210000005481 2023-12-29 22:45:00   964.2  363
33404 |210000005481 2023-12-29 23:00:00  1036.8  363
33405 |210000005481 2023-12-29 23:15:00  1030.8  363
33406 |210000005481 2023-12-29 23:30:00  1049.4  363
33407 |210000005481 2023-12-29 23:45:00  1054.8  363

future_ex_vars_df_origin = pd.read_csv('/content/drive/MyDrive/datasets/210000005481.csv')
future_ex_vars_df = future_ex_vars_df_origin.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
future_ex_vars_df['ds'] = pd.to_datetime(future_ex_vars_df['date'] + ' ' + future_ex_vars_df['DATA_TIME'])
future_ex_vars_df = future_ex_vars_df[future_ex_vars_df['DAY'] == 364]
cols = ['unique_id', 'ds', 'DAY']
future_ex_vars_df = future_ex_vars_df[cols]
print(future_ex_vars_df.shape)
print(future_ex_vars_df.isnull().sum())
future_ex_vars_df.head()

Output:

(96, 3)
unique_id    0
ds           0
DAY          0
dtype: int64
       unique_id                  ds  DAY
33408 |210000005481 2023-12-30 00:00:00  364
33409 |210000005481 2023-12-30 00:15:00  364
33410 |210000005481 2023-12-30 00:30:00  364
33411 |210000005481 2023-12-30 00:45:00  364
33412 |210000005481 2023-12-30 01:00:00  364

timegpt_fcst_ex_vars_df = timegpt.forecast(
    df=df,
    X_df=future_ex_vars_df,
    finetune_steps=100,
    h=96,
    time_col='ds',
    target_col='y',
    model='timegpt-1-long-horizon',
    level=[80, 90]
)
timegpt_fcst_ex_vars_df.head()

Output:

       unique_id                  ds     TimeGPT  TimeGPT-lo-90  TimeGPT-lo-80  TimeGPT-hi-80  TimeGPT-hi-90
0 |210000005481 2023-12-30 00:00:00  1022.289181     741.859170     837.336947    1207.241414    1302.719192
1 |210000005481 2023-12-30 00:15:00   963.901790     446.810781     603.597085    1324.206496    1480.992800
2 |210000005481 2023-12-30 00:30:00   933.234249     400.233163     554.460030    1312.008468    1466.235335
3 |210000005481 2023-12-30 00:45:00   972.456844     521.245904     613.786275    1331.127413    1423.667785
4 |210000005481 2023-12-30 01:00:00   962.090755     505.215810     593.242837    1330.938674    1418.965700

timegpt.plot(
    df[['unique_id', 'ds', 'y']],
    timegpt_fcst_ex_vars_df,
    max_insample_length=365,
    level=[80, 90]
)

# Align dtypes so the forecast can be joined to the actual values
timegpt_fcst_ex_vars_df['ds'] = pd.to_datetime(timegpt_fcst_ex_vars_df['ds'])

# Actual load for day 364; .copy() avoids modifying a slice of the original frame
real_data = future_ex_vars_df_origin[future_ex_vars_df_origin['DAY'] == 364].copy()
real_data['ds'] = pd.to_datetime(real_data['date'] + ' ' + real_data['DATA_TIME'])
real_data = real_data.rename(columns={'CONS_ID': 'unique_id'})
real_data = real_data[['unique_id', 'ds', 'LOAD']]
print(real_data.shape)

# Left join keeps every actual observation, matched to its forecast
merge_data = pd.merge(real_data, timegpt_fcst_ex_vars_df, how='left', on=['unique_id', 'ds'])
merge_data.head()

Output:

(96, 3)
       unique_id                  ds    LOAD     TimeGPT  TimeGPT-lo-90  TimeGPT-lo-80  TimeGPT-hi-80  TimeGPT-hi-90
0 |210000005481 2023-12-30 00:00:00  1279.8  1022.289181     741.859170     837.336947    1207.241414    1302.719192
1 |210000005481 2023-12-30 00:15:00  1211.4   963.901790     446.810781     603.597085    1324.206496    1480.992800
2 |210000005481 2023-12-30 00:30:00  1412.4   933.234249     400.233163     554.460030    1312.008468    1466.235335
3 |210000005481 2023-12-30 00:45:00  1349.4   972.456844     521.245904     613.786275    1331.127413    1423.667785
4 |210000005481 2023-12-30 01:00:00   964.2   962.090755     505.215810     593.242837    1330.938674    1418.965700

import matplotlib.pyplot as plt

# Compare actual and predicted load for the forecast day
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(merge_data['ds'], merge_data['LOAD'], label='real_data', color='red', marker='o', linestyle='-')
ax.plot(merge_data['ds'], merge_data['TimeGPT'], label='predict_data', color='blue', marker='s', linestyle='--')
ax.legend()  # show the labels defined above
plt.show()
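
Beyond the plot, the merged frame can be summarized numerically. The sketch below uses only the five merged rows printed in the Case Study 1 output above, so the numbers it produces describe those rows, not the full 96-step forecast; the choice of MAPE and interval coverage as summaries is an assumption of this example.

```python
import pandas as pd

# First five merged rows from Case Study 1 (values copied from the output above).
merged = pd.DataFrame({
    "LOAD": [1279.8, 1211.4, 1412.4, 1349.4, 964.2],
    "TimeGPT": [1022.289181, 963.901790, 933.234249, 972.456844, 962.090755],
    "TimeGPT-lo-90": [741.859170, 446.810781, 400.233163, 521.245904, 505.215810],
    "TimeGPT-hi-90": [1302.719192, 1480.992800, 1466.235335, 1423.667785, 1418.965700],
})

# Mean absolute percentage error of the point forecast.
mape = ((merged["LOAD"] - merged["TimeGPT"]).abs() / merged["LOAD"]).mean() * 100

# Share of actual values that fall inside the 90% prediction interval.
inside = merged["LOAD"].between(merged["TimeGPT-lo-90"], merged["TimeGPT-hi-90"])
coverage = inside.mean() * 100

print(f"MAPE: {mape:.1f}%  90% interval coverage: {coverage:.0f}%")
```

Applied to the full merge_data frame, the same two lines give a quick check of whether the 90% interval contains roughly 90% of the actual values.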

Case Study 2

python
import warnings
from nixtlats import TimeGPT
import pandas as pd
warnings.filterwarnings('ignore')
# Replace the placeholder with your own API token
timegpt = TimeGPT(token='YOUR_API_TOKEN')

df = pd.read_csv('/content/drive/MyDrive/datasets/test.csv')
df = df.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
df['ds'] = pd.to_datetime(df['date'] + ' ' + df['DATA_TIME'])
cols = ['unique_id', 'ds', 'y', 'DAY']
df = df[cols]
print(df.isnull().sum())
df.head()

Output:

unique_id    0
ds           0
y            0
DAY          0
dtype: int64
       unique_id                  ds      y  DAY
0 |210000003901 2023-01-16 00:00:00  68.19   16
1 |210000003901 2023-01-16 00:15:00  51.39   16
2 |210000003901 2023-01-16 00:30:00  47.24   16
3 |210000003901 2023-01-16 00:45:00  47.17   16
4 |210000003901 2023-01-16 01:00:00  49.31   16

future_ex_vars_df_origin = pd.read_csv('/content/drive/MyDrive/datasets/210000003901.csv')
future_ex_vars_df = future_ex_vars_df_origin.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
future_ex_vars_df['ds'] = pd.to_datetime(future_ex_vars_df['date'] + ' ' + future_ex_vars_df['DATA_TIME'])
future_ex_vars_df = future_ex_vars_df[future_ex_vars_df['DAY'] == 364]
cols = ['unique_id', 'ds', 'DAY']
future_ex_vars_df = future_ex_vars_df[cols]
print(future_ex_vars_df.shape)
print(future_ex_vars_df.isnull().sum())
future_ex_vars_df.head()

Output:

(96, 3)
unique_id    0
ds           0
DAY          0
dtype: int64
       unique_id                  ds  DAY
33408 |210000003901 2023-12-30 00:00:00  364
33409 |210000003901 2023-12-30 00:15:00  364
33410 |210000003901 2023-12-30 00:30:00  364
33411 |210000003901 2023-12-30 00:45:00  364
33412 |210000003901 2023-12-30 01:00:00  364

timegpt_fcst_ex_vars_df = timegpt.forecast(
    df=df,
    X_df=future_ex_vars_df,
    finetune_steps=100,
    h=96,
    time_col='ds',
    target_col='y',
    model='timegpt-1-long-horizon',
    level=[80, 90]
)
timegpt_fcst_ex_vars_df.head()

Output:

       unique_id                  ds   TimeGPT  TimeGPT-lo-90  TimeGPT-lo-80  TimeGPT-hi-80  TimeGPT-hi-90
0 |210000003901 2023-12-30 00:00:00  75.555785      59.431775      66.383720      84.727850      91.679795
1 |210000003901 2023-12-30 00:15:00  66.986273      44.116121      52.079720      81.892827      89.856426
2 |210000003901 2023-12-30 00:30:00  61.534644      40.231499      46.371637      76.697651      82.837788
3 |210000003901 2023-12-30 00:45:00  55.279120      34.012068      40.644038      69.914202      76.546172
4 |210000003901 2023-12-30 01:00:00  52.130294      28.080968      37.955519      66.305068      76.179619

timegpt.plot(
    df[['unique_id', 'ds', 'y']],
    timegpt_fcst_ex_vars_df,
    max_insample_length=365,
    level=[80, 90]
)

# Align dtypes so the forecast can be joined to the actual values
timegpt_fcst_ex_vars_df['ds'] = pd.to_datetime(timegpt_fcst_ex_vars_df['ds'])

# Actual load for day 364; .copy() avoids modifying a slice of the original frame
real_data = future_ex_vars_df_origin[future_ex_vars_df_origin['DAY'] == 364].copy()
real_data['ds'] = pd.to_datetime(real_data['date'] + ' ' + real_data['DATA_TIME'])
real_data = real_data.rename(columns={'CONS_ID': 'unique_id'})
real_data = real_data[['unique_id', 'ds', 'LOAD']]
print(real_data.shape)

# Left join keeps every actual observation, matched to its forecast
merge_data = pd.merge(real_data, timegpt_fcst_ex_vars_df, how='left', on=['unique_id', 'ds'])
print(merge_data.isnull().sum())
merge_data.head()

Output:

(96, 3)
unique_id        0
ds               0
LOAD             0
TimeGPT          0
TimeGPT-lo-90    0
TimeGPT-lo-80    0
TimeGPT-hi-80    0
TimeGPT-hi-90    0
dtype: int64
       unique_id                  ds   LOAD   TimeGPT  TimeGPT-lo-90  TimeGPT-lo-80  TimeGPT-hi-80  TimeGPT-hi-90
0 |210000003901 2023-12-30 00:00:00  87.44  75.555785      59.431775      66.383720      84.727850      91.679795
1 |210000003901 2023-12-30 00:15:00  53.65  66.986273      44.116121      52.079720      81.892827      89.856426
2 |210000003901 2023-12-30 00:30:00  52.12  61.534644      40.231499      46.371637      76.697651      82.837788
3 |210000003901 2023-12-30 00:45:00  52.81  55.279120      34.012068      40.644038      69.914202      76.546172
4 |210000003901 2023-12-30 01:00:00  52.56  52.130294      28.080968      37.955519      66.305068      76.179619

import matplotlib.pyplot as plt

# Compare actual and predicted load for the forecast day
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(merge_data['ds'], merge_data['LOAD'], label='real_data', color='red', marker='o', linestyle='-')
ax.plot(merge_data['ds'], merge_data['TimeGPT'], label='predict_data', color='blue', marker='s', linestyle='--')
ax.legend()  # show the labels defined above
plt.show()

References

* https://nixtla.dev/docs/intro

* https://blog.csdn.net/qq_53123875/article/details/136620743

* https://zhuanlan.zhihu.com/p/674814155

* https://github.com/marcopeix/time-series-analysis/blob/master/TimeGPT.ipynb
