Tool Series: TimeGPT (4) Prediction Intervals

Contents

- Prediction intervals
- Historical forecast

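Prediction intervals

Prediction intervals quantify the uncertainty around a forecast. In time series forecasting, a prediction interval gives a range, determined by the level you choose, within which a future observation is expected to fall. Knowing this uncertainty is essential for informed decision-making, risk assessment, and planning.

For example, a 95% prediction interval means that, over many forecasts, the actual future value should fall inside the estimated range about 95 times out of 100. For a given level, a wider interval therefore signals more uncertainty about the forecast, and a narrower one signals more confidence.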
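To make the "95 times out of 100" reading concrete, here is a small simulation. It is only a sketch under stated assumptions: the future value is drawn from a normal distribution with a made-up mean and standard deviation, and `simulate_coverage` is a hypothetical helper, not part of any library.

```python
import random
from statistics import NormalDist


def simulate_coverage(level=95, n_trials=10_000, seed=0):
    """Fraction of simulated future values that land inside a `level`% interval."""
    rng = random.Random(seed)
    mu, sigma = 100.0, 10.0  # hypothetical forecast mean and standard deviation
    # Two-sided interval: leave (100 - level) / 2 percent in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    lo, hi = mu - z * sigma, mu + z * sigma
    hits = sum(lo <= rng.gauss(mu, sigma) <= hi for _ in range(n_trials))
    return hits / n_trials


print(simulate_coverage(95))  # close to 0.95 when the assumptions hold
```

If the assumed distribution is wrong, the observed coverage drifts away from the nominal level, which is why interval calibration matters.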

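When forecasting with TimeGPT, you can set the prediction interval level (or levels) to suit your needs. TimeGPT uses conformal prediction to calibrate these intervals.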
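The general idea behind conformal prediction can be sketched in a few lines. The example below is a generic, simplified illustration and not TimeGPT's internal procedure: it assumes a naive last-value forecaster, uses absolute one-step errors as calibration scores, and the function name and toy series are made up.

```python
import math


def conformal_interval(history, point_forecast, level=90):
    """Symmetric split-conformal-style interval around a point forecast.

    Calibration scores are the absolute one-step errors of a naive
    last-value forecaster over `history` (an illustrative stand-in model).
    """
    scores = sorted(abs(history[i] - history[i - 1]) for i in range(1, len(history)))
    n = len(scores)
    if n == 0:
        raise ValueError("need at least two observations to calibrate")
    # Finite-sample rank ceil((n + 1) * level / 100), capped at n for this sketch.
    rank = min(n, math.ceil((n + 1) * level / 100))
    q = scores[rank - 1]
    return point_forecast - q, point_forecast + q


# Hypothetical toy series; the naive forecast for the next step is the last value.
series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
for level in (40, 80):
    print(level, conformal_interval(series, series[-1], level))
```

The interval half-width is simply a quantile of past errors, so a higher level picks a larger error and yields a wider interval.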

python

from itertools import product

from fastcore.test import test_eq
from dotenv import load_dotenv

python

# Load environment variables (such as TIMEGPT_TOKEN) from a .env file
load_dotenv()

True

python

import pandas as pd
from nixtlats import TimeGPT

python

# Create the TimeGPT client. The token argument defaults to
# os.environ.get("TIMEGPT_TOKEN"); you can also pass a token explicitly.
timegpt = TimeGPT(
    token='my_token_provided_by_nixtla'
)

python

# Alternatively, omit the token and let the client read TIMEGPT_TOKEN
# from the environment variables loaded earlier
timegpt = TimeGPT()

To request prediction intervals, pass one or more levels to the forecast call. Here is how:

python

# Read the CSV file from the given URL into a DataFrame
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')

# Show the first rows of the DataFrame
df.head()

| | timestamp | value |
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |

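python

# Forecast with TimeGPT and request prediction intervals.
# Parameters:
# - df: input DataFrame containing the timestamps and target values
# - h: forecast horizon (number of steps ahead), here 12
# - level: prediction interval levels in percent, here [80, 90, 99.7]
# - time_col: name of the timestamp column, here 'timestamp'
# - target_col: name of the target column, here 'value'
# Returns a DataFrame with the point forecasts and interval bounds.
timegpt_fcst_pred_int_df = timegpt.forecast(
    df=df, h=12, level=[80, 90, 99.7],
    time_col='timestamp', target_col='value',
)

# Show the first rows of the forecast
timegpt_fcst_pred_int_df.head()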

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

| | timestamp | TimeGPT | TimeGPT-lo-99.7 | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-hi-99.7 |
| 0 | 1961-01-01 | 437.837921 | 415.826453 | 423.783707 | 431.987061 | 443.688782 | 451.892136 | 459.849389 |
| 1 | 1961-02-01 | 426.062714 | 402.833523 | 407.694061 | 412.704926 | 439.420502 | 444.431366 | 449.291904 |
| 2 | 1961-03-01 | 463.116547 | 423.434062 | 430.316862 | 437.412534 | 488.820560 | 495.916231 | 502.799032 |
| 3 | 1961-04-01 | 478.244507 | 444.885193 | 446.776764 | 448.726837 | 507.762177 | 509.712250 | 511.603821 |
| 4 | 1961-05-01 | 505.646484 | 465.736694 | 471.976787 | 478.409872 | 532.883096 | 539.316182 | 545.556275 |
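One quick way to read this output is to compare interval widths across levels. The sketch below copies the first forecast row (1961-01-01) from the table above into a plain dictionary; `interval_widths` is a hypothetical helper written for this illustration, and the same arithmetic applies column-wise to the full DataFrame.

```python
# First forecast row (1961-01-01), copied from the output table above.
row = {
    "TimeGPT": 437.837921,
    "TimeGPT-lo-99.7": 415.826453,
    "TimeGPT-lo-90": 423.783707,
    "TimeGPT-lo-80": 431.987061,
    "TimeGPT-hi-80": 443.688782,
    "TimeGPT-hi-90": 451.892136,
    "TimeGPT-hi-99.7": 459.849389,
}


def interval_widths(row, levels, model="TimeGPT"):
    """Upper bound minus lower bound for each requested level."""
    return {lv: row[f"{model}-hi-{lv}"] - row[f"{model}-lo-{lv}"] for lv in levels}


widths = interval_widths(row, [80, 90, 99.7])
for lv, width in widths.items():
    print(f"{lv}% interval width: {width:.2f}")
```

For this row the width grows from about 11.70 at the 80% level to about 44.02 at the 99.7% level, so the intervals are nested as expected.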

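python

# Forecast 6 steps ahead with 80%, 90%, and 99.7% prediction intervals.
# The time column is 'timestamp' and the target column is 'value'.
level_short_horizon_df = timegpt.forecast(
    df=df, h=6, level=[80, 90, 99.7],
    time_col='timestamp', target_col='value',
)

# Check that the result has shape (6, 8): 6 forecast steps, and
# 8 columns (timestamp, point forecast, and lo/hi bounds for 3 levels)
test_eq(
    level_short_horizon_df.shape,
    (6, 8)
)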

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...

python

# Levels to test, including a non-integer level
test_level = [80, 90.5]

# Forecast 12 steps ahead with the test levels and keep only the column names.
# The time column is 'timestamp' and the target column is 'value'.
cols_fcst_df = timegpt.forecast(
    df=df, h=12, level=test_level,
    time_col='timestamp', target_col='value',
).columns

# Check that every expected column 'TimeGPT-{pos}-{lv}' is present,
# where pos is 'lo' or 'hi' and lv is one of the levels in test_level
assert all(f'TimeGPT-{pos}-{lv}' in cols_fcst_df for pos, lv in product(['lo', 'hi'], test_level))

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Restricting input...
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
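The column naming pattern seen in the outputs so far, `TimeGPT-lo-{level}` and `TimeGPT-hi-{level}`, can be generated programmatically. This is a minimal sketch: `expected_interval_columns` is a hypothetical helper, and the pattern is inferred from the tables shown in this article rather than from a documented guarantee.

```python
from itertools import product


def expected_interval_columns(levels, model="TimeGPT"):
    """Names of the lower and upper bound columns for the given levels."""
    return [f"{model}-{side}-{lv}" for side, lv in product(["lo", "hi"], levels)]


cols = expected_interval_columns([80, 90, 99.7])
# One time column + one point-forecast column + two bounds per level.
n_columns = 2 + len(cols)
print(cols)
print(n_columns)
```

With three levels this gives eight columns in total, which matches the (6, 8) shape checked earlier.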

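python

# Plot the series together with the forecast and its prediction intervals.
# Arguments:
# - df: DataFrame with the timestamp and value columns
# - timegpt_fcst_pred_int_df: DataFrame with timestamps, forecasts, and interval bounds
# - time_col: name of the timestamp column
# - target_col: name of the value column
# - level: interval levels to draw, as a list; [80, 90] draws the 80% and 90% intervals
timegpt.plot(
    df, timegpt_fcst_pred_int_df,
    time_col='timestamp', target_col='value',
    level=[80, 90],
)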

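Note that the choice of prediction interval level depends on your use case. For high-stakes forecasts you may want a higher level, which yields wider intervals that account for more uncertainty. For less critical forecasts, a lower level with narrower intervals may be acceptable.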

Historical forecast

You can also compute prediction intervals for historical forecasts by adding the add_history=True parameter.

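python

# Forecast with TimeGPT, including the historical period.
# - df: input DataFrame containing the timestamps and target values
# - h: forecast horizon (number of steps ahead)
# - level: levels, in percent, used to compute the prediction intervals
# - time_col: name of the timestamp column
# - target_col: name of the target column
# - add_history: if True, also return forecasts over the historical data
timegpt_fcst_pred_int_historical_df = timegpt.forecast(
    df=df, h=12, level=[80, 90],
    time_col='timestamp', target_col='value',
    add_history=True,
)

# Show the first rows of the result
timegpt_fcst_pred_int_historical_df.head()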

INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: MS
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
INFO:nixtlats.timegpt:Calling Historical Forecast Endpoint...

| | timestamp | TimeGPT | TimeGPT-lo-80 | TimeGPT-lo-90 | TimeGPT-hi-80 | TimeGPT-hi-90 |
| 0 | 1951-01-01 | 135.483673 | 111.937767 | 105.262830 | 159.029579 | 165.704516 |
| 1 | 1951-02-01 | 144.442413 | 120.896508 | 114.221571 | 167.988319 | 174.663256 |
| 2 | 1951-03-01 | 157.191910 | 133.646004 | 126.971067 | 180.737815 | 187.412752 |
| 3 | 1951-04-01 | 148.769379 | 125.223473 | 118.548536 | 172.315284 | 178.990221 |
| 4 | 1951-05-01 | 140.472946 | 116.927041 | 110.252104 | 164.018852 | 170.693789 |

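python

# Plot the series with the historical and future forecasts.
# Arguments:
# - df: the original dataset
# - timegpt_fcst_pred_int_historical_df: forecasts with prediction interval bounds
# - time_col: name of the time column
# - target_col: name of the target column
# - level: interval levels to draw, in percent
timegpt.plot(
    df, timegpt_fcst_pred_int_historical_df,
    time_col='timestamp', target_col='value',
    level=[80, 90],
)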