
Conformal bootstrap prediction intervals through time series cross-validation forecasting
Source: R/prediction_intervals.R, cb_cvforecast.Rd

Compute prediction intervals by applying the conformal bootstrap method to subsets of the time series data, using a rolling forecast origin.
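A rolling forecast origin can be pictured as a sequence of train/target splits. The sketch below is a minimal base-R illustration under stated assumptions: the helper rolling_origins() is hypothetical (not part of the package), its arguments only mirror h, initial, window, roll.length and forward, and the exact indexing rules used by cb_cvforecast() may differ.

```r
# Hypothetical helper (not the package's code): enumerate forecast origins and
# the training/target indices for each one. The indexing rules are assumptions.
rolling_origins <- function(n, h = 1, initial = 1, window = NULL,
                            roll.length = 1, forward = TRUE) {
  # With a rolling window, the first origin needs at least `window` observations
  first <- if (is.null(window)) initial else max(initial, window)
  # forward = TRUE lets the last origin be the final observation y_T
  last <- if (forward) n else n - 1
  origins <- seq(from = first, to = last, by = roll.length)
  lapply(origins, function(origin) {
    # window = NULL: training always starts at 1 (expanding window)
    start <- if (is.null(window)) 1 else origin - window + 1
    list(origin = origin,
         train = start:origin,               # observations used for fitting
         target = (origin + 1):(origin + h)) # periods forecast from this origin
  })
}

splits <- rolling_origins(n = 10, h = 2, window = 6, roll.length = 2)
str(splits[[1]])  # origin 6, train 1:6, target 7:8
```

Here the origins are 6, 8 and 10. The last split's targets (11 and 12) lie beyond the observed data, which is what a final origin at \(y_T\) implies.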
Usage
cb_cvforecast(
  object,
  data,
  yvar,
  neighbour = 0,
  predictor.vars,
  h = 1,
  ncal = 100,
  num.futures = 1000,
  level = c(80, 95),
  forward = TRUE,
  initial = 1,
  window = NULL,
  roll.length = 1,
  exclude.trunc = NULL,
  recursive = FALSE,
  recursive_colNames = NULL,
  na.rm = TRUE,
  nacheck_frac_numerator = 2,
  nacheck_frac_denominator = 3,
  verbose = list(solver = FALSE, progress = FALSE),
  ...
)
Arguments
- object
Fitted model object of class smimodel, backward, gaimFit or pprFit.
- data
Data set. Must be a data set of class tsibble. (Make sure there are no additional date- or time-related variables other than the index of the tsibble.) If multiple models are fitted, the grouping variable should be the key of the tsibble. If a key is not specified, a dummy key with only one level will be created.
- yvar
Name of the response variable as a character string.
- neighbour
If multiple models are fitted: the number of neighbours of each key (i.e. grouping variable) to be considered in model fitting to handle smoothing over the key. Should be an integer. If neighbour = x, the x keys before and the x keys after the key of interest are grouped together with it for model fitting. The default is neighbour = 0 (i.e. no neighbours are considered for model fitting).
- predictor.vars
A character vector of names of the predictor variables.
- h
Forecast horizon.
- ncal
Length of a calibration window.
- num.futures
Number of possible future sample paths to be generated in the bootstrap.
- level
Confidence levels (in percent) for the prediction intervals.
- forward
If TRUE, the final forecast origin for forecasting is \(y_T\). Otherwise, the final forecast origin is \(y_{T-1}\).
- initial
Initial period of the time series where no cross-validation forecasting is performed.
- window
Length of the rolling window. If NULL, a rolling window will not be used (an expanding window is used instead).
- roll.length
Number of observations by which each rolling/expanding window should be rolled forward.
- exclude.trunc
The names of the predictor variables that should not be truncated for stable predictions, as a character vector. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, any predictor variable that is treated nonlinearly in the estimated model will be truncated to the in-sample range before predictions are obtained. Variables listed here are excluded from such truncation.)
- recursive
Whether to obtain recursive forecasts or not (default: FALSE).
- recursive_colNames
If recursive = TRUE, a character vector giving the names of the columns in the test data to be filled with forecasts. Recursive/autoregressive forecasting is required when lags of the response variable itself are used as predictor variables in the model. Make sure such lagged variables are positioned together in increasing lag order (i.e. lag_1, lag_2, ..., lag_m, where lag_m is the maximum lag used) in data, with no break in the lagged-variable sequence, even if some of the intermediate lags are not used as predictors.
- na.rm
Logical; if TRUE (default), any NA and NaN values are removed from the sample before the quantiles are computed.
- nacheck_frac_numerator
Numerator of the fraction of non-missing values that is required in a test set.
- nacheck_frac_denominator
Denominator of the fraction of non-missing values that is required in a test set.
- verbose
A named list controlling verbosity options. Defaults to list(solver = FALSE, progress = FALSE).
  - solver
  Logical. If TRUE, prints detailed solver output when the SMI model is used.
  - progress
  Logical. If TRUE, prints cross-validation progress messages (all models) and optimisation algorithm progress messages (SMI model only).
- ...
Other arguments not currently used.
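The neighbour argument can be read as a symmetric window over ordered keys. In this sketch, neighbour_group() is a hypothetical helper; it assumes keys are indexed 1 to n_keys in their natural order and that the window is clipped at the first and last key, which may not match how the package handles edge keys.

```r
# Hypothetical helper: positions of the keys grouped with key k for model
# fitting when neighbour = x (x keys before and x keys after key k).
# Assumption: the group is clipped at the first and last key.
neighbour_group <- function(k, n_keys, neighbour = 0) {
  seq(from = max(1, k - neighbour), to = min(n_keys, k + neighbour))
}

neighbour_group(k = 3, n_keys = 5, neighbour = 1)  # 2 3 4
neighbour_group(k = 1, n_keys = 5, neighbour = 1)  # 1 2 (clipped at the first key)
neighbour_group(k = 3, n_keys = 5, neighbour = 0)  # 3 (the default: no neighbours)
```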
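The truncation described under exclude.trunc amounts to clamping predictors to their in-sample range before predicting. truncate_to_range() below is a hypothetical helper; for simplicity it clamps every predictor not listed in exclude.trunc, whereas the package only truncates predictors that enter the model nonlinearly.

```r
# Hypothetical helper: clamp test-set predictors to the range observed in the
# training data, skipping any variables named in `exclude.trunc`.
truncate_to_range <- function(train, test, predictor.vars, exclude.trunc = NULL) {
  for (v in setdiff(predictor.vars, exclude.trunc)) {
    rng <- range(train[[v]], na.rm = TRUE)
    # pmax/pmin pull values below/above the in-sample range back to its edges
    test[[v]] <- pmin(pmax(test[[v]], rng[1]), rng[2])
  }
  test
}

train <- data.frame(x1 = c(0, 0.5, 1), x2 = c(10, 20, 30))
test <- data.frame(x1 = c(-0.2, 1.4), x2 = c(5, 40))
truncate_to_range(train, test, c("x1", "x2"), exclude.trunc = "x2")
# x1 is clamped to c(0, 1); x2 is left as c(5, 40)
```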
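The lag-ordering requirement for recursive_colNames follows from how recursive forecasting fills the test set: the forecast for step j becomes lag_i for step j + i, so a column's position must match its lag order. The sketch below is the generic technique, not the package's internal code; recursive_forecast() is a hypothetical helper and predict_fun() is a hypothetical stand-in for a fitted model.

```r
# Hypothetical sketch of recursive forecasting with lagged-response predictors.
# `lag_cols` plays the role of recursive_colNames and must be ordered
# lag_1, lag_2, ..., lag_m, because position i is treated as lag i.
recursive_forecast <- function(newdata, lag_cols, predict_fun) {
  h <- nrow(newdata)
  yhat <- numeric(h)
  for (j in seq_len(h)) {
    yhat[j] <- predict_fun(newdata[j, , drop = FALSE])
    # The forecast for step j is the i-th lag of the response at step j + i
    for (i in seq_along(lag_cols)) {
      if (j + i <= h) newdata[j + i, lag_cols[i]] <- yhat[j]
    }
  }
  yhat
}

# Toy model (made up for illustration): y_t = 0.5 * y_{t-1} + 0.2 * y_{t-2}
toy_model <- function(row) 0.5 * row$lag_1 + 0.2 * row$lag_2
# Observed y_{T-1} = 1 and y_T = 2; NAs are future values still to be filled
newdata <- data.frame(lag_1 = c(2, NA, NA), lag_2 = c(1, 2, NA))
recursive_forecast(newdata, c("lag_1", "lag_2"), toy_model)
```

With these toy coefficients the three forecasts are 1.2, 1.0 and 0.74: the second step reuses the first forecast as lag_1, and the third uses both earlier forecasts.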
Value
An object of class cb_cvforecast, which is a list that contains the following elements:
- x
The original time series.
- method
A character string "cb_cvforecast".
- fit_times
The number of times the model is fitted in cross-validation.
- mean
Point forecasts as a multivariate time series, where the \(h^{th}\) column holds the point forecasts for forecast horizon \(h\). The time index corresponds to the period for which the forecast is produced.
- error
Forecast errors given by \(e_{t+h|t} = y_{t+h} - \hat{y}_{t+h|t}\).
- res
The matrix of in-sample residuals produced in cross-validation.
- level
The confidence levels associated with the prediction intervals.
- cal_times
The number of calibration windows considered in cross-validation.
- num_cal
The number of non-missing multi-step forecast errors in each calibration window.
- skip_cal
An indicator vector showing whether each calibration window was skipped, without constructing prediction intervals, because of a missing model or missing data in the test set.
- lower
A list containing the lower bounds of the prediction intervals for each level. Each element of the list is a multivariate time series with the same dimensions as mean.
- upper
A list containing the upper bounds of the prediction intervals for each level. Each element of the list is a multivariate time series with the same dimensions as mean.
- possible_futures
A list of matrices containing future sample paths generated at each calibration step.
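Because mean is indexed by the period being forecast, the error matrix can be formed by subtracting each horizon's column from the observed series, following \(e_{t+h|t} = y_{t+h} - \hat{y}_{t+h|t}\). The toy values below are made up for illustration, and storing the forecasts as a plain matrix rather than a multivariate time series is a simplification.

```r
# Sketch: forecasts stored by target period (row = period forecast,
# column = horizon), as described for `mean`. Values are illustrative only.
y <- c(1.0, 1.5, 2.0, 2.5)
fc <- cbind(h1 = c(NA, 1.4, 2.1, 2.4),  # one-step-ahead forecasts of periods 2..4
            h2 = c(NA, NA, 1.8, 2.6))   # two-step-ahead forecasts of periods 3..4
# y is recycled down each column, so row t compares y[t] with its forecasts
err <- y - fc
err
```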
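Interval bounds can be read off a set of simulated future paths by taking per-horizon quantiles, with na.rm passed through to quantile(). This sketch assumes a matrix with one simulated path per row and one column per horizon; that orientation, the list names such as "95%", the equal-tailed quantile rule and the helper paths_to_intervals() are all hypothetical, and the conformal calibration the package applies is not reproduced.

```r
# Hypothetical helper: equal-tailed prediction intervals from simulated paths.
# `paths` has num.futures rows (simulated paths) and h columns (horizons).
paths_to_intervals <- function(paths, level = c(80, 95), na.rm = TRUE) {
  lower <- upper <- list()
  for (l in level) {
    alpha <- (100 - l) / 100
    nm <- paste0(l, "%")
    # Per-horizon quantiles at alpha/2 and 1 - alpha/2
    lower[[nm]] <- apply(paths, 2, quantile, probs = alpha / 2,
                         na.rm = na.rm, names = FALSE)
    upper[[nm]] <- apply(paths, 2, quantile, probs = 1 - alpha / 2,
                         na.rm = na.rm, names = FALSE)
  }
  list(lower = lower, upper = upper)
}

# Toy paths whose spread grows with the horizon (sd = 1, 2, 3)
set.seed(1)
paths <- matrix(rnorm(1000 * 3, sd = rep(1:3, each = 1000)), ncol = 3)
ints <- paths_to_intervals(paths)
ints$upper[["95%"]] - ints$lower[["95%"]]  # interval widths for h = 1, 2, 3
```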
Examples
# \donttest{
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel)
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)
  # Simulate data
  n <- 1105
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 +
        (0.35*x_lag_002 + 0.7*x_lag_005)^2 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    # Remove the first 5 rows, whose lagged values are missing
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)
  # Index variables (x_lag_000, ..., x_lag_005)
  index.vars <- colnames(sim_data)[3:8]
  # Training set
  sim_train <- sim_data[1:1000, ]
  # Test set
  sim_test <- sim_data[1001:1100, ]
  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_train,
                                 yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")
  # Conformal bootstrap prediction intervals (3-steps-ahead interval forecasts)
  set.seed(12345)
  smimodel_ppr_cb <- cb_cvforecast(object = smimodel_ppr,
                                   data = sim_data,
                                   yvar = "y",
                                   predictor.vars = index.vars,
                                   h = 3,
                                   ncal = 30,
                                   num.futures = 100,
                                   window = 1000)
}
# }