Conformal bootstrap prediction intervals through time series cross-validation forecasting

Compute prediction intervals by applying the conformal bootstrap method to subsets of time series data using a rolling forecast origin.

Usage

cb_cvforecast(
  object,
  data,
  yvar,
  neighbour = 0,
  predictor.vars,
  h = 1,
  ncal = 100,
  num.futures = 1000,
  level = c(80, 95),
  forward = TRUE,
  initial = 1,
  window = NULL,
  roll.length = 1,
  exclude.trunc = NULL,
  recursive = FALSE,
  recursive_colNames = NULL,
  na.rm = TRUE,
  nacheck_frac_numerator = 2,
  nacheck_frac_denominator = 3,
  verbose = list(solver = FALSE, progress = FALSE),
  ...
)

Arguments

object

Fitted model object of class smimodel, backward, gaimFit or pprFit.

data

Data set. Must be of class tsibble. (Make sure there are no date- or time-related variables other than the index of the tsibble.) If multiple models are fitted, the grouping variable should be the key of the tsibble. If a key is not specified, a dummy key with only one level will be created.

yvar

Name of the response variable as a character string.

neighbour

If multiple models are fitted: the number of neighbours of each key (i.e. grouping variable) to be considered in model fitting, to handle smoothing over the key. Should be an integer. If neighbour = x, the x keys before and the x keys after the key of interest are grouped together with it for model fitting. The default is neighbour = 0 (i.e. no neighbours are considered for model fitting).
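To illustrate the grouping described above, the sketch below lists which keys would be pooled with a given key. It assumes the keys are ordered and that the group is simply truncated at the first and last keys (no wrap-around); `neighbour_keys()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the package): keys pooled with key i
# when `neighbour` keys on each side are included.
# Assumption: keys are ordered and the group is truncated at the ends.
neighbour_keys <- function(keys, i, neighbour = 0) {
  lo <- max(1, i - neighbour)
  hi <- min(length(keys), i + neighbour)
  keys[lo:hi]
}

keys <- c("k1", "k2", "k3", "k4", "k5")
neighbour_keys(keys, 3, neighbour = 1)  # "k2" "k3" "k4"
neighbour_keys(keys, 1, neighbour = 2)  # "k1" "k2" "k3" (truncated at start)
neighbour_keys(keys, 4, neighbour = 0)  # "k4" only (the default)
```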

predictor.vars

A character vector of names of the predictor variables.

h

Forecast horizon.

ncal

Length of a calibration window.

num.futures

Number of possible future sample paths to be generated in the bootstrap.

level

Confidence levels (in percent) for the prediction intervals.
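How `ncal`, `num.futures`, and `level` work together can be pictured with a simplified sketch for a single horizon at a single forecast origin: the most recent `ncal` forecast errors are resampled to create `num.futures` possible futures around the point forecast, and empirical quantiles give central interval bounds. This only illustrates the resampling idea under these assumptions; the package itself generates multi-step sample paths, and `cb_interval_sketch()` is hypothetical.

```r
# Hypothetical sketch (not the package's code): bootstrap interval for one
# horizon at one forecast origin.
#   point:      point forecast yhat_{t+h|t}
#   cal_errors: the last `ncal` forecast errors for this horizon
cb_interval_sketch <- function(point, cal_errors, num.futures = 1000,
                               level = c(80, 95), na.rm = TRUE) {
  # Drop NA/NaN errors first, mirroring the na.rm argument
  # (with na.rm = FALSE, any NA makes quantile() stop with an error)
  if (na.rm) cal_errors <- cal_errors[!is.na(cal_errors)]
  # Resample calibration errors with replacement to create possible futures
  idx <- sample.int(length(cal_errors), num.futures, replace = TRUE)
  futures <- point + cal_errors[idx]
  # Central interval for each level: alpha/2 and 1 - alpha/2 quantiles
  alpha <- 1 - level / 100
  data.frame(
    level = level,
    lower = quantile(futures, alpha / 2, names = FALSE),
    upper = quantile(futures, 1 - alpha / 2, names = FALSE)
  )
}

set.seed(1)
cal_errors <- rnorm(100, sd = 0.5)  # stand-in for ncal = 100 calibration errors
cb_interval_sketch(point = 2, cal_errors = cal_errors, num.futures = 1000)
```

The 95% interval is always at least as wide as the 80% interval, because it uses more extreme quantiles of the same set of simulated futures.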

forward

If TRUE, the final forecast origin for forecasting is \(y_T\). Otherwise, the final forecast origin is \(y_{T-1}\).

initial

Initial period of the time series where no cross-validation forecasting is performed.

window

Length of the rolling window. If NULL, a rolling window will not be used, and an expanding window is used instead.

roll.length

Number of observations by which each rolling/expanding window should be rolled forward.
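To see how `initial`, `window`, and `roll.length` interact, the sketch below enumerates the training-index range used at successive forecast origins. The exact conventions are assumptions: origins are taken to start at `max(initial, window)` and to advance by `roll.length`, and `window = NULL` gives an expanding window that always starts at observation 1. `cv_train_windows()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the package): start and end of the
# training data at each forecast origin of a series of length n.
# Assumptions: origins start at max(initial, window) and advance by
# roll.length; window = NULL means an expanding window starting at 1.
cv_train_windows <- function(n, initial = 1, window = NULL, roll.length = 1) {
  first <- if (is.null(window)) initial else max(initial, window)
  origins <- seq(first, n, by = roll.length)
  lapply(origins, function(t) {
    start <- if (is.null(window)) 1 else t - window + 1
    c(start = start, end = t)
  })
}

# Expanding window: the training set always starts at observation 1
do.call(rbind, cv_train_windows(n = 10, initial = 6, roll.length = 2))
# Rolling window of length 5: the training set keeps a fixed length
do.call(rbind, cv_train_windows(n = 10, window = 5, roll.length = 2))
```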

exclude.trunc

A character vector of names of predictor variables that should not be truncated for stable predictions. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, any predictor variable that is treated nonlinearly in the estimated model will be truncated to the in-sample range before predictions are obtained. Any variables listed here will be excluded from such truncation.)
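A minimal sketch of the truncation described above, assuming that "truncated to the in-sample range" means clamping new values to the minimum and maximum observed in the training data. `truncate_to_range()` and `truncate_predictors()` are hypothetical helpers; the `exclude` argument plays the role of `exclude.trunc`.

```r
# Hypothetical helpers (not the package's code).
# Clamp new predictor values to the range seen in the training data.
truncate_to_range <- function(new_x, train_x) {
  rng <- range(train_x, na.rm = TRUE)
  pmin(pmax(new_x, rng[1]), rng[2])
}

# Truncate every variable in `vars` except those listed in `exclude`
# (analogous to exclude.trunc).
truncate_predictors <- function(newdata, traindata, vars, exclude = NULL) {
  for (v in setdiff(vars, exclude)) {
    newdata[[v]] <- truncate_to_range(newdata[[v]], traindata[[v]])
  }
  newdata
}

train <- data.frame(a = c(0.2, 0.5, 0.9), b = c(0.2, 0.5, 0.9))
new   <- data.frame(a = c(0.1, 0.6, 1.3), b = c(0.1, 0.6, 1.3))
truncate_predictors(new, train, vars = c("a", "b"), exclude = "b")
# column a becomes 0.2, 0.6, 0.9; column b is left unchanged
```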

recursive

Logical; whether to obtain recursive forecasts (default FALSE).

recursive_colNames

If recursive = TRUE, a character vector giving the names of the columns in the test data to be filled with forecasts. Recursive (autoregressive) forecasting is required when lags of the response variable itself are used as predictor variables in the model. Make sure such lagged variables are positioned together in increasing lag order (i.e. lag_1, lag_2, ..., lag_m, where lag_m is the maximum lag used) in data, with no break in the lagged variable sequence, even if some of the intermediate lags are not used as predictors.
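The reason for the ordering requirement can be seen in a minimal sketch of recursive forecasting: the forecast made for row i of the test data is written into `lag_1` of row i + 1, `lag_2` of row i + 2, and so on, so the k-th listed column must be lag k. `recursive_fill()` and `predict_fn` are hypothetical; the package's actual procedure may differ in detail.

```r
# Hypothetical sketch (not the package's code): fill lag columns of the
# test data with forecasts, one row at a time.
#   lag_cols:   c("lag_1", ..., "lag_m") in increasing lag order
#   predict_fn: function taking a one-row data frame, returning a forecast
recursive_fill <- function(test, lag_cols, predict_fn) {
  m <- length(lag_cols)
  preds <- numeric(nrow(test))
  for (i in seq_len(nrow(test))) {
    preds[i] <- predict_fn(test[i, , drop = FALSE])
    # The forecast for row i becomes lag k of row i + k
    for (k in seq_len(m)) {
      if (i + k <= nrow(test)) test[i + k, lag_cols[k]] <- preds[i]
    }
  }
  list(test = test, preds = preds)
}

# Toy model: next value = previous value + 1
test_df <- data.frame(lag_1 = c(0, NA, NA), lag_2 = c(0, 0, NA))
out <- recursive_fill(test_df, c("lag_1", "lag_2"), function(row) row$lag_1 + 1)
out$preds  # 1 2 3
```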

na.rm

Logical; if TRUE (default), any NA and NaN values are removed from the sample before the quantiles are computed.

nacheck_frac_numerator

Numerator of the fraction of non-missing values that is required in a test set.

nacheck_frac_denominator

Denominator of the fraction of non-missing values that is required in a test set.
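The two arguments together define a required fraction, 2/3 by default. The sketch below shows one way such a check could be expressed; the use of `>=` (rather than `>`) and the helper name `passes_na_check()` are assumptions, not taken from the package.

```r
# Hypothetical check (an assumption about how the two arguments combine):
# a test set passes if its share of non-missing values is at least
# nacheck_frac_numerator / nacheck_frac_denominator.
passes_na_check <- function(x, nacheck_frac_numerator = 2,
                            nacheck_frac_denominator = 3) {
  # Compare counts rather than fractions to avoid floating-point rounding:
  # n_non_missing / n >= num / den  <=>  n_non_missing * den >= num * n
  sum(!is.na(x)) * nacheck_frac_denominator >=
    nacheck_frac_numerator * length(x)
}

passes_na_check(c(1, 2, NA))   # 2 of 3 non-missing: TRUE
passes_na_check(c(1, NA, NA))  # 1 of 3 non-missing: FALSE
```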

verbose

A named list controlling verbosity options. Defaults to list(solver = FALSE, progress = FALSE).

solver: Logical. If TRUE, prints detailed solver output when the SMI model is used.
progress: Logical. If TRUE, prints cross-validation progress messages (all models) and optimisation algorithm progress messages (SMI model only).

...

Other arguments not currently used.

Value

An object of class cb_cvforecast, which is a list that contains the following elements:

x: The original time series.
method: A character string "cb_cvforecast".
fit_times: The number of times the model is fitted in cross-validation.
mean: Point forecasts as a multivariate time series, where the \(h^{th}\) column holds the point forecasts for forecast horizon \(h\). The time index corresponds to the period for which the forecast is produced.
error: Forecast errors given by \(e_{t+h|t} = y_{t+h} - \hat{y}_{t+h|t}\).
res: The matrix of in-sample residuals produced in cross-validation.
level: The confidence levels associated with the prediction intervals.
cal_times: The number of calibration windows considered in cross-validation.
num_cal: The number of non-missing multi-step forecast errors in each calibration window.
skip_cal: An indicator vector indicating whether a calibration window is skipped, without prediction intervals being constructed, because of a missing model or missing data in the test set.
lower: A list containing lower bounds for prediction intervals for each level. Each element within the list will be a multivariate time series with the same dimensional characteristics as mean.
upper: A list containing upper bounds for prediction intervals for each level. Each element within the list will be a multivariate time series with the same dimensional characteristics as mean.
possible_futures: A list of matrices containing future sample paths generated at each calibration step.
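Because each row of `mean`, `lower`, and `upper` refers to the period being forecast, forecast errors and empirical coverage can be computed column by column (one column per horizon). The sketch below uses small made-up matrices in place of the returned elements; the helper names are hypothetical, and aligning the observed series with the rows of `mean` is an assumption the reader should check against the actual time indices.

```r
# Hypothetical helpers (not part of the package).
# y: observed values aligned with the rows of the forecast matrices.
# Forecast errors e_{t+h|t} = y_{t+h} - yhat_{t+h|t}; y is recycled
# down each column (one column per horizon).
forecast_errors <- function(y, mean_mat) {
  y - mean_mat
}

# Share of observations inside [lower, upper] for each horizon,
# ignoring periods with no forecast (NA).
empirical_coverage <- function(y, lower_mat, upper_mat) {
  covered <- (y >= lower_mat) & (y <= upper_mat)
  colMeans(covered, na.rm = TRUE)
}

# Made-up inputs for illustration only
y <- c(1, 2, 3, 4)
mean_mat  <- cbind(h1 = c(1.1, 1.9, 3.2, NA), h2 = c(NA, 2.2, 2.7, 4.1))
lower_mat <- mean_mat - 0.25
upper_mat <- mean_mat + 0.25
forecast_errors(y, mean_mat)
empirical_coverage(y, lower_mat, upper_mat)  # h1: 1, h2: 2/3
```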

Examples

# \donttest{
if(requireNamespace("gurobi", quietly = TRUE)){
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)

  # Simulate data
  n <- 1105
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 +
        (0.35*x_lag_002 + 0.7*x_lag_005)^2 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)

  # Index variables
  index.vars <- colnames(sim_data)[3:8]

  # Training set
  sim_train <- sim_data[1:1000, ]
  # Test set
  sim_test <- sim_data[1001:1100, ]

  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_train,
                                yvar = "y",
                                index.vars = index.vars,
                                initialise = "ppr")

  # Conformal bootstrap prediction intervals (3-steps-ahead interval forecasts)
  set.seed(12345)
  smimodel_ppr_cb <- cb_cvforecast(object = smimodel_ppr,
                                  data = sim_data,
                                  yvar = "y",
                                  predictor.vars = index.vars,
                                  h = 3,
                                  ncal = 30,
                                  num.futures = 100,
                                  window = 1000)
}
# }
