Timeseries configuration
time_series_recipe
Time-series lag-based recipe (Boolean)
Default value True
Enable the time-series lag-based recipe with lag transformers. If disabled, the same train-test gap and periods are used, but no lag transformers are enabled. Because the set of feature transformations is quite limited without lag transformers, consider also setting enable_time_unaware_transformers to true so that the problem is treated more like an IID problem.
time_series_leaderboard_mode
Control the automatic time-series leaderboard mode (String)
Default value 'diverse'
‘diverse’: explore a diverse set of models built using various expert settings. Note that it’s possible to rerun another such diverse leaderboard on top of the best-performing model(s), which effectively helps you compose these expert settings.
‘sliding_window’: if the forecast horizon is N periods, create a separate model for each of the (gap, horizon) pairs (0, n), (n, n), (2*n, n), …, (N-n, n), in units of time periods. The number of periods n to predict per model is controlled by the expert setting ‘time_series_leaderboard_periods_per_model’, which defaults to 1.
time_series_leaderboard_periods_per_model
Number of periods per model if time_series_leaderboard_mode is ‘sliding_window’. (Number)
Default value 1
Fine-control to limit the number of models built in the ‘sliding_window’ mode. Larger values lead to fewer models.
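The sliding-window pairs can be enumerated with a short helper. This is a minimal sketch, not Driverless AI's internal code: it assumes each model covers n consecutive periods, that gaps advance in steps of n until the horizon N is covered, and that the last window is clipped when N is not a multiple of n. The function name `sliding_window_pairs` is hypothetical.

```python
from typing import List, Tuple


def sliding_window_pairs(horizon: int, periods_per_model: int = 1) -> List[Tuple[int, int]]:
    """Return (gap, periods) pairs, one per model, that tile a forecast horizon.

    Hypothetical helper: gaps advance in steps of `periods_per_model`, and the
    last window is clipped (an assumption) so it never runs past `horizon`.
    """
    if horizon < 1 or periods_per_model < 1:
        raise ValueError("horizon and periods_per_model must be positive integers")
    pairs = []
    for gap in range(0, horizon, periods_per_model):
        pairs.append((gap, min(periods_per_model, horizon - gap)))
    return pairs


# Larger periods_per_model values lead to fewer models.
print(len(sliding_window_pairs(7)))  # 7
print(sliding_window_pairs(7, 3))    # [(0, 3), (3, 3), (6, 1)]
```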
time_series_merge_splits
Larger validation splits for lag-based recipe (Boolean)
Default value True
Whether to create larger validation splits that are not bound to the length of the forecast horizon.
merge_splits_max_valid_ratio
Maximum ratio of training data samples used for validation (-1 = auto) (Float)
Default value -1.0
Maximum ratio of training data samples used for validation across splits when larger validation splits are created.
fixed_size_train_timespan
Fixed-size train timespan across splits (Boolean)
Default value False
Whether to keep a fixed-size train timespan across time-based splits. This leads to roughly the same number of training samples in every split.
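To see the difference, the sketch below builds backtesting splits over integer period indices, once with an expanding train window and once with a fixed-size one. It is a hypothetical illustration under simplifying assumptions (no gap between train and validation, equal-length and non-overlapping validation windows at the end of the data), not the splitting logic Driverless AI uses; `time_splits` is an invented name.

```python
from typing import List, Tuple


def time_splits(n_periods: int, n_splits: int, valid_len: int,
                fixed_size_train: bool) -> List[Tuple[range, range]]:
    """Return (train, valid) index ranges, oldest split first (hypothetical helper)."""
    first_valid_start = n_periods - n_splits * valid_len
    if first_valid_start <= 0:
        raise ValueError("not enough periods for the requested splits")
    splits = []
    for k in range(n_splits):
        valid_start = first_valid_start + k * valid_len
        # Fixed-size: slide the train window forward so its length stays constant.
        # Expanding: always start training at the first period.
        train_start = valid_start - first_valid_start if fixed_size_train else 0
        splits.append((range(train_start, valid_start),
                       range(valid_start, valid_start + valid_len)))
    return splits


for fixed in (False, True):
    print(fixed, [len(train) for train, _ in time_splits(10, 3, 2, fixed)])
# False [4, 6, 8]
# True [4, 4, 4]
```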
time_series_validation_fold_split_datetime_boundaries
Custom validation splits for time-series experiments (String)
Default value ''
Provide date or datetime timestamps (in the same format as the time column) for custom training and validation splits, like this: “tr_start1, tr_end1, va_start1, va_end1, …, tr_startN, tr_endN, va_startN, va_endN”
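A minimal parser for this string format is sketched below. It assumes ISO-8601 dates (the setting itself expects the time column's format, which may differ) and adds a sanity check that each validation window starts after its training window ends. Both the check and the function name `parse_split_boundaries` are assumptions for illustration.

```python
from datetime import date
from typing import List, Tuple

Split = Tuple[date, date, date, date]


def parse_split_boundaries(spec: str) -> List[Split]:
    """Parse "tr_start1, tr_end1, va_start1, va_end1, ..." into 4-tuples of dates."""
    parts = [p.strip() for p in spec.split(",") if p.strip()]
    if not parts or len(parts) % 4 != 0:
        raise ValueError("expected a multiple of four timestamps")
    # Assumption: ISO-8601 dates such as 2020-01-31.
    stamps = [date.fromisoformat(p) for p in parts]
    splits: List[Split] = []
    for i in range(0, len(stamps), 4):
        tr_start, tr_end, va_start, va_end = stamps[i:i + 4]
        # Assumed sanity check: ordered boundaries, validation after training.
        if not (tr_start <= tr_end < va_start <= va_end):
            raise ValueError(f"split {i // 4 + 1} has inconsistent boundaries")
        splits.append((tr_start, tr_end, va_start, va_end))
    return splits


print(parse_split_boundaries(
    "2020-01-01, 2020-06-30, 2020-07-01, 2020-07-31, "
    "2020-02-01, 2020-07-31, 2020-08-01, 2020-08-31"))
```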
time_series_validation_splits
Number of time-based splits for internal model validation (-1 = auto) (Number)
Default value -1
Set a fixed number of time-based splits for internal model validation (the actual number of splits allowed can be smaller and is determined at experiment run time).
time_series_splits_max_overlap
Maximum overlap between two time-based splits. (Float)
Default value 0.5
Maximum overlap between two time-based splits. Higher values increase the number of possible splits.
holiday_features
Generate holiday features (Boolean)
Default value True
Automatically generate is-holiday features from date columns.
holiday_countries
Country code(s) for holiday features (List)
Default value ['UnitedStates', 'UnitedKingdom', 'EuropeanCentralBank', 'Germany', 'Mexico', 'Japan']
List of countries for which to look up the holiday calendar and generate is-holiday features.
sample_lag_sizes
Whether to sample lag sizes (Boolean)
Default value False
If enabled, sample from a set of possible lag sizes (e.g., lags=[1, 4, 8]) for each lag-based transformer, to no more than max_sampled_lag_sizes lags. Can help reduce overall model complexity and size.
max_sampled_lag_sizes
Number of sampled lag sizes. -1 for auto. (Number)
Default value -1
If sample_lag_sizes is enabled, sample from a set of possible lag sizes (e.g., lags=[1, 4, 8]) for each lag-based transformer, to no more than max_sampled_lag_sizes lags. Can help reduce overall model complexity and size. Defaults to -1 (auto), in which case it’s the same as the feature interaction depth controlled by max_feature_interaction_depth.
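Conceptually, the sampling can be pictured as drawing a random subset of the candidate lags for each lag-based transformer. This sketch assumes uniform sampling without replacement, which the documentation does not specify; `sample_lag_sizes` is a hypothetical name, not a Driverless AI function.

```python
import random
from typing import List, Optional


def sample_lag_sizes(candidate_lags: List[int], max_sampled_lag_sizes: int,
                     max_feature_interaction_depth: int,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Pick at most N lag sizes for one lag-based transformer (hypothetical helper)."""
    rng = rng or random.Random()
    # -1 (auto) falls back to the feature interaction depth.
    limit = (max_feature_interaction_depth if max_sampled_lag_sizes == -1
             else max_sampled_lag_sizes)
    k = min(limit, len(candidate_lags))
    # Assumption: uniform sampling without replacement.
    return sorted(rng.sample(candidate_lags, k))


print(sample_lag_sizes([1, 4, 8], -1, 2, random.Random(0)))
```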
override_lag_sizes
Time-series lags override, e.g. [7, 14, 21] (List)
Default value []
Override the lags to be used. Accepted formats:
- [7, 14, 21]: use this exact list
- 21: produce lags from 1 to 21
- 21:3: produce lags from 1 to 21 in steps of 3
- 5-21: produce lags from 5 to 21
- 5-21:3: produce lags from 5 to 21 in steps of 3
override_ufapt_lag_sizes
Lags override for features that are not known ahead of time (List)
Default value []
Override the lags to be used for features that are not known ahead of time. Accepted formats:
- [7, 14, 21]: use this exact list
- 21: produce lags from 1 to 21
- 21:3: produce lags from 1 to 21 in steps of 3
- 5-21: produce lags from 5 to 21
- 5-21:3: produce lags from 5 to 21 in steps of 3
override_non_ufapt_lag_sizes
Lags override for features that are known ahead of time (List)
Default value []
Override the lags to be used for features that are known ahead of time. Accepted formats:
- [7, 14, 21]: use this exact list
- 21: produce lags from 1 to 21
- 21:3: produce lags from 1 to 21 in steps of 3
- 5-21: produce lags from 5 to 21
- 5-21:3: produce lags from 5 to 21 in steps of 3
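The override syntax shared by these three settings can be parsed as follows. This is a hypothetical sketch: it assumes range end points are inclusive (so 21 yields 1 through 21) and that a step simply skips values (so 21:3 yields 1, 4, ..., 19). Driverless AI's own parser may treat edge cases differently.

```python
import ast
from typing import List


def parse_lag_override(spec: str) -> List[int]:
    """Expand a lag-override spec such as "[7, 14, 21]", "21", "21:3", "5-21", "5-21:3"."""
    spec = spec.strip()
    if spec.startswith("["):
        # Exact list of lags.
        return [int(x) for x in ast.literal_eval(spec)]
    step = 1
    if ":" in spec:
        spec, step_text = spec.split(":", 1)
        step = int(step_text)
    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        start, end = 1, int(spec)
    if start < 1 or end < start or step < 1:
        raise ValueError(f"invalid lag override: {spec!r}")
    # Assumption: the end point is inclusive.
    return list(range(start, end + 1, step))


print(parse_lag_override("5-21:3"))  # [5, 8, 11, 14, 17, 20]
```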
min_lag_size
Smallest considered lag size (-1 = auto) (Number)
Default value -1
Smallest lag size to consider.
allow_time_column_as_feature
Enable feature engineering from time column (Boolean)
Default value True
Whether to enable feature engineering based on the selected time column, e.g. Date~weekday.
allow_time_column_as_numeric_feature
Allow integer time column as numeric feature (Boolean)
Default value False
Whether to allow an integer time column to be used as a numeric feature. With the time-series recipe, using the time column (numeric timestamps) as an input feature can lead to a model that memorizes the actual timestamps instead of learning features that generalize to the future.
datetime_funcs
Allowed date and date-time transformations (List)
Default value ['year', 'quarter', 'month', 'week', 'weekday', 'day', 'dayofyear', 'hour', 'minute', 'second']
Allowed date or date-time transformations. Date transformers include year, quarter, month, week, weekday, day, dayofyear, and num; they also include hour, minute, and second. In DAI, the resulting features show up as get_ followed by the transformation name. For example, num is a direct numeric value representing the floating-point value of time, which can lead to overfitting if used on IID problems, so it is turned off by default.
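The listed transformations map naturally onto attributes of Python's standard `datetime` type. The sketch below is illustrative only: the ISO week number, the Monday = 0 weekday convention, and the quarter formula are assumptions that may not match Driverless AI's exact definitions, and num is omitted. `datetime_features` is a hypothetical helper.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable

# Assumed definitions; Driverless AI's own conventions may differ.
EXTRACTORS: Dict[str, Callable[[datetime], int]] = {
    "year": lambda t: t.year,
    "quarter": lambda t: (t.month - 1) // 3 + 1,
    "month": lambda t: t.month,
    "week": lambda t: t.isocalendar()[1],   # ISO week number
    "weekday": lambda t: t.weekday(),       # Monday = 0
    "day": lambda t: t.day,
    "dayofyear": lambda t: t.timetuple().tm_yday,
    "hour": lambda t: t.hour,
    "minute": lambda t: t.minute,
    "second": lambda t: t.second,
}


def datetime_features(ts: datetime, funcs: Iterable[str]) -> Dict[str, int]:
    """Return one get_<name> feature per allowed transformation."""
    return {f"get_{name}": EXTRACTORS[name](ts) for name in funcs}


print(datetime_features(datetime(2021, 3, 15, 13, 45, 30),
                        ["quarter", "week", "weekday", "dayofyear"]))
# {'get_quarter': 1, 'get_week': 11, 'get_weekday': 0, 'get_dayofyear': 74}
```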
filter_datetime_funcs
Auto filtering of date and date-time transformations (Boolean)
Default value True
Whether to filter out date and date-time transformations that lead to unseen values in the future.
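One plausible reading of this filter is to keep a transformation only if every value it produces over the forecast period was already seen in training. The sketch below follows that assumption with a reduced set of transformations; it is not Driverless AI's actual rule, and `filter_datetime_funcs` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Sequence

# Reduced, assumed set of transformations for illustration.
_EXTRACT: Dict[str, Callable[[date], int]] = {
    "year": lambda d: d.year,
    "month": lambda d: d.month,
    "weekday": lambda d: d.weekday(),
}


def filter_datetime_funcs(funcs: Sequence[str], train_dates: Sequence[date],
                          future_dates: Sequence[date]) -> List[str]:
    """Keep transformations whose future values all occur in the training data."""
    kept = []
    for name in funcs:
        extract = _EXTRACT[name]
        seen = {extract(d) for d in train_dates}
        if all(extract(d) in seen for d in future_dates):
            kept.append(name)
    return kept


train = [date(2020, 1, 1) + timedelta(days=i) for i in range(366)]
future = [date(2021, 1, 1) + timedelta(days=i) for i in range(14)]
print(filter_datetime_funcs(["year", "month", "weekday"], train, future))
# ['month', 'weekday']  (2021 never occurs in training, so "year" is dropped)
```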
allow_tgc_as_features
Consider time groups columns as standalone features (Boolean)
Default value False
Whether to consider time groups columns (tgc) as standalone features. Note that the time column is treated separately via ‘Enable feature engineering from time column’. Use allowed_coltypes_for_tgc_as_features to control this per feature type.
allowed_coltypes_for_tgc_as_features
Which tgc feature types to consider as standalone features (List)
Default value ['numeric', 'categorical', 'ohe_categorical', 'datetime', 'date', 'text']
Which feature types of time groups columns (tgc) to consider as standalone features when the corresponding flag ‘Consider time groups columns as standalone features’ is set to true. For example, all column types would be ['numeric', 'categorical', 'ohe_categorical', 'datetime', 'date', 'text']. Note that the time column is treated separately via ‘Enable feature engineering from time column’. If the lag-based time-series recipe is disabled, all tgc are allowed as features.
enable_time_unaware_transformers
Enable time unaware transformers (String)
Default value 'auto'
Whether to enable various transformers (such as clustering and truncated SVD) that would otherwise be disabled for time series because they can overfit by leaking information across time within the fit of each fold.
tgc_only_use_all_groups
Always group by all time groups columns for creating lag features (Boolean)
Default value True
Whether to group by all time groups columns when creating lag features, instead of sampling from them.
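Grouping by all time groups columns means each lag is taken within the series identified by the full combination of group values. The sketch assumes an integer, regularly spaced time index and represents each group as a tuple of its column values; `lag_by_group` is a hypothetical helper, not a Driverless AI API.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def lag_by_group(groups: Sequence[Tuple[Hashable, ...]], times: Sequence[int],
                 values: Sequence[float], lag: int) -> List[Optional[float]]:
    """Return each row's value `lag` steps earlier in the same group (None if missing)."""
    history: Dict[Tuple[Hashable, ...], Dict[int, float]] = defaultdict(dict)
    for group, t, value in zip(groups, times, values):
        history[group][t] = value
    return [history[group].get(t - lag) for group, t in zip(groups, times)]


groups = [("store1", "deptA"), ("store1", "deptA"), ("store2", "deptA"),
          ("store1", "deptA"), ("store2", "deptA")]
times = [1, 2, 1, 3, 2]
sales = [10.0, 11.0, 20.0, 12.0, 21.0]
print(lag_by_group(groups, times, sales, lag=1))  # [None, 10.0, None, 11.0, 20.0]
```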
tgc_allow_target_encoding
Target encoding of time groups (Boolean)
Default value False
Whether to allow target encoding of time groups. This can be useful if there are many groups.
time_series_holdout_preds
Generate Time-Series Holdout Predictions (Boolean)
Default value True
Enable creation of holdout predictions on training data using moving windows (useful for MLI, but can be slow).
time_series_max_holdout_splits
Maximum number of splits used for creating final time-series model’s holdout predictions (Number)
Default value -1
Maximum number of splits used for creating the final time-series model’s holdout/backtesting predictions. With the default value of -1, the same number of splits as during model validation is used. Use ‘time_series_validation_splits’ to control the number of time-based splits used for model validation.
mli_ts_fast_approx
Whether to speed up calculation of Time-Series Holdout Predictions (Boolean)
Default value False
Whether to speed up time-series holdout predictions for back-testing on training data (used for MLI and metrics calculation). Can be slightly less accurate.
mli_ts_fast_approx_contribs
Whether to speed up calculation of Shapley values for Time-Series Holdout Predictions (Boolean)
Default value True
Whether to speed up Shapley values for time-series holdout predictions for back-testing on training data (used for MLI). Can be slightly less accurate.
mli_ts_holdout_contribs
Generate Shapley values for Time-Series Holdout Predictions at the time of experiment (Boolean)
Default value True
Enable creation of Shapley values for holdout predictions on training data using moving windows at the time of the experiment (useful for MLI, but can be slow). If disabled, MLI will generate Shapley values on demand.
time_series_min_interpretability
Lower limit on interpretability setting for time-series experiments, implicitly enforced. (Number)
Default value 5
Values of 5 or more can improve generalization by more aggressive dropping of least important features. Set to 1 to disable.
lags_dropout
Dropout mode for lag features (String)
Default value 'dependent'
Dropout mode for lag features, used to achieve an equal ratio of missing values (NAs) between training and validation/test data. The independent mode performs a simple feature-wise dropout, whereas the dependent mode takes lag-size dependencies per sample/row into account.
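The difference between the two modes can be sketched under an assumed model of test-time availability: for a row predicted k periods into the horizon after a gap g, lag L is missing whenever L < g + k. The dependent mode draws one k per row, so whenever a lag is missing every smaller lag is missing too; the independent mode drops each lag separately at the same marginal rate. This is an interpretation, not Driverless AI's implementation, and both function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def dependent_lag_dropout(lags: Sequence[int], values: Sequence[float], gap: int,
                          horizon: int, rng: random.Random) -> List[Optional[float]]:
    """Simulate one forecast position per row; drop every lag unavailable there."""
    k = rng.randint(1, horizon)  # periods into the horizon, 1..horizon
    return [v if lag >= gap + k else None for lag, v in zip(lags, values)]


def independent_lag_dropout(lags: Sequence[int], values: Sequence[float], gap: int,
                            horizon: int, rng: random.Random) -> List[Optional[float]]:
    """Drop each lag on its own, with the NA rate it would have at test time."""
    out: List[Optional[float]] = []
    for lag, v in zip(lags, values):
        na_ratio = sum(1 for k in range(1, horizon + 1) if lag < gap + k) / horizon
        out.append(None if rng.random() < na_ratio else v)
    return out


rng = random.Random(42)
lags, values = [1, 2, 3, 7], [5.0, 4.0, 3.0, 1.0]
print(dependent_lag_dropout(lags, values, gap=0, horizon=3, rng=rng))
print(independent_lag_dropout(lags, values, gap=0, horizon=3, rng=rng))
```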
prob_lag_non_targets
Probability to create non-target lag features (-1.0 = auto) (Float)
Default value -1.0
Normalized probability of choosing to lag non-targets relative to targets (-1.0 = auto)
rolling_test_method
Method to create rolling test set predictions (String)
Default value 'tta'
Method to create rolling test set predictions, if the forecast horizon is shorter than the time span of the test set. One can choose between test time augmentation (TTA) and a successive refitting of the final pipeline.
rolling_test_method_max_splits
Maximum number of splits for the ‘refit’ method, to avoid out-of-memory errors (OOM) and slowness, both during the genetic algorithm (GA) and the final refit. In GA, the method falls back to fast_tta; in the final refit, it fails with an error message. (Number)
Default value 1000
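The sketch below shows how a test span longer than the forecast horizon could be cut into rolling windows, and how the split limit might be enforced for the ‘refit’ method (falling back to fast_tta during GA and raising an error in the final refit, as described above). The window boundaries and the function names are assumptions for illustration only.

```python
from typing import List, Tuple


def rolling_test_windows(test_len: int, horizon: int) -> List[Tuple[int, int]]:
    """Split a test span of `test_len` periods into [start, end) windows of `horizon`."""
    if test_len < 1 or horizon < 1:
        raise ValueError("test_len and horizon must be positive")
    return [(s, min(s + horizon, test_len)) for s in range(0, test_len, horizon)]


def resolve_rolling_method(method: str, n_windows: int, max_splits: int,
                           final_refit: bool) -> str:
    """Apply the rolling_test_method_max_splits limit to the 'refit' method."""
    if method == "refit" and n_windows > max_splits:
        if final_refit:
            raise RuntimeError(f"{n_windows} refit splits exceed the limit of {max_splits}")
        return "fast_tta"  # during GA, fall back instead of failing
    return method


windows = rolling_test_windows(test_len=30, horizon=7)
print(windows)  # [(0, 7), (7, 14), (14, 21), (21, 28), (28, 30)]
print(resolve_rolling_method("refit", len(windows), max_splits=3, final_refit=False))  # fast_tta
```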
fast_tta_internal
Fast TTA for internal validation (feature evolution and holdout predictions) (Boolean)
Default value True
Apply TTA in one pass instead of using rolling windows for internal validation split predictions.
fast_tta_test
Fast TTA for test set predictions (Boolean)
Default value True
Apply TTA in one pass instead of using rolling windows for test set predictions.
prob_default_lags
Probability for new time-series transformers to use default lags (-1.0 = auto) (Float)
Default value -1.0
Probability for a new Lags/EWMA gene to use default lags (determined by frequency/gap/horizon, independent of the data) (-1.0 = auto)
prob_lagsinteraction
Probability of exploring interaction-based lag transformers (-1.0 = auto) (Float)
Default value -1.0
Unnormalized probability of choosing other lag time-series transformers based on interactions (-1.0 = auto)
prob_lagsaggregates
Probability of exploring aggregation-based lag transformers (-1.0 = auto) (Float)
Default value -1.0
Unnormalized probability of choosing other lag time-series transformers based on aggregations (-1.0 = auto)
ts_target_trafo
Time series centering or detrending transformation (String)
Default value 'none'
Time series centering or detrending transformation. The free parameter(s) of the trend model are fitted, the trend is removed from the target signal, and the pipeline is fitted on the residuals. Predictions are made by adding back the trend. The robust centering or linear detrending variants use RANSAC to achieve a higher tolerance with respect to outliers. The Epidemic target transformer uses the SEIR model: https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SEIR_model
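The fit, remove, and add-back cycle can be illustrated with ordinary least-squares linear detrending. Note that this sketch swaps in plain least squares for the RANSAC-based robust variant mentioned above, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_linear_trend(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of y against t (plain OLS, not RANSAC)."""
    n = len(t)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_t, mean_y = sum(t) / n, sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("time values must not all be equal")
    slope = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y)) / sxx
    return slope, mean_y - slope * mean_t


def detrend(t: Sequence[float], y: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Residuals that the modeling pipeline would be fitted on."""
    return [yi - (slope * ti + intercept) for ti, yi in zip(t, y)]


def retrend(t: Sequence[float], residual_preds: Sequence[float], slope: float,
            intercept: float) -> List[float]:
    """Add the trend back to residual predictions."""
    return [r + slope * ti + intercept for ti, r in zip(t, residual_preds)]


t_train, y_train = [0, 1, 2, 3], [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_linear_trend(t_train, y_train)
print(slope, intercept)                             # 2.0 3.0
print(detrend(t_train, y_train, slope, intercept))  # [0.0, 0.0, 0.0, 0.0]
print(retrend([10], [0.5], slope, intercept))       # [23.5]
```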
ts_target_trafo_epidemic_params_dict
Custom bounds for SEIRD epidemic model parameters (Dict)
Default value {}
Dictionary to control the epidemic SEIRD model used for detrending the target per time series group. Note: the target column must correspond to I(t), the infected cases as a function of time.
For each training split and time series group, the SEIRD model is fitted to the target signal (by optimizing the free parameters shown below for each time series group).
Then, the SEIRD model’s value is subtracted from the training response, and the residuals are passed to the feature engineering and modeling pipeline. For predictions, the SEIRD model’s value is added to the residual predictions from the pipeline, for each time series group.
Note: Careful selection of the bounds for the free parameters N, beta, gamma, delta, alpha, rho, lockdown, beta_decay, beta_decay_rate is extremely important for good results.
S(t) : susceptible/healthy/not immune
E(t) : exposed/not yet infectious
I(t) : infectious/active <= target column
R(t) : recovered/immune
D(t) : deceased
Free parameters:
- N: total population, N = S + E + I + R + D
- beta: rate of exposure (S -> E)
- gamma: rate of recovering (I -> R)
- delta: incubation period
- alpha: fatality rate
- rho: rate at which people die
- lockdown: day of lockdown (-1 => no lockdown)
- beta_decay: beta decay due to lockdown
- beta_decay_rate: speed of beta decay
Dynamics:
if lockdown >= 0:
    beta_min = beta * (1 - beta_decay)
    beta = (beta - beta_min) / (1 + np.exp(-beta_decay_rate * (-t + lockdown))) + beta_min
dSdt = -beta * S * I / N
dEdt = beta * S * I / N - delta * E
dIdt = delta * E - (1 - alpha) * gamma * I - alpha * rho * I
dRdt = (1 - alpha) * gamma * I
dDdt = alpha * rho * I
Provide lower/upper bounds for each parameter whose bounds you want to control. Valid parameters are: N_min, N_max, beta_min, beta_max, gamma_min, gamma_max, delta_min, delta_max, alpha_min, alpha_max, rho_min, rho_max, lockdown_min, lockdown_max, beta_decay_min, beta_decay_max, beta_decay_rate_min, beta_decay_rate_max. You can change any subset of parameters, e.g., ts_target_trafo_epidemic_params_dict="{'N_min': 1000, 'beta_max': 0.2}"
To get the SEIR model, which can speed up calculations significantly in cases where death rates are very low, set alpha_min=alpha_max=rho_min=rho_max=beta_decay_rate_min=beta_decay_rate_max=0 and lockdown_min=lockdown_max=-1.
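The SEIRD dynamics can be integrated numerically to see what a trend curve looks like for given parameter values. This is a minimal forward-Euler sketch with hypothetical names and an assumed step size; it only simulates the model and does not perform the bounded parameter fitting that is done per time series group.

```python
import math
from typing import Dict, List


def _lockdown_weight(x: float) -> float:
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def simulate_seird(N: float, beta: float, gamma: float, delta: float, alpha: float,
                   rho: float, lockdown: float, beta_decay: float,
                   beta_decay_rate: float, I0: float, days: int,
                   steps_per_day: int = 10) -> Dict[str, List[float]]:
    """Integrate the SEIRD equations with forward Euler; return daily compartments."""
    dt = 1.0 / steps_per_day  # assumed step size
    S, E, I, R, D = N - I0, 0.0, float(I0), 0.0, 0.0
    traj: Dict[str, List[float]] = {"S": [S], "E": [E], "I": [I], "R": [R], "D": [D]}
    t = 0.0
    for _ in range(days):
        for _ in range(steps_per_day):
            b = beta
            if lockdown >= 0:
                # Sigmoid decay of beta toward beta_min around the lockdown day.
                beta_min = beta * (1 - beta_decay)
                b = (beta - beta_min) * _lockdown_weight(
                    -beta_decay_rate * (-t + lockdown)) + beta_min
            infections = b * S * I / N
            dS = -infections
            dE = infections - delta * E
            dI = delta * E - (1 - alpha) * gamma * I - alpha * rho * I
            dR = (1 - alpha) * gamma * I
            dD = alpha * rho * I
            S, E, I, R, D = S + dt * dS, E + dt * dE, I + dt * dI, R + dt * dR, D + dt * dD
            t += dt
        for key, val in zip("SEIRD", (S, E, I, R, D)):
            traj[key].append(val)
    return traj


# SEIR special case: alpha = rho = beta_decay_rate = 0 and no lockdown.
seir = simulate_seird(N=1000, beta=0.5, gamma=0.1, delta=0.2, alpha=0.0, rho=0.0,
                      lockdown=-1, beta_decay=0.0, beta_decay_rate=0.0, I0=10, days=60)
print(max(seir["I"]))
```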
ts_target_trafo_epidemic_target
Which SEIRD model component the target column corresponds to: I: Infected, R: Recovered, D: Deceased. (String)
Default value 'I'
ts_lag_target_trafo
Time series lag-based target transformation (String)
Default value 'none'
Time series lag-based target transformation. One can choose between the difference and the ratio of the current and a lagged target. The corresponding lag size can be set via ‘Lag size used for time series target transformation’.
ts_target_trafo_lag_size
Lag size used for time series target transformation (Number)
Default value -1
Lag size used for time series target transformation. See the setting ‘Time series lag-based target transformation’.
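For a single series, the difference and ratio transformations and their inverses can be sketched as below. The sketch assumes the first `lag` rows, which have no lagged target, are left undefined (None); the function names are hypothetical and not part of Driverless AI.

```python
from typing import List, Optional, Sequence


def lag_target_transform(y: Sequence[float], lag: int, mode: str) -> List[Optional[float]]:
    """Transform the target into a difference or ratio against y[t - lag]."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    out: List[Optional[float]] = []
    for i, current in enumerate(y):
        if i < lag:
            out.append(None)  # no lagged target available yet
            continue
        previous = y[i - lag]
        if mode == "difference":
            out.append(current - previous)
        elif mode == "ratio":
            if previous == 0:
                raise ZeroDivisionError(f"lagged target is zero at position {i - lag}")
            out.append(current / previous)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out


def invert_lag_target(pred: float, lagged_actual: float, mode: str) -> float:
    """Map a prediction on the transformed scale back to the original target."""
    if mode == "difference":
        return pred + lagged_actual
    if mode == "ratio":
        return pred * lagged_actual
    raise ValueError(f"unknown mode: {mode!r}")


print(lag_target_transform([10.0, 12.0, 15.0, 18.0], lag=1, mode="difference"))
# [None, 2.0, 3.0, 3.0]
print(invert_lag_target(3.0, 15.0, "difference"))  # 18.0
```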
timeseries_split_suggestion_timeout
Timeout in seconds for time-series properties detection in UI. (Float)
Default value 30.0
Timeout in seconds for time-series properties detection in UI.