Models configuration¶
enable_constant_model
¶
Constant models (String)
Default value 'auto'
Whether to enable constant models (‘auto’/’on’/’off’)
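All of the model-type switches in this section accept ‘auto’/’on’/’off’ and can be combined; as an illustrative sketch (assuming the usual config.toml / expert-settings TOML override mechanism):
e.g. enable_constant_model = "auto", enable_lightgbm = "on", enable_tensorflow = "off"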
enable_decision_tree
¶
Decision Tree models (String)
Default value 'auto'
Whether to enable Decision Tree models (‘auto’/’on’/’off’). ‘auto’ disables Decision Tree models unless they are the only non-constant model type chosen.
enable_glm
¶
GLM models (String)
Default value 'auto'
Whether to enable GLM models (‘auto’/’on’/’off’)
enable_xgboost_gbm
¶
XGBoost GBM models (String)
Default value 'auto'
Whether to enable XGBoost GBM models (‘auto’/’on’/’off’)
enable_lightgbm
¶
LightGBM models (String)
Default value 'auto'
Whether to enable LightGBM models (‘auto’/’on’/’off’)
enable_tensorflow
¶
TensorFlow models (String)
Default value 'auto'
Whether to enable TensorFlow models (‘auto’/’on’/’off’)
enable_grownet
¶
PyTorch GrowNet models (String)
Default value 'auto'
Whether to enable PyTorch-based GrowNet models (‘auto’/’on’/’off’)
enable_ftrl
¶
FTRL models (String)
Default value 'auto'
Whether to enable FTRL (Follow The Regularized Leader) models (‘auto’/’on’/’off’)
enable_rulefit
¶
RuleFit models (String)
Default value 'auto'
Whether to enable RuleFit models (beta version, no MOJO support) (‘auto’/’on’/’off’)
enable_zero_inflated_models
¶
Zero-Inflated models (String)
Default value 'auto'
Whether to enable automatic addition of zero-inflated models for regression problems with zero-inflated target values that meet certain conditions: y >= 0, y.std() > y.mean()
enable_xgboost_rapids
¶
Enable RAPIDS-cudf extensions to XGBoost GBM/Dart (Boolean)
Default value False
Whether to enable RAPIDS extensions to XGBoost GBM/Dart. If selected, the Python scoring package can only be used on a GPU system.
enable_rapids_cuml_models
¶
Whether to enable RAPIDS CUML GPU models (no mojo) (Boolean)
Default value False
Whether to enable GPU-based RAPIDS CUML models. No MOJO support, but Python scoring is supported. In alpha testing status.
enable_rapids_models_dask
¶
Whether to enable RAPIDS CUML GPU models to use Dask (no mojo) (Boolean)
Default value False
Whether to enable multi-GPU mode for capable RAPIDS CUML models. No MOJO support, but Python scoring is supported. In alpha testing status.
enable_xgboost_rf
¶
Enable XGBoost RF mode (String)
Default value 'auto'
- Whether to enable XGBoost RF mode without early stopping.
Disabled unless switched on.
enable_xgboost_gbm_dask
¶
Enable dask_cudf (multi-GPU) XGBoost GBM/RF (String)
Default value 'auto'
- Whether to enable dask_cudf (multi-GPU) version of XGBoost GBM/RF.
Disabled unless switched on. Only applicable for single final model without early stopping. No Shapley possible.
enable_lightgbm_dask
¶
Enable dask (multi-node) LightGBM (String)
Default value 'auto'
- Whether to enable multi-node LightGBM.
Disabled unless switched on.
hyperopt_shift_leak
¶
Whether to do hyperopt for leakage/shift (Boolean)
Default value False
- If num_inner_hyperopt_trials_prefinal > 0,
then whether to do hyperparameter tuning during leakage/shift detection. Might be useful to find non-trivial leakage/shift, but usually not necessary.
hyperopt_shift_leak_per_column
¶
Whether to do hyperopt for leakage/shift for each column (Boolean)
Default value False
- If num_inner_hyperopt_trials_prefinal > 0,
then whether to do hyperparameter tuning during leakage/shift detection, when checking each column.
num_inner_hyperopt_trials_prefinal
¶
Number of trials for hyperparameter optimization during model tuning only (Number)
Default value 0
- Number of trials for Optuna hyperparameter optimization for tuning and evolution models.
0 means no trials. For small data, 100 is an okay choice, while for larger data smaller values are reasonable if results are needed quickly. If using RAPIDS or DASK, hyperparameter optimization keeps the data on the GPU the entire time. Currently applies to XGBoost GBM/Dart and LightGBM. Useful when there is high overhead of DAI outside the inner model fit/predict, so this tunes without that overhead. However, it can overfit on a single fold when doing tuning or evolution, and if using CV, averaging the fold hyperparameters can lead to unexpected results.
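For example, to enable a modest Optuna search during tuning and evolution only (an illustrative value, not a recommendation):
e.g. num_inner_hyperopt_trials_prefinal = 100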
num_inner_hyperopt_trials_final
¶
Number of trials for hyperparameter optimization for final model only (Number)
Default value 0
- Number of trials for Optuna hyperparameter optimization for final models.
0 means no trials. For small data, 100 is an okay choice, while for larger data smaller values are reasonable if results are needed quickly. Applies to the final model only, even if num_inner_hyperopt_trials=0. If using RAPIDS or DASK, hyperparameter optimization keeps the data on the GPU the entire time. Currently applies to XGBoost GBM/Dart and LightGBM. Useful when there is high overhead of DAI outside the inner model fit/predict, so this tunes without that overhead. However, for the final model each fold is independently optimized and can overfit on each fold, after which predictions are averaged (so there is no issue with averaging hyperparameters, unlike during tuning or evolution with CV).
num_hyperopt_individuals_final
¶
Number of individuals in final ensemble to use Optuna on (Number)
Default value -1
Number of individuals in final model (all folds/repeats for given base model) to optimize with Optuna hyperparameter tuning.
-1 means all. 0 is the same as choosing no Optuna trials. It might only be beneficial to optimize hyperparameters of the best individual (i.e. a value of 1) in the ensemble.
optuna_pruner
¶
Optuna Pruners (String)
Default value 'MedianPruner'
Optuna Pruner to use (applicable to XGBoost and LightGBM, which support Optuna callbacks). To disable, choose None.
optuna_sampler
¶
Optuna Samplers (String)
Default value 'TPESampler'
Optuna Sampler to use (applicable to XGBoost and LightGBM, which support Optuna callbacks).
enable_xgboost_hyperopt_callback
¶
Enable Optuna XGBoost Pruning callback (Boolean)
Default value True
Whether to enable Optuna’s XGBoost Pruning callback to abort unpromising runs. Not done if tuning learning rate.
enable_lightgbm_hyperopt_callback
¶
Enable Optuna LightGBM Pruning callback (Boolean)
Default value True
Whether to enable Optuna’s LightGBM Pruning callback to abort unpromising runs. Not done if tuning learning rate.
enable_xgboost_dart
¶
XGBoost Dart models (String)
Default value 'auto'
Whether to enable XGBoost Dart models (‘auto’/’on’/’off’)
enable_xgboost_dart_dask
¶
Enable dask_cudf (multi-GPU) XGBoost Dart (String)
Default value 'auto'
- Whether to enable dask_cudf (multi-GPU) version of XGBoost GBM/Dart.
Disabled unless switched on. If only 1 GPU is available, dask_cudf is used only if use_dask_for_1_gpu is True.
Only applicable for single final model without early stopping. No Shapley possible.
enable_lightgbm_boosting_types
¶
LightGBM Boosting types (List)
Default value ['gbdt']
Which boosting types to enable for LightGBM (gbdt = boosted trees, rf_early_stopping = random forest with early stopping, rf = random forest (no early stopping), dart = drop-out boosted trees with no early stopping).
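For example, to also allow dart mutations alongside the default boosted trees (illustrative):
e.g. enable_lightgbm_boosting_types = ['gbdt', 'dart']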
enable_lightgbm_multiclass_balancing
¶
LightGBM multiclass balancing (String)
Default value 'auto'
Whether to enable automatic class weighting for imbalanced multiclass problems. Can make worse probabilities, but improve confusion-matrix based scorers for rare classes without the need to manually calibrate probabilities or fine-tune the label creation process.
enable_lightgbm_cat_support
¶
LightGBM categorical support (Boolean)
Default value False
Whether to enable LightGBM categorical feature support (runs in CPU mode even if GPUs enabled, and no MOJO built)
enable_lightgbm_linear_tree
¶
LightGBM linear_tree mode (Boolean)
Default value False
Whether to enable LightGBM linear_tree handling (CPU mode only currently, no L1 regularization (e.g. MAE objective), and no MOJO built).
enable_lightgbm_extra_trees
¶
LightGBM extra trees mode (Boolean)
Default value False
Whether to enable LightGBM extra trees mode to help avoid overfitting
lightgbm_monotone_constraints_method
¶
Method to use for monotonicity constraints for LightGBM (String)
Default value 'intermediate'
basic: as fast as when no constraints applied, but over-constrains the predictions. intermediate: very slightly slower, but much less constraining while still holding monotonicity and should be more accurate than basic. advanced: slower, but even more accurate than intermediate.
lightgbm_monotone_penalty
¶
LightGBM Monotone Penalty (Float)
Default value 0.0
Forbids any monotone splits on the first x (rounded down) level(s) of the tree. The penalty applied to monotone splits on a given depth is a continuous, increasing function of the penalization parameter. See https://lightgbm.readthedocs.io/en/latest/Parameters.html#monotone_penalty
enable_lightgbm_cuda_support
¶
LightGBM CUDA support (Boolean)
Default value False
- Whether to enable LightGBM CUDA implementation instead of OpenCL.
CUDA with LightGBM only supported for Pascal+ (compute capability >=6.0)
show_constant_model
¶
Whether to show constant models in iteration panel even when not best model (Boolean)
Default value False
Whether to show constant models in iteration panel even when not best model.
xgboost_reg_objectives
¶
Select XGBoost regression objectives. (List)
Default value ['reg:squarederror']
- Select objectives allowed for XGBoost.
Added to allowed mutations (the default reg:squarederror appears in the sample list 3 times). Note: tweedie, gamma, and poisson are only valid for targets with positive values. Note: The objective relates to the form of the (regularized) loss function, used to determine the split with maximum information gain, while the metric is the non-regularized metric measured on the validation set (external or internally generated by DAI).
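For example, to additionally allow the Tweedie objective for a non-negative target (an illustrative choice; reg:tweedie is a standard XGBoost objective name):
e.g. xgboost_reg_objectives = ['reg:squarederror', 'reg:tweedie']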
xgboost_reg_metrics
¶
Select XGBoost regression metrics. (List)
Default value ['rmse', 'mae']
- Select metrics allowed for XGBoost.
Added to allowed mutations (the default rmse and mae are in sample list twice). Note: tweedie, gamma, poisson are only valid for targets with positive values.
xgboost_binary_metrics
¶
Select XGBoost binary metrics. (List)
Default value ['logloss', 'auc', 'aucpr', 'error']
- Select which metrics are allowed for XGBoost.
Added to allowed mutations (all evenly sampled).
lightgbm_reg_objectives
¶
Select LightGBM regression objectives. (List)
Default value ['mse', 'mae']
- Select objectives allowed for LightGBM.
Added to allowed mutations (the default mse appears in the sample list 2 times if selected). Note: If choosing quantile, huber, or fair and the data is not normalized, the recommendation is to use params_lightgbm to specify a reasonable value of alpha (for quantile or huber) or fair_c (for fair) to LightGBM. Note: mse is the same as rmse, corresponding to L2 loss; mae is L1 loss. Note: tweedie, gamma, and poisson are only valid for targets with positive values. Note: The objective relates to the form of the (regularized) loss function, used to determine the split with maximum information gain, while the metric is the non-regularized metric measured on the validation set (external or internally generated by DAI).
lightgbm_reg_metrics
¶
Select LightGBM regression metrics. (List)
Default value ['rmse', 'mse', 'mae']
- Select metrics allowed for LightGBM.
Added to allowed mutations (the default rmse appears in the sample list three times if selected). Note: If choosing huber or fair and the data is not normalized, the recommendation is to use params_lightgbm to specify a reasonable value of alpha (for huber or quantile) or fair_c (for fair) to LightGBM. Note: tweedie, gamma, and poisson are only valid for targets with positive values.
lightgbm_binary_objectives
¶
Select LightGBM binary objectives. (List)
Default value ['binary', 'xentropy']
- Select objectives allowed for LightGBM.
Added to allowed mutations (the default binary appears in the sample list 2 times if selected).
lightgbm_binary_metrics
¶
Select LightGBM binary metrics. (List)
Default value ['binary', 'binary', 'auc']
- Select which binary metrics allowed for LightGBM.
Added to allowed mutations (all evenly sampled).
lightgbm_multi_metrics
¶
Select LightGBM multiclass metrics. (List)
Default value ['multiclass', 'multi_error']
- Select which metrics allowed for multiclass LightGBM.
Added to allowed mutations (evenly sampled if selected).
tweedie_variance_power_list
¶
tweedie_variance_power parameters (List)
Default value [1.5, 1.2, 1.9]
- tweedie_variance_power parameters to try for XGBoostModel and LightGBMModel if tweedie is used.
First value is default.
huber_alpha_list
¶
huber parameters (List)
Default value [0.9, 0.3, 0.5, 0.6, 0.7, 0.8, 0.1, 0.99]
- huber parameters to try for LightGBMModel if huber is used.
First value is default.
fair_c_list
¶
fair c parameters (List)
Default value [1.0, 0.1, 0.5, 0.9]
- fair c parameters to try for LightGBMModel if fair is used.
First value is default.
poisson_max_delta_step_list
¶
poisson_max_delta_step parameters (List)
Default value [0.7, 0.9, 0.5, 0.2]
- poisson max_delta_step parameters to try for LightGBMModel if poisson is used.
First value is default.
quantile_alpha
¶
quantile alpha parameters (List)
Default value [0.9, 0.95, 0.99, 0.6]
- quantile alpha parameters to try for LightGBMModel if quantile is used.
First value is default.
reg_lambda_glm_default
¶
default reg_lambda regularization parameter (Float)
Default value 0.0004
Default reg_lambda regularization for XGBoost and LightGBM.
params_lightgbm
¶
params_lightgbm (Dict)
Default value {}
Parameters for LightGBM to override DAI parameters, e.g. 'eval_metric' should be used instead of 'metric'.
e.g. params_lightgbm="{'objective': 'binary', 'n_estimators': 100, 'max_leaves': 64, 'random_state': 1234}"
e.g. params_lightgbm="{'n_estimators': 600, 'learning_rate': 0.1, 'reg_alpha': 0.0, 'reg_lambda': 0.5, 'gamma': 0, 'max_depth': 0, 'max_bin': 128, 'max_leaves': 256, 'scale_pos_weight': 1.0, 'max_delta_step': 3.469919910597877, 'min_child_weight': 1, 'subsample': 0.9, 'colsample_bytree': 0.3, 'tree_method': 'gpu_hist', 'grow_policy': 'lossguide', 'min_data_in_bin': 3, 'min_child_samples': 5, 'early_stopping_rounds': 20, 'num_classes': 2, 'objective': 'binary', 'eval_metric': 'binary', 'random_state': 987654, 'early_stopping_threshold': 0.01, 'monotonicity_constraints': False, 'silent': True, 'debug_verbose': 0, 'subsample_freq': 1}"
Avoid including “system”-level parameters like 'n_gpus': 1, 'gpu_id': 0, 'n_jobs': 1, 'booster': 'lightgbm'.
Also likely should avoid parameters like 'objective': 'binary', unless one really knows what one is doing (e.g. alternative objectives).
See: https://xgboost.readthedocs.io/en/latest/parameter.html
And see: https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.rst
Objective parameters can also be passed if certain objectives are chosen (or automatically chosen). See:
https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters
params_xgboost
¶
params_xgboost (Dict)
Default value {}
Parameters for XGBoost to override DAI parameters
Similar parameters to LightGBM, since LightGBM parameters are transcribed from their XGBoost equivalents.
e.g. params_xgboost="{'n_estimators': 100, 'max_leaves': 64, 'max_depth': 0, 'random_state': 1234}"
See: https://xgboost.readthedocs.io/en/latest/parameter.html
params_dart
¶
params_dart (Dict)
Default value {}
Like params_xgboost but for XGBoost’s dart method
params_tensorflow
¶
Parameters for TensorFlow (Dict)
Default value {}
Parameters for TensorFlow to override DAI parameters
e.g. params_tensorflow="{'lr': 0.01, 'add_wide': False, 'add_attention': True, 'epochs': 30, 'layers': (100, 100), 'activation': 'selu', 'batch_size': 64, 'chunk_size': 1000, 'dropout': 0.3, 'strategy': '1cycle', 'l1': 0.0, 'l2': 0.0, 'ort_loss': 0.5, 'ort_loss_tau': 0.01, 'normalize_type': 'streaming'}"
See: https://keras.io/ , e.g. for activations: https://keras.io/activations/
Example layers: (500, 500, 500), (100, 100, 100), (100, 100), (50, 50)
Strategies: '1cycle' or 'one_shot', see: https://github.com/fastai/fastai. 'one_shot' is not allowed for ensembles.
normalize_type: ‘streaming’ or ‘global’ (using sklearn StandardScaler)
params_gblinear
¶
params_gblinear (Dict)
Default value {}
Parameters for XGBoost’s gblinear to override DAI parameters
e.g. params_gblinear="{'n_estimators': 100}"
See: https://xgboost.readthedocs.io/en/latest/parameter.html
params_decision_tree
¶
params_decision_tree (Dict)
Default value {}
Parameters for Decision Tree to override DAI parameters. Parameters should be given as their XGBoost equivalents unless they are unique LightGBM parameters, e.g. 'eval_metric' should be used instead of 'metric'.
e.g. params_decision_tree="{'objective': 'binary', 'n_estimators': 100, 'max_leaves': 64, 'random_state': 1234}"
e.g. params_decision_tree="{'n_estimators': 1, 'learning_rate': 1, 'reg_alpha': 0.0, 'reg_lambda': 0.5, 'gamma': 0, 'max_depth': 0, 'max_bin': 128, 'max_leaves': 256, 'scale_pos_weight': 1.0, 'max_delta_step': 3.469919910597877, 'min_child_weight': 1, 'subsample': 0.9, 'colsample_bytree': 0.3, 'tree_method': 'gpu_hist', 'grow_policy': 'lossguide', 'min_data_in_bin': 3, 'min_child_samples': 5, 'early_stopping_rounds': 20, 'num_classes': 2, 'objective': 'binary', 'eval_metric': 'logloss', 'random_state': 987654, 'early_stopping_threshold': 0.01, 'monotonicity_constraints': False, 'silent': True, 'debug_verbose': 0, 'subsample_freq': 1}"
Avoid including “system”-level parameters like 'n_gpus': 1, 'gpu_id': 0, 'n_jobs': 1, 'booster': 'lightgbm'.
Also likely should avoid parameters like 'objective': 'binary:logistic', unless one really knows what one is doing (e.g. alternative objectives).
See: https://xgboost.readthedocs.io/en/latest/parameter.html
And see: https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.rst
Objective parameters can also be passed if certain objectives are chosen (or automatically chosen). See:
https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters
params_rulefit
¶
params_rulefit (Dict)
Default value {}
Parameters for RuleFit to override DAI parameters
e.g. params_rulefit="{'max_leaves': 64}"
See: https://xgboost.readthedocs.io/en/latest/parameter.html
params_ftrl
¶
params_ftrl (Dict)
Default value {}
Parameters for FTRL to override DAI parameters
params_grownet
¶
params_grownet (Dict)
Default value {}
Parameters for GrowNet to override DAI parameters
params_tune_mode
¶
Mode to handle params_tune_ tomls (String)
Default value 'override'
How to handle tomls like params_tune_lightgbm. override: For any key in the params_tune_ toml dict, use the list of values instead of DAI’s list of values. exclusive: Only tune the keys in the params_tune_ toml dict, unless no keys are present, in which case DAI’s default values are used. In order to fully control hyperparameter tuning, either set “constrain” mode and include every hyperparameter and at least one value in each list within the dictionary, or choose exclusive and then rely upon DAI’s unchanging default values for any keys not given. For custom recipes, one can use recipe_dict to pass hyperparameters; if the “get_one()” function is used in a custom recipe and the user_tune passed contains the hyperparameter dictionary equivalent of the params_tune_ tomls, then this params_tune_mode will also work for custom recipes.
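As an illustrative sketch of exclusive mode combined with a tuning dictionary (hypothetical values, not defaults):
e.g. params_tune_mode = 'exclusive'
e.g. params_tune_lightgbm="{'max_depth': [4, 6, 8]}"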
params_tune_lightgbm
¶
params_tune_lightgbm (Dict)
Default value {}
Dictionary of key:lists of values to use for LightGBM tuning, overrides DAI’s choice per key
e.g. params_tune_lightgbm="{'min_child_samples': [1,2,5,100,1000], 'min_data_in_bin': [1,2,3,10,100,1000]}"
params_tune_xgboost
¶
params_tune_xgboost (Dict)
Default value {}
Like params_tune_lightgbm but for XGBoost
e.g. params_tune_xgboost="{'max_leaves': [8, 16, 32, 64]}"
params_tune_dart
¶
params_tune_dart (Dict)
Default value {}
Like params_tune_lightgbm but for XGBoost’s Dart
e.g. params_tune_dart="{'max_leaves': [8, 16, 32, 64]}"
params_tune_tensorflow
¶
params_tune_tensorflow (Dict)
Default value {}
Like params_tune_lightgbm but for TensorFlow
e.g. params_tune_tensorflow="{'layers': [(10,10,10), (10, 10, 10, 10)]}"
params_tune_gblinear
¶
params_tune_gblinear (Dict)
Default value {}
Like params_tune_lightgbm but for gblinear
e.g. params_tune_gblinear="{'reg_lambda': [.01, .001, .0001, .0002]}"
params_tune_rulefit
¶
params_tune_rulefit (Dict)
Default value {}
Like params_tune_lightgbm but for rulefit
e.g. params_tune_rulefit="{'max_depth': [4, 5, 6]}"
params_tune_ftrl
¶
params_tune_ftrl (Dict)
Default value {}
Like params_tune_lightgbm but for ftrl
params_tune_grownet
¶
params_tune_grownet (Dict)
Default value {}
Like params_tune_lightgbm but for GrowNet
e.g. params_tune_grownet="{'input_dropout': [0.2, 0.5]}"
max_nestimators
¶
Max. number of trees/iterations (Number)
Default value 3000
Maximum number of GBM trees or GLM iterations. Can be reduced for lower accuracy and/or higher interpretability. Early-stopping usually chooses less. Ignored if fixed_max_nestimators is > 0.
fixed_max_nestimators
¶
Fixed max. number of trees/iterations (-1 = auto mode) (Number)
Default value -1
Fixed maximum number of GBM trees or GLM iterations. If > 0, ignores max_nestimators and disables automatic reduction due to lower accuracy or higher interpretability. Early-stopping usually chooses less.
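For example, to cap all GBM trees/iterations at a fixed count regardless of the accuracy/interpretability dials (illustrative value):
e.g. fixed_max_nestimators = 1500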
n_estimators_list_no_early_stopping
¶
n_estimators list to sample from for model mutations for models that do not use early stopping (List)
Default value [50, 100, 150, 200, 250, 300]
LightGBM dart mode and normal rf mode do not use early stopping, and they will sample from these values for n_estimators. XGBoost Dart mode will also sample from these n_estimators. Also applies to XGBoost Dask models that do not yet support early stopping or callbacks. For default parameters it chooses first value in list, while mutations sample from the list.
min_learning_rate_final
¶
Minimum learning rate for final ensemble GBM models (Float)
Default value 0.01
Lower limit on learning rate for final ensemble GBM models. In some cases, the maximum number of trees/iterations is insufficient for the final learning rate, which can lead to early stopping never being triggered and poor final model performance. Then, one can try increasing the learning rate by raising this minimum, or one can try increasing the maximum number of trees/iterations.
max_learning_rate_final
¶
Maximum learning rate for final ensemble GBM models (Float)
Default value 0.05
Upper limit on learning rate for final ensemble GBM models
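For example, to raise the learning-rate range for final ensemble GBM models when early stopping is never triggered (illustrative values):
e.g. min_learning_rate_final = 0.03, max_learning_rate_final = 0.1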
max_nestimators_feature_evolution_factor
¶
Reduction factor for max. number of trees/iterations during feature evolution (Float)
Default value 0.2
factor by which max_nestimators is reduced for tuning and feature evolution
min_learning_rate
¶
Min. learning rate for feature engineering GBM models (Float)
Default value 0.05
Lower limit on learning rate for feature engineering GBM models
max_learning_rate
¶
Max. learning rate for feature engineering GBM models (Float)
Default value 0.5
Upper limit on learning rate for feature engineering GBM models. If you want to override min_learning_rate and min_learning_rate_final, set this to a smaller value.
tune_learning_rate
¶
Whether to tune learning rate even for GBM algorithms with early stopping (Boolean)
Default value False
Whether to tune learning rate for GBM algorithms (if not doing just single final model). If tuning with Optuna, might help isolate optimal learning rate.
max_epochs
¶
Max. number of epochs for TensorFlow / FTRL (Number)
Default value 10
Max. number of epochs for TensorFlow and FTRL models
max_max_depth
¶
Max. tree depth (and Max. max_leaves as 2**max_max_depth) (Number)
Default value 12
Maximum tree depth (and corresponding max max_leaves as 2**max_max_depth)
max_max_bin
¶
Max. max_bin for tree features (Number)
Default value 256
Maximum max_bin for tree features
rulefit_max_num_rules
¶
Max. number of rules for RuleFit (-1 for all) (Number)
Default value -1
Max number of rules to be used for RuleFit models (-1 for all)
rulefit_max_tree_depth
¶
rulefit_max_tree_depth (Number)
Default value 6
Max tree depth for RuleFit models
rulefit_max_num_trees
¶
rulefit_max_num_trees (Number)
Default value 100
Max number of trees for RuleFit models
fixed_ensemble_level
¶
Ensemble level for final modeling pipeline (Number)
Default value -1
Fixed ensemble_level. -1 = auto, based upon ensemble_accuracy_switch, accuracy, size of data, etc. 0 = no ensemble, only final single model on validated iteration/tree count. 1 = 1 model, multiple ensemble folds (cross-validation). >=2 = 2 or more models, multiple ensemble folds (cross-validation).
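For example, to force a single final model with no ensembling (sketch):
e.g. fixed_ensemble_level = 0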
cross_validate_single_final_model
¶
Cross-validate single final model (Boolean)
Default value True
- If enabled, use cross-validation to determine optimal parameters for single final model,
and to be able to create training holdout predictions.
ensemble_meta_learner
¶
Type of ensemble meta learner. Blender is recommended for most use cases. (String)
Default value 'blender'
Model to combine base model predictions, for experiments that create a final pipeline consisting of multiple base models.
blender: Creates a linear blend with non-negative weights that add to 1 (blending) - recommended. extra_trees: Creates a tree model to non-linearly combine the base models (stacking) - experimental; recommended to also enable cross_validate_meta_learner. neural_net: Creates a neural net model to non-linearly combine the base models (stacking) - experimental; recommended to also enable cross_validate_meta_learner.
cross_validate_meta_learner
¶
Cross-validate meta learner for final ensemble. (Boolean)
Default value False
If enabled, use cross-validation to create an ensemble for the meta learner itself. Especially recommended for ensemble_meta_learner='extra_trees', to make unbiased training holdout predictions. Will disable MOJO if enabled. Not needed for ensemble_meta_learner='blender'.
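As an illustrative sketch of a stacked ensemble with a cross-validated meta learner (note that this disables the MOJO):
e.g. ensemble_meta_learner = 'extra_trees', cross_validate_meta_learner = true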
parameter_tuning_num_models
¶
Number of models during tuning phase (-1 = auto) (Number)
Default value -1
Number of models to tune during pre-evolution phase
Can make this lower to avoid excessive tuning, or make higher to do enhanced tuning.
-1 : auto
imbalance_sampling_method
¶
Sampling method for imbalanced binary classification problems (String)
Default value 'off'
Sampling method for imbalanced binary classification problems. Choices are: “auto”: sample both classes as needed, depending on data; “over_under_sampling”: over-sample the minority class and under-sample the majority class, depending on data; “under_sampling”: under-sample the majority class to reach class balance; “off”: do not perform any sampling.
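For example, to request over/under-sampling (illustrative; it only takes effect when the data also meets the row-count and imbalance-ratio thresholds described below):
e.g. imbalance_sampling_method = 'over_under_sampling'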
imbalance_sampling_threshold_min_rows_original
¶
Threshold for minimum number of rows in original training data to allow imbalanced sampling techniques. For smaller data, will disable imbalanced sampling, no matter what imbalance_sampling_method is set to. (Number)
Default value 100000
For smaller data, there’s generally no benefit in using imbalanced sampling methods.
imbalance_ratio_sampling_threshold
¶
Ratio of majority to minority class for imbalanced binary classification to trigger special sampling techniques if enabled (Number)
Default value 5
For imbalanced binary classification: ratio of majority to minority class equal and above which to enable special imbalanced models with sampling techniques (specified by imbalance_sampling_method) to attempt to improve model performance.
heavy_imbalance_ratio_sampling_threshold
¶
Ratio of majority to minority class for heavily imbalanced binary classification to only enable special sampling techniques if enabled (Number)
Default value 25
For heavily imbalanced binary classification: ratio of majority to minority class equal and above which to enable only special imbalanced models on full original data, without upfront sampling.
imbalance_sampling_number_of_bags
¶
Number of bags for sampling methods for imbalanced binary classification (if enabled). -1 for automatic. (Number)
Default value -1
-1: automatic
imbalance_sampling_max_number_of_bags
¶
Hard limit on number of bags for sampling methods for imbalanced binary classification. (Number)
Default value 10
-1: automatic
imbalance_sampling_max_number_of_bags_feature_evolution
¶
Hard limit on number of bags for sampling methods for imbalanced binary classification during feature evolution phase. (Number)
Default value 3
- Only for shift/leakage/tuning/feature evolution models. Not used for final models. Final models can
be limited by imbalance_sampling_max_number_of_bags.
imbalance_sampling_max_multiple_data_size
¶
Max. size of data sampled during imbalanced sampling (in terms of dataset size) (Float)
Default value 1.0
- Max. size of data sampled during imbalanced sampling (in terms of dataset size),
controls number of bags (approximately). Only for imbalance_sampling_number_of_bags == -1.
imbalance_sampling_target_minority_fraction
¶
Target fraction of minority class after applying under/over-sampling techniques. -1.0 for automatic (Float)
Default value -1.0
- A value of 0.5 means that models/algorithms will be presented with a balanced target class distribution
after applying under/over-sampling techniques on the training data. Sometimes it makes sense to choose a smaller value like 0.1 or 0.01 when starting from an extremely imbalanced original target distribution. -1.0: automatic
ftrl_max_interaction_terms_per_degree
¶
Max. number of automatic FTRL interaction terms for 2nd, 3rd, 4th order interaction terms (each) (Number)
Default value 10000
Samples the number of automatic FTRL interaction terms to no more than this value (for each of 2nd, 3rd, 4th order terms)
enable_bootstrap
¶
Whether to enable bootstrap sampling for validation and test scores. (Boolean)
Default value True
Whether to enable bootstrap sampling. Provides error bars to validation and test scores based on the standard error of the bootstrap mean.
tensorflow_num_classes_switch
¶
For classification problems with this many classes, default to TensorFlow (Number)
Default value 10
- Number of classes above which to always use TensorFlow (if TensorFlow is enabled),
instead of other models set to ‘auto’ (models set to ‘on’ are still used).
prediction_intervals
¶
Compute prediction intervals (Boolean)
Default value True
Compute empirical prediction intervals (based on holdout predictions).
prediction_intervals_alpha
¶
Confidence level for prediction intervals (Float)
Default value 0.9
Confidence level for prediction intervals.
pred_labels
¶
Output labels for predictions created during the experiment for classification problems. (Boolean)
Default value True
- Appends one extra output column with predicted target class (after the per-class probabilities).
Uses argmax for multiclass, and the threshold defined by the optimal scorer controlled by the ‘threshold_scorer’ expert setting for binary problems. This setting controls the training, validation and test set predictions (if applicable) that are created by the experiment. MOJO, scoring pipeline and client APIs control this behavior via their own version of this parameter.
max_abs_score_delta_train_valid
¶
Max. absolute delta between training and validation scores for tree models. (Float)
Default value 0.0
- Modify early stopping behavior for tree-based models (LightGBM, XGBoostGBM, CatBoost) such
that training score (on training data, not holdout) and validation score differ no more than this absolute value (i.e., stop adding trees once abs(train_score - valid_score) > max_abs_score_delta_train_valid). Keep in mind that the meaning of this value depends on the chosen scorer and the dataset (i.e., 0.01 for LogLoss is different than 0.01 for MSE). Experimental option, only for expert use to keep model complexity low. To disable, set to 0.0
max_rel_score_delta_train_valid
¶
Max. relative delta between training and validation scores for tree models. (Float)
Default value 0.0
- Modify early stopping behavior for tree-based models (LightGBM, XGBoostGBM, CatBoost) such
that training score (on training data, not holdout) and validation score differ no more than this relative value (i.e., stop adding trees once abs(train_score - valid_score) > max_rel_score_delta_train_valid * abs(train_score)). Keep in mind that the meaning of this value depends on the chosen scorer and the dataset (i.e., 0.01 for LogLoss is different than 0.01 for MSE). Experimental option, only for expert use to keep model complexity low. To disable, set to 0.0
glm_lambda_search
¶
Do lambda search for GLM (String)
Default value 'auto'
- Whether to search for optimal lambda for given alpha for XGBoost GLM.
If ‘auto’, disabled if training data has more rows * cols than final_pipeline_data_size or for multiclass experiments. Disabled always for ensemble_level = 0. Not always a good approach, can be slow for little payoff compared to grid search.
glm_lambda_search_by_eval_metric
¶
Do lambda search for GLM by exact eval metric (Boolean)
Default value False
- If XGBoost GLM lambda search is enabled, whether to do search by the eval metric (True)
or using the actual DAI scorer (False).
enable_early_stopping_threshold
¶
Early stopping threshold (String)
Default value 'auto'
- Whether to enable early stopping threshold for LightGBM, varying by accuracy.
Stops training once the validation score changes by less than the threshold. This leads to fewer trees, usually avoiding wasteful trees, but may lower accuracy. ‘off’ leads to a value of 0 being used. ‘on’ leads to higher threshold values for lower accuracy dial settings. ‘auto’ leads to ‘off’, unless reduce_mojo_size is true.
glm_optimal_refit
¶
glm_optimal_refit (Boolean)
Default value True
dump_modelparams_every_scored_indiv
¶
Enable detailed scored model info (Boolean)
Default value True
Whether to dump every scored individual’s model parameters to a csv/tabulated/json file. Produces files like: individual_scored.params.[txt, csv, json]