Recipes configuration
included_transformers
Include specific transformers (List)
Default value []
Transformer display names indicating which transformers to use in the experiment. More information on these transformers can be viewed here: http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/transformations.html This setting allows including/excluding these transformations, and may be useful when simpler (more interpretable) models are sought at the expense of accuracy. For example, the transformers used (depending on the interpretability setting) for multi-class are: ['NumCatTETransformer', 'TextLinModelTransformer', 'FrequentTransformer', 'CVTargetEncodeTransformer', 'ClusterDistTransformer', 'WeightOfEvidenceTransformer', 'TruncSVDNumTransformer', 'CVCatNumEncodeTransformer', 'DatesTransformer', 'TextTransformer', 'OriginalTransformer', 'NumToCatWoETransformer', 'NumToCatTETransformer', 'ClusterTETransformer', 'InteractionsTransformer']
and for regression/binary: ['TextTransformer', 'ClusterDistTransformer', 'OriginalTransformer', 'TextLinModelTransformer', 'NumToCatTETransformer', 'DatesTransformer', 'WeightOfEvidenceTransformer', 'InteractionsTransformer', 'FrequentTransformer', 'CVTargetEncodeTransformer', 'NumCatTETransformer', 'NumToCatWoETransformer', 'TruncSVDNumTransformer', 'ClusterTETransformer', 'CVCatNumEncodeTransformer']
This list appears in the experiment logs (search for 'Transformers used').
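For example, to restrict an experiment to a small, interpretable set of transformers, one could set the following in config.toml (a minimal sketch; any display names from the lists above may be used):

    included_transformers = ['OriginalTransformer', 'DatesTransformer', 'FrequentTransformer']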
included_models
Include specific models (List)
Default value []
included_scorers
Include specific scorers (List)
Default value []
included_pretransformers
Include specific preprocessing transformers (List)
Default value []
Select transformers to be used for preprocessing before other transformers operate. Preprocessing transformers can take any original features and output arbitrary features, which are then used by the normal layer of transformers, whose selection is controlled by the included_transformers TOML option or via the GUI "Include specific transformers". Notes: 1) Preprocessing transformers (and all other layers of transformers) are part of the Python and (if applicable) MOJO scoring packages. 2) Any BYOR transformer recipe or native DAI transformer can be used as a preprocessing transformer, so, e.g., a preprocessing transformer can do interactions, string concatenations, or date extractions as a preprocessing step, and the next layer of Date and DateTime transformers will use that as input data.
Caveats: 1) One cannot currently run a time-series experiment on a time_column that has not yet been created (setup of the experiment only knows about the original data, not the transformed data). However, one can use a run-time data recipe to, e.g., convert a float date-time into a string date-time, and this will be used by DAI's Date and DateTime transformers as well as by auto-detection of time series. 2) In order to run a time-series experiment with the GUI/client auto-selecting groups, periods, etc., the dataset must have the time column and groups prepared ahead of the experiment, either by the user or via a one-time data recipe.
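As a sketch, assuming a BYOR transformer recipe with the hypothetical display name 'MyStringConcatTransformer' has been uploaded, it could be applied as a preprocessing layer with:

    included_pretransformers = ['MyStringConcatTransformer']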
num_pipeline_layers
Number of pipeline layers (Number)
Default value 1
Number of full pipeline layers (not including the preprocessing layer when included_pretransformers is not empty).
included_datas
Include specific data recipes during experiment (List)
Default value []
There are two types of data recipes: 1) recipes that add a new dataset or modify a dataset outside the experiment, by file/URL (pre-experiment data recipes); 2) recipes that modify the dataset during the experiment and during Python scoring (run-time data recipes). This list applies to the second case. One can use the same data recipe code for either case, but note: A) the first case can make any new data, but is not part of the scoring package; B) the second case modifies data during the experiment, so it needs some original dataset. The recipe can still create all-new features, as long as it keeps the same names for: target, weight_column, fold_column, time_column, and time group columns.
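For example, to apply an uploaded run-time data recipe with the hypothetical display name 'MyRuntimeDataRecipe' during the experiment and in Python scoring:

    included_datas = ['MyRuntimeDataRecipe']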
included_individuals
Include specific individuals (List)
Default value []
Custom individuals to use in the experiment. In DAI, most of the information about model type, model hyperparameters, data science types for input features, transformers used, and transformer parameters is contained in an Individual Recipe (an object that is evolved by mutation within the context of DAI's genetic algorithm).
Every completed experiment auto-generates Python code that corresponds to the individual(s) used to build the final model. This auto-generated Python code can be edited offline and uploaded as a recipe, or it can be edited within the custom recipe management editor and saved. This allows code-first access to a significant portion of DAI's internal transformer and model generation.
Choices are: * Empty, which means all individuals are freshly generated and treated by DAI's AutoML as a container of model and transformer choices. * Recipe display names of custom individuals, usually chosen via the UI. If the number of included custom individuals is less than DAI would need, the remaining individuals are freshly generated. The expert experiment-level option fixed_num_individuals can be used to enforce how many individuals to use in the evolution stage. The expert experiment-level option fixed_ensemble_level can be used to enforce how many individuals (each with one base model) will be used in the final model.
These individuals act in a similar way to how the feature brain acts for restart and retrain/refit, and one can retrain/refit custom individuals (i.e., skip the tuning and evolution stages) to use them in building a final model.
See the make_python_code TOML option for more details.
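As a sketch, assuming two custom individuals have been uploaded with the hypothetical display names 'my_indiv_0' and 'my_indiv_1', they could be used as the base models of a two-model final ensemble via:

    included_individuals = ['my_indiv_0', 'my_indiv_1']
    fixed_ensemble_level = 2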
make_python_code
Generate python code for individual (String)
Default value 'auto'
Whether to generate Python code for the best individuals of the experiment. This Python code contains a CustomIndividual class that is a recipe that can be edited and customized. The CustomIndividual class itself can also be customized for expert use.
By default, 'auto' means on.
At the end of an experiment, the summary zip contains auto-generated Python code for the individuals used in the experiment, including the last best population (best_population_indivXX.py, where XX iterates over the population), the last best individual (best_individual.py), and the final base models (final_indivYY.py, where YY iterates over the final base models). The summary zip also contains an example_indiv.py file that demonstrates other transformers that may be useful but did not happen to be used in the experiment. In addition, the GUI and Python client allow one to generate custom individuals from an aborted or finished experiment. For finished experiments, this provides a zip file containing the final_indivYY.py files; for aborted experiments, it contains the best population and best individual files.
See included_individuals for more details.
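A minimal sketch to force generation on, assuming 'on' and 'off' are the other accepted values (an assumption, based on 'auto' meaning on):

    make_python_code = 'on'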
make_json_code
Generate json code for individual (String)
Default value 'auto'
Whether to generate JSON code for the best individuals of the experiment. This JSON code contains the essential attributes from the internal DAI individual class. Reading the JSON code back in as a recipe is not supported. By default, 'auto' means off.
python_code_ngenes_max
Max. Num. genes for example auto-generated individual (Number)
Default value 100
Maximum number of genes to generate for the example auto-generated custom individual, called example_indiv.py in the summary zip file.
python_code_ngenes_min
Min. Num. genes for example auto-generated individual (Number)
Default value 100
Minimum number of genes to generate for the example auto-generated custom individual, called example_indiv.py in the summary zip file.
threshold_scorer
Scorer to optimize threshold to be used in other confusion-matrix based scorers (for binary classification) (String)
Default value 'AUTO'
Select the scorer to optimize the binary probability threshold that is being used in related Confusion Matrix based scorers such as: Precision, Recall, FalsePositiveRate, FalseDiscoveryRate, FalseOmissionRate, TrueNegativeRate, FalseNegativeRate, NegativePredictiveValue. Use F1 if the target class matters more, and MCC if all classes are equally important. AUTO will try to sync the threshold scorer with the scorer used for the experiment, otherwise falls back to F1.
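For example, to optimize the threshold for MCC (so that all classes are treated as equally important by the confusion-matrix-based scorers):

    threshold_scorer = 'MCC'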
prob_add_genes
Probability to add transformers (Float)
Default value 0.5
Unnormalized probability to add genes or instances of transformers with specific attributes. If no genes can be added, other mutations (mutating model hyperparameters, pruning genes, pruning features, etc.) are attempted.
prob_addbest_genes
Probability to add best shared transformers (Float)
Default value 0.5
Unnormalized probability, conditioned on prob_add_genes, to add genes or instances of transformers with specific attributes that have been shown to be beneficial to other individuals within the population.
prob_prune_genes
Probability to prune transformers (Float)
Default value 0.5
Unnormalized probability to prune genes or instances of transformers with specific attributes. If a variety of transformers with many attributes exists, the default value is reasonable. However, if one has a fixed set of transformers that should not change, or no new transformer attributes can be added, then setting this to 0.0 is reasonable to avoid undesired loss of transformations.
prob_perturb_xgb
Probability to mutate model parameters (Float)
Default value 0.25
Unnormalized probability to change model hyperparameters.
prob_prune_by_features
Probability to prune weak features (Float)
Default value 0.25
Unnormalized probability to prune features that have low variable importance, as opposed to pruning entire instances of genes/transformers (which is what prob_prune_genes controls). If prob_prune_genes=0.0 and prob_prune_by_features==0.0, then genes/transformers and transformed features are only pruned if they are: 1) inconsistent with the genome, 2) inconsistent with the column data types, 3) without signal (for interactions and cv_in_cv for target encoding), or 4) failed during transformation. E.g., these TOML settings are then ignored: 1) ngenes_max 2) limit_features_by_interpretability 3) varimp_threshold_at_interpretability_10 4) features_allowed_by_interpretability 5) remove_scored_0gain_genes_in_postprocessing_above_interpretability 6) nfeatures_max_threshold 7) features_cost_per_interp. So this acts similarly to no_drop_features, except that no_drop_features also applies to shift and leak detection, and additionally prevents constant columns and ID columns from being dropped.
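For example, to keep a fixed set of genes/transformers and their transformed features from being pruned during evolution (as described above):

    prob_prune_genes = 0.0
    prob_prune_by_features = 0.0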
prob_prune_pretransformer_genes
Probability to prune pretransformers (Float)
Default value 0.5
Like prob_prune_genes, but only for pretransformers, i.e., transformers in all layers except the last layer, which connects to the model.
prob_prune_pretransformer_by_features
Probability to prune weak pretransformer features (Float)
Default value 0.25
Like prob_prune_by_features, but only for pretransformers, i.e., transformers in all layers except the last layer, which connects to the model.
skip_transformer_failures
Whether to skip failures of transformers (Boolean)
Default value True
Skipping just avoids the failed transformer. Sometimes Python multiprocessing swallows exceptions, so skipping and logging exceptions is also a more reliable way to handle them. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore an error and avoid logging it. Features that fail are pruned from the individual. If that leaves no features in the individual, then backend tuning, feature/model tuning, final model building, etc. will still fail, since DAI should not continue if all features come from a failed state.
skip_model_failures
Whether to skip failures of models (Boolean)
Default value True
Skipping just avoids the failed model. Failures are logged depending upon detailed_skip_failure_messages_level. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore an error and avoid logging it.
skip_scorer_failures
Whether to skip failures of scorers (Boolean)
Default value True
Skipping just avoids the failed scorer if it is among many scorers. Failures are logged depending upon detailed_skip_failure_messages_level. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore an error and avoid logging it. Default is True to avoid failing, e.g., in final model building due to a single scorer.
skip_data_recipe_failures
Whether to skip runtime data recipe failures (Boolean)
Default value False
Skipping avoids the failed recipe. Failures are logged depending upon detailed_skip_failure_messages_level. Default is False because run-time data recipes run once at the start of the experiment and are expected to work by default.
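For example, when debugging custom recipes it can be useful to fail fast rather than silently skip broken components:

    skip_transformer_failures = false
    skip_model_failures = false
    skip_scorer_failures = false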
detailed_skip_failure_messages_level
Level to log (0=simple message, 1=code line plus message, 2=detailed stack traces) for skipped failures. (Number)
Default value 1
How much verbosity to use when logging failure messages for failed and then skipped transformers or models. Full failures always go to disk as *.stack files, which upon completion of the experiment go into the details folder within the experiment log zip file.
notify_failures
Whether to notify about failures of transformers or models or other recipe failures (Boolean)
Default value True
Whether to not only log errors of recipes (models and transformers) but also show a high-level notification in the GUI.
enable_custom_recipes
enable_custom_recipes (Boolean)
Default value True
Enable custom recipes.
enable_custom_recipes_upload
enable_custom_recipes_upload (Boolean)
Default value True
Enable uploading of custom recipes.
enable_custom_recipes_from_url
enable_custom_recipes_from_url (Boolean)
Default value True
Enable downloading of custom recipes from external URL.
enable_custom_recipes_from_zip
enable_custom_recipes_from_zip (Boolean)
Default value True
Allow uploaded recipe files to be zip archives containing custom recipe(s) in the root folder; any other code or auxiliary files must be in a sub-folder.
must_have_custom_transformers
must_have_custom_transformers (Boolean)
Default value False
must_have_custom_transformers_2
must_have_custom_transformers_2 (Boolean)
Default value False
must_have_custom_transformers_3
must_have_custom_transformers_3 (Boolean)
Default value False
must_have_custom_models
must_have_custom_models (Boolean)
Default value False
must_have_custom_scorers
must_have_custom_scorers (Boolean)
Default value False
enable_recreate_custom_recipes_env
enable_recreate_custom_recipes_env (Boolean)
Default value True
When set to true, enables downloading of custom recipes' third-party packages from the web; otherwise the Python environment will be transferred from the main worker.
extra_migration_custom_recipes_missing_modules
Whether to enable an extra attempt to migrate custom modules during preview so that the preview can be shown. Can lead to slow preview loading. (Boolean)
Default value False
include_custom_recipes_by_default
include_custom_recipes_by_default (Boolean)
Default value False
Include custom recipes in default inclusion lists (warning: enables all custom recipes)
force_include_custom_recipes_by_default
force_include_custom_recipes_by_default (Boolean)
Default value False
enable_h2o_recipes
enable_h2o_recipes (Boolean)
Default value True
Enable the H2O recipes server.
h2o_recipes_url
h2o_recipes_url (String)
Default value 'None'
URL of H2O instance for use by transformers, models, or scorers.
h2o_recipes_ip
h2o_recipes_ip (String)
Default value 'None'
IP of H2O instance for use by transformers, models, or scorers.
h2o_recipes_port
h2o_recipes_port (Number)
Default value 50361
Port of H2O instance for use by transformers, models, or scorers. No other instances may run on that port or the next port.
h2o_recipes_name
h2o_recipes_name (String)
Default value 'None'
Name of H2O instance for use by transformers, models, or scorers.
h2o_recipes_nthreads
h2o_recipes_nthreads (Number)
Default value 8
Number of threads for H2O instance for use by transformers, models, or scorers. -1 for all.
h2o_recipes_log_level
h2o_recipes_log_level (String)
Default value 'None'
Log Level of H2O instance for use by transformers, models, or scorers.
h2o_recipes_max_mem_size
h2o_recipes_max_mem_size (String)
Default value 'None'
Maximum memory size of H2O instance for use by transformers, models, or scorers.
h2o_recipes_min_mem_size
h2o_recipes_min_mem_size (String)
Default value 'None'
Minimum memory size of H2O instance for use by transformers, models, or scorers.
h2o_recipes_kwargs
h2o_recipes_kwargs (Dict)
Default value {}
General user overrides of kwargs dict to pass to h2o.init() for recipe server.
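As a sketch, following the string-dict convention shown for recipe_dict later in this section, and assuming log_level is a valid h2o.init() argument:

    h2o_recipes_kwargs = "{'log_level': 'INFO'}"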
h2o_recipes_start_trials
h2o_recipes_start_trials (Number)
Default value 5
Number of trials to give h2o-3 recipe server to start.
h2o_recipes_start_sleep0
h2o_recipes_start_sleep0 (Number)
Default value 1
Number of seconds to sleep before starting h2o-3 recipe server.
h2o_recipes_start_sleep
h2o_recipes_start_sleep (Number)
Default value 5
Number of seconds to sleep between trials of starting h2o-3 recipe server.
custom_recipes_lock_to_git_repo
custom_recipes_lock_to_git_repo (Boolean)
Default value False
Lock the source for recipes to a specific GitHub repo. If True, then all custom recipes must come from the repo specified in the custom_recipes_git_repo setting.
custom_recipes_git_repo
custom_recipes_git_repo (String)
Default value 'https://github.com/h2oai/driverlessai-recipes'
If custom_recipes_lock_to_git_repo is set to True, only this repo can be used to pull recipes from.
custom_recipes_git_branch
custom_recipes_git_branch (String)
Default value 'None'
Branch constraint for the recipe source repo. Any branch is allowed if unset or None.
custom_recipes_excluded_filenames_from_repo_download
basenames of files to exclude from repo download (List)
Default value []
allow_old_recipes_use_datadir_as_data_directory
Allow use of deprecated get_global_directory() method from custom recipes for backward compatibility of recipes created before 1.9.0. Disable to force separation of custom recipes per user (in which case user_dir() should be used instead). (Boolean)
Default value True
recipe_dict
recipe_dict (Dict)
Default value {}
Dictionary to control recipes for each experiment and particular custom recipes.
E.g., if inserting into the GUI as a TOML string, one can use: recipe_dict="{'key1': 2, 'key2': 'value2'}". E.g., if putting into config.toml as a dict, one can use: recipe_dict="{'key1': 2, 'key2': 'value2'}"
mutation_dict
mutation_dict (Dict)
Default value {}
Dictionary to control some mutation parameters.
E.g., if inserting into the GUI as a TOML string, one can use: mutation_dict="{'key1': 2, 'key2': 'value2'}". E.g., if putting into config.toml as a dict, one can use: mutation_dict="{'key1': 2, 'key2': 'value2'}"
enable_custom_transformers
enable_custom_transformers (Boolean)
Default value True
enable_custom_pretransformers
enable_custom_pretransformers (Boolean)
Default value True
enable_custom_models
enable_custom_models (Boolean)
Default value True
enable_custom_scorers
enable_custom_scorers (Boolean)
Default value True
enable_custom_datas
enable_custom_datas (Boolean)
Default value True
enable_custom_explainers
enable_custom_explainers (Boolean)
Default value True
enable_custom_individuals
enable_custom_individuals (Boolean)
Default value True
raise_on_invalid_included_list
Whether to validate recipe names (Boolean)
Default value False
Whether to validate recipe names provided in included lists, like included_models, or (if False) to just log a warning to server logs and ignore any invalid recipe names.
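For example, to have an experiment fail on a mistyped recipe name in an inclusion list instead of silently ignoring it:

    raise_on_invalid_included_list = true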
contrib_relative_directory
Base directory for recipes within data directory. (String)
Default value 'contrib'
contrib_env_relative_directory
contrib_env_relative_directory (String)
Default value 'contrib/env'
Location of installed custom recipe packages (relative to data_directory). Packages are normally installed dynamically, but they can also be installed manually (before or after server start), inside the running Docker instance if running Docker, or as the user the server runs as (e.g., the dai user) for deb/tar native installations:

    PYTHONPATH=<full tmp dir>/<contrib_env_relative_directory>/lib/python3.6/site-packages/ \
      <path to dai>dai-env.sh python -m pip install \
      --prefix=<full tmp dir>/<contrib_env_relative_directory> <packagename> \
      --upgrade --upgrade-strategy only-if-needed --log-file pip_log_file.log

where <path to dai> is /opt/h2oai/dai/ for native rpm/deb installations. Note that one can also install wheel files if <packagename> is the name of a wheel file or archive.
ignore_package_version
ignore_package_version (List)
Default value []
List of package versions to ignore. Useful when there is a small version change but the recipe is likely to still function with the old package version.
clobber_package_version
clobber_package_version (List)
Default value ['catboost']
List of package versions to remove if a conflict is encountered. Useful when one wants a new version of a package and old recipes are likely to still function.
swap_package_version
swap_package_version (Dict)
Default value {'catboost==0.26.1': 'catboost==1.0.5', 'catboost==0.25.1': 'catboost==1.0.5', 'catboost==0.24.1': 'catboost==1.0.5', 'catboost==1.0.4': 'catboost==1.0.5'}
Dictionary of package versions to swap if a conflict is encountered. Useful when one wants a new version of a package and old recipes are likely to still function. Also useful when one does not need old versions of recipes, even if they would no longer function.
allow_version_change_user_packages
allow_version_change_user_packages (Boolean)
Default value False
If a user uploads a recipe with changes to package versions, allow upgrade of package versions. If changes to DAI-protected packages are attempted, one can try using the pip_install_options TOML option with ['--no-deps']. Or, to ignore DAI's versions of packages entirely, one can try using the pip_install_options TOML option with ['--ignore-installed']. Any other experiments relying on recipes with such packages will be affected; use with caution.
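For example, following the options mentioned above, to let a user recipe change package versions while keeping pip from pulling in extra dependencies:

    allow_version_change_user_packages = true
    pip_install_options = ['--no-deps']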
pip_install_overall_retries
pip_install_overall_retries (Number)
Default value 2
Number of retries for the overall call to pip during installation. Sometimes one needs to try twice.
pip_install_verbosity
pip_install_verbosity (Number)
Default value 2
pip install verbosity level (number of -v's given to pip, up to 3).
pip_install_timeout
pip_install_timeout (Number)
Default value 15
pip install timeout in seconds. Sometimes internet issues mean one wants to fail faster.
pip_install_retries
pip_install_retries (Number)
Default value 5
pip install retry count
pip_install_use_constraint
pip_install_use_constraint (Boolean)
Default value True
Whether to use the DAI constraint file to help pip handle versions. pip can make mistakes and try to install updated packages for no reason.
pip_install_options
pip_install_options (List)
Default value []
pip install options: a list of additional option strings, e.g. ['--proxy', 'http://user:password@proxyserver:port']
enable_basic_acceptance_tests
enable_basic_acceptance_tests (Boolean)
Default value True
Whether to enable basic acceptance testing. Tests whether the state can be pickled, etc.
enable_acceptance_tests
enable_acceptance_tests (Boolean)
Default value True
Whether acceptance tests should run for custom genes / models / scorers / etc.
acceptance_tests_use_weather_data
acceptance_tests_use_weather_data (Boolean)
Default value False
skip_disabled_recipes
skip_disabled_recipes (Boolean)
Default value False
Whether to skip disabled recipes (True) or fail and show GUI message (False).
acceptance_test_timeout
Timeout in minutes for testing acceptance of each recipe (Float)
Default value 20.0
Minutes to wait until a recipe's acceptance testing is aborted. A recipe is rejected if acceptance testing is enabled and times out. One may also set the timeout for a specific recipe by giving the recipe class a staticmethod called acceptance_test_timeout that returns the number of minutes to wait before the acceptance test times out. This timeout does not include the time needed to install required packages.
contrib_reload_and_recheck_server_start
contrib_reload_and_recheck_server_start (Boolean)
Default value True
Whether to re-check recipes during server startup (if per_user_directories == false) or during user login (if per_user_directories == true). If any inconsistency develops, the bad recipe will be removed while re-doing acceptance testing. This process can make start-up take a lot longer for many recipes, but in LTS releases the risk of recipes becoming out of date is low. If set to false, acceptance re-testing during server start is disabled, but note that previews or experiments may fail if inconsistent recipes are used. Such inconsistencies can occur when the API for recipes changes or more aggressive acceptance tests are performed.
contrib_install_packages_server_start
contrib_install_packages_server_start (Boolean)
Default value True
Whether to at least install packages required by recipes during server startup (if per_user_directories == false) or during user login (if per_user_directories == true). It is important to keep this True so that any later use of recipes (that have global packages installed) will work.
data_recipe_isolate
Whether to isolate the data recipe (in a fork), in case its imports change needs across processes. (Boolean)
Default value True
server_recipe_url
server_recipe_url (String)
Default value ''
Space-separated string list of URLs for recipes that are loaded at user login time
num_rows_acceptance_test_custom_transformer
num_rows_acceptance_test_custom_transformer (Number)
Default value 200
num_rows_acceptance_test_custom_model
num_rows_acceptance_test_custom_model (Number)
Default value 100
recipe_activation
Recipe Activation List (Dict)
Default value {}
List of recipes (in a dict, keyed by type) that are applicable for a given experiment. This is especially relevant for situations such as a new experiment with the same params, where the user should be able to use the same recipe versions as the parent experiment if desired.