Recipes Settings¶
included_transformers¶
Include specific transformers (List)
Default value []
Transformer display names to indicate which transformers to use in the experiment. More information about these transformers is available here: http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/transformations.html This section allows including/excluding these transformations and may be useful when simpler (more interpretable) models are sought at the expense of accuracy.
For multi-class: ['NumCatTETransformer', 'TextLinModelTransformer', 'FrequentTransformer', 'CVTargetEncodeTransformer', 'ClusterDistTransformer', 'WeightOfEvidenceTransformer', 'TruncSVDNumTransformer', 'CVCatNumEncodeTransformer', 'DatesTransformer', 'TextTransformer', 'OriginalTransformer', 'NumToCatWoETransformer', 'NumToCatTETransformer', 'ClusterTETransformer', 'InteractionsTransformer']
For regression/binary: ['TextTransformer', 'ClusterDistTransformer', 'OriginalTransformer', 'TextLinModelTransformer', 'NumToCatTETransformer', 'DatesTransformer', 'WeightOfEvidenceTransformer', 'InteractionsTransformer', 'FrequentTransformer', 'CVTargetEncodeTransformer', 'NumCatTETransformer', 'NumToCatWoETransformer', 'TruncSVDNumTransformer', 'ClusterTETransformer', 'CVCatNumEncodeTransformer']
This list appears in the experiment logs (search for 'Transformers used').
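As an illustration of restricting an experiment to a few transformers, the following Python sketch renders a toml override line from a chosen subset of the regression/binary names listed above. The helper name `included_transformers_toml` is hypothetical, and the string-quoted list form is an assumption modeled on the recipe_dict example later in this page; check the config.toml shipped with your installation for the exact form it expects.

```python
# Display names copied from the regression/binary list in this section.
REGRESSION_BINARY_TRANSFORMERS = [
    "TextTransformer", "ClusterDistTransformer", "OriginalTransformer",
    "TextLinModelTransformer", "NumToCatTETransformer", "DatesTransformer",
    "WeightOfEvidenceTransformer", "InteractionsTransformer",
    "FrequentTransformer", "CVTargetEncodeTransformer", "NumCatTETransformer",
    "NumToCatWoETransformer", "TruncSVDNumTransformer", "ClusterTETransformer",
    "CVCatNumEncodeTransformer",
]


def included_transformers_toml(names, known=REGRESSION_BINARY_TRANSFORMERS):
    """Hypothetical helper: build an included_transformers override line.

    Assumption: the value is written as a string holding a list literal.
    """
    unknown = [n for n in names if n not in known]
    if unknown:
        raise ValueError(f"Unknown transformer display names: {unknown}")
    quoted = ", ".join(f"'{n}'" for n in names)
    return f'included_transformers = "[{quoted}]"'


line = included_transformers_toml(["OriginalTransformer", "DatesTransformer"])
print(line)
```

Validating the names up front catches typos before the override is submitted.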
included_models¶
Include specific models (List)
Default value []
included_scorers¶
Include specific scorers (List)
Default value []
included_pretransformers¶
Include specific preprocessing transformers (List)
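Default value []
Select the transformers to use for preprocessing before other transformers operate. Preprocessing transformers can potentially take any original features and output arbitrary features, which are then used by the normal layer of transformers, whose selection is controlled by the toml included_transformers or via the GUI "Include specific transformers". Notes: 1) Preprocessing transformers (and the transformers in all other layers) are part of the Python and (if applicable) MOJO scoring packages. 2) Any BYOR transformer recipe or native DAI transformer can be used as a preprocessing transformer. So, for example, a preprocessing transformer can perform interactions, string concatenations, or date extractions as a preprocessing step, and the next layer of Date and DateTime transformers will use the result as input data.
Caveats: 1) It is currently not possible to run a Time Series experiment on a time_column that has not yet been created (experiment setup only knows about the original data, not the transformed data). However, a run-time data recipe can be used to (for example) convert a float date-time into a string date-time, for use by the DAI Date and DateTime transformers as well as by Time Series auto-detection. 2) To run a Time Series experiment with the GUI/client auto-selecting groups, periods, etc., the dataset must have the time column and groups prepared ahead of the experiment by the user or via a one-time data recipe.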
num_pipeline_layers¶
Number of pipeline layers (Number)
Default value 1
Number of full pipeline layers (not including the preprocessing layer when included_pretransformers is not empty).
included_datas¶
Include specific data recipes during experiment (List)
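Default value []
There are two kinds of data recipes: 1) those that add a new dataset or modify a dataset outside of an experiment, by file/URL (pre-experiment data recipe); 2) those that modify the dataset during the experiment and Python scoring (run-time data recipe). This list applies to the second case. The same data recipe code can be used for either case, but note: A) The first case can generate any new data, but is not part of the scoring package. B) The second case modifies data during the experiment, so it needs some original dataset.
The recipe can still create all new features, as long as it keeps the same names for the following columns: target, weight_column, fold_column, time_column, time group columns.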
included_individuals¶
Include specific individuals (List)
Default value []
Custom individuals to use in the experiment. DAI stores most of the information about model type, model hyperparameters, data science types for input features, transformers used, and transformer parameters in an Individual Recipe (an object that is evolved by mutation within the context of DAI's genetic algorithm).
Every completed experiment auto-generates Python code that corresponds to the individual(s) used to build the final model. This auto-generated Python code can be edited offline and uploaded as a recipe, or it can be edited within the custom recipe management editor and saved. This gives code-first access to a significant portion of DAI's internal transformer and model generation.
Choices are:
- Empty means all individuals are freshly generated and treated by DAI's AutoML as a container of model and transformer choices.
- Recipe display names of custom individuals, usually chosen via the UI. If the number of included custom individuals is less than DAI needs, the remaining individuals are freshly generated.
The expert experiment-level option fixed_num_individuals can be used to enforce how many individuals to use in the evolution stage. The expert experiment-level option fixed_ensemble_level can be used to enforce how many individuals (each with one base model) will be used in the final model.
These individuals act in a similar way to the feature brain for restart and retrain/refit, and one can retrain/refit custom individuals (i.e. skip the tuning and evolution stages) to use them in building a final model.
See toml make_python_code for more details.
make_python_code¶
Generate python code for individual (String)
Default value 'auto'
Whether to generate python code for the best individuals for the experiment. This python code contains a CustomIndividual class that is a recipe that can be edited and customized. The CustomIndividual class itself can also be customized for expert use.
By default, ‘auto’ means on.
At the end of an experiment, the summary zip contains auto-generated python code for the individuals used in the experiment, including the last best population (best_population_indivXX.py where XX iterates the population), last best individual (best_individual.py), final base models (final_indivYY.py where YY iterates the final base models). The summary zip also contains an example_indiv.py file that generates other transformers that may be useful that did not happen to be used in the experiment. In addition, the GUI and python client allow one to generate custom individuals from an aborted or finished experiment. For finished experiments, this will provide a zip file containing the final_indivYY.py files, and for aborted experiments this will contain the best population and best individual files.
See included_individuals for more details.
make_json_code¶
Generate json code for individual (String)
Default value 'auto'
Whether to generate json code for the best individuals for the experiment. This json code contains the essential attributes from the internal DAI individual class. Reading the json code as a recipe is not supported. By default, 'auto' means off.
python_code_ngenes_max¶
Max. Num. genes for example auto-generated individual (Number)
Default value 100
Maximum number of genes to make for example auto-generated custom individual, called example_indiv.py in the summary zip file.
python_code_ngenes_min¶
Min. Num. genes for example auto-generated individual (Number)
Default value 100
Minimum number of genes to make for example auto-generated custom individual, called example_indiv.py in the summary zip file.
threshold_scorer¶
Scorer to optimize threshold to be used in other confusion-matrix based scorers (for binary classification) (String)
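Default value 'AUTO'
Select the scorer used to optimize the binary probability threshold that is applied in related confusion-matrix based scorers (such as Precision, Recall, FalsePositiveRate, FalseDiscoveryRate, FalseOmissionRate, TrueNegativeRate, FalseNegativeRate, and NegativePredictiveValue). Use F1 if the target class matters more, and MCC if all classes are equally important. AUTO tries to sync the threshold scorer with the scorer used for the experiment, and otherwise falls back to F1.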
prob_add_genes¶
Probability to add transformers (Float)
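Default value 0.5
Unnormalized probability to add genes or instances of transformers with specific attributes. If no genes can be added, other mutations (mutating model hyperparameters, pruning genes, pruning features, etc.) are attempted.
prob_addbest_genes¶
Probability to add best shared transformers (Float)
Default value 0.5
Unnormalized probability, conditioned on prob_add_genes, to add genes or instances of transformers with specific attributes that have been shown to be beneficial to other individuals within the population.
prob_prune_genes¶
Probability to prune transformers (Float)
Default value 0.5
Unnormalized probability to prune genes or instances of transformers with specific attributes. If a variety of transformers with many attributes exists, the default value is reasonable. However, if there is a fixed set of transformers that should not change, or no new transformer attributes can be added, then setting this to 0.0 is reasonable to avoid undesired loss of transformations.
prob_perturb_xgb¶
Probability to mutate model parameters (Float)
Default value 0.25
Unnormalized probability to change model hyperparameters.
prob_prune_by_features¶
Probability to prune weak features (Float)
Default value 0.25
Unnormalized probability to prune features that have low variable importance, as opposed to pruning entire instances of genes/transformers.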
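To make "unnormalized" concrete, here is a minimal Python sketch of how such weights could be turned into selection probabilities. It is not DAI's internal logic: the mutation labels, the `normalize` and `pick_mutation` helpers, and the decision to leave prob_addbest_genes out of the normalization (treating it as applying only once "add" has been chosen) are all assumptions made for illustration.

```python
import random

# Default weights from the four settings above (labels are illustrative).
DEFAULT_WEIGHTS = {
    "add_genes": 0.5,             # prob_add_genes
    "prune_genes": 0.5,           # prob_prune_genes
    "perturb_model_params": 0.25, # prob_perturb_xgb
    "prune_by_features": 0.25,    # prob_prune_by_features
}


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("At least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def pick_mutation(weights, rng):
    """Draw one mutation type in proportion to its (unnormalized) weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


probs = normalize(DEFAULT_WEIGHTS)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")

print(pick_mutation(DEFAULT_WEIGHTS, random.Random(0)))
```

Under this reading, only the ratios between the settings matter: setting prob_prune_genes to 0.0 removes pruning from the draw and redistributes its share across the remaining mutation types.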
skip_transformer_failures¶
Whether to skip failures of transformers (Boolean)
Default value True
Skipping just avoids the failed transformer. Sometimes Python multiprocessing swallows exceptions, so skipping and logging exceptions is also a more reliable way to handle them. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore the error and avoid logging it. Features that fail are pruned from the individual. If that leaves no features in the individual, then backend tuning, feature/model tuning, final model building, etc. will still fail, since DAI should not continue if all features are from a failed state.
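The skip-and-prune behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not DAI's implementation: `IgnoreError` here is a local stand-in for h2oaicore.systemutils.IgnoreError, and `apply_transformers` and the transformer names are hypothetical.

```python
import logging

logger = logging.getLogger("recipe_sketch")


class IgnoreError(Exception):
    """Local stand-in for h2oaicore.systemutils.IgnoreError (not the real class)."""


def apply_transformers(transformers, data, skip_failures=True):
    """Run each transformer; drop the ones that fail instead of aborting."""
    features = {}
    for name, fn in transformers.items():
        try:
            features[name] = fn(data)
        except IgnoreError:
            continue  # the recipe asked for the error to be ignored: no log entry
        except Exception:
            if not skip_failures:
                raise
            logger.exception("Transformer %s failed and was skipped", name)
    if not features:
        # Mirrors the rule that nothing can proceed when every feature failed.
        raise RuntimeError("All transformers failed; no features left")
    return features


def double(values):
    return [v * 2 for v in values]


def broken(values):
    raise ZeroDivisionError("simulated failure")


result = apply_transformers({"double": double, "broken": broken}, [1, 2, 3])
print(sorted(result))
```

With `skip_failures=False` the same call would re-raise the ZeroDivisionError, which corresponds to setting skip_transformer_failures to False.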
skip_model_failures¶
Whether to skip failures of models (Boolean)
Default value True
Skipping just avoids the failed model. Failures are logged depending upon detailed_skip_failure_messages_level. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore the error and avoid logging it.
skip_scorer_failures¶
Whether to skip failures of scorers (Boolean)
Default value True
Skipping just avoids the failed scorer if it is one among many scorers. Failures are logged depending upon detailed_skip_failure_messages_level. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore the error and avoid logging it. Default is True to avoid failing in, e.g., final model building due to a single scorer.
skip_data_recipe_failures¶
Whether to skip runtime data recipe failures (Boolean)
Default value False
Skipping avoids the failed recipe. Failures are logged depending upon detailed_skip_failure_messages_level. Default is False because runtime data recipes run once at the start of the experiment and are expected to work by default.
detailed_skip_failure_messages_level¶
Level to log (0=simple message 1=code line plus message 2=detailed stack traces) for skipped failures. (Number)
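Default value 1
How verbose the logged failure messages are for transformers or models that failed and were skipped. Full failure information is always stored to disk as *.stack files, which go into the details folder within the experiment log zip file once the experiment completes.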
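The three levels named in the setting title (0=simple message, 1=code line plus message, 2=detailed stack traces) can be illustrated with the standard `traceback` module. The function `format_skipped_failure` and the exact output layout are assumptions for illustration; DAI's real log format is not shown in this page.

```python
import traceback


def format_skipped_failure(exc, level):
    """Render an exception at one of three verbosity levels (illustrative)."""
    message = f"{type(exc).__name__}: {exc}"
    if level <= 0:
        return message  # level 0: simple message only
    if level == 1:
        frames = traceback.extract_tb(exc.__traceback__)
        if not frames:
            return message
        last = frames[-1]  # innermost frame, where the error was raised
        return f"{last.filename}:{last.lineno} in {last.name}: {last.line} | {message}"
    # level 2 and above: full stack trace
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


try:
    int("not a number")
except ValueError as caught:
    error = caught

for level in (0, 1, 2):
    print(f"--- level {level} ---")
    print(format_skipped_failure(error, level))
```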
notify_failures¶
Whether to notify about failures of transformers or models or other recipe failures (Boolean)
Default value True
Whether to not only log errors of recipes (models and transformers) but also show a high-level notification in the GUI.
enable_custom_recipes¶
enable_custom_recipes (Boolean)
Default value True
Enable custom recipes.
enable_custom_recipes_upload¶
enable_custom_recipes_upload (Boolean)
Default value True
Enable uploading of custom recipes.
enable_custom_recipes_from_url¶
enable_custom_recipes_from_url (Boolean)
Default value True
Enable downloading of custom recipes from an external URL.
enable_custom_recipes_from_zip¶
enable_custom_recipes_from_zip (Boolean)
Default value True
Allow uploaded recipe files to be zip archives that contain the custom recipe(s) in the root folder, while any other code or auxiliary files must be in a sub-folder.
must_have_custom_transformers¶
must_have_custom_transformers (Boolean)
Default value False
must_have_custom_transformers_2¶
must_have_custom_transformers_2 (Boolean)
Default value False
must_have_custom_transformers_3¶
must_have_custom_transformers_3 (Boolean)
Default value False
must_have_custom_models¶
must_have_custom_models (Boolean)
Default value False
must_have_custom_scorers¶
must_have_custom_scorers (Boolean)
Default value False
enable_recreate_custom_recipes_env¶
enable_recreate_custom_recipes_env (Boolean)
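Default value True
When set to true, third-party packages for custom recipes can be downloaded from the web; otherwise the Python environment is transferred from the main worker.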
extra_migration_custom_recipes_missing_modules¶
Whether to enable extra attempt to migrate custom modules during preview to show preview. Can lead to slow preview loading. (Boolean)
Default value False
include_custom_recipes_by_default¶
include_custom_recipes_by_default (Boolean)
Default value False
Include custom recipes in the default inclusion lists (warning: this enables all custom recipes).
force_include_custom_recipes_by_default¶
force_include_custom_recipes_by_default (Boolean)
Default value False
enable_h2o_recipes¶
enable_h2o_recipes (Boolean)
Default value True
Enable the h2o recipes server.
h2o_recipes_url¶
h2o_recipes_url (String)
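Default value 'None'
URL of the H2O instance for use by transformers, models, or scorers.
h2o_recipes_ip¶
h2o_recipes_ip (String)
Default value 'None'
IP of the H2O instance for use by transformers, models, or scorers.
h2o_recipes_port¶
h2o_recipes_port (Number)
Default value 50361
Port of the H2O instance for use by transformers, models, or scorers. No other instance may be on that port or on the next port.
h2o_recipes_name¶
h2o_recipes_name (String)
Default value 'None'
Name of the H2O instance for use by transformers, models, or scorers.
h2o_recipes_nthreads¶
h2o_recipes_nthreads (Number)
Default value 8
Number of threads for the H2O instance for use by transformers, models, or scorers. -1 means all.
h2o_recipes_log_level¶
h2o_recipes_log_level (String)
Default value 'None'
Log level for the H2O instance for use by transformers, models, or scorers.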
h2o_recipes_max_mem_size¶
h2o_recipes_max_mem_size (String)
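Default value 'None'
Maximum memory size for the H2O instance for use by transformers, models, or scorers.
h2o_recipes_min_mem_size¶
h2o_recipes_min_mem_size (String)
Default value 'None'
Minimum memory size for the H2O instance for use by transformers, models, or scorers.
h2o_recipes_kwargs¶
h2o_recipes_kwargs (Dict)
Default value {}
General user overrides of the kwargs dict passed to h2o.init() for the recipe server.
h2o_recipes_start_trials¶
h2o_recipes_start_trials (Number)
Default value 5
Number of attempts to give the h2o-3 recipe server to start.
h2o_recipes_start_sleep0¶
h2o_recipes_start_sleep0 (Number)
Default value 1
Number of seconds to sleep before starting the h2o-3 recipe server.
h2o_recipes_start_sleep¶
h2o_recipes_start_sleep (Number)
Default value 5
Number of seconds to sleep between successive attempts to start the h2o-3 recipe server.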
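How the three start settings fit together can be sketched as a retry loop: sleep once before the first attempt, then sleep between failed attempts. The function `start_with_retries` and its parameters are hypothetical, and the real server start call is replaced by a placeholder callable; this only illustrates the timing, not DAI's actual startup code.

```python
import time


def start_with_retries(start_fn, trials=5, sleep0=1, sleep=5, sleeper=time.sleep):
    """Call start_fn up to `trials` times.

    sleep0: seconds to wait before the first attempt (h2o_recipes_start_sleep0)
    sleep:  seconds to wait between attempts (h2o_recipes_start_sleep)
    """
    sleeper(sleep0)
    last_error = None
    for attempt in range(1, trials + 1):
        try:
            return start_fn()
        except Exception as exc:
            last_error = exc
            if attempt < trials:
                sleeper(sleep)
    raise RuntimeError(f"Server did not start after {trials} trials") from last_error


# Demonstration with a fake server that succeeds on the third attempt.
attempts = []
waits = []


def fake_start():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("not ready yet")
    return "up"


status = start_with_retries(fake_start, sleeper=waits.append)
print(status, waits)
```

Passing `sleeper` as a parameter keeps the sketch testable without real delays; the default is `time.sleep`.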
custom_recipes_lock_to_git_repo¶
custom_recipes_lock_to_git_repo (Boolean)
Default value False
Lock the source of recipes to a specific github repo. If True, all custom recipes must come from the repo specified in the setting custom_recipes_git_repo.
custom_recipes_git_repo¶
custom_recipes_git_repo (String)
Default value 'https://github.com/h2oai/driverlessai-recipes'
If custom_recipes_lock_to_git_repo is set to True, only this repo can be used to pull recipes from.
custom_recipes_git_branch¶
custom_recipes_git_branch (String)
Default value 'None'
Branch constraint for the recipe source repo. Any branch is allowed if unset or None.
custom_recipes_excluded_filenames_from_repo_download¶
basenames of files to exclude from repo download (List)
Default value []
allow_old_recipes_use_datadir_as_data_directory¶
Allow use of deprecated get_global_directory() method from custom recipes for backward compatibility of recipes created before 1.9.0. Disable to force separation of custom recipes per user (in which case user_dir() should be used instead). (Boolean)
Default value True
recipe_dict¶
recipe_dict (Dict)
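Default value {}
Dictionary to control recipes for each experiment and particular custom recipes.
For example, if inserting into the GUI as any toml string, one can use: ""recipe_dict="{'key1': 2, 'key2': 'value2'}"""
For example, if putting into config.toml as a dict, one can use: recipe_dict="{'key1': 2, 'key2': 'value2'}"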
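The recipe_dict value in these examples is a string that holds a Python dict literal. A recipe or script that receives such a string could decode it safely with `ast.literal_eval`, as in the sketch below. The helper name `parse_recipe_dict` is hypothetical, and how DAI itself parses the value is not stated here.

```python
import ast


def parse_recipe_dict(raw):
    """Decode a string such as "{'key1': 2, 'key2': 'value2'}" into a dict.

    ast.literal_eval only accepts Python literals, so arbitrary code in the
    string is rejected rather than executed.
    """
    value = ast.literal_eval(raw)
    if not isinstance(value, dict):
        raise ValueError("recipe_dict must evaluate to a dict")
    return value


settings = parse_recipe_dict("{'key1': 2, 'key2': 'value2'}")
print(settings["key1"], settings["key2"])  # prints: 2 value2
```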
enable_custom_transformers¶
enable_custom_transformers (Boolean)
Default value True
enable_custom_pretransformers¶
enable_custom_pretransformers (Boolean)
Default value True
enable_custom_models¶
enable_custom_models (Boolean)
Default value True
enable_custom_scorers¶
enable_custom_scorers (Boolean)
Default value True
enable_custom_datas¶
enable_custom_datas (Boolean)
Default value True
enable_custom_explainers¶
enable_custom_explainers (Boolean)
Default value True
enable_custom_individuals¶
enable_custom_individuals (Boolean)
Default value True
raise_on_invalid_included_list¶
Whether to validate recipe names (Boolean)
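Default value False
Whether to validate the recipe names provided in included lists (such as included_models), or (if False) to only log a warning to the server logs and ignore any invalid recipe names.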
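The two behaviors of this setting can be sketched as follows. The function `filter_included` and the placeholder names "ModelA" and "ModelB" are hypothetical; the sketch only shows the raise-versus-warn distinction described above, not DAI's validation code.

```python
import logging

logger = logging.getLogger("recipe_sketch")


def filter_included(names, available, raise_on_invalid=False):
    """Return the valid names; raise or warn about the invalid ones."""
    invalid = [n for n in names if n not in available]
    if invalid and raise_on_invalid:
        raise ValueError(f"Invalid recipe names: {invalid}")
    for name in invalid:
        logger.warning("Ignoring invalid recipe name: %s", name)
    return [n for n in names if n in available]


available_models = {"ModelA", "ModelB"}  # placeholder names
kept = filter_included(["ModelA", "Typo"], available_models)
print(kept)  # prints: ['ModelA']
```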
contrib_relative_directory¶
Base directory for recipes within data directory. (String)
Default value 'contrib'
contrib_env_relative_directory¶
contrib_env_relative_directory (String)
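Default value 'contrib/env'
Location of installed custom recipe packages (relative to data_directory). DAI tries to install packages dynamically, but packages can also be installed manually (before or after the server has started), either inside the running docker instance when using docker, or as the user the server runs as (e.g. the dai user) for a deb/tar native installation:
PYTHONPATH=<full tmp dir>/<contrib_env_relative_directory>/lib/python3.6/site-packages/ <path to dai>dai-env.sh python -m pip install --prefix=<full tmp dir>/<contrib_env_relative_directory> <packagename> --upgrade --upgrade-strategy only-if-needed --log-file pip_log_file.log
where <path to dai> is /opt/h2oai/dai/ for a native rpm/deb installation. Note: wheel files can also be installed if <packagename> is the name of a wheel file or archive.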
ignore_package_version¶
ignore_package_version (List)
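Default value []
List of package versions to ignore. Useful when there is a small version change but the package is still likely to function.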
allow_version_change_user_packages¶
allow_version_change_user_packages (Boolean)
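Default value False
If a user uploads a recipe that changes package versions, allow the package versions to be upgraded. If the recipe attempts to change DAI-protected packages, one can try using the pip_install_options toml with ['--no-deps']. Or, to ignore DAI's versions of packages entirely, one can try using the pip_install_options toml with ['--ignore-installed']. Any other experiments relying on recipes with such packages will be affected, so use with caution.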
pip_install_overall_retries¶
pip_install_overall_retries (Number)
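Default value 2
Number of retries for the overall call to pip install. Sometimes two attempts are needed.
pip_install_verbosity¶
pip_install_verbosity (Number)
Default value 2
pip install verbosity level (number of -v flags given to pip, up to 3).
pip_install_timeout¶
pip_install_timeout (Number)
Default value 15
pip install timeout in seconds. With some internet issues it is preferable to fail faster.
pip_install_retries¶
pip_install_retries (Number)
Default value 5
pip install retry count.
pip_install_use_constraint¶
pip_install_use_constraint (Boolean)
Default value True
Whether to use the DAI constraint file to help pip handle versions. pip can make mistakes and try to install updated packages for no reason.
pip_install_options¶
pip_install_options (List)
Default value []
pip install options: a string containing a list of additional options, e.g. ['--proxy', 'http://user:password@proxyserver:port']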
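A rough picture of how these settings could translate into a pip command line is sketched below. The mapping of pip_install_verbosity to repeated -v flags comes from the description above; mapping pip_install_timeout and pip_install_retries to pip's `--timeout` and `--retries` options is an assumption, and `build_pip_command` and the package name are hypothetical. The command is only built, not executed.

```python
import sys


def build_pip_command(package, verbosity=2, timeout=15, retries=5, options=()):
    """Assemble an argv list for `python -m pip install` (illustrative)."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    cmd += ["-v"] * min(max(verbosity, 0), 3)        # at most three -v flags
    cmd += ["--timeout", str(timeout), "--retries", str(retries)]
    cmd += list(options)                             # e.g. pip_install_options
    return cmd


command = build_pip_command(
    "example-package",  # placeholder package name
    options=["--proxy", "http://user:password@proxyserver:port"],
)
print(" ".join(command[1:]))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.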
enable_basic_acceptance_tests¶
enable_basic_acceptance_tests (Boolean)
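Default value True
Whether to enable basic acceptance tests. These test whether the state can be pickled, etc.
enable_acceptance_tests¶
enable_acceptance_tests (Boolean)
Default value True
Whether acceptance tests should run for custom genes, models, scorers, etc.
acceptance_tests_use_weather_data¶
acceptance_tests_use_weather_data (Boolean)
Default value False
skip_disabled_recipes¶
skip_disabled_recipes (Boolean)
Default value False
Whether to skip disabled recipes (True) or fail and show a GUI message (False).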
acceptance_test_timeout¶
Timeout in minutes for testing acceptance of each recipe (Float)
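Default value 20.0
Number of minutes to wait before a recipe's acceptance test is aborted. A recipe is rejected if acceptance testing is enabled and the test times out. A timeout can also be set for a specific recipe by defining a static method named acceptance_test_timeout on the class that returns the number of minutes to wait before the acceptance test times out. This timeout does not include the time needed to install required packages.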
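A per-recipe override of this kind might look like the sketch below. The class `SlowCustomTransformer` is a hypothetical plain class (a real recipe would derive from the appropriate DAI base class), and `effective_timeout` is an illustrative helper showing how a per-class value could take precedence over the global default; it is not DAI's lookup code.

```python
class SlowCustomTransformer:
    """Hypothetical recipe class used only to show the static method."""

    @staticmethod
    def acceptance_test_timeout():
        # Minutes to wait before this recipe's acceptance test times out.
        return 30.0


def effective_timeout(recipe_cls, default_minutes=20.0):
    """Use the class's own timeout if it defines one, else the global default."""
    override = getattr(recipe_cls, "acceptance_test_timeout", None)
    return float(override()) if callable(override) else default_minutes


print(effective_timeout(SlowCustomTransformer))  # prints: 30.0
print(effective_timeout(object))                 # prints: 20.0
```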
contrib_reload_and_recheck_server_start¶
contrib_reload_and_recheck_server_start (Boolean)
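Default value True
Whether to re-check recipes during server startup (if per_user_directories == false) or during user login (if per_user_directories == true). If any inconsistency is found, the bad recipe is removed while the acceptance tests are redone. This process can make startup take much longer when there are many recipes, but in LTS releases the risk of recipes becoming out of date is low. If set to false, acceptance re-testing during server startup is disabled, but note that previews or experiments may fail if those inconsistent recipes are used. Such inconsistencies can occur when the recipe API changes or when more aggressive acceptance tests are performed.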
contrib_install_packages_server_start¶
contrib_install_packages_server_start (Boolean)
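Default value True
Whether to at least install the packages required by recipes during server startup (if per_user_directories == false) or during user login (if per_user_directories == true). It is important to keep this True so that any later use of recipes (that have global packages installed) works.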
data_recipe_isolate¶
Whether to isolate (in fork) data recipe in case imports change needs across. (Boolean)
Default value True
num_rows_acceptance_test_custom_transformer¶
num_rows_acceptance_test_custom_transformer (Number)
Default value 200
num_rows_acceptance_test_custom_model¶
num_rows_acceptance_test_custom_model (Number)
Default value 100
recipe_activation¶
Recipe Activation List (Dict)
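Default value {}
List of recipes (per dict key, by type) that are applicable to a given experiment. This is especially relevant for situations such as a new experiment with same parameters, where the user should be able to use the same recipe versions as the parent experiment if desired.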