Experiment Configuration¶

`max_runtime_minutes`¶

`max_runtime_minutes_until_abort`¶

`time_abort`¶

`time_abort_format`¶

`time_abort_timezone`¶

`delete_model_dirs_and_files`¶

`recipe`¶

Pipeline Building Recipe (String)

Default value 'auto'

Recipe type. Recipes override any GUI settings.

'auto': all models and features are determined automatically by the experiment settings, toml settings, and feature_engineering_effort

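'compliant': like 'auto', except:
- interpretability=10 (to avoid complexity; overrides the interpretability chosen in the GUI or Python client)
- enable_glm='on' (all other models are 'off', to avoid complexity and stay compatible with the algorithms supported by MLI)
- fixed_ensemble_level=0: do not use any ensemble
- feature_brain_level=0: no feature brain is used (to ensure every restart is identical)
- max_feature_interaction_depth=1: interaction depth is set to 1 (no multi-feature interactions, to avoid complexity)
- target_transformer='identity': for regression (to avoid complexity)
- check_distribution_shift_drop='off': do not use distribution shift between train, validation, and test data to drop features (somewhat risky without fine-tuning)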
'monotonic_gbm': like 'auto', except:
- monotonicity_constraints_interpretability_switch=1: enable monotonicity constraints
- self.config.monotonicity_constraints_correlation_threshold = 0.01: see below
- monotonicity_constraints_drop_low_correlation_features=true: drop features whose correlation with the target is below 0.01 (the threshold set by the parameter above)
- fixed_ensemble_level=0: do not use any ensemble (to avoid complexity)
- included_models=['LightGBMModel']
- included_transformers=['OriginalTransformer']: only original (numeric) features are used
- feature_brain_level=0: no feature brain is used (to ensure every restart is identical)
- monotonicity_constraints_log_level='high'
- autodoc_pd_max_runtime=-1: no timeout for PDP creation in AutoDoc
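'kaggle': like 'auto', except:
- the external validation set is concatenated with the training set, with the target marked as missing
- the test set is concatenated with the training set, with the target marked as missing
- transformers that do not use the target are allowed to fit_transform across the entire train + validation + test data
- several config toml expert options have their limits opened up (for example, more numeric columns are treated as categorical)
- Note: with plentiful memory, you can:
  - choose kaggle mode and then change fixed_feature_interaction_depth to a large negative number; otherwise the number of features given to a transformer is limited to 50 by default
  - choose mutation_mode = "full", so that each transformer performs even more types of transformations at once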
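'nlp_model': only enables NLP models that process pure text
'nlp_transformer': only enables NLP transformers that process pure text, while any model type is allowed
'image_model': only enables image models that process pure images
'image_transformer': only enables image transformers that process pure images, while any model type is allowed
'unsupervised': only enables unsupervised transformers, models, and scorers
'gpus_max': maximizes use of GPUs (for example, uses XGBoost, rapids, Optuna hyperparameter search, etc.)
'more_overfit_protection': potentially reduces overfitting, especially on small data, by disabling target encoding and making the GA behave like the final model with respect to tree counts and learning rate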

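Each pipeline building recipe mode can be chosen and then fine-tuned using the individual expert settings. Changing the pipeline building recipe resets all pipeline building recipe options to their defaults and then re-applies the specific rules of the new mode, which undoes any fine-tuning of expert options that are part of the pipeline building recipe rules.

If you choose to run a new/continued/refitted/retrained experiment from a parent experiment, the recipe rules are not re-applied and any fine-tuning is preserved. To reset the recipe behavior, switch between 'auto' and the desired mode. The new child experiment will then use the default settings of the chosen recipe.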
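The reset-then-reapply behavior described above can be sketched in a few lines of Python. This is only an illustration, not Driverless AI code: `DEFAULTS`, `RECIPE_RULES`, `switch_recipe`, and `child_experiment` are hypothetical names, and the default values other than `feature_brain_level=2` are placeholders rather than the product's real defaults. The 'compliant' overrides are copied from the list above.

```python
# Hypothetical sketch of how recipe rules interact with fine-tuning.
# Placeholder defaults: only feature_brain_level=2 is taken from this page.
DEFAULTS = {
    "interpretability": 5,
    "enable_glm": "auto",
    "fixed_ensemble_level": -1,
    "feature_brain_level": 2,
    "max_feature_interaction_depth": 8,
    "target_transformer": "auto",
    "check_distribution_shift_drop": "auto",
}

# Overrides for 'compliant' as listed above; 'auto' applies no overrides.
RECIPE_RULES = {
    "auto": {},
    "compliant": {
        "interpretability": 10,
        "enable_glm": "on",
        "fixed_ensemble_level": 0,
        "feature_brain_level": 0,
        "max_feature_interaction_depth": 1,
        "target_transformer": "identity",
        "check_distribution_shift_drop": "off",
    },
}


def switch_recipe(recipe: str) -> dict:
    """Reset recipe-governed options to defaults, then apply the recipe rules."""
    settings = dict(DEFAULTS)
    settings.update(RECIPE_RULES[recipe])
    settings["recipe"] = recipe
    return settings


def child_experiment(parent_settings: dict) -> dict:
    """A child experiment inherits the parent's settings; rules are not re-applied."""
    return dict(parent_settings)


# Choose 'compliant', then fine-tune one expert option.
tuned = switch_recipe("compliant")
tuned["fixed_ensemble_level"] = 2

# The child keeps the fine-tuning ...
child = child_experiment(tuned)

# ... whereas switching to 'auto' and back to 'compliant' restores the recipe defaults.
switch_recipe("auto")
reset = switch_recipe("compliant")
```

In this sketch `child["fixed_ensemble_level"]` stays at the fine-tuned value, while `reset["fixed_ensemble_level"]` is back at the value the 'compliant' rules prescribe.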

`enable_genetic_algorithm`¶

`make_python_scoring_pipeline`¶

`make_mojo_scoring_pipeline`¶

`inject_mojo_for_predictions`¶

`mojo_for_predictions`¶

`mojo_for_predictions_max_rows`¶

`mojo_for_predictions_batch_size`¶

`mojo_acceptance_test_rtol`¶

`mojo_acceptance_test_atol`¶

`reduce_mojo_size`¶

`make_pipeline_visualization`¶

`make_python_pipeline_visualization`¶

`max_cols_make_autoreport_automatically`¶

`max_cols_make_pipeline_visualization_automatically`¶

`pass_env_to_deprecated_python_scoring`¶

`transformer_description_line_length`¶

`benchmark_mojo_latency`¶

`benchmark_mojo_latency_auto_size_limit`¶

`mojo_building_timeout`¶

`mojo_building_parallelism`¶

`max_workers`¶

`max_cores_dai`¶

`stall_subprocess_submission_dai_fork_threshold_count`¶

`stall_subprocess_submission_mem_threshold_pct`¶

`max_cores_by_physical`¶

`max_cores_limit`¶

`assumed_simultaneous_dt_forks_stats_openblas`¶

`max_max_dt_threads_stats_openblas`¶

`min_dt_threads_munging`¶

`min_dt_threads_final_munging`¶

`max_dt_threads_do_timeseries_split_suggestion`¶

`kaggle_username`¶

`kaggle_key`¶

`kaggle_timeout`¶

`kaggle_keep_submission`¶

`kaggle_competitions`¶

`ping_period`¶

`ping_autodl`¶

`disk_limit_gb`¶

`stall_disk_limit_gb`¶

`memory_limit_gb`¶

`min_num_rows`¶

`min_rows_per_class`¶

`min_rows_per_split`¶

`reproducibility_level`¶

`seed`¶

`missing_values`¶

`glm_nan_impute_training_data`¶

`glm_nan_impute_validation_data`¶

`glm_nan_impute_prediction_data`¶

`tf_nan_impute_value`¶

`statistical_threshold_data_size_small`¶

`statistical_threshold_data_size_large`¶

`aux_threshold_data_size_large`¶

`set_method_sampling_row_limit`¶

`performance_threshold_data_size_small`¶

`performance_threshold_data_size_large`¶

`max_relative_cols_mismatch_allowed`¶

`max_cols`¶

`max_rows_col_stats`¶

`max_rows_cv_in_cv_gini`¶

`max_rows_constant_model`¶

`max_rows_final_ensemble_base_model_fold_scores`¶

`max_rows_final_blender`¶

`max_rows_final_train_score`¶

`max_rows_final_roccmconf`¶

`max_rows_final_holdout_score`¶

`max_rows_final_holdout_bootstrap_score`¶

`max_rows_leak`¶

`max_workers_fs`¶

`max_workers_shift_leak`¶

`num_folds`¶

`fold_balancing_repeats_times_rows`¶

`max_fold_balancing_repeats`¶

`fixed_split_seed`¶

`show_fold_stats`¶

`allow_different_classes_across_fold_splits`¶

`full_cv_accuracy_switch`¶

`ensemble_accuracy_switch`¶

`num_ensemble_folds`¶

`save_validation_splits`¶

`fold_reps`¶

`max_num_classes_hard_limit`¶

`max_num_classes`¶

`max_num_classes_compute_roc`¶

`max_num_classes_client_and_gui`¶

`roc_reduce_type`¶

`min_roc_sample_size`¶

`max_rows_cm_ga`¶

`num_actuals_vs_predicted`¶

`use_feature_brain_new_experiments`¶

`feature_brain_level`¶

Model/Feature Brain Level (0..10) (Number)

Default value 2

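Whether to show (or use) results from the H2O.ai brain: the local caching and smart re-use of prior experiments in order to generate more useful features and models for new experiments. See use_feature_brain_new_experiments for how new experiments by default do not use the brain cache. This option can also be used to control checkpointing for experiments that have been paused or interrupted.

DAI uses the H2O.ai brain cache if the cache file: a) has any matching column names and types for a similar experiment type b) exactly matches the classes c) exactly matches the class labels d) matches the basic time series choices e) has equal or lower interpretability f) has a main model (booster) that is allowed by the new experiment.

Level of brain to use (for the chosen level, higher levels also perform all lower-level operations automatically):

-1 = Do not use any brain cache and do not write any cache
0 = Do not use any brain cache, but still write the cache: Use case: you want to save the model for later use, but want the current model to be built without any brain models
1 = Smart checkpoint from the latest best individual model: Use case: you want to use the latest matching model, but the match can be loose, so caution is needed
2 = Smart checkpoint from the H2O.ai brain cache of individual best models: Use case: DAI scans the entire H2O.ai brain cache for the best models to restart from
3 = Smart checkpoint like level 1, but for the entire population. Tunes only if the brain population is of insufficient size: (re-scores the entire population in a single iteration, so the first iteration appears to take longer to complete)
4 = Smart checkpoint like level 2, but for the entire population. Tunes only if the brain population is of insufficient size: (re-scores the entire population in a single iteration, so the first iteration appears to take longer to complete)
5 = Like level 4, but scans the entire brain cache of populations to get the best-scored individuals: (can be slower because of brain cache scanning if the cache is big)
1000 + feature_brain_level (positive values above) = use resumed_experiment_id and the actual feature_brain_level: to use another specific experiment as the base for individuals or the population, instead of sampling from any old experiments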

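The GUI has 3 options and corresponding settings: 1) New Experiment: uses the default feature brain level of 2 2) New Experiment With Same Settings: re-uses the same feature brain level as the parent experiment 3) Restart From Last Checkpoint: resets the feature brain level to 1003 and sets the experiment ID to resume from (continues the genetic algorithm iterations)

Retrain Final Pipeline: like Restart, but with time=0, so all tuning is skipped and the experiment goes straight to the final model (assumes the parent experiment had at least one tuning iteration)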

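Other use cases: a) Restart on different data: use the same column names and fewer or more rows (applicable to levels 1 - 5) b) Re-fit only the final pipeline: like (a), but choose time=1 and feature_brain_level=3 - 5 c) Restart with more columns: add columns, so that the model builds upon the old model built from the old column names (levels 1 - 5) d) Restart with a focus on model tuning: restart, then select feature_engineering_effort = 3 in the expert settings e) Retrain the final model but ignore any original features except those in the final pipeline (normal retrain, but set brain_add_features_for_new_columns=false)

Notes: 1) In all cases, the resumed experiment ID (if given) is checked first, and then the brain cache 2) For restart cases, you may want to set min_dai_iterations to a non-zero value to force delayed early stopping; otherwise there may not be enough iterations to find a better model 3) A "New Experiment With Same Settings" of a Restart uses feature_brain_level=1003 for the default Restart mode (revert to 2, or even to 0 if you want to start a fresh experiment otherwise)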
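As a reading aid, the level encoding described above can be decoded with a small helper. This is a minimal sketch based only on the descriptions on this page: `BrainPlan` and `decode_feature_brain_level` are hypothetical names, not part of Driverless AI, and the interpretation (levels 1 and above read the cache, levels 0 and above write it, values of 1000 or more additionally use resumed_experiment_id) is an assumption drawn from the list of levels.

```python
from typing import NamedTuple


class BrainPlan(NamedTuple):
    """Hypothetical summary of what a feature_brain_level value implies."""
    read_cache: bool               # brain cache is consulted for checkpoints
    write_cache: bool              # results are written to the brain cache
    base_level: int                # level after removing the 1000 offset
    use_resumed_experiment: bool   # resumed_experiment_id serves as the base


def decode_feature_brain_level(level: int) -> BrainPlan:
    """Decode a feature_brain_level value as described in the text above."""
    use_resumed = level >= 1000
    base = level - 1000 if use_resumed else level
    if base < -1:
        raise ValueError(f"unsupported feature_brain_level: {level}")
    if use_resumed and base < 1:
        # The 1000 offset is documented only for positive base levels.
        raise ValueError(f"1000 offset needs a positive base level: {level}")
    return BrainPlan(
        read_cache=base >= 1,
        write_cache=base >= 0,
        base_level=base,
        use_resumed_experiment=use_resumed,
    )


# "Restart From Last Checkpoint" sets the level to 1003.
restart_plan = decode_feature_brain_level(1003)
# Level 0 writes the cache for later use but builds without brain models.
save_only_plan = decode_feature_brain_level(0)
```

Here `restart_plan` describes a population-level checkpoint (base level 3) taken from the resumed experiment, and `save_only_plan` describes a run that writes to the cache without reading from it.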

`feature_brain_reset_score`¶

`enable_strict_confict_key_check_for_brain`¶

`allow_change_layer_count_brain`¶

`brain_maximum_diff_score`¶

`max_num_brain_indivs`¶

`feature_brain_save_every_iteration`¶

`which_iteration_brain`¶

`refit_same_best_individual`¶

`restart_refit_redo_origfs_shift_leak`¶

`brain_rel_dir`¶

`brain_max_size_GB`¶

`brain_add_features_for_new_columns`¶

`force_model_restart_to_defaults`¶

`early_stopping`¶

`early_stopping_per_individual`¶

`min_dai_iterations`¶

`tensorflow_nlp_have_gpus_in_production`¶

`bert_migration_timeout_secs`¶

`enable_bert_transformer_acceptance_test`¶

`enable_bert_model_acceptance_test`¶

`string_col_as_text_min_relative_cardinality`¶

`string_col_as_text_min_absolute_cardinality`¶

`supported_image_types`¶

`image_paths_absolute`¶

`text_dl_token_pad_percentile`¶

`text_dl_token_pad_max`¶

`tune_parameters_accuracy_switch`¶

`tune_target_transform_accuracy_switch`¶

`target_transformer`¶

`target_transformer_tuning_choices`¶

`tournament_style`¶

`tournament_uniform_style_interpretability_switch`¶

`tournament_uniform_style_accuracy_switch`¶

`tournament_model_style_accuracy_switch`¶

`tournament_feature_style_accuracy_switch`¶

`tournament_fullstack_style_accuracy_switch`¶

`tournament_use_feature_penalized_score`¶

`num_individuals`¶

`fixed_fold_reps`¶

`sanitize_natural_sort_limit`¶

`excluded_transformers`¶

`excluded_genes`¶

`excluded_models`¶

`excluded_pretransformers`¶

`include_all_as_pretransformers_if_none_selected`¶

`force_include_all_as_pretransformers_if_none_selected`¶

`excluded_datas`¶

`excluded_individuals`¶

`excluded_scorers`¶

`enable_glm_rapids`¶

`use_dask_for_1_gpu`¶

`dask_retrials_allreduce_empty_issue`¶

`optuna_pruner_kwargs`¶

`optuna_sampler_kwargs`¶

`use_xgboost_xgbfi`¶

`drop_constant_model_final_ensemble`¶

`xgboost_rf_exact_threshold_num_rows_x_cols`¶

`lossguide_drop_factor`¶

`lossguide_max_depth_extend_factor`¶

`params_lightgbm`¶

`params_xgboost`¶

`params_dart`¶

`params_gblinear`¶

`params_decision_tree`¶

`params_rulefit`¶

`params_ftrl`¶

`params_grownet`¶

`params_tune_lightgbm`¶

`params_tune_xgboost`¶

`params_tune_dart`¶

`params_tune_tensorflow`¶

`params_tune_gblinear`¶

`params_tune_rulefit`¶

`params_tune_ftrl`¶

`params_tune_grownet`¶

`params_tune_grow_policy_simple_trees`¶

`default_max_bin`¶

`default_lightgbm_max_bin`¶

`min_max_bin`¶

`scale_mem_for_max_bin`¶

`factor_rf`¶

`tensorflow_use_all_cores`¶

`tensorflow_use_all_cores_even_if_reproducible_true`¶

`tensorflow_disable_memory_optimization`¶

`tensorflow_cores`¶

`validate_meta_learner`¶

`validate_meta_learner_extra`¶

`fixed_num_folds_evolution`¶

`fixed_num_folds`¶

`fixed_only_first_fold_model`¶

`num_fold_ids_show`¶

`fold_scores_instability_warning_threshold`¶

`feature_evolution_data_size`¶

`final_pipeline_data_size`¶

`max_validation_to_training_size_ratio_for_final_ensemble`¶

`force_stratified_splits_for_imbalanced_threshold_binary`¶

`stratify_for_regression`¶

`imbalance_ratio_multiclass_threshold`¶

`heavy_imbalance_ratio_multiclass_threshold`¶

`imbalance_sampling_rank_averaging`¶

`imbalance_ratio_notification_threshold`¶

`nbins_ftrl_list`¶

`te_bin_list`¶

`woe_bin_list`¶

`ohe_bin_list`¶

`cols_to_drop_sanitized`¶

`cols_to_group_by_sanitized`¶

`default_knob_offset_accuracy`¶

`default_knob_offset_time`¶

`default_knob_offset_interpretability`¶

`shift_check_text`¶

`use_rf_for_shift_if_have_lgbm`¶

`shift_key_features_varimp`¶

`shift_check_reduced_features`¶

`shift_trees`¶

`shift_max_bin`¶

`shift_min_max_depth`¶

`shift_max_max_depth`¶

`detect_features_distribution_shift_threshold_auc`¶

`drop_features_distribution_shift_min_features`¶

`shift_high_notification_level`¶

`leakage_check_text`¶

`leakage_key_features_varimp`¶

`leakage_key_features_varimp_if_no_early_stopping`¶

`leakage_check_reduced_features`¶

`use_rf_for_leakage_if_have_lgbm`¶

`leakage_trees`¶

`leakage_max_bin`¶

`leakage_min_max_depth`¶

`leakage_max_max_depth`¶

`drop_features_leakage_min_features`¶

`leakage_train_test_split`¶

`check_system`¶

`abs_tol_for_perfect_score`¶

`data_ingest_timeout`¶

`gpu_locking_trust_pool_submission`¶

`gpu_locking_free_dead`¶

`tensorflow_allow_cpu_only`¶

`check_pred_contribs_sum`¶

`debug_daimodel_level`¶

`debug_debug_xgboost_splits`¶

`log_predict_info`¶

`log_fit_info`¶

`stalled_time_kill_ref`¶

`num_cpu_sockets_override`¶

`num_gpus_override`¶

`show_gpu_usage_only_if_locked`¶

`show_inapplicable_models_preview`¶

`show_inapplicable_transformers_preview`¶

`show_warnings_preview`¶

`show_warnings_preview_unused_map_features`¶

`max_cols_show_unused_features`¶

`max_cols_show_feature_transformer_mapping`¶

`warning_unused_feature_show_max`¶

`interaction_finder_max_rows_x_cols`¶

`interaction_finder_corr_threshold`¶

`min_bootstrap_samples`¶

`max_bootstrap_samples`¶

`min_bootstrap_sample_size_factor`¶

`max_bootstrap_sample_size_factor`¶

`bootstrap_final_seed`¶

`benford_mad_threshold_int`¶

`benford_mad_threshold_real`¶

`stabilize_features`¶

`fraction_std_bootstrap_ladder_factor`¶

`bootstrap_ladder_samples_limit`¶

`rdelta_percent_score_penalty_per_feature_by_interpretability`¶

`drop_low_meta_weights`¶

`meta_weight_allowed_by_interpretability`¶

`meta_weight_allowed_for_reference`¶

`show_full_pipeline_details`¶

`num_transformed_features_per_pipeline_show`¶

`fs_data_vary_for_interpretability`¶

`fs_data_frac`¶

`many_columns_count`¶

`columns_count_interpretable`¶

`round_up_indivs_for_busy_gpus`¶

`check_timeout_per_gpu`¶

`gpu_exit_if_fails`¶

`require_graphviz`¶

`fast_approx_max_num_trees_ever`¶

`fast_approx_num_trees`¶

`fast_approx_do_one_fold`¶

`fast_approx_do_one_model`¶

`fast_approx_contribs_num_trees`¶

`fast_approx_contribs_do_one_fold`¶

`fast_approx_contribs_do_one_model`¶

`use_187_prob_logic`¶

`enable_ohe_linear`¶

`max_absolute_feature_expansion`¶

`booster_for_fs_permute`¶

`model_class_name_for_fs_permute`¶

`switch_from_tree_to_lgbm_if_can`¶

`textlin_num_classes_switch`¶

`text_gene_dim_reduction_choices`¶

`text_gene_max_ngram`¶

`number_of_texts_to_cache_in_bert_transformer`¶

`gbm_early_stopping_rounds_min`¶

`gbm_early_stopping_rounds_max`¶

`max_varimp_to_save`¶

`max_num_varimp_to_log`¶

`max_num_varimp_shift_to_log`¶

`can_skip_final_upper_layer_failures`¶

`config_overrides`¶

`dump_modelparams_every_scored_indiv_feature_count`¶

`dump_modelparams_every_scored_indiv_mutation_count`¶

`dump_modelparams_separate_files`¶

`delete_preview_trans_timings`¶

`use_random_text_file`¶

`runtime_estimation_train_frame`¶

`enable_bad_scorer`¶

`debug_col_dict_prefix`¶

`return_early_debug_col_dict_prefix`¶

`return_early_debug_preview`¶

`autoviz_enable_recommendations`¶

`autoviz_recommended_transformation`¶

`last_recipe`¶

`make_mojo_scoring_pipeline_for_features_only`¶

`mojo_replace_target_encoding_with_grouped_input_cols`¶

`time_series_causal_split_recipe`¶

`use_lags_if_causal_recipe`¶

`min_ymd_timestamp`¶

`max_ymd_timestamp`¶

`max_rows_datetime_format_detection`¶

`disallowed_datetime_formats`¶

`use_datetime_cache`¶

`datetime_cache_min_rows`¶

`holiday_country`¶

`max_time_series_properties_sample_size`¶

`max_lag_sizes`¶

`min_lag_autocorrelation`¶

`max_signal_lag_sizes`¶

`single_model_vs_cv_score_reldiff`¶

`single_model_vs_cv_score_reldiff2`¶

`blend_in_link_space`¶

`tgc_via_ui_max_ncols`¶

`tgc_dup_tolerance`¶
