MLI configuration
h2o_mli_nthreads
h2o_mli_nthreads (Number)
Default value 8
Number of threads for H2O instance for use by MLI.
mli_sample_above_for_scoring
mli_sample_above_for_scoring (Number)
Default value 1000000
When the number of rows is above this limit, MLI samples the data used for scoring UI data.
mli_sample_above_for_training
mli_sample_above_for_training (Number)
Default value 100000
When the number of rows is above this limit, MLI samples the data used for training surrogate models.
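The row-limit behavior of mli_sample_above_for_scoring and mli_sample_above_for_training can be pictured with a minimal sketch. The helper name maybe_sample, the use of uniform random sampling, and sampling down to exactly the limit are assumptions made for illustration; the reference above does not say how the sample is drawn or how large it is.

```python
import random


def maybe_sample(rows, limit, seed=0):
    """Return all rows when there are at most `limit` of them, else a random sample.

    Hypothetical helper: uniform sampling down to `limit` rows is an
    assumption, not documented Driverless AI behavior.
    """
    rows = list(rows)
    if len(rows) <= limit:
        return rows
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(rows, limit)


# Illustrative limit, far smaller than the defaults listed above.
small = maybe_sample(range(3), limit=4)   # below the limit: returned unchanged
large = maybe_sample(range(10), limit=4)  # above the limit: sampled
```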
mli_interpreter_status_cache_size
mli_interpreter_status_cache_size (Number)
Default value 1000
Maximum number of interpreter status cache entries.
mli_sample_training
mli_sample_training (Boolean)
Default value True
Sample not only the training data but also the scoring data.
mli_strict_version_check
mli_strict_version_check (Boolean)
Default value True
Strict version check for MLI
mli_cloud_name
mli_cloud_name (String)
Default value 'H2O-MLI-DAI'
MLI cloud name
mli_ice_per_bin_strategy
mli_ice_per_bin_strategy (Boolean)
Default value False
Compute the original model's ICE using per-feature bin predictions (true) or using the “one frame” strategy (false).
mli_dia_default_max_cardinality
mli_dia_default_max_cardinality (Number)
Default value 10
By default, DIA runs for categorical columns with cardinality <= mli_dia_default_max_cardinality.
mli_dia_default_min_cardinality
mli_dia_default_min_cardinality (Number)
Default value 2
By default, DIA runs for categorical columns with cardinality >= mli_dia_default_min_cardinality.
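Taken together, mli_dia_default_min_cardinality and mli_dia_default_max_cardinality describe an inclusive cardinality range. The sketch below shows that range check; the function name dia_default_columns and the column names are hypothetical, and the real column selection may involve further criteria that this reference does not describe.

```python
def dia_default_columns(cardinalities, min_cardinality=2, max_cardinality=10):
    """Return the categorical columns whose cardinality lies in [min, max].

    `cardinalities` maps a column name to its number of unique values.
    Hypothetical helper that only illustrates the two limits above.
    """
    return [
        name
        for name, cardinality in cardinalities.items()
        if min_cardinality <= cardinality <= max_cardinality
    ]


# Made-up columns: a constant column and a high-cardinality column are skipped.
selected = dia_default_columns(
    {"gender": 2, "zip_code": 900, "constant": 1, "state": 10}
)
```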
enable_mli_keeper
enable_mli_keeper (Boolean)
Default value True
Enable the MLI keeper, which ensures efficient use of filesystem/memory/DB by MLI.
enable_mli_sa
enable_mli_sa (Boolean)
Default value True
Enable MLI Sensitivity Analysis
enable_mli_priority_queues
enable_mli_priority_queues (Boolean)
Default value True
Enable priority-queue-based execution of explainers. Priority queues restrict the available system resources and prevent system over-utilization. Interpretations might run (significantly) slower.
mli_sequential_task_execution
mli_sequential_task_execution (Boolean)
Default value True
Explainers run sequentially by default. Disabling this option runs all explainers in parallel, which can decrease interpretation duration depending on hardware strength and the number of explainers. When running in parallel, consider explainer dependencies, the random order in which explainers run, and hardware over-utilization.
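The trade-off between sequential and parallel execution can be sketched with the Python standard library. This is only an illustration: run_explainers and the toy explainers are hypothetical, and the use of a thread pool is an assumption, since this reference does not describe how Driverless AI schedules explainers.

```python
from concurrent.futures import ThreadPoolExecutor


def run_explainers(explainers, sequential=True):
    """Run zero-argument callables one by one, or concurrently.

    Hypothetical sketch: results are returned in submission order either
    way, but with sequential=False the order in which the callables
    actually execute is not guaranteed.
    """
    if sequential:
        return [explainer() for explainer in explainers]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(explainer) for explainer in explainers]
        return [future.result() for future in futures]


# Toy stand-ins for real explainers.
toy_explainers = [lambda: "pd", lambda: "sa", lambda: "dia"]
sequential_results = run_explainers(toy_explainers, sequential=True)
parallel_results = run_explainers(toy_explainers, sequential=False)
```

Because parallel callables may start in any order, an explainer that depends on the output of another one cannot simply be submitted alongside it, which is the dependency concern raised above.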
mli_dia_sample_size
Sample size for Disparate Impact Analysis (Number)
Default value 100000
When the number of rows is above this limit, sample the data for Disparate Impact Analysis.
mli_pd_sample_size
Sample size for Partial Dependence Plot (Number)
Default value 25000
When the number of rows is above this limit, sample the data for the Partial Dependence Plot.
mli_pd_numcat_num_chart
Unique feature values count driven Partial Dependence Plot binning and chart selection (Boolean)
Default value True
Dynamically switch between numeric and categorical Partial Dependence Plot binning and UI chart selection for features that the experiment used as both numeric and categorical.
mli_pd_numcat_threshold
Threshold for Partial Dependence Plot binning and chart selection (<=threshold categorical, >threshold numeric) (Number)
Default value 11
If ‘mli_pd_numcat_num_chart’ is enabled, use numeric binning and a numeric chart when the feature's unique value count is greater than the threshold; otherwise use categorical binning and a categorical chart.
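The threshold rule for mli_pd_numcat_threshold reduces to a single comparison, sketched below. The function name pd_binning is hypothetical, and returning None when mli_pd_numcat_num_chart is disabled is a placeholder, because this reference does not say what happens in that case.

```python
def pd_binning(unique_values, threshold=11, numcat_num_chart=True):
    """Pick the binning and chart type for a feature from its unique value count.

    Mirrors the documented rule: <= threshold is categorical, > threshold
    is numeric. Hypothetical helper for illustration only.
    """
    if not numcat_num_chart:
        return None  # dynamic switching disabled; not covered by this sketch
    return "numeric" if unique_values > threshold else "categorical"


at_threshold = pd_binning(11)     # exactly at the default threshold
above_threshold = pd_binning(12)  # one unique value over the threshold
```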
new_mli_list_only_explainable_datasets
new_mli_list_only_explainable_datasets (Boolean)
Default value False
In the New Interpretation screen, show only datasets that can be used to explain the selected model. This can slow down the server significantly.
enable_mli_async_api
enable_mli_async_api (Boolean)
Default value True
Enable async/await-based non-blocking MLI API
enable_mli_sa_main_chart_aggregator
enable_mli_sa_main_chart_aggregator (Boolean)
Default value True
Enable main chart aggregator in Sensitivity Analysis
mli_sa_sampling_limit
Sample size for SA (Number)
Default value 500000
When to sample for Sensitivity Analysis (number of rows after sampling).
mli_sa_main_chart_aggregator_limit
mli_sa_main_chart_aggregator_limit (Number)
Default value 1000
Run the main chart aggregator in Sensitivity Analysis when the number of dataset instances is greater than this limit.
mli_predict_safe
mli_predict_safe (Boolean)
Default value False
Use predict_safe() (true) or predict_base() (false) in MLI (PD, ICE, SA, …).
mli_max_surrogate_retries
mli_max_surrogate_retries (Number)
Default value 5
Maximum number of retries if the surrogate model fails to build.
enable_mli_symlinks
enable_mli_symlinks (Boolean)
Default value True
Allow use of symlinks (instead of file copy) by MLI explainer procedures.
h2o_mli_fraction_memory
h2o_mli_fraction_memory (Float)
Default value 0.45
Fraction of memory to allocate for the H2O MLI jar.
excluded_mli_explainers
Exclude specific explainers by explainer ID (List)
Default value []
To exclude, for example, the Sensitivity Analysis explainer, use: excluded_mli_explainers=['h2oaicore.mli.byor.recipes.sa_explainer.SaExplainer'].
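Conceptually, the exclusion list filters the set of explainers by ID. The sketch below shows that filtering. The function active_explainers and the second explainer ID are hypothetical; only the Sensitivity Analysis ID comes from the entry above.

```python
def active_explainers(available_ids, excluded_ids):
    """Return the explainer IDs that remain after removing the excluded ones.

    Hypothetical helper: illustrates exclusion by explainer ID only.
    """
    excluded = set(excluded_ids)
    return [explainer_id for explainer_id in available_ids
            if explainer_id not in excluded]


SA_ID = "h2oaicore.mli.byor.recipes.sa_explainer.SaExplainer"
OTHER_ID = "example.recipes.other_explainer.OtherExplainer"  # made-up ID

remaining = active_explainers([SA_ID, OTHER_ID], excluded_ids=[SA_ID])
```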
enable_ws_perfmon
enable_ws_perfmon (Boolean)
Default value False
Enable RPC API performance monitor.
mli_kernel_explainer_workers
mli_kernel_explainer_workers (Number)
Default value 4
Number of parallel workers when scoring using MOJO in Kernel Explainer.
mli_run_kernel_explainer
Use Kernel Explainer to obtain Shapley values for original features (Boolean)
Default value False
Use Kernel Explainer to obtain Shapley values for original features.
mli_kernel_explainer_sample
Sample input dataset for Kernel Explainer (Boolean)
Default value True
Sample input dataset for Kernel Explainer.
mli_kernel_explainer_sample_size
Sample size for input dataset passed to Kernel Explainer (Number)
Default value 1000
Sample size for input dataset passed to Kernel Explainer.
mli_kernel_explainer_nsamples
Number of times to re-evaluate the model when explaining each prediction with Kernel Explainer. Default is determined internally (String)
Default value 'auto'
‘auto’ or int. Number of times to re-evaluate the model when explaining each prediction. More samples lead to lower variance estimates of the SHAP values. The ‘auto’ setting uses nsamples = 2 * X.shape[1] + 2048. This setting is disabled by default and DAI determines the right number internally.
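The ‘auto’ formula quoted above depends only on the number of feature columns, since X.shape[1] is the column count of the input matrix X. A minimal sketch of resolving the setting follows; the function name resolve_nsamples is hypothetical, and it applies the quoted formula literally rather than whatever number Driverless AI determines internally.

```python
def resolve_nsamples(setting, num_features):
    """Resolve 'auto' or an integer into a number of model re-evaluations.

    For 'auto' this applies nsamples = 2 * X.shape[1] + 2048, where
    X.shape[1] is the number of feature columns. Hypothetical helper.
    """
    if setting == "auto":
        return 2 * num_features + 2048
    return int(setting)


auto_for_10_features = resolve_nsamples("auto", num_features=10)
explicit = resolve_nsamples(500, num_features=10)
```

With 10 features, the formula gives 2 * 10 + 2048 = 2068 re-evaluations per explained prediction.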
mli_kernel_explainer_l1_reg
L1 regularization for Kernel Explainer (String)
Default value 'aic'
‘num_features(int)’, ‘auto’ (default for now, but deprecated), ‘aic’, ‘bic’, or float. The L1 regularization to use for feature selection (the estimation procedure is based on a debiased lasso). The ‘auto’ option currently uses AIC when less than 20% of the possible sample space is enumerated; otherwise it uses no regularization. THE BEHAVIOR OF ‘auto’ WILL CHANGE in a future version to be based on ‘num_features’ instead of AIC. The ‘aic’ and ‘bic’ options use the AIC and BIC rules for regularization. Using ‘num_features(int)’ selects a fixed number of top features. Passing a float directly sets the alpha parameter of the sklearn.linear_model.Lasso model used for feature selection.
mli_kernel_explainer_max_runtime
Max runtime for Kernel Explainer in seconds (Number)
Default value 900
Max runtime for Kernel Explainer in seconds. The default is 900 seconds (15 minutes). Setting this parameter to -1 honors the provided Kernel Shapley sample size regardless of max runtime.
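The role of -1 as a "no time limit" value can be sketched as a small check. The function name runtime_exceeded is hypothetical, and treating a run as over budget once the elapsed time reaches the limit is an assumption; the reference does not describe how Kernel Explainer enforces the limit.

```python
def runtime_exceeded(elapsed_seconds, max_runtime=900):
    """Return True when the elapsed time has used up the runtime budget.

    A max_runtime of -1 disables the limit, so the configured sample size
    is honored however long the run takes. Hypothetical helper.
    """
    if max_runtime == -1:
        return False
    return elapsed_seconds >= max_runtime


within_budget = runtime_exceeded(600)               # 10 minutes of a 15-minute budget
over_budget = runtime_exceeded(1200)                # 20 minutes of a 15-minute budget
unlimited = runtime_exceeded(1200, max_runtime=-1)  # limit disabled
```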
mli_nlp_tokenizer
mli_nlp_tokenizer (String)
Default value 'tfidf'
Tokenizer used to extract tokens from text columns for MLI.
mli_image_enable
mli_image_enable (Boolean)
Default value True
Enable MLI for image experiments.