
Interpretation Expert Settings

The following is a list of the Interpretation expert settings that are available when setting up a new interpretation from the MLI page. The name of each setting is preceded by its config.toml label.

MLI Tab
AutoDoc Tab

MLI Tab

`mli_lime_method`

LIME Method

Select a LIME method of either K-LIME (default) or LIME-SUP.

K-LIME (default): creates one global surrogate GLM on the entire training data and also creates numerous local surrogate GLMs on samples formed from k-means clusters in the training data. The features used for k-means are selected from the Random Forest surrogate model’s variable importance. The number of features used for k-means is the minimum of the top 25% of variables from the Random Forest surrogate model’s variable importance and the maximum number of variables that can be used for k-means, which the user sets with the config.toml setting mli_max_number_cluster_vars. (If the dataset has 6 or fewer features, all features are used for k-means clustering.) To use all features for k-means instead, set use_all_columns_klime_kmeans to true in the config.toml file. All penalized GLM surrogates are trained to model the predictions of the Driverless AI model. The number of clusters for local explanations is chosen by a grid search that maximizes the \(R^2\) between the Driverless AI model predictions and all of the local K-LIME model predictions. The intercepts, coefficients, \(R^2\) values, accuracy, and predictions of the global and local linear models can all be used to debug and develop explanations for the Driverless AI model’s behavior.
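The k-means feature-count rule can be illustrated with a small Python function. This is a sketch and not Driverless AI code. The function name `klime_kmeans_features` is hypothetical, and its arguments only stand in for mli_max_number_cluster_vars and use_all_columns_klime_kmeans. Rounding the top 25% up to a whole number of features is an assumption.

```python
import math


def klime_kmeans_features(importance, max_cluster_vars, use_all_columns=False):
    """Pick the features used for K-LIME k-means clustering (illustrative sketch).

    importance: dict mapping feature name -> Random Forest surrogate importance.
    max_cluster_vars: stands in for the mli_max_number_cluster_vars setting.
    use_all_columns: stands in for use_all_columns_klime_kmeans.
    """
    # Rank features from most to least important.
    ranked = sorted(importance, key=importance.get, reverse=True)
    # Small datasets (6 features or fewer) and the override use every feature.
    if use_all_columns or len(ranked) <= 6:
        return ranked
    # Assumption: "top 25%" is rounded up to a whole number of features.
    top_quarter = math.ceil(0.25 * len(ranked))
    return ranked[:min(top_quarter, max_cluster_vars)]
```

For example, with 20 ranked features the top 25% is 5 features. A cluster-variable limit of 3 therefore keeps only the 3 most important features.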

LIME-SUP: explains local regions of the trained Driverless AI model in terms of the original variables. Local regions are defined by each leaf node path of the decision tree surrogate model, rather than by simulated, perturbed observation samples as in the original LIME. For each local region, a local GLM is trained on the original inputs and the predictions of the Driverless AI model. The parameters of this local GLM can then be used to generate approximate, local explanations of the Driverless AI model.
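The core LIME-SUP step fits one local model per surrogate-tree leaf, and it can be sketched as follows. The sketch makes three simplifications. It assumes the leaf assignment of each row is already known, it uses a single input feature, and it fits ordinary least squares rather than a penalized GLM. The name `fit_local_linear_models` is hypothetical.

```python
from collections import defaultdict


def fit_local_linear_models(x, leaf_ids, predictions):
    """Fit one single-feature least-squares line per surrogate-tree leaf.

    x: values of one original input feature.
    leaf_ids: leaf of the decision tree surrogate that each row falls into.
    predictions: the complex model's predictions for the same rows.
    Returns {leaf_id: (intercept, slope)}.
    """
    # Group rows by the leaf (local region) they belong to.
    groups = defaultdict(list)
    for xi, leaf, yi in zip(x, leaf_ids, predictions):
        groups[leaf].append((xi, yi))

    models = {}
    for leaf, rows in groups.items():
        n = len(rows)
        mean_x = sum(r[0] for r in rows) / n
        mean_y = sum(r[1] for r in rows) / n
        sxx = sum((r[0] - mean_x) ** 2 for r in rows)
        sxy = sum((r[0] - mean_x) * (r[1] - mean_y) for r in rows)
        # A leaf whose rows all share one x value gets a flat (slope 0) model.
        slope = sxy / sxx if sxx else 0.0
        models[leaf] = (mean_y - slope * mean_x, slope)
    return models
```

Each leaf's slope is then read as the local effect of the feature inside that region of the surrogate tree.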

`mli_use_raw_features`

Use Original Features for Surrogate Models

Specify whether to use original features or transformed features in the surrogate model for the new interpretation. This is enabled by default.

Note: When this setting is disabled, the K-LIME clustering column and quantile binning options are unavailable.

`mli_sample`

Sample All Explainers

Specify whether to perform the interpretation on a sample of the training data. By default, MLI samples the training dataset if it has more than 100,000 rows. (The equivalent config.toml setting is mli_sample_size.) This is enabled by default. Turn this toggle off to run MLI on the entire dataset.
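The row-limit sampling used by this setting, and by the other sample-size settings on this page, can be sketched as follows. The helper `maybe_sample` is hypothetical. It assumes uniform random sampling without replacement and a fixed seed, and the scheme Driverless AI actually uses may differ (for example, it may stratify). The 100000 default mirrors mli_sample_size.

```python
import random


def maybe_sample(rows, limit=100_000, seed=1234):
    """Return rows unchanged when within the limit, else a random sample of `limit` rows.

    Assumption: uniform sampling without replacement; the seed makes the
    sample reproducible between runs.
    """
    if len(rows) <= limit:
        return list(rows)
    rng = random.Random(seed)
    return rng.sample(rows, limit)
```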

`mli_dt_tree_depth`

Tree Depth for Decision Tree Surrogate Model

For K-LIME interpretations, specify the depth of the decision tree surrogate model. The tree depth can be a value from 2 to 5 and defaults to 3. For LIME-SUP interpretations, specify the LIME-SUP tree depth, which can also be a value from 2 to 5 and defaults to 3.

`mli_vars_to_pdp`

Number of Features for Partial Dependence Plot

Specify the maximum number of features to use when building the Partial Dependence Plot. Use -1 to calculate the Partial Dependence Plot for all features. By default, this value is set to 10.

`mli_nfolds`

Cross-validation Folds for Surrogate Models

Specify the number of surrogate cross-validation folds to use (from 0 to 10). When running experiments, Driverless AI automatically splits the training data and uses the validation data to determine the performance of the model parameter tuning and feature engineering steps. For a new interpretation, Driverless AI uses 3 cross-validation folds by default.

`mli_qbin_count`

Number of Columns to Bin

Specify the number of columns to bin. This value defaults to 0.

`mli_custom`

Add to config.toml via TOML String

Use this input field to add settings to the Driverless AI server config.toml configuration file as a TOML string.

`mli_enable_mojo_scorer`

Allow Use of MOJO Scoring Pipeline

Use this option to disable the MOJO scoring pipeline. By default, the scoring pipeline is chosen automatically from the MOJO and Python pipelines. For certain models, the choice between MOJO and Python can affect pipeline performance and robustness.

`mli_sample_size`

Sample Size for Surrogate Models

When the number of rows is above this limit, the data is sampled for the surrogate models. The default value is 100000.

`mli_shapley_sample_size`

Sample Size for Shapley (Original & Transformed)

When the number of rows is above this limit, the data is sampled for the MLI Shapley calculation. The default value is 100000.

`mli_sequential_task_execution`

Enable Sequential Explainers Execution (Parallel Execution When Disabled)

Specify whether to enable sequential explainers execution. This setting is enabled by default. When this setting is disabled, parallel execution is used.

`mli_dia_sample_size`

Sample Size for Disparate Impact Analysis

When the number of rows is above this limit, the data is sampled for Disparate Impact Analysis (DIA). The default value is 100000.

`mli_pd_sample_size`

Sample Size for Partial Dependence Plot

When the number of rows is above this limit, the data is sampled for the Driverless AI partial dependence plot. The default value is 25000.

`mli_pd_numcat_num_chart`

Unique Feature Values Count Driven Partial Dependence Plot Binning and Chart Selection

Specify whether to dynamically switch between numeric and categorical PDP binning and UI chart selection when the experiment used a feature as both numeric and categorical. This is enabled by default.

`mli_pd_numcat_threshold`

Threshold for PD/ICE Binning and Chart Selection

If mli_pd_numcat_num_chart is enabled and the number of unique feature values is greater than the threshold, numeric binning and a numeric chart are used. Otherwise, categorical binning and a categorical chart are used. The default threshold value is 11.
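The threshold rule reduces to a single comparison. This sketch assumes mli_pd_numcat_num_chart is enabled, counts distinct values in a plain Python list, and uses the hypothetical name `pd_binning_mode`.

```python
def pd_binning_mode(values, threshold=11):
    """Apply the mli_pd_numcat_threshold rule to one feature's values (sketch).

    Assumes mli_pd_numcat_num_chart is enabled; the behavior when it is
    disabled is not modeled here.
    """
    n_unique = len(set(values))
    # Strictly greater than the threshold -> numeric binning and chart.
    return "numeric" if n_unique > threshold else "categorical"
```

With the default threshold of 11, a feature with 11 distinct values gets categorical binning and a feature with 12 gets numeric binning, because the comparison is strictly greater than.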

`mli_sa_sampling_limit`

Sample Size for Sensitivity Analysis (SA)

When the number of rows is above this limit, the data is sampled for Sensitivity Analysis (SA). The default value is 500000.

`mli_nlp_sample_limit`

Sample Size for NLP Surrogate Models

Specify the maximum number of records on which to perform MLI NLP. The default value is 10000.

`klime_cluster_col`

k-LIME Clustering Columns

For k-LIME interpretations, optionally specify the columns to which k-LIME clustering is applied.

Note: This setting is not found in the config.toml file.

`qbin_cols`

Quantile Binning Columns

For k-LIME interpretations, specify one or more columns for which to generate decile bins (uniform distribution) to help with MLI accuracy. The selected columns are added to the top n columns chosen for quantile binning. If a column is not numeric or is not in the dataset (transformed features), the column is skipped.

Note: This setting is not found in the config.toml file.

AutoDoc Tab

`autodoc_report_name`

AutoDoc Name

Specify the name of the AutoDoc.

`autodoc_template`

AutoDoc Template Location

Specify the AutoDoc template path. Provide the full path to your custom AutoDoc template. To generate the standard AutoDoc, leave this field empty.

`autodoc_output_type`

AutoDoc File Output Type

Specify the AutoDoc file output type. Choose from docx (the default value) and md.

`autodoc_subtemplate_type`

AutoDoc Sub-Template Type

Specify the type of sub-templates to use. Choose from the following:

auto (Default)
md
docx

`autodoc_max_cm_size`

Confusion Matrix Max Number of Classes

Specify the maximum number of classes in the confusion matrix. This value defaults to 10.

`autodoc_num_features`

Number of Top Features to Document

Specify the number of top features to display in the document. To remove this limit, specify -1. This is set to 50 by default.

`autodoc_min_relative_importance`

Minimum Relative Feature Importance Threshold

Specify the minimum relative feature importance in order for a feature to be displayed. This value must be a float >= 0 and <= 1. This is set to 0.003 by default.
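Combined with autodoc_num_features, this threshold acts as a filter followed by a cutoff. The sketch below assumes that relative importance means each feature's importance divided by the largest importance. `features_to_document` is a hypothetical helper and not part of the AutoDoc API.

```python
def features_to_document(importance, num_features=50, min_relative_importance=0.003):
    """Select features for the document (illustrative sketch).

    importance: dict of feature -> importance score.
    Assumption: relative importance = score / largest score.
    """
    top = max(importance.values(), default=0.0)
    if top <= 0:
        return []
    ranked = sorted(importance, key=importance.get, reverse=True)
    # Drop features whose relative importance falls below the threshold.
    kept = [f for f in ranked if importance[f] / top >= min_relative_importance]
    # -1 removes the count limit; otherwise keep only the top num_features.
    return kept if num_features == -1 else kept[:num_features]
```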

`autodoc_include_permutation_feature_importance`

Permutation Feature Importance

Specify whether to compute permutation-based feature importance. This is disabled by default.

`autodoc_feature_importance_num_perm`

Number of Permutations for Feature Importance

Specify the number of permutations to make per feature when computing feature importance. This is set to 1 by default.
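Permutation feature importance shuffles one column at a time and measures how much the score degrades. The number of shuffles per feature corresponds to autodoc_feature_importance_num_perm. The following is a generic sketch of the technique and not the AutoDoc implementation. It assumes a higher-is-better scorer and a model exposed as a plain Python callable.

```python
import random


def permutation_importance(predict, X, y, score, n_repeats=1, seed=0):
    """Permutation feature importance (generic sketch).

    predict: callable mapping a list of rows to a list of predictions.
    X: list of rows (each a list); y: targets.
    score: callable(y_true, y_pred), higher is better.
    Returns the mean score drop for each feature index.
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - score(y, predict(X_perm))
        drops.append(total / n_repeats)
    return drops
```

A feature the model ignores shows a drop of zero, because shuffling it leaves the predictions unchanged.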

`autodoc_feature_importance_scorer`

Feature Importance Scorer

Specify the name of the scorer to be used when calculating feature importance. Leave this setting unspecified to use the default scorer for the experiment.

`autodoc_pd_max_rows`

PDP and Shapley Summary Plot Max Rows

Specify the number of rows shown for the partial dependence plots (PDP) and Shapley values summary plot in the AutoDoc. Random sampling is used for datasets with more than the autodoc_pd_max_rows limit. This value defaults to 10000.

`autodoc_pd_max_runtime`

PDP Max Runtime in Seconds

Specify the maximum number of seconds that partial dependence computation can take when generating a report. Set this value to -1 for no time limit.

`autodoc_out_of_range`

PDP Out of Range

Specify the number of standard deviations outside of the range of a column to include in partial dependence plots. This shows how the model reacts to data it has not seen before. This is set to 3 by default.
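One way to read this setting is that the PDP grid extends beyond the observed minimum and maximum by a multiple of the column's standard deviation. The sketch below assumes exactly that and uses the sample standard deviation with an evenly spaced grid. `pdp_grid` and its `n_points` argument are hypothetical.

```python
import statistics


def pdp_grid(values, out_of_range=3, n_points=10):
    """Evenly spaced PDP grid extended past the observed range (sketch).

    Assumption: the grid runs from min - k*sd to max + k*sd, where k is the
    autodoc_out_of_range value. Needs at least 2 values and n_points >= 2.
    """
    sd = statistics.stdev(values)
    lo = min(values) - out_of_range * sd
    hi = max(values) + out_of_range * sd
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]
```

For the column values 0, 2, and 4 (standard deviation 2) with the default of 3, the grid spans -6 to 10. The points below 0 and above 4 probe data the model has not seen.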

`autodoc_num_rows`

ICE Number of Rows

Specify the number of rows to include in PDP and ICE plots if individual rows are not specified. This is set to 0 by default.

`autodoc_population_stability_index`

Population Stability Index

Specify whether to include a population stability index if the experiment is a binary classification or regression problem. This is disabled by default.

`autodoc_population_stability_index_n_quantiles`

Population Stability Index Number of Quantiles

Specify the number of quantiles to use for the population stability index. This is set to 10 by default.
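The population stability index is commonly computed as \(\sum_i (a_i - e_i)\ln(a_i / e_i)\), where \(e_i\) and \(a_i\) are the shares of the expected and actual samples that fall in quantile bin \(i\). The sketch below implements that textbook definition, with bins cut at quantiles of the expected sample. How the AutoDoc chooses the reference sample and handles empty bins is an assumption here, and `population_stability_index` is a hypothetical name.

```python
import bisect
import math
import statistics


def population_stability_index(expected, actual, n_quantiles=10, eps=1e-6):
    """Population stability index between two samples (textbook sketch).

    Bins are cut at quantiles of `expected`. Empty bins are floored at eps
    so the logarithm stays finite. Every term is >= 0, so PSI >= 0.
    """
    # n_quantiles - 1 cut points define n_quantiles bins.
    cuts = statistics.quantiles(expected, n=n_quantiles)

    def shares(sample):
        counts = [0] * n_quantiles
        for v in sample:
            counts[bisect.bisect_right(cuts, v)] += 1
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Identical samples give a PSI of 0, and the value grows as the actual distribution drifts away from the expected one.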

`autodoc_prediction_stats`

Prediction Statistics

Specify whether to include prediction statistics information if the experiment is a binary classification or regression problem. This is disabled by default.

`autodoc_prediction_stats_n_quantiles`

Prediction Statistics Number of Quantiles

Specify the number of quantiles to use for prediction statistics. This is set to 20 by default.

`autodoc_response_rate`

Response Rates Plot

Specify whether to include response rates information if the experiment is a binary classification problem. This is disabled by default.

`autodoc_response_rate_n_quantiles`

Response Rates Plot Number of Quantiles

Specify the number of quantiles to use for response rates information. This is set to 10 by default.
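A response-rate breakdown by prediction quantile can be sketched as follows. The helper `response_rates` is hypothetical. It assumes 0/1 labels, ranks rows by predicted probability from highest to lowest, and splits them into groups of near-equal size. The AutoDoc's exact binning may differ.

```python
def response_rates(predictions, labels, n_quantiles=10):
    """Observed positive rate per prediction quantile bin (sketch).

    Rows are ranked by predicted probability, highest first, and split into
    n_quantiles groups of near-equal size. Labels are 0/1.
    """
    order = sorted(range(len(predictions)), key=lambda i: predictions[i], reverse=True)
    n = len(order)
    rates = []
    for q in range(n_quantiles):
        chunk = order[q * n // n_quantiles:(q + 1) * n // n_quantiles]
        # An empty group (fewer rows than quantiles) is reported as 0.0 here.
        rates.append(sum(labels[i] for i in chunk) / len(chunk) if chunk else 0.0)
    return rates
```

A well-ranked binary classifier shows response rates that fall steadily from the first group to the last.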

`autodoc_gini_plot`

Show GINI Plot

Specify whether to show the GINI plot. This is disabled by default.

`autodoc_enable_shapley_values`

Enable Shapley Values

Specify whether to show Shapley values results in the AutoDoc. This is enabled by default.

`autodoc_global_klime_num_features`

Global k-LIME Number of Features

Specify the number of features to show in a k-LIME global GLM coefficients table. This value must be -1 or an integer greater than 0. To show all features, set this value to -1.

`autodoc_global_klime_num_tables`

Global k-LIME Number of Tables

Specify the number of k-LIME global GLM coefficients tables to show in the AutoDoc. Set this value to 1 to show one table with coefficients sorted by absolute value. Set this value to 2 to show two tables: one with the top positive coefficients and another with the top negative coefficients. This value is set to 1 by default.
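The one-table and two-table layouts can be sketched in Python, together with the row limit from autodoc_global_klime_num_features. The helper `klime_coefficient_tables` is hypothetical. Leaving zero coefficients out of the two-table layout is an assumption.

```python
def klime_coefficient_tables(coefs, num_tables=1, num_features=-1):
    """Arrange global GLM coefficients into AutoDoc-style tables (sketch).

    coefs: dict of feature -> coefficient. num_features=-1 keeps all rows.
    Returns a list of tables, each a list of (feature, coefficient) pairs.
    """
    limit = None if num_features == -1 else num_features
    if num_tables == 1:
        # One table, sorted by absolute coefficient value.
        ranked = sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)
        return [ranked[:limit]]
    # Two tables: largest positive first, most negative first.
    # Zero coefficients appear in neither table (assumption).
    positive = sorted((kv for kv in coefs.items() if kv[1] > 0), key=lambda kv: kv[1], reverse=True)
    negative = sorted((kv for kv in coefs.items() if kv[1] < 0), key=lambda kv: kv[1])
    return [positive[:limit], negative[:limit]]
```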

`autodoc_data_summary_col_num`

Number of Features in Data Summary Table

Specify the number of features to be shown in the data summary table. This value must be an integer. To show all columns, specify any value lower than 1. This is set to -1 by default.

`autodoc_list_all_config_settings`

List All Config Settings

Specify whether to list all config settings. When enabled, all settings are listed; when disabled, only the settings that have been changed are listed. This is disabled by default.

`autodoc_keras_summary_line_length`

Keras Model Architecture Summary Line Length

Specify the line length of the Keras model architecture summary. This value must be either an integer greater than 0 or -1. To use the default line length, set this value to -1 (default).

`autodoc_transformer_architecture_max_lines`

NLP/Image Transformer Architecture Max Lines

Specify the maximum number of lines shown for advanced transformer architecture in the Feature section. Note that the full architecture can be found in the appendix.

`autodoc_full_architecture_in_appendix`

Appendix NLP/Image Transformer Architecture

Specify whether to show the full NLP/Image transformer architecture in the appendix. This is disabled by default.

`autodoc_coef_table_appendix_results_table`

Full GLM Coefficients Table in the Appendix

Specify whether to show the full GLM coefficient table(s) in the appendix. This is disabled by default.

`autodoc_coef_table_num_models`

GLM Coefficient Tables Number of Models

Specify the number of models for which a GLM coefficients table is shown in the AutoDoc. This value must be -1 or an integer >= 1. Set this value to -1 to show tables for all models. This is set to 1 by default.

`autodoc_coef_table_num_folds`

GLM Coefficient Tables Number of Folds Per Model

Specify the number of folds per model for which a GLM coefficients table is shown in the AutoDoc. This value must be -1 (default) or an integer >= 1; -1 shows all folds per model.

`autodoc_coef_table_num_coef`

GLM Coefficient Tables Number of Coefficients

Specify the number of coefficients to show within a GLM coefficients table in the AutoDoc. This is set to 50 by default. Set this value to -1 to show all coefficients.

`autodoc_coef_table_num_classes`

GLM Coefficient Tables Number of Classes

Specify the number of classes to show within a GLM coefficients table in the AutoDoc. Set this value to -1 to show all classes. This is set to 9 by default.

`autodoc_num_histogram_plots`

Number of Histograms to Show

Specify the number of top features for which to show histograms. This is set to 10 by default.
