Using H2O AutoDoc from Steam
============================

Introduction
------------

Steam users can access H2O AutoDoc through the Steam Client Python API.

**Requirements**

- Steam Client Python package. Refer to the Enterprise Steam Download page.
- H2O-3 Python Client. Refer to the H2O-3 Download page.

Download or Save H2O AutoDoc
----------------------------

An H2O AutoDoc can be downloaded directly to a client machine or saved to the file system where Steam is running.

Options:

- :ref:`steam-download-autodoc-ref`
- :ref:`steam-save-autodoc-ref`

.. _steam-download-autodoc-ref:

Download an H2O AutoDoc
~~~~~~~~~~~~~~~~~~~~~~~

This example shows how to download an AutoDoc to your client machine. For example, if you connect to Steam from a Jupyter notebook on your laptop, the AutoDoc is downloaded to the path you specify on your laptop.

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/autodoc_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      # (`cluster` is assumed to be the Steam client object for your running H2O cluster)
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      config = AutoDocConfig()

      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

.. _steam-save-autodoc-ref:

Save an H2O AutoDoc
~~~~~~~~~~~~~~~~~~~

This example shows how to save an AutoDoc to the file system of the remote server where Steam is running.

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path on the Steam server where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/autodoc_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      config = AutoDocConfig()

      cluster.save_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

Configure H2O AutoDoc
---------------------

Rendering an H2O AutoDoc requires a running H2O cluster, a trained model, and access to the datasets used to train the model. This section includes code examples for setting up a model, along with basic and advanced H2O AutoDoc configurations.

To experiment with a complete end-to-end example, run the :ref:`steam-build-h2o-model-ref` code example before running one of the H2O AutoDoc-specific examples.

- Setup:

  - :ref:`steam-build-h2o-model-ref`

- Basic configurations:

  - :ref:`steam-generate-default-autodoc-ref`
  - :ref:`steam-specify-report-name-ref`
  - :ref:`steam-specify-file-type-ref`

- Advanced configurations:

  - :ref:`steam-specify-mli-frame-ref`
  - :ref:`steam-specify-pdp-features-ref`
  - :ref:`steam-specify-ice-frame-ref`
  - :ref:`steam-enable-shapley-values-ref`
  - :ref:`steam-specify-additional-testsets-ref`
  - :ref:`steam-specify-alternative-models-ref`

.. _steam-build-h2o-model-ref:

Building an H2O Model
~~~~~~~~~~~~~~~~~~~~~

First, connect to your Steam-launched H2O-3 cluster. Then run the following code to import the data and train a model.

.. tabs::

   .. code-tab:: python H2O-3 Python

      # import h2o (connect to your running H2O cluster on Steam before running this code)
      import h2o
      from h2o.estimators.gbm import H2OGradientBoostingEstimator

      # paths to the datasets for training and validation
      train_path = "https://s3.amazonaws.com/h2o-training/events/ibm_index/CreditCard_Cat-train.csv"
      valid_path = "https://s3.amazonaws.com/h2o-training/events/ibm_index/CreditCard_Cat-test.csv"

      # import the train and valid dataset
      train = h2o.import_file(train_path, destination_frame='CreditCard_Cat-train.csv')
      valid = h2o.import_file(valid_path, destination_frame='CreditCard_Cat-test.csv')

      # set predictors and response
      predictors = train.columns
      predictors.remove('ID')
      response = "DEFAULT_PAYMENT_NEXT_MONTH"

      # convert target to factor
      train[response] = train[response].asfactor()
      valid[response] = valid[response].asfactor()

      # assign IDs for later use
      h2o.assign(train, "CreditCard_TRAIN")
      h2o.assign(valid, "CreditCard_VALID")

      # build an H2O-3 GBM Model
      gbm = H2OGradientBoostingEstimator(model_id="gbm_model", seed=1234)
      gbm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

.. _steam-generate-default-autodoc-ref:

Generate a Default H2O AutoDoc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/autodoc_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # use default configuration settings
      config = AutoDocConfig()

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

.. _steam-specify-report-name-ref:

Set the H2O AutoDoc Report Name
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # the file name at the end of the path is the name of the saved report
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/autodoc_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # use default configuration settings
      config = AutoDocConfig()

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

.. _steam-specify-file-type-ref:

Set the H2O AutoDoc File Type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The H2O AutoDoc can generate a Word document or a markdown file. The default report is a Word document (.docx).

**Word Document**

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/autodoc_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # use default configuration settings
      config = AutoDocConfig()

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

**Markdown File**

Note that when **main_template_type** is set to **"md"**, a zip file is returned. This zip file contains the markdown file and any images that are linked in the markdown file.

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/my_markdown_report.md"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # set the exported report to markdown ('md')
      main_template_type = "md"
      config = AutoDocConfig(main_template_type=main_template_type)

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

.. _steam-specify-mli-frame-ref:

Model Interpretation Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The H2O AutoDoc report can include partial dependence plots (PDPs) and Shapley value feature importance. By default, these calculations are done on the training frame. You can use the **mli_frame** (short for machine learning interpretability dataframe) AutoDocConfig parameter to specify a different dataset on which to perform these calculations.

In the example below, the machine learning interpretability (MLI) calculations are done on the model's validation dataset instead of the training dataset.

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/my_mli_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # specify the frame id of the H2OFrame on which the partial dependence and Shapley values are calculated
      # here 'valid' was created in the Build H2O Model code example
      mli_frame_id = valid.frame_id
      config = AutoDocConfig(mli_frame=mli_frame_id)

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

.. _steam-specify-pdp-features-ref:

Partial Dependence Features
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The H2O AutoDoc report includes partial dependence plots (PDPs). By default, PDPs are shown for the top 20 features. This selection is based on the model's built-in variable importance (referred to as Native Importance in the report).

You can override the default behavior with the **pdp_feature_list** parameter and specify your own list of features to show in the report.

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/my_pdp_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # specify the features for which you want PDPs
      # these features come from the predictors used in the Build H2O Model code example
      pdp_feature_list = ["EDUCATION", "LIMIT_BAL", "AGE"]
      config = AutoDocConfig(pdp_feature_list=pdp_feature_list)

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

.. _steam-specify-ice-frame-ref:

Specify ICE Records
~~~~~~~~~~~~~~~~~~~

The H2O AutoDoc can overlay partial dependence plots with individual conditional expectation (ICE) plots. You can specify which observations (rows) you'd like to plot (manual selection), or you can let H2O AutoDoc select observations automatically.

**Manual Selection**

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/my_manual_ice_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # specify the frame id of the H2OFrame composed of the records you want shown in the ICE plots
      # here 'valid' was created in the Build H2O Model code example - we use the first 2 rows.
      ice_frame_id = valid[:2, :].frame_id
      config = AutoDocConfig(ice_frame=ice_frame_id)

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

**Automatic Selection**

The **num_ice_rows** AutoDocConfig parameter controls the number of observations selected for an ICE plot. This feature is disabled by default (i.e., set to 0).

Observations are selected by binning the predictions into N quantiles and selecting the first observation in each quantile.

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/my_auto_ice_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # specify the number of rows you want automatically selected for ICE plots
      num_ice_rows = 3
      config = AutoDocConfig(num_ice_rows=num_ice_rows)

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

.. _steam-enable-shapley-values-ref:

Enable/Disable Shapley Values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Shapley values are provided for supported H2O-3 algorithms. (For supported algorithms, see the H2O-3 user guide.)

**Note**: Shapley values are enabled by default. However, they can take a long time to compute for wide datasets. You can disable the Shapley value calculation to speed up your AutoDoc generation.

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/my_shapley_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # enable Shapley values (set to False to disable them)
      use_shapley = True
      config = AutoDocConfig(use_shapley=use_shapley)

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
      )

.. _steam-specify-additional-testsets-ref:

Provide Additional Testsets
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can provide a list of additional testsets (each of which is an H2OFrame) to the **download_autodoc()** or **save_autodoc()** functions. Performance metrics, plots, and tables are created for each of these additional datasets.

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/my_additional_testsets_report.docx"

      # Example Code:

      # import AutoDocConfig class
      from h2osteam import AutoDocConfig

      # get H2O-3 objects required to create an automatic report
      model = h2o.get_model("gbm_model")
      train = h2o.get_frame("CreditCard_TRAIN")

      # specify additional testsets
      full_test_data = h2o.import_file("https://s3.amazonaws.com/h2o-training/events/ibm_index/CreditCard_Cat-test.csv")
      test1, test2 = full_test_data.split_frame(ratios=[.5], seed=1234, destination_frames=['mytest1', 'mytest2'])

      # use default configuration settings
      config = AutoDocConfig()

      # download an H2O AutoDoc
      cluster.download_autodoc(
          model=model,
          config=config,
          train_frame=train,
          path=output_file_path,
          additional_testsets=[test1, test2]
      )

.. _steam-specify-alternative-models-ref:

Provide Alternative Models
~~~~~~~~~~~~~~~~~~~~~~~~~~

You can provide a list of alternative models to the **download_autodoc()** or **save_autodoc()** functions. The report then includes alternative model tables listing the parameters that a user can grid over (i.e., traditional hyperparameters plus any other parameters that can be grid searched).

**Code Example**

.. tabs::

   .. code-tab:: python Steam with H2O-3 Python

      # Parameters the User Must Set: output_file_path

      # specify the full path to where you want your AutoDoc saved
      # replace the path below with your own path
      output_file_path = "path/to/your/autodoc/my_alternative_models_report.docx"

      # Example Code:

      # run AutoML to create several models
      import h2o
      from h2o.automl import H2OAutoML
      from h2osteam import AutoDocConfig

      # import the titanic dataset from Amazon S3
      titanic = h2o.import_file(
          "https://s3.amazonaws.com/h2o-public-test-data/"
          "smalldata/gbm_test/titanic.csv",
          destination_frame="titanic_all",
      )

      # specify the predictors and response
      predictors = ["home.dest", "cabin", "embarked", "age"]
      response = "survived"
      titanic["survived"] = titanic["survived"].asfactor()

      # split the titanic dataset into train, valid, and test
      train, valid, test = titanic.split_frame(
          ratios=[0.8, 0.1],
          destination_frames=["titanic_train", "titanic_valid", "titanic_test"],
      )

      # run AutoML
      automl = H2OAutoML(max_models=3, seed=1)
      automl.train(
          predictors,
          response,
          training_frame=train,
          validation_frame=valid,
      )
      board = automl.leaderboard.as_data_frame()

      # build a report on the best performing model
      best_model = automl.leader

      # compare the best model to the other models in leaderboard
      models = [h2o.get_model(x) for x in board["model_id"][1:]]

      config = AutoDocConfig()

      # render a report with your best model and alternative models
      cluster.download_autodoc(
          model=best_model,
          config=config,
          path=output_file_path,
          train_frame=train,
          valid_frame=valid,
          test_frame=test,
          alternative_models=models,
      )

Download H2O AutoDoc Logs
-------------------------

The H2O AutoDoc logs can be downloaded as a text file using ``download_autodoc_logs()``.

Get H2O AutoDoc Logs
~~~~~~~~~~~~~~~~~~~~

.. tabs::

   .. code-tab:: python Steam Python

      # Parameters the User Must Set: logs_path

      # specify the full path to where you want your AutoDoc logs saved
      # replace the path below with your own path
      logs_path = "path/to/your/autodoc/autodoc_report_logs.txt"

      # Example Code:

      cluster.download_autodoc_logs(logs_path)
      # AutoDoc logs saved to autodoc_report_logs.txt
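
The client-side paths used in the download examples (``output_file_path`` and ``logs_path``) must point to a location you can write to. The sketch below is not part of the Steam API: ``prepare_output_path`` is a hypothetical helper, written with only the Python standard library, that resolves the path, refuses to overwrite an existing file unless asked, and creates any missing parent directories. Whether the Steam client creates missing directories on its own is not covered here, so treat this as an optional precaution.

.. tabs::

   .. code-tab:: python Python

      from pathlib import Path


      def prepare_output_path(path, overwrite=False):
          """Return an absolute path string whose parent directory exists.

          Hypothetical helper for illustration only; not part of h2osteam.
          """
          # expand "~" and turn a relative path into an absolute one
          target = Path(path).expanduser().resolve()

          # protect an earlier report or log file from being replaced by accident
          if target.exists() and not overwrite:
              raise FileExistsError(
                  f"{target} already exists; pass overwrite=True to replace it"
              )

          # create missing parent directories so the download has a place to land
          target.parent.mkdir(parents=True, exist_ok=True)
          return str(target)

The returned string could then be passed as the ``path`` argument of ``download_autodoc()`` or as the argument of ``download_autodoc_logs()``.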