H2O Steam

h2osteam

h2osteam.login(url=None, username=None, password=None, verify_ssl=True, cacert=None, ca_cert=None)

Connect to an existing Enterprise Steam server.

Pass the full url of the Steam server together with the username and the password (or user access token) of the connecting user.

Parameters
  • url – Full URL (including scheme and port) of the Steam server to connect to. Must use the https scheme.

  • username – Username of the connecting user.

  • password – Password or user access token of the connecting user.

  • verify_ssl – Set to False to disable SSL certificate verification.

  • cacert – (Optional) Path to a CA bundle file or a directory with certificates of trusted CAs.

  • ca_cert – (DEPRECATED) Use cacert instead. Path to a CA bundle file or a directory with certificates of trusted CAs.

Examples

>>> import h2osteam
>>> url = "https://steam.example.com:9555"
>>> username = "AzureDiamond"
>>> password = "hunter2"
>>> h2osteam.login(url=url, username=username, password=password, verify_ssl=True)
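
The parameter rules above can be checked before calling login. The following is a minimal sketch, not part of h2osteam: the helper names build_login_kwargs and login_with_checks are hypothetical and exist only for illustration. It rejects URLs that do not use https and passes cacert (rather than the deprecated ca_cert) only when a CA bundle path is given.

```python
from urllib.parse import urlparse


def build_login_kwargs(url, username, password, cacert=None):
    """Hypothetical helper: validate arguments and build kwargs for h2osteam.login."""
    parsed = urlparse(url)
    # The documentation states that the URL must use the https scheme.
    if parsed.scheme != "https":
        raise ValueError("Steam URL must use https, got: %r" % url)
    if not parsed.hostname:
        raise ValueError("Steam URL must include a host, got: %r" % url)
    kwargs = {
        "url": url,
        "username": username,
        "password": password,  # a password or a user access token
        "verify_ssl": True,
    }
    if cacert is not None:
        # Prefer cacert; ca_cert is marked as deprecated.
        kwargs["cacert"] = cacert
    return kwargs


def login_with_checks(url, username, password, cacert=None):
    """Hypothetical wrapper: validate first, then call h2osteam.login."""
    import h2osteam  # imported lazily so the validation helper works without it

    return h2osteam.login(**build_login_kwargs(url, username, password, cacert))
```

Keeping certificate verification enabled and supplying a CA bundle is generally preferable to setting verify_ssl=False when the server uses a private certificate authority.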
h2osteam.api()

Get direct access to the Steam API. For expert users only.

Expert users can bypass the clients for each product and access the Steam API directly. This use case is not supported and not recommended. Use the provided clients whenever possible.

Examples

>>> import h2osteam
>>> h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="token-here", verify_ssl=True)
>>> api = h2osteam.api()
>>> api
h2osteam.print_profiles()

Prints profiles available to this user.

Prints details about the profiles available to the logged-in user.

Examples

>>> import h2osteam
>>> h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="token-here", verify_ssl=True)
>>> h2osteam.print_profiles()
>>> # Profile name: default-h2o
>>> # Profile type: h2o
>>> # Number of nodes: MIN=1 MAX=10
>>> # Node memory [GB]: MIN=1 MAX=30
>>> # Threads per node: MIN=0 MAX=0
>>> # Extra memory [%]: MIN=10 MAX=50
>>> # Max idle time [hrs]: MIN=1 MAX=24
>>> # Max uptime [hrs]: MIN=1 MAX=24
>>> # YARN virtual cores: MIN=0 MAX=0
>>> # YARN queues:
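
The MIN and MAX values printed for a profile bound what a user may request. As a rough sketch, the ranges can be parsed from text and used to check a requested value. This assumes the printed text has the same "label: MIN=x MAX=y" shape as the sample above; the names parse_ranges and within are hypothetical and are not part of h2osteam.

```python
import re

# Sample lines copied from the example output above, without the ">>> # " prefix.
SAMPLE = """\
Profile name: default-h2o
Profile type: h2o
Number of nodes: MIN=1 MAX=10
Node memory [GB]: MIN=1 MAX=30
Max idle time [hrs]: MIN=1 MAX=24
"""

# Matches lines such as "Number of nodes: MIN=1 MAX=10".
RANGE_RE = re.compile(r"^(?P<label>.+?): MIN=(?P<min>\d+) MAX=(?P<max>\d+)$")


def parse_ranges(text):
    """Return {label: (min, max)} for every line that carries a MIN/MAX pair."""
    ranges = {}
    for line in text.splitlines():
        match = RANGE_RE.match(line.strip())
        if match:
            ranges[match.group("label")] = (
                int(match.group("min")),
                int(match.group("max")),
            )
    return ranges


def within(ranges, label, value):
    """Check whether value lies inside the inclusive range recorded for label."""
    low, high = ranges[label]
    return low <= value <= high


if __name__ == "__main__":
    limits = parse_ranges(SAMPLE)
    print(limits["Number of nodes"])
    print(within(limits, "Node memory [GB]", 16))
```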
h2osteam.print_python_environments()

Prints Sparkling Water Python environments available to this user.

Prints details about Sparkling Water Python environments available to the logged-in user.

Examples

>>> import h2osteam
>>> h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="token-here", verify_ssl=True)
>>> h2osteam.print_python_environments()
>>> # Name: Python 2.7 default
>>> # Python Pyspark Path:
>>> # Conda Pack path: lib/conda-pack/python-27-default.tar.gz
>>> # ===
>>> # Name: Python 3.7 default
>>> # Python Pyspark Path:
>>> # Conda Pack path: lib/conda-pack/python-37-default.tar.gz
class h2osteam.AutoDocConfig(template_path=None, template_sections_path=None, sub_template_type=None, main_template_type='docx', float_format='{:6.4g}', data_summary_feat_num=- 1, num_features=20, plot_num_features=20, min_relative_importance=0, stats_quantiles=20, psi_quantiles=10, response_rate_quantiles=10, pdp_feature_list=None, mli_frame=None, ice_frame=None, num_ice_rows=0, cardinality_limit=25, pdp_out_of_range=3, pdp_num_bins=10, warning_shift_auc_threshold=0.8, include_hist=True, use_shapley=True, **kwargs)

This class configures the ml-autodoc. The only required parameter is output_path, which specifies where the ml-autodoc should be saved; it does not appear in the signature above and is passed as a keyword argument. Additional parameters control the document file type, the plots, and the number of features shown, among other options. The simplest configuration provides output_path alone. If you specify only a file name, the ml-autodoc saves the document to the current directory.
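
A minimal sketch of the simplest configuration follows. The helper names autodoc_kwargs and make_config are hypothetical and serve only to illustrate; the keyword names (output_path, main_template_type, num_features) are taken from this page. The h2osteam import is deferred so that the argument handling can be tried without the package installed.

```python
def autodoc_kwargs(output_path, **overrides):
    """Hypothetical helper: collect keyword arguments for h2osteam.AutoDocConfig."""
    if not output_path:
        # output_path is described as the only required parameter.
        raise ValueError("output_path is required")
    kwargs = {"output_path": output_path}
    kwargs.update(overrides)
    return kwargs


def make_config(output_path, **overrides):
    """Hypothetical wrapper: build an AutoDocConfig from the collected arguments."""
    import h2osteam  # imported lazily; requires the h2osteam package

    return h2osteam.AutoDocConfig(**autodoc_kwargs(output_path, **overrides))


if __name__ == "__main__":
    # Simplest case: only the output path (a bare file name).
    print(autodoc_kwargs("my_report.docx"))
    # With a few optional settings documented on this page.
    print(autodoc_kwargs("my_report.md", main_template_type="md", num_features=10))
```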

Parameters
  • output_path – str: Path that specifies where to save the ml-autodoc (e.g., ‘User/username/my_report.docx’).

  • template_path – str, optional: Path to general or custom template. Defaults to None.

  • template_sections_path – str, optional: Path to general or custom template sections. Defaults to None.

  • sub_template_type – str, optional: The subtemplate type (e.g., ‘docx’ or ‘md’). Defaults to the main_template_type value.

  • main_template_type – str, optional: The document type (e.g., ‘docx’ or ‘md’). Defaults to ‘docx’.

  • float_format – str: Format string syntax. Defaults to “{:6.4g}”: minimum total width of 6 with 4 significant digits, using ‘g’ general format.

  • data_summary_feat_num – int: Number of features to show in data summary. Value must be an integer. Values lower than 1, e.g., 0 or -1, indicate that all features should be shown. Defaults to -1.

  • num_features – int: The number of top features to display in the document tables. Defaults to 20.

  • plot_num_features – int: The number of top features to display in the document plots. Defaults to 20.

  • min_relative_importance – The minimum relative importance required for a feature to be displayed in the feature importance table/plot. Defaults to 0.

  • stats_quantiles – int: The number of quantiles to use for prediction statistics computation. Defaults to 20.

  • psi_quantiles – int: The number of quantiles to use for population stability index computation. Defaults to 10.

  • response_rate_quantiles – int: The number of quantiles to use for response rates information computation. Defaults to 10.

  • pdp_feature_list – list: A list of feature names (str) for which to create partial dependence plots. Defaults to None.

  • mli_frame – H2OFrame: An H2OFrame on which the partial dependence and Shapley values will be calculated. If no H2OFrame is specified the training frame is used. Defaults to None.

  • ice_frame – H2OFrame, optional: An H2OFrame on which the individual conditional expectation will be calculated. If no H2OFrame is specified, ICE rows are selected automatically. Defaults to None.

  • num_ice_rows – int, optional: The number of rows to be automatically selected for individual conditional expectation from the training data. This argument is ignored if the ice_frame argument is provided. Defaults to 0.

  • cardinality_limit – int: The maximum number of categorical levels a feature can have, above which the partial dependence plot will not be generated. Defaults to 25.

  • use_hdfs – bool: Whether to save the document to HDFS. Requires that H2O or Sparkling Water cluster has access to HDFS. Defaults to False.

  • pdp_out_of_range – int: The number of standard deviations, outside of the range of a column, to include in partial dependence plots. This shows how the model will react to data it has not seen before. Defaults to 3.

  • pdp_num_bins – int: The number of bins for the partial dependence plot. Defaults to 10.

  • warning_shift_auc_threshold – float: The threshold at which a warning is shown: a warning appears if the AUC is greater than or equal to this value. Defaults to 0.8.

  • use_shapley – bool: Whether to calculate Shapley values, for algorithms where they are available. Note that Shapley value calculations may take a long time for very wide datasets. Defaults to True.
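
The default float_format follows Python's format string syntax, so its effect can be previewed with str.format. The sketch below assumes that AutoDoc applies the string to each number in this way; the render helper is hypothetical. With the ‘g’ presentation type the precision counts significant digits, and the width is only a minimum, so long values are not truncated.

```python
FLOAT_FORMAT = "{:6.4g}"  # default value of float_format


def render(value, float_format=FLOAT_FORMAT):
    """Hypothetical helper: format one number the way float_format describes."""
    return float_format.format(value)


if __name__ == "__main__":
    for value in (3.14159265, 0.5, 123456.789):
        # repr() makes the left padding visible.
        print(repr(render(value)))
    # ' 3.142'     four significant digits, padded to width 6
    # '   0.5'     trailing zeros are dropped, padded to width 6
    # '1.235e+05'  switches to scientific notation and exceeds width 6
```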

serialize()
class h2osteam.SteamClient(conn=None)

DEPRECATED! This class and its methods are deprecated and will be removed in v1.8.

create_pyspark_python_path_environment(name, path)

DEPRECATED! Create Python Pyspark Path environment.

delete_python_environment(environment_id)

DEPRECATED! Delete Python environment.

static get_h2o_cluster(cluster_name)

DEPRECATED! Get H2O cluster by name.

get_h2o_clusters()

DEPRECATED! Get H2O clusters.

get_python_environments()

DEPRECATED! Get Python environments.

static get_sparkling_cluster(cluster_name)

DEPRECATED! Get Sparkling Water cluster by name.

get_sparkling_clusters()

DEPRECATED! Get Sparkling Water clusters.

static show_profiles()

DEPRECATED! Prints profiles available to this user.

static start_external_sparkling_cluster(cluster_name=None, profile_name=None, h2o_version=None, driver_cores=0, driver_memory_gb=0, num_executors=0, executor_cores=0, executor_memory_gb=0, h2o_nodes=0, h2o_node_memory_gb=0, h2o_node_threads=0, start_timeout_sec=0, yarn_queue=None, python_environment_name='', spark_properties=None)

DEPRECATED! Launch Sparkling Water external backend cluster.

static start_h2o_cluster(cluster_name=None, profile_name=None, num_nodes=0, node_memory=None, v_cores=0, n_threads=0, max_idle_time=0, max_uptime=0, extramempercent=10, h2o_version=None, yarn_queue=None, callback_ip=None, node_id=0)

DEPRECATED! Launch a new H2O cluster.

static start_internal_sparkling_cluster(cluster_name=None, profile_name=None, h2o_version=None, driver_cores=0, driver_memory_gb=0, num_executors=0, executor_cores=0, executor_memory_gb=0, h2o_node_threads=0, start_timeout_sec=0, yarn_queue=None, python_environment_name='', spark_properties=None)

DEPRECATED! Launch Sparkling Water internal backend cluster.

static stop_h2o_cluster(config)

DEPRECATED! Stop H2O cluster.

static upload_conda_environment(name, path)

DEPRECATED! Upload Conda Python environments.

static upload_engine(path)

DEPRECATED! Upload H2O engine.

static upload_sparkling_engine(path)

DEPRECATED! Upload Sparkling Water engine.