Using Enterprise Steam with Python
----------------------------------
This section describes how to use Enterprise Steam with Python. Note that each Python request results in a warning message. These warnings can be ignored.

Downloading and Installing
~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Go to https://www.h2o.ai/download-enterprise-steam/.

2. Select the version of the Python package that you want to download and install (for example, https://s3.amazonaws.com/steam-release/enterprise-steam/steam-api/STEAM-/h2osteam-1.1.0-py2.py3-none-any.whl).

3. Open a Terminal window and navigate to the directory where you downloaded the Python ``.whl`` file. For example:

   ::

       cd ~/Downloads

4. Install Enterprise Steam for Python using ``pip install``. For example:

   ::

       pip install h2osteam-1.1.0-py2.py3-none-any.whl

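
To confirm that the package installed correctly, you can query its version from Python. The following is a minimal sketch that assumes Python 3.8 or later (for ``importlib.metadata``) and assumes the installed distribution is named ``h2osteam``, as the wheel file name suggests. ``installed_version`` is a hypothetical helper written for this example, not part of Enterprise Steam.

::

    from importlib import metadata


    def installed_version(dist_name="h2osteam"):
        """Return the installed version of a distribution, or None if it is missing.

        Hypothetical helper; assumes the wheel installs a distribution named h2osteam.
        """
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None


    if __name__ == "__main__":
        version = installed_version()
        if version is None:
            print("h2osteam is not installed")
        else:
            print("h2osteam", version, "is installed")
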
``login``
~~~~~~~~~
In Python, use the ``login`` function to log in to your Enterprise Steam web server. You must already have a username and password. Your Enterprise Steam Admin provides the web server URL along with your username and password.

::

    $ python
    >>> import h2osteam
    >>> conn = h2osteam.login(url="https://steam.0xdata.loc",
    ...                       verify_ssl=False,
    ...                       username="jsmith",
    ...                       password="jsmith")

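
Hard-coding a password, as in the example above, is convenient for a quick test but risky in shared scripts. A minimal sketch of one alternative reads the connection details from environment variables and passes them to ``h2osteam.login``. The variable names ``STEAM_URL``, ``STEAM_USERNAME``, ``STEAM_PASSWORD``, and ``STEAM_VERIFY_SSL`` are hypothetical choices for this example; Enterprise Steam does not read them itself.

::

    import os

    # Hypothetical variable names chosen for this example.
    REQUIRED_VARS = ("STEAM_URL", "STEAM_USERNAME", "STEAM_PASSWORD")


    def steam_login_kwargs(environ=None):
        """Build keyword arguments for h2osteam.login() from environment variables."""
        env = os.environ if environ is None else environ
        missing = [name for name in REQUIRED_VARS if not env.get(name)]
        if missing:
            raise KeyError("missing environment variables: " + ", ".join(missing))
        return {
            "url": env["STEAM_URL"],
            "username": env["STEAM_USERNAME"],
            "password": env["STEAM_PASSWORD"],
            # Verify SSL certificates unless STEAM_VERIFY_SSL is explicitly "0".
            "verify_ssl": env.get("STEAM_VERIFY_SSL", "1") != "0",
        }


    def login_from_env():
        """Log in to Enterprise Steam using credentials from the environment."""
        import h2osteam  # imported lazily so steam_login_kwargs() works without it

        return h2osteam.login(**steam_login_kwargs())

The sketch keeps certificate verification on unless ``STEAM_VERIFY_SSL=0`` is set, so disabling it, as the example above does with ``verify_ssl=False``, remains an explicit choice.
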
``start_h2o_cluster``
~~~~~~~~~~~~~~~~~~~~~
Use the ``start_h2o_cluster`` function to create a new cluster. This function takes the following parameters:

- ``cluster_name``: Specify a name for this cluster.
- ``profile_name``: Specify the profile to use for this cluster.
- ``num_nodes``: Specify the number of nodes for the cluster.
- ``node_memory``: Specify the amount of memory that should be available on each node.
- ``v_cores``: Specify the number of virtual cores.
- ``n_threads``: Specify the number of threads (CPUs) to use in the cluster. Specify 0 to use all available threads.
- ``max_idle_time``: Specify the maximum number of hours that the cluster can be idle before gracefully shutting down. Specify 0 to turn off this setting and allow the cluster to remain idle for an unlimited amount of time.
- ``max_uptime``: Specify the maximum number of hours that the cluster can be running. Specify 0 to turn off this setting and allow the cluster to remain up for an unlimited amount of time.
- ``extramempercent``: Specify the amount of extra memory for internal JVM use outside of the Java heap. This is a percentage of memory per node. The default (and recommended) value is 10%.
- ``yarn_queue``: If your YARN cluster uses queues to allocate resources, specify the queue for this cluster. Note that the queue name cannot contain spaces.
- ``callback_ip``: Optionally specify the IP address for callback messages from the mapper to the driver (``driverif``).
- ``h2o_version``: Specify the H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.

::

    >>> cluster_config = conn.start_h2o_cluster(cluster_name='first-cluster-from-Python',
    ...                                         profile_name='default',
    ...                                         num_nodes=2,
    ...                                         node_memory='30g',
    ...                                         h2o_version='3.10.4.1')
    >>> # Display the returned cluster ID and connection parameters
    >>> cluster_config
    {'id': 107, 'connect_params': {'cookies': [u'first-cluster-from-Python=YW5nZWxhOmdrZm53aGJsdWY='], 'ip': 'steam.0xdata.loc', 'context_path': u'jsmith_first-cluster-from-Python', 'verify_ssl_certificates': False, 'https': True, 'port': 9999}}

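
Several of the parameters above have constraints that are easy to check before sending a request: the YARN queue name cannot contain spaces, and ``max_idle_time`` and ``max_uptime`` are hour counts where 0 means unlimited. A minimal sketch of a hypothetical ``build_cluster_params`` helper collects and checks these values before they are passed to ``start_h2o_cluster``. It treats the five parameters used in the example above as required and the rest as optional; which parameters Enterprise Steam actually requires is an assumption here, and the server still performs its own validation.

::

    # Optional parameter names taken from the parameter list above.
    OPTIONAL_PARAMS = {"v_cores", "n_threads", "max_idle_time", "max_uptime",
                       "extramempercent", "yarn_queue", "callback_ip"}


    def build_cluster_params(cluster_name, profile_name, num_nodes, node_memory,
                             h2o_version, **optional):
        """Return keyword arguments for conn.start_h2o_cluster() after basic checks.

        Hypothetical helper: the checks mirror the parameter descriptions above.
        """
        unknown = set(optional) - OPTIONAL_PARAMS
        if unknown:
            raise ValueError("unknown parameters: " + ", ".join(sorted(unknown)))
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        if " " in (optional.get("yarn_queue") or ""):
            raise ValueError("yarn_queue cannot contain spaces")
        for name in ("max_idle_time", "max_uptime"):
            # Hours; 0 disables the limit, so only negative values are rejected.
            if (optional.get(name) or 0) < 0:
                raise ValueError(name + " must be 0 (unlimited) or a positive number of hours")
        params = {
            "cluster_name": cluster_name,
            "profile_name": profile_name,
            "num_nodes": num_nodes,
            "node_memory": node_memory,
            "h2o_version": h2o_version,
        }
        params.update(optional)
        return params


    # Example usage (requires a conn object from h2osteam.login()):
    # params = build_cluster_params('first-cluster-from-Python', 'default', 2, '30g',
    #                               '3.10.4.1', max_idle_time=2)
    # cluster_config = conn.start_h2o_cluster(**params)
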
After you create a cluster, you can immediately connect to it and begin using H2O. The following is a complete Python example.

::

    >>> import h2o
    >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
    >>> h2o.connect(config=cluster_config)
    # Import the cars dataset. This dataset is used to classify whether a car is
    # economical based on its displacement, power, weight, acceleration, and model year.
    >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
    # Convert the response column to a factor
    >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
    # Set the predictor names and the response column name
    >>> predictors = ["displacement", "power", "weight", "acceleration", "year"]
    >>> response = "economy_20mpg"
    # Split into training and validation sets
    >>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
    # Initialize the estimator
    >>> cars_gbm = H2OGradientBoostingEstimator(seed=1234)
    # Train the model, specifying the predictors (x), the response column (y),
    # the training frame, and the validation frame
    >>> cars_gbm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)
    # Print the AUC for the validation data
    >>> cars_gbm.auc(valid=True)

``get_h2o_cluster``
~~~~~~~~~~~~~~~~~~~
Use the ``get_h2o_cluster`` function to retrieve information about a specific cluster by name.

::

    >>> conn.get_h2o_cluster('first-cluster-from-Python')
    {'id': 108, 'connect_params': {'cookies': [u'first-cluster-from-Python=YW5nZWxhOnA1bHRreHN5amo='], 'ip': 'steam.0xdata.loc', 'context_path': u'jsmith_first-cluster-from-Python', 'verify_ssl_certificates': False, 'https': True, 'port': 9999}}

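
The ``connect_params`` dictionary returned by ``get_h2o_cluster`` (and by ``start_h2o_cluster``) holds the pieces that ``h2o.connect(config=...)`` uses. When you want to log or display where a cluster is served, a minimal sketch can combine them into a single URL. This assumes the cluster is reachable at ``scheme://ip:port/context_path``, which matches the fields shown above but is not confirmed by this documentation; ``cluster_base_url`` is a hypothetical helper. Requests to that URL also need the cookie listed under ``cookies``, so the URL alone does not authenticate you.

::

    def cluster_base_url(cluster_config):
        """Combine the connect_params fields into a base URL.

        Hypothetical helper; assumes the layout scheme://ip:port/context_path.
        """
        params = cluster_config["connect_params"]
        scheme = "https" if params.get("https") else "http"
        url = "{}://{}:{}".format(scheme, params["ip"], params["port"])
        context_path = (params.get("context_path") or "").strip("/")
        if context_path:
            url += "/" + context_path
        return url

For the output above, this returns ``https://steam.0xdata.loc:9999/jsmith_first-cluster-from-Python``.
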
``get_h2o_clusters``
~~~~~~~~~~~~~~~~~~~~
Use the ``get_h2o_clusters`` function to retrieve all running H2O clusters that are accessible to the current user.

::

    >>> conn.get_h2o_clusters()

``stop_h2o_cluster``
~~~~~~~~~~~~~~~~~~~~
Use the ``stop_h2o_cluster`` function to stop a cluster. Pass it the cluster configuration that ``start_h2o_cluster`` returned.

::

    >>> conn.stop_h2o_cluster(cluster_config)
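
Because ``max_idle_time`` and ``max_uptime`` can be set to 0 (unlimited), a cluster that is never stopped may keep holding resources. A minimal sketch, assuming ``conn`` is the object returned by ``h2osteam.login`` and that ``stop_h2o_cluster`` accepts the dictionary returned by ``start_h2o_cluster`` as in the example above, wraps both calls in a context manager so the cluster is stopped even if your code raises an exception. ``running_h2o_cluster`` is a hypothetical helper, not part of the h2osteam package.

::

    from contextlib import contextmanager


    @contextmanager
    def running_h2o_cluster(conn, **cluster_params):
        """Start an H2O cluster through Enterprise Steam and always stop it on exit.

        Hypothetical helper. conn is the object returned by h2osteam.login();
        cluster_params are forwarded unchanged to conn.start_h2o_cluster().
        """
        cluster_config = conn.start_h2o_cluster(**cluster_params)
        try:
            yield cluster_config
        finally:
            # Runs on normal exit and when the with-block raises, so a failed
            # training run does not leave the cluster running.
            conn.stop_h2o_cluster(cluster_config)


    # Example usage (requires a live Enterprise Steam connection):
    # with running_h2o_cluster(conn, cluster_name='first-cluster-from-Python',
    #                          profile_name='default', num_nodes=2,
    #                          node_memory='30g', h2o_version='3.10.4.1') as cfg:
    #     h2o.connect(config=cfg)
    #     ...  # train models here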