Using Enterprise Steam with Python¶
This section describes how to use Enterprise Steam from Python. Note that each Python request results in a warning message; these warnings can be safely ignored.
Downloading and Installing¶
- Go to https://s3.amazonaws.com/steam-release/enterprise-steam/latest-stable.html to retrieve the latest version of Enterprise Steam.
- On the Steam API tab, select the Python package that you want to download.
- Open a Terminal window, and navigate to the location where the Python package file was downloaded. For example:
cd ~/Downloads
- Install Enterprise Steam for Python using one of the following methods:
# Install the Python whl
pip install h2osteam-1.5.1-py2.py3-none-any.whl

# Install the Conda tar.bz2
# Replace the version below with your desired Conda package/Python version
conda install h2osteam-1.5.1-py27_0.tar.bz2
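To confirm that the package installed correctly, start Python and import it; the import should complete without errors:
$ python
>>> import h2osteam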
Available Functions¶
DaiInstance.client¶
Use the client function to connect to the Driverless AI instance via the h2oai_client.
>>> instance = launch_dai_instance(name="dai-1-8-0-instance",
                                   version="1.8.0",
                                   max_server_wait_sec=2*60)
>>> instance.client()
DaiInstance.download_logs¶
Use the download_logs function to download the Driverless AI logs to the specified path.
>>> instance = launch_dai_instance(name="dai-1-8-0-instance",
                                   version="1.8.0",
                                   max_server_wait_sec=2*60)
>>> instance.download_logs(path="/dai/logs")
DaiInstance.start¶
Use the start function to start the Driverless AI instance after launching the instance.
>>> instance = launch_dai_instance(name="dai-1-8-0-instance",
                                   version="1.8.0",
                                   max_server_wait_sec=2*60)
>>> instance.start()
DaiInstance.status¶
Use the status function to view the status of a Driverless AI instance.
>>> instance = launch_dai_instance(name="dai-1-8-0-instance",
                                   version="1.8.0",
                                   max_server_wait_sec=2*60)
>>> instance.status()
DaiInstance.stop¶
Use the stop function to stop a Driverless AI instance that is running.
>>> instance = launch_dai_instance(name="dai-1-8-0-instance",
                                   version="1.8.0",
                                   max_server_wait_sec=2*60)
>>> instance.stop()
DaiInstance.terminate¶
Use the terminate function to terminate (delete) a Driverless AI instance that is either running or stopped.
>>> instance = launch_dai_instance(name="dai-1-8-0-instance",
                                   version="1.8.0",
                                   max_server_wait_sec=2*60)
>>> instance.terminate()
get_dai_instance¶
Use get_dai_instance to retrieve information about a specific Driverless AI instance in Enterprise Steam using the unique instance name.
>>> launch_dai_instance(name="dai-1-8-0-instance",
                        version="1.8.0",
                        max_server_wait_sec=2*60)
>>> get_dai_instance(name="dai-1-8-0-instance")
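Assuming the returned object is a Driverless AI instance that exposes the DaiInstance functions described above (for example status and stop), you can work with it directly:
>>> instance = get_dai_instance(name="dai-1-8-0-instance")
>>> instance.status()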
get_h2o_cluster¶
Use get_h2o_cluster to retrieve information about a specific cluster using the cluster name. This function takes the cluster name as its only parameter:
>>> conn.get_h2o_cluster('first-cluster-from-Python')
{'id': 108, 'connect_params': {'cookies': [u'first-cluster-from-Python=YW5nZWxhOnA1bHRreHN5amo='], 'ip': 'steam.0xdata.loc', 'context_path': u'jsmith_first-cluster-from-Python', 'verify_ssl_certificates': False, 'https': True, 'port': 9999}}
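The returned dictionary has the same shape as the configuration returned by start_h2o_cluster (shown later on this page), so it can typically be passed to h2o.connect to attach to the existing cluster. A minimal sketch, assuming the h2o package is installed:
>>> import h2o
>>> cluster_config = conn.get_h2o_cluster('first-cluster-from-Python')
>>> h2o.connect(config=cluster_config)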
get_h2o_clusters¶
Use get_h2o_clusters to retrieve all running H2O clusters accessible to the current user.
>>> conn.get_h2o_clusters()
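Assuming the return value is a list of cluster dictionaries shaped like the one returned by get_h2o_cluster (an assumption, not confirmed on this page), you could list the cluster IDs like this:
>>> clusters = conn.get_h2o_clusters()
>>> [cluster['id'] for cluster in clusters]  # assumes each entry is a dict with an 'id' key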
get_sparkling_cluster¶
Use the get_sparkling_cluster function to retrieve information about a specific Sparkling Water cluster using the cluster name.
>>> conn.get_sparkling_cluster('sparkling-cluster-from-Python')
get_sparkling_clusters¶
Use the get_sparkling_clusters function to retrieve all running Sparkling Water clusters accessible to the current user.
>>> conn.get_sparkling_clusters()
launch_dai_instance¶
Use the launch_dai_instance function to launch a Driverless AI (DAI) instance in Enterprise Steam. This function takes the following parameters:
name: Specify a unique name for this instance.
version: Specify the Driverless AI version.
max_server_wait_sec: Optionally specify the number of seconds that the server should wait during launch before timing out.
>>> instance = launch_dai_instance(name="dai-1-8-0-instance",
                                   version="1.8.0",
                                   max_server_wait_sec=2*60)
login¶
In Python, use the login function to log in to your Enterprise Steam web server. Note that you must already have a username and a password; the web server address and your credentials are provided by your Enterprise Steam Admin. You can use your access token instead of a password. This function accepts the following parameters:
url: Required. Specify the Enterprise Steam URL.
username: Specify the username.
password: Specify the password or access token.
login_file: Specify the path to the login file that contains the username. This can be used instead of the username.
login_file_pass: Specify the path to the login file that contains the password or access token. This can be used instead of specifying a password or access token.
verify_ssl: Specify whether to verify SSL certificates. This defaults to True.
$ python
>>> import h2osteam
>>> conn = h2osteam.login(url="https://steam.0xdata.loc",
                          verify_ssl=False,
                          username="jsmith",
                          password="jsmith")
show_profiles¶
Use the show_profiles function to show the available profiles.
>>> conn.show_profiles()
sparkling_cluster.detail¶
Use the detail function of a Sparkling Water cluster to get information about that Sparkling Water cluster.
>>> sparkling_cluster = conn.start_internal_sparkling_cluster(.......)
>>> sparkling_cluster.detail()
sparkling_cluster.send_statement¶
Use the send_statement function of a Sparkling Water cluster to send a single statement to the remote Spark session.
>>> sparkling_cluster = conn.start_internal_sparkling_cluster(.......)
>>> sparkling_cluster.send_statement('f_crimes = h2o.import_file(path="../data/chicagoCrimes10k.csv", col_types=column_type)')
sparkling_cluster.session¶
Use the session function of a Sparkling Water cluster to connect to the remote Spark session and issue commands.
>>> sparkling_cluster = conn.start_internal_sparkling_cluster(.......)
>>> sparkling_cluster.session()
sparkling_cluster.stop¶
Use the stop function of a Sparkling Water cluster to stop the cluster.
>>> sparkling_cluster = conn.start_internal_sparkling_cluster(.......)
>>> sparkling_cluster.stop()
start_external_sparkling_cluster¶
Use the start_external_sparkling_cluster function to create a new Sparkling Water cluster using the external backend. This function takes the following parameters:
cluster_name: Specify a name for this cluster.
profile_name: Specify the profile to use for this cluster.
h2o_version: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
driver_cores: Specify the number of Spark driver cores.
driver_memory_gb: Specify the amount of Spark driver memory in GB.
num_executors: Specify the number of Spark executors.
executor_cores: Specify the number of Spark executor cores.
executor_memory_gb: Specify the amount of Spark executor memory in GB.
h2o_nodes: Specify the number of H2O nodes for the cluster.
h2o_node_memory_gb: Specify the amount of memory that should be available on each H2O node.
h2o_node_threads: Specify the number of threads (CPUs) to use per node. Specify 0 to use all available threads.
start_timeout_sec: Specify the start timeout in seconds.
yarn_queue: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN queue cannot contain spaces.
python_environment_name: Specify the Python environment name that you want to use.
spark_properties: Specify additional Spark properties as a Python dictionary.
>>> cluster = conn.start_external_sparkling_cluster(cluster_name="test",
                                                    profile_name="default-sparkling-external",
                                                    h2o_version="3.26.0.11",
                                                    driver_cores=1,
                                                    driver_memory_gb=1,
                                                    num_executors=1,
                                                    executor_cores=1,
                                                    executor_memory_gb=1,
                                                    h2o_nodes=1,
                                                    h2o_node_memory_gb=1,
                                                    h2o_node_threads=0,
                                                    start_timeout_sec=90,
                                                    yarn_queue=None,
                                                    python_environment_name="Python 3.7 default",
                                                    spark_properties={'spark.python.worker.reuse': 'true', 'key': 'val'})
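Assuming the cluster object returned by the external backend exposes the same Sparkling Water cluster functions documented above (detail, session, send_statement, and stop), it can be used in the same way, for example:
>>> cluster.detail()
>>> cluster.stop()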
start_h2o_cluster¶
Use the start_h2o_cluster function to create a new H2O cluster. This function takes the following parameters:
cluster_name: Specify a name for this cluster.
profile_name: Specify the profile to use for this cluster.
num_nodes: Specify the number of nodes for the cluster.
node_memory: Specify the amount of memory that should be available on each node.
v_cores: Specify the number of virtual cores.
n_threads: Specify the number of threads (CPUs) to use in the cluster. Specify 0 to use all available threads.
max_idle_time: Specify the maximum number of hours that the cluster can be idle before gracefully shutting down. Specify 0 to turn off this setting and allow the cluster to remain idle for an unlimited amount of time.
max_uptime: Specify the maximum number of hours that the cluster can be running. Specify 0 to turn off this setting and allow the cluster to remain up for an unlimited amount of time.
extramempercent: Specify the amount of extra memory for internal JVM use outside of the Java heap. This is a percentage of memory per node. The default (and recommended) value is 10%.
h2o_version: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
yarn_queue: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN queue cannot contain spaces.
callback_ip: Optionally specify the IP address for callback messages from the mapper to the driver (driverif).
node_id: Optionally specify whether to connect to a different leader node.
>>> cluster_config = conn.start_h2o_cluster(cluster_name='first-cluster-from-Python',
                                            profile_name='default',
                                            num_nodes=2,
                                            node_memory='30g',
                                            h2o_version="3.26.0.11",
                                            max_idle_time=1,
                                            max_uptime=1)
# Inspect cluster_config to retrieve the cluster ID and connection parameters.
>>> cluster_config
{'id': 107, 'connect_params': {'cookies': [u'first-cluster-from-Python=YW5nZWxhOmdrZm53aGJsdWY='], 'ip': 'steam.0xdata.loc', 'context_path': u'jsmit_first-cluster-from-Python', 'verify_ssl_certificates': False, 'https': True, 'port': 9999}}
Note that after you create a cluster, you can immediately connect to that cluster and begin using H2O. Refer to the following for a complete Python example.
>>> import h2o
>>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
>>> h2o.connect(config = cluster_config)
# import the cars dataset
# this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made
>>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# convert response column to a factor
>>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
# set the predictor names and the response column name
>>> predictors = ["displacement","power","weight","acceleration","year"]
>>> response = "economy_20mpg"
# split into train and validation sets
>>> train, valid = cars.split_frame(ratios = [.8], seed = 1234)
# initialize your estimator
>>> cars_gbm = H2OGradientBoostingEstimator(seed = 1234)
# train your model, specifying your 'x' predictors,
# your 'y' the response column, training_frame, and validation_frame
>>> cars_gbm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
# print the auc for the validation data
>>> cars_gbm.auc(valid=True)
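When you are finished with the cluster, you can stop it from the same Enterprise Steam connection using stop_h2o_cluster (described below):
>>> conn.stop_h2o_cluster(cluster_config)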
start_internal_sparkling_cluster¶
Use the start_internal_sparkling_cluster function to create a new Sparkling Water cluster using the internal backend. This function takes the following parameters:
cluster_name: Specify a name for this cluster.
profile_name: Specify the profile to use for this cluster.
h2o_version: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
driver_cores: Specify the number of Spark driver cores.
driver_memory_gb: Specify the amount of Spark driver memory in GB.
num_executors: Specify the number of Spark executors.
executor_cores: Specify the number of Spark executor cores.
executor_memory_gb: Specify the amount of Spark executor memory in GB.
h2o_node_threads: Specify the number of threads (CPUs) to use per node. Specify 0 to use all available threads.
start_timeout_sec: Specify the start timeout in seconds.
yarn_queue: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN queue cannot contain spaces.
python_environment_name: Specify the Python environment name that you want to use.
spark_properties: Specify additional Spark properties as a Python dictionary.
>>> cluster = conn.start_internal_sparkling_cluster(cluster_name="test",
                                                    profile_name="default-sparkling-internal",
                                                    h2o_version="3.26.0.11",
                                                    driver_cores=1,
                                                    driver_memory_gb=1,
                                                    num_executors=1,
                                                    executor_cores=1,
                                                    executor_memory_gb=1,
                                                    h2o_node_threads=0,
                                                    start_timeout_sec=90,
                                                    yarn_queue=None,
                                                    python_environment_name="Python 3.7 default",
                                                    spark_properties={'spark.python.worker.reuse': 'true', 'key': 'val'})
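The returned cluster object can then drive the remote Spark session using the Sparkling Water cluster functions documented above, for example:
>>> cluster.detail()
>>> cluster.send_statement("print(1 + 1)")  # illustrative statement only; send any code valid in the remote session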
stop_h2o_cluster¶
Use the stop_h2o_cluster function to stop a cluster.
>>> conn.stop_h2o_cluster(cluster_config)
upload_engine¶
Use the upload_engine function to upload an H2O engine to Steam.
>>> conn.upload_engine("~/Downloads/h2o-3.26.0.11-hdp2.4.zip")
upload_sparkling_engine¶
Use the upload_sparkling_engine function to upload a Sparkling Water engine to Steam.
>>> conn.upload_sparkling_engine("~/Downloads/sparkling-water-2.3.17.zip")