Using Enterprise Steam with Python
----------------------------------

This section describes how to use Enterprise Steam for Python. Note that each Python request will result in a warning message. These warnings can be ignored.

Downloading and Installing
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Go to https://s3.amazonaws.com/steam-release/enterprise-steam/latest-stable.html to retrieve the latest version of Enterprise Steam.

2. On the Steam API tab, select the Python package that you want to download.

3. Open a Terminal window, and navigate to the location where the Python package file was downloaded. For example:

   ::

       cd ~/Downloads

4. Install Enterprise Steam for Python using one of the following methods:

   ::

       # Install the Python whl
       pip install h2osteam-1.5.1-py2.py3-none-any.whl

       # Install the Conda tar.bz2
       # Replace the version below with your desired Conda package/Python version
       conda install h2osteam-1.5.1-py27_0.tar.bz2

Available Functions
~~~~~~~~~~~~~~~~~~~

``DaiInstance.client``
''''''''''''''''''''''

Use the ``client`` function to connect to the Driverless AI instance via ``h2oai_client``.

::

    >>> instance = launch_dai_instance(name="dai-1-8-0-instance", version="1.8.0", max_server_wait_sec=2*60)
    >>> instance.client()

``DaiInstance.download_logs``
'''''''''''''''''''''''''''''

Use the ``download_logs`` function to download Driverless AI logs to the specified path.

::

    >>> instance = launch_dai_instance(name="dai-1-8-0-instance", version="1.8.0", max_server_wait_sec=2*60)
    >>> instance.download_logs(path="/dai/logs")

``DaiInstance.start``
'''''''''''''''''''''

Use the ``start`` function to start the Driverless AI instance after launching the instance.

::

    >>> instance = launch_dai_instance(name="dai-1-8-0-instance", version="1.8.0", max_server_wait_sec=2*60)
    >>> instance.start()

``DaiInstance.status``
''''''''''''''''''''''

Use the ``status`` function to view the status of a Driverless AI instance.

::

    >>> instance = launch_dai_instance(name="dai-1-8-0-instance", version="1.8.0", max_server_wait_sec=2*60)
    >>> instance.status()

``DaiInstance.stop``
''''''''''''''''''''

Use the ``stop`` function to stop a Driverless AI instance that is running.

::

    >>> instance = launch_dai_instance(name="dai-1-8-0-instance", version="1.8.0", max_server_wait_sec=2*60)
    >>> instance.stop()

``DaiInstance.terminate``
'''''''''''''''''''''''''

Use the ``terminate`` function to terminate/delete a Driverless AI instance that is either running or stopped.

::

    >>> instance = launch_dai_instance(name="dai-1-8-0-instance", version="1.8.0", max_server_wait_sec=2*60)
    >>> instance.terminate()

``get_dai_instance``
''''''''''''''''''''

Use ``get_dai_instance`` to retrieve information about a specific Driverless AI instance in Enterprise Steam using the unique instance name.

::

    >>> launch_dai_instance(name="dai-1-8-0-instance", version="1.8.0", max_server_wait_sec=2*60)
    >>> instance = get_dai_instance(name="dai-1-8-0-instance")

``get_h2o_cluster``
'''''''''''''''''''

Use ``get_h2o_cluster`` to retrieve information about a specific cluster using the cluster name. This function takes the cluster name as its only parameter.

::

    >>> conn.get_h2o_cluster('first-cluster-from-Python')
    {'id': 108, 'connect_params': {'cookies': [u'first-cluster-from-Python=YW5nZWxhOnA1bHRreHN5amo='], 'ip': 'steam.0xdata.loc', 'context_path': u'jsmith_first-cluster-from-Python', 'verify_ssl_certificates': False, 'https': True, 'port': 9999}}

``get_h2o_clusters``
''''''''''''''''''''

Use ``get_h2o_clusters`` to retrieve all running H2O clusters accessible to the current user.

::

    >>> conn.get_h2o_clusters()
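The dictionary returned by ``get_h2o_cluster`` (and each entry returned by ``get_h2o_clusters``) has the same shape as the cluster configuration returned by ``start_h2o_cluster``, so it can be passed to ``h2o.connect``. A minimal sketch, assuming a cluster named ``first-cluster-from-Python`` is already running:

::

    >>> import h2o
    # Look up the running cluster by name and connect the H2O Python client to it.
    >>> cluster_config = conn.get_h2o_cluster('first-cluster-from-Python')
    >>> h2o.connect(config=cluster_config)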
``get_sparkling_cluster``
'''''''''''''''''''''''''

Use the ``get_sparkling_cluster`` function to retrieve information about a specific Sparkling Water cluster using the cluster name.

::

    >>> conn.get_sparkling_cluster('sparkling-cluster-from-Python')

``get_sparkling_clusters``
''''''''''''''''''''''''''

Use the ``get_sparkling_clusters`` function to retrieve all running Sparkling Water clusters accessible to the current user.

::

    >>> conn.get_sparkling_clusters()

``launch_dai_instance``
'''''''''''''''''''''''

Use the ``launch_dai_instance`` function to start a Driverless AI instance in Enterprise Steam. This function takes the following parameters:

- ``name``: Specify a unique name for this instance.
- ``version``: Specify the Driverless AI version.
- ``max_server_wait_sec``: Optionally specify the number of seconds that the server should wait during launch before timing out.

::

    >>> launch_dai_instance(name="dai-1-8-0-instance", version="1.8.0", max_server_wait_sec=2*60)

``login``
'''''''''

In Python, use the ``login`` function to log in to your Enterprise Steam web server. Note that you must already have a username and a password. The web server and your username and password are provided to you by your Enterprise Steam Admin. You can use your access token instead of a password. This function accepts the following parameters:

- ``url``: Required. Specify the Enterprise Steam URL.
- ``username``: Specify the username.
- ``password``: Specify the password or access token.
- ``login_file``: Specify the path to the login file that contains the username. This can be used instead of the username.
- ``login_file_pass``: Specify the path to the login file that contains the password or access token. This can be used instead of specifying a password or access token.
- ``verify_ssl``: Specify whether to verify SSL certificates. This defaults to True.

::

    $ python
    >>> import h2osteam
    >>> conn = h2osteam.login(url="https://steam.0xdata.loc", verify_ssl=False, username="jsmith", password="jsmith")

``show_profiles``
'''''''''''''''''

Use the ``show_profiles`` function to show the available profiles.

::

    >>> conn.show_profiles()

``sparkling_cluster.detail``
''''''''''''''''''''''''''''

Use the ``detail`` function of a Sparkling Water cluster to get information about that cluster.

::

    >>> sparkling_cluster = conn.start_internal_sparkling_cluster(.......)
    >>> sparkling_cluster.detail()

``sparkling_cluster.send_statement``
''''''''''''''''''''''''''''''''''''

Use the ``send_statement`` function of a Sparkling Water cluster to send a single statement to the remote Spark session.

::

    >>> sparkling_cluster = conn.start_internal_sparkling_cluster(.......)
    >>> sparkling_cluster.send_statement("f_crimes = h2o.import_file(path='../data/chicagoCrimes10k.csv', col_types=column_type)")

``sparkling_cluster.session``
'''''''''''''''''''''''''''''

Use the ``session`` function of a Sparkling Water cluster to connect to the remote Spark session and issue commands.

::

    >>> sparkling_cluster = conn.start_internal_sparkling_cluster(.......)
    >>> sparkling_cluster.session()

``sparkling_cluster.stop``
''''''''''''''''''''''''''

Use the ``stop`` function of a Sparkling Water cluster to stop the cluster.

::

    >>> sparkling_cluster = conn.start_internal_sparkling_cluster(.......)
    >>> sparkling_cluster.stop()

``start_external_sparkling_cluster``
''''''''''''''''''''''''''''''''''''

Use the ``start_external_sparkling_cluster`` function to create a new Sparkling Water cluster using the external backend. This function takes the following parameters:

- ``cluster_name``: Specify a name for this cluster.
- ``profile_name``: Specify the profile to use for this cluster.
- ``h2o_version``: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
- ``driver_cores``: Specify the number of Spark driver cores.
- ``driver_memory_gb``: Specify the amount of Spark driver memory in GB.
- ``num_executors``: Specify the number of Spark executors.
- ``executor_cores``: Specify the number of Spark executor cores.
- ``executor_memory_gb``: Specify the amount of Spark executor memory in GB.
- ``h2o_nodes``: Specify the number of H2O nodes for the cluster.
- ``h2o_node_memory_gb``: Specify the amount of memory that should be available on each H2O node.
- ``h2o_node_threads``: Specify the number of threads (CPUs) to use per node. Specify 0 to use all available threads.
- ``start_timeout_sec``: Specify the start timeout in seconds.
- ``yarn_queue``: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN queue cannot contain spaces.
- ``python_environment_name``: Specify the Python environment name you want to use.
- ``spark_properties``: Specify additional Spark properties as a Python dictionary.

::

    >>> cluster = conn.start_external_sparkling_cluster(cluster_name="test",
    ...                                                 profile_name="default-sparkling-external",
    ...                                                 h2o_version="3.26.0.11",
    ...                                                 driver_cores=1,
    ...                                                 driver_memory_gb=1,
    ...                                                 num_executors=1,
    ...                                                 executor_cores=1,
    ...                                                 executor_memory_gb=1,
    ...                                                 h2o_nodes=1,
    ...                                                 h2o_node_memory_gb=1,
    ...                                                 h2o_node_threads=0,
    ...                                                 start_timeout_sec=90,
    ...                                                 yarn_queue=None,
    ...                                                 python_environment_name="Python 3.7 default",
    ...                                                 spark_properties={'spark.python.worker.reuse': 'true', 'key': 'val'})
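The call above returns a cluster handle. A minimal sketch of working with that handle, assuming it supports the same ``sparkling_cluster`` functions described earlier (the statement sent to the Spark session is only an illustration):

::

    # Show information about the cluster.
    >>> cluster.detail()
    # Run a single statement in the remote Spark session (illustrative statement).
    >>> cluster.send_statement("print(spark.version)")
    # Stop the cluster when you are finished with it.
    >>> cluster.stop()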
``start_h2o_cluster``
'''''''''''''''''''''

Use the ``start_h2o_cluster`` function to create a new cluster. This function takes the following parameters:

- ``cluster_name``: Specify a name for this cluster.
- ``profile_name``: Specify the profile to use for this cluster.
- ``num_nodes``: Specify the number of nodes for the cluster.
- ``node_memory``: Specify the amount of memory that should be available on each node (for example, ``'30g'``).
- ``v_cores``: Specify the number of virtual cores.
- ``n_threads``: Specify the number of threads (CPUs) to use in the cluster. Specify 0 to use all available threads.
- ``max_idle_time``: Specify the maximum number of hours that the cluster can be idle before gracefully shutting down. Specify 0 to turn off this setting and allow the cluster to remain idle for an unlimited amount of time.
- ``max_uptime``: Specify the maximum number of hours that the cluster can be running. Specify 0 to turn off this setting and allow the cluster to remain up for an unlimited amount of time.
- ``extramempercent``: Specify the amount of extra memory for internal JVM use outside of the Java heap. This is a percentage of memory per node. The default (and recommended) value is 10%.
- ``h2o_version``: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
- ``yarn_queue``: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN queue cannot contain spaces.
- ``callback_ip``: Optionally specify the IP address for callback messages from the mapper to the driver (driverif).
- ``node_id``: Optionally specify whether to connect to a different leader node.

::

    >>> cluster_config = conn.start_h2o_cluster(cluster_name='first-cluster-from-Python',
    ...                                         profile_name='default',
    ...                                         num_nodes=2,
    ...                                         node_memory='30g',
    ...                                         h2o_version="3.26.0.11",
    ...                                         max_idle_time=1,
    ...                                         max_uptime=1)

    # Call the cluster to retrieve its ID and configuration params.
    >>> cluster_config
    {'id': 107, 'connect_params': {'cookies': [u'first-cluster-from-Python=YW5nZWxhOmdrZm53aGJsdWY='], 'ip': 'steam.0xdata.loc', 'context_path': u'jsmit_first-cluster-from-Python', 'verify_ssl_certificates': False, 'https': True, 'port': 9999}}

Note that after you create a cluster, you can immediately connect to that cluster and begin using H2O. Refer to the following for a complete Python example.

::

    >>> import h2o
    >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator
    >>> h2o.connect(config=cluster_config)

    # Import the cars dataset.
    # This dataset is used to classify whether or not a car is economical based on
    # the car's displacement, power, weight, and acceleration, and the year it was made.
    >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")

    # Convert the response column to a factor.
    >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()

    # Set the predictor names and the response column name.
    >>> predictors = ["displacement", "power", "weight", "acceleration", "year"]
    >>> response = "economy_20mpg"

    # Split into train and validation sets.
    >>> train, valid = cars.split_frame(ratios=[.8], seed=1234)

    # Initialize your estimator.
    >>> cars_gbm = H2OGradientBoostingEstimator(seed=1234)

    # Train your model, specifying your 'x' predictors,
    # your 'y' response column, the training_frame, and the validation_frame.
    >>> cars_gbm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

    # Print the AUC for the validation data.
    >>> cars_gbm.auc(valid=True)
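When you are done working with the cluster, you can release it from the same session. A minimal sketch, using the ``stop_h2o_cluster`` function described below:

::

    # Stop the running H2O cluster once work is complete.
    >>> conn.stop_h2o_cluster(cluster_config)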
``start_internal_sparkling_cluster``
''''''''''''''''''''''''''''''''''''

Use the ``start_internal_sparkling_cluster`` function to create a new Sparkling Water cluster using the internal backend. This function takes the following parameters:

- ``cluster_name``: Specify a name for this cluster.
- ``profile_name``: Specify the profile to use for this cluster.
- ``h2o_version``: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
- ``driver_cores``: Specify the number of Spark driver cores.
- ``driver_memory_gb``: Specify the amount of Spark driver memory in GB.
- ``num_executors``: Specify the number of Spark executors.
- ``executor_cores``: Specify the number of Spark executor cores.
- ``executor_memory_gb``: Specify the amount of Spark executor memory in GB.
- ``h2o_node_threads``: Specify the number of threads (CPUs) to use per node. Specify 0 to use all available threads.
- ``start_timeout_sec``: Specify the start timeout in seconds.
- ``yarn_queue``: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN queue cannot contain spaces.
- ``python_environment_name``: Specify the Python environment name you want to use.
- ``spark_properties``: Specify additional Spark properties as a Python dictionary.

::

    >>> cluster = conn.start_internal_sparkling_cluster(cluster_name="test",
    ...                                                 profile_name="default-sparkling-internal",
    ...                                                 h2o_version="3.26.0.11",
    ...                                                 driver_cores=1,
    ...                                                 driver_memory_gb=1,
    ...                                                 num_executors=1,
    ...                                                 executor_cores=1,
    ...                                                 executor_memory_gb=1,
    ...                                                 h2o_node_threads=0,
    ...                                                 start_timeout_sec=90,
    ...                                                 yarn_queue=None,
    ...                                                 python_environment_name="Python 3.7 default",
    ...                                                 spark_properties={'spark.python.worker.reuse': 'true', 'key': 'val'})
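A running Sparkling Water cluster can also be looked up by name in a later session. A minimal sketch, assuming ``get_sparkling_cluster`` returns the same kind of cluster handle as the start functions:

::

    # Look up the running cluster by name (name from the example above).
    >>> sparkling_cluster = conn.get_sparkling_cluster('test')
    # Inspect its configuration and status.
    >>> sparkling_cluster.detail()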
``stop_h2o_cluster``
''''''''''''''''''''

Use the ``stop_h2o_cluster`` function to stop a cluster.

::

    >>> conn.stop_h2o_cluster(cluster_config)

``upload_engine``
'''''''''''''''''

Use the ``upload_engine`` function to upload an H2O engine to Enterprise Steam.

::

    >>> conn.upload_engine("~/Downloads/h2o-3.26.0.11-hdp2.4.zip")

``upload_sparkling_engine``
'''''''''''''''''''''''''''

Use the ``upload_sparkling_engine`` function to upload a Sparkling Water engine to Enterprise Steam.

::

    >>> conn.upload_sparkling_engine("~/Downloads/sparkling-water-2.3.17.zip")
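After an engine has been uploaded, its version should then be available through the ``h2o_version`` parameter of the start functions described above. A minimal sketch, assuming the uploaded package provides H2O version 3.26.0.11 and that your profile permits using it:

::

    # Upload the engine, then start a cluster that uses the uploaded version.
    >>> conn.upload_engine("~/Downloads/h2o-3.26.0.11-hdp2.4.zip")
    >>> cluster_config = conn.start_h2o_cluster(cluster_name='cluster-on-uploaded-engine',
    ...                                         profile_name='default',
    ...                                         num_nodes=2,
    ...                                         node_memory='30g',
    ...                                         h2o_version="3.26.0.11")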