Using Enterprise Steam with R
-----------------------------

This section describes how to use Enterprise Steam for R. Note that this requires the "urltools" package. Refer to `https://github.com/Ironholds/urltools/ <https://github.com/Ironholds/urltools/>`__ for more information.

Downloading and Installing
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Go to `https://s3.amazonaws.com/steam-release/enterprise-steam/latest-stable.html <https://s3.amazonaws.com/steam-release/enterprise-steam/latest-stable.html>`__ to retrieve the latest version of Enterprise Steam.
2. On the Steam API tab, download the R package.
3. Open a Terminal window and navigate to the location where the Enterprise Steam file was downloaded. For example:

  ::

    cd ~/Downloads

4. Install Enterprise Steam for R using ``R CMD INSTALL <file_name>``. For example:

  ::

    R CMD INSTALL h2osteam_1.5.0.tar.gz
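
The same archive can usually be installed from within an R session instead. The following is a minimal sketch, assuming the file name from the example above, that the archive is in the current working directory, and that "urltools" is available from your configured CRAN mirror. ``install.packages`` and ``packageVersion`` are standard R functions and are not part of Enterprise Steam.

::

  # Install the "urltools" dependency from CRAN.
  install.packages("urltools")

  # Install the downloaded archive from the local file rather than from a repository.
  install.packages("h2osteam_1.5.0.tar.gz", repos = NULL, type = "source")

  # Confirm that the package loads and report the installed version.
  library(h2osteam)
  packageVersion("h2osteam")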


Available Functions
~~~~~~~~~~~~~~~~~~~


``get_h2o_cluster``
'''''''''''''''''''

Use the ``get_h2o_cluster`` function to retrieve information about a specific cluster by its cluster name.

::

  > h2osteam.get_h2o_cluster(conn, 'first-cluster-from-R')
  $id
  [1] 109

  $connect_params
  $connect_params$ip
  [1] "steam.0xdata.loc"

  $connect_params$port
  [1] 9999

  $connect_params$cookies
  [1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="

  $connect_params$context_path
  [1] "jsmith_first-cluster-from-R"

  $connect_params$https
  [1] TRUE

  $connect_params$insecure
  [1] TRUE
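
The printed output above suggests that the result is an ordinary nested R list, so individual values can be read with ``$``. The sketch below rests on that assumption and uses a hand-built stand-in list (not a real API response) so that it runs without a Steam server. The assembled address is purely illustrative; whether a cluster is reachable at exactly this URL is an assumption.

::

  # Hand-built stand-in that mirrors the fields printed above.
  cluster <- list(
    id = 109,
    connect_params = list(
      ip = "steam.0xdata.loc",
      port = 9999,
      context_path = "jsmith_first-cluster-from-R",
      https = TRUE
    )
  )

  # Read individual fields from the nested list.
  params <- cluster$connect_params
  scheme <- if (isTRUE(params$https)) "https" else "http"

  # Combine the parts into a single address string (illustrative only).
  cluster_url <- sprintf("%s://%s:%d/%s",
                         scheme,
                         params$ip,
                         as.integer(params$port),
                         params$context_path)
  print(cluster_url)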


``get_h2o_clusters``
''''''''''''''''''''

Use the ``get_h2o_clusters`` function to retrieve all running H2O clusters that are accessible to the current user.

::

  > h2osteam.get_h2o_clusters(conn)


``login``
'''''''''

Use the ``login`` function to log in to your Enterprise Steam web server. Note that you must already have a username and a password. The web server and your username and password are provided to you by your Enterprise Steam Admin. You can use your access token instead of a password. This function takes the following parameters:

- ``url``: The URL of the Enterprise Steam instance
- ``verify_ssl``: Specify whether to verify the SSL certificate (``TRUE`` or ``FALSE``)
- ``username``: Your username as provided by your Enterprise Steam Admin
- ``password``: Your password as provided by your Enterprise Steam Admin
- ``login_file``: A login file where user information is stored.
- ``login_file_passphrase``: A login file where user passphrase information is stored.

::

  $ R
  > library(h2osteam)
  > conn <- h2osteam.login(url = "https://steam.0xdata.loc",
                           verify_ssl = FALSE,
                           username = "jsmith",
                           password = "jsmith")
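
To keep credentials out of scripts, the username and password (or access token) can be read from environment variables rather than typed as literals. This is a minimal sketch: ``STEAM_USERNAME`` and ``STEAM_PASSWORD`` are hypothetical variable names chosen for illustration, and the URL is the sample server used in the example above.

::

  library(h2osteam)

  # STEAM_USERNAME and STEAM_PASSWORD are hypothetical names; set them in your
  # shell or in ~/.Renviron before starting R.
  steam_user <- Sys.getenv("STEAM_USERNAME")
  steam_pass <- Sys.getenv("STEAM_PASSWORD")

  # Stop early with a clear message if either value is missing.
  if (!nzchar(steam_user) || !nzchar(steam_pass)) {
    stop("Set STEAM_USERNAME and STEAM_PASSWORD before logging in.")
  }

  conn <- h2osteam.login(url = "https://steam.0xdata.loc",
                         verify_ssl = FALSE,
                         username = steam_user,
                         password = steam_pass)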

``show_profiles``
'''''''''''''''''

Use the ``show_profiles`` function to show the available profiles.

::

  > h2osteam.show_profiles(conn)


``start_h2o_cluster``
'''''''''''''''''''''

Use the ``start_h2o_cluster`` function to create a new cluster. This function takes the following parameters:

- ``cluster_name``: Specify a name for this cluster.
- ``profile_name``: Specify the profile to use for this cluster.
- ``num_nodes``: Specify the number of nodes for the cluster.
- ``node_memory``: Specify the amount of memory that should be available on each node.
- ``v_cores``: Specify the number of virtual cores.
- ``n_threads``: Specify the number of threads (CPUs) to use in the cluster. Specify 0 to use all available threads.
- ``max_idle_time``: Specify the maximum number of hours that the cluster can be idle before gracefully shutting down. Specify 0 to turn off this setting and allow the cluster to remain idle for an unlimited amount of time.
- ``max_uptime``: Specify the maximum number of hours that the cluster can be running. Specify 0 to turn off this setting and allow the cluster to remain up for an unlimited amount of time.
- ``extramempercent``: Specify the amount of extra memory for internal JVM use outside of the Java heap. This is a percentage of memory per node. The default (and recommended) value is 10%.
- ``h2o_engine_id``: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
- ``yarn_queue``: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN Queue cannot contain spaces.

::

  > cluster_config <- h2osteam.start_h2o_cluster(conn = conn,
                                                 cluster_name = "first-cluster-from-R",
                                                 profile_name = "default",
                                                 num_nodes = 2,
                                                 node_memory = "30g",
                                                 h2o_version = "3.26.0.1",
                                                 max_idle_time = 1,
                                                 max_uptime = 1)

  # Print the returned object to view the cluster's ID and connection parameters.
  > cluster_config
  $id
  [1] 109

  $connect_params
  $connect_params$ip
  [1] "steam.0xdata.loc"

  $connect_params$port
  [1] 9999

  $connect_params$cookies
  [1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="

  $connect_params$context_path
  [1] "jsmith_first-cluster-from-R"

  $connect_params$https
  [1] TRUE

  $connect_params$insecure
  [1] TRUE

Note that after you create a cluster, you can immediately connect to that cluster and begin using H2O. Refer to the following for a complete R example.

::

  > library(h2o)
  > h2o.connect(config = cluster_config)

  # import the cars dataset
  # this dataset is used to classify whether or not a car is economical based on
  # the car's displacement, power, weight, and acceleration, and the year it was made
  > cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")

  # convert response column to a factor
  > cars["economy_20mpg"] <- as.factor(cars["economy_20mpg"])

  # set the predictor names and the response column name
  > predictors <- c("displacement","power","weight","acceleration","year")
  > response <- "economy_20mpg"

  # split into train and validation sets
  > cars.split <- h2o.splitFrame(data = cars, ratios = 0.8, seed = 1234)
  > train <- cars.split[[1]]
  > valid <- cars.split[[2]]

  # train your model, specifying the 'x' predictors, the 'y' response column,
  # the training_frame, and the validation_frame
  > cars_gbm <- h2o.gbm(x = predictors,
                        y = response,
                        training_frame = train,
                        validation_frame = valid,
                        seed = 1234)

  # print the auc for your model
  > print(h2o.auc(cars_gbm, valid = TRUE))


``stop_h2o_cluster``
''''''''''''''''''''

Use the ``stop_h2o_cluster`` function to stop a cluster.

::

  > h2osteam.stop_h2o_cluster(conn, cluster_config)
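
Because a running cluster holds resources until it is stopped or reaches its idle or uptime limit, it can help to guarantee that the stop call runs even when the workload fails. The following sketch wraps the work in base R's ``tryCatch`` with a ``finally`` clause. It assumes that ``conn`` and ``cluster_config`` already exist, created with ``h2osteam.login`` and ``h2osteam.start_h2o_cluster`` as shown earlier in this section.

::

  library(h2o)
  library(h2osteam)

  # Assumes `conn` and `cluster_config` were created as shown earlier.
  tryCatch({
    h2o.connect(config = cluster_config)

    # Run your H2O workload here, for example importing data and training a model.
    cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
    print(dim(cars))
  }, finally = {
    # Runs whether the workload succeeded or raised an error.
    h2osteam.stop_h2o_cluster(conn, cluster_config)
  })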