Using Enterprise Steam with R
-----------------------------

This section describes how to use Enterprise Steam from R. The R client requires the ``urltools`` package. Refer to `https://github.com/Ironholds/urltools/ <https://github.com/Ironholds/urltools/>`__ for more information.

Downloading and Installing
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Go to `https://s3.amazonaws.com/steam-release/enterprise-steam/latest-stable.html <https://s3.amazonaws.com/steam-release/enterprise-steam/latest-stable.html>`__ to retrieve the latest version of Enterprise Steam.

2. On the Steam API tab, download the R package.

3. Open a Terminal window and navigate to the directory where you downloaded the Enterprise Steam R package. For example:

   ::

       cd ~/Downloads

4. Install Enterprise Steam for R by running ``R CMD INSTALL`` on the downloaded package file. For example:

   ::

       R CMD INSTALL h2osteam_1.4.9.tar.gz

``login``
~~~~~~~~~

Use the ``login`` function to log in to your Enterprise Steam web server. You must already have a username and a password. Your Enterprise Steam Admin provides the web server URL along with your username and password.

This function takes the following parameters:

- ``url``: The URL of the Enterprise Steam instance.
- ``verify_ssl``: Specify ``TRUE`` or ``FALSE`` to indicate whether to verify the server's SSL certificate.
- ``username``: Your username as provided by your Enterprise Steam Admin.
- ``password``: Your password as provided by your Enterprise Steam Admin.
- ``login_file``: A login file where user information is stored.
- ``login_file_passphrase``: The passphrase for the login file specified in ``login_file``.

::

    $ R
    > library(h2osteam)
    > conn <- h2osteam.login(url = "https://steam.0xdata.loc", verify_ssl = FALSE,
    +                        username = "jsmith", password = "jsmith")

``start_h2o_cluster``
~~~~~~~~~~~~~~~~~~~~~

Use the ``start_h2o_cluster`` function to create a new cluster. This function takes the following parameters:

- ``conn``: The connection object returned by ``h2osteam.login``.
- ``cluster_name``: Specify a name for this cluster.
- ``profile_name``: Specify the profile to use for this cluster.
- ``num_nodes``: Specify the number of nodes for the cluster.
- ``node_memory``: Specify the amount of memory that should be available on each node (for example, ``"30g"``).
- ``v_cores``: Specify the number of virtual cores.
- ``n_threads``: Specify the number of threads (CPUs) to use in the cluster. Specify 0 to use all available threads.
- ``max_idle_time``: Specify the maximum number of hours that the cluster can be idle before it shuts down gracefully. Specify 0 to turn off this setting and allow the cluster to remain idle indefinitely.
- ``max_uptime``: Specify the maximum number of hours that the cluster can run. Specify 0 to turn off this setting and allow the cluster to remain up indefinitely.
- ``extramempercent``: Specify the amount of extra memory reserved for internal JVM use outside of the Java heap, as a percentage of each node's memory. The default (and recommended) value is 10%.
- ``h2o_engine_id``: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
- ``yarn_queue``: If your YARN cluster uses queues to allocate resources, specify the queue for this cluster. The queue name cannot contain spaces.

::

    > cluster_config <- h2osteam.start_h2o_cluster(conn = conn,
    +                                               cluster_name = "first-cluster-from-R",
    +                                               profile_name = "default",
    +                                               num_nodes = 2,
    +                                               node_memory = "30g",
    +                                               h2o_version = "3.22.0.1",
    +                                               max_idle_time = 1,
    +                                               max_uptime = 1)

    # Print the returned configuration to see the cluster ID and connection parameters
    > cluster_config
    $id
    [1] 109

    $connect_params
    $connect_params$ip
    [1] "steam.0xdata.loc"

    $connect_params$port
    [1] 9999

    $connect_params$cookies
    [1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="

    $connect_params$context_path
    [1] "jsmith_first-cluster-from-R"

    $connect_params$https
    [1] TRUE

    $connect_params$insecure
    [1] TRUE

After you create a cluster, you can immediately connect to it and begin using H2O. Refer to the following for a complete R example.

::

    > library(h2o)
    > h2o.connect(config = cluster_config)

    # Import the cars dataset. This dataset is used to classify whether or not
    # a car is economical based on its displacement, power, weight,
    # acceleration, and the year it was made.
    > cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")

    # Convert the response column to a factor
    > cars["economy_20mpg"] <- as.factor(cars["economy_20mpg"])

    # Set the predictor names and the response column name
    > predictors <- c("displacement", "power", "weight", "acceleration", "year")
    > response <- "economy_20mpg"

    # Split into training and validation sets
    > cars.split <- h2o.splitFrame(data = cars, ratios = 0.8, seed = 1234)
    > train <- cars.split[[1]]
    > valid <- cars.split[[2]]

    # Train the model, specifying the predictor columns (x), the response
    # column (y), the training frame, and the validation frame
    > cars_gbm <- h2o.gbm(x = predictors, y = response, training_frame = train,
    +                     validation_frame = valid, seed = 1234)

    # Print the validation AUC for the model
    > print(h2o.auc(cars_gbm, valid = TRUE))

``get_h2o_cluster``
~~~~~~~~~~~~~~~~~~~

Use the ``get_h2o_cluster`` function to retrieve information about a specific cluster by name.

::

    > h2osteam.get_h2o_cluster(conn, "first-cluster-from-R")
    $id
    [1] 109

    $connect_params
    $connect_params$ip
    [1] "steam.0xdata.loc"

    $connect_params$port
    [1] 9999

    $connect_params$cookies
    [1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="

    $connect_params$context_path
    [1] "jsmith_first-cluster-from-R"

    $connect_params$https
    [1] TRUE

    $connect_params$insecure
    [1] TRUE

``get_h2o_clusters``
~~~~~~~~~~~~~~~~~~~~

Use the ``get_h2o_clusters`` function to retrieve all running H2O clusters that are accessible to the current user.

::

    > h2osteam.get_h2o_clusters(conn)

``stop_h2o_cluster``
~~~~~~~~~~~~~~~~~~~~

Use the ``stop_h2o_cluster`` function to stop a cluster.

::

    > h2osteam.stop_h2o_cluster(conn, cluster_config)

``show_profiles``
~~~~~~~~~~~~~~~~~

Use the ``show_profiles`` function to list the available profiles.

::

    > h2osteam.show_profiles(conn)
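
Example: Stopping a Cluster Automatically
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A cluster keeps its resources until it is stopped or reaches its ``max_idle_time`` or ``max_uptime`` limit. The following is a minimal sketch of a hypothetical helper, ``with_steam_cluster``, which is not part of the ``h2osteam`` package. It starts a cluster, connects to it, runs your code, and stops the cluster through ``on.exit`` even if your code raises an error. It assumes that the object returned by ``h2osteam.start_h2o_cluster`` can be passed both to ``h2o.connect(config = ...)`` and to ``h2osteam.stop_h2o_cluster``, as in the examples above. The start, connect, and stop functions are exposed as arguments (``start_fn``, ``connect_fn``, and ``stop_fn``, also hypothetical names) so that the helper can be tried without a live Steam server.

::

    # Hypothetical helper (not part of h2osteam): start a cluster, connect to it,
    # run `work`, and always stop the cluster afterwards.
    with_steam_cluster <- function(conn, work, ...,
                                   start_fn   = h2osteam::h2osteam.start_h2o_cluster,
                                   connect_fn = function(cfg) h2o::h2o.connect(config = cfg),
                                   stop_fn    = h2osteam::h2osteam.stop_h2o_cluster) {
      # Arguments in `...` (cluster_name, profile_name, num_nodes, and so on)
      # are passed through to the start function unchanged
      cluster_config <- start_fn(conn = conn, ...)

      # Register the stop call before connecting so that it runs on every exit
      # path, including errors raised while connecting or inside `work`
      on.exit(stop_fn(conn, cluster_config), add = TRUE)

      connect_fn(cluster_config)
      work(cluster_config)  # the value returned by `work` is returned to the caller
    }

    # Usage (requires a running Enterprise Steam server):
    # library(h2osteam); library(h2o)
    # conn <- h2osteam.login(url = "https://steam.0xdata.loc", verify_ssl = FALSE,
    #                        username = "jsmith", password = "jsmith")
    # auc <- with_steam_cluster(conn, function(cfg) {
    #   cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
    #   cars["economy_20mpg"] <- as.factor(cars["economy_20mpg"])
    #   model <- h2o.gbm(x = c("displacement", "power", "weight", "acceleration", "year"),
    #                    y = "economy_20mpg", training_frame = cars, seed = 1234)
    #   h2o.auc(model)
    # }, cluster_name = "first-cluster-from-R", profile_name = "default",
    #    num_nodes = 2, node_memory = "30g", max_idle_time = 1, max_uptime = 1)

Because R evaluates default arguments lazily, the ``h2osteam::`` and ``h2o::`` defaults are only looked up when the helper actually runs, so defining it does not require either package to be loaded. In normal use you rely on those defaults and pass only the cluster parameters.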