Examples

This section provides complete examples of using the Enterprise Steam Python client.

Launching and connecting to H2O cluster

This example shows how to log in to Steam and launch an H2O cluster with 4 nodes and 10 GB of memory per node. The cluster uses H2O version 3.28.0.2 and the profile called default-h2o, and is submitted to the default YARN queue. All other H2O parameters are pre-filled according to the selected profile. Once the cluster is up, we connect to it and start importing data.

import h2o
import h2osteam
from h2osteam.clients import H2oClient

h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = H2oClient.launch_cluster(name="test-cluster",
                                   profile_name="default-h2o",
                                   version="3.28.0.2",
                                   nodes=4,
                                   node_memory_gb=10)
cluster.connect()
airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"
airlines_df = h2o.import_file(path=airlines)
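
The examples on this page pass the access token as a string literal for brevity. A minimal sketch of one alternative, which reads the same login arguments from environment variables, might look like the following. The variable names STEAM_URL, STEAM_USERNAME, STEAM_TOKEN, and STEAM_VERIFY_SSL and the helper steam_login_kwargs are assumptions made for illustration; they are not part of the Steam client.

```python
import os


def steam_login_kwargs(env=None):
    """Build keyword arguments for h2osteam.login from environment variables.

    The environment variable names are hypothetical; adjust them to your setup.
    """
    env = os.environ if env is None else env
    return {
        "url": env["STEAM_URL"],
        "username": env["STEAM_USERNAME"],
        # The access token is passed in the password argument, as in the examples.
        "password": env["STEAM_TOKEN"],
        # Certificate verification stays on unless explicitly set to "false".
        "verify_ssl": env.get("STEAM_VERIFY_SSL", "true").lower() != "false",
    }
```

With h2osteam installed and the variables set, the result could be passed as h2osteam.login(**steam_login_kwargs()).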

Providing dataset size to preset cluster size

This example shows how to launch an H2O cluster by providing dataset information. If you are not sure exactly how to size your cluster, you can provide dataset_size_gb and use the using_xgboost parameter to indicate whether you are going to run the XGBoost algorithm on the cluster. Setting these parameters sizes the cluster accordingly. If your profile does not allow the recommended resources to be allocated, the maximum allowed resources are used instead. Any user-specified values of nodes, node_memory_gb, or extra_memory_percent override the recommended values.

import h2o
import h2osteam
from h2osteam.clients import H2oClient

h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = H2oClient.launch_cluster(name="test-cluster",
                                   profile_name="default-h2o",
                                   version="3.28.0.2",
                                   dataset_size_gb=20,
                                   using_xgboost=True)
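
The precedence rules described above (user-specified values first, otherwise the recommendation capped by the profile limit) can be sketched in plain Python. This is only an illustration of the stated rules, not Steam's actual sizing logic: the function resolve_setting and all numbers are hypothetical, and the sketch does not model what happens when a user-specified value itself exceeds the profile limit.

```python
def resolve_setting(recommended, profile_max, user_value=None):
    """Pick the effective value for one sizing parameter.

    A user-specified value overrides the recommendation; otherwise the
    recommendation is used, capped at the maximum the profile allows.
    """
    if user_value is not None:
        return user_value
    return min(recommended, profile_max)


# Hypothetical recommendation, profile limits, and user overrides.
recommended = {"nodes": 6, "node_memory_gb": 24}
profile_max = {"nodes": 4, "node_memory_gb": 32}
user_values = {"node_memory_gb": 16}

effective = {
    key: resolve_setting(recommended[key], profile_max[key], user_values.get(key))
    for key in recommended
}
print(effective)  # {'nodes': 4, 'node_memory_gb': 16}
```

Here nodes falls back to the profile maximum because the recommendation exceeds it, while node_memory_gb takes the user-specified value.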

Connecting to existing H2O cluster

This example shows how to log in to Steam, connect to an existing H2O cluster called test-cluster, and import data.

import h2o
import h2osteam
from h2osteam.clients import H2oClient

h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = H2oClient.get_cluster("test-cluster")
cluster.connect()
airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"
airlines_df = h2o.import_file(path=airlines)

Launching and connecting to Sparkling Water cluster

This example shows how to log in to Steam and launch a Sparkling Water cluster with 4 executors and 10 GB of memory per executor. The cluster uses Sparkling Water version 3.28.0.2 and the profile called default-sparkling-internal, and is submitted to the default YARN queue. The profile type determines the cluster backend type; in this case the cluster starts in internal mode. All other Sparkling Water parameters are pre-filled according to the selected profile. Once the cluster is up, we can send statements to the remote Spark session to start importing data.

import h2o
import h2osteam
from h2osteam.clients import SparklingClient

h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = SparklingClient.launch_sparkling_cluster(name="test-sparkling-cluster",
                                                   profile_name="default-sparkling-internal",
                                                   version="3.28.0.2",
                                                   executors=4,
                                                   executor_memory_gb=10,
                                                   yarn_queue="default")

cluster.send_statement('airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"')
cluster.send_statement('airlines_df = h2o.import_file(path=airlines)')
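
In the example above, each statement is a string of Python source, so values have to be quoted by hand. A small sketch of one way to build such assignment strings is shown below. It assumes the remote session evaluates the statement as Python, as the example suggests; the helper assignment_statement is hypothetical and not part of the Steam client.

```python
import keyword


def assignment_statement(name, value):
    """Return Python source that assigns a literal value to a variable name.

    repr() takes care of quoting and escaping strings, numbers, and other
    simple literals, so the caller does not nest quotes by hand.
    """
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"not a valid variable name: {name!r}")
    return f"{name} = {value!r}"


url = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"
statement = assignment_statement("airlines", url)
print(statement)
```

The resulting string could then be passed as cluster.send_statement(statement).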

Providing dataset size to preset Sparkling Water cluster size

This example shows how to launch a Sparkling Water cluster by providing dataset information. If you are not sure exactly how to size your cluster, you can provide dataset_size_gb and use the using_xgboost parameter to indicate whether you are going to run the XGBoost algorithm on the cluster. Setting these parameters sizes the cluster accordingly. If your profile does not allow the recommended resources to be allocated, the maximum allowed resources are used instead. Any user-specified values of executors, executor_memory_gb, h2o_nodes, h2o_node_memory_gb, or h2o_extra_memory_percent override the recommended values.

import h2o
import h2osteam
from h2osteam.clients import SparklingClient

h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = SparklingClient.launch_sparkling_cluster(name="test-sparkling-cluster",
                                                   profile_name="default-sparkling-internal",
                                                   version="3.28.0.2",
                                                   dataset_size_gb=50,
                                                   using_xgboost=False)

Connecting to existing Sparkling Water cluster

This example shows how to log in to Steam, connect to an existing Sparkling Water cluster called test-sparkling-cluster, and import data.

import h2o
import h2osteam
from h2osteam.clients import SparklingClient

h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
cluster = SparklingClient.get_cluster("test-sparkling-cluster")

cluster.send_statement('airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"')
cluster.send_statement('airlines_df = h2o.import_file(path=airlines)')

Launching and connecting to Driverless AI instance

This example shows how to create an instance of Driverless AI v1.8.4.1, connect to it, and upload a dataset.

import h2osteam
from h2oai_client import Client
from h2osteam.clients import DriverlessClient

h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
instance = DriverlessClient.launch_instance(name="test-instance", version="1.8.4.1")
client = instance.connect()

train_path = '/data/Kaggle/CreditCard/CreditCard-train.csv'
test_path = '/data/Kaggle/CreditCard/CreditCard-test.csv'

train = client.create_dataset_sync(train_path)
test = client.create_dataset_sync(test_path)

Connecting to existing Driverless AI instance

This example shows how to connect to an existing Driverless AI instance called test-instance and upload a dataset.

import h2osteam
from h2oai_client import Client
from h2osteam.clients import DriverlessClient

h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
instance = DriverlessClient.get_instance(name="test-instance")
client = instance.connect()

train_path = '/data/Kaggle/CreditCard/CreditCard-train.csv'
test_path = '/data/Kaggle/CreditCard/CreditCard-test.csv'

train = client.create_dataset_sync(train_path)
test = client.create_dataset_sync(test_path)