Examples

This section provides complete examples of using the Enterprise Steam Python client.

Launching and connecting to an H2O cluster

This example shows how to log in to Steam and launch an H2O cluster with 4 nodes and 10 GB of memory per node. The cluster uses H2O version 3.28.0.2 and the profile called default-h2o, and it is submitted to the default YARN queue. All other H2O parameters are pre-filled according to the selected profile. When the cluster is up, we connect to it and start importing data.

import h2o
import h2osteam
from h2osteam.clients import H2oClient

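# Log in to Steam; an access token is passed in the password argument.
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
# Launch a 4-node cluster; parameters not listed here come from the profile.
cluster = H2oClient.launch_cluster(name="test-cluster",
                                   profile_name="default-h2o",
                                   version="3.28.0.2",
                                   nodes=4,
                                   node_memory_gb=10)
# Connect to the cluster so that subsequent h2o calls run against it.
cluster.connect()
# Import a dataset into the running cluster.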
airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"
airlines_df = h2o.import_file(path=airlines)
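
The login call above hard-codes the URL, user name, and access token. As a minimal sketch, the same keyword arguments could instead be assembled from environment variables. The helper name steam_login_kwargs and the variable names STEAM_URL, STEAM_USER, and STEAM_TOKEN are assumptions made for this illustration; they are not part of the Steam client.

```python
import os


def steam_login_kwargs(env=None):
    """Build keyword arguments for h2osteam.login from environment variables.

    STEAM_URL, STEAM_USER and STEAM_TOKEN are hypothetical names chosen for
    this sketch; Steam itself does not define them.
    """
    env = os.environ if env is None else env
    required = ("STEAM_URL", "STEAM_USER", "STEAM_TOKEN")
    # Fail early with a clear message if any value is missing or empty.
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "url": env["STEAM_URL"],
        "username": env["STEAM_USER"],
        "password": env["STEAM_TOKEN"],  # the access token goes in `password`
        "verify_ssl": True,
    }
```

With the variables set, the result could be passed on as h2osteam.login(**steam_login_kwargs()), which keeps the token out of the script itself.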

Connecting to an existing H2O cluster

This example shows how to log in to Steam, connect to an existing H2O cluster called test-cluster, and import data.

import h2o
import h2osteam
from h2osteam.clients import H2oClient

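# Log in to Steam; an access token is passed in the password argument.
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
# Look up the running cluster by name and connect to it.
cluster = H2oClient.get_cluster("test-cluster")
cluster.connect()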
airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"
airlines_df = h2o.import_file(path=airlines)
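
The launch and connect examples can be combined into a "reuse if present, otherwise launch" pattern. The sketch below is deliberately generic: get_or_launch is a hypothetical helper, and it assumes that the lookup call raises an exception when no cluster with the given name exists, which the examples here do not confirm.

```python
def get_or_launch(get_fn, launch_fn):
    """Return (cluster, launched) using get_fn first and launch_fn as fallback.

    Assumption: get_fn raises an exception when the cluster does not exist.
    Catching Exception is intentionally broad for this sketch; it would also
    hide unrelated failures such as authentication errors.
    """
    try:
        return get_fn(), False
    except Exception:
        return launch_fn(), True


# Possible usage with the calls shown in the examples (not executed here):
# cluster, launched = get_or_launch(
#     lambda: H2oClient.get_cluster("test-cluster"),
#     lambda: H2oClient.launch_cluster(name="test-cluster",
#                                      profile_name="default-h2o",
#                                      version="3.28.0.2",
#                                      nodes=4,
#                                      node_memory_gb=10),
# )
```

Passing the two calls as functions keeps the helper independent of the client class, so the same pattern would apply to Sparkling Water clusters as well.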

Launching and connecting to a Sparkling Water cluster

This example shows how to log in to Steam and launch a Sparkling Water cluster with 4 executors and 10 GB of memory per executor. The cluster uses Sparkling Water version 3.28.0.2 and the profile called default-sparkling-internal, and it is submitted to the default YARN queue. The profile type dictates the cluster backend type; in this case the cluster starts in internal mode. All other Sparkling Water parameters are pre-filled according to the selected profile. When the cluster is up, we can send statements to the remote Spark session to start importing data.

import h2o
import h2osteam
from h2osteam.clients import SparklingClient

# Log in to Steam; an access token is passed in the password argument.
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
# Launch a Sparkling Water cluster; the profile selects the internal backend.
cluster = SparklingClient.launch_sparkling_cluster(name="test-sparkling-cluster",
                                                   profile_name="default-sparkling-internal",
                                                   version="3.28.0.2",
                                                   executors=4,
                                                   executor_memory_gb=10,
                                                   yarn_queue="default")

# The statements below are executed in the remote Spark session.
cluster.send_statement('airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"')
cluster.send_statement('airlines_df = h2o.import_file(path=airlines)')
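
Because send_statement takes source code as a string, quoting mistakes are easy to make when a value is spliced in by hand. The following sketch builds an assignment statement with repr() so that the value is quoted consistently. The helper assignment_statement is hypothetical, and the sketch assumes the remote session interprets Python, as the statements in the example suggest.

```python
def assignment_statement(name, value):
    """Return Python source that assigns `value` to the variable `name`.

    repr() yields a valid literal for simple values such as strings and
    numbers; other object types are outside the scope of this sketch.
    """
    if not name.isidentifier():
        raise ValueError("not a valid Python identifier: " + repr(name))
    return f"{name} = {value!r}"


url = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"
statement = assignment_statement("airlines", url)
# `statement` could then be passed to cluster.send_statement(statement).
```

The identifier check guards against accidentally sending something other than a plain assignment to the remote session.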

Connecting to an existing Sparkling Water cluster

This example shows how to log in to Steam, connect to an existing Sparkling Water cluster called test-sparkling-cluster, and import data.

import h2o
import h2osteam
from h2osteam.clients import SparklingClient

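# Log in to Steam; an access token is passed in the password argument.
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
# Look up the running Sparkling Water cluster by name.
cluster = SparklingClient.get_cluster("test-sparkling-cluster")

# The statements below are executed in the remote Spark session.
cluster.send_statement('airlines = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip"')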
cluster.send_statement('airlines_df = h2o.import_file(path=airlines)')

Launching and connecting to a Driverless AI instance

This example shows how to log in to Steam, create an instance of Driverless AI v1.8.4.1, connect to it, and upload training and test datasets.

import h2osteam
from h2oai_client import Client
from h2osteam.clients import DaiClient

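# Log in to Steam; an access token is passed in the password argument.
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
# Launch the instance, then obtain a Driverless AI client connected to it.
instance = DaiClient.launch_instance(name="test-instance", version="1.8.4.1")
client = instance.connect()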

train_path = '/data/Kaggle/CreditCard/CreditCard-train.csv'
test_path = '/data/Kaggle/CreditCard/CreditCard-test.csv'

train = client.create_dataset_sync(train_path)
test = client.create_dataset_sync(test_path)
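
When more than two files are involved, the repeated create_dataset_sync calls can be folded into a loop. This is a small sketch only: upload_datasets is a hypothetical helper, and it assumes nothing about the client beyond the create_dataset_sync(path) call used above.

```python
def upload_datasets(client, paths):
    """Create one dataset per entry in `paths` and return them keyed by name.

    `paths` maps a label (for example "train") to a file path; each path is
    passed unchanged to client.create_dataset_sync.
    """
    return {name: client.create_dataset_sync(path) for name, path in paths.items()}


# Possible usage with the client from the example (not executed here):
# datasets = upload_datasets(client, {
#     "train": "/data/Kaggle/CreditCard/CreditCard-train.csv",
#     "test": "/data/Kaggle/CreditCard/CreditCard-test.csv",
# })
```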

Connecting to an existing Driverless AI instance

This example shows how to log in to Steam, connect to an existing Driverless AI instance called test-instance, and upload training and test datasets.

import h2osteam
from h2oai_client import Client
from h2osteam.clients import DaiClient

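# Log in to Steam; an access token is passed in the password argument.
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="access-token-here", verify_ssl=True)
# Look up the instance by name, then obtain a Driverless AI client connected to it.
instance = DaiClient.get_instance(name="test-instance")
client = instance.connect()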

train_path = '/data/Kaggle/CreditCard/CreditCard-train.csv'
test_path = '/data/Kaggle/CreditCard/CreditCard-test.csv'

train = client.create_dataset_sync(train_path)
test = client.create_dataset_sync(test_path)