Import Dataset with Snowflake Connector

First, we’ll initialize a client with our server credentials and store it in the variable dai.

[1]:
import driverlessai
dai = driverlessai.Client(address='http://localhost:12345', username="py", password="py")

We can check that the Snowflake connector has been enabled on the Driverless AI server.

[2]:
dai.connectors.list()
[2]:
['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url', 'snow']

The Snowflake connector is considered an advanced connector. Thus, the create methods require a data_source_config argument to use them.

User Defined Snowflake Configuration

Here we manually specify the Snowflake URL, username, and the password.

[3]:
dataset_from_sf = dai.datasets.create(
    data_source="snow",
    name="credit-cards",
    data="SELECT ID, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 FROM CREDIT_CARD",
    data_source_config=dict(
        snowflake_url="https://my_org.snowflakecomputing.com",
        snowflake_user="myusername",
        snowflake_password="mypassword",
        snowflake_warehouse="DEMO_WH",
        snowflake_database="CREDIT_CARD",
        snowflake_schema="PUBLIC",
    ),
    force=True,
)

dataset_from_sf.head()
Complete 100.00% - [4/4] Computed stats for column BILL_AMT5
[3]:
ID BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5
1 3913 3102 689 0 0
2 2682 1725 2682 3272 3455
3 29239 14027 13559 14331 14948
4 46990 48233 49291 28314 28959
5 8617 5670 35835 20940 19146

Predefined Snowflake Configuration

Here we use a predefined configuration that was setup on the Driverless AI server. We only need to specify the Snowflake warehouse, database, and the schema.

[4]:
dataset_from_sf = dai.datasets.create(
    data_source="snow",
    name="credit-cards",
    data="SELECT ID, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 FROM CREDIT_CARD",
    data_source_config=dict(
        snowflake_warehouse="DEMO_WH",
        snowflake_database="CREDIT_CARD",
        snowflake_schema="PUBLIC",
    ),
    force=True,
)

dataset_from_sf.head()
Complete 100.00% - [4/4] Computed stats for column BILL_AMT5
[4]:
ID BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5
1 3913 3102 689 0 0
2 2682 1725 2682 3272 3455
3 29239 14027 13559 14331 14948
4 46990 48233 49291 28314 28959
5 8617 5670 35835 20940 19146