Import Dataset with JDBC Connector

First, we’ll initialize a client with our server credentials and store it in the variable dai.

[8]:

import driverlessai

dai = driverlessai.Client(address='http://mr-dl26:12345', username='py', password='py')

We can check that the JDBC connector has been enabled on the Driverless AI server.

[9]:

dai.connectors.list()

[9]:

['file', 'upload', 's3', 'hdfs', 'jdbc']

The JDBC connector is considered an advanced connector. Thus, the create methods require a data_source_config argument to use them.

User Defined JDBC Configuration

Here we manually specify the JDBC jar, JDBC driver, and JDBC server location.

[22]:

dataset_from_jdbc = dai.datasets.create(
    data="SELECT * FROM creditcardtrain",
    data_source="jdbc",
    data_source_config=dict(
        jdbc_jar="/data/jdbc-jars/postgresql-42.2.23.jar",
        jdbc_driver="org.postgresql.Driver",
        jdbc_url="jdbc:postgresql://mr-0xc2:5432/h2oaidev",
        jdbc_username="h2oaitester",
        jdbc_password="h2oaitesterreadonly"
    ),
    name="From JDBC user defined config",
    force=True
)

dataset_from_jdbc.head()

Complete 100.00% - [4/4] Computed stats for column default_payment_next_month

[22]:

id	limit_bal	sex	education	marriage	age	pay_1	pay_2	pay_3	pay_4	pay_5	pay_6	bill_amt1	bill_amt2	bill_amt3	bill_amt4	bill_amt5	bill_amt6	pay_amt1	pay_amt2	pay_amt3	pay_amt4	pay_amt5	pay_amt6	default_payment_next_month
1	20000	female	university	married	24	2	2	-1	-1	-2	-2	3913	3102	689	0	0	0	0	689	0	0	0	0	1
2	120000	female	university	single	26	-1	2	0	0	0	2	2682	1725	2682	3272	3455	3261	0	1000	1000	1000	0	2000	1
3	90000	female	university	single	34	0	0	0	0	0	0	29239	14027	13559	14331	14948	15549	1518	1500	1000	1000	1000	5000	0
4	50000	female	university	married	37	0	0	0	0	0	0	46990	48233	49291	28314	28959	29547	2000	2019	1200	1100	1069	1000	0
5	50000	male	university	married	57	-1	0	-1	0	0	0	8617	5670	35835	20940	19146	19131	2000	36681	10000	9000	689	679	0

Predefined JDBC Configuration

Here we use a predefined configuration that was setup on the Driverless AI server. We only need to specify the configuration name along with our username and our password.

[23]:

dataset_from_jdbc_postgres = dai.datasets.create(
    data="SELECT * FROM creditcardtrain",
    data_source="jdbc",
    data_source_config=dict(
        jdbc_default_config="postgres",
        jdbc_username="h2oaitester",
        jdbc_password="h2oaitesterreadonly"
    ),
    name="From JDBC postgres config",
    force=True
)

dataset_from_jdbc_postgres.head()

Complete 100.00% - [4/4] Computed stats for column default_payment_next_month

[23]:

id	limit_bal	sex	education	marriage	age	pay_1	pay_2	pay_3	pay_4	pay_5	pay_6	bill_amt1	bill_amt2	bill_amt3	bill_amt4	bill_amt5	bill_amt6	pay_amt1	pay_amt2	pay_amt3	pay_amt4	pay_amt5	pay_amt6	default_payment_next_month
1	20000	female	university	married	24	2	2	-1	-1	-2	-2	3913	3102	689	0	0	0	0	689	0	0	0	0	1
2	120000	female	university	single	26	-1	2	0	0	0	2	2682	1725	2682	3272	3455	3261	0	1000	1000	1000	0	2000	1
3	90000	female	university	single	34	0	0	0	0	0	0	29239	14027	13559	14331	14948	15549	1518	1500	1000	1000	1000	5000	0
4	50000	female	university	married	37	0	0	0	0	0	0	46990	48233	49291	28314	28959	29547	2000	2019	1200	1100	1069	1000	0
5	50000	male	university	married	57	-1	0	-1	0	0	0	8617	5670	35835	20940	19146	19131	2000	36681	10000	9000	689	679	0