Import Dataset with JDBC Connector

First, we’ll initialize a client with our server credentials and store it in the variable dai.

[8]:
import driverlessai

dai = driverlessai.Client(address='http://mr-dl26:12345', username='py', password='py')

We can check that the JDBC connector has been enabled on the Driverless AI server.

[9]:
dai.connectors.list()
[9]:
['file', 'upload', 's3', 'hdfs', 'jdbc']

The JDBC connector is considered an advanced connector. Thus, the create methods require a data_source_config argument to use them.

User Defined JDBC Configuration

Here we manually specify the JDBC jar, JDBC driver, and JDBC server location.

[22]:
dataset_from_jdbc = dai.datasets.create(
    data="SELECT * FROM creditcardtrain",
    data_source="jdbc",
    data_source_config=dict(
        jdbc_jar="/data/jdbc-jars/postgresql-42.2.23.jar",
        jdbc_driver="org.postgresql.Driver",
        jdbc_url="jdbc:postgresql://mr-0xc2:5432/h2oaidev",
        jdbc_username="h2oaitester",
        jdbc_password="h2oaitesterreadonly"
    ),
    name="From JDBC user defined config",
    force=True
)

dataset_from_jdbc.head()
Complete 100.00% - [4/4] Computed stats for column default_payment_next_month
[22]:
id limit_balsex education marriage age pay_1 pay_2 pay_3 pay_4 pay_5 pay_6 bill_amt1 bill_amt2 bill_amt3 bill_amt4 bill_amt5 bill_amt6 pay_amt1 pay_amt2 pay_amt3 pay_amt4 pay_amt5 pay_amt6 default_payment_next_month
1 20000femaleuniversity married 24 2 2 -1 -1 -2 -2 3913 3102 689 0 0 0 0 689 0 0 0 0 1
2 120000femaleuniversity single 26 -1 2 0 0 0 2 2682 1725 2682 3272 3455 3261 0 1000 1000 1000 0 2000 1
3 90000femaleuniversity single 34 0 0 0 0 0 0 29239 14027 13559 14331 14948 15549 1518 1500 1000 1000 1000 5000 0
4 50000femaleuniversity married 37 0 0 0 0 0 0 46990 48233 49291 28314 28959 29547 2000 2019 1200 1100 1069 1000 0
5 50000male university married 57 -1 0 -1 0 0 0 8617 5670 35835 20940 19146 19131 2000 36681 10000 9000 689 679 0

Predefined JDBC Configuration

Here we use a predefined configuration that was setup on the Driverless AI server. We only need to specify the configuration name along with our username and our password.

[23]:
dataset_from_jdbc_postgres = dai.datasets.create(
    data="SELECT * FROM creditcardtrain",
    data_source="jdbc",
    data_source_config=dict(
        jdbc_default_config="postgres",
        jdbc_username="h2oaitester",
        jdbc_password="h2oaitesterreadonly"
    ),
    name="From JDBC postgres config",
    force=True
)

dataset_from_jdbc_postgres.head()
Complete 100.00% - [4/4] Computed stats for column default_payment_next_month
[23]:
id limit_balsex education marriage age pay_1 pay_2 pay_3 pay_4 pay_5 pay_6 bill_amt1 bill_amt2 bill_amt3 bill_amt4 bill_amt5 bill_amt6 pay_amt1 pay_amt2 pay_amt3 pay_amt4 pay_amt5 pay_amt6 default_payment_next_month
1 20000femaleuniversity married 24 2 2 -1 -1 -2 -2 3913 3102 689 0 0 0 0 689 0 0 0 0 1
2 120000femaleuniversity single 26 -1 2 0 0 0 2 2682 1725 2682 3272 3455 3261 0 1000 1000 1000 0 2000 1
3 90000femaleuniversity single 34 0 0 0 0 0 0 29239 14027 13559 14331 14948 15549 1518 1500 1000 1000 1000 5000 0
4 50000femaleuniversity married 37 0 0 0 0 0 0 46990 48233 49291 28314 28959 29547 2000 2019 1200 1100 1069 1000 0
5 50000male university married 57 -1 0 -1 0 0 0 8617 5670 35835 20940 19146 19131 2000 36681 10000 9000 689 679 0