Connectors configuration
enabled_file_systems
enabled_file_systems (List)
Default value ['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url']
File System Support:
upload : standard upload feature
file : local file system/server file system
hdfs : Hadoop file system; remember to configure the HDFS config folder path and keytab below
dtap : Blue Data Tap file system; remember to configure the DTap section below
s3 : Amazon S3; optionally configure secret and access key below
gcs : Google Cloud Storage; remember to configure gcs_path_to_service_account_json below
gbq : Google BigQuery; remember to configure gcs_path_to_service_account_json below
minio : Minio Cloud Storage; remember to configure secret and access key below
snow : Snowflake Data Warehouse; remember to configure Snowflake credentials below (account name, username, password)
kdb : KDB+ Time Series Database; remember to configure KDB credentials below (hostname and port; optionally username, password, classpath, and jvm_args)
azrbs : Azure Blob Storage; remember to configure Azure credentials below (account name, account key)
jdbc : JDBC connector; remember to configure JDBC below (jdbc_app_configs)
hive : Hive connector; remember to configure Hive below (hive_app_configs)
recipe_file : custom recipe file upload
recipe_url : custom recipe upload via URL
h2o_drive : H2O Drive; remember to configure h2o_drive_endpoint_url below
feature_store : Feature Store; remember to configure feature_store_endpoint_url below
A configuration sketch follows this list.
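For example, to expose only a subset of connectors, set this option in config.toml. This is a minimal sketch; the quoted-list form mirrors the file_path_filter_include examples later in this section, and the chosen values are illustrative:

# config.toml: enable only upload, local files, S3, and HDFS (illustrative)
enabled_file_systems = "['upload', 'file', 's3', 'hdfs']"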
max_files_listed
max_files_listed (Number)
Default value 100
file_hide_data_directory
file_hide_data_directory (Boolean)
Default value True
This option disables access to the DAI data_directory from the file browser.
file_path_filtering_enabled
file_path_filtering_enabled (Boolean)
Default value False
Enable usage of path filters
file_path_filter_include
file_path_filter_include (List)
Default value []
List of absolute path prefixes to restrict access to in the file system browser. First add the following environment variable to your command line to enable this feature: file_path_filtering_enabled=true
This feature can be used in the following ways (using a specific path or the logged-in user's directory):
file_path_filter_include="['/data/stage']"
file_path_filter_include="['/data/stage','/data/prod']"
file_path_filter_include=/home/{{DAI_USERNAME}}/
file_path_filter_include="['/home/{{DAI_USERNAME}}/','/data/stage','/data/prod']"
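Putting the two options together in config.toml, a minimal sketch (the paths are illustrative):

# config.toml: only allow browsing under /data/stage and the logged-in user's home
file_path_filtering_enabled = true
file_path_filter_include = "['/home/{{DAI_USERNAME}}/', '/data/stage']"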
hdfs_auth_type
hdfs_auth_type (String)
Default value 'noauth'
(Required) HDFS connector. Specify the HDFS auth type; allowed options are:
noauth : (default) no authentication needed
principal : authenticate with HDFS as a principal user (DEPRECATED - use the keytab auth type)
keytab : authenticate with a keytab (recommended). If running DAI as a service, the Kerberos keytab needs to be owned by the DAI user.
keytabimpersonation : log in with impersonation using a keytab
A minimal keytab-based setup is sketched below.
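A minimal sketch of a keytab-based HDFS setup in config.toml (the principal and classpath values below are illustrative placeholders):

# config.toml: authenticate to HDFS with a keytab (illustrative values)
hdfs_auth_type = "keytab"
hdfs_app_principal_user = "dai/node1.example.com@EXAMPLE.COM"
hdfs_app_classpath = "/opt/hadoop/conf:/opt/hadoop/lib/*"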
hdfs_app_principal_user
hdfs_app_principal_user (String)
Default value ''
Kerberos app principal user. Required when hdfs_auth_type='keytab'; recommended otherwise.
hdfs_app_login_user
hdfs_app_login_user (String)
Default value ''
Deprecated - do not use; the login user is taken from the username provided at login.
hdfs_app_jvm_args
hdfs_app_jvm_args (String)
Default value ''
JVM args for HDFS distributions; provide args separated by spaces, e.g.: -Djava.security.krb5.conf=<path>/krb5.conf -Dsun.security.krb5.debug=True -Dlog4j.configuration=file:///<path>/log4j.properties
hdfs_app_classpath
hdfs_app_classpath (String)
Default value ''
HDFS classpath.
hdfs_app_supported_schemes
hdfs_app_supported_schemes (List)
Default value ['hdfs://', 'maprfs://', 'swift://']
List of supported DFS schemes, e.g. "['hdfs://', 'maprfs://', 'swift://']". The supported schemes list is used as an initial check to ensure valid input to the connector.
hdfs_max_files_listed
hdfs_max_files_listed (Number)
Default value 100
Maximum number of files viewable in the connector UI. Set to a larger number to view more files.
hdfs_init_path
hdfs_init_path (String)
Default value 'hdfs://'
Starting HDFS path displayed in UI HDFS browser
hdfs_upload_init_path
hdfs_upload_init_path (String)
Default value 'hdfs://'
Starting HDFS path for the artifacts upload operations
enable_mapr_multi_user_mode
enable_mapr_multi_user_mode (Boolean)
Default value False
Enables multi-user mode for the MapR integration, which allows a MapR ticket per user.
dtap_auth_type
dtap_auth_type (String)
Default value 'noauth'
Blue Data DTap connector settings are similar to HDFS connector settings.
Specify the DTap auth type; allowed options are:
noauth : no authentication needed
principal : authenticate with DTap as a principal user
keytab : authenticate with a keytab (recommended). If running DAI as a service, the Kerberos keytab needs to be owned by the DAI user.
keytabimpersonation : log in with impersonation using a keytab
NOTE: "hdfs_app_classpath" and "core_site_xml_path" are both required to be set for the DTap connector. A minimal setup is sketched after this list.
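A minimal sketch of a DTap setup in config.toml (values are illustrative placeholders; core_site_xml_path is defined outside this section):

# config.toml: DTap with keytab authentication (illustrative values)
dtap_auth_type = "keytab"
dtap_keytab_path = "/etc/dai/dai.keytab"
dtap_app_principal_user = "dai@EXAMPLE.COM"
# per the NOTE above, both of these must also be set:
hdfs_app_classpath = "/opt/hadoop/conf:/opt/hadoop/lib/*"
core_site_xml_path = "/opt/hadoop/conf"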
dtap_config_path
dtap_config_path (String)
Default value ''
DTap (HDFS) config folder path; can contain multiple config files.
dtap_key_tab_path
dtap_key_tab_path (String)
Default value ''
Path of the principal keytab file. dtap_key_tab_path is deprecated; please use dtap_keytab_path instead.
dtap_keytab_path
dtap_keytab_path (String)
Default value ''
Path of the principal keytab file.
dtap_app_principal_user
dtap_app_principal_user (String)
Default value ''
Kerberos app principal user (recommended)
dtap_app_login_user
dtap_app_login_user (String)
Default value ''
Specify the user id of the current user here as user@realm
dtap_app_jvm_args
dtap_app_jvm_args (String)
Default value ''
JVM args for DTap distributions; provide args separated by spaces.
dtap_app_classpath
dtap_app_classpath (String)
Default value ''
DTap (HDFS) classpath. NOTE: also set 'hdfs_app_classpath'.
dtap_init_path
dtap_init_path (String)
Default value 'dtap://'
Starting DTAP path displayed in UI DTAP browser
aws_access_key_id
aws_access_key_id (String)
Default value ''
S3 Connector credentials
aws_secret_access_key
aws_secret_access_key (String)
Default value ''
S3 Connector credentials
aws_role_arn
aws_role_arn (String)
Default value ''
S3 Connector credentials
aws_default_region
aws_default_region (String)
Default value ''
The region to use when none is specified in the S3 URL. Ignored when aws_s3_endpoint_url is set.
aws_s3_endpoint_url
aws_s3_endpoint_url (String)
Default value ''
Sets the endpoint URL that will be used to access S3.
aws_use_ec2_role_credentials
aws_use_ec2_role_credentials (Boolean)
Default value False
If set to true, the S3 connector will try to obtain credentials associated with the role attached to the EC2 instance.
s3_init_path
s3_init_path (String)
Default value 's3://'
Starting S3 path displayed in UI S3 browser
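A minimal sketch of static S3 credentials in config.toml (the key values, region, and bucket are illustrative placeholders):

# config.toml: S3 connector with static credentials (illustrative values)
aws_access_key_id = "AKIA-EXAMPLE"
aws_secret_access_key = "example-secret"
aws_default_region = "us-east-1"
s3_init_path = "s3://my-bucket/"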
s3_skip_cert_verification
s3_skip_cert_verification (Boolean)
Default value False
The S3 connector will skip cert verification if this is set to true (mostly used for S3-like connectors, e.g. Ceph).
s3_connector_cert_location
s3_connector_cert_location (String)
Default value ''
path/to/cert/bundle.pem - A filename of the CA cert bundle to use for the S3 connector
gcs_path_to_service_account_json
gcs_path_to_service_account_json (String)
Default value ''
GCS Connector credentials. Example (suggested): '/licenses/my_service_account_json.json'
gcs_init_path
gcs_init_path (String)
Default value 'gs://'
Starting GCS path displayed in UI GCS browser
minio_endpoint_url
minio_endpoint_url (String)
Default value ''
Minio Connector credentials
minio_access_key_id
minio_access_key_id (String)
Default value ''
Minio Connector credentials
minio_secret_access_key
minio_secret_access_key (String)
Default value ''
Minio Connector credentials
minio_skip_cert_verification
minio_skip_cert_verification (Boolean)
Default value False
Minio Connector will skip cert verification if this is set to true
minio_connector_cert_location
minio_connector_cert_location (String)
Default value ''
path/to/cert/bundle.pem - A filename of the CA cert bundle to use for the Minio connector
minio_init_path
minio_init_path (String)
Default value '/'
Starting Minio path displayed in UI Minio browser
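A minimal sketch of a Minio setup in config.toml (the endpoint and keys are illustrative placeholders):

# config.toml: Minio connector (illustrative values)
minio_endpoint_url = "https://minio.example.com:9000"
minio_access_key_id = "minio-access-key"
minio_secret_access_key = "minio-secret-key"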
h2o_drive_endpoint_url
h2o_drive_endpoint_url (String)
Default value ''
H2O Drive server endpoint URL
h2o_drive_access_token_scopes
h2o_drive_access_token_scopes (String)
Default value ''
Space-separated list of OpenID scopes for the access token used by the H2O Drive connector.
h2o_drive_session_duration
h2o_drive_session_duration (Number)
Default value 10800
Maximum duration (in seconds) for a session with the H2O Drive
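A minimal sketch of an H2O Drive setup in config.toml (the endpoint is an illustrative placeholder; the duration shown is the default):

# config.toml: H2O Drive connector (illustrative endpoint)
h2o_drive_endpoint_url = "https://drive.example.com"
h2o_drive_session_duration = 10800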
snowflake_url
snowflake_url (String)
Default value ''
Snowflake Connector credentials.
Recommended: provide url, user, password. Optionally provide: account, user, password.
Example URL: https://<snowflake_account>.<region>.snowflakecomputing.com
snowflake_user
snowflake_user (String)
Default value ''
Snowflake Connector credentials
snowflake_password
snowflake_password (String)
Default value ''
Snowflake Connector credentials
snowflake_account
snowflake_account (String)
Default value ''
Snowflake Connector credentials
snowflake_allow_stages
snowflake_allow_stages (Boolean)
Default value True
Setting to allow or disallow the Snowflake connector from using Snowflake stages during queries. True permits the connector to use stages, which generally improves performance; however, if the Snowflake user does not have permission to create/use stages, queries will end in errors. False prevents the connector from using stages, so Snowflake users without permission to create/use stages will have successful queries, but query performance may be significantly degraded.
snowflake_batch_size
snowflake_batch_size (Number)
Default value 10000
Sets the number of rows fetched by the Snowflake cursor at one time. This is only used when snowflake_allow_stages is set to False, and may help with performance depending on the type and size of data being queried.
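A minimal sketch of a Snowflake setup in config.toml (the URL and credentials are illustrative placeholders):

# config.toml: Snowflake connector via URL, user, and password (illustrative values)
snowflake_url = "https://myaccount.us-east-1.snowflakecomputing.com"
snowflake_user = "dai_user"
snowflake_password = "example-password"
# if the user cannot create/use stages, disable them and tune the fetch size
snowflake_allow_stages = false
snowflake_batch_size = 10000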
kdb_user
kdb_user (String)
Default value ''
KDB Connector credentials
kdb_password
kdb_password (String)
Default value ''
KDB Connector credentials
kdb_hostname
kdb_hostname (String)
Default value ''
KDB Connector credentials
kdb_port
kdb_port (String)
Default value ''
KDB Connector credentials
kdb_app_classpath
kdb_app_classpath (String)
Default value ''
KDB Connector credentials
kdb_app_jvm_args
kdb_app_jvm_args (String)
Default value ''
KDB Connector credentials
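A minimal sketch of a KDB+ setup in config.toml (hostname, port, and credentials are illustrative placeholders; note that kdb_port is a string):

# config.toml: KDB+ connector (illustrative values)
kdb_hostname = "kdb.example.com"
kdb_port = "5001"
kdb_user = "dai_user"
kdb_password = "example-password"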
azure_blob_account_name
azure_blob_account_name (String)
Default value ''
Azure Blob Store Connector credentials
azure_blob_account_key
azure_blob_account_key (String)
Default value ''
Azure Blob Store Connector credentials
azure_connection_string
azure_connection_string (String)
Default value ''
Azure Blob Store Connector credentials
azure_blob_init_path
azure_blob_init_path (String)
Default value 'https://'
Starting Azure blob store path displayed in UI Azure blob store browser
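A minimal sketch of account-key authentication for the Azure Blob Store connector in config.toml (the account name and key are illustrative placeholders):

# config.toml: Azure Blob Store with account name/key (illustrative values)
azure_blob_account_name = "mystorageaccount"
azure_blob_account_key = "example-account-key"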
azure_blob_use_access_token
azure_blob_use_access_token (Boolean)
Default value False
When enabled, the Azure Blob Store Connector will use an access token derived from the credentials received on login with OpenID Connect.
azure_blob_use_access_token_scopes
azure_blob_use_access_token_scopes (String)
Default value 'https://storage.azure.com/.default'
Configures the scopes for the access token used by the Azure Blob Store Connector when azure_blob_use_access_token is enabled (space-separated list).
azure_blob_use_access_token_source
azure_blob_use_access_token_source (String)
Default value 'SESSION'
Sets the source of the access token for accessing the Azure blob store:
KEYCLOAK : exchanges the session access token for the federated refresh token with Keycloak and uses it to obtain the access token directly from Azure AD.
SESSION : uses the access token derived from the credentials received on login with OpenID Connect.
azure_blob_keycloak_aad_client_id
azure_blob_keycloak_aad_client_id (String)
Default value ''
Application (client) ID registered on Azure AD when the KEYCLOAK source is enabled.
azure_blob_keycloak_aad_client_secret
azure_blob_keycloak_aad_client_secret (String)
Default value ''
Application (client) secret when the KEYCLOAK source is enabled.
azure_blob_keycloak_aad_auth_uri
azure_blob_keycloak_aad_auth_uri (String)
Default value ''
A URL that identifies a token authority. It should be of the format https://login.microsoftonline.com/your_tenant
azure_blob_keycloak_broker_token_endpoint
azure_blob_keycloak_broker_token_endpoint (String)
Default value ''
Keycloak Endpoint for Retrieving External IDP Tokens (https://www.keycloak.org/docs/latest/server_admin/#retrieving-external-idp-tokens)
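A minimal sketch of the KEYCLOAK token source in config.toml (the tenant, client ID/secret, and Keycloak endpoint are illustrative placeholders):

# config.toml: obtain Azure AD tokens through Keycloak (illustrative values)
azure_blob_use_access_token = true
azure_blob_use_access_token_source = "KEYCLOAK"
azure_blob_keycloak_aad_client_id = "00000000-0000-0000-0000-000000000000"
azure_blob_keycloak_aad_client_secret = "example-client-secret"
azure_blob_keycloak_aad_auth_uri = "https://login.microsoftonline.com/your_tenant"
azure_blob_keycloak_broker_token_endpoint = "https://keycloak.example.com/realms/myrealm/broker/azure/token"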
azure_enable_token_auth_aad
azure_enable_token_auth_aad (Boolean)
Default value False
(DEPRECATED, use azure_blob_use_access_token and azure_blob_use_access_token_source="KEYCLOAK" instead.)
When enabled, only the DEPRECATED options azure_ad_client_id, azure_ad_client_secret, azure_ad_auth_uri, and azure_keycloak_idp_token_endpoint are effective. This is equivalent to setting azure_blob_use_access_token_source = "KEYCLOAK" and setting the azure_blob_keycloak_aad_client_id, azure_blob_keycloak_aad_client_secret, azure_blob_keycloak_aad_auth_uri, and azure_blob_keycloak_broker_token_endpoint options.
If true, enables the Azure Blob Storage Connector to use Azure AD tokens obtained from Keycloak for auth.
azure_ad_client_id
azure_ad_client_id (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_client_id instead.) Application (client) ID registered on Azure AD
azure_ad_client_secret
azure_ad_client_secret (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_client_secret instead.) Application Client Secret
azure_ad_auth_uri
azure_ad_auth_uri (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_auth_uri instead.) A URL that identifies a token authority. It should be of the format https://login.microsoftonline.com/your_tenant
azure_ad_scopes
azure_ad_scopes (List)
Default value []
(DEPRECATED, use azure_blob_use_access_token_scopes instead.) Scopes requested to access a protected API (a resource).
azure_keycloak_idp_token_endpoint
azure_keycloak_idp_token_endpoint (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_broker_token_endpoint instead.) Keycloak Endpoint for Retrieving External IDP Tokens (https://www.keycloak.org/docs/latest/server_admin/#retrieving-external-idp-tokens)
jdbc_app_configs
jdbc_app_configs (String)
Default value '{}'
Configuration for the JDBC connector. JSON/Dictionary string with multiple keys. Format as a single line without using carriage returns (the following example is formatted for readability). Use triple quotations to ensure that the text is read as a single string.
Example:
'{
  "postgres": {
    "url": "jdbc:postgresql://ip address:port/postgres",
    "jarpath": "/path/to/postgres_driver.jar",
    "classpath": "org.postgresql.Driver"
  },
  "mysql": {
    "url": "mysql connection string",
    "jarpath": "/path/to/mysql_driver.jar",
    "classpath": "my.sql.classpath.Driver"
  }
}'
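As it would actually appear in config.toml, the same configuration collapses to one triple-quoted line (a sketch; the driver path and URL are illustrative):

# config.toml: single-line, triple-quoted JDBC configuration (illustrative values)
jdbc_app_configs = """{"postgres": {"url": "jdbc:postgresql://ip address:port/postgres", "jarpath": "/path/to/postgres_driver.jar", "classpath": "org.postgresql.Driver"}}"""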
jdbc_app_jvm_args
jdbc_app_jvm_args (String)
Default value '-Xmx4g'
Extra JVM args for the JDBC connector.
jdbc_app_classpath
jdbc_app_classpath (String)
Default value ''
Alternative classpath for the JDBC connector.
hive_app_configs
hive_app_configs (String)
Default value '{}'
Configuration for the Hive connector. Note that inputs are similar to configuring HDFS connectivity.
Important keys:
* hive_conf_path - path to the Hive configuration; may have multiple files (typically hive-site.xml, hdfs-site.xml, etc.)
* auth_type - one of noauth, keytab, keytabimpersonation for Kerberos authentication
* keytab_path - path to the Kerberos keytab to use for authentication; can be "" if using the noauth auth_type
* principal_user - Kerberos app principal user; required when using auth_type keytab or keytabimpersonation
JSON/Dictionary string with multiple keys.
Example:
'{
  "hive_connection_1": {
    "hive_conf_path": "/path/to/hive/conf",
    "auth_type": "one of ['noauth', 'keytab', 'keytabimpersonation']",
    "keytab_path": "/path/to/<filename>.keytab",
    "principal_user": "hive/localhost@EXAMPLE.COM"
  },
  "hive_connection_2": {
    "hive_conf_path": "/path/to/hive/conf_2",
    "auth_type": "one of ['noauth', 'keytab', 'keytabimpersonation']",
    "keytab_path": "/path/to/<filename_2>.keytab",
    "principal_user": "my_user/localhost@EXAMPLE.COM"
  }
}'
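As with jdbc_app_configs, the value collapses to one triple-quoted line in config.toml (a sketch; the paths and principal are illustrative):

# config.toml: single-line, triple-quoted Hive configuration (illustrative values)
hive_app_configs = """{"hive_connection_1": {"hive_conf_path": "/path/to/hive/conf", "auth_type": "keytab", "keytab_path": "/path/to/<filename>.keytab", "principal_user": "hive/localhost@EXAMPLE.COM"}}"""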
hive_app_jvm_args
hive_app_jvm_args (String)
Default value '-Xmx4g'
Extra JVM args for the Hive connector.
hive_app_classpath
hive_app_classpath (String)
Default value ''
Alternative classpath for the Hive connector. Can be used to add additional jar files to the classpath.
enable_artifacts_upload
enable_artifacts_upload (Boolean)
Default value False
Replaces all downloads on the experiment page with exports, and allows users to push artifacts to the artifact store configured with artifacts_store.
artifacts_store
artifacts_store (String)
Default value 'file_system'
Artifacts store:
file_system : stores artifacts in a file system directory denoted by artifacts_file_system_directory
s3 : stores artifacts in an S3 bucket
bitbucket : stores data in a Bitbucket repository
azure : stores data in Azure Blob Store
hdfs : stores data in a Hadoop distributed file system location
A minimal S3-backed setup is sketched below.
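A minimal sketch of an S3-backed artifact store in config.toml (the bucket name is an illustrative placeholder):

# config.toml: export experiment artifacts to S3 (illustrative bucket)
enable_artifacts_upload = true
artifacts_store = "s3"
artifacts_s3_bucket = "my-artifacts-bucket"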
bitbucket_skip_cert_verification
bitbucket_skip_cert_verification (Boolean)
Default value False
Decide whether to skip cert verification for Bitbucket when using a repo with HTTPS
bitbucket_tmp_relative_dir
bitbucket_tmp_relative_dir (String)
Default value 'local_git_tmp'
Local temporary directory to clone artifacts to, relative to data_directory
artifacts_file_system_directory
artifacts_file_system_directory (String)
Default value 'tmp'
File system location where artifacts will be copied when artifacts_store is set to file_system.
artifacts_s3_bucket
artifacts_s3_bucket (String)
Default value ''
AWS S3 bucket to be used for storing artifacts.
artifacts_azure_blob_account_name
artifacts_azure_blob_account_name (String)
Default value ''
Azure Blob Store upload credentials
artifacts_azure_blob_account_key
artifacts_azure_blob_account_key (String)
Default value ''
Azure Blob Store upload credentials
artifacts_azure_connection_string
artifacts_azure_connection_string (String)
Default value ''
Azure Blob Store upload credentials
artifacts_git_user
artifacts_git_user (String)
Default value 'git'
Git auth user
artifacts_git_password
artifacts_git_password (String)
Default value ''
Git auth password
artifacts_git_repo
artifacts_git_repo (String)
Default value ''
Git repo to which artifacts are pushed upon upload.
artifacts_git_branch
artifacts_git_branch (String)
Default value 'dev'
Git branch on the remote repo where artifacts are pushed
artifacts_git_ssh_private_key_file_location
artifacts_git_ssh_private_key_file_location (String)
Default value ''
File location for the ssh private key used for git authentication
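A minimal sketch of a Bitbucket-backed artifact store in config.toml (the repo URL, branch, and key path are illustrative placeholders):

# config.toml: push artifacts to a Bitbucket repo over SSH (illustrative values)
enable_artifacts_upload = true
artifacts_store = "bitbucket"
artifacts_git_repo = "git@bitbucket.org:myteam/dai-artifacts.git"
artifacts_git_branch = "dev"
artifacts_git_ssh_private_key_file_location = "/home/dai/.ssh/id_rsa"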
feature_store_endpoint_url
feature_store_endpoint_url (String)
Default value ''
Feature Store server endpoint URL
feature_store_enable_tls
feature_store_enable_tls (Boolean)
Default value False
Enable TLS communication between DAI and the Feature Store server
feature_store_tls_cert_path
feature_store_tls_cert_path (String)
Default value ''
Path to the client certificate to authenticate with the Feature Store server. This is only effective when feature_store_enable_tls=True.
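A minimal sketch of a Feature Store setup in config.toml (the endpoint and cert path are illustrative placeholders):

# config.toml: Feature Store connector over TLS (illustrative values)
feature_store_endpoint_url = "https://feature-store.example.com"
feature_store_enable_tls = true
feature_store_tls_cert_path = "/etc/dai/feature_store_client.pem"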