DataRobot Predict Tool

The DataRobot Predict Tool allows you to integrate predictions into your workflow in Alteryx.

Overview

Add the DataRobot Predict tool and configure a connection.
Choose a Prediction Type and click "Configure Predictions".
Choose your Deployment or Model.
Attach an input to provide data on which to predict.
- Note: If using a dataset with non-Latin characters, be sure that the Alteryx Input tool has UTF-8 selected as the encoding method. By default Alteryx uses ISO 8859-1 Latin for its encoding.
Attach an output to process or store prediction results.
Proxies are currently not supported due to an issue in Alteryx's browser/renderer code. Per this discussion, a fix should be coming in Alteryx version 2019.4.
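The encoding note above can be illustrated with a short Python sketch. It is not part of the tool; it only shows why text with non-Latin characters cannot be represented in ISO 8859-1 while UTF-8 handles it. The helper name can_encode and the sample strings are made up for illustration.

```python
# Illustration only: why non-Latin text needs UTF-8 rather than ISO 8859-1.
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


latin_sample = "café"       # all characters exist in ISO 8859-1
non_latin_sample = "東京"    # Japanese characters, outside ISO 8859-1

print(can_encode(latin_sample, "iso-8859-1"))      # True
print(can_encode(non_latin_sample, "iso-8859-1"))  # False
print(can_encode(non_latin_sample, "utf-8"))       # True
```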

Configuration

DataRobot Connection Options

Instance URL: The URL of the DataRobot instance to communicate with. For cloud-based accounts of DataRobot this will be https://app.datarobot.com.
API Token: The API Token to use to authenticate with the specified DataRobot Host. In the DataRobot Web Application (both cloud and enterprise) you can find the API Token for your account under the "Profile" page.
Prediction Type: Predictions can be made either through a model deployment or through a specific model from a selected project. If predictions are to be made through a model deployment, select "Deployments". If predictions are to be made through a specific model in a project, select "Models".

Advanced Options

Batch Size: By default, this connector will automatically chunk incoming data into batches, so we recommend leaving this field at 0 (the default). However, you can input the number of rows you would like to chunk the data into instead. This can help if you are experiencing timeouts or memory issues on your machine. The connector may still choose to chunk requests into smaller batches if a very large number is selected.
Disable SSL Verification: Some Alteryx servers run behind firewalls that do not allow connections to be made to DataRobot over SSL. Enable this option if your firewall requires SSL verification to be disabled.
Logging Level: Warnings, errors, and info messages are logged to the Alteryx console by default. You can either increase to Debug or decrease to Warning to change the amount of information sent to the console. Increasing the logging level may be helpful when trying to debug network issues or when communicating with DataRobot Support.
Create Support Log: Selecting this option will create a log file on disk. Your DataRobot Support engineer may request that you run your workflow with this option enabled if you are experiencing issues with this tool. It will capture a detailed trace of the full execution of this tool and create a .log file in your %ProgramData%\Alteryx\Support folder. The log file does not contain any customer-sensitive information, so it is safe to zip up and send to DataRobot for analysis of your issue. (Note: if your account doesn't have Administrator access, the logs will be saved in %AppData%.)
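As a rough illustration of what the Batch Size option means, the sketch below splits a list of rows into fixed-size batches. The function chunk_rows is hypothetical and is not the connector's code; the connector's automatic sizing (the behavior when the field is left at 0) is not modelled here.

```python
from typing import Iterator, List, Sequence


def chunk_rows(rows: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most `batch_size` rows."""
    if batch_size < 1:
        # Assumption for this sketch only: an explicit positive size is required.
        raise ValueError("batch_size must be at least 1 in this sketch")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


rows = [{"id": i} for i in range(10)]
batches = list(chunk_rows(rows, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Smaller batches mean more requests but less data held in memory per request, which is why lowering the value can help with timeouts or memory pressure.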

DataRobot Deployment Options

Deployment: This specifies the DataRobot model deployment you want to use to generate predictions. This field is searchable via either the deployment name or its ID.
Include prediction explanations: If this box is checked, it means DataRobot will also calculate prediction explanations for each row in addition to the prediction being made. These prediction explanations explain the top reasons why the prediction was made the way it was for that particular row. Note: prediction explanations are not available for all deployment types.
Number of prediction explanations: If you are calculating prediction explanations, this field specifies how many reasons for each row you want to return.
Low Threshold: This field is optional. If specified, DataRobot will only calculate prediction explanations for prediction values at or below this threshold. If a high threshold is also specified, the two are combined so that explanations are returned for both low and high prediction values. This lets you zero in on explanations for the highest and lowest predictions to help understand your data.
High Threshold: This field is optional. If specified, DataRobot will only calculate prediction explanations for prediction values at or above this threshold. If a low threshold is also specified, the two are combined so that explanations are returned for both low and high prediction values. This lets you zero in on explanations for the highest and lowest predictions to help understand your data.
Output features: This field is optional. These are columns from the original dataset that will be passed along with your DataRobot prediction values in each row. You can select up to five. Selecting one or more output features can make joins to other datasets much easier.
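The way the low and high thresholds combine can be sketched as follows. This mirrors the descriptions above and is not DataRobot's implementation; the function needs_explanation is hypothetical, and treating "no thresholds set" as "explain every row" is an assumption.

```python
from typing import Optional


def needs_explanation(prediction: float,
                      low: Optional[float] = None,
                      high: Optional[float] = None) -> bool:
    """Decide whether a row's prediction falls in the range that gets explanations."""
    if low is None and high is None:
        return True  # assumption: with no thresholds, every row is explained
    if low is not None and prediction <= low:
        return True  # at or below the low threshold
    if high is not None and prediction >= high:
        return True  # at or above the high threshold
    return False


predictions = [0.05, 0.40, 0.75, 0.95]
selected = [p for p in predictions if needs_explanation(p, low=0.1, high=0.9)]
print(selected)  # [0.05, 0.95]
```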

Time Series Predictions

Relax known in advance features: If this box is checked, DataRobot will ignore missing data that is in a column marked as a known in advance feature. WARNING: The resulting predictions will be less accurate than if the known in advance features were included.

Terminology

Features known in advance: A variable whose value you know in advance and that does not need to be lagged, such as holiday dates. For example, you might know that a product will be on sale next week, so you can provide the pricing information in advance.
Forecast rows: Rows being submitted for predictions. These rows must be blank except for the date field that was used during modeling as well as any known in advance features (unless Relax known in advance features has been selected). This information is only available for the DataRobot deployment owner.
Historical rows: Rows defining a rolling window of data that DataRobot uses to derive predictions from a dataset. The number of required rows back in time from your forecast point is based on the number originally specified when the time series model was created. This information is only available for the DataRobot deployment owner.
Forecast point: An arbitrary point in time for making a prediction. By default this will be the last row in historical rows if not otherwise specified in the DataRobot app.

Important information about time series datasets

Making predictions with time series models requires the dataset to be in a particular format. The format is based on your time series project settings.

For example, if the dataset required:

Historical rows: -5 to -3 days
Forecast rows: +1 to +3 days
Features known in advance: Holiday

...then the prediction dataset should look like:

                 Row  Time        Target  Temp.  Holiday
Historical rows  1    2017-01-03  16,443  72     TRUE
Historical rows  2    2017-01-04  3,013   72     FALSE
Historical rows  3    2017-01-05  1,643   68     FALSE
                 4    2017-01-06  -       -      FALSE
                 5    2017-01-07  -       -      FALSE
Forecast point   6    2017-01-08  -       -      FALSE
Forecast rows    7    2017-01-09  -       -      TRUE
Forecast rows    8    2017-01-10  -       -      FALSE
Forecast rows    9    2017-01-11  -       -      FALSE

In the Target and Temp. columns, "-" marks a blank cell. The Holiday column holds the values of the feature known in advance.
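The row layout in the example above can be reproduced with a small Python sketch that labels each day relative to the forecast point. The function label_rows is hypothetical, daily time steps are assumed, and the sketch only illustrates the layout; it does not call DataRobot or validate a real dataset.

```python
from datetime import date, timedelta


def label_rows(forecast_point: date, historical: tuple, forecast: tuple) -> list:
    """Return (ISO date, role) pairs from the first historical row to the last forecast row."""
    rows = []
    for offset in range(historical[0], forecast[1] + 1):
        day = forecast_point + timedelta(days=offset)
        if historical[0] <= offset <= historical[1]:
            role = "Historical rows"
        elif offset == 0:
            role = "Forecast point"
        elif forecast[0] <= offset <= forecast[1]:
            role = "Forecast rows"
        else:
            role = ""  # gap between the historical window and the forecast point
        rows.append((day.isoformat(), role))
    return rows


# Settings from the example: historical rows -5 to -3 days, forecast rows +1 to +3 days.
layout = label_rows(date(2017, 1, 8), historical=(-5, -3), forecast=(1, 3))
for day, role in layout:
    print(day, role)
```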

DataRobot Model Options

Project: This specifies the DataRobot project you want to use to generate predictions. This field is searchable via the project name or its ID.
Model: This specifies the DataRobot model within that project you want to use to generate predictions. This field is searchable via either the model description or its ID.
Use dedicated prediction server (deprecated): This option is only available for compatibility with existing workflows. It is not accessible when the tool is being configured for the first time. For workflows already configured with this option, users can update the configuration but are encouraged to switch to the "DataRobot Deployment" option instead. Workflows using a dedicated prediction server will stop working after DataRobot 5.3 is released.
Include reason codes: If this box is checked, it means DataRobot will also calculate reason codes for each row in addition to the prediction being made. These reason codes explain top reasons why the prediction was made the way it was for that particular row. Note: reason codes are not available for all model types.
Number of reason codes: If you are calculating reason codes, this field specifies how many reasons for each row you want to return.
Low threshold: This field is optional. If specified, DataRobot will only calculate reason codes for prediction values at or below this threshold. If a high threshold is also specified, the two are combined so that reason codes are returned for both low and high prediction values. This lets you zero in on reasons for the highest and lowest predictions to help understand your data.
High threshold: This field is optional. If specified, DataRobot will only calculate reason codes for prediction values at or above this threshold. If a low threshold is also specified, the two are combined so that reason codes are returned for both low and high prediction values. This lets you zero in on reasons for the highest and lowest predictions to help understand your data.

Inputs

The tool requires a single input identifying the data to send for scoring. All of the data will be sent to your DataRobot model and prediction results returned.

This tool will not finish executing until all of the predictions have been made.

Outputs

This tool will output the prediction results and prediction explanations you have requested. Deployments may also be configured to return additional features you want passed through to downstream tools. Note: this output will not include the raw data that the predictions were made on, but it can be joined to the raw data using the Alteryx Join Tool with the "by Record Position" option selected.
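Outside Alteryx, the idea of joining by record position can be sketched in Python as pairing the first input row with the first prediction row, the second with the second, and so on. The function join_by_position and the sample rows are hypothetical, and the sketch assumes predictions come back in the same order as the input rows.

```python
def join_by_position(raw_rows: list, prediction_rows: list) -> list:
    """Merge each raw row with the prediction row at the same position."""
    if len(raw_rows) != len(prediction_rows):
        raise ValueError("row counts differ; a positional join would misalign records")
    return [{**raw, **pred} for raw, pred in zip(raw_rows, prediction_rows)]


raw = [{"customer": "A", "age": 34}, {"customer": "B", "age": 51}]
preds = [{"prediction": 0.12}, {"prediction": 0.87}]
joined = join_by_position(raw, preds)
print(joined[1])  # {'customer': 'B', 'age': 51, 'prediction': 0.87}
```

A positional join depends entirely on row order, so selecting an output feature that can act as a key is the safer choice when one is available.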

For binary classification predictions: if the prediction was made using a deployment, the output will contain the same positive and negative classes that were used during modeling (e.g., if 'yes' and 'no' were used in modeling, the tool will return 'yes' and 'no'). If the prediction was made using a model, the output will contain 'True' or 'False', regardless of the positive and negative classes used during modeling (e.g., if 'yes' and 'no' were used in modeling, the tool will return 'True' and 'False').
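If downstream steps expect the original class labels, model-based output can be mapped back to them. The sketch below is a minimal example, not part of the tool: the function to_class_label is hypothetical, and it assumes that 'True' corresponds to the positive class and 'False' to the negative class.

```python
def to_class_label(model_output: str, positive_class: str, negative_class: str) -> str:
    """Map 'True'/'False' model output back to the labels used during modeling."""
    # Assumption: 'True' means the positive class, 'False' the negative class.
    mapping = {"True": positive_class, "False": negative_class}
    return mapping[model_output]


outputs = ["True", "False", "True"]
labels = [to_class_label(o, positive_class="yes", negative_class="no") for o in outputs]
print(labels)  # ['yes', 'no', 'yes']
```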