The DataRobot Automodel Tool allows you to create projects and train models as part of your workflow in Alteryx.
Instance URL: The URL of the DataRobot instance to communicate with. For cloud-based accounts of DataRobot, this will be https://app.datarobot.com.
API Token: The token used to authenticate with the specified DataRobot instance. For the cloud version, you can find the API Token on the Profile page. For the enterprise version, account information can be found at your internal deployment address.
Project name: This field is optional. If specified, it will become the name of the project inside the DataRobot application. It may be useful for some workflows to specify a unique name so that the new project can be easily found in DataRobot.
Target variable: This field is required. Specify the column in the input data that should be modeled. The remaining fields in the input data are used as predictors.
Use Quick Autopilot: Checking this box will instruct DataRobot to evaluate a smaller number of models in the DataRobot Autopilot process. It results in a much faster training time, but potentially at the cost of lower accuracy.
Do not run Autopilot: If this checkbox is enabled, the Automodel Tool will create a project and only upload input data to DataRobot's servers. This will not begin the Autopilot process. Use the DataRobot web app to perform additional transformations, or select specific model blueprints to build from your data once the project has been created.
Open project in web browser: With this option selected, the workflow will open the newly created project in your default browser so you can watch the progress of the model training process.
Disable SSL Verification: Some Alteryx servers run behind firewalls that do not allow connections to be made to DataRobot over SSL. Enable this option if your firewall requires SSL verification to be disabled.
Logging Level: Warnings, errors, and info messages are logged to the Alteryx console by default. You can either increase to Debug or decrease to Warning to change the amount of information sent to the console. Increasing the logging level may be helpful when trying to debug network issues or when communicating with DataRobot Support.
Create Support Log: Selecting this option will create a log file on disk. Your DataRobot Support engineer may request that you run your workflow with this option enabled if you are experiencing issues with this tool. It will capture a detailed trace of the full execution of this tool and create a .log file in your %ProgramData%\Alteryx\Support folder. The log file does not contain any customer-sensitive information, so it is safe to zip and send to DataRobot for analysis of your issue. (Note: If your account doesn't have Administrator access, the logs will be saved in %AppData%.)
Timeout: The number of seconds to wait both for HTTP connections to be established and for asynchronous operations (like project creation or setting the target of a project) to resolve. The default value is 7200 seconds, which should be sufficient for most workflows, but this value can be raised to accommodate very large uploads and very long asynchronous operations.
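How a single timeout value can bound a long-running asynchronous operation can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: `wait_for`, `is_done`, and `poll_interval` are hypothetical names, and only the 7200-second default comes from the setting described above.

```python
import time

DEFAULT_TIMEOUT_SECONDS = 7200  # default stated for the Timeout setting


def wait_for(is_done, timeout=DEFAULT_TIMEOUT_SECONDS, poll_interval=5.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() until it returns True or `timeout` seconds elapse.

    Hypothetical helper: is_done stands in for any status check, such as
    asking whether project creation has finished.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            raise TimeoutError(
                f"operation did not finish within {timeout} seconds")
        sleep(poll_interval)
```

Raising the timeout in a sketch like this only moves the deadline; it does not make the operation itself any faster.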
DataRobot creates different types of projects based on the target column. Those three project types are:
Note: If your target column appears numeric in nature but you would like DataRobot to treat it as categorical, we recommend creating a new column with the Alteryx Formula tool and prepending or appending a string to the values. For example, this often happens with zip code data.
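The effect of that transformation can be sketched in Python. In the workflow itself the Formula tool does this work; the function name `as_categorical` and the `zip_` prefix are illustrative assumptions, not part of either product.

```python
def as_categorical(values, prefix="zip_"):
    """Prepend a string to each value so it no longer reads as a number."""
    return [f"{prefix}{value}" for value in values]


zip_codes = ["02110", "10001", "94105"]
print(as_categorical(zip_codes))
# prints ['zip_02110', 'zip_10001', 'zip_94105']
```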
All columns sent to the tool as input will be used in the DataRobot model generation and will be required as input when requesting predictions. Use the Alteryx Select tool to remove any columns that should not be used in modeling.
DataRobot requires at least 100 rows of data to start modeling. To support advanced modeling analytics (such as reason codes), even more rows of data will be necessary. Conversely, there is a limit to the maximum amount of data that you can use in a project in DataRobot, depending on your license restrictions.
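A check for the minimum row count before sending data could look like the sketch below. Only the 100-row minimum comes from the text above; the function name `has_enough_rows` and the CSV input are assumptions, and the upper limit is not checked because it depends on your license.

```python
import csv
import io

MIN_ROWS = 100  # minimum number of rows stated above


def has_enough_rows(csv_text, min_rows=MIN_ROWS):
    """Return True if the CSV text holds at least min_rows data rows.

    The first row is treated as a header and is not counted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader) >= min_rows
```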
Projects and models in DataRobot are immutable, meaning there are few attributes you can change about a project or model after they are created. The recommended workflow is to create new projects when you require changes to the modeling data. You can create projects with the same name to simplify automation. Note, however, that project and model reference IDs will be unique per project.
In DataRobot, column ordering is not important during modeling or computing predictions. There are a few special characters that DataRobot doesn't support in column names, but as an Alteryx user you need not worry about them. Both the Automodel and Prediction tools will silently sanitize the names for you. Keep this in mind if some of the feature names you see when browsing the DataRobot Web UI do not match the names you see in Alteryx.
One of the project settings available in the DataRobot platform is the number of workers to make available to each project. The DataRobot Automodel tool will always use the maximum number of workers allowed for your account.
The tool requires a single input. All of the data will be sent to DataRobot to create a single new project, and the automated modeling process will then begin. One of the fields in the input will be the target of the modeling process, and the rest will be considered valid predictors to use to model the target. The tool will not finish executing until all of the models recommended by DataRobot have been trained and evaluated.
An optional output will send the Project ID and Model ID of the best model created in DataRobot to another tool. This can be consumed by the DataRobot Predict tool in lieu of having to specify these parameters manually.