Driverless AI MOJO Scoring Pipeline - C++ Runtime with Python and R Wrappers¶
The C++ Scoring Pipeline is provided as R and Python packages for the protobuf-based MOJO2 protocol. The packages are self contained, so no additional software is required. Simply build the MOJO Scoring Pipeline and begin using your preferred method. To download the MOJO Scoring Pipeline onto your local machine, click the Download MOJO Scoring Pipeline button, then click the same button again in the pop-up menu that appears. Refer to the provided instructions for Java, Python, or R.
Notes:
- MOJOs are currently not available for RuleFit or FTRL models.
- The Download MOJO Scoring Pipeline button appears as Build MOJO Scoring Pipeline if the MOJO Scoring Pipeline is disabled.
Examples¶
The following examples show how to use the R and Python APIs of the C++ MOJO runtime.
R Example¶
Prerequisites¶
Rcpp
(>=1.0.0)data.table
- Driverless AI License (either file or environment variable)
Running the MOJO2 R Runtime¶
# Install the R MOJO runtime
install.packages("./daimojo_2.0.1.tar.gz")
# Load the MOJO
library(daimojo)
m <- load.mojo("./mojo-pipeline/pipeline.mojo")
# retrieve the creation time of the MOJO
create.time(m)
## [1] "2018-12-17 22:00:24 UTC"
# retrieve the UUID of the experiment
uuid(m)
## [1] "65875c15-943a-4bc0-a162-b8984fe8e50d"
# Load data and make predictions
col_class <- setNames(feature.types(m), feature.names(m)) # column names and types
library(data.table)
d <- fread("./mojo-pipeline/example.csv", colClasses=col_class)
predict(m, d)
## label.B label.M
## 1 0.08287659 0.91712341
## 2 0.77655075 0.22344925
## 3 0.58438434 0.41561566
## 4 0.10570505 0.89429495
## 5 0.01685609 0.98314391
## 6 0.23656610 0.76343390
## 7 0.17410333 0.82589667
## 8 0.10157948 0.89842052
## 9 0.13546191 0.86453809
## 10 0.94778244 0.05221756
Python Example¶
Prerequisites¶
Python 3.6
datatable. Run the following to install:
pip install https://s3.amazonaws.com/h2o-release/datatable/stable/datatable-0.8.0/datatable-0.8.0-cp36-cp36m-linux_x86_64.whl
Python MOJO runtime. Run the following after downloading from the GUI:
pip install daimojo-2.0.1+master.478-cp36-cp36m-linux_x86_64.whl
Note: For PowerPC, replacex86_64
withppc64le
above.
- Driverless AI License (either file or environment variable)
Running the MOJO2 Python Runtime¶
# import the daimojo model package
import daimojo.model
# specify the location of the MOJO
m = daimojo.model("./mojo-pipeline/pipeline.mojo")
# retrieve the creation time of the MOJO
m.created_time
# 'Mon May 6 14:00:24 2019'
# retrieve the UUID of the experiment
m.uuid
# retrive a list of missing values
m.missing_values
# ['',
# '?',
# 'None',
# 'nan',
# 'NA',
# 'N/A',
# 'unknown',
# 'inf',
# '-inf',
# '1.7976931348623157e+308',
# '-1.7976931348623157e+308']
# retrieve the feature names
m.feature_names
# ['clump_thickness',
# 'uniformity_cell_size',
# 'uniformity_cell_shape',
# 'marginal_adhesion',
# 'single_epithelial_cell_size',
# 'bare_nuclei',
# 'bland_chromatin',
# 'normal_nucleoli',
# 'mitoses']
# retrieve the feature types
m.feature_types
# ['float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32']
# retrieve the output names
m.output_names
# ['label.B', 'label.M']
# retrieve the output types
m.output_types
# ['float64', 'float64']
# import the datatable module
import datatable as dt
# parse the example.csv file
pydt = dt.fread("./mojo-pipeline/example.csv", na_strings=m.missing_values)
pydt
# clump_thickness uniformity_cell_size uniformity_cell_shape marginal_adhesion single_epithelial_cell_size bare_nuclei bland_chromatin normal_nucleoli mitoses
# 0 8 1 3 10 6 6 9 1 1
# 1 2 1 2 2 5 3 4 8 8
# 2 1 1 1 9 4 10 3 5 4
# 3 2 6 9 10 4 8 1 1 3
# 4 10 10 8 1 8 3 6 3 4
# 5 1 8 4 5 10 1 2 5 3
# 6 2 10 2 9 1 2 9 3 8
# 7 2 8 9 2 10 10 3 5 4
# 8 6 3 8 5 2 3 5 3 4
# 9 4 2 2 8 1 2 8 9 1
# [10 rows × 9 columns]
# retrieve the column types
pydt.stypes
# (stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64)
# make predictions on the example.csv file
res = m.predict(pydt)
# retrieve the predictions
res
# label.B label.M
# 0 0.0828766 0.917123
# 1 0.776551 0.223449
# 2 0.584384 0.415616
# 3 0.105705 0.894295
# 4 0.0168561 0.983144
# 5 0.236566 0.763434
# 6 0.174103 0.825897
# 7 0.101579 0.898421
# 8 0.135462 0.864538
# 9 0.947782 0.0522176
# [10 rows × 2 columns]
# retrieve the prediction column names
res.names
# ('label.B', 'label.M')
# retrieve the prediction column types
res.stypes
# (stype.float64, stype.float64)
# convert datatable results to common data types
# res.to_pandas() # need pandas
# res.to_numpy() # need numpy
res.to_list()