water
Class Model

java.lang.Object
  extended by water.Iced
      extended by water.Model
All Implemented Interfaces:
java.lang.Cloneable, Freezable
Direct Known Subclasses:
DTree.TreeModel, GLMModel, KMeans2.KMeans2Model, NeuralNet.NeuralNetModel

public abstract class Model
extends Iced

A Model models reality (hopefully). A model can be used to 'score' a row, or a collection of rows, on any compatible dataset, meaning the rows have all the columns with the same names as those used to build the model.


Nested Class Summary
protected static class Model.SB
           
 
Field Summary
 Key _dataKey
          Dataset key used to *build* the model, for models for which this makes sense, or null otherwise.
 java.lang.String[][] _domains
          Categorical/factor/enum mappings, per column.
 java.lang.String[] _names
          Columns used in the model; their names are used to match up with scoring data columns.
 Key _selfKey
          Key associated with this Model, if any.
static DocGen.FieldDoc[] DOC_FIELDS
           
 
Constructor Summary
Model(Key selfKey, Key dataKey, Frame fr)
          Full constructor from frame: Strips out the Vecs to just the names needed to match columns later for future datasets.
Model(Key selfKey, Key dataKey, java.lang.String[] names, java.lang.String[][] domains)
          Full constructor
Model(Key selfKey, Model m)
          Simple shallow copy constructor to a new Key
 
Method Summary
 Frame[] adapt(Frame fr, boolean exact)
          Build an adapted Frame from the given Frame.
 java.lang.String[] classNames()
           
 ConfusionMatrix cm()
          For classifiers, confusion matrix on validation set.
 void delete()
          Called when deleting this model, to clean up any internal keys
static int[] getDomainMapping(java.lang.String colName, java.lang.String[] modelDom, java.lang.String[] dom, boolean exact)
          Returns a mapping between values domains for a given column.
 boolean isClassifier()
           
 int nclasses()
           
 java.lang.String responseName()
           
 double score(double[] data)
           
 Frame score(Frame fr, boolean exact)
          Bulk score the frame 'fr', producing a Frame result; the 1st Vec is the predicted class, the remaining Vecs are the probability distributions.
 float[] score(Frame fr, boolean exact, int row)
          Single row scoring, on a compatible Frame.
 float[] score(int[][] map, double[] row, float[] preds)
          Single row scoring, on a compatible set of data, given an adaption vector
 float[] score(java.lang.String[] names, java.lang.String[][] domains, boolean exact, double[] row)
          Single row scoring, on a compatible set of data.
protected  float[] score0(Chunk[] chks, int row_in_chunk, double[] tmp, float[] preds)
          Bulk scoring API for one row.
protected abstract  float[] score0(double[] data, float[] preds)
          Subclasses implement the scoring logic.
 void testJavaScoring(Frame fr)
           
 java.lang.String toJava()
          Return a String which is a valid Java program representing a class that implements the Model.
protected  void toJavaInit(javassist.CtClass ct)
           
protected  void toJavaInit(Model.SB sb)
           
protected  void toJavaPredictBody(Model.SB sb)
           
 
Methods inherited from class water.Iced
clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DOC_FIELDS

public static DocGen.FieldDoc[] DOC_FIELDS

_selfKey

public final Key _selfKey
Key associated with this Model, if any.


_dataKey

public final Key _dataKey
Dataset key used to *build* the model, for models for which this makes sense, or null otherwise. Not all models are built from a dataset (e.g. artificial models), and some are built from more than one dataset (various ensemble models), so this key has no *mathematical* significance in the model but is handy during common model-building and for the historical record.


_names

public final java.lang.String[] _names
Columns used in the model; their names are used to match up with scoring data columns. The last name is the response column name.


_domains

public final java.lang.String[][] _domains
Categorical/factor/enum mappings, per column. Null for non-enum cols. The last column holds the response col enums.

Constructor Detail

Model

public Model(Key selfKey,
             Key dataKey,
             Frame fr)
Full constructor from frame: Strips out the Vecs to just the names needed to match columns later for future datasets.


Model

public Model(Key selfKey,
             Key dataKey,
             java.lang.String[] names,
             java.lang.String[][] domains)
Full constructor


Model

public Model(Key selfKey,
             Model m)
Simple shallow copy constructor to a new Key

Method Detail

delete

public void delete()
Called when deleting this model, to clean up any internal keys


responseName

public java.lang.String responseName()

classNames

public java.lang.String[] classNames()

isClassifier

public boolean isClassifier()

nclasses

public int nclasses()

cm

public ConfusionMatrix cm()
For classifiers, confusion matrix on validation set.


score

public Frame score(Frame fr,
                   boolean exact)
Bulk score the frame 'fr', producing a Frame result; the 1st Vec is the predicted class, the remaining Vecs are the probability distributions. For Regression (single-class) models, the 1st and only Vec is the prediction value. The 'exact' flag describes how hard we try to adapt the frame.
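To illustrate the result layout described above, the following stand-alone sketch decodes one classifier prediction row. The class PredictionRowSketch and the method describe are hypothetical helpers, not part of water.Model, and the sketch assumes a float[] row laid out the same way as the Vecs: slot 0 holds the predicted class index and the remaining slots hold the class probabilities.

```java
// Hypothetical helper; not part of water.Model.
final class PredictionRowSketch {
  // Assumes preds[0] is the predicted class index and preds[1..] are the
  // per-class probabilities, in the same order as classNames.
  static String describe(float[] preds, String[] classNames) {
    int cls = (int) preds[0];
    return classNames[cls] + " (p=" + preds[1 + cls] + ")";
  }

  public static void main(String[] args) {
    float[] preds = { 1f, 0.25f, 0.75f };
    // prints "yes (p=0.75)"
    System.out.println(describe(preds, new String[] { "no", "yes" }));
  }
}
```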


score

public final float[] score(Frame fr,
                           boolean exact,
                           int row)
Single row scoring, on a compatible Frame.


score

public final float[] score(java.lang.String[] names,
                           java.lang.String[][] domains,
                           boolean exact,
                           double[] row)
Single row scoring, on a compatible set of data. Fairly expensive to adapt.


score

public final float[] score(int[][] map,
                           double[] row,
                           float[] preds)
Single row scoring, on a compatible set of data, given an adaption vector
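The split between adapting and scoring can be sketched as follows: the column matching is computed once and then reused for every row, which is why passing a precomputed adaption is cheaper than adapting on each call. This is a simplified illustration only. The class ColumnAdaptSketch and its methods are hypothetical, it uses a flat int[] rather than the int[][] taken by score, and it handles column order only (no categorical remapping).

```java
// Hypothetical sketch; the real adaption vector is an int[][] whose layout
// is not described in this documentation.
final class ColumnAdaptSketch {
  // Computed once: for each model column, the position of the same-named
  // input column. Throws if a model column is absent from the input.
  static int[] columnMap(String[] modelNames, String[] inputNames) {
    int[] map = new int[modelNames.length];
    for (int i = 0; i < modelNames.length; i++) {
      map[i] = -1;
      for (int j = 0; j < inputNames.length; j++)
        if (modelNames[i].equals(inputNames[j])) { map[i] = j; break; }
      if (map[i] < 0)
        throw new IllegalArgumentException("Missing column " + modelNames[i]);
    }
    return map;
  }

  // Applied per row: a cheap reordering into the order the model expects.
  static double[] adaptRow(int[] map, double[] row, double[] out) {
    for (int i = 0; i < map.length; i++) out[i] = row[map[i]];
    return out;
  }

  public static void main(String[] args) {
    int[] map = columnMap(new String[] { "a", "b" }, new String[] { "b", "x", "a" });
    double[] out = adaptRow(map, new double[] { 1, 2, 3 }, new double[2]);
    // prints "[3.0, 1.0]"
    System.out.println(java.util.Arrays.toString(out));
  }
}
```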


adapt

public Frame[] adapt(Frame fr,
                     boolean exact)
Build an adapted Frame from the given Frame. Useful for efficient bulk scoring of a new dataset to an existing model. Same adaption as above, but expressed as a Frame instead of as an int[][]. The returned Frame does not have a response column. It returns a two-element array containing the adapted frame and a frame which contains only the vectors which were adapted (the second frame exists so that deleting it deletes all the adapted vectors).


getDomainMapping

public static int[] getDomainMapping(java.lang.String colName,
                                     java.lang.String[] modelDom,
                                     java.lang.String[] dom,
                                     boolean exact)
Returns a mapping between values domains for a given column.
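As an illustration of what such a mapping can look like, the sketch below maps each index of a dataset's domain to the index of the same label in the model's domain. It is a stand-alone example, not H2O's implementation. The class DomainMappingSketch, the method mapDomain, the direction of the mapping, the use of -1 for labels the model has never seen, and the reading of 'exact' as "throw on an unknown label" are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not the water.Model implementation.
final class DomainMappingSketch {
  // Returns map[i] = index in modelDom of the label dom[i].
  // Assumption: a label unknown to the model maps to -1, or throws when
  // exact is true.
  static int[] mapDomain(String colName, String[] modelDom, String[] dom, boolean exact) {
    Map<String, Integer> modelIndex = new HashMap<>();
    for (int i = 0; i < modelDom.length; i++) modelIndex.put(modelDom[i], i);
    int[] map = new int[dom.length];
    for (int i = 0; i < dom.length; i++) {
      Integer idx = modelIndex.get(dom[i]);
      if (idx == null && exact)
        throw new IllegalArgumentException(
            "Column " + colName + " has unknown level " + dom[i]);
      map[i] = (idx == null) ? -1 : idx;
    }
    return map;
  }

  public static void main(String[] args) {
    int[] m = mapDomain("color",
        new String[] { "blue", "green", "red" },
        new String[] { "red", "blue", "pink" }, false);
    // prints "[2, 0, -1]"
    System.out.println(Arrays.toString(m));
  }
}
```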


score0

protected float[] score0(Chunk[] chks,
                         int row_in_chunk,
                         double[] tmp,
                         float[] preds)
Bulk scoring API for one row. The Chunks are all compatible with the model, and the last Chunks are expected to hold the final distribution and prediction. The default method just loads the data into the tmp array, then calls the subclass scoring logic.


score0

protected abstract float[] score0(double[] data,
                                  float[] preds)
Subclasses implement the scoring logic. The data is pre-loaded into a re-used temp array, in the order the model expects. The predictions are written into the re-used preds array, which is also returned.
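A minimal sketch of what a subclass's scoring logic might look like is given below. Because water.Model is not available in a stand-alone example, ScorerSketch is a hypothetical stand-in that only mirrors the shape of score0(double[], float[]). TwoClassSketch, its weights and its logistic formula are likewise invented for illustration. The layout of preds (slot 0 for the predicted class, the rest for the distribution) follows the description given for bulk scoring.

```java
// Hypothetical stand-ins; these classes are not part of H2O.
abstract class ScorerSketch {
  // Mirrors the shape of Model.score0(double[], float[]).
  protected abstract float[] score0(double[] data, float[] preds);
}

final class TwoClassSketch extends ScorerSketch {
  private final double[] weights; // one weight per predictor column (assumed)
  private final double bias;

  TwoClassSketch(double[] weights, double bias) {
    this.weights = weights;
    this.bias = bias;
  }

  @Override protected float[] score0(double[] data, float[] preds) {
    double z = bias;
    for (int i = 0; i < weights.length; i++) z += weights[i] * data[i];
    float p1 = (float) (1.0 / (1.0 + Math.exp(-z)));
    preds[1] = 1.0f - p1;              // probability of class 0
    preds[2] = p1;                     // probability of class 1
    preds[0] = p1 >= 0.5f ? 1f : 0f;   // predicted class goes in slot 0
    return preds;                      // the same array that was passed in
  }

  public static void main(String[] args) {
    TwoClassSketch m = new TwoClassSketch(new double[] { 1.0, -1.0 }, 0.0);
    float[] preds = m.score0(new double[] { 2.0, 0.0 }, new float[3]);
    // prints "1.0" (the predicted class)
    System.out.println(preds[0]);
  }
}
```

Reusing the caller's preds array, rather than allocating a new one per row, keeps per-row scoring free of allocation.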


score

public double score(double[] data)

toJava

public java.lang.String toJava()
Return a String which is a valid Java program representing a class that implements the Model. The Java is of the form:
    class UUIDxxxxModel {
      public static final String NAMES[] = { ....column names... }
      public static final String DOMAINS[][] = { ....domain names... }
      // Pass in data in a double[], pre-aligned to the Model's requirements.
      // Jam predictions into the preds[] array; preds[0] is reserved for the
      // main prediction (class for classifiers or value for regression),
      // and remaining columns hold a probability distribution for classifiers.
      float[] predict( double data[], float preds[] );
      double[] map( HashMap row, double data[] );
      // Does the mapping lookup for every row, no allocation
      float[] predict( HashMap row, double data[], float preds[] );
      // Allocates a double[] for every row
      float[] predict( HashMap row, float preds[] );
      // Allocates a double[] and a float[] for every row
      float[] predict( HashMap row );
    }
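
For orientation, here is a filled-in sketch of a class following the outline above. It is not output produced by toJava. The class name ExampleGeneratedModel, the column names, the placeholder regression formula, and the handling of missing or unknown values as NaN are all assumptions, and the raw HashMap of the outline is written as HashMap<String, Object> for type safety.

```java
import java.util.HashMap;

// Hypothetical example of the outline above; the name and bodies are invented.
class ExampleGeneratedModel {
  // The last entry is the response column, as with Model._names.
  public static final String[] NAMES = { "age", "color", "price" };
  // null marks a numeric column; otherwise the categorical levels.
  public static final String[][] DOMAINS = { null, { "blue", "red" }, null };

  // Placeholder regression formula standing in for real model logic.
  float[] predict(double[] data, float[] preds) {
    preds[0] = (float) (10.0 + 0.5 * data[0] + 2.0 * data[1]);
    return preds;
  }

  // Copies the row into data[] in NAMES order, skipping the response;
  // categorical strings become level indices.
  double[] map(HashMap<String, Object> row, double[] data) {
    for (int i = 0; i < NAMES.length - 1; i++) {
      Object v = row.get(NAMES[i]);
      if (v == null) { data[i] = Double.NaN; continue; } // assumption: missing is NaN
      if (DOMAINS[i] == null) { data[i] = ((Number) v).doubleValue(); continue; }
      data[i] = Double.NaN;                              // assumption: unknown level is NaN
      for (int j = 0; j < DOMAINS[i].length; j++)
        if (DOMAINS[i][j].equals(v)) data[i] = j;
    }
    return data;
  }

  // Does the mapping lookup for every row, no allocation
  float[] predict(HashMap<String, Object> row, double[] data, float[] preds) {
    return predict(map(row, data), preds);
  }

  // Allocates a double[] for every row
  float[] predict(HashMap<String, Object> row, float[] preds) {
    return predict(row, new double[NAMES.length - 1], preds);
  }

  // Allocates a double[] and a float[] for every row
  float[] predict(HashMap<String, Object> row) {
    return predict(row, new float[1]);
  }

  public static void main(String[] args) {
    HashMap<String, Object> row = new HashMap<>();
    row.put("age", 30);
    row.put("color", "red");
    // prints "27.0"
    System.out.println(new ExampleGeneratedModel().predict(row)[0]);
  }
}
```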
  


toJavaInit

protected void toJavaInit(Model.SB sb)

toJavaInit

protected void toJavaInit(javassist.CtClass ct)

toJavaPredictBody

protected void toJavaPredictBody(Model.SB sb)

testJavaScoring

public void testJavaScoring(Frame fr)