hex.gbm
Class DBinHistogram

java.lang.Object
  extended by water.Iced
      extended by hex.gbm.DHistogram<DBinHistogram>
          extended by hex.gbm.DBinHistogram
All Implemented Interfaces:
java.lang.Cloneable, Freezable

public class DBinHistogram
extends DHistogram<DBinHistogram>

A Histogram, computed in parallel over a Vec.

A DBinHistogram bins every value added to it, and computes the Vec min & max (for use in the next split) and the response mean & variance for each bin. DBinHistograms are initialized with a min, a max, and the number of elements to be added (all of which are generally available from a Vec). Bins run from min to max in uniform sizes. If the DBinHistogram can determine that fewer bins are needed (e.g. boolean columns run from 0 to 1 but only ever take on 2 values, so only 2 bins are needed), then fewer bins are used.
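The uniform binning described above can be illustrated with a minimal sketch. This is not the hex.gbm implementation: the class UniformBinSketch, its fields, and the helper effectiveBins are hypothetical names chosen for illustration, and the rule used to reduce the bin count for integer columns is an assumption based on the boolean-column example. Response mean and variance are omitted to keep the sketch short.

```java
import java.util.Arrays;

// Hypothetical sketch of uniform binning; NOT the hex.gbm.DBinHistogram code.
class UniformBinSketch {
    final float min;      // lower edge of bin 0
    final float step;     // width of every bin
    final long[] counts;  // number of values that landed in each bin
    final float[] mins;   // smallest value observed in each bin
    final float[] maxs;   // largest value observed in each bin

    UniformBinSketch(int nbins, float min, float max) {
        this.min = min;
        this.step = (max - min) / nbins;
        this.counts = new long[nbins];
        this.mins = new float[nbins];
        this.maxs = new float[nbins];
        Arrays.fill(mins, Float.POSITIVE_INFINITY);
        Arrays.fill(maxs, Float.NEGATIVE_INFINITY);
    }

    // Assumed rule: an integer column spanning fewer distinct values than
    // the requested bin count only needs one bin per value.
    static int effectiveBins(int nbins, boolean isInt, float min, float max) {
        if (isInt) {
            long distinct = (long) max - (long) min + 1;
            if (distinct < nbins) return (int) distinct;
        }
        return nbins;
    }

    // Map a value to its bin, clamping so that max itself lands in the last bin.
    int bin(float v) {
        int idx = (int) ((v - min) / step);
        return Math.max(0, Math.min(counts.length - 1, idx));
    }

    // Count the value and track the per-bin min and max.
    void add(float v) {
        int b = bin(v);
        counts[b]++;
        if (v < mins[b]) mins[b] = v;
        if (v > maxs[b]) maxs[b] = v;
    }

    public static void main(String[] args) {
        UniformBinSketch h = new UniformBinSketch(4, 0f, 8f);
        for (float v : new float[]{0f, 1f, 2.5f, 7.9f, 8f}) h.add(v);
        System.out.println(Arrays.toString(h.counts)); // prints [2, 1, 0, 2]
        // A boolean column (0..1, integer-valued) needs only 2 bins.
        System.out.println(effectiveBins(1024, true, 0f, 1f)); // prints 2
    }
}
```

Tracking the observed min and max per bin is what lets a later split start from a tighter range than the bin's nominal edges.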

If we are successively splitting rows (e.g. in a decision tree), then a fresh DBinHistogram for each split will dynamically re-bin the data. Each successive split will logarithmically divide the data. At the first split, outliers will end up in their own bins, but some central bins may be very full. At the next split(s), the full bins will get split, and so on until (after a logarithmic number of splits) each bin holds roughly the same amount of data. This dynamic binning resolves many of the problems with picking the proper bin count or limits: generally, a few more tree levels will match any fancy but fixed-size binning strategy.
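The re-binning effect can be seen in a small, self-contained sketch. It is an illustration under stated assumptions, not the hex.gbm code: the class RebinSketch and its histogram method are hypothetical, and the "split" is simulated by re-binning only the rows that fell into the fullest bin, using that bin's observed min and max as the new range.

```java
import java.util.Arrays;

// Hypothetical sketch of dynamic re-binning; NOT the hex.gbm implementation.
class RebinSketch {
    // Uniform histogram of the values that fall inside [min, max].
    static long[] histogram(float[] data, float min, float max, int nbins) {
        long[] counts = new long[nbins];
        float step = (max - min) / nbins;
        for (float v : data) {
            if (v < min || v > max) continue;  // row belongs to another split
            int idx = Math.min(nbins - 1, (int) ((v - min) / step));
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 1000f};

        // First split: the outlier gets its own bin, the rest pile into bin 0.
        long[] first = histogram(data, 1f, 1000f, 4);
        System.out.println(Arrays.toString(first));  // prints [7, 0, 0, 1]

        // Next split: a fresh histogram over the full bin's observed range
        // [1, 7] spreads those rows out.
        long[] second = histogram(data, 1f, 7f, 4);
        System.out.println(Arrays.toString(second)); // prints [2, 1, 2, 2]
    }
}
```

With a fixed four-bin histogram the seven central rows would stay indistinguishable; one extra level of re-binning separates them without changing the bin count.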


Field Summary
 long[] _bins
           
 float _bmin
           
 float[] _maxs
           
 float[] _mins
           
 char _nbins
           
 float _step
           
 
Constructor Summary
DBinHistogram(java.lang.String name, char nbins, byte isInt, float min, float max, long nelems)
           
 
Method Summary
 DBinHistogram bigCopy()
           
 void fini()
           
static DBinHistogram[] initialHist(Frame fr, int ncols, char nbins)
           
 boolean isConstantResponse()
           
 DHistogram smallCopy()
           
 void tightenMinMax()
           
 java.lang.String toString()
           
 
Methods inherited from class hex.gbm.DHistogram
byteSize, byteSize, byteSize, byteSize, byteSize, byteSize, byteSize
 
Methods inherited from class water.Iced
clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

_step

public final float _step

_bmin

public final float _bmin

_nbins

public final char _nbins

_bins

public long[] _bins

_mins

public float[] _mins

_maxs

public float[] _maxs

Constructor Detail

DBinHistogram

public DBinHistogram(java.lang.String name,
                     char nbins,
                     byte isInt,
                     float min,
                     float max,
                     long nelems)

Method Detail

smallCopy

public DHistogram smallCopy()
Overrides:
smallCopy in class DHistogram<DBinHistogram>

bigCopy

public DBinHistogram bigCopy()
Overrides:
bigCopy in class DHistogram<DBinHistogram>

fini

public void fini()

tightenMinMax

public void tightenMinMax()
Overrides:
tightenMinMax in class DHistogram<DBinHistogram>

initialHist

public static DBinHistogram[] initialHist(Frame fr,
                                          int ncols,
                                          char nbins)

isConstantResponse

public boolean isConstantResponse()

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object