hex.gbm
Class DHistogram<T extends DHistogram>

java.lang.Object
  extended by water.Iced
      extended by hex.gbm.DHistogram<T>
All Implemented Interfaces:
java.lang.Cloneable, Freezable
Direct Known Subclasses:
DBinHistogram

public class DHistogram<T extends DHistogram>
extends Iced

A DHistogram, computed in parallel over a Vec.

A DHistogram bins every value added to it (into a default number of bins) and computes the min, max, and either the class distribution or the mean & variance for each bin. DHistograms are initialized with a min, a max, and the number of elements to be added (all of which are generally available from a Vec). Bins normally run from min to max in uniform sizes, but if the DHistogram can determine that fewer bins are needed (e.g. boolean columns run from 0 to 1 but only ever take on 2 values, so only 2 bins are needed), then fewer bins are used.
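
The uniform binning described above can be sketched roughly as follows. This is a minimal illustration, not the hex.gbm.DHistogram implementation: the class UniformBinSketch, its fields, and its methods are all hypothetical names, and it tracks only the mean & variance case (not class distributions).

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of hex.gbm. */
final class UniformBinSketch {
    final float min, max;        // overall range covered by the bins
    final int nbins;             // number of uniform-width bins
    final long[] counts;         // values seen per bin
    final double[] sums;         // sum of values per bin
    final double[] ssqs;         // sum of squared values per bin
    final float[] binMin, binMax; // tightest min/max actually seen per bin

    UniformBinSketch(float min, float max, int nbins) {
        if (!(max > min) || nbins < 1)
            throw new IllegalArgumentException("need max > min and nbins >= 1");
        this.min = min;
        this.max = max;
        this.nbins = nbins;
        counts = new long[nbins];
        sums = new double[nbins];
        ssqs = new double[nbins];
        binMin = new float[nbins];
        binMax = new float[nbins];
        Arrays.fill(binMin, Float.POSITIVE_INFINITY);
        Arrays.fill(binMax, Float.NEGATIVE_INFINITY);
    }

    /** Map a value to a bin index; out-of-range values are clamped to the end bins. */
    int bin(float v) {
        int b = (int) ((v - min) / (max - min) * nbins);
        return Math.max(0, Math.min(nbins - 1, b));
    }

    /** Add one value, updating the count, sums, and min/max of its bin. */
    void add(float v) {
        int b = bin(v);
        counts[b]++;
        sums[b] += v;
        ssqs[b] += (double) v * v;
        binMin[b] = Math.min(binMin[b], v);
        binMax[b] = Math.max(binMax[b], v);
    }

    /** Mean of the values in bin b (NaN if the bin is empty). */
    double mean(int b) {
        return sums[b] / counts[b];
    }

    /** Population variance of the values in bin b (NaN if the bin is empty). */
    double variance(int b) {
        double m = mean(b);
        return ssqs[b] / counts[b] - m * m;
    }
}
```

In this sketch a boolean column would be handled by constructing the histogram with min 0, max 1, and 2 bins, so that 0 lands in the first bin and 1 in the second.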

If we are successively splitting rows (e.g. in a decision tree), then a fresh DHistogram for each split will dynamically re-bin the data. Each successive split will then logarithmically divide the data. At the first split, outliers will end up in their own bins, but some central bins may be very full. At the next split(s), the full bins will get split, and so on until (after a logarithmic number of splits) each bin holds roughly the same amount of data.
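
The re-binning effect can be illustrated with a small sketch. It is an assumption-laden simplification: RebinSketch and its methods are hypothetical names, and the "split" is reduced to keeping only the rows that fell into the fullest bin, where a real decision tree would choose its split differently.

```java
/** Hypothetical illustration only; not part of hex.gbm. */
final class RebinSketch {
    /** Uniform bin index of v over [min, max], clamped to the end bins. */
    static int binOf(float v, float min, float max, int nbins) {
        if (max <= min) return 0; // degenerate range: everything in one bin
        int b = (int) ((v - min) / (max - min) * nbins);
        return Math.max(0, Math.min(nbins - 1, b));
    }

    /** Tight {min, max} of the given values. */
    static float[] minMax(float[] vals) {
        float lo = Float.POSITIVE_INFINITY, hi = Float.NEGATIVE_INFINITY;
        for (float v : vals) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        return new float[] { lo, hi };
    }

    /** Build a fresh histogram over the tight range of exactly these rows. */
    static int[] counts(float[] vals, int nbins) {
        float[] mm = minMax(vals);
        int[] c = new int[nbins];
        for (float v : vals) c[binOf(v, mm[0], mm[1], nbins)]++;
        return c;
    }

    /** Stand-in for a split: keep only the rows that fell in the fullest bin. */
    static float[] fullestBin(float[] vals, int nbins) {
        float[] mm = minMax(vals);
        int[] c = counts(vals, nbins);
        int best = 0;
        for (int b = 1; b < nbins; b++) if (c[b] > c[best]) best = b;
        float[] out = new float[c[best]];
        int n = 0;
        for (float v : vals)
            if (binOf(v, mm[0], mm[1], nbins) == best) out[n++] = v;
        return out;
    }
}
```

For the values 1 through 8 plus a single outlier at 1000, four bins over the full range put the eight small values into the first bin and the outlier alone in the last. Calling counts again on the rows returned by fullestBin re-bins them over the range 1 to 8, which spreads them evenly across the four bins.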


Constructor Summary
DHistogram(java.lang.String name, byte isInt)
           
DHistogram(java.lang.String name, byte isInt, float min, float max)
           
 
Method Summary
 DHistogram bigCopy()
           
protected static int byteSize(byte[] bs)
           
protected static int byteSize(double[] fs)
           
protected static int byteSize(float[] fs)
           
protected static int byteSize(int[] is)
           
protected static int byteSize(long[] ls)
           
protected static int byteSize(java.lang.Object[] ls)
           
protected static int byteSize(short[] ss)
           
 DHistogram smallCopy()
           
 void tightenMinMax()
           
 
Methods inherited from class water.Iced
clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DHistogram

public DHistogram(java.lang.String name,
                  byte isInt,
                  float min,
                  float max)

DHistogram

public DHistogram(java.lang.String name,
                  byte isInt)
Method Detail

smallCopy

public DHistogram smallCopy()

bigCopy

public DHistogram bigCopy()

tightenMinMax

public void tightenMinMax()

byteSize

protected static int byteSize(byte[] bs)

byteSize

protected static int byteSize(short[] ss)

byteSize

protected static int byteSize(float[] fs)

byteSize

protected static int byteSize(int[] is)

byteSize

protected static int byteSize(long[] ls)

byteSize

protected static int byteSize(double[] fs)

byteSize

protected static int byteSize(java.lang.Object[] ls)