hex.rf
Class GiniStatistic

java.lang.Object
  extended by hex.rf.GiniStatistic

public class GiniStatistic
extends java.lang.Object

Computes the Gini split statistics. The Gini impurity is the probability that an element will be misclassified:

    1 - \sum(p_i^2)

It is computed for the left and right subtrees and combined as a weighted average:

    gini left * weight left + gini right * weight right
    ---------------------------------------------------
                       weight total

The result is subtracted from 1, an idealized worst-case impurity, to approximate the gain over the previous node. The split with the best gain is then selected. The same is done for exclusions, where left stands for the rows whose column value equals the split value and right for all other rows.
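As an illustration of the fitness described above, here is a minimal sketch that weights each side by its own row count. The class name GiniSketch and its methods are hypothetical and are not part of hex.rf; the real class works on Data and Statistic.Split objects that are not shown on this page.

```java
// Hypothetical helper, not part of hex.rf: illustrates the formula only.
final class GiniSketch {

    /** Sum of all class counts, i.e. the weight of one side of a split. */
    static long weight(int[] classCounts) {
        long total = 0;
        for (int c : classCounts) total += c;
        return total;
    }

    /** Gini impurity 1 - sum(p_i^2) of a class-count histogram. */
    static double gini(int[] classCounts) {
        long total = weight(classCounts);
        if (total == 0) return 0.0;            // an empty side contributes nothing
        double sumSq = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;     // class probability p_i
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** 1 minus the weighted average impurity of the two sides; higher is better. */
    static double fitness(int[] left, int[] right) {
        long wl = weight(left);
        long wr = weight(right);
        long total = wl + wr;
        if (total == 0) return 0.0;
        double weighted = (gini(left) * wl + gini(right) * wr) / total;
        return 1.0 - weighted;
    }

    public static void main(String[] args) {
        int[] left = {8, 2};    // class counts of rows sent to the left subtree
        int[] right = {1, 9};   // class counts of rows sent to the right subtree
        System.out.println("fitness = " + fitness(left, right));
    }
}
```

A split that separates the classes perfectly has zero impurity on both sides and therefore a fitness of 1; a split that leaves both sides evenly mixed between two classes has a fitness of 0.5.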


Field Summary
protected  int[][][] _columnDists
          Column distributions, indexed as column x arity x classes. Holds the number of rows for each combination of column index, encoded value, and class.
protected  int[] _features
           
protected  java.util.Random _random
           
 
Constructor Summary
GiniStatistic(Data data, int features, long seed, int exclusiveSplitLimit)
           
 
Method Summary
protected  hex.rf.Statistic.Split eqSplit(int colIndex, Data d, int[] dist, int distWeight, java.util.Random _)
           
protected  hex.rf.Statistic.Split ltSplit(int col, Data d, int[] dist, int distWeight, java.util.Random _)
          Returns the best split for a given column
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_columnDists

protected final int[][][] _columnDists
Column distributions, indexed as column x arity x classes. Holds the number of rows for each combination of column index, encoded value, and class.
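To make the column x arity x classes layout concrete, the following sketch fills such an array from plain integer rows. It is an assumption-laden illustration: the class ColumnDistsSketch, the rows, classes, and arities parameters, and the row-major input format are all hypothetical, since this page does not show how Data exposes its rows.

```java
// Hypothetical helper, not part of hex.rf: shows the array layout only.
final class ColumnDistsSketch {

    /**
     * rows[r][col]  encoded value of column col in row r (assumed 0 .. arities[col]-1)
     * classes[r]    class of row r (assumed 0 .. numClasses-1)
     * Returns dists[col][encodedValue][class] = number of matching rows.
     */
    static int[][][] build(int[][] rows, int[] classes, int[] arities, int numClasses) {
        int[][][] dists = new int[arities.length][][];
        for (int col = 0; col < arities.length; col++) {
            dists[col] = new int[arities[col]][numClasses];
        }
        for (int r = 0; r < rows.length; r++) {
            for (int col = 0; col < arities.length; col++) {
                dists[col][rows[r][col]][classes[r]]++;
            }
        }
        return dists;
    }

    public static void main(String[] args) {
        int[][] rows = {{0, 1}, {0, 0}, {1, 1}};
        int[] classes = {0, 1, 0};
        int[][][] dists = build(rows, classes, new int[]{2, 2}, 2);
        // Rows where column 1 has encoded value 1 and the class is 0.
        System.out.println(dists[1][1][0]);
    }
}
```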


_features

protected final int[] _features

_random

protected java.util.Random _random
Constructor Detail

GiniStatistic

public GiniStatistic(Data data,
                     int features,
                     long seed,
                     int exclusiveSplitLimit)
Method Detail

ltSplit

protected hex.rf.Statistic.Split ltSplit(int col,
                                         Data d,
                                         int[] dist,
                                         int distWeight,
                                         java.util.Random _)
Returns the best split for a given column
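One way such a search could work is sketched below, assuming a single column's distribution is given as columnDist[encodedValue][class] and that rows with an encoded value below the split value go left. This is not the actual ltSplit body, which is not shown on this page; LtSplitSketch and Candidate are hypothetical names, and Candidate is only a stand-in for hex.rf.Statistic.Split.

```java
// Hypothetical sketch, not the hex.rf implementation.
final class LtSplitSketch {

    /** Stand-in result type: rows with encoded value < splitValue go left. */
    static final class Candidate {
        final int splitValue;
        final double fitness;
        Candidate(int splitValue, double fitness) {
            this.splitValue = splitValue;
            this.fitness = fitness;
        }
    }

    /** Gini impurity 1 - sum(p_i^2) for class counts that sum to weight. */
    static double gini(int[] counts, int weight) {
        if (weight == 0) return 0.0;
        double sumSq = 0.0;
        for (int c : counts) {
            double p = (double) c / weight;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** Returns the best candidate, or null if no split leaves both sides non-empty. */
    static Candidate bestLtSplit(int[][] columnDist) {
        if (columnDist.length == 0) return null;
        int classes = columnDist[0].length;
        int[] left = new int[classes];
        int[] right = new int[classes];
        int leftWeight = 0;
        int rightWeight = 0;
        // Start with every row on the right side.
        for (int[] perClass : columnDist) {
            for (int c = 0; c < classes; c++) {
                right[c] += perClass[c];
                rightWeight += perClass[c];
            }
        }
        int total = rightWeight;
        Candidate best = null;
        // Move one encoded value at a time from right to left and score the split.
        for (int v = 0; v < columnDist.length - 1; v++) {
            for (int c = 0; c < classes; c++) {
                int n = columnDist[v][c];
                left[c] += n;
                right[c] -= n;
                leftWeight += n;
                rightWeight -= n;
            }
            if (leftWeight == 0 || rightWeight == 0) continue;
            double weighted = (gini(left, leftWeight) * leftWeight
                    + gini(right, rightWeight) * rightWeight) / total;
            double fitness = 1.0 - weighted;
            if (best == null || fitness > best.fitness) {
                best = new Candidate(v + 1, fitness);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] columnDist = {{5, 0}, {4, 1}, {0, 5}};
        Candidate best = bestLtSplit(columnDist);
        System.out.println("split below value " + best.splitValue);
    }
}
```

Updating the left and right histograms incrementally keeps the scan linear in arity x classes, instead of recounting both sides for every candidate value.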


eqSplit

protected hex.rf.Statistic.Split eqSplit(int colIndex,
                                         Data d,
                                         int[] dist,
                                         int distWeight,
                                         java.util.Random _)
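
Assuming eqSplit handles the exclusion case from the class description (left holds the rows whose column value equals the split value, right holds all others), the search might look like the sketch below. This is a guess at the idea, not the actual method body; EqSplitSketch and bestEqValue are hypothetical names, and the input is again assumed to be columnDist[encodedValue][class].

```java
// Hypothetical sketch, not the hex.rf implementation.
final class EqSplitSketch {

    /** Gini impurity 1 - sum(p_i^2) for class counts that sum to weight. */
    static double gini(int[] counts, int weight) {
        if (weight == 0) return 0.0;
        double sumSq = 0.0;
        for (int c : counts) {
            double p = (double) c / weight;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** Returns the encoded value with the best "equal vs. different" split, or -1 if none. */
    static int bestEqValue(int[][] columnDist) {
        if (columnDist.length == 0) return -1;
        int classes = columnDist[0].length;
        int[] totals = new int[classes];
        int total = 0;
        for (int[] perClass : columnDist) {
            for (int c = 0; c < classes; c++) {
                totals[c] += perClass[c];
                total += perClass[c];
            }
        }
        int bestValue = -1;
        double bestFitness = -1.0;
        int[] right = new int[classes];
        for (int v = 0; v < columnDist.length; v++) {
            int[] left = columnDist[v];            // rows equal to the split value
            int leftWeight = 0;
            for (int c = 0; c < classes; c++) {
                leftWeight += left[c];
                right[c] = totals[c] - left[c];    // rows with any other value
            }
            int rightWeight = total - leftWeight;
            if (leftWeight == 0 || rightWeight == 0) continue;
            double weighted = (gini(left, leftWeight) * leftWeight
                    + gini(right, rightWeight) * rightWeight) / total;
            double fitness = 1.0 - weighted;
            if (fitness > bestFitness) {
                bestFitness = fitness;
                bestValue = v;
            }
        }
        return bestValue;
    }

    public static void main(String[] args) {
        // Value 1 alone carries class 1, so excluding it separates the classes.
        int[][] columnDist = {{5, 0}, {0, 5}, {5, 0}};
        System.out.println("exclude value " + bestEqValue(columnDist));
    }
}
```

In the sample data no single less-than threshold separates the two classes, while isolating the middle value does, which is the kind of case an equality split is suited to.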