water.fvec
Class Vec

java.lang.Object
  extended by water.Iced
      extended by water.fvec.Vec
All Implemented Interfaces:
java.lang.Cloneable, Freezable
Direct Known Subclasses:
AppendableVec, ByteVec, TransfVec

public class Vec
extends Iced

A single distributed vector column.

A distributed vector has a count of elements, an element-to-chunk mapping, a Java type (which mostly determines rounding on store and display), and functions to load elements directly without further indirection. The data is compressed, backed by disk, or both. *Writing* to elements may throw if the backing data is read-only (file-backed).

  Vec Key format is: Key. VEC - byte, 0 - byte,   0    - int, normal Key bytes.
 DVec Key format is: Key.DVEC - byte, 0 - byte, chunk# - int, normal Key bytes.
 
The main API is at, set, and isNA:
   double  at  ( long row );  // Returns the value expressed as a double.  NaN if missing.
   long    at8 ( long row );  // Returns the value expressed as a long.  Throws if missing.
   boolean isNA( long row );  // True if the value is missing.
   set( long row, double d ); // Stores a double; NaN will be treated as missing.
   set( long row, long l );   // Stores a long; throws if l exceeds what fits in a double & any floats are ever set.
   setNA( long row );         // Sets the value as missing.
 
Note this dangerous scenario: loading a missing value as a double, and setting it as a long:
   set(row,(long)at(row)); // Danger!
The cast from a Double.NaN to a long produces a zero! This code will replace a missing value with a zero.
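
The pitfall can be reproduced without a cluster. The following is a minimal stand-alone sketch: the class NaNCastDemo and its methods are hypothetical and only imitate the cast, they are not part of the Vec API. Against a real Vec, a safer copy would check isNA(row) first and call setNA(row) for missing rows instead of casting.

```java
// Stand-alone illustration of the NaN-to-long pitfall; not H2O code.
final class NaNCastDemo {
    // Mirrors set(row, (long) at(row)): a missing value (NaN) silently becomes 0.
    static long unsafeCopy(double loaded) {
        return (long) loaded;
    }

    // Checks for missing first and keeps NaN as NaN instead of casting it.
    static double safeCopy(double loaded) {
        return Double.isNaN(loaded) ? Double.NaN : (double) (long) loaded;
    }

    public static void main(String[] args) {
        System.out.println(unsafeCopy(Double.NaN));               // prints 0
        System.out.println(Double.isNaN(safeCopy(Double.NaN)));   // prints true
        System.out.println(safeCopy(3.7));                        // prints 3.0
    }
}
```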


Nested Class Summary
static class Vec.CollectDomain
           
static class Vec.VectorGroup
          Class representing the group of vectors.
 
Field Summary
 java.lang.String[] _domain
          Enum/factor/categorical names.
 Key _key
          Key mapping a Value which holds this Vec.
static int LOG_CHK
          Log-2 of Chunk size.
 
Constructor Summary
Vec(Key key, double d)
           
 
Method Summary
 Vec adaptTo(Vec v, boolean exact)
          Adapt given vector v to this vector.
 void asEnum()
          Deprecated. 
 double at(long i)
          Fetch element the slow way, as a double.
 long at8(long i)
          Fetch element the slow way, as a long.
 long byteSize()
          Size of compressed vector data.
 Chunk chunk(long i)
          The Chunk for a row#.
 long chunk2StartElem(int cidx)
          Convert a chunk-index into a starting row #.
 Value chunkIdx(int cidx)
          Get a Chunk's Value by index.
 Key chunkKey(int cidx)
          Get a Chunk Key from a chunk-index.
 int chunkLen(int cidx)
          Number of rows in chunk.
 java.lang.String[] defaultLevels()
          Deprecated. 
 java.lang.String[] domain()
          Return an array of domains.
 java.lang.String domain(long i)
          Map the integer value for an enum/factor/categorical to its String.
 Chunk elem2BV(int cidx)
          The Chunk for a chunk#.
 Vec.VectorGroup group()
          Get the group this vector belongs to.
 boolean isEnum()
          Is the column a factor/categorical/enum? Note: all "isEnum()" columns are also "isInt()" but not vice-versa.
 boolean isInt()
          Are all elements integers?
 boolean isNA(long row)
          Fetch the missing-status the slow way.
 long length()
          Number of elements in the vector.
 Vec makeCon(double d)
           
 Vec makeCon(long l)
          Make a new vector with the same size and data layout as the old one, and initialized to a constant.
 Vec makeTransf(int[] domMap)
           
 Vec makeTransf(int[] domMap, java.lang.String[] domain)
           
 Vec makeZero()
          Make a new vector with the same size and data layout as the old one, and initialized to zero.
 double max()
          Return column max - lazily computed as needed.
 double mean()
          Return column mean - lazily computed as needed.
 double min()
          Return column min - lazily computed as needed.
 long naCnt()
          Return column missing-element-count - lazily computed as needed.
 int nChunks()
          Number of chunks.
 void postWrite()
          Stop writing into this Vec.
protected  boolean readable()
          Default read/write behavior for Vecs.
 void remove(Futures fs)
           
 Vec rollupStats()
          Compute the roll-up stats as needed, and copy them into the Vec object.
 void rollupStats(Futures fs)
           
 void rollupStats(H2O.H2OCountedCompleter cc)
           
 double set(long i, double d)
          Write element the slow way, as a double.
 float set(long i, float f)
          Write element the slow way, as a float.
 long set(long i, long l)
          Write element the slow way, as a long.
 boolean setNA(long i)
          Set the element as missing the slow way.
 double sigma()
          Return column standard deviation - lazily computed as needed.
 Vec toEnum()
          Transform this vector to enum.
 java.lang.String toString()
          Pretty print the Vec: [#elems, min/mean/max]{chunks,...}
protected  boolean writable()
          Default read/write behavior for Vecs.
 
Methods inherited from class water.Iced
clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

LOG_CHK

public static final int LOG_CHK
Log-2 of Chunk size.

See Also:
Constant Field Values

_key

public final Key _key
Key mapping a Value which holds this Vec.


_domain

public java.lang.String[] _domain
Enum/factor/categorical names.

Constructor Detail

Vec

public Vec(Key key,
           double d)
Method Detail

makeZero

public Vec makeZero()
Make a new vector with the same size and data layout as the old one, and initialized to zero.


makeCon

public Vec makeCon(long l)
Make a new vector with the same size and data layout as the old one, and initialized to a constant.


makeCon

public Vec makeCon(double d)

makeTransf

public Vec makeTransf(int[] domMap)

makeTransf

public Vec makeTransf(int[] domMap,
                      java.lang.String[] domain)

adaptTo

public Vec adaptTo(Vec v,
                   boolean exact)
Adapt given vector v to this vector. I.e., unify domains and call makeTransf().


length

public long length()
Number of elements in the vector. Overridden by subclasses that compute length in an alternative way, such as file-backed Vecs.


nChunks

public int nChunks()
Number of chunks. Overridden by subclasses that compute chunks in an alternative way, such as file-backed Vecs.


isEnum

public final boolean isEnum()
Is the column a factor/categorical/enum? Note: all "isEnum()" columns are also "isInt()" but not vice-versa.


domain

public java.lang.String domain(long i)
Map the integer value for an enum/factor/categorical to its String. Error if it is not an ENUM.


domain

public java.lang.String[] domain()
Return an array of domains. This is eagerly manifested for enum or categorical columns. Returns null for non-Enum/factor columns.


asEnum

@Deprecated
public void asEnum()
Deprecated. 

Convert an integer column to an enum column, with just number strings for the factors or levels. Deprecated: use toEnum() instead, which always returns a new vector that provides a correct transformation to enum. The caller of toEnum() is always responsible for deleting the returned vector.


toEnum

public Vec toEnum()
Transform this vector to enum. Transformation is done by a TransfVec which provides a mapping between values. The caller is responsible for vector deletion!


defaultLevels

@Deprecated
public java.lang.String[] defaultLevels()
Deprecated. 


readable

protected boolean readable()
Default read/write behavior for Vecs. File-backed Vecs are read-only.


writable

protected boolean writable()
Default read/write behavior for Vecs. AppendableVecs are write-only.


min

public double min()
Return column min - lazily computed as needed.


max

public double max()
Return column max - lazily computed as needed.


mean

public double mean()
Return column mean - lazily computed as needed.


sigma

public double sigma()
Return column standard deviation - lazily computed as needed.


naCnt

public long naCnt()
Return column missing-element-count - lazily computed as needed.


isInt

public boolean isInt()
Are all elements integers?


byteSize

public long byteSize()
Size of compressed vector data.


rollupStats

public Vec rollupStats()
Compute the roll-up stats as needed, and copy them into the Vec object.


rollupStats

public void rollupStats(Futures fs)

rollupStats

public void rollupStats(H2O.H2OCountedCompleter cc)

postWrite

public void postWrite()
Stop writing into this Vec. Rollup stats will again (lazily) be computed.
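
The lazy pattern described for the rollup stats (compute on first request, discard after writes, recompute on the next request) can be sketched as follows. This is an illustration only, not the H2O implementation: the class LazyStatsDemo, its dirty flag, and the choice to skip NaN values when computing min, max and mean are all assumptions made for the example.

```java
// Stand-alone sketch of lazily computed, write-invalidated column stats; not H2O code.
final class LazyStatsDemo {
    private final double[] data;
    private boolean dirty = true;   // true when cached stats are stale
    private double min, max, mean;
    private long naCnt;
    int computeCount = 0;           // counts full passes over the data, for demonstration

    LazyStatsDemo(double[] data) { this.data = data.clone(); }

    // Any write marks the cached stats as stale.
    void set(int i, double d) { data[i] = d; dirty = true; }

    // One pass over the data, run only when the cache is stale.
    private void rollup() {
        if (!dirty) return;
        computeCount++;
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY, sum = 0;
        long nas = 0, n = 0;
        for (double d : data) {
            if (Double.isNaN(d)) { nas++; continue; }  // assumption: NaN means missing
            lo = Math.min(lo, d);
            hi = Math.max(hi, d);
            sum += d;
            n++;
        }
        min = lo; max = hi; naCnt = nas;
        mean = n == 0 ? Double.NaN : sum / n;
        dirty = false;
    }

    double min()  { rollup(); return min; }
    double max()  { rollup(); return max; }
    double mean() { rollup(); return mean; }
    long naCnt()  { rollup(); return naCnt; }

    public static void main(String[] args) {
        LazyStatsDemo s = new LazyStatsDemo(new double[] {1.0, Double.NaN, 3.0});
        System.out.println(s.min() + " " + s.max() + " " + s.mean() + " " + s.naCnt());
        s.set(1, 5.0);                 // invalidates the cached stats
        System.out.println(s.mean());  // recomputed on demand
    }
}
```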


chunk2StartElem

public long chunk2StartElem(int cidx)
Convert a chunk-index into a starting row #. For constant-sized chunks this is a little shift-and-add math. For variable-sized chunks this is a table lookup.
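
The two cases can be sketched in isolation. In the example below, the class ChunkStartDemo, the LOG_CHK value of 4, and the starts table are hypothetical and chosen only to keep the numbers small. The real constant and the real table layout may differ.

```java
// Stand-alone sketch of chunk-index to start-row conversion; not H2O code.
final class ChunkStartDemo {
    // Hypothetical value for the example; the real LOG_CHK constant may differ.
    static final int LOG_CHK = 4;

    // Constant-sized chunks: the start row is cidx * 2^LOG_CHK, computed with a shift.
    static long fixedStart(int cidx) {
        return (long) cidx << LOG_CHK;
    }

    // Variable-sized chunks: starts[cidx] holds the first row of chunk cidx,
    // and the final entry holds the total number of rows.
    static long tableStart(long[] starts, int cidx) {
        return starts[cidx];
    }

    // With such a table, a chunk's row count follows without touching chunk data.
    static int tableLen(long[] starts, int cidx) {
        return (int) (starts[cidx + 1] - starts[cidx]);
    }

    public static void main(String[] args) {
        long[] starts = {0, 3, 7, 12};              // three chunks of 3, 4 and 5 rows
        System.out.println(fixedStart(3));          // prints 48
        System.out.println(tableStart(starts, 2));  // prints 7
        System.out.println(tableLen(starts, 1));    // prints 4
    }
}
```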


chunkLen

public int chunkLen(int cidx)
Number of rows in chunk. Does not fetch chunk content.


chunkKey

public Key chunkKey(int cidx)
Get a Chunk Key from a chunk-index. Basically the index-to-key map.


chunkIdx

public Value chunkIdx(int cidx)
Get a Chunk's Value by index. Basically the index-to-key map, plus the DKV.get. Warning: this pulls the data locally; using this call on every Chunk index on the same node will probably trigger an OOM!


group

public final Vec.VectorGroup group()
Get the group this vector belongs to. For a group with only one vector, the group object does not actually exist in the KV store.

Returns:
VectorGroup this vector belongs to.

elem2BV

public Chunk elem2BV(int cidx)
The Chunk for a chunk#. Warning: this loads the data locally!


chunk

public final Chunk chunk(long i)
The Chunk for a row#. Warning: this loads the data locally!
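
Finding the Chunk for a row# first requires mapping the row to a chunk index. A minimal sketch of that mapping follows. It assumes a hypothetical LOG_CHK of 4 for constant-sized chunks and, for variable-sized chunks, a hypothetical table of strictly increasing start rows whose last entry is the total row count. The class RowToChunkDemo is not part of the Vec API.

```java
import java.util.Arrays;

// Stand-alone sketch of row-number to chunk-index mapping; not H2O code.
final class RowToChunkDemo {
    // Hypothetical value for the example; the real LOG_CHK constant may differ.
    static final int LOG_CHK = 4;

    // Constant-sized chunks: divide the row number by 2^LOG_CHK with a shift.
    static int fixedChunkIdx(long row) {
        return (int) (row >> LOG_CHK);
    }

    // Variable-sized chunks: binary search over the start-row table.
    // Assumes no empty chunks, so the table is strictly increasing.
    static int tableChunkIdx(long[] starts, long row) {
        if (row < 0 || row >= starts[starts.length - 1])
            throw new IndexOutOfBoundsException("row " + row);
        int pos = Arrays.binarySearch(starts, row);
        // Exact hit: row is the first row of chunk pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the chunk is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        long[] starts = {0, 3, 7, 12};
        System.out.println(fixedChunkIdx(17));          // prints 1
        System.out.println(tableChunkIdx(starts, 6));   // prints 1
        System.out.println(tableChunkIdx(starts, 11));  // prints 2
    }
}
```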


at8

public final long at8(long i)
Fetch element the slow way, as a long. Floating point values are silently rounded to an integer. Throws if the value is missing.


at

public final double at(long i)
Fetch element the slow way, as a double. Missing values are returned as Double.NaN instead of throwing.


isNA

public final boolean isNA(long row)
Fetch the missing-status the slow way.


set

public final long set(long i,
                      long l)
Write element the slow way, as a long. There is no way to write a missing value with this call. Under rare circumstances this can throw: if the long does not fit in a double (its magnitude is larger than 2^52) AND float values are stored in the Vec. In this case, there is no common compatible data representation.
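
Why a large long has no exact double representation can be shown with plain Java. The sketch below is stand-alone: the class LongFitsDemo and its methods are hypothetical, and the bound simply reuses the 2^52 figure stated above. IEEE 754 doubles represent every integer up to 2^53 in magnitude exactly, so that stated bound is on the conservative side.

```java
// Stand-alone sketch of the "long fits in a double" check; not H2O code.
final class LongFitsDemo {
    // Bound taken from the description above (2^52).
    static final long LIMIT = 1L << 52;

    // True if the value is within the documented magnitude bound.
    static boolean fitsPerDoc(long l) {
        return l >= -LIMIT && l <= LIMIT;
    }

    // Converts to double and back, exposing any rounding along the way.
    static long roundTrip(long l) {
        return (long) (double) l;
    }

    public static void main(String[] args) {
        long big = (1L << 53) + 1;
        System.out.println(fitsPerDoc(1L << 52));           // prints true
        System.out.println(fitsPerDoc(big));                // prints false
        System.out.println(roundTrip(big) == (1L << 53));   // prints true: the low bit was lost
    }
}
```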


set

public final double set(long i,
                        double d)
Write element the slow way, as a double. Double.NaN will be treated as a set of a missing element.


set

public final float set(long i,
                       float f)
Write element the slow way, as a float. Float.NaN will be treated as a set of a missing element.


setNA

public final boolean setNA(long i)
Set the element as missing the slow way.


toString

public java.lang.String toString()
Pretty print the Vec: [#elems, min/mean/max]{chunks,...}

Overrides:
toString in class java.lang.Object

remove

public void remove(Futures fs)