public class Vec extends Iced
A distributed vector has a count of elements, an element-to-chunk mapping, a Java type (which mostly determines rounding on store and display), and functions to directly load elements without further indirection. The data is compressed, backed by disk, or both. *Writing* to elements may throw if the backing data is read-only (file-backed).
Vec Key format is: Key.VEC - byte, 0 - byte, 0 - int, normal Key bytes.
DVec Key format is: Key.DVEC - byte, 0 - byte, chunk# - int, normal Key bytes.

The main API is at, set, and isNA:

    double  at  ( long row );      // Returns the value expressed as a double. NaN if missing.
    long    at8 ( long row );      // Returns the value expressed as a long. Throws if missing.
    boolean isNA( long row );      // True if the value is missing.
    set( long row, double d );     // Stores a double; NaN will be treated as missing.
    set( long row, long l );       // Stores a long; throws if l exceeds what fits in a double & any floats are ever set.
    setNA( long row );             // Sets the value as missing.

Note this dangerous scenario: loading a missing value as a double, and setting it as a long:

    set(row,(long)at(row)); // Danger!

The cast from a Double.NaN to a long produces a zero! This code will replace a missing value with a zero.
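The hazard described above can be reproduced in plain Java without any H2O classes. The sketch below uses a `double[]` as a hypothetical stand-in for a Vec column, with NaN marking a missing value; `NaCastDemo`, `unsafeCopy`, and `safeCopy` are illustrative names and not part of the Vec API.

```java
// Plain-Java illustration of the NaN-to-long hazard; no H2O classes are used.
final class NaCastDemo {
    // Hypothetical stand-in for "set(row,(long)at(row))": the cast turns NaN into 0.
    static long unsafeCopy(double[] col, int row) {
        return (long) col[row];
    }

    // Safer pattern: test for missing first and keep the value missing.
    // A null result signals "mark the row missing instead of storing a long".
    static Long safeCopy(double[] col, int row) {
        double d = col[row];
        return Double.isNaN(d) ? null : Long.valueOf((long) d);
    }

    public static void main(String[] args) {
        double[] col = {3.0, Double.NaN};
        System.out.println(unsafeCopy(col, 1)); // the missing value silently became 0
        System.out.println(safeCopy(col, 1));   // null: the caller should keep the row missing
        System.out.println(safeCopy(col, 0));   // 3
    }
}
```

With a real Vec, the corresponding guard would presumably be to check `isNA(row)` first and call `setNA(row)` rather than casting the result of `at(row)`.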
Modifier and Type | Class and Description |
---|---|
static class | Vec.CollectDomain: Collect numeric domain of given vector. |
static class | Vec.VectorGroup: Class representing the group of vectors. |
Modifier and Type | Field and Description |
---|---|
java.lang.String[] | _domain: Enum/factor/categorical names. |
long[] | _espc: Element-start per chunk. |
Key | _key: Key mapping a Value which holds this Vec. |
static int | KEY_PREFIX_LEN |
static int | LOG_CHK: Log-2 of Chunk size. |
static int | MAX_ENUM_SIZE: Maximal size of enum domain. |
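The `_espc` field ("element-start per chunk") is what makes the element-to-chunk mapping possible, and methods such as `chunk2StartElem`, `chunkLen`, `nChunks`, and `length` can be read as lookups into it. The sketch below is a minimal illustration under a stated assumption: the array holds one start offset per chunk plus a final entry equal to the total row count. That layout, the class name `EspcSketch`, and the helper `elem2ChunkIdx` are assumptions for illustration; the real Vec may locate chunks differently.

```java
import java.util.Arrays;

// Hypothetical model of an element-start-per-chunk array; not the H2O implementation.
final class EspcSketch {
    // Assumed layout: espc[i] = first row of chunk i, espc[nChunks] = total number of rows.
    private final long[] espc;

    EspcSketch(long[] espc) { this.espc = espc.clone(); }

    int nChunks() { return espc.length - 1; }
    long length() { return espc[espc.length - 1]; }
    long chunk2StartElem(int cidx) { return espc[cidx]; }
    int chunkLen(int cidx) { return (int) (espc[cidx + 1] - espc[cidx]); }

    // Find the chunk holding a row by binary search over the start offsets.
    int elem2ChunkIdx(long row) {
        if (row < 0 || row >= length()) throw new ArrayIndexOutOfBoundsException("row " + row);
        int i = Arrays.binarySearch(espc, row);
        if (i < 0) return -i - 2;            // not a chunk start: take the chunk before the insertion point
        while (espc[i + 1] == row) i++;      // skip over any empty chunks that share this start
        return i;
    }

    public static void main(String[] args) {
        EspcSketch v = new EspcSketch(new long[]{0, 4, 10});
        System.out.println("row 5 lives in chunk " + v.elem2ChunkIdx(5)
            + ", which starts at row " + v.chunk2StartElem(v.elem2ChunkIdx(5)));
    }
}
```

Keeping only the start offsets means a row lookup costs a binary search rather than a per-row table, which matters when a vector has far more rows than chunks.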
Modifier | Constructor and Description |
---|---|
public | Vec(Key key, double d) |
public | Vec(Key key, long[] espc): Main default constructor; requires that the caller already understands the Chunk layout, along with the count of missing elements. |
protected | Vec(Key key, Vec v) |
Modifier and Type | Method and Description |
---|---|
Vec | align(Vec vec): Always makes a copy of the given vector which shares the same group. |
double | at(long i): Fetch element the slow way, as a double. |
long | at8(long i): Fetch element the slow way, as a long. |
long | byteSize(): Size of compressed vector data. |
int | cardinality(): Returns cardinality for enum domain or -1 for other types. |
protected boolean | checkMissing(int cidx, Value val) |
Chunk | chunk(long i): The Chunk for a row#. |
long | chunk2StartElem(int cidx): Convert a chunk-index into a starting row #. |
Value | chunkIdx(int cidx): Get a Chunk's Value by index. |
Key | chunkKey(int cidx): Get a Chunk Key from a chunk-index. |
int | chunkLen(int cidx): Number of rows in chunk. |
java.lang.String[] | domain(): Return an array of domains. |
java.lang.String | domain(long i): Map the integer value for an enum/factor/categorical to its String. |
Chunk | elem2BV(int cidx): The Chunk for a chunk#. |
boolean | equals(java.lang.Object o) |
static Key | getVecKey(Key key): Get a Vec Key from Chunk Key, without loading the Chunk. |
Vec.VectorGroup | group(): Get the group this vector belongs to. |
int | hashCode() |
boolean | isEnum(): Is the column a factor/categorical/enum? Note: all "isEnum()" columns are also "isInt()" but not vice-versa. |
boolean | isInt(): Are all elements integers? |
boolean | isNA(long row): Fetch the missing-status the slow way. |
long | length(): Number of elements in the vector. |
Vec | makeCon(double d) |
Vec | makeCon(long l): Make a new vector with the same size and data layout as the old one, and initialized to a constant. |
Vec | makeTransf(int[][] map): Create a vector transforming values according to the given domain map. |
Vec | makeZero(): Make a new vector with the same size and data layout as the old one, and initialized to zero. |
Vec | masterVec(): This Vec has no hidden Vec that it depends on. |
double | max(): Return column max - lazily computed as needed. |
double | mean(): Return column mean - lazily computed as needed. |
double | min(): Return column min - lazily computed as needed. |
long | naCnt(): Return column missing-element-count - lazily computed as needed. |
int | nChunks(): Number of chunks. |
static Key | newKey(): Make a new random Key that fits the requirements for a Vec key. |
void | postWrite(): Stop writing into this Vec. |
protected boolean | readable(): Default read/write behavior for Vecs. |
void | remove(Futures fs) |
Vec | rollupStats(): Compute the roll-up stats as-needed, and copy into the Vec object. |
Vec | rollupStats(Futures fs) |
double | set(long i, double d): Write element the slow way, as a double. |
float | set(long i, float f): Write element the slow way, as a float. |
long | set(long i, long l): Write element the slow way, as a long. |
boolean | setNA(long i): Set the element as missing the slow way. |
double | sigma(): Return column standard deviation - lazily computed as needed. |
Vec | toEnum(): Transform this vector to enum. |
java.lang.String | toString(): Pretty print the Vec: [#elems, min/mean/max]{chunks,...} |
protected boolean | writable(): Default read/write behavior for Vecs. |
Methods inherited from class Iced: clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
public static final int LOG_CHK
public final Key _key
public final long[] _espc
public java.lang.String[] _domain
public static final int MAX_ENUM_SIZE
public static final int KEY_PREFIX_LEN
public Vec(Key key, long[] espc)
public Vec(Key key, double d)
public Vec makeZero()
public Vec makeCon(long l)
public Vec makeCon(double d)
public Vec makeTransf(int[][] map)
See also: makeTransf(int[], int[], String[])
public Vec masterVec()
Returns: null
public long length()
public int nChunks()
public final boolean isEnum()
public java.lang.String domain(long i)
public java.lang.String[] domain()
public int cardinality()
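The relationship between `isEnum()`, `domain(long i)`, `domain()`, and `cardinality()` can be pictured with a small stand-alone model. This is only a sketch: it assumes that a column is an enum exactly when it carries a non-null domain array, and that enum values are integer indices into that array. `DomainSketch` is a hypothetical class, not the Vec implementation.

```java
// Hypothetical model of an enum domain; not the H2O implementation.
final class DomainSketch {
    private final String[] domain; // assumed to be null for non-enum columns

    DomainSketch(String[] domain) { this.domain = domain; }

    boolean isEnum() { return domain != null; }

    // Cardinality of the enum domain, or -1 for other column types.
    int cardinality() { return isEnum() ? domain.length : -1; }

    // Map an integer level to its String; only meaningful for enum columns.
    String domain(long i) { return domain[(int) i]; }

    public static void main(String[] args) {
        DomainSketch color = new DomainSketch(new String[]{"red", "green", "blue"});
        System.out.println(color.domain(1) + " of " + color.cardinality() + " levels");
    }
}
```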
public Vec toEnum()
See TransfVec, which provides a mapping between values.
protected boolean readable()
protected boolean writable()
public double min()
public double max()
public double mean()
public double sigma()
public long naCnt()
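The statistics accessors `min()`, `max()`, `mean()`, `sigma()`, and `naCnt()` are described as lazily computed. A minimal local sketch of that pattern follows: the statistics are computed on first request, cached, and missing values (NaN) are counted but excluded. `RollupSketch` is hypothetical, it works on a plain `double[]` rather than distributed chunks, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the source does not say which definition `sigma()` uses.

```java
// Hypothetical single-node model of lazily computed roll-up statistics.
final class RollupSketch {
    private final double[] data; // NaN marks a missing element
    private boolean computed;    // true once the cached fields below are valid
    private double min, max, mean, sigma;
    private long naCnt;

    RollupSketch(double[] data) { this.data = data.clone(); }

    // Compute the statistics once and cache them; later calls return immediately.
    RollupSketch rollupStats() {
        if (computed) return this;
        double mn = Double.POSITIVE_INFINITY, mx = Double.NEGATIVE_INFINITY, sum = 0;
        long n = 0, nas = 0;
        for (double d : data) {
            if (Double.isNaN(d)) { nas++; continue; } // missing: count it, skip it
            mn = Math.min(mn, d);
            mx = Math.max(mx, d);
            sum += d;
            n++;
        }
        double mu = n == 0 ? Double.NaN : sum / n;
        double ss = 0; // sum of squared deviations from the mean
        for (double d : data) if (!Double.isNaN(d)) ss += (d - mu) * (d - mu);
        min = mn; max = mx; mean = mu; naCnt = nas;
        sigma = n > 1 ? Math.sqrt(ss / (n - 1)) : Double.NaN; // assumption: sample std dev
        computed = true;
        return this;
    }

    double min()   { return rollupStats().min; }
    double max()   { return rollupStats().max; }
    double mean()  { return rollupStats().mean; }
    double sigma() { return rollupStats().sigma; }
    long naCnt()   { return rollupStats().naCnt; }

    public static void main(String[] args) {
        RollupSketch r = new RollupSketch(new double[]{1, 2, 3, Double.NaN});
        System.out.println(r.min() + " / " + r.mean() + " / " + r.max() + ", missing: " + r.naCnt());
    }
}
```

Caching makes repeated calls cheap, but it also means the cached values must be invalidated or recomputed after writes.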
public boolean isInt()
public long byteSize()
public Vec rollupStats()
public void postWrite()
public long chunk2StartElem(int cidx)
public int chunkLen(int cidx)
public static Key getVecKey(Key key)
public Key chunkKey(int cidx)
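The key layout given in the class description (a tag byte, a zero byte, an int that is 0 for a Vec key or the chunk number for a chunk key, then the normal Key bytes) suggests how `chunkKey` and `getVecKey` can work on bytes alone, without loading any Chunk. The sketch below illustrates that idea only. The tag values, the big-endian int encoding, the `PREFIX_LEN` constant, and every name in `VecKeySketch` are assumptions and do not reflect the real `Key` class or the actual value of `KEY_PREFIX_LEN`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical byte-level model of the documented Vec/DVec key layout.
final class VecKeySketch {
    static final byte VEC = 1, DVEC = 2;     // hypothetical tag values
    static final int PREFIX_LEN = 1 + 1 + 4; // tag byte + zero byte + int, as in the documented layout

    static byte[] vecKey(byte[] normalBytes) { return build(VEC, 0, normalBytes); }

    static byte[] chunkKey(byte[] normalBytes, int cidx) { return build(DVEC, cidx, normalBytes); }

    // Derive the Vec key from a chunk key by rewriting only the prefix.
    static byte[] getVecKey(byte[] chunkKey) {
        byte[] k = chunkKey.clone();
        ByteBuffer.wrap(k).put(0, VEC).putInt(2, 0);
        return k;
    }

    // Read the chunk number back out of a chunk key.
    static int chunkIdx(byte[] chunkKey) { return ByteBuffer.wrap(chunkKey).getInt(2); }

    private static byte[] build(byte tag, int cidx, byte[] normalBytes) {
        return ByteBuffer.allocate(PREFIX_LEN + normalBytes.length)
            .put(tag).put((byte) 0).putInt(cidx).put(normalBytes).array();
    }

    public static void main(String[] args) {
        byte[] name = "myvec".getBytes(StandardCharsets.UTF_8);
        byte[] ck = chunkKey(name, 7);
        System.out.println("chunk index stored in key: " + chunkIdx(ck));
    }
}
```

Because the chunk number sits at a fixed offset in the prefix, both directions of the mapping are constant-time byte edits.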
public Value chunkIdx(int cidx)
Fetches the Value via DKV.get(). Warning: this pulls the data locally; using this call on every Chunk index on the same node will probably trigger an OOM!
protected boolean checkMissing(int cidx, Value val)
public static Key newKey()
public final Vec.VectorGroup group()
public Chunk elem2BV(int cidx)
public final Chunk chunk(long i)
public final long at8(long i)
public final double at(long i)
public final boolean isNA(long row)
public final long set(long i, long l)
public final double set(long i, double d)
public final float set(long i, float f)
public final boolean setNA(long i)
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public void remove(Futures fs)
public boolean equals(java.lang.Object o)
Overrides: equals in class java.lang.Object
public int hashCode()
Overrides: hashCode in class java.lang.Object
public Vec align(Vec vec)
Parameters: vec - vector which is intended to be copied
Returns: a copy of vec which shares the same Vec.VectorGroup with this vector