water
Class MRTask2<T extends MRTask2<T>>
java.lang.Object
jsr166y.ForkJoinTask<java.lang.Void>
jsr166y.CountedCompleter
water.H2O.H2OCountedCompleter
water.DTask
water.MRTask2<T>
- All Implemented Interfaces:
- java.io.Serializable, java.lang.Cloneable, java.util.concurrent.Future<java.lang.Void>, Freezable
- Direct Known Subclasses:
- FrameSplit.FrameSplitter, FrameTask, GLMModel.GLMValidationTask, KMeans2.Lloyds, KMeans2.Sampler, KMeans2.SumSqr, LR2.CalcRegressionTask, LR2.CalcSquareErrorsTasks, LR2.CalcSumsTask, ParseDataset2.EnumUpdateTask, PCAScore.PCAScoreTask, Quantiles.QuantilesTask, SharedTreeModelBuilder.Score, Vec.CollectDomain
public abstract class MRTask2<T extends MRTask2<T>>
- extends DTask
- implements java.lang.Cloneable
Map/Reduce style distributed computation.
MRTask2 provides several map
and reduce
methods that can be
overriden to specify a computation. Several instances of this class will be
created to distribute the computation over F/J threads and machines. Non-transient
fields are copied and serialized to instances created for map invocations. Reduce
methods can store their results in fields. Results are serialized and reduced all the
way back to the invoking node. When the last reduce method has been called, fields
of the initial MRTask2 instance contains the computation results.
Apart from small reduced POJO returned to the calling node, MRtask2 can produce output vector(s) as a result.
These will have chunks co-located with the input dataset, however, their number of lines will generally differ,
(so they won't be strictly compatible with the original). To produce output vectors, call doAll.dfork version
with required number of outputs and override appropriate map
call taking required number of NewChunks.
MRTask2 will automatically close the new Appendable vecs and produce an output frame with newly created Vecs.
- See Also:
- Serialized Form
Field Summary |
Frame |
_fr
The Vectors to work on. |
protected Futures |
_fs
We can add more things to block on - in case we want a bunch of lazy
tasks produced by children to all end before this top-level task ends. |
protected int |
_hi
Internal field to track a range of local Chunks to work on |
protected T |
_left
Internal field to track the left & right sub-range of chunks to work on |
protected int |
_lo
Internal field to track a range of local Chunks to work on |
protected RPC<T> |
_nleft
Internal field to track the left & right remote nodes/JVMs to work on |
protected long |
_nodes
Internal field to track a range of remote nodes/JVMs to work on |
protected RPC<T> |
_nrite
Internal field to track the left & right remote nodes/JVMs to work on |
Frame |
_outputFrame
|
protected T |
_rite
Internal field to track the left & right sub-range of chunks to work on |
protected boolean |
_topLocal
Internal field to track if this is a top-level local call |
Method Summary |
T |
clone()
Local Clone - setting final-field completer |
protected void |
closeLocal()
Override to do any remote cleaning on the last remote instance of
this object, for disposing of node-local shared data structures. |
void |
compute2()
Called from FJ threads to do local work. |
T |
dfork(Frame fr)
|
T |
dfork(int outputs,
Frame fr)
|
T |
dfork(int outputs,
Vec... vecs)
|
T |
dfork(Vec... vecs)
Invokes the map/reduce computation over the given Frame. |
void |
dinvoke(H2ONode sender)
Called once on remote at top level, probably with a subset of the cloud. |
T |
doAll(Frame fr)
Invokes the map/reduce computation over the given Frame. |
T |
doAll(int outputs,
Frame fr)
|
T |
doAll(int outputs,
Vec... vecs)
|
T |
doAll(Vec... vecs)
Invokes the map/reduce computation over the given Vecs. |
T |
getResult()
Block for & get any final results from a dfork'd MRTask2. |
void |
map(Chunk c)
Override with your map implementation. |
void |
map(Chunk[] cs)
Override with your map implementation. |
void |
map(Chunk[] cs,
NewChunk nc)
|
void |
map(Chunk[] cs,
NewChunk[] ncs)
|
void |
map(Chunk[] cs,
NewChunk nc1,
NewChunk nc2)
|
void |
map(Chunk c0,
Chunk c1)
Override with your map implementation. |
void |
map(Chunk c0,
Chunk c1,
Chunk c2)
Override with your map implementation. |
void |
map(Chunk c0,
Chunk c1,
Chunk c2,
NewChunk nc)
|
void |
map(Chunk c0,
Chunk c1,
Chunk c2,
NewChunk nc1,
NewChunk nc2)
|
void |
map(Chunk c0,
Chunk c1,
NewChunk nc)
|
void |
map(Chunk c0,
Chunk c1,
NewChunk nc1,
NewChunk nc2)
|
void |
map(Chunk c,
NewChunk nc)
|
void |
onCompletion(jsr166y.CountedCompleter caller)
OnCompletion - reduce the left & right into self. |
boolean |
onExceptionalCompletion(java.lang.Throwable ex,
jsr166y.CountedCompleter caller)
Cancel/kill all work as we can, then rethrow... |
protected void |
postGlobal()
|
java.lang.String |
profString()
|
void |
reduce(T mrt)
Override to combine results from 'mrt' into 'this' MRTask2. |
protected void |
reduce4(T mrt)
Call user's reduction. |
protected void |
setupLocal()
Override to do any remote initialization on the 1st remote instance of
this object, for initializing node-local shared data structures. |
Vec |
vecs(int i)
Returns a Vec from the Frame. |
Methods inherited from class water.DTask |
copyOver, frozenType, getDException, hasException, logVerbose, newInstance, onAck, onAckAck, read, setException, toDocField, write, writeJSONFields |
Methods inherited from class jsr166y.CountedCompleter |
addToPendingCount, compareAndSetPendingCount, complete, exec, getCompleter, getPendingCount, getRawResult, setCompleter, setPendingCount, setRawResult, tryComplete |
Methods inherited from class jsr166y.ForkJoinTask |
adapt, adapt, adapt, cancel, compareAndSetForkJoinTaskTag, completeExceptionally, fork, get, get, getException, getForkJoinTaskTag, getPool, getQueuedTaskCount, getSurplusQueuedTaskCount, helpQuiesce, inForkJoinPool, invoke, invokeAll, invokeAll, invokeAll, isCancelled, isCompletedAbnormally, isCompletedNormally, isDone, join, peekNextLocalTask, pollNextLocalTask, pollTask, quietlyComplete, quietlyInvoke, quietlyJoin, reinitialize, setForkJoinTaskTag, tryUnfork |
Methods inherited from class java.lang.Object |
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
_fr
public Frame _fr
- The Vectors to work on.
_outputFrame
public Frame _outputFrame
_nodes
protected long _nodes
- Internal field to track a range of remote nodes/JVMs to work on
_nleft
protected transient RPC<T extends MRTask2<T>> _nleft
- Internal field to track the left & right remote nodes/JVMs to work on
_nrite
protected transient RPC<T extends MRTask2<T>> _nrite
- Internal field to track the left & right remote nodes/JVMs to work on
_topLocal
protected transient boolean _topLocal
- Internal field to track if this is a top-level local call
_lo
protected transient int _lo
- Internal field to track a range of local Chunks to work on
_hi
protected transient int _hi
- Internal field to track a range of local Chunks to work on
_left
protected transient T extends MRTask2<T> _left
- Internal field to track the left & right sub-range of chunks to work on
_rite
protected transient T extends MRTask2<T> _rite
- Internal field to track the left & right sub-range of chunks to work on
_fs
protected transient Futures _fs
- We can add more things to block on - in case we want a bunch of lazy
tasks produced by children to all end before this top-level task ends.
Semantically, these will all complete before we return from the top-level
task. Pragmatically, we block on a finer grained basis.
MRTask2
public MRTask2()
map
public void map(Chunk c)
- Override with your map implementation. This overload is given a single
local input Chunk. It is meant for map/reduce jobs that use a
single column in a input Frame. All map variants are called, but only one is
expected to be overridden.
map
public void map(Chunk c,
NewChunk nc)
map
public void map(Chunk c0,
Chunk c1)
- Override with your map implementation. This overload is given two
local Chunks. All map variants are called, but only one
is expected to be overridden.
map
public void map(Chunk c0,
Chunk c1,
NewChunk nc)
map
public void map(Chunk c0,
Chunk c1,
NewChunk nc1,
NewChunk nc2)
map
public void map(Chunk c0,
Chunk c1,
Chunk c2)
- Override with your map implementation. This overload is given three
local input Chunks. All map variants are called, but only one
is expected to be overridden.
map
public void map(Chunk c0,
Chunk c1,
Chunk c2,
NewChunk nc)
map
public void map(Chunk c0,
Chunk c1,
Chunk c2,
NewChunk nc1,
NewChunk nc2)
map
public void map(Chunk[] cs)
- Override with your map implementation. This overload is given an array
of local input Chunks, for Frames with arbitrary column
numbers. All map variants are called, but only one is expected to be
overridden.
map
public void map(Chunk[] cs,
NewChunk nc)
map
public void map(Chunk[] cs,
NewChunk nc1,
NewChunk nc2)
map
public void map(Chunk[] cs,
NewChunk[] ncs)
reduce
public void reduce(T mrt)
- Override to combine results from 'mrt' into 'this' MRTask2. Both 'this'
and 'mrt' are guaranteed to either have map() run on them, or be the
results of a prior reduce(). Reduce is optional if, e.g., the result is
some output vector.
setupLocal
protected void setupLocal()
- Override to do any remote initialization on the 1st remote instance of
this object, for initializing node-local shared data structures.
closeLocal
protected void closeLocal()
- Override to do any remote cleaning on the last remote instance of
this object, for disposing of node-local shared data structures.
profString
public java.lang.String profString()
vecs
public final Vec vecs(int i)
- Returns a Vec from the Frame.
doAll
public final T doAll(Vec... vecs)
- Invokes the map/reduce computation over the given Vecs. This call is
blocking.
doAll
public final T doAll(int outputs,
Vec... vecs)
doAll
public final T doAll(Frame fr)
- Invokes the map/reduce computation over the given Frame. This call is
blocking.
doAll
public final T doAll(int outputs,
Frame fr)
dfork
public final T dfork(Vec... vecs)
- Invokes the map/reduce computation over the given Frame. This call is
asynchronous. It returns 'this', on which getResult() can be invoked
later to wait on the computation.
dfork
public final T dfork(Frame fr)
dfork
public final T dfork(int outputs,
Vec... vecs)
dfork
public final T dfork(int outputs,
Frame fr)
getResult
public final T getResult()
- Block for & get any final results from a dfork'd MRTask2.
Note: the desired name 'get' is final in ForkJoinTask.
dinvoke
public final void dinvoke(H2ONode sender)
- Called once on remote at top level, probably with a subset of the cloud.
Called internal by D/F/J. Not expected to be user-called.
- Overrides:
dinvoke
in class DTask
compute2
public final void compute2()
- Called from FJ threads to do local work. The first called Task (which is
also the last one to Complete) also reduces any global work. Called
internal by F/J. Not expected to be user-called.
- Specified by:
compute2
in class H2O.H2OCountedCompleter
onCompletion
public final void onCompletion(jsr166y.CountedCompleter caller)
- OnCompletion - reduce the left & right into self. Called internal by
F/J. Not expected to be user-called.
- Overrides:
onCompletion
in class jsr166y.CountedCompleter
- Parameters:
caller
- the task invoking this method (which may
be this task itself).
postGlobal
protected void postGlobal()
reduce4
protected void reduce4(T mrt)
- Call user's reduction. Also reduce any new AppendableVecs. Called
internal by F/J. Not expected to be user-called.
onExceptionalCompletion
public final boolean onExceptionalCompletion(java.lang.Throwable ex,
jsr166y.CountedCompleter caller)
- Cancel/kill all work as we can, then rethrow... do not invisibly swallow
exceptions (which is the F/J default). Called internal by F/J. Not
expected to be user-called.
- Overrides:
onExceptionalCompletion
in class DTask
- Parameters:
ex
- the exceptioncaller
- the task invoking this method (which may
be this task itself).
- Returns:
- true if this exception should be propagated to this
tasks completer, if one exists.
clone
public T clone()
- Local Clone - setting final-field completer
- Overrides:
clone
in class DTask