water
Class MRTask2<T extends MRTask2<T>>

java.lang.Object
  extended by jsr166y.ForkJoinTask<java.lang.Void>
      extended by jsr166y.CountedCompleter
          extended by water.H2O.H2OCountedCompleter
              extended by water.DTask
                  extended by water.MRTask2<T>
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.util.concurrent.Future<java.lang.Void>, Freezable
Direct Known Subclasses:
FrameSplit.FrameSplitter, FrameTask, GLMModel.GLMValidationTask, KMeans2.Lloyds, KMeans2.Sampler, KMeans2.SumSqr, LR2.CalcRegressionTask, LR2.CalcSquareErrorsTasks, LR2.CalcSumsTask, ParseDataset2.EnumUpdateTask, PCAScore.PCAScoreTask, Quantiles.QuantilesTask, SharedTreeModelBuilder.Score, Vec.CollectDomain

public abstract class MRTask2<T extends MRTask2<T>>
extends DTask
implements java.lang.Cloneable

Map/Reduce style distributed computation. MRTask2 provides several map and reduce methods that can be overriden to specify a computation. Several instances of this class will be created to distribute the computation over F/J threads and machines. Non-transient fields are copied and serialized to instances created for map invocations. Reduce methods can store their results in fields. Results are serialized and reduced all the way back to the invoking node. When the last reduce method has been called, fields of the initial MRTask2 instance contains the computation results. Apart from small reduced POJO returned to the calling node, MRtask2 can produce output vector(s) as a result. These will have chunks co-located with the input dataset, however, their number of lines will generally differ, (so they won't be strictly compatible with the original). To produce output vectors, call doAll.dfork version with required number of outputs and override appropriate map call taking required number of NewChunks. MRTask2 will automatically close the new Appendable vecs and produce an output frame with newly created Vecs.

See Also:
Serialized Form

Field Summary
 Frame _fr
          The Vectors to work on.
protected  Futures _fs
          We can add more things to block on - in case we want a bunch of lazy tasks produced by children to all end before this top-level task ends.
protected  int _hi
          Internal field to track a range of local Chunks to work on
protected  T _left
          Internal field to track the left & right sub-range of chunks to work on
protected  int _lo
          Internal field to track a range of local Chunks to work on
protected  RPC<T> _nleft
          Internal field to track the left & right remote nodes/JVMs to work on
protected  long _nodes
          Internal field to track a range of remote nodes/JVMs to work on
protected  RPC<T> _nrite
          Internal field to track the left & right remote nodes/JVMs to work on
 Frame _outputFrame
           
protected  T _rite
          Internal field to track the left & right sub-range of chunks to work on
protected  boolean _topLocal
          Internal field to track if this is a top-level local call
 
Fields inherited from class water.DTask
_cls, _eFromNode, _exception, _fname, _lineNum, _msg, _mth
 
Constructor Summary
MRTask2()
           
 
Method Summary
 T clone()
          Local Clone - setting final-field completer
protected  void closeLocal()
          Override to do any remote cleaning on the last remote instance of this object, for disposing of node-local shared data structures.
 void compute2()
          Called from FJ threads to do local work.
 T dfork(Frame fr)
           
 T dfork(int outputs, Frame fr)
           
 T dfork(int outputs, Vec... vecs)
           
 T dfork(Vec... vecs)
          Invokes the map/reduce computation over the given Frame.
 void dinvoke(H2ONode sender)
          Called once on remote at top level, probably with a subset of the cloud.
 T doAll(Frame fr)
          Invokes the map/reduce computation over the given Frame.
 T doAll(int outputs, Frame fr)
           
 T doAll(int outputs, Vec... vecs)
           
 T doAll(Vec... vecs)
          Invokes the map/reduce computation over the given Vecs.
 T getResult()
          Block for & get any final results from a dfork'd MRTask2.
 void map(Chunk c)
          Override with your map implementation.
 void map(Chunk[] cs)
          Override with your map implementation.
 void map(Chunk[] cs, NewChunk nc)
           
 void map(Chunk[] cs, NewChunk[] ncs)
           
 void map(Chunk[] cs, NewChunk nc1, NewChunk nc2)
           
 void map(Chunk c0, Chunk c1)
          Override with your map implementation.
 void map(Chunk c0, Chunk c1, Chunk c2)
          Override with your map implementation.
 void map(Chunk c0, Chunk c1, Chunk c2, NewChunk nc)
           
 void map(Chunk c0, Chunk c1, Chunk c2, NewChunk nc1, NewChunk nc2)
           
 void map(Chunk c0, Chunk c1, NewChunk nc)
           
 void map(Chunk c0, Chunk c1, NewChunk nc1, NewChunk nc2)
           
 void map(Chunk c, NewChunk nc)
           
 void onCompletion(jsr166y.CountedCompleter caller)
          OnCompletion - reduce the left & right into self.
 boolean onExceptionalCompletion(java.lang.Throwable ex, jsr166y.CountedCompleter caller)
          Cancel/kill all work as we can, then rethrow...
protected  void postGlobal()
           
 java.lang.String profString()
           
 void reduce(T mrt)
          Override to combine results from 'mrt' into 'this' MRTask2.
protected  void reduce4(T mrt)
          Call user's reduction.
protected  void setupLocal()
          Override to do any remote initialization on the 1st remote instance of this object, for initializing node-local shared data structures.
 Vec vecs(int i)
          Returns a Vec from the Frame.
 
Methods inherited from class water.DTask
copyOver, frozenType, getDException, hasException, logVerbose, newInstance, onAck, onAckAck, read, setException, toDocField, write, writeJSONFields
 
Methods inherited from class water.H2O.H2OCountedCompleter
compute, priority
 
Methods inherited from class jsr166y.CountedCompleter
addToPendingCount, compareAndSetPendingCount, complete, exec, getCompleter, getPendingCount, getRawResult, setCompleter, setPendingCount, setRawResult, tryComplete
 
Methods inherited from class jsr166y.ForkJoinTask
adapt, adapt, adapt, cancel, compareAndSetForkJoinTaskTag, completeExceptionally, fork, get, get, getException, getForkJoinTaskTag, getPool, getQueuedTaskCount, getSurplusQueuedTaskCount, helpQuiesce, inForkJoinPool, invoke, invokeAll, invokeAll, invokeAll, isCancelled, isCompletedAbnormally, isCompletedNormally, isDone, join, peekNextLocalTask, pollNextLocalTask, pollTask, quietlyComplete, quietlyInvoke, quietlyJoin, reinitialize, setForkJoinTaskTag, tryUnfork
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_fr

public Frame _fr
The Vectors to work on.


_outputFrame

public Frame _outputFrame

_nodes

protected long _nodes
Internal field to track a range of remote nodes/JVMs to work on


_nleft

protected transient RPC<T extends MRTask2<T>> _nleft
Internal field to track the left & right remote nodes/JVMs to work on


_nrite

protected transient RPC<T extends MRTask2<T>> _nrite
Internal field to track the left & right remote nodes/JVMs to work on


_topLocal

protected transient boolean _topLocal
Internal field to track if this is a top-level local call


_lo

protected transient int _lo
Internal field to track a range of local Chunks to work on


_hi

protected transient int _hi
Internal field to track a range of local Chunks to work on


_left

protected transient T extends MRTask2<T> _left
Internal field to track the left & right sub-range of chunks to work on


_rite

protected transient T extends MRTask2<T> _rite
Internal field to track the left & right sub-range of chunks to work on


_fs

protected transient Futures _fs
We can add more things to block on - in case we want a bunch of lazy tasks produced by children to all end before this top-level task ends. Semantically, these will all complete before we return from the top-level task. Pragmatically, we block on a finer grained basis.

Constructor Detail

MRTask2

public MRTask2()
Method Detail

map

public void map(Chunk c)
Override with your map implementation. This overload is given a single local input Chunk. It is meant for map/reduce jobs that use a single column in a input Frame. All map variants are called, but only one is expected to be overridden.


map

public void map(Chunk c,
                NewChunk nc)

map

public void map(Chunk c0,
                Chunk c1)
Override with your map implementation. This overload is given two local Chunks. All map variants are called, but only one is expected to be overridden.


map

public void map(Chunk c0,
                Chunk c1,
                NewChunk nc)

map

public void map(Chunk c0,
                Chunk c1,
                NewChunk nc1,
                NewChunk nc2)

map

public void map(Chunk c0,
                Chunk c1,
                Chunk c2)
Override with your map implementation. This overload is given three local input Chunks. All map variants are called, but only one is expected to be overridden.


map

public void map(Chunk c0,
                Chunk c1,
                Chunk c2,
                NewChunk nc)

map

public void map(Chunk c0,
                Chunk c1,
                Chunk c2,
                NewChunk nc1,
                NewChunk nc2)

map

public void map(Chunk[] cs)
Override with your map implementation. This overload is given an array of local input Chunks, for Frames with arbitrary column numbers. All map variants are called, but only one is expected to be overridden.


map

public void map(Chunk[] cs,
                NewChunk nc)

map

public void map(Chunk[] cs,
                NewChunk nc1,
                NewChunk nc2)

map

public void map(Chunk[] cs,
                NewChunk[] ncs)

reduce

public void reduce(T mrt)
Override to combine results from 'mrt' into 'this' MRTask2. Both 'this' and 'mrt' are guaranteed to either have map() run on them, or be the results of a prior reduce(). Reduce is optional if, e.g., the result is some output vector.


setupLocal

protected void setupLocal()
Override to do any remote initialization on the 1st remote instance of this object, for initializing node-local shared data structures.


closeLocal

protected void closeLocal()
Override to do any remote cleaning on the last remote instance of this object, for disposing of node-local shared data structures.


profString

public java.lang.String profString()

vecs

public final Vec vecs(int i)
Returns a Vec from the Frame.


doAll

public final T doAll(Vec... vecs)
Invokes the map/reduce computation over the given Vecs. This call is blocking.


doAll

public final T doAll(int outputs,
                     Vec... vecs)

doAll

public final T doAll(Frame fr)
Invokes the map/reduce computation over the given Frame. This call is blocking.


doAll

public final T doAll(int outputs,
                     Frame fr)

dfork

public final T dfork(Vec... vecs)
Invokes the map/reduce computation over the given Frame. This call is asynchronous. It returns 'this', on which getResult() can be invoked later to wait on the computation.


dfork

public final T dfork(Frame fr)

dfork

public final T dfork(int outputs,
                     Vec... vecs)

dfork

public final T dfork(int outputs,
                     Frame fr)

getResult

public final T getResult()
Block for & get any final results from a dfork'd MRTask2. Note: the desired name 'get' is final in ForkJoinTask.


dinvoke

public final void dinvoke(H2ONode sender)
Called once on remote at top level, probably with a subset of the cloud. Called internal by D/F/J. Not expected to be user-called.

Overrides:
dinvoke in class DTask

compute2

public final void compute2()
Called from FJ threads to do local work. The first called Task (which is also the last one to Complete) also reduces any global work. Called internal by F/J. Not expected to be user-called.

Specified by:
compute2 in class H2O.H2OCountedCompleter

onCompletion

public final void onCompletion(jsr166y.CountedCompleter caller)
OnCompletion - reduce the left & right into self. Called internal by F/J. Not expected to be user-called.

Overrides:
onCompletion in class jsr166y.CountedCompleter
Parameters:
caller - the task invoking this method (which may be this task itself).

postGlobal

protected void postGlobal()

reduce4

protected void reduce4(T mrt)
Call user's reduction. Also reduce any new AppendableVecs. Called internal by F/J. Not expected to be user-called.


onExceptionalCompletion

public final boolean onExceptionalCompletion(java.lang.Throwable ex,
                                             jsr166y.CountedCompleter caller)
Cancel/kill all work as we can, then rethrow... do not invisibly swallow exceptions (which is the F/J default). Called internal by F/J. Not expected to be user-called.

Overrides:
onExceptionalCompletion in class DTask
Parameters:
ex - the exception
caller - the task invoking this method (which may be this task itself).
Returns:
true if this exception should be propagated to this tasks completer, if one exists.

clone

public T clone()
Local Clone - setting final-field completer

Overrides:
clone in class DTask