Broadcast variable abstraction
Drm block-wise tuple: Array of row keys and the matrix block.
Checkpointed DRM API.
Additional experimental operations over CheckpointedDRM implementation.
Distributed context (a.
Abstraction of optimizer/distributed engine
Basic Spark DRM trait.
Common Drm ops
Drm row-wise tuple
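The block-wise tuple (row keys plus one matrix block) and the row-wise tuple (one key plus one row) are two views of the same rows. The following is a minimal sketch of converting between them. It assumes plain `Array[Array[Double]]` rows as a hypothetical stand-in for Mahout's matrix and vector types; `blockify`, `deblockify`, `BlockifiedTuple` and `RowTuple` are illustrative names here, not a confirmed API.

```scala
import scala.reflect.ClassTag

object DrmTuples {
  // Hypothetical stand-in: a block is a dense row-major Array[Array[Double]].
  // Block-wise tuple: the row keys and the block holding those rows.
  type BlockifiedTuple[K] = (Array[K], Array[Array[Double]])
  // Row-wise tuple: one key and one row vector.
  type RowTuple[K] = (K, Array[Double])

  // Split a block into row-wise tuples; key i pairs with block row i.
  def deblockify[K](block: BlockifiedTuple[K]): Seq[RowTuple[K]] = {
    val (keys, rows) = block
    require(keys.length == rows.length, "one key per block row")
    keys.toSeq.zip(rows.toSeq)
  }

  // Pack a partition's row-wise tuples into one block-wise tuple.
  def blockify[K: ClassTag](rows: Seq[RowTuple[K]]): BlockifiedTuple[K] =
    (rows.map(_._1).toArray, rows.map(_._2).toArray)
}
```

Working per block lets an engine run dense in-core kernels on many rows at once, while the row-wise view is convenient for keyed operations.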
Implicit broadcast -> value conversion.
Expose all engine operations through the context as well.
Compute COV(X) matrix and mean of row-wise data set.
note: will pin input to cache if not yet pinned.
mean → covariance DRM
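For a row-wise data set $A$ with $n$ rows, the column mean is $\mu = \frac{1}{n}\sum_k a_k$ and the covariance can be written as $(A^\top A - n\,\mu\mu^\top)/(n-1)$. A minimal in-core sketch of that computation follows. It assumes the sample (n - 1) normalization, which the summary above does not state, and uses a hypothetical `colMeanCov` over `Array[Array[Double]]` rather than a DRM.

```scala
object ColMeanCov {
  // Returns (column means, covariance matrix) for a row-wise data set `a` (n x d).
  // Assumption: sample covariance normalized by n - 1.
  def colMeanCov(a: Array[Array[Double]]): (Array[Double], Array[Array[Double]]) = {
    val n = a.length
    require(n > 1, "need at least two rows")
    val d = a.head.length
    val mu = Array.tabulate(d)(j => a.map(_(j)).sum / n)
    // C(i)(j) = (sum_k a_ki * a_kj - n * mu_i * mu_j) / (n - 1)
    val cov = Array.tabulate(d, d) { (i, j) =>
      (a.map(r => r(i) * r(j)).sum - n * mu(i) * mu(j)) / (n - 1)
    }
    (mu, cov)
  }
}
```

This one-pass form is convenient for distributed evaluation because $A^\top A$ and the column sums are both simple aggregates, but it can lose precision when means are large relative to the spread.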
Thin column-wise mean and covariance matrix computation.
note: will pin input to cache if not yet pinned.
mean → covariance matrix (in core)
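When the input is thin (few columns), the $d \times d$ statistics are small enough to reduce into driver memory. The sketch below illustrates that pattern under stated assumptions: each partition computes its row count, column sums and $A^\top A$; the partial results are summed; the mean and covariance are then finalized in core. `Stats`, `partitionStats` and `finalizeMeanCov` are hypothetical names, and the n - 1 normalization is an assumption.

```scala
object ThinMeanCov {
  // Per-partition sufficient statistics: row count, column sums, and A'A (d x d).
  final case class Stats(n: Long, sum: Array[Double], ata: Array[Array[Double]]) {
    def +(o: Stats): Stats = Stats(
      n + o.n,
      sum.zip(o.sum).map { case (x, y) => x + y },
      ata.zip(o.ata).map { case (r, s) => r.zip(s).map { case (x, y) => x + y } })
  }

  // Computed independently on each partition's rows.
  def partitionStats(rows: Seq[Array[Double]], d: Int): Stats = {
    val sum = new Array[Double](d)
    val ata = Array.ofDim[Double](d, d)
    for (r <- rows; i <- 0 until d) {
      sum(i) += r(i)
      for (j <- 0 until d) ata(i)(j) += r(i) * r(j)
    }
    Stats(rows.size.toLong, sum, ata)
  }

  // Reduce all partition statistics, then finalize mean and covariance in core.
  def finalizeMeanCov(parts: Seq[Stats]): (Array[Double], Array[Array[Double]]) = {
    val s = parts.reduce(_ + _)
    val n = s.n.toDouble
    val mu = s.sum.map(_ / n)
    val cov = Array.tabulate(mu.length, mu.length) { (i, j) =>
      (s.ata(i)(j) - n * mu(i) * mu(j)) / (n - 1)
    }
    (mu, cov)
  }
}
```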
Compute column-wise means and standard deviations (distributed version).
note: input will be pinned to cache if not yet pinned
colMeans → colStdevs
Compute column-wise means and variances (distributed version).
Note: will pin input to cache if not yet pinned.
colMeans → colVariances
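Both operations above return column means together with a spread measure, and standard deviations are simply the element-wise square roots of the variances. A minimal in-core sketch of the two results follows; `colMeanVars` and `colMeanStdevs` are hypothetical names operating on `Array[Array[Double]]`, and the n - 1 (sample) denominator is an assumption rather than something the summaries specify.

```scala
object ColMeanVars {
  // Column means and variances of a row-wise data set (n x d).
  // Assumption: sample variance with an n - 1 denominator, computed in two passes.
  def colMeanVars(a: Array[Array[Double]]): (Array[Double], Array[Double]) = {
    val n = a.length
    require(n > 1, "need at least two rows")
    val d = a.head.length
    val means = Array.tabulate(d)(j => a.map(_(j)).sum / n)
    val vars = Array.tabulate(d) { j =>
      a.map(r => math.pow(r(j) - means(j), 2)).sum / (n - 1)
    }
    (means, vars)
  }

  // Standard deviations are the element-wise square roots of the variances.
  def colMeanStdevs(a: Array[Array[Double]]): (Array[Double], Array[Double]) = {
    val (means, vars) = colMeanVars(a)
    (means, vars.map(v => math.sqrt(v)))
  }
}
```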
We assume that whenever a computational action is invoked without an explicit checkpoint, the user does not intend the result to be cached.
Implicit conversion to in-core with NONE caching of the result.
Convert arbitrarily-keyed matrix to int-keyed matrix.
input to be transcoded
Whether to return the old key -> int key map to the front end.
Sequentially keyed matrix + (optionally) map from non-Int key to Int key. If the key type is actually Int, then we just return the argument with None for the map, regardless of the computeMap parameter.
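The transcoding contract above can be illustrated in core: assign sequential Int keys, optionally return the old-key map, and pass Int-keyed input through unchanged with `None`. This is a minimal sketch over a `Seq` of `(key, row)` pairs; `toIntKeyed` is a hypothetical name, and it assumes input order defines the new keys, which a distributed implementation would not necessarily guarantee.

```scala
import scala.reflect.ClassTag

object IntKeys {
  // Re-key rows with sequential Int keys 0, 1, 2, ... (in input order).
  // If K is already Int, return the input unchanged with None, regardless of computeMap.
  def toIntKeyed[K: ClassTag, V](rows: Seq[(K, V)],
                                 computeMap: Boolean): (Seq[(Int, V)], Option[Map[K, Int]]) =
    if (implicitly[ClassTag[K]] == ClassTag.Int) {
      (rows.asInstanceOf[Seq[(Int, V)]], None)
    } else {
      val indexed = rows.zipWithIndex
      val rekeyed = indexed.map { case ((_, v), i) => (i, v) }
      // Old key -> new Int key, only built when the caller asks for it.
      val keyMap =
        if (computeMap) Some(indexed.map { case ((k, _), i) => k -> i }.toMap) else None
      (rekeyed, keyMap)
    }
}
```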
Broadcast support API
Load DRM from HDFS (in Mahout DRM format).
Shortcut for parallelizing matrices with row ordinal indices; row labels are ignored.
This creates an empty DRM with specified number of partitions and cardinality.
Creates empty DRM with non-trivial height
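An empty DRM is described only by its cardinality and partition count, so the engine must decide which row ordinals each partition owns. The sketch below shows one plausible layout, contiguous near-equal ranges, as an illustration only; `partitionRanges` is a hypothetical helper, and the actual engine's layout is not specified above. Heights are `Long` so that very tall DRMs can be represented.

```scala
object EmptyDrmLayout {
  // Split nrow row ordinals into numPartitions contiguous, near-equal,
  // half-open ranges [start, end). The first (nrow % numPartitions)
  // partitions receive one extra row each.
  def partitionRanges(nrow: Long, numPartitions: Int): Seq[(Long, Long)] = {
    require(numPartitions > 0 && nrow >= 0)
    val base = nrow / numPartitions
    val extra = nrow % numPartitions
    (0 until numPartitions).map { p =>
      val start = p * base + math.min(p.toLong, extra)
      val size = base + (if (p < extra) 1 else 0)
      (start, start + size)
    }
  }
}
```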
Parallelize an in-core matrix as a Spark distributed matrix, using row ordinal indices as data set keys.
Parallelize an in-core matrix as a Spark distributed matrix, using row labels as data set keys.
(Optional) Sampling operation.
Compute fold-in distances (distributed version).
m x d row-wise dataset. Pinned to cache if not yet pinned.
n x d row-wise dataset. Pinned to cache if not yet pinned.
m x n pairwise squared distance matrix (between rows of X and Y)
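Pairwise squared distances between the rows of an m x d matrix X and an n x d matrix Y can be computed from the expansion $\lVert x_i - y_j\rVert^2 = \lVert x_i\rVert^2 + \lVert y_j\rVert^2 - 2\,x_i \cdot y_j$, which reduces the work to row norms plus the product $XY^\top$. A minimal in-core sketch follows; `sqDist` is a hypothetical name, and the distributed versions above are not confirmed to use exactly this formulation.

```scala
object SqDist {
  // Pairwise squared Euclidean distances between rows of x (m x d) and y (n x d);
  // the result is m x n.
  def sqDist(x: Array[Array[Double]], y: Array[Array[Double]]): Array[Array[Double]] = {
    def dot(a: Array[Double], b: Array[Double]): Double =
      a.indices.map(k => a(k) * b(k)).sum
    val xs = x.map(r => dot(r, r)) // squared row norms of x
    val ys = y.map(r => dot(r, r)) // squared row norms of y
    Array.tabulate(x.length, y.length) { (i, j) =>
      // Clamp tiny negative values caused by floating-point cancellation.
      math.max(0.0, xs(i) + ys(j) - 2 * dot(x(i), y(j)))
    }
  }
}
```

Passing the same data set twice, `sqDist(x, x)`, yields the symmetric squared distance matrix of a single data set with zeros on the diagonal.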
Distributed squared distance matrix computation.