org.apache.mahout.math.cf

SimilarityAnalysis

object SimilarityAnalysis extends Serializable

Based on "Ted Dunning & Ellen Friedman: Practical Machine Learning, Innovations in Recommendation", available at http://www.mapr.com/practical-machine-learning

See also "Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce, ACM Conference on Recommender Systems 2012"

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def computeSimilarities(drm: DrmLike[Int], numUsers: Int, maxInterestingItemsPerThing: Int, bcastNumInteractionsB: BCast[Vector], bcastNumInteractionsA: BCast[Vector], crossCooccurrence: Boolean = true): DrmLike[Int]

  9. def cooccurrences(drmARaw: DrmLike[Int], randomSeed: Int = 0xdeadbeef, maxInterestingItemsPerThing: Int = 50, maxNumInteractions: Int = 500, drmBs: Array[DrmLike[Int]] = Array()): List[DrmLike[Int]]

    Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-similarity matrices

    drmARaw

    Primary interaction matrix

    randomSeed

    when kept constant, makes the downsampling repeatable

    maxInterestingItemsPerThing

    number of similar items to return per item, default: 50

    maxNumInteractions

    max number of interactions after downsampling, default: 500

    returns

    a list of org.apache.mahout.math.drm.DrmLike containing downsampled DRMs for cooccurrence and cross-cooccurrence
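The A'A and A'B products named above are cooccurrence and cross-cooccurrence counts. The following is a minimal in-memory sketch of those counts only, using plain Scala arrays rather than Mahout DRMs; the object `CooccurrenceSketch`, the `Matrix` alias, and `transposeTimes` are hypothetical names for illustration and are not part of this API. It assumes rows are users, columns are items, and any non-zero entry marks an interaction. It does not perform the downsampling, the log-likelihood scoring, or the per-item truncation to maxInterestingItemsPerThing.

```scala
object CooccurrenceSketch {
  // Rows are users, columns are items; a non-zero entry means the user interacted with the item.
  type Matrix = Array[Array[Int]]

  // Computes A'B: entry (i, j) counts the users who interacted with
  // item i in A and with item j in B. Passing the same matrix twice gives A'A.
  def transposeTimes(a: Matrix, b: Matrix): Matrix = {
    require(a.length == b.length, "A and B must have the same number of rows (users)")
    val itemsA = if (a.isEmpty) 0 else a.head.length
    val itemsB = if (b.isEmpty) 0 else b.head.length
    val out = Array.ofDim[Int](itemsA, itemsB)
    for {
      u <- a.indices
      i <- 0 until itemsA if a(u)(i) != 0
      j <- 0 until itemsB if b(u)(j) != 0
    } out(i)(j) += 1
    out
  }
}
```

For example, `CooccurrenceSketch.transposeTimes(a, a)` yields the item-item cooccurrence counts for a primary matrix `a`, and `transposeTimes(a, b)` yields the cross-cooccurrence counts against a secondary matrix `b` with the same users.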

  10. def cooccurrencesIDSs(indexedDatasets: Array[IndexedDataset], randomSeed: Int = 0xdeadbeef, maxInterestingItemsPerThing: Int = 50, maxNumInteractions: Int = 500): List[IndexedDataset]

    Calculates item (column-wise) similarity using the log-likelihood ratio on A'A, A'B, A'C, ... and returns a list of similarity and cross-similarity matrices. This method is somewhat easier to use because it handles the ID dictionaries correctly

    indexedDatasets

    the first element of the array is the primary (A) matrix; all others are treated as secondary

    randomSeed

    use default to make repeatable, otherwise pass in system time or some randomizing seed

    maxInterestingItemsPerThing

    max similarities per item

    maxNumInteractions

    max number of input items per item

    returns

    a list of org.apache.mahout.math.indexeddataset.IndexedDataset containing downsampled IndexedDatasets for cooccurrence and cross-cooccurrence

  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. def logLikelihoodRatio(numInteractionsWithA: Long, numInteractionsWithB: Long, numInteractionsWithAandB: Long, numInteractions: Long): Double

    Computes the log-likelihood ratio; see http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html for details
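As an illustration of the statistic, here is a minimal self-contained sketch of the log-likelihood ratio for a 2x2 contingency table, written in the entropy form described in the blog post linked above. The object `LlrSketch` and its helpers are hypothetical names, not this object's implementation, and the mapping of the four count parameters onto the table cells in `fromCounts` is an assumption based on the parameter names.

```scala
object LlrSketch {
  // x * ln(x), with the convention that 0 * ln(0) = 0.
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized Shannon entropy of a set of counts.
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  // LLR for the table [[k11, k12], [k21, k22]].
  def llr(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to absorb small negative values caused by rounding.
    math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
  }

  // Assumed mapping: both A and B, A without B, B without A, neither.
  def fromCounts(numA: Long, numB: Long, numAandB: Long, total: Long): Double =
    llr(numAandB, numA - numAandB, numB - numAandB, total - numA - numB + numAandB)
}
```

A table whose rows and columns are independent scores close to zero, while a table in which the two events always occur together scores high; larger scores indicate stronger evidence that the cooccurrence is not due to chance.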

  18. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. final def notify(): Unit

    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  21. def rowSimilarity(drmARaw: DrmLike[Int], randomSeed: Int = 0xdeadbeef, maxInterestingSimilaritiesPerRow: Int = 50, maxNumInteractions: Int = 500): DrmLike[Int]

    Calculates row-wise similarity using the log-likelihood ratio on AA' and returns a DRM of rows and similar rows

    drmARaw

    Primary interaction matrix

    randomSeed

    when kept constant, makes the downsampling repeatable

    maxInterestingSimilaritiesPerRow

    number of similar rows to return per row, default: 50

    maxNumInteractions

    max number of interactions after downsampling, default: 500

  22. def rowSimilarityIDS(indexedDataset: IndexedDataset, randomSeed: Int = 0xdeadbeef, maxInterestingSimilaritiesPerRow: Int = 50, maxObservationsPerRow: Int = 500): IndexedDataset

    Calculates row-wise similarity using the log-likelihood ratio on AA' and returns a DRM of rows and similar rows. Uses IndexedDatasets, which handle external ID dictionaries properly

    indexedDataset

    compare each row to every other

    randomSeed

    use default to make repeatable, otherwise pass in system time or some randomizing seed

    maxInterestingSimilaritiesPerRow

    max elements returned in each row

    maxObservationsPerRow

    max number of input elements to use

  23. def sampleDownAndBinarize(drmM: DrmLike[Int], seed: Int, maxNumInteractions: Int): DrmLike[Int]

    Selectively downsamples rows and items with an anomalous number of interactions, inspired by https://github.com/tdunning/in-memory-cooccurrence/blob/master/src/main/java/com/tdunning/cooc/Analyze.java

    Additionally binarizes the input matrix, as we are only interested in knowing whether interactions happened or not

    drmM

    matrix to downsample

    seed

    random number generator seed; keep it constant if repeatability is necessary

    maxNumInteractions

    number of elements in a row of the returned matrix

    returns

    the downsampled DRM
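To make the idea of seeded downsampling plus binarization concrete, here is a deliberately simplified in-memory sketch. It is not the algorithm this method uses: it only caps each row at maxNumInteractions non-zero entries by random selection and ignores per-item (column) interaction counts. The object `DownsampleSketch` and the use of plain arrays instead of a DRM are assumptions made for illustration.

```scala
import scala.util.Random

object DownsampleSketch {
  // Simplified stand-in: binarize every row and keep at most
  // maxNumInteractions non-zero entries per row, chosen at random.
  def sampleDownAndBinarize(
      m: Array[Array[Double]],
      seed: Int,
      maxNumInteractions: Int): Array[Array[Double]] = {
    // A fixed seed makes the random selection repeatable.
    val rng = new Random(seed)
    m.map { row =>
      val nonZero = row.indices.filter(i => row(i) != 0.0)
      val kept: Set[Int] =
        if (nonZero.length <= maxNumInteractions) nonZero.toSet
        else rng.shuffle(nonZero.toList).take(maxNumInteractions).toSet
      // Binarize: any kept interaction becomes 1.0, everything else 0.0.
      Array.tabulate(row.length)(i => if (kept(i)) 1.0 else 0.0)
    }
  }
}
```

Calling the function twice with the same seed returns the same matrix, which mirrors the repeatability that the seed parameter is documented to provide.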

  24. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  25. def toString(): String

    Definition Classes
    AnyRef → Any
  26. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
