org.apache.mahout.classifier.naivebayes

SparkNaiveBayes

object SparkNaiveBayes extends NaiveBayes

Distributed training of a Naive Bayes model. Follows the approach presented in Rennie et al., "Tackling the Poor Assumptions of Naive Bayes Text Classifiers", ICML 2003, http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Linear Supertypes
NaiveBayes, Serializable, AnyRef, Any

Type Members

  1. type CategoryParser = (String) ⇒ String

    Definition Classes
    NaiveBayes
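
To illustrate the CategoryParser type, here is a minimal plain-Scala sketch of a parser that pulls "Category" out of a key shaped like '/Category/document_id', the behavior this page documents for the default parser. The names CategoryParserSketch and secondSegmentParser are hypothetical, and the split-on-slash logic is an assumption about how such a parser could work, not a copy of Mahout's implementation.

```scala
object CategoryParserSketch {
  // Same shape as the CategoryParser alias: String => String.
  type CategoryParser = String => String

  // Hypothetical parser: for "/Category/document_id", splitting on "/" gives
  // Array("", "Category", "document_id"), so the label sits at index 1.
  val secondSegmentParser: CategoryParser = key => key.split("/")(1)

  def main(args: Array[String]): Unit = {
    println(secondSegmentParser("/sports/doc_0001"))
  }
}
```

Any function of this shape could be passed as the cParser argument when keys follow a different layout.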

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def argmax(v: Vector): (Int, Double)

    Definition Classes
    NaiveBayes
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def defaultAlphaI: Float

    Definition Classes
    NaiveBayes
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def extractLabelsAndAggregateObservations[K](stringKeyedObservations: DrmLike[K], cParser: (String) ⇒ String = seq2SparseCategoryParser)(implicit arg0: ClassTag[K], ctx: DistributedContext): (HashMap[String, Integer], DrmLike[Int])

    Math-Scala Naive Bayes optimized for Spark.

    Extracts label keys from a raw TF or TF-IDF matrix generated by seqdirectory/seq2sparse and aggregates the TF or TF-IDF values by label.

    stringKeyedObservations

    A DrmLike matrix: the output of seq2sparse, in the form K = e.g. /Category/document_title, V = TF or TF-IDF values per term

    cParser

    a String => String function used to extract categories from Keys of the stringKeyedObservations DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'

    returns

    (labelIndexMap, aggregatedByLabelObservationDrm), where labelIndexMap is a HashMap[String, Integer] that maps each label (key) to its label row index (value), and aggregatedByLabelObservationDrm is a DrmLike[Int] of aggregated TF or TF-IDF counts per label

    Definition Classes
    SparkNaiveBayes → NaiveBayes
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. def seq2SparseCategoryParser: (String) ⇒ String

    Definition Classes
    NaiveBayes
  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. def test[K](model: NBModel, testSet: DrmLike[K], testComplementary: Boolean = false, cParser: (String) ⇒ String = seq2SparseCategoryParser)(implicit arg0: ClassTag[K], ctx: DistributedContext): ResultAnalyzer

    Test a trained model with a labeled dataset

    K

    implicitly determined Key type of test set DRM: String

    model

    a trained NBModel

    testSet

    a labeled testing set

    testComplementary

    whether to test using a complementary NB classifier (true) or a standard NB classifier (false, the default)

    cParser

    a String => String function used to extract categories from Keys of the testing set DRM. The default CategoryParser will extract "Category" from: '/Category/document_id'

    returns

    a result analyzer with confusion matrix and accuracy statistics

    Definition Classes
    SparkNaiveBayes → NaiveBayes
  23. def toString(): String

    Definition Classes
    AnyRef → Any
  24. def train(observationsPerLabel: DrmLike[Int], labelIndex: Map[String, Integer], trainComplementary: Boolean, alphaI: Float): NBModel

    Definition Classes
    NaiveBayes
  25. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
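
The label extraction and aggregation described for extractLabelsAndAggregateObservations can be pictured with a small local sketch. It is plain Scala on in-memory sequences, not the distributed DRM code: the names LabelAggregationSketch, Row, and aggregateByLabel are hypothetical, rows are assumed to be dense and of equal length, and label indices are assumed to be assigned in order of first appearance.

```scala
object LabelAggregationSketch {
  // Local stand-in for a string-keyed DRM row: (row key, dense TF or TF-IDF values).
  type Row = (String, Array[Double])

  /** Returns (label -> row index, one summed vector per label, ordered by index). */
  def aggregateByLabel(
      rows: Seq[Row],
      cParser: String => String = key => key.split("/")(1)
  ): (Map[String, Int], Vector[Array[Double]]) = {
    // Assign a row index to each label in order of first appearance (an assumption).
    val labelIndex: Map[String, Int] =
      rows.map { case (key, _) => cParser(key) }.distinct.zipWithIndex.toMap

    // Assumes every row has the same number of term columns as the first row.
    val width = rows.headOption.map(_._2.length).getOrElse(0)
    val sums = Array.fill(labelIndex.size)(Array.fill(width)(0.0))

    // Add each observation row into the aggregate row of its label.
    for ((key, values) <- rows) {
      val target = sums(labelIndex(cParser(key)))
      for (j <- values.indices) target(j) += values(j)
    }
    (labelIndex, sums.toVector)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      "/sports/doc_1" -> Array(1.0, 0.0, 2.0),
      "/politics/doc_2" -> Array(0.0, 3.0, 1.0),
      "/sports/doc_3" -> Array(2.0, 1.0, 0.0)
    )
    val (labelIndex, aggregated) = aggregateByLabel(rows)
    println(labelIndex)
    aggregated.foreach(row => println(row.mkString(", ")))
  }
}
```

In this sketch the two "sports" rows collapse into a single aggregate row, which mirrors the per-label matrix that the train method takes as observationsPerLabel.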
