com.peoplepattern.text

LanguageIdentifier

object LanguageIdentifier extends LanguageIdentifier

Linear Supertypes
LanguageIdentifier, AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def classify(text: String, threshold: Double, minTextSize: Int): Option[(String, Double)]

    Classifies an individual text by the language it is written in.

    text

    the text to classify

    threshold

    the minimum score threshold to consider a valid prediction

    minTextSize

    the minimum length for the text to make a prediction

    returns

    a pair of the language code (ISO-639-1) and the prediction score if a prediction could be made, otherwise None

    Definition Classes
    LanguageIdentifier → LanguageIdentifier
  8. def classify(text: String): Option[(String, Double)]

    Classifies an individual text by the language it is written in.

    This overload should use sensible defaults for the threshold and minTextSize parameters.

    returns

    a pair of the language code (ISO-639-1) and the prediction score if a prediction could be made, otherwise None

    Definition Classes
    LanguageIdentifier
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. lazy val defaultFrequency: Double

  11. lazy val defaultMinTextSize: Int

  12. lazy val defaultThreshold: Double

  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  19. val model: Model

  20. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. final def notify(): Unit

    Definition Classes
    AnyRef
  22. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  23. def summarize(texts: TraversableOnce[String], threshold: Double, frequency: Double, minTextSize: Int): Vector[(String, Double, Double)]

    Given a collection of strings, returns an ordered list of the unique languages identified, as ISO-639-1 language codes.

    This is built to classify many text entries, on the assumption that only the most common languages in the texts need to be known.

    Adjust threshold and frequency to deal with outlier data. Increasing threshold increases the confidence of the identified languages, while increasing frequency reduces the impact of minor second-language usage.

    texts

    the texts to classify and summarize

    threshold

    the

    returns

    Vector of 3-tuples (lang-code, avg-lang-classification-score, frequency)

    Definition Classes
    LanguageIdentifier → LanguageIdentifier
  24. def summarize(texts: TraversableOnce[String]): Vector[(String, Double, Double)]

    Given a collection of strings, returns an ordered list of the unique languages identified, as ISO-639-1 language codes.

    This is built to classify many text entries, on the assumption that only the most common languages in the texts need to be known.

    Adjust threshold and frequency to deal with outlier data. Increasing threshold increases the confidence of the identified languages, while increasing frequency reduces the impact of minor second-language usage.

    returns

    Vector of 3-tuples (lang-code, avg-lang-classification-score, frequency)

    Definition Classes
    LanguageIdentifier
  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
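
Example (illustrative sketch)

The following self-contained sketch shows how the threshold, minTextSize and frequency parameters described for classify and summarize could interact. It does not use the library's model: rawScore is a hypothetical toy scorer, and the aggregation rules (frequency computed over successfully classified texts, results ordered by descending frequency, inclusive comparisons) are assumptions made for illustration, not the documented behavior of LanguageIdentifier.

```scala
object SummarizeSketch {
  // Hypothetical toy scorer standing in for the trained model (assumption).
  // Returns an (ISO-639-1 code, score) pair for any input.
  def rawScore(text: String): (String, Double) = {
    val tokens = text.toLowerCase.split("\\s+")
    if (tokens.contains("the")) ("en", 0.9)
    else if (tokens.contains("el")) ("es", 0.8)
    else ("en", 0.2)
  }

  // Mirrors the documented signature: no prediction for texts shorter than
  // minTextSize or for scores below threshold.
  def classify(text: String, threshold: Double, minTextSize: Int): Option[(String, Double)] =
    if (text.length < minTextSize) None
    else Some(rawScore(text)).filter { case (_, score) => score >= threshold }

  // Returns (lang-code, avg-lang-classification-score, frequency) triples.
  // Assumption: frequency is the share of classified texts per language.
  def summarize(
      texts: Iterable[String],
      threshold: Double,
      frequency: Double,
      minTextSize: Int
  ): Vector[(String, Double, Double)] = {
    val predictions = texts.toVector.flatMap(t => classify(t, threshold, minTextSize))
    if (predictions.isEmpty) Vector.empty
    else {
      val total = predictions.size.toDouble
      predictions
        .groupBy(_._1)
        .toVector
        .map { case (lang, preds) =>
          (lang, preds.map(_._2).sum / preds.size, preds.size / total)
        }
        .filter(_._3 >= frequency) // drop rarely seen languages
        .sortBy(-_._3)             // most frequent language first
    }
  }
}
```

With this sketch, raising frequency drops languages that appear in only a small share of the texts, and raising threshold discards low-scoring predictions before they are counted, which matches the tuning advice given for summarize.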
