Trait/Object

com.peoplepattern.text

LanguageIdentifier

Related Docs: object LanguageIdentifier | package text

trait LanguageIdentifier extends AnyRef

Linear Supertypes
AnyRef, Any

Abstract Value Members

  1. abstract def classify(text: String, threshold: Double, minTextSize: Int): Option[(String, Double)]

    Classify an individual text to identify its language.

    text

    the text to classify

    threshold

    the minimum score threshold to consider a valid prediction

    minTextSize

    the minimum length for the text to make a prediction

    returns

    a pair of (ISO 639-1 language code, prediction score) if a prediction could be made, otherwise None

  2. abstract def summarize(texts: TraversableOnce[String], threshold: Double, frequency: Double, minTextSize: Int): Vector[(String, Double, Double)]

    Given a list of strings, returns an ordered list of the unique languages identified, as ISO 639-1 language codes.

    This is built to classify many text entries, on the assumption that we only need to know about the most common languages in the texts.

    Change threshold and frequency to deal with outlier data. Increasing threshold increases the confidence of the identified languages, while increasing frequency reduces the impact of minor second-language usage.

    texts

    the texts to classify and summarize

    threshold

    the

    returns

    Vector of 3-tuples (lang-code, avg-lang-classification-score, frequency)
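
The contract of the two abstract members above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `LanguageIdentifierSketch` is a local stand-in that only mirrors the documented signatures, `ToyIdentifier` and its word lists are hypothetical, and the choices to compute frequency over the successfully classified texts and to order results by descending frequency are guesses, since this page does not state them. `TraversableOnce` is kept to match the documented signature and produces deprecation warnings on Scala 2.13 and later.

```scala
// Local stand-in mirroring the two abstract signatures documented above;
// it is NOT the real com.peoplepattern.text.LanguageIdentifier.
trait LanguageIdentifierSketch {
  def classify(text: String, threshold: Double, minTextSize: Int): Option[(String, Double)]
  def summarize(texts: TraversableOnce[String], threshold: Double,
                frequency: Double, minTextSize: Int): Vector[(String, Double, Double)]
}

// Hypothetical toy classifier: the score is the share of tokens found in a
// tiny per-language word list. A real implementation would use a trained model.
object ToyIdentifier extends LanguageIdentifierSketch {
  private val vocab: Vector[(String, Set[String])] = Vector(
    "en" -> Set("the", "and", "is", "of"),
    "es" -> Set("el", "la", "es", "de")
  )

  def classify(text: String, threshold: Double, minTextSize: Int): Option[(String, Double)] = {
    val tokens = text.toLowerCase.split("\\s+").filter(_.nonEmpty)
    // Texts shorter than minTextSize never yield a prediction.
    if (text.length < minTextSize || tokens.isEmpty) None
    else {
      val scored = vocab.map { case (code, words) =>
        (code, tokens.count(words.contains).toDouble / tokens.length)
      }
      val best = scored.reduce((a, b) => if (b._2 > a._2) b else a)
      // Predictions scoring below the threshold are discarded.
      if (best._2 >= threshold) Some(best) else None
    }
  }

  def summarize(texts: TraversableOnce[String], threshold: Double,
                frequency: Double, minTextSize: Int): Vector[(String, Double, Double)] = {
    val predictions = texts.toVector.flatMap(classify(_, threshold, minTextSize))
    predictions
      .groupBy(_._1)
      .toVector
      .map { case (lang, hits) =>
        // (lang-code, average score, share of classified texts): assumed denominator
        (lang, hits.map(_._2).sum / hits.size, hits.size.toDouble / predictions.size)
      }
      .filter(_._3 >= frequency)  // drop languages that occur too rarely
      .sortWith(_._3 > _._3)      // assumed ordering: most frequent first
  }
}
```

In this sketch, raising `threshold` drops low-scoring predictions before they are counted, and raising `frequency` drops languages that appear in only a small share of the texts, which matches the tuning advice given for `summarize`.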

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def classify(text: String): Option[(String, Double)]

    Classify an individual text to identify its language.

    This method should use sensible defaults for the threshold and minTextSize parameters.

    returns

    a pair of (ISO 639-1 language code, prediction score) if a prediction could be made, otherwise None

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  13. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. final def notify(): Unit

    Definition Classes
    AnyRef
  15. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  16. def summarize(texts: TraversableOnce[String]): Vector[(String, Double, Double)]

    Given a list of strings, returns an ordered list of the unique languages identified, as ISO 639-1 language codes.

    This is built to classify many text entries, on the assumption that we only need to know about the most common languages in the texts.

    Change threshold and frequency to deal with outlier data. Increasing threshold increases the confidence of the identified languages, while increasing frequency reduces the impact of minor second-language usage.

    returns

    Vector of 3-tuples (lang-code, avg-lang-classification-score, frequency)

  17. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  18. def toString(): String

    Definition Classes
    AnyRef → Any
  19. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
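
The single-argument `classify(text)` and `summarize(texts)` overloads listed above might relate to the abstract members roughly as follows. This is a sketch under stated assumptions, not the library's code: `LanguageIdentifierWithDefaults`, `StubIdentifier`, and `Demo` are hypothetical names, and the default values (0.5, 0.1, 10) are placeholders because this page does not say what the real defaults are.

```scala
// Stand-in mirroring the documented members. The default values are
// hypothetical; the real defaults are not stated in this page.
trait LanguageIdentifierWithDefaults {
  def classify(text: String, threshold: Double, minTextSize: Int): Option[(String, Double)]
  def summarize(texts: TraversableOnce[String], threshold: Double,
                frequency: Double, minTextSize: Int): Vector[(String, Double, Double)]

  // Convenience overloads delegate to the abstract methods with assumed defaults.
  def classify(text: String): Option[(String, Double)] =
    classify(text, threshold = 0.5, minTextSize = 10)

  def summarize(texts: TraversableOnce[String]): Vector[(String, Double, Double)] =
    summarize(texts, threshold = 0.5, frequency = 0.1, minTextSize = 10)
}

// Stub with canned answers, present only so the calling code can run.
object StubIdentifier extends LanguageIdentifierWithDefaults {
  def classify(text: String, threshold: Double, minTextSize: Int): Option[(String, Double)] =
    if (text.length >= minTextSize) Some(("en", 0.9)).filter(_._2 >= threshold) else None

  def summarize(texts: TraversableOnce[String], threshold: Double,
                frequency: Double, minTextSize: Int): Vector[(String, Double, Double)] = {
    val hits = texts.toVector.flatMap(classify(_, threshold, minTextSize))
    if (hits.isEmpty) Vector.empty
    else Vector(("en", hits.map(_._2).sum / hits.size, 1.0)).filter(_._3 >= frequency)
  }
}

object Demo {
  // Callers handle the Option explicitly instead of assuming a prediction exists.
  def describe(id: LanguageIdentifierWithDefaults, text: String): String =
    id.classify(text) match {
      case Some((code, score)) => s"$code:$score"
      case None                => "unknown"
    }
}
```

Because `classify` returns an `Option`, a text that is too short or scores below the threshold surfaces as `None` rather than as a low-confidence guess, so calling code should always cover that case.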
