Classify an individual text by its language.
the text to classify
the minimum score a prediction must reach to be considered valid
the minimum length the text must have for a prediction to be attempted
a pair of an ISO 639-1 language code and a prediction score if a prediction could be made, otherwise None
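The two cut-offs described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the names `classify`, `Scorer`, `min_length`, and `toy_scorer` are hypothetical, and the scorer is a stand-in for whatever language model actually produces the prediction.

```python
from typing import Callable, Optional, Tuple

# Hypothetical scorer type (an assumption for this sketch): maps a text to
# a pair of (ISO 639-1 language code, prediction score).
Scorer = Callable[[str], Tuple[str, float]]


def classify(
    text: str,
    scorer: Scorer,
    threshold: float,
    min_length: int,
) -> Optional[Tuple[str, float]]:
    """Return (code, score), or None if no valid prediction could be made."""
    # Texts that are too short are not classified at all.
    if len(text.strip()) < min_length:
        return None
    code, score = scorer(text)
    # Predictions below the score threshold are treated as invalid.
    if score < threshold:
        return None
    return (code, score)


def toy_scorer(text: str) -> Tuple[str, float]:
    # Toy stand-in for a real model: confident "en" if the word "the" occurs,
    # otherwise a low-confidence "fr".
    if "the" in text.lower().split():
        return ("en", 0.9)
    return ("fr", 0.4)
```

With this sketch, a text shorter than `min_length` and a text whose score falls below `threshold` both yield None, so the caller cannot tell the two cases apart; that matches the single "otherwise None" return described above.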
Classify an individual text by its language.
This method should use sensible defaults for the threshold and minTextScore parameters.
a pair of an ISO 639-1 language code and a prediction score if a prediction could be made, otherwise None
Given a list of strings, returns an ordered list of the unique languages identified, as ISO 639-1 language codes.
This is built to classify many text entries, on the assumption that we only need to know about the most common languages in the texts.
Adjust threshold and frequency to deal with outlier data. Increasing threshold raises the confidence required of identified languages, while increasing frequency reduces the impact of minor second-language usage.
the texts to classify and summarize
the
Vector of 3-tuples (lang-code, avg-lang-classification-score, frequency)
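The aggregation described above might look roughly like the sketch below. It rests on several assumptions that the documentation does not confirm: that frequency is the share of all input texts assigned to a language, that the frequency parameter is a minimum share, and that the result is ordered by descending frequency. The names `summarize_languages`, `min_frequency`, and `lookup_scorer` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical scorer type (an assumption for this sketch): maps a text to
# a pair of (ISO 639-1 language code, prediction score).
Scorer = Callable[[str], Tuple[str, float]]


def summarize_languages(
    texts: Iterable[str],
    scorer: Scorer,
    threshold: float,
    min_frequency: float,
) -> List[Tuple[str, float, float]]:
    """Return (lang-code, average score, frequency) rows, most frequent first."""
    scores: Dict[str, List[float]] = defaultdict(list)
    total = 0
    for text in texts:
        total += 1
        code, score = scorer(text)
        # Only predictions that reach the threshold count towards a language.
        if score >= threshold:
            scores[code].append(score)
    if total == 0:
        return []
    rows = [
        (code, sum(vals) / len(vals), len(vals) / total)
        for code, vals in scores.items()
    ]
    # Assumption: languages below the minimum share are dropped as outliers.
    rows = [row for row in rows if row[2] >= min_frequency]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Toy stand-in for a real model, used only to exercise the sketch.
_TOY_SCORES = {
    "hello": ("en", 1.0),
    "world": ("en", 0.5),
    "bonjour": ("fr", 0.75),
    "??": ("de", 0.25),
}


def lookup_scorer(text: str) -> Tuple[str, float]:
    return _TOY_SCORES[text]
```

In this sketch, raising `threshold` discards low-confidence predictions before they are counted, while raising `min_frequency` drops languages that appear in only a small share of the texts.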
Given a list of strings, returns an ordered list of the unique languages identified, as ISO 639-1 language codes.
This is built to classify many text entries, on the assumption that we only need to know about the most common languages in the texts.
Adjust threshold and frequency to deal with outlier data. Increasing threshold raises the confidence required of identified languages, while increasing frequency reduces the impact of minor second-language usage.
Vector of 3-tuples (lang-code, avg-lang-classification-score, frequency)