Language detection and tracking in multilingual documents using weak estimators

Aleksander Stensby; B. John Oommen; Ole Christoffer Granmo

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6218 LNCS 600-609

DOI: 10.1007/978-3-642-14980-1_59

Abstract

This paper deals with the extremely complicated problem of language detection and tracking in real-life electronic applications (for example, Word-of-Mouth (WoM) applications), where various segments of the text are written in different languages. The difficulties in solving the problem are manifold. First of all, the analyst has no knowledge of when one language stops and when the next starts. Further, the features which one uses for any one language (for example, the n-grams) will not be valid to recognize another. Finally, and most importantly, in most real-life applications, such as in WoM, the fragments of text available before the switching are so small that classification using traditional estimation methods is rendered almost meaningless. Earlier, the authors of [10] had recommended that for a variety of problems, the use of strong estimators (i.e., estimators that converge with probability 1) is sub-optimal. In this vein, we propose to solve the current problem using novel estimators that are pertinent for non-stationary environments. The classification results, which involve as many as 8 languages, demonstrate that our proposed methodology is both powerful and efficient. © 2010 Springer-Verlag Berlin Heidelberg.

Cite

Stensby, A., Oommen, B. J., & Granmo, O. C. (2010). Language detection and tracking in multilingual documents using weak estimators. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6218 LNCS, pp. 600–609). https://doi.org/10.1007/978-3-642-14980-1_59
