An automatic intelligent language classifier

Abstract

This paper presents a novel sentence-based language classifier that takes a sentence as input and produces a confidence value for each target language. The classifier combines Unicode-based features with a neural network. Three features (Unicode, exclusive Unicode, and word matching score) are extracted from the sentence and fed to the neural network, which outputs the final confidence values. The word matching score is computed by matching the words of the input sentence against a common word list for each target language; each list holds the most frequently used words of that language, collected statistically and stored in a database. Preliminary experiments used test samples from web documents in languages such as English, German, Polish, French, Spanish, Chinese, Japanese and Korean. A classification accuracy of 98.88% was achieved on a small database. © 2009 Springer Berlin Heidelberg.
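The abstract does not give the exact formulas, so the following Python is only a minimal sketch of how two such features might be computed. It assumes the word matching score is the fraction of words in the sentence that appear in a language's common word list, and that a Unicode feature could be the fraction of characters falling inside a script's code point range. The word lists, the `HANGUL_SYLLABLES` range choice, and all function names are illustrative and not taken from the paper.

```python
import re

# Hypothetical, tiny common-word lists. The paper collects its lists
# statistically; these few entries are for illustration only.
COMMON_WORDS = {
    "english": {"the", "and", "of", "to", "is", "in"},
    "german": {"der", "die", "und", "das", "ist", "nicht"},
    "french": {"le", "la", "et", "les", "est", "des"},
}

# Unicode block for precomposed Hangul syllables (U+AC00..U+D7A3).
HANGUL_SYLLABLES = (0xAC00, 0xD7A3)


def word_matching_score(sentence, common_words):
    """Assumed definition: share of the sentence's words found in the list."""
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in common_words)
    return hits / len(words)


def unicode_fraction(sentence, code_range):
    """Assumed definition: share of non-space characters inside code_range."""
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        return 0.0
    low, high = code_range
    inside = sum(1 for c in chars if low <= ord(c) <= high)
    return inside / len(chars)


def word_scores(sentence):
    """Word matching score of one sentence against every language list."""
    return {
        lang: word_matching_score(sentence, words)
        for lang, words in COMMON_WORDS.items()
    }


if __name__ == "__main__":
    print(word_scores("The cat is in the house"))
    print(unicode_fraction("한국어", HANGUL_SYLLABLES))
```

In the classifier described above, feature values of this kind would be passed to the neural network rather than compared directly; the sketch stops at feature extraction because the abstract does not describe the network's architecture.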

Cite

Verma, B., Lee, H., & Zakos, J. (2009). An automatic intelligent language classifier. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5507 LNCS, pp. 639–646). https://doi.org/10.1007/978-3-642-03040-6_78