Indian language identification for short text

Abstract

Language identification determines the language in which a given document is written. By categorizing content by language, it can improve search results over multilingual documents. In this work, we classify each line of text into a particular language, focusing on short phrases of 2–6 words in 15 Indian languages. The system detects whether a given document is multilingual and identifies which Indian languages it contains. The approach combines an n-gram technique with a list of short distinctive words: the n-gram model is language independent, whereas the short-word method requires less computation. The results show the effectiveness of our approach on synthetic data.
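The abstract does not say exactly how the n-gram model and the short-word list are combined. The sketch below assumes one plausible arrangement and is not the authors' implementation. Each line is first checked against per-language lists of short distinctive words, which is the cheap step. If no list matches, or two languages tie, the line falls back to a character n-gram model with add-one smoothing. A document counts as multilingual when its lines receive different labels. The two-language toy training text, the word lists, the smoothing choice, the tie rule, and every name (`NGramModel`, `identify_line`, `identify_document`) are hypothetical illustrations. The paper itself covers 15 languages.

```python
import math
from collections import Counter

# Hypothetical toy data for illustration only (not the paper's corpus or word lists).
TRAIN = {
    "hindi": "यह एक किताब है और वह घर में है",
    "marathi": "हे एक पुस्तक आहे आणि तो घरी आहे",
}
SHORT_WORDS = {
    "hindi": {"है", "और", "में"},
    "marathi": {"आहे", "आणि"},
}


def char_ngrams(text, n_max=3):
    """Character 1..n_max-grams of each word, padded with spaces to mark word boundaries."""
    grams = []
    for word in text.split():
        padded = f" {word} "
        for n in range(1, n_max + 1):
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


class NGramModel:
    """Per-language character n-gram counts scored with add-one (Laplace) smoothing."""

    def __init__(self, samples, n_max=3):
        self.n_max = n_max
        self.counts = {lang: Counter(char_ngrams(t, n_max)) for lang, t in samples.items()}
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        # Shared vocabulary size, +1 to reserve probability mass for unseen n-grams.
        self.vocab_size = len(set().union(*self.counts.values())) + 1

    def log_prob(self, text, lang):
        counts, total = self.counts[lang], self.totals[lang]
        return sum(
            math.log((counts[g] + 1) / (total + self.vocab_size))
            for g in char_ngrams(text, self.n_max)
        )

    def best(self, text):
        """Language whose smoothed n-gram model gives the text the highest log-probability."""
        return max(self.counts, key=lambda lang: self.log_prob(text, lang))


MODEL = NGramModel(TRAIN)


def identify_line(line):
    """Try the cheap short-word lookup first; fall back to n-grams on no match or a tie."""
    hits = Counter(
        lang for tok in line.split() for lang, words in SHORT_WORDS.items() if tok in words
    )
    ranked = hits.most_common(2)
    if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
        return ranked[0][0]
    return MODEL.best(line)


def identify_document(doc):
    """Label every non-empty line; the document is multilingual if the labels differ."""
    labels = [identify_line(line) for line in doc.splitlines() if line.strip()]
    return labels, len(set(labels)) > 1
```

For example, `identify_document("किताब घर में है\nतो घरी आहे\nघरी")` labels the first two lines through the short-word lists. The last line contains no listed word, so the n-gram model labels it. The result is `(["hindi", "marathi", "marathi"], True)`. Trying the word lookup first follows the abstract's point that the short-word method needs less computation. The n-gram fallback needs only raw training text per language, which keeps it language independent.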

Citation (APA)

Bhaskaran, S., Paul, G., Gupta, D., & Amudha, J. (2021). Indian language identification for short text. In Advances in Intelligent Systems and Computing (Vol. 1086, pp. 47–58). Springer. https://doi.org/10.1007/978-981-15-1275-9_5