Indian language identification for short text

Sreebha Bhaskaran; Geetika Paul; Deepa Gupta; J. Amudha

Conference Proceedings

Advances in Intelligent Systems and Computing (2021) 1086 47-58

DOI: 10.1007/978-981-15-1275-9_5

Abstract

Language identification determines the language in which a given document is written. Knowing the language of the content can improve search results for multilingual documents. In this work, we classify each line of text into a particular language, focusing on short phrases of 2–6 words across 15 Indian languages. The system detects whether a given document is multilingual and identifies the Indian languages it contains. The approach combines an n-gram technique with a list of short distinctive words. The n-gram model is language independent, whereas the short-word method requires less computation. The results show the effectiveness of our approach on synthetic data.
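The abstract does not say how the n-gram model and the short-word list are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes character trigrams, a cheap short-word lookup tried first, and an n-gram overlap score as the fallback. The function names (`char_ngrams`, `build_profile`, `identify`), the two toy languages, and the sample sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Return character n-grams of a line, padded with spaces at both ends."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_profile(samples, n=3, top_k=300):
    """Collect the top_k most frequent n-grams of the training samples."""
    counts = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample, n))
    return {gram for gram, _ in counts.most_common(top_k)}

def identify(line, profiles, short_words, n=3):
    """Label one line: short-word lookup first, n-gram overlap as fallback.

    The ordering of the two steps is an assumption made for this sketch.
    """
    tokens = line.lower().split()
    # Step 1 (cheap): count tokens found in each language's short-word list.
    hits = Counter({lang: sum(t in words for t in tokens)
                    for lang, words in short_words.items()})
    ranked = hits.most_common(2)
    if ranked and ranked[0][1] > 0:
        # Accept only a clear winner; ties fall through to the n-gram step.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            return ranked[0][0]
    # Step 2 (language independent): count n-grams shared with each profile.
    grams = char_ngrams(line, n)
    scores = {lang: sum(g in profile for g in grams)
              for lang, profile in profiles.items()}
    return max(scores, key=scores.get)

# Toy, hand-written training lines (illustrative only, not the paper's data).
TRAINING = {
    "hindi": ["यह मेरा घर है", "तुम कहाँ जा रहे हो"],
    "marathi": ["हे माझे घर आहे", "तू कुठे जात आहेस"],
}
# Hypothetical short distinctive words per language.
SHORT_WORDS = {
    "hindi": {"है", "और", "का"},
    "marathi": {"आहे", "आणि", "हे"},
}
PROFILES = {lang: build_profile(lines) for lang, lines in TRAINING.items()}

def identify_lines(document):
    """Label each non-empty line; more than one label suggests a multilingual document."""
    return [identify(line, PROFILES, SHORT_WORDS)
            for line in document.splitlines() if line.strip()]

if __name__ == "__main__":
    print(identify_lines("वह मेरा दोस्त है\nमाझे घर"))
```

Labelling line by line, as in `identify_lines`, mirrors the abstract's description of classifying each line and then deciding whether the document is multilingual. A real system would need far larger training text per language and a tuned profile size.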

Citation (APA)

Bhaskaran, S., Paul, G., Gupta, D., & Amudha, J. (2021). Indian language identification for short text. In Advances in Intelligent Systems and Computing (Vol. 1086, pp. 47–58). Springer. https://doi.org/10.1007/978-981-15-1275-9_5
