A language model estimates which words are more or less likely to occur in conversation or text in a natural language. N-gram language modeling is the pioneering technique for constructing language models; it predicts the upcoming word from the preceding words only. Factored language modeling is a formalism that also draws on other linguistic knowledge about each word, such as gender, number, part of speech, and stem, alongside the word itself when predicting the next word in a sentence. This paper discusses the effect of various combinations of these linguistic features on the predictability of the next word in Hindi sentences. It also discusses how the use of linguistic features reduces perplexity by 31.71% relative to the perplexity of a baseline N-gram language model.
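To make the baseline and the reported metric concrete, the following is a minimal sketch of a bigram (N = 2) language model and its perplexity. It is not the authors' implementation: the add-one smoothing, the function names, and the toy romanized sentences are all assumptions chosen for illustration, and the abstract does not say which N-gram order or smoothing method the paper uses.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their histories over tokenized sentences."""
    history_counts = Counter()   # how often each word appears as a history
    bigram_counts = Counter()    # how often each (previous, current) pair appears
    vocab = set()                # words the model can predict, plus end marker
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded[1:])
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab

def bigram_prob(prev, cur, history_counts, bigram_counts, vocab):
    """P(cur | prev) with add-one smoothing (an assumption, for simplicity)."""
    return (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))

def perplexity(sentences, history_counts, bigram_counts, vocab):
    """Perplexity = 2 ** (average negative log2 probability per predicted token)."""
    log_sum = 0.0
    n_tokens = 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            p = bigram_prob(prev, cur, history_counts, bigram_counts, vocab)
            log_sum += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_sum / n_tokens)

def relative_reduction(baseline, improved):
    """Percentage reduction of `improved` perplexity relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline

if __name__ == "__main__":
    # Toy, invented sentences; not data from the paper.
    train = [["ladka", "ghar", "jata", "hai"], ["ladki", "ghar", "jati", "hai"]]
    test = [["ladka", "ghar", "jata", "hai"]]
    hc, bc, vocab = train_bigram(train)
    print("perplexity:", perplexity(test, hc, bc, vocab))
```

A factored model would extend the conditioning context of `bigram_prob` from the previous word alone to a bundle of factors for that word (for example its stem, part of speech, gender, and number), and `relative_reduction` is the kind of comparison behind a figure such as the 31.71% quoted above.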
Babhulgaonkar, A. R., & Sonavane, S. P. (2020). Application of Linguistic Knowledge in Factored Language Modeling for Hindi Language. In Advances in Intelligent Systems and Computing (Vol. 1025, pp. 521–531). Springer. https://doi.org/10.1007/978-981-32-9515-5_50