Developing an adaptive language model for Bahasa Indonesia

Abstract

A language model is an important component of a speech recognition system. It is commonly built with a statistical method called the n-gram. However, a standard n-gram model is ill-suited to general domains, where sentences are often semantically ambiguous. This paper develops an adaptive n-gram language model for Bahasa Indonesia. First, a text corpus of ten million distinct sentences is crawled from hundreds of websites covering news, magazines, personal blogs, and writing forums. The corpus is then used to construct an adaptive language model using Latent Dirichlet Allocation (LDA) trained with Collapsed Gibbs Sampling (CGS). Compared to the standard n-gram, the adaptive language model performs better at selecting the words that produce the best sentence.
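
To make the LDA with Collapsed Gibbs Sampling step concrete, here is a minimal sketch of a collapsed Gibbs sampler on a toy corpus. It is not the authors' implementation: the function name `lda_collapsed_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the six-word vocabulary are all assumptions chosen for illustration. The abstract does not say how the resulting topics are combined with the n-gram, so that step is left out.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper).

    docs is a list of documents, each a list of integer word ids.
    Returns (phi, doc_topic): the estimated word distribution per topic
    and the final per-document topic counts.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments),
                # known only up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimate of each topic's word distribution.
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return phi, doc_topic


# Illustrative vocabulary (assumed, not taken from the paper's corpus).
vocab = ["harga", "pasar", "saham", "gol", "tim", "laga"]
docs = [[0, 1, 2, 0, 2], [3, 4, 5, 3, 4], [1, 2, 0, 1], [4, 5, 3, 5]]
phi, doc_topic = lda_collapsed_gibbs(docs, num_topics=2, vocab_size=len(vocab))
```

Each row of `phi` is a probability distribution over the vocabulary for one topic, and each row of `doc_topic` counts how many tokens of a document were assigned to each topic. A real system would run this over the full crawled corpus, with many more topics and tuned hyperparameters.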

Cite

Hidayatullah, S. N., & Suyanto. (2019). Developing an adaptive language model for Bahasa Indonesia. International Journal of Advanced Computer Science and Applications, 10(1), 488–492. https://doi.org/10.14569/IJACSA.2019.0100163
