Abstract
A language model is an important component of a speech recognition system and is commonly built with a statistical method called n-gram. However, a standard n-gram model does not work well for general domains, where the semantics of sentences are often ambiguous. This paper focuses on developing an adaptive n-gram language model for Bahasa Indonesia. First, a text corpus of ten million distinct sentences is crawled from hundreds of websites, including news, magazines, personal blogs, and writing forums. The corpus is then used to construct an adaptive language model using Latent Dirichlet Allocation (LDA) trained with Collapsed Gibbs Sampling (CGS). Compared to the standard n-gram, the adaptive language model performs better at selecting words to produce the best sentence.
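The abstract names LDA trained with collapsed Gibbs sampling but does not spell out the procedure, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes documents are lists of integer word ids; the function names (`lda_collapsed_gibbs`, `topic_unigram`, `interpolate`), the hyperparameters `alpha` and `beta`, and the linear-interpolation weight `lam` are all hypothetical choices made for illustration. In particular, mixing the n-gram probability with a topic-based unigram is one common way to adapt a language model and may differ from what the paper does.

```python
import random


def lda_collapsed_gibbs(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (docs: lists of integer word ids)."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                              # tokens per topic
    assign = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimates of topic-word (phi) and document-topic (theta) distributions.
    phi = [[(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(n_topics)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    return theta, phi


def topic_unigram(theta_d, phi, w):
    """Probability of word w under a document's topic mixture."""
    return sum(t * p[w] for t, p in zip(theta_d, phi))


def interpolate(p_ngram, p_topic, lam=0.5):
    """Assumed adaptation rule: linear mix of n-gram and topic probabilities."""
    return lam * p_ngram + (1.0 - lam) * p_topic
```

In such a setup, `theta` for the current context would shift probability toward words from the active topics, and `interpolate` would combine that estimate with the background n-gram score for each candidate word.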
Cite
Hidayatullah, S. N., & Suyanto. (2019). Developing an adaptive language model for Bahasa Indonesia. International Journal of Advanced Computer Science and Applications, 10(1), 488–492. https://doi.org/10.14569/IJACSA.2019.0100163