Developing an adaptive language model for Bahasa Indonesia

Satria Nur Hidayatullah; Suyanto

Journal Article (Open Access)

International Journal of Advanced Computer Science and Applications (2019) 10(1) 488-492

DOI: 10.14569/IJACSA.2019.0100163

9 Citations

23 Readers

Abstract

A language model is one of the key components of a speech recognition system. It is commonly built with a statistical method called n-gram. However, a standard n-gram cannot be used for general domains, where sentences carry many ambiguous meanings. This paper focuses on developing an adaptive n-gram language model for Bahasa Indonesia. First, a text corpus of ten million distinct sentences is crawled from hundreds of websites, including news sites, magazines, personal blogs, and writing forums. The corpus is then used to construct an adaptive language model using Latent Dirichlet Allocation (LDA) trained with Collapsed Gibbs Sampling (CGS). Compared to the standard n-gram, the adaptive language model performs better at selecting words to produce the best sentence.
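As a rough illustration of the approach the abstract describes, the sketch below trains LDA with collapsed Gibbs sampling on documents given as lists of word ids, then mixes a topic-based word probability with an n-gram probability. The abstract does not specify how the paper combines the two, so the linear interpolation, the function names (`train_lda_cgs`, `topic_mixture`, `adaptive_prob`), the hyperparameters (`alpha`, `beta`, `lam`), and the toy vocabulary are all assumptions made for illustration, not the authors' implementation.

```python
import random


def train_lda_cgs(docs, num_topics, vocab_size, iters=200,
                  alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns phi[k][w] = P(word w | topic k)."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # counts n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # counts n_kw
    topic_total = [0] * num_topics                              # counts n_k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]


def topic_mixture(history, phi, alpha=0.1):
    """Estimate topic weights from the words seen so far (a simple heuristic)."""
    num_topics = len(phi)
    theta = [alpha] * num_topics
    for w in history:
        column = [phi[k][w] for k in range(num_topics)]
        total = sum(column)
        for k in range(num_topics):
            theta[k] += column[k] / total
    norm = sum(theta)
    return [t / norm for t in theta]


def adaptive_prob(word, history, phi, ngram_prob, lam=0.5):
    """Assumed scheme: linear interpolation of n-gram and topic-based probabilities."""
    theta = topic_mixture(history, phi)
    p_topic = sum(theta[k] * phi[k][word] for k in range(len(phi)))
    return lam * ngram_prob + (1.0 - lam) * p_topic


# Toy vocabulary (hypothetical): three sports words, three finance words.
vocab = ["bola", "gol", "tim", "bank", "uang", "saham"]
toy_docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
toy_phi = train_lda_cgs(toy_docs, num_topics=2, vocab_size=len(vocab))
# Probability of "gol" after a history of "bola tim", with a placeholder n-gram score.
print(adaptive_prob(vocab.index("gol"), [0, 2], toy_phi, ngram_prob=0.2))
```

In this sketch the history shifts the topic weights, so words that share a topic with the preceding context receive a higher score than a context-independent n-gram alone would give them. A real system would replace the placeholder `ngram_prob` with a smoothed n-gram estimate.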

Cite

Hidayatullah, S. N., & Suyanto. (2019). Developing an adaptive language model for Bahasa Indonesia. International Journal of Advanced Computer Science and Applications, 10(1), 488–492. https://doi.org/10.14569/IJACSA.2019.0100163
