Feature extraction for document text using Latent Dirichlet Allocation

P. M. Prihatini; I. K. Suryawan; I. N. Mandia

Conference Proceedings, Open Access

Journal of Physics: Conference Series (2018) 953(1)

DOI: 10.1088/1742-6596/953/1/012047

Abstract

Feature extraction is one of the stages in an information retrieval system; it is used to extract the unique feature values of a text document. Feature extraction can be done by several methods, one of which is Latent Dirichlet Allocation. However, research on text feature extraction using the Latent Dirichlet Allocation method is rarely found for Indonesian text. Therefore, this research implements text feature extraction for Indonesian text. The research method consists of data acquisition, text pre-processing, initialization, topic sampling, and evaluation. The evaluation compares the Precision, Recall, and F-Measure values of Latent Dirichlet Allocation with those of Term Frequency Inverse Document Frequency KMeans, which is commonly used for feature extraction. The evaluation results show that the Precision, Recall, and F-Measure values of the Latent Dirichlet Allocation method are higher than those of the Term Frequency Inverse Document Frequency KMeans method. This shows that the Latent Dirichlet Allocation method is able to extract features and cluster Indonesian text better than the Term Frequency Inverse Document Frequency KMeans method.
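
The "initialization" and "topic sampling" stages named in the abstract can be illustrated with a minimal sketch. The abstract does not state which inference procedure the authors use; collapsed Gibbs sampling is assumed here because it matches those two stage names. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy token lists are all hypothetical and are not taken from the paper. The returned per-document topic proportions play the role of the extracted feature vectors.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed inference method, not confirmed by the paper).

    docs is a list of token lists that are already pre-processed.
    Returns (theta, doc_topic, topic_word), where theta[d] is the topic
    proportion vector of document d, usable as its feature vector.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Initialization: give every token a random topic and record the counts.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Topic sampling: resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][n]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions (the extracted features).
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, doc_topic, topic_word


# Toy, made-up Indonesian tokens for illustration only (not the paper's data).
docs = [
    ["harga", "saham", "pasar", "saham"],
    ["pasar", "harga", "ekonomi"],
    ["gol", "pemain", "liga", "gol"],
    ["liga", "pemain", "pelatih"],
]
theta, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of `theta` sums to 1, so a document could be assigned to the cluster of its highest-probability topic before Precision, Recall, and F-Measure are computed. That assignment rule is one plausible reading of the evaluation step, not a detail given in the abstract.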

Citation (APA)

Prihatini, P. M., Suryawan, I. K., & Mandia, I. N. (2018). Feature extraction for document text using Latent Dirichlet Allocation. In Journal of Physics: Conference Series (Vol. 953). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/953/1/012047
