This study compares two unsupervised learning methods for topic extraction and modeling in large-scale text corpora: Singular Value Decomposition (SVD) and Latent Dirichlet Allocation (LDA). SVD extracts important features by decomposing the term-document matrix, while LDA identifies hidden topics from the probability distribution of words. The workflow comprises data collection, exploratory data analysis (EDA), topic extraction using SVD, data preprocessing, and topic extraction using LDA. EDA was conducted to understand the characteristics and structure of the corpora before topic extraction was performed. The results show that both methods successfully extracted and modeled the main topics of the corpora: SVD revealed cohesive patterns and thematically related topics, and LDA recovered hidden topics from word probability distributions. The resulting topic representations can support information mining, document categorization, and more in-depth text analysis. The study has limitations, however: the success of both methods depends on the quality and representativeness of the corpora, and interpreting the extracted topics still requires further analysis. Future research can develop methods and techniques that improve the accuracy and efficiency of topic extraction and text corpus modeling.
CITATION
Henderi, Hayadi, B. H., Sofiana, S., Padeli, & Setiyadi, D. (2023). Unsupervised Learning Methods for Topic Extraction and Modeling in Large-scale Text Corpora using LSA and LDA. Journal of Applied Data Sciences, 4(3), 103–118. https://doi.org/10.47738/jads.v4i3.102