Clustering algorithms such as k-Means fail to function appropriately when used to analyze high-dimensional data. Therefore, to achieve good clustering, dimensionality reduction by feature selection or feature extraction is needed. Principal Component Analysis (PCA) is the extraction method most often used; however, its reduction results are not very good, giving low clustering quality and lengthy processing time. It is therefore necessary to study other methods as alternatives to PCA. This study compared the results of clustering Indonesian text documents whose dimensions had been reduced by PCA, Self-Organizing Map (SOM), and Isometric Feature Mapping (Isomap). Clustering quality was measured using the Davies-Bouldin Index, together with computational time and number of iterations. The results show that SOM tends to improve cluster quality, scoring 269.084% better than k-Means, while Isomap speeds up clustering computation time by a factor of 190. In addition, the qualitative results identify the most appropriate extraction method for reducing the clustering features of Indonesian-language text documents.
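The Davies-Bouldin Index named above as the quality measure can be illustrated with a minimal sketch. It assumes the standard definition: for each cluster, take the worst-case ratio of summed within-cluster scatter to centroid separation, then average over clusters, so lower values indicate better clustering. The function name `davies_bouldin_index` and the toy points are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def davies_bouldin_index(points, labels):
    """Return the Davies-Bouldin Index of a hard clustering (lower is better).

    Assumes at least two clusters and that no two centroids coincide.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Centroid of each cluster: component-wise mean of its members.
    centroids = {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean Euclidean distance from members to centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each cluster, keep the worst (largest) similarity to any other cluster.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (hypothetical): the same cluster shapes, far apart versus close together.
labels = [0, 0, 1, 1]
well_separated = [(0, 0), (0, 2), (10, 0), (10, 2)]
poorly_separated = [(0, 0), (0, 2), (2, 0), (2, 2)]

dbi_far = davies_bouldin_index(well_separated, labels)
dbi_near = davies_bouldin_index(poorly_separated, labels)
```

In this toy case the well-separated clustering receives the lower index, which is the direction of improvement when comparing reduced and unreduced feature sets.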
Jambak, M. I., Jambak, A. I. I., Febrianto, R. T., Saputra, D. M., & Jambak, M. I. (2021). Dimension reduction with extraction methods (principal component analysis - Self organizing map - Isometric mapping) in Indonesian language text documents clustering. In Advances in Intelligent Systems and Computing (Vol. 1179 AISC, pp. 1–9). Springer. https://doi.org/10.1007/978-3-030-49336-3_1