A comparative study on dimensionality reduction between principal component analysis and k-means clustering

Abstract

The curse of dimensionality and the empty space phenomenon have emerged as critical problems in text classification. One way of dealing with these problems is to apply a feature selection technique before building a classification model. Such a technique helps to reduce time complexity and can sometimes increase classification accuracy. This study introduces a feature selection technique based on K-Means clustering to overcome the weaknesses of traditional feature selection techniques such as principal component analysis (PCA), which require considerable time to transform all the input data. The proposed technique decides which features to retain based on the significance value of each feature within a cluster. The study found that K-Means clustering helps to increase the efficiency of the KNN model for a large data set, while the KNN model without feature selection is suitable for a small data set. A comparison between K-Means clustering and PCA as feature selection techniques shows that the proposed technique outperforms PCA, especially in terms of computation time. Hence, K-Means clustering is found to be helpful in reducing data dimensionality with lower time complexity than PCA, without affecting the accuracy of the KNN model for high-frequency data.
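The following is a minimal sketch of the comparison the abstract describes, assuming scikit-learn. The abstract does not specify the exact "significance value" used to pick a feature from each cluster, so this sketch clusters the features (columns) and keeps the feature nearest each cluster centroid as one plausible interpretation; the dataset, number of clusters, and number of retained dimensions are illustrative assumptions, not the authors' settings.

# Hedged sketch: PCA vs. K-Means-based feature selection before a KNN classifier.
# Dataset, k, and n_keep are illustrative assumptions; the "nearest-to-centroid"
# selection rule stands in for the paper's unspecified significance criterion.
import time
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
n_keep = 16  # assumed number of retained components / features

def knn_accuracy(Xtr, Xte):
    # Fit a 5-NN classifier on the reduced training data and score on test data.
    knn = KNeighborsClassifier(n_neighbors=5).fit(Xtr, y_train)
    return knn.score(Xte, y_test)

# PCA baseline: transform all input features into n_keep components.
t0 = time.perf_counter()
pca = PCA(n_components=n_keep).fit(X_train)
acc_pca = knn_accuracy(pca.transform(X_train), pca.transform(X_test))
t_pca = time.perf_counter() - t0

# K-Means feature selection: cluster the feature vectors (columns of X_train),
# then retain one representative feature per cluster.
t0 = time.perf_counter()
km = KMeans(n_clusters=n_keep, n_init=10, random_state=0).fit(X_train.T)
selected = []
for c in range(n_keep):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X_train.T[members] - km.cluster_centers_[c], axis=1)
    selected.append(members[np.argmin(dists)])
acc_km = knn_accuracy(X_train[:, selected], X_test[:, selected])
t_km = time.perf_counter() - t0

print(f"PCA:     accuracy={acc_pca:.3f}  time={t_pca:.3f}s")
print(f"K-Means: accuracy={acc_km:.3f}  time={t_km:.3f}s")

Unlike PCA, the K-Means route keeps original features rather than linear combinations of them, which is why it avoids transforming every input vector at prediction time; the printed timings give a rough sense of the computation-time comparison the abstract refers to.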

Citation (APA)

Noor Mathivanan, N. M., MdGhani, N. A., & Janor, R. M. (2019). A comparative study on dimensionality reduction between principal component analysis and k-means clustering. Indonesian Journal of Electrical Engineering and Computer Science, 16(2), 752–758. https://doi.org/10.11591/ijeecs.v16.i2.pp752-758
