Identifying redundant features using unsupervised learning for high-dimensional data

Asir Antony Gnana Singh Danasingh; Appavu alias Balamurugan Subramanian; Jebamalar Leavline Epiphany

Journal Article, Open Access

SN Applied Sciences (2020) 2(8)

DOI: 10.1007/s42452-020-3157-6

18 Citations

47 Readers

Abstract

In the digital era, classifiers play a vital role in machine learning applications such as medical diagnosis, weather prediction and pattern recognition. Classifiers are built from data by classification algorithms. Today's data are often high dimensional, since advances in information and communication technology generate data on a massive scale. A high-dimensional feature space contains irrelevant and redundant features; both reduce classification accuracy and increase the storage space and build time of the classifiers. The relevancy and redundancy analysis mechanisms of the feature selection process remove the irrelevant and redundant features. Identifying irrelevant features is a simple task, since it only considers the relevancy between each feature and the target class of a dataset, using any one of the statistical or information-theoretic measures. Identifying redundant features is considerably harder, especially in high-dimensional space, since it must consider the relevancy among the features themselves. This leads to higher computational complexity, and an inappropriate relevancy measure can degrade the classification accuracy. To overcome these problems, this paper presents an unsupervised learning-based redundancy analysis mechanism for feature selection, evaluating various clustering techniques in terms of average redundancy rate and runtime.
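The idea of finding redundant features by clustering can be illustrated with a small sketch. The abstract does not specify which clustering techniques or relevancy measure the paper uses, so everything below is an assumption made for illustration: the absolute Pearson correlation stands in for the feature-to-feature relevancy measure, a simple threshold-based leader grouping stands in for the clustering step, and the names `pearson`, `group_redundant_features`, `select_representatives` and the `threshold` parameter are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no correlation signal
    return cov / math.sqrt(vx * vy)


def group_redundant_features(columns, threshold=0.9):
    """Group features whose |correlation| with a cluster leader is high.

    columns: dict mapping feature name -> list of values.
    Assumption: the first feature placed in a cluster acts as its leader.
    No class labels are used, so the grouping is unsupervised.
    """
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            leader = cluster[0]
            if abs(pearson(values, columns[leader])) >= threshold:
                cluster.append(name)  # redundant with this cluster's leader
                break
        else:
            clusters.append([name])  # starts a new cluster
    return clusters


def select_representatives(clusters):
    """Keep one feature (the leader) per cluster; the rest are redundant."""
    return [cluster[0] for cluster in clusters]


# Toy data, invented for illustration only.
columns = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],       # roughly 2 * f1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],       # weakly related to f1
    "f4": [-1.0, -2.0, -3.0, -4.0, -5.0],  # exactly -f1
}
clusters = group_redundant_features(columns)
selected = select_representatives(clusters)
print(clusters)  # [['f1', 'f2', 'f4'], ['f3']]
print(selected)  # ['f1', 'f3']
```

Comparing each feature only against cluster leaders avoids computing every pairwise correlation, which is one way to limit the cost that the abstract attributes to redundancy analysis in high-dimensional space. The result depends on feature order and on the chosen threshold.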

Cite

Danasingh, A. A. G. S., Subramanian, A. alias B., & Epiphany, J. L. (2020). Identifying redundant features using unsupervised learning for high-dimensional data. SN Applied Sciences, 2(8). https://doi.org/10.1007/s42452-020-3157-6
