Feature selection in high-dimensional data

Abstract

Today, with the growth of data dimensionality, many challenges arise in fields including machine learning, informatics, and medicine. Reducing data dimension is a basic method for handling high-dimensional data, because many existing operations become easier to apply once the number of dimensions is reduced. Microarray data are derived from tissues and cells and capture differences in gene expression, which can be useful for diagnosing diseases and tumors. Because microarray datasets contain a large number of features (genes) and a small number of samples, selecting the most salient genes is a difficult task. Among the many machine learning techniques, feature selection and data classification play a very important and widespread role in enhancing human life, from detecting emotion in speech to detecting illness in the body. In medicine, effective gene selection can greatly improve the prediction and diagnosis of cancer. After the effective genes are selected, a classifier is usually tasked with discriminating healthy people from cancer patients based on their expression of the selected genes. A large body of feature selection methods has been proposed for high-dimensional microarray data. Traditionally, these methods fall into three categories: filter, wrapper, and hybrid approaches. In addition, newer techniques such as ensemble methods have recently been developed to improve feature selection and classification. This chapter presents an overview of the most popular feature selection methods for high-dimensional data and analyzes their performance under different conditions. The chapter starts with a global overview of high-dimensional data and feature selection (Sects. 5.2 and 5.3). Then, in Sect. 5.4, we review state-of-the-art filter algorithms. In the next three sections (Sects. 5.5, 5.6, and 5.7), we describe wrapper, hybrid, and embedded methods, with an overview of related work in each. Sect. 5.8 describes the ensemble techniques recently considered by researchers and summarizes the work based on them. In Sect. 5.9, we present experimental results of the most significant methods on high-dimensional data. Finally, Sect. 5.10 summarizes the chapter.
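The select-genes-then-classify pipeline described in the abstract can be illustrated with a small filter-method sketch. It assumes a two-class expression matrix (rows are samples, columns are genes). It scores each gene independently with a Welch t-statistic, keeps the top k genes, and classifies with a nearest-centroid rule. Neither the scoring statistic nor the classifier is taken from the chapter; the function names and the toy data are hypothetical and for illustration only.

```python
import math
from statistics import mean, variance


def t_scores(X, y):
    """Filter score per gene: |Welch t-statistic| between class 0 and class 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        # Sample variances (n - 1 denominator); epsilon avoids division by zero.
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b)) + 1e-12
        scores.append(abs(mean(a) - mean(b)) / se)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring genes, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_fit(X, y, features):
    """Per-class mean expression restricted to the selected genes."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids


def nearest_centroid_predict(centroids, x, features):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    xs = [x[j] for j in features]
    return min(
        centroids,
        key=lambda lab: sum((a - c) ** 2 for a, c in zip(xs, centroids[lab])),
    )


# Hypothetical toy data: 6 samples x 4 genes; label 0 = healthy, 1 = patient.
X = [
    [0.9, 1.2, 0.1, 5.0],
    [1.1, 0.8, 0.2, 4.0],
    [1.0, 1.0, 0.0, 6.0],
    [1.0, 1.1, 2.1, 6.5],
    [0.8, 0.9, 1.9, 5.5],
    [1.2, 1.0, 2.0, 6.0],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_top_k(t_scores(X, y), k=2)
centroids = nearest_centroid_fit(X, y, selected)
print(selected)  # genes 2 and 3 separate the classes best
print(nearest_centroid_predict(centroids, [1.0, 1.0, 1.8, 6.2], selected))
```

Because a filter scores each gene without consulting the classifier, it is cheap even with thousands of genes. A wrapper approach would instead evaluate candidate gene subsets by the accuracy of the classifier itself, which is more expensive but accounts for interactions between genes.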


Rouhi, A., & Nezamabadi-Pour, H. (2020). Feature selection in high-dimensional data. In Advances in Intelligent Systems and Computing (Vol. 1123, pp. 85–128). Springer. https://doi.org/10.1007/978-3-030-34094-0_5