An intelligent data pre-processing of complex datasets

Shuzlina Abdul-Rahman; Azuraliza Abu Bakar; Zeti Azura Mohamed-Hussein

Journal Article, Open Access

Intelligent Data Analysis (2012) 16(2) 305-325

DOI: 10.3233/IDA-2012-0525


Abstract

Pre-processing plays a vital role in classification tasks, particularly when complex features are involved, and this demands an intelligent method. In bioinformatics, where datasets typically have complex features, pre-processing is unavoidable. In this paper, we propose a framework that integrates the filter and wrapper approaches to select discriminatory features from protein sequences prior to classification. In the first (filter) phase, several state-of-the-art multivariate filters were explored to remove unwanted features that contributed to noise. In the second (wrapper) phase, particle swarm optimisation (PSO) with a support vector machine (SVM) was adopted to produce an optimal feature subset, and several PSO variants were investigated to identify the most suitable one for the problem domain. The results of both phases were analysed in terms of classification accuracy, number of selected features, modelling time and area under the curve, on the main dataset and on five benchmark machine learning datasets of similar complexity. The proposed framework reliably achieved higher classification accuracy than both the filter phase alone and the full feature set, despite using fewer features. © 2012 - IOS Press and the authors. All rights reserved.
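The two-phase idea in the abstract (a filter that discards noisy features, then a PSO-driven wrapper that searches the remaining ones) can be illustrated with a minimal sketch. This is not the authors' implementation. To keep it dependency-free, a leave-one-out nearest-centroid classifier stands in for the SVM, a simple univariate score stands in for the multivariate filters, and the wrapper uses a basic binary PSO with a sigmoid transfer function, which may differ from the variants studied in the paper. All function names, parameter values, the size penalty and the toy data are hypothetical.

```python
import math
import random

def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked
    features. Assumption: a stand-in for the SVM named in the abstract."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            s = sums.setdefault(y[k], [0.0] * len(idx))
            for p, j in enumerate(idx):
                s[p] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1
        best = min(sums, key=lambda c: sum(
            (X[i][j] - sums[c][p] / counts[c]) ** 2 for p, j in enumerate(idx)))
        correct += best == y[i]
    return correct / len(X)

def filter_phase(X, y, keep):
    """Filter phase (assumed univariate score for two classes): rank features
    by |difference of class means| / overall spread and keep the top `keep`."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        a = [v for v, c in zip(col, y) if c == classes[0]]
        b = [v for v, c in zip(col, y) if c == classes[1]]
        mean = sum(col) / len(col)
        spread = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)) / spread)
    return sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:keep])

def wrapper_phase(X, y, candidates, particles=8, iters=15, seed=0):
    """Wrapper phase: binary PSO over the filtered candidates, scored by
    classifier accuracy minus a small (hypothetical) per-feature penalty."""
    rng = random.Random(seed)
    n, d = len(X[0]), len(candidates)

    def fitness(bits):
        mask = [False] * n
        for b, j in zip(bits, candidates):
            mask[j] = bool(b)
        return centroid_accuracy(X, y, mask) - 0.01 * sum(bits)

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(particles)]
    vel = [[0.0] * d for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pfit = [fitness(p) for p in pos]
    g = max(range(particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for i in range(particles):
            for k in range(d):
                vel[i][k] = (0.7 * vel[i][k]
                             + 1.5 * rng.random() * (pbest[i][k] - pos[i][k])
                             + 1.5 * rng.random() * (gbest[k] - pos[i][k]))
                vel[i][k] = max(-4.0, min(4.0, vel[i][k]))  # clamp velocity
                # sigmoid transfer: velocity sets the probability of bit = 1
                pos[i][k] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][k])) else 0
            f = fitness(pos[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = pos[i][:], f
                if f > gfit:
                    gbest, gfit = pos[i][:], f
    return [j for b, j in zip(gbest, candidates) if b]

def make_toy(n=40, n_features=8, seed=1):
    """Synthetic two-class data: features 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4.0 * label
        row[1] -= 4.0 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy()
kept = filter_phase(X, y, keep=4)        # phase 1: discard noisy features
selected = wrapper_phase(X, y, kept)     # phase 2: search the reduced space
print("after filter:", kept, "after wrapper:", selected)
```

Running the filter first shrinks the search space the swarm has to explore, which is the motivation for combining the two approaches. In practice the stand-in classifier would be replaced by an SVM evaluated with cross-validation.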

Citation (APA)

Abdul-Rahman, S., Bakar, A. A., & Mohamed-Hussein, Z. A. (2012). An intelligent data pre-processing of complex datasets. Intelligent Data Analysis, 16(2), 305–325. https://doi.org/10.3233/IDA-2012-0525
