High dimensional data always lead to overfitting in the prediction model. There are many feature selection methods used to reduce dimensionality. However, previous studies in this area of research have reported that an imbalanced class raises another issue in the prediction model. The existence of the imbalanced class can lead to low accuracy in the minority class. Therefore, high dimensional data with imbalanced class not only increase the computational cost but also reduce the accuracy of the prediction model. Handling imbalanced class in high dimensional data is still not widely reported in the literature. The objective of the study is to increase the performance of the prediction model. We increased the sample size using the Synthetic Minority Oversampling Technique (SMOTE) and performing the dimension reduction using minimum redundancy and maximum relevance criteria. The support vector machine (SVM) classifier was used to build the prediction model. The leukaemia dataset was used in this study due to its high dimensionality and imbalanced class. Consistent with the literature, the result shows that the performance of the shortlisted features is better than those without undergoing the SMOTE. In conclusion, a better classification result can be achieved when high dimensional feature selection coupled with the oversampling method. However, there are certain drawbacks associated with the use of a constant amount of synthesis of SMOTE, further study on different amounts of synthesis might provide different performances.
CITATION STYLE
Chin, F. Y., Lim, C. A., & Lem, K. H. (2021). Handling leukaemia imbalanced data using synthetic minority oversampling technique (SMOTE). In Journal of Physics: Conference Series (Vol. 1988). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1988/1/012042
Mendeley helps you to discover research relevant for your work.