Microarray technology measures the expression of tens of thousands of genes on a large, parallel scale. It has been widely applied to predict gene function, identify new subtypes of specific tumors, and classify cancers. However, microarray data are known to have characteristics such as high dimensionality, small sample sizes, high noise, and imbalanced class distributions. Support Vector Machine (SVM) has been widely used and has shown success in many applications to improve classification performance. To overcome the high dimensionality, we applied the Ensemble-SVM (EnSVM) method, which groups the features using hierarchical clustering and classifies each group. Imbalanced data also pose a problem for classification, because the classifier tends to predict the majority class rather than the minority class. Therefore, Random Undersampling (the EnSVM-RUS method) is used to reduce the size of the majority class to that of the minority class. We use threefold cross-validation with the Fast Correlation Based Filter (FCBF) feature selection method, and the multiclass method is SVM One Against One (OAO). Classification performance is evaluated by accuracy, F-score, G-mean, and running time. We performed a simulation study with imbalance ratio (IR) scenarios of 1, 5, and 8 to assess the performance of the proposed method, and applied it to real microarray DNA data with IR values of 4.22, 15.00, and 23.17. The results showed that the EnSVM-RUS-OAO method with 2 clusters had higher performance than the EnSVM-OAO method. Increasing the imbalance ratio does not affect the advantage of the EnSVM-RUS-OAO method over the EnSVM-OAO method. Regarding the kernel, the RBF and polynomial kernels produce higher performance and shorter computation time than the linear kernel.
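The abstract names the imbalance ratio, random undersampling, and the accuracy, F-score, and G-mean criteria, but not their exact formulas. The following is a minimal, hypothetical sketch in plain Python of those pieces only, under stated assumptions: IR is taken as the largest class size divided by the smallest; multiclass RUS is assumed to shrink every class to the size of the smallest class; G-mean is assumed to be the geometric mean of per-class recalls; and F-score is assumed to be the macro-averaged F1. These are common definitions, not necessarily the ones used in the paper, and all function names are illustrative. The sketch does not implement the hierarchical-clustering ensemble, FCBF, or the SVM classifier itself.

```python
import math
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Assumed IR definition: largest class size divided by smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(X, y, seed=0):
    """Hypothetical multiclass RUS: randomly drop rows (without replacement) so
    every class keeps as many rows as the smallest class, which is kept whole."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall (true-positive rate) of each class present in y_true."""
    recalls = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = tp / sum(1 for t in y_true if t == c)
    return recalls


def g_mean(y_true, y_pred):
    """Assumed G-mean: geometric mean of per-class recalls (0 if any class is never recognized)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1 / len(recalls))


def macro_f1(y_true, y_pred):
    """Assumed F-score: unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels (not the paper's data) with class sizes 40 / 8 / 5, so IR = 40 / 5.
    y = ["A"] * 40 + ["B"] * 8 + ["C"] * 5
    X = [[i] for i in range(len(y))]  # one dummy feature per row
    print(imbalance_ratio(y))  # 8.0
    X_bal, y_bal = random_undersample(X, y)
    print(sorted(Counter(y_bal).items()))  # [('A', 5), ('B', 5), ('C', 5)]

    # A classifier that always predicts the majority class scores well on
    # accuracy but gets a G-mean of zero.
    y_true = ["A"] * 9 + ["B"]
    y_pred = ["A"] * 10
    print(accuracy(y_true, y_pred), g_mean(y_true, y_pred))  # 0.9 0.0
```

The last example shows why a metric such as G-mean is useful alongside accuracy on imbalanced data: it penalizes a classifier that ignores a minority class. Undersampling balances the classes by discarding majority-class rows, which is one reason it is often combined with an ensemble rather than used with a single classifier.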
Rahmi, N. S. (2020). Ensemble-support vector machine-random undersampling: Simulation study of multiclass classification for handling high dimensional and imbalanced data. In Journal of Physics: Conference Series (Vol. 1613). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1613/1/012064