STORET is a method for determining river water quality and classifying it into four classes (very good, good, medium, and bad) based on the measured value of each water attribute (feature). The success of a pattern recognition model depends heavily on the quality of the data. This research addresses two data-quality issues: a disproportionate number of samples among the classes (class imbalance) and noise in the attributes. To handle class imbalance, the research integrates the SMOTE technique with bootstrapping. To eliminate attribute noise, it experiments with several filter-based feature selection algorithms (information gain, rule, derivation, correlation, and chi-square). The research proceeds in the following stages: data understanding, pre-processing, class imbalance handling, feature selection, classification, and performance evaluation. Testing with 10-fold cross-validation shows that the SMOTE-bootstrapping technique increases accuracy from 83.3% to 98.8%. Eliminating noisy attributes increases accuracy further, to 99.5%, when the feature subset produced by the information gain algorithm is used with the decision tree classification algorithm.
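The abstract does not spell out how SMOTE and bootstrapping are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each class is first grown to the size of the largest class with SMOTE-style interpolation between nearest minority neighbours, and then resampled with replacement (bootstrapped) to that same size. The function names (`smote`, `bootstrap`, `balance`), the default `k=5`, and the dictionary data layout are hypothetical choices made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest neighbours in the same class."""
    if rng is None:
        rng = random.Random(0)
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def bootstrap(rows, n, rng=None):
    """Draw n rows uniformly at random with replacement."""
    if rng is None:
        rng = random.Random(0)
    return [rng.choice(rows) for _ in range(n)]


def balance(data, k=5, seed=0):
    """Assumed combination: oversample each class to the largest class size
    with SMOTE, then bootstrap each class to that size.
    `data` maps a class label to a list of feature tuples."""
    rng = random.Random(seed)
    target = max(len(rows) for rows in data.values())
    balanced = {}
    for label, rows in data.items():
        grown = rows + smote(rows, target - len(rows), k, rng)
        balanced[label] = bootstrap(grown, target, rng)
    return balanced
```

Calling `balance({"good": [...], "bad": [...]})` on a dictionary of feature tuples returns a dictionary in which every class has as many rows as the largest original class. Because each synthetic point lies on a segment between two real minority samples, it stays within the range of values already observed for that class.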
Uyun, S., & Sulistyowati, E. (2020). Feature selection for multiple water quality status: Integrated bootstrapping and SMOTE approach in imbalance classes. International Journal of Electrical and Computer Engineering, 10(4), 4331–4339. https://doi.org/10.11591/ijece.v10i4.pp4331-4339