A New Synthetic Oversampling Method Using Ontology and Feature Selection in Order to Improve Imbalanced Textual Data Classification in Persian Texts

Jafar Pouramini; Behrouz Minaei-Bidgoli

Journal Article (Open Access)

Bulletin de la Société Royale des Sciences de Liège (2016) 358-375

DOI: 10.25518/0037-9565.5414

Abstract

The ever-growing volume of textual data has increased the need to process it automatically. Class imbalance is one of the factors that reduce the effectiveness of text classification. Various methods have been proposed to address the imbalance problem, including data-based, cost-based, algorithm-based, and feature selection methods. Recent research has also considered ensemble-based methods. This study proposes a new oversampling method. In the proposed method, the number of minority-class samples is first increased using an ontology, and random oversampling is then applied to the minority class. Finally, appropriate features are selected using feature selection methods. The new ensemble method was tested on the Hamshahri data. The results show that, on the Hamshahri collection, the ensemble method improves classification results for polynomial Naïve Bayes and decision tree classifiers despite reducing the number of features.
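The three stages named in the abstract (ontology-based synthesis, random oversampling, feature selection) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the ontology, how ontology relations generate new samples, or which feature selection scores are used. The toy `ONTOLOGY` dictionary, the term-substitution rule, the document-frequency-difference score, and the English placeholder tokens (standing in for Persian text) are all hypothetical, and the classifiers themselves are omitted.

```python
import random
from collections import Counter

# Hypothetical toy ontology: term -> related terms. The ontology actually used
# in the paper is not described in the abstract.
ONTOLOGY = {"football": ["soccer", "sport"], "match": ["game"]}

def ontology_variants(tokens, ontology):
    """Build synthetic documents by swapping one term for an ontology-related term."""
    variants = []
    for i, tok in enumerate(tokens):
        for related in ontology.get(tok, []):
            variants.append(tokens[:i] + [related] + tokens[i + 1:])
    return variants

def oversample(docs, labels, minority, ontology, rng):
    """Stage 1: ontology-based synthesis. Stage 2: random oversampling."""
    minority_docs = [d for d, y in zip(docs, labels) if y == minority]
    majority_count = sum(1 for y in labels if y != minority)
    synthetic = [v for d in minority_docs for v in ontology_variants(d, ontology)]
    pool = minority_docs + synthetic
    # Duplicate random minority samples until the minority class is at least
    # as large as the majority class (assumed balancing target).
    while len(pool) < majority_count:
        pool.append(rng.choice(pool))
    extra = pool[len(minority_docs):]
    return docs + extra, labels + [minority] * len(extra)

def select_features(docs, labels, minority, k):
    """Stage 3: keep the k terms whose document-frequency rate differs most
    between the two classes (a simple stand-in for the paper's selectors)."""
    df_min, df_maj = Counter(), Counter()
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    for d, y in zip(docs, labels):
        (df_min if y == minority else df_maj).update(set(d))
    vocab = set(df_min) | set(df_maj)

    def score(term):
        return abs(df_min[term] / n_min - df_maj[term] / n_maj)

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]

# Toy imbalanced corpus: one minority document, six majority documents.
docs = [
    ["football", "match", "today"],
    ["market", "price", "today"],
    ["market", "bank", "rate"],
    ["price", "rate", "today"],
    ["bank", "market", "price"],
    ["rate", "bank", "today"],
    ["market", "rate", "price"],
]
labels = ["sports"] + ["economy"] * 6

rng = random.Random(0)
balanced_docs, balanced_labels = oversample(docs, labels, "sports", ONTOLOGY, rng)
features = select_features(balanced_docs, balanced_labels, "sports", k=5)
```

In this sketch the single minority document yields three ontology-derived variants, and random duplication supplies the rest until both classes hold six documents. The reduced vocabulary in `features` would then be passed to a classifier such as Naïve Bayes or a decision tree.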

Cite

Pouramini, J., & Minaei-Bidgoli, B. (2016). A New Synthetic Oversampling Method Using Ontology and Feature Selection in Order to Improve Imbalanced Textual Data Classification in Persian Texts. Bulletin de La Société Royale Des Sciences de Liège, 358–375. https://doi.org/10.25518/0037-9565.5414
