Imbalanced web spam classification using self-labeled techniques and multi-classifier models

Xiaonan Fang; Yanyan Tan; Xiyuan Zheng; Huaxiang Zhang; Shuang Zhou

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9403 663-668

DOI: 10.1007/978-3-319-25159-2_60

Abstract

Web spam has become a critical problem in web search. Highly imbalanced class distributions and large numbers of unlabeled instances degrade classifier performance. In this paper, we address the severe class imbalance of web spam within a semi-supervised learning framework. First, we introduce the self-labeled techniques and the multi-classifier models. Second, we describe the imbalance of web spam data sets and propose five combination methods. In particular, we propose several improved self-labeled methods that apply the classic over-sampling technique SMOTE in the pre-processing stage to balance the uneven labeled sets. Further, given the severe imbalance of web spam, we introduce the AUC value into semi-supervised classification. Experiments on the WEBSPAM UK2007 data set indicate that our methods achieve better recall and AUC values.
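
The abstract does not specify the self-labeled methods or the five combination methods, so the following is only a minimal sketch of the general pipeline it describes: over-sample the labeled minority (spam) class with SMOTE, run a self-labeling loop over the unlabeled pool, and evaluate with AUC. All function names here are hypothetical, the nearest-centroid scorer is a stand-in for whatever base classifiers the paper uses, and the toy data is synthetic rather than WEBSPAM UK2007.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Return n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours (needs >= 2 points)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def spam_score(x, spam, normal):
    """Stand-in base classifier: positive when x is closer to the spam centroid."""
    return math.dist(x, centroid(normal)) - math.dist(x, centroid(spam))


def self_train(spam, normal, unlabeled, rounds=5, per_round=2):
    """Self-labeling loop: each round pseudo-labels the most confidently
    scored unlabeled points and adds them to the labeled sets."""
    spam, normal, pool = list(spam), list(normal), list(unlabeled)
    for _ in range(rounds):
        scored = sorted(pool, key=lambda x: spam_score(x, spam, normal))
        new_normal = [x for x in scored[:per_round] if spam_score(x, spam, normal) < 0]
        new_spam = [x for x in scored[-per_round:] if spam_score(x, spam, normal) > 0]
        if not new_normal and not new_spam:
            break  # nothing left to pseudo-label
        pool = [x for x in pool if x not in new_normal and x not in new_spam]
        spam += new_spam
        normal += new_normal
    return spam, normal


def auc(scores, labels):
    """AUC as the fraction of (spam, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def demo():
    rng = random.Random(0)
    blob = lambda cx, cy, n: [(rng.gauss(cx, 1), rng.gauss(cy, 1)) for _ in range(n)]
    spam, normal = blob(3, 3, 3), blob(0, 0, 12)       # imbalanced labeled set
    unlabeled = blob(3, 3, 5) + blob(0, 0, 20)
    spam += smote(spam, len(normal) - len(spam), rng=rng)  # balance before self-labeling
    spam, normal = self_train(spam, normal, unlabeled)
    test_x = blob(3, 3, 5) + blob(0, 0, 20)
    test_y = [1] * 5 + [0] * 20
    print("AUC:", auc([spam_score(x, spam, normal) for x in test_x], test_y))


if __name__ == "__main__":
    demo()
```

AUC is used here instead of accuracy because, with few spam examples, a classifier that labels everything as normal can still score high accuracy while ranking spam no better than chance.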

Cite

Fang, X., Tan, Y., Zheng, X., Zhang, H., & Zhou, S. (2015). Imbalanced web spam classification using self-labeled techniques and multi-classifier models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9403, pp. 663–668). Springer Verlag. https://doi.org/10.1007/978-3-319-25159-2_60
