Imbalanced web spam classification using self-labeled techniques and multi-classifier models

Abstract

Web spam has become a critical problem in the web search area. Unfortunately, a highly imbalanced class distribution and a large number of unlabeled instances degrade the performance of classifiers. In this paper, we focus on addressing the severe class imbalance of web spam within the semi-supervised learning framework. First, we introduce self-labeled techniques and the multi-classifier model. Second, we describe the imbalance of web spam data sets and propose five combination methods. In particular, we propose several improved self-labeled methods that apply the classic over-sampling technique SMOTE in the pre-processing stage to balance the uneven labeled sets. Further, given the severe imbalance of web spam, we introduce the AUC value into semi-supervised classification. Experiments on WEBSPAM-UK2007 indicate that our methods achieve better performance in both recall and AUC.
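To make the two ingredients named in the abstract concrete, here is a minimal sketch of SMOTE-style over-sampling and of the AUC measure. It is not the authors' implementation: the function names `smote` and `auc`, the parameter choices, and the toy feature vectors are assumptions made for illustration, and the self-labeling and multi-classifier steps are left out entirely.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core idea of
    SMOTE). Assumes at least two minority samples."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def auc(labels, scores):
    """AUC as the probability that a random positive is scored above a
    random negative; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labeled set with two features; spam (label 1) is the minority.
spam = [[0.9, 0.8], [0.8, 0.9], [1.0, 1.0]]
normal = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
          [0.3, 0.0], [0.2, 0.2], [0.1, 0.1]]

# Balance the labeled set before any self-labeled method would be applied.
balanced_spam = spam + smote(spam, n_new=len(normal) - len(spam), k=2)
```

In this sketch the balanced labeled set would then be handed to a self-labeled method, and the resulting classifier scored with `auc` rather than accuracy, since accuracy can look high on imbalanced data even when most spam is missed.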

Cite

Fang, X., Tan, Y., Zheng, X., Zhang, H., & Zhou, S. (2015). Imbalanced web spam classification using self-labeled techniques and multi-classifier models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9403, pp. 663–668). Springer Verlag. https://doi.org/10.1007/978-3-319-25159-2_60