Over the last few years, research on web spam filtering has gained interest from both academia and industry. In this context, although a good number of successful antispam techniques are available (i.e., content-based, link-based, and hiding techniques), an adequate combination of different algorithms, supported by an advanced web spam filtering platform, would offer more promising results. To this end, we propose the WSF2 framework, a new platform particularly suitable for filtering spam content on web pages. Currently, our framework allows the easy combination of different filtering techniques including, but not limited to, regular expressions and well-known classifiers (e.g., Naïve Bayes, Support Vector Machines, and C5.0). Applying our WSF2 framework to the publicly available WEBSPAM-UK2007 corpus, we demonstrate that a simple combination of different techniques can improve on the accuracy of single classifiers in web spam detection. As a result, we conclude that the proposed filtering platform is a powerful tool for boosting applied research in this area.
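The idea of combining heterogeneous filters can be illustrated with a minimal sketch. It is not the WSF2 API: the `Filter` type, the function names, the weights, and the threshold below are all hypothetical, and a toy keyword-ratio scorer stands in for a trained classifier such as Naïve Bayes or an SVM. The sketch only shows how a regular-expression rule and a classifier-like score might be merged into one weighted decision.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical interface: a filter maps page text to a spam score in [0, 1].
Filter = Callable[[str], float]


def regex_filter(pattern: str) -> Filter:
    """Score 1.0 if the pattern occurs in the text, else 0.0."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda text: 1.0 if compiled.search(text) else 0.0


def keyword_ratio_filter(keywords: List[str]) -> Filter:
    """Toy stand-in for a trained classifier (assumption, not a real model):
    the fraction of tokens that belong to a spam vocabulary."""
    vocab = {k.lower() for k in keywords}

    def score(text: str) -> float:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        if not tokens:
            return 0.0
        return sum(t in vocab for t in tokens) / len(tokens)

    return score


def combine(filters: List[Tuple[Filter, float]],
            threshold: float = 0.5) -> Callable[[str], bool]:
    """Weighted average of filter scores, compared against a threshold."""
    total = sum(weight for _, weight in filters)
    if total <= 0:
        raise ValueError("filter weights must sum to a positive value")

    def is_spam(text: str) -> bool:
        score = sum(f(text) * weight for f, weight in filters) / total
        return score >= threshold

    return is_spam


# Illustrative configuration; patterns, vocabulary, and weights are made up.
is_spam = combine(
    [
        (regex_filter(r"\bcheap\s+pills\b"), 2.0),
        (keyword_ratio_filter(["casino", "viagra", "pills", "cheap"]), 1.0),
    ],
    threshold=0.5,
)

spam_verdict = is_spam("Buy cheap pills online now")
ham_verdict = is_spam("University library opening hours")
```

In this configuration the regular expression carries twice the weight of the keyword scorer, so a direct pattern hit is enough to cross the threshold on its own. A real deployment would replace the toy scorer with a model trained on a labeled corpus.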
Fdez-Glez, J., Ruano-Ordás, D., Laza, R., Méndez, J. R., Pavón, R., & Fdez-Riverola, F. (2016). WSF2: A Novel Framework for Filtering Web Spam. Scientific Programming, 2016. https://doi.org/10.1155/2016/6091385