Training datasets collection and evaluation of feature selection methods for web content filtering


Abstract

This paper focuses on the main aspects of developing a high-quality system for dynamic content filtering: the collection of meaningful training data and feature selection techniques. The Web changes rapidly, so the classifier needs to be re-trained regularly. The problem of training data collection is treated as a special case of focused crawling, and a simple, easy-to-tune technique is proposed, implemented, and tested. The proposed feature selection technique aims to minimize the feature set size without loss of accuracy and to take the interlinked nature of the Web into account. This is essential to make a content filtering solution fast and non-burdensome for end users, especially when content filtering is performed on resource-constrained hardware. An evaluation and comparison of various classifiers and techniques is provided.

Cite

Suvorov, R., Sochenkov, I., & Tikhomirov, I. (2014). Training datasets collection and evaluation of feature selection methods for web content filtering. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8722, 129–138. https://doi.org/10.1007/978-3-319-10554-3_12
