Training datasets collection and evaluation of feature selection methods for web content filtering

Roman Suvorov; Ilya Sochenkov; Ilya Tikhomirov

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8722 129-138

DOI: 10.1007/978-3-319-10554-3_12

Abstract

This paper focuses on the main aspects of developing a high-quality system for dynamic content filtering. These aspects include the collection of meaningful training data and feature selection techniques. The Web changes rapidly, so the classifier needs to be re-trained regularly. The problem of training data collection is treated as a special case of focused crawling. A simple, easy-to-tune technique was proposed, implemented and tested. The proposed feature selection technique tends to minimize the size of the feature set without loss of accuracy and takes into account the interlinked nature of the Web. This is essential for making a content filtering solution fast and non-burdensome for end users, especially when filtering is performed on resource-constrained hardware. Various classifiers and techniques are evaluated and compared.
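The abstract does not name the feature scoring functions that were compared, nor does it describe the proposed technique in detail. As a point of reference only, here is a minimal sketch of chi-square term scoring with top-k selection, a common filter-style baseline for text classification. It is not the authors' method. It assumes each page is represented as the set of terms it contains and carries a binary label (True = page belongs to the category being filtered). The function names `chi_square_scores` and `select_top_k` and the toy data are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Illustrative baseline only; NOT the feature selection technique proposed in the paper.


def chi_square_scores(docs: list[set[str]], labels: list[bool]) -> dict[str, float]:
    """Chi-square score of each term for a binary class (True = target category).

    Each document is assumed to be the set of terms it contains (presence only).
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_pos: Counter[str] = Counter()  # positive documents containing each term
    df_all: Counter[str] = Counter()  # all documents containing each term
    for terms, is_pos in zip(docs, labels):
        df_all.update(terms)
        if is_pos:
            df_pos.update(terms)

    scores: dict[str, float] = {}
    for term, df in df_all.items():
        a = df_pos[term]   # positive docs with the term
        b = df - a         # negative docs with the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term carries no class information here.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(docs: list[set[str]], labels: list[bool], k: int) -> list[str]:
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical toy corpus: three "gambling" pages and three other pages.
docs = [
    {"casino", "bet", "win"},
    {"casino", "poker"},
    {"bet", "odds", "casino"},
    {"news", "weather"},
    {"recipe", "win"},
    {"news", "sport", "odds"},
]
labels = [True, True, True, False, False, False]

print(select_top_k(docs, labels, 3))  # → ['casino', 'bet', 'news']
```

Chi-square also ranks terms that are strongly associated with the negative class (here "news") highly, because their absence is informative too. A smaller k directly reduces the feature set that has to be evaluated on the client, which is the concern the abstract raises for restricted hardware.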

Citation (APA)

Suvorov, R., Sochenkov, I., & Tikhomirov, I. (2014). Training datasets collection and evaluation of feature selection methods for web content filtering. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8722, 129–138. https://doi.org/10.1007/978-3-319-10554-3_12