Hazardous document detection based on dependency relations and thesaurus

Abstract

In this paper, we propose algorithms that increase the accuracy of hazardous Web page detection by correcting the detection errors of typical keyword-based algorithms, using the dependency relations between hazardous keywords and their neighboring segments. Most typical text-based filtering systems ignore the context in which the hazardous keywords appear. Our algorithms automatically obtain segment pairs that are in dependency relations and appear to characterize hazardous documents. We also propose a practical approach to expanding these segment pairs with a thesaurus. Experiments with a large number of Web pages show that our algorithms increase the detection F value by 7.3% compared to conventional algorithms. © 2010 Springer-Verlag.
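
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the sketch assumes documents arrive already parsed into ordered (keyword segment, related segment) dependency pairs, and every keyword, pair, synonym, and function name below is hypothetical toy data chosen for illustration.

```python
from itertools import product

# Hypothetical toy data; none of it comes from the paper.
HAZARDOUS_KEYWORDS = {"kill"}
# Ordered pairs: (keyword segment, segment in a dependency relation with it).
HAZARDOUS_PAIRS = {("kill", "person")}
THESAURUS = {"kill": {"murder"}, "person": {"human", "someone"}}


def expand_pairs(pairs, thesaurus):
    """Expand each segment pair with thesaurus synonyms of either member."""
    expanded = set()
    for left, right in pairs:
        lefts = {left} | thesaurus.get(left, set())
        rights = {right} | thesaurus.get(right, set())
        expanded.update(product(lefts, rights))
    return expanded


def keyword_detect(dependencies, keywords):
    """Baseline: flag a document if any segment is a hazardous keyword."""
    return any(seg in keywords for pair in dependencies for seg in pair)


def dependency_detect(dependencies, hazardous_pairs):
    """Flag a document only if one of its dependency pairs is a hazardous pair."""
    return any(pair in hazardous_pairs for pair in dependencies)


# Assumed pre-parsed documents (a real system would use a dependency parser).
doc_idiom = [("kill", "time")]         # keyword present, harmless context
doc_synonym = [("murder", "someone")]  # no listed keyword, hazardous context

expanded_pairs = expand_pairs(HAZARDOUS_PAIRS, THESAURUS)
```

In this toy setup, the keyword baseline flags `doc_idiom` and misses `doc_synonym`, while matching against `expanded_pairs` does the opposite. That is the kind of error correction the abstract describes, although how the paper actually obtains and scores segment pairs is not stated here.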

Cite

Ikeda, K., Yanagihara, T., Hattori, G., Matsumoto, K., & Takisima, Y. (2010). Hazardous document detection based on dependency relations and thesaurus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6464 LNAI, pp. 455–465). https://doi.org/10.1007/978-3-642-17432-2_46
