Hazardous document detection based on dependency relations and thesaurus

Kazushi Ikeda; Tadashi Yanagihara; Gen Hattori; Kazunori Matsumoto; Yasuhiro Takisima

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6464 LNAI, pp. 455–465, 2010

DOI: 10.1007/978-3-642-17432-2_46


Abstract

In this paper, we propose algorithms that increase the accuracy of hazardous Web page detection by using the dependency relations between hazardous keywords and their neighboring segments to correct the detection errors of typical keyword-based algorithms. Most text-based filtering systems ignore the context in which hazardous keywords appear. Our algorithms automatically extract segment pairs that stand in a dependency relation and appear to characterize hazardous documents. We also propose a practical approach to expanding these segment pairs with a thesaurus. Experiments on a large number of Web pages show that our algorithms increase the detection F-measure by 7.3% compared with the conventional algorithms. © 2010 Springer-Verlag.
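The abstract does not spell out the algorithms, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions, not the authors' implementation. It assumes each document has already been parsed into (keyword segment, neighboring segment) dependency pairs, that the thesaurus is a flat word-to-class dictionary, and that a pair is kept when one class (hazardous or harmless) accounts for at least `min_ratio` of its occurrences. All function names, thresholds, and the toy English data are hypothetical. The sketch starts from a keyword baseline, lets learned context pairs override it, and compares both with the F-measure.

```python
"""Toy sketch: correcting a keyword filter with dependency-pair context.

Assumptions (hypothetical, not from the paper): documents are pre-parsed into
(keyword_segment, neighbor_segment) pairs, the thesaurus is a flat
word -> class dict, and pairs are selected by a simple ratio threshold.
"""
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (segment holding a keyword, segment in a dependency relation with it)


def normalize(pair: Pair, thesaurus: Dict[str, str]) -> Pair:
    """Thesaurus expansion: replace the neighbor with its class, if it has one."""
    keyword, neighbor = pair
    return keyword, thesaurus.get(neighbor, neighbor)


def keyword_pairs(pairs: List[Pair], keywords: Set[str], thesaurus: Dict[str, str]) -> Set[Pair]:
    """Normalized pairs whose first segment is a hazardous keyword."""
    return {normalize(p, thesaurus) for p in pairs if p[0] in keywords}


def learn_pairs(docs: List[Tuple[List[Pair], bool]], keywords: Set[str],
                thesaurus: Dict[str, str], min_ratio: float = 0.8,
                min_count: int = 2) -> Tuple[Set[Pair], Set[Pair]]:
    """Split keyword pairs into those typical of hazardous vs. harmless documents."""
    hazardous, harmless = Counter(), Counter()
    for pairs, is_hazardous in docs:
        # Count each pair at most once per document.
        (hazardous if is_hazardous else harmless).update(keyword_pairs(pairs, keywords, thesaurus))
    bad, good = set(), set()
    for pair in hazardous.keys() | harmless.keys():
        h, n = hazardous[pair], harmless[pair]
        if h + n < min_count:
            continue  # too rare to trust
        if h / (h + n) >= min_ratio:
            bad.add(pair)
        elif n / (h + n) >= min_ratio:
            good.add(pair)
    return bad, good


def keyword_baseline(pairs: List[Pair], keywords: Set[str]) -> bool:
    """Conventional filter: hazardous if any keyword appears at all."""
    return any(p[0] in keywords for p in pairs)


def classify(pairs: List[Pair], keywords: Set[str], bad: Set[Pair],
             good: Set[Pair], thesaurus: Dict[str, str]) -> bool:
    """Keyword baseline, corrected by the contexts the keywords appear in."""
    if not keyword_baseline(pairs, keywords):
        return False
    context = keyword_pairs(pairs, keywords, thesaurus)
    if context & bad:
        return True
    if context & good:
        return False  # keyword occurs only in contexts learned as harmless
    return True  # unknown context: fall back to the baseline


def f_measure(predicted: List[bool], gold: List[bool]) -> float:
    """Harmonic mean of precision and recall for the 'hazardous' class."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data.
KEYWORDS = {"drug"}
THESAURUS = {"buy": "ACQUIRE", "purchase": "ACQUIRE", "prevent": "AVOID", "avoid": "AVOID"}
TRAIN = [([("drug", "buy")], True), ([("drug", "purchase")], True),
         ([("drug", "prevent")], False), ([("drug", "avoid")], False)]
TEST = [([("drug", "purchase"), ("site", "open")], True), ([("drug", "avoid")], False),
        ([("weather", "sunny")], False), ([("drug", "sell")], True)]

bad, good = learn_pairs(TRAIN, KEYWORDS, THESAURUS)
gold = [label for _, label in TEST]
baseline_preds = [keyword_baseline(pairs, KEYWORDS) for pairs, _ in TEST]
corrected_preds = [classify(pairs, KEYWORDS, bad, good, THESAURUS) for pairs, _ in TEST]

print("baseline F =", round(f_measure(baseline_preds, gold), 3))    # baseline F = 0.8
print("corrected F =", round(f_measure(corrected_preds, gold), 3))  # corrected F = 1.0
```

In this toy run, "purchase" never occurs in the training data next to "drug" in that exact form, but the thesaurus maps it to the same class as "buy", so the learned pair still fires. The harmless "avoid" context removes the baseline's false positive. The numbers illustrate the mechanism only and say nothing about the paper's reported 7.3% improvement.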


Citation (APA)

Ikeda, K., Yanagihara, T., Hattori, G., Matsumoto, K., & Takisima, Y. (2010). Hazardous document detection based on dependency relations and thesaurus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6464 LNAI, pp. 455–465). https://doi.org/10.1007/978-3-642-17432-2_46
