In this paper, we propose algorithms that increase the accuracy of hazardous Web page detection by correcting the detection errors of typical keyword-based algorithms, using the dependency relations between hazardous keywords and their neighboring segments. Most typical text-based filtering systems ignore the context in which hazardous keywords appear. Our algorithms automatically obtain segment pairs that are in a dependency relation and appear to characterize hazardous documents. We also propose a practical approach to expanding these segment pairs with a thesaurus. Experiments with a large number of Web pages show that our algorithms increase the detection F value by 7.3% compared to conventional algorithms. © 2010 Springer-Verlag.
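The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each document has already been reduced to a list of dependency-related segment pairs by some parser, and every name in it (`expand_pairs`, `keyword_detect`, `dependency_detect`, the toy keyword, pairs, and thesaurus) is hypothetical. How the paper actually learns the segment pairs is not described here, so the pairs are simply given by hand.

```python
from itertools import product


def expand_pairs(pairs, thesaurus):
    """Expand each segment pair with the synonyms listed in a thesaurus."""
    expanded = set()
    for left, right in pairs:
        lefts = {left} | set(thesaurus.get(left, ()))
        rights = {right} | set(thesaurus.get(right, ()))
        expanded.update(product(lefts, rights))
    return expanded


def keyword_detect(dependencies, keywords):
    """Baseline: flag a document if any hazardous keyword appears at all."""
    return any(seg in keywords for pair in dependencies for seg in pair)


def dependency_detect(dependencies, keywords, hazardous_pairs):
    """Flag a document only if a keyword occurs inside a known hazardous pair."""
    return any(
        pair in hazardous_pairs and any(seg in keywords for seg in pair)
        for pair in dependencies
    )


# Toy data (assumed for illustration only).
KEYWORDS = {"drug"}
SEED_PAIRS = {("sell", "drug")}
THESAURUS = {"sell": ["deal", "trade"]}
EXPANDED_PAIRS = expand_pairs(SEED_PAIRS, THESAURUS)

# A news-like page that mentions the keyword in a harmless context.
doc_news = [("prevent", "drug"), ("drug", "abuse")]
# A page whose wording matches a hazardous pair only after expansion.
doc_sale = [("deal", "drug"), ("drug", "cheap")]

if __name__ == "__main__":
    for name, doc in (("news", doc_news), ("sale", doc_sale)):
        print(
            name,
            keyword_detect(doc, KEYWORDS),
            dependency_detect(doc, KEYWORDS, SEED_PAIRS),
            dependency_detect(doc, KEYWORDS, EXPANDED_PAIRS),
        )
```

In this toy setting the keyword baseline flags both documents, the pair-based check clears the news-like page, and the thesaurus expansion is what lets the second page be caught even though its wording differs from the seed pair.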
CITATION STYLE
Ikeda, K., Yanagihara, T., Hattori, G., Matsumoto, K., & Takisima, Y. (2010). Hazardous document detection based on dependency relations and thesaurus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6464 LNAI, pp. 455–465). https://doi.org/10.1007/978-3-642-17432-2_46