Comparison of text-similarity metrics for the purpose of identifying identical web pages during automated web application testing

Abstract

The paper evaluates the effectiveness of several algorithms for assessing text similarity. The purpose of the evaluation is to determine the best methods for comparing web pages and identifying near-identical ones. Such comparison is in turn a prerequisite for building new automated testing tools and security scanners. The goal is to build scanners that can automatically test web application behavior across a large range of supplied parameters (a technique known as fuzzing). Such testing requires generating and processing requests on a massive scale, which in turn requires fast page comparison methods. The similarity comparison is performed on a shortened, tokenized version of each web page, using a test set of pages downloaded from popular websites. A methodology for evaluating similarity metrics is proposed, together with a quality metric for the intended task. Several tokenization strategies are also tested, and their impact on the final result is assessed.
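
The idea of comparing shortened, tokenized pages can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not name the tokenization strategies or similarity metrics that were evaluated, so the tag-only tokenizer, the two metrics (Jaccard index and the `difflib` sequence ratio), the sample pages, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagTokenizer(HTMLParser):
    """Hypothetical tokenizer: keeps tag names only and drops text and attributes."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)

    def handle_endtag(self, tag):
        self.tokens.append("/" + tag)


def tokenize(html):
    """Return the shortened token sequence for one page."""
    parser = TagTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def jaccard(a, b):
    """Set-based similarity: shared distinct tokens over all distinct tokens."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty pages are treated as identical
    return len(sa & sb) / len(sa | sb)


def sequence_ratio(a, b):
    """Order-aware similarity based on matching blocks of tokens."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def are_near_identical(a, b, threshold=0.9):
    """Illustrative decision rule; the threshold value is an assumption."""
    return sequence_ratio(a, b) >= threshold


# Made-up sample pages: a and b differ only in text, c has a different structure.
page_a = "<html><body><h1>Hello</h1><p>First visit</p></body></html>"
page_b = "<html><body><h1>Hello</h1><p>Second visit</p></body></html>"
page_c = "<html><body><form><input><input></form></body></html>"

tokens_a, tokens_b, tokens_c = map(tokenize, (page_a, page_b, page_c))
print(are_near_identical(tokens_a, tokens_b))
print(are_near_identical(tokens_a, tokens_c))
print(jaccard(tokens_a, tokens_c), sequence_ratio(tokens_a, tokens_c))
```

Because the tokenizer discards text content, two responses that differ only in displayed text map to the same token sequence, while a structurally different response scores lower. A fuzzing scanner could use such a check to group responses cheaply, at the cost of missing differences that appear only in the text.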

Citation (APA)

Zachara, M., & Pałka, D. (2016). Comparison of text-similarity metrics for the purpose of identifying identical web pages during automated web application testing. In Advances in Intelligent Systems and Computing (Vol. 430, pp. 25–35). Springer Verlag. https://doi.org/10.1007/978-3-319-28561-0_3
