This paper describes SHAPD2, a new algorithm for identifying common phrase sequences. The SHAPD2 algorithm was designed to compare one corpus against another in a single pass. It is highly efficient, handles large volumes of data, and outperforms other approaches. One possible application is detecting potential plagiarism by comparing a whole corpus against another corpus, rather than a single document against a corpus. This makes SHAPD2 a valuable alternative to existing solutions. © Springer International Publishing Switzerland 2014.
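The abstract does not say how SHAPD2 hashes sentences. The following is therefore only an illustrative sketch of the general idea of hash-based, corpus-to-corpus phrase matching, not the author's algorithm. It swaps in a common baseline, word n-gram hashing. One corpus is indexed once in a hash table keyed by n-grams. The other corpus is then scanned in a single pass, and consecutive matching n-grams are merged into maximal shared sequences. The function names, the regex tokenizer, and the default `n = 3` are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def build_index(corpus, n):
    """Hypothetical helper: map each word n-gram of `corpus` to its occurrences.

    `corpus` is a dict of doc_id -> text. The dict acts as a hash table
    keyed by n-gram tuples; values are lists of (doc_id, token_position).
    """
    index = defaultdict(list)
    for doc_id, text in corpus.items():
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].append((doc_id, i))
    return index


def common_sequences(corpus_a, corpus_b, n=3):
    """Hypothetical sketch (not SHAPD2): find maximal shared phrase sequences.

    corpus_a is indexed once; corpus_b is scanned in a single pass.
    Returns (doc_b, doc_a, start_b, start_a, length_in_tokens) tuples.
    """
    index = build_index(corpus_a, n)
    results = []
    for doc_b, text in corpus_b.items():
        tokens = tokenize(text)
        # (doc_a, pos_a) -> (start_b, start_a) for runs that matched at j - 1
        prev = {}
        for j in range(len(tokens) - n + 1):
            cur = {}
            for doc_a, i in index.get(tuple(tokens[j:j + n]), ()):
                # Continue a run if the previous n-gram matched at i - 1 in doc_a;
                # otherwise start a new run at (j, i).
                cur[(doc_a, i)] = prev.get((doc_a, i - 1), (j, i))
            # A run from the previous step that was not extended has ended.
            for (doc_a, i), (sb, sa) in prev.items():
                if (doc_a, i + 1) not in cur:
                    results.append((doc_b, doc_a, sb, sa, (j - 1) - sb + n))
            prev = cur
        # Flush runs still open at the end of the document.
        last = len(tokens) - n
        for (doc_a, i), (sb, sa) in prev.items():
            results.append((doc_b, doc_a, sb, sa, last - sb + n))
    return results
```

Each result gives the matched document pair, the starting token offset in each document, and the length of the shared sequence in tokens. For example, comparing "yesterday the quick brown fox jumps high" against "the quick brown fox jumps over the lazy dog" with `n = 3` reports one five-token shared phrase starting at token 1 of the first text and token 0 of the second.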
Ceglarek, D. (2014). Single-pass corpus to corpus comparison by sentence hashing. Studies in Computational Intelligence, 513, 167–176. https://doi.org/10.1007/978-3-319-01787-7_16