We present a set of approaches for corpus filtering in the context of external document plagiarism detection. Producing filtered sets, and thereby limiting the problem's search space, can improve performance, and the technique is used today in many real-world applications such as web search engines. In document plagiarism detection, the database of documents against which the suspicious candidate is matched is potentially large, so applying filtered-set generation techniques is highly advisable. The approaches that we have implemented include information retrieval methods and a document similarity measure based on a variant of tf-idf. Furthermore, we perform textual comparisons as well as a semantic similarity analysis in order to capture higher levels of obfuscation. © 2011 Springer-Verlag.
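As a rough illustration of what tf-idf-based corpus filtering can look like, the following minimal sketch ranks source documents by cosine similarity to a suspicious document and keeps only the best matches. It uses a textbook smoothed tf-idf weighting, not the variant the abstract refers to, whose details are not given here. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `filter_corpus`) and the `top_k` parameter are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real system would also
    # strip punctuation and possibly apply stemming (assumption).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf, so terms present in every document keep a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_corpus(suspicious, corpus, top_k=2):
    """Return indices of the top_k corpus documents most similar to `suspicious`."""
    vectors = tfidf_vectors(corpus + [suspicious])
    query = vectors[-1]
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors[:-1])]
    # Highest similarity first; ties broken by corpus order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "stock markets fell sharply today",
        "a cat was sitting on a mat",
        "quantum computing uses qubits",
    ]
    print(filter_corpus("the cat sat on a mat", corpus, top_k=2))
```

Only the documents returned by `filter_corpus` would then be passed to the more expensive textual and semantic comparison stages, which is the point of reducing the search space.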
CITATION STYLE
Micol, D., Ferrández, Ó., & Muñoz, R. (2011). Information retrieval techniques for corpus filtering applied to external plagiarism detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6716 LNCS, pp. 100–111). https://doi.org/10.1007/978-3-642-22327-3_10