Information retrieval techniques for corpus filtering applied to external plagiarism detection

Daniel Micol; Óscar Ferrández; Rafael Muñoz

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6716 LNCS, pp. 100–111 (2011)

DOI: 10.1007/978-3-642-22327-3_10

Abstract

We present a set of approaches for corpus filtering in the context of external document plagiarism detection. Producing filtered sets, and hence limiting the search space of the problem, can improve performance, and this technique is used today in many real-world applications such as web search engines. In document plagiarism detection, the database of documents against which a suspicious candidate must be matched is potentially very large, so applying filtered-set generation techniques becomes highly advisable. The approaches we have implemented include information retrieval methods and a document similarity measure based on a variant of tf-idf. Furthermore, we perform textual comparisons, as well as a semantic similarity analysis, in order to capture higher levels of obfuscation. © 2011 Springer-Verlag.
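As a rough illustration of what corpus filtering means here, the sketch below ranks source documents by similarity to a suspicious document and keeps only the top k as candidates for the detailed comparison stage. The abstract does not specify the tf-idf variant the authors use, so this sketch substitutes plain tf-idf weighting (idf = log(N / df), computed over the source corpus) with cosine similarity. The function names `tokenize`, `build_idf`, `tfidf_vector`, `cosine` and `filter_candidates`, the tokenizer, and the parameter `k` are all hypothetical choices for illustration, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs_tokens):
    # Plain idf = log(N / df), computed over the source corpus only
    # (an assumption; the paper's tf-idf variant is not described here).
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Raw term frequency times idf; terms unseen in the corpus are dropped.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def filter_candidates(suspicious, sources, k):
    # Return indices of the k sources most similar to the suspicious text,
    # breaking ties by source index so the output is deterministic.
    source_tokens = [tokenize(s) for s in sources]
    idf = build_idf(source_tokens)
    source_vecs = [tfidf_vector(toks, idf) for toks in source_tokens]
    query = tfidf_vector(tokenize(suspicious), idf)
    scored = [(cosine(query, vec), i) for i, vec in enumerate(source_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    sources = [
        "the cat sat on the mat",
        "stock markets fell sharply today",
        "a cat sat on a mat quietly",
    ]
    print(filter_candidates("the cat sat on the mat today", sources, k=2))  # → [0, 1]
```

Only the returned candidates would then go on to the more expensive textual and semantic comparisons. Note how source 1 outranks source 2 in this toy corpus: the rare shared word "today" carries more idf weight than the common words shared with source 2, which is the kind of behaviour a tuned tf-idf variant might be designed to adjust.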

Citation (APA):

Micol, D., Ferrández, Ó., & Muñoz, R. (2011). Information retrieval techniques for corpus filtering applied to external plagiarism detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6716 LNCS, pp. 100–111). https://doi.org/10.1007/978-3-642-22327-3_10