Automatic generation of summary obfuscation corpus for plagiarism detection

Abstract

In this paper, we describe an approach for creating a summary obfuscation corpus for the task of plagiarism detection. Our method draws on English-language data from the Document Understanding Conferences (DUC) of 2001 and 2006. In general, an unattributed summary used within someone else's document is considered a kind of plagiarism, because the original author's main ideas remain present in succinct form. To create the corpus, we use a Named Entity Recognizer (NER) to identify the entities within an original document, its associated summaries, and the target documents. These entities, together with similar paragraphs in the target documents, are then used to build artificial suspicious documents and plagiarized documents. The corpus was tested in a plagiarism detection competition.
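
The abstract only outlines the pipeline, so the following Python sketch is one possible reading of it, not the authors' implementation. It assumes that a summary is matched to the target paragraph sharing the most named entities and is then placed directly after that paragraph to form a suspicious document. The names extract_entities, most_similar_paragraph and make_suspicious_document are hypothetical, the capitalized-word heuristic is a crude stand-in for a real NER, and Jaccard overlap is an assumed similarity measure.

```python
import re

# Common capitalized sentence openers that the heuristic should not treat as entities.
STOPWORDS = {"the", "a", "an", "in", "on", "it", "this", "that", "we", "they"}


def extract_entities(text):
    """Hypothetical stand-in for a NER: capitalized words minus common openers."""
    candidates = re.findall(r"\b[A-Z][a-z]+\b", text)
    return {c.lower() for c in candidates if c.lower() not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar_paragraph(summary, paragraphs):
    """Return (index, score) of the paragraph sharing the most entities with the summary."""
    summary_entities = extract_entities(summary)
    scores = [jaccard(summary_entities, extract_entities(p)) for p in paragraphs]
    best = max(range(len(paragraphs)), key=scores.__getitem__)
    return best, scores[best]


def make_suspicious_document(summary, target_paragraphs):
    """Assumed insertion step: place the summary right after its best-matching paragraph."""
    index, _ = most_similar_paragraph(summary, target_paragraphs)
    return target_paragraphs[: index + 1] + [summary] + target_paragraphs[index + 1:]


if __name__ == "__main__":
    summary = "Hurricane Andrew struck Florida and caused heavy damage."
    target = [
        "The stock market in Tokyo fell sharply.",
        "Storms often hit Florida, and Andrew was among the worst.",
        "Researchers in Paris studied the results.",
    ]
    for paragraph in make_suspicious_document(summary, target):
        print(paragraph)
```

In practice the heuristic would be replaced by a trained NER, and the position of the inserted summary would be recorded as the ground-truth annotation of the plagiarized span.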

Cite

Miranda-Jiménez, S., & Stamatatos, E. (2017). Automatic generation of summary obfuscation corpus for plagiarism detection. Acta Polytechnica Hungarica, 14(3), 99–112. https://doi.org/10.12700/APH.14.3.2017.3.6
