Measuring performance of n-gram and jaccard-similarity metrics in document plagiarism application

Nova Eka Diana; Ikrima Hanana Ulfa

Conference Proceedings (Open Access)

Journal of Physics: Conference Series (2019) 1196(1)

DOI: 10.1088/1742-6596/1196/1/012069

15 Citations

44 Readers

Abstract

String-based similarity metrics are mainly used to measure the lexical similarity between words based on their string sequences and character compositions. This research aimed to build an application that can identify the similarity between documents. The program employed two lexical algorithms, N-gram and Jaccard, to check document similarity. The authors focused on analysing the algorithms' performance in terms of accuracy, sensitivity, and efficiency. The datasets used in this research were final thesis documents written in Indonesian and English. Experimental results revealed that the Jaccard algorithm performed better than N-gram in terms of accuracy and sensitivity. Despite this superior performance, Jaccard took longer than N-gram to process documents. Furthermore, the results indicated that comparing documents across languages affected the measured degree of similarity.
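To make the two lexical measures named in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the documents were tokenized, which value of n was used, or how the N-gram score was computed. The sketch assumes lowercased, whitespace-normalized text, character trigrams, and set overlap as the score; the function names `char_ngrams`, `jaccard`, `word_jaccard`, and `ngram_jaccard` are hypothetical.

```python
from typing import Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a text.

    Assumption: the text is lowercased and runs of whitespace are
    collapsed to one space. The paper's preprocessing is not stated.
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        # Convention chosen here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def word_jaccard(doc_a: str, doc_b: str) -> float:
    """Jaccard similarity over the sets of lowercased word tokens."""
    return jaccard(set(doc_a.lower().split()), set(doc_b.lower().split()))


def ngram_jaccard(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Jaccard similarity over character n-gram sets (assumed variant)."""
    return jaccard(char_ngrams(doc_a, n), char_ngrams(doc_b, n))


if __name__ == "__main__":
    a = "the cat sat"
    b = "the cat ran"
    print(word_jaccard(a, b))   # 2 shared words out of 4 distinct words
    print(ngram_jaccard(a, b))  # overlap of character trigram sets
```

Because both scores depend only on shared surface strings, an Indonesian text and its English translation share few tokens or n-grams, which is consistent with the abstract's remark that cross-language documents affect the measured similarity.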

Cite

Diana, N. E., & Hanana Ulfa, I. (2019). Measuring performance of n-gram and jaccard-similarity metrics in document plagiarism application. In Journal of Physics: Conference Series (Vol. 1196). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1196/1/012069
