On Document Similarity Measures

  • Asahara M
  • Kato S

Abstract

Document similarity measures are used to evaluate both the content and the writing style of texts. In text summarization and machine translation, evaluation measures have been proposed that compare a system-generated summary or translation of a source text with a human-generated one. Distance metrics defined over morphemes or morpheme sequences are likewise used to evaluate differences in writing style or register. In this study, we discuss the relations among the equivalence properties of mathematical metrics, similarities, kernels, ordinal scales, and correlations. In addition, we investigate the behavior of content and style similarity measures on several corpora with similar content. The results obtained with the different document similarity measures indicate that the evaluation system is unstable.
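The relations the abstract mentions (a kernel, the similarity and distance it induces, and a rank correlation over scores) can be illustrated with a minimal sketch. This is not the authors' method: it assumes whitespace-separated tokens as a stand-in for morphemes (a real setup would run a morphological analyzer first), uses a linear kernel on n-gram count vectors as one illustrative choice, and all function names are hypothetical. It shows how a kernel `k` yields a normalized (cosine) similarity and a metric distance `sqrt(k(x,x) + k(y,y) - 2k(x,y))`, and how Kendall's tau compares the orderings two measures impose on the same items.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Bag of n-grams (as tuples) over a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_kernel(x, y, n=2):
    """Linear kernel: inner product of the n-gram count vectors of x and y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    return sum(c * cy[g] for g, c in cx.items())


def cosine_similarity(x, y, n=2):
    """Normalized kernel k(x,y) / sqrt(k(x,x) k(y,y)); lies in [0, 1] for counts."""
    kxx, kyy = ngram_kernel(x, x, n), ngram_kernel(y, y, n)
    if kxx == 0 or kyy == 0:
        return 0.0
    return ngram_kernel(x, y, n) / math.sqrt(kxx * kyy)


def kernel_distance(x, y, n=2):
    """Distance induced by the kernel: Euclidean norm of phi(x) - phi(y)."""
    sq = ngram_kernel(x, x, n) + ngram_kernel(y, y, n) - 2 * ngram_kernel(x, y, n)
    return math.sqrt(max(sq, 0))


def kendall_tau(a, b):
    """Kendall's tau-a between two score lists over the same items (no tie correction)."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            di, dj = a[i] - a[j], b[i] - b[j]
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
            s += ((di > 0) - (di < 0)) * ((dj > 0) - (dj < 0))
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()  # hypothetical human-generated text
    hyp = "the cat is on the mat".split()   # hypothetical system-generated text
    print(cosine_similarity(ref, hyp))  # 0.6 (3 shared bigrams out of 5 each)
    print(kernel_distance(ref, hyp))    # 2.0
```

Because the distance is derived from the same kernel as the similarity, a higher cosine score for a fixed pair of self-kernel values always means a smaller distance; whether two *different* measures agree on how they order systems is what a rank correlation such as `kendall_tau` checks.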

Citation (APA)

Asahara, M., & Kato, S. (2016). On Document Similarity Measures. Journal of Natural Language Processing, 23(5), 463–499. https://doi.org/10.5715/jnlp.23.463