Building a non-trivial paraphrase corpus using multiple machine translation systems

Abstract

We propose a novel method for acquiring sentential paraphrases. To build a well-balanced corpus for paraphrase identification, we focus in particular on acquiring both non-trivial positive and non-trivial negative instances. We use multiple machine translation systems to generate positive candidates and a monolingual corpus to extract negative candidates. To collect non-trivial instances, the candidates are sampled uniformly with respect to their word overlap rate. Finally, annotators judge whether each candidate is positive or negative. Using this method, we built and released the first evaluation corpus for Japanese paraphrase identification, which comprises 655 sentence pairs.
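
The sampling step described above (drawing candidates uniformly by word overlap rate, so that the corpus is not dominated by near-identical or completely unrelated pairs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the overlap measure, the number of bins, or the tokenizer, so the Jaccard overlap of word types, the equal-width bins, and the hypothetical helpers `word_overlap_rate` and `sample_uniformly_by_overlap` are all assumptions. Whitespace splitting stands in for a proper Japanese tokenizer, since Japanese text is not space-delimited.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def word_overlap_rate(s1: str, s2: str) -> float:
    """Jaccard overlap of word types (assumed definition, hypothetical helper).

    Whitespace splitting is a stand-in; Japanese input would need a
    morphological analyzer to produce tokens first.
    """
    w1, w2 = set(s1.split()), set(s2.split())
    if not w1 and not w2:
        return 0.0
    return len(w1 & w2) / len(w1 | w2)


def sample_uniformly_by_overlap(
    pairs: List[Pair], n_bins: int, per_bin: int, seed: int = 0
) -> List[Pair]:
    """Bucket candidate pairs into equal-width overlap bins (assumed scheme),
    then draw up to `per_bin` pairs from each bin so every overlap range
    is represented roughly equally."""
    rng = random.Random(seed)
    bins: Dict[int, List[Pair]] = defaultdict(list)
    for pair in pairs:
        rate = word_overlap_rate(*pair)
        # rate lies in [0, 1]; a rate of exactly 1.0 goes into the last bin
        idx = min(int(rate * n_bins), n_bins - 1)
        bins[idx].append(pair)
    sample: List[Pair] = []
    for idx in range(n_bins):
        bucket = bins.get(idx, [])
        # take every pair if the bin holds fewer than per_bin candidates
        sample.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sample
```

For example, with `n_bins=4` and `per_bin=1`, the pairs `("one two", "three four")`, `("a b c d", "a b x y")` and `("the cat sat", "the cat sat")` have overlap rates 0, 1/3 and 1, fall into bins 0, 1 and 3, and are each selected once. Stratifying by overlap rather than sampling at random keeps low-overlap positives and high-overlap negatives, which are the non-trivial cases, in the corpus.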

Citation (APA)

Suzuki, Y., Kajiwara, T., & Komachi, M. (2017). Building a non-trivial paraphrase corpus using multiple machine translation systems. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 36–42). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/P17-3007
