Building a non-trivial paraphrase corpus using multiple machine translation systems

Yui Suzuki; Tomoyuki Kajiwara; Mamoru Komachi

Conference Proceedings (Open Access)

ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (2017) 36-42

DOI: 10.18653/v1/P17-3007

13 Citations

96 Readers

Abstract

We propose a novel sentential paraphrase acquisition method. To build a well-balanced corpus for paraphrase identification, we focus on acquiring both non-trivial positive and non-trivial negative instances. We use multiple machine translation systems to generate positive candidates and a monolingual corpus to extract negative candidates. To collect non-trivial instances, the candidates are sampled uniformly by word overlap rate. Finally, annotators judge whether each candidate is positive or negative. Using this method, we built and released the first evaluation corpus for Japanese paraphrase identification, which comprises 655 sentence pairs.
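The sampling step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that word overlap rate is the Jaccard coefficient over the two sentences' word sets, that sentences are already tokenized (Japanese text would first need word segmentation), and that "uniform" means drawing the same number of pairs from equal-width overlap bins. The function names and the `num_bins`, `per_bin`, and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict


def word_overlap_rate(a, b):
    """Jaccard coefficient between the word sets of two tokenized sentences.

    Assumption: the abstract does not define the overlap rate, so Jaccard
    is used here purely for illustration.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def sample_uniformly_by_overlap(pairs, num_bins=10, per_bin=2, seed=0):
    """Draw up to `per_bin` sentence pairs from each equal-width overlap bin."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for a, b in pairs:
        rate = word_overlap_rate(a, b)
        # A rate of exactly 1.0 is placed in the last bin.
        index = min(int(rate * num_bins), num_bins - 1)
        bins[index].append((a, b))

    sampled = []
    for index in sorted(bins):
        bucket = bins[index]
        # Sparse bins contribute everything they have.
        sampled.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sampled
```

Sampling per bin rather than from the whole pool keeps low-overlap paraphrases and high-overlap non-paraphrases in the candidate set. Those are the non-trivial cases, and a plain random sample would tend to under-represent them.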

Cite

Suzuki, Y., Kajiwara, T., & Komachi, M. (2017). Building a non-trivial paraphrase corpus using multiple machine translation systems. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 36–42). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/P17-3007
