We propose a novel method for acquiring sentential paraphrases. To build a well-balanced corpus for paraphrase identification, we focus on acquiring non-trivial instances of both positive and negative pairs. We use multiple machine translation systems to generate positive candidates and a monolingual corpus to extract negative candidates. To collect non-trivial instances, the candidates are sampled uniformly by word overlap rate. Finally, annotators judge whether each candidate is positive or negative. Using this method, we built and released the first evaluation corpus for Japanese paraphrase identification, which comprises 655 sentence pairs.
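The sampling step can be illustrated with a minimal sketch. The abstract does not define the word overlap rate or the sampling procedure, so everything here is an assumption: the rate is taken to be the Jaccard coefficient over word sets, sentences are assumed to be pre-tokenized into whitespace-separated words (Japanese text would first need a morphological analyzer), and the names `word_overlap_rate`, `sample_uniform_by_overlap`, `n_bins`, and `per_bin` are hypothetical.

```python
import random
from collections import defaultdict


def word_overlap_rate(sent_a, sent_b):
    """Jaccard coefficient over word sets (an assumed definition of overlap rate)."""
    words_a, words_b = set(sent_a.split()), set(sent_b.split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def sample_uniform_by_overlap(pairs, n_bins=10, per_bin=2, seed=0):
    """Bucket candidate pairs by overlap rate, then draw up to per_bin pairs per bucket."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for sent_a, sent_b in pairs:
        rate = word_overlap_rate(sent_a, sent_b)
        # A rate of exactly 1.0 is placed in the last bin.
        index = min(int(rate * n_bins), n_bins - 1)
        bins[index].append((sent_a, sent_b))
    sampled = []
    for index in sorted(bins):
        bucket = bins[index]
        # Drawing the same number from every bin flattens the overlap distribution.
        sampled.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sampled
```

Drawing an equal number of candidates from each overlap bin keeps the sample from being dominated by easy cases, such as positive pairs that share nearly all of their words or negative pairs that share almost none.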
CITATION STYLE
Suzuki, Y., Kajiwara, T., & Komachi, M. (2017). Building a non-trivial paraphrase corpus using multiple machine translation systems. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 36–42). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/P17-3007