We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be of the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs. To retain broad coverage of non-constituent phrases, complex syntactic labels are introduced. A manual evaluation indicates a 19% absolute improvement in paraphrase quality over the baseline method. © 2008 Association for Computational Linguistics.
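The same-type constraint can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that two English phrases count as candidate paraphrases when they are paired with the same foreign phrase, and that each extracted pair already carries a single syntactic label. The toy data, the labels, and the function name `candidate_paraphrases` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Toy labeled phrase pairs: (english_phrase, foreign_phrase, syntactic_label).
# Every entry is invented for illustration only.
LABELED_PAIRS = [
    ("under control", "unter kontrolle", "PP"),
    ("in check", "unter kontrolle", "PP"),
    ("controlled", "unter kontrolle", "VBN"),
    ("under control", "im griff", "PP"),
    ("in hand", "im griff", "PP"),
]


def candidate_paraphrases(phrase, pairs, same_label=True):
    """Return English phrases that share a foreign phrase with `phrase`.

    When `same_label` is True, keep only candidates whose syntactic label
    matches the label of `phrase` in the pair that links them.
    """
    # Index (english, label) entries by the foreign phrase they are paired with.
    by_foreign = defaultdict(set)
    for english, foreign, label in pairs:
        by_foreign[foreign].add((english, label))

    results = set()
    for english, foreign, label in pairs:
        if english != phrase:
            continue
        for other, other_label in by_foreign[foreign]:
            if other == phrase:
                continue  # a phrase is not its own paraphrase
            if same_label and other_label != label:
                continue  # syntactic types differ, so reject the candidate
            results.add((other, other_label))
    return sorted(results)


if __name__ == "__main__":
    print(candidate_paraphrases("under control", LABELED_PAIRS, same_label=False))
    print(candidate_paraphrases("under control", LABELED_PAIRS, same_label=True))
```

In this toy setting, turning the constraint on drops "controlled" (a participle) as a paraphrase of the prepositional phrase "under control", while keeping the two prepositional-phrase candidates. The sketch does not cover the complex labels for non-constituent phrases mentioned above.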
Callison-Burch, C. (2008). Syntactic constraints on paraphrases extracted from parallel corpora. In EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL (pp. 196–205). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1613715.1613743