PKU Paraphrase Bank: A Sentence-Level Paraphrase Corpus for Chinese

Abstract

One of the main challenges in paraphrase research is the lack of large-scale, high-quality corpora, a problem that is particularly serious for languages other than English. In this paper, we present a simple and effective unsupervised learning model that automatically extracts high-quality sentence-level paraphrases from multiple Chinese translations of the same source texts. Applying this model, we obtain a large-scale paraphrase corpus containing 509,832 pairs of paraphrased sentences, and we examine its quality manually. The model is language-independent, so paraphrase corpora for other languages can be built in the same way.

Cite

Zhang, B., Sun, W., Wan, X., & Guo, Z. (2019). PKU Paraphrase Bank: A Sentence-Level Paraphrase Corpus for Chinese. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11838 LNAI, pp. 814–826). Springer. https://doi.org/10.1007/978-3-030-32233-5_63
