PKU Paraphrase Bank: A Sentence-Level Paraphrase Corpus for Chinese

Bowei Zhang; Weiwei Sun; Xiaojun Wan; Zongming Guo

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11838 LNAI, pp. 814–826 (2019)

DOI: 10.1007/978-3-030-32233-5_63

Abstract

One of the main challenges in paraphrase research is the lack of large-scale, high-quality corpora, a problem that is particularly serious for languages other than English. In this paper, we present a simple and effective unsupervised learning model that automatically extracts high-quality sentence-level paraphrases from multiple Chinese translations of the same source texts. Applying this model, we obtain a large-scale paraphrase corpus containing 509,832 pairs of paraphrased sentences. The quality of the new corpus is examined manually. The model is language-independent, so paraphrase corpora for other languages can be built in the same way.

Citation (APA)

Zhang, B., Sun, W., Wan, X., & Guo, Z. (2019). PKU Paraphrase Bank: A Sentence-Level Paraphrase Corpus for Chinese. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11838 LNAI, pp. 814–826). Springer. https://doi.org/10.1007/978-3-030-32233-5_63
