Finding friends and flipping frenemies: Automatic paraphrase dataset augmentation using graph theory

Abstract

Most NLP datasets are manually labeled, and so suffer from inconsistent labeling or limited size. We propose methods for automatically improving datasets by viewing them as graphs with expected semantic properties. We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property. We use structural balance theory to identify likely mislabelings in the graph, and flip their labels. We evaluate our methods on paraphrase models trained on these datasets, starting from a pretrained BERT model, and find that the automatically enhanced training sets result in more accurate models.
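
To make the graph view concrete, the following is a minimal sketch of label inference by transitivity, not the authors' implementation. It assumes sentences are nodes, paraphrase labels (1) are positive edges, and non-paraphrase labels (0) are negative edges. The function name `augment_pairs` and the union-find approach are illustrative choices. A negative edge inside a paraphrase cluster is only reported as a likely mislabeling; deciding which label to flip (the structural balance step described above) is left out.

```python
from itertools import combinations


def _find(parent, x):
    """Return the cluster representative of x, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def augment_pairs(pairs):
    """Infer extra labels from (sent_a, sent_b, label) triples.

    label 1 = paraphrase, 0 = non-paraphrase. Returns (inferred, conflicts):
    inferred is a list of new (a, b, label) triples; conflicts lists the
    negative pairs that fall inside a single paraphrase cluster.
    """
    parent = {}
    for a, b, _ in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    # Merge sentences joined by paraphrase edges into clusters.
    for a, b, label in pairs:
        if label == 1:
            parent[_find(parent, a)] = _find(parent, b)

    clusters = {}
    for s in parent:
        clusters.setdefault(_find(parent, s), []).append(s)

    seen = {frozenset((a, b)) for a, b, _ in pairs}
    inferred, conflicts = [], []

    # Transitivity: any two sentences in one cluster are paraphrases.
    for members in clusters.values():
        for a, b in combinations(members, 2):
            key = frozenset((a, b))
            if key not in seen:
                seen.add(key)
                inferred.append((a, b, 1))

    # A negative edge between two clusters makes every cross pair negative.
    for a, b, label in pairs:
        if label != 0:
            continue
        ra, rb = _find(parent, a), _find(parent, b)
        if ra == rb:
            conflicts.append((a, b))  # unbalanced: likely mislabeling
            continue
        for x in clusters[ra]:
            for y in clusters[rb]:
                key = frozenset((x, y))
                if key not in seen:
                    seen.add(key)
                    inferred.append((x, y, 0))
    return inferred, conflicts


if __name__ == "__main__":
    labeled = [("A", "B", 1), ("B", "C", 1), ("C", "D", 0)]
    new_labels, flagged = augment_pairs(labeled)
    print(new_labels, flagged)
```

In this sketch, conflicting clusters still contribute inferred positive pairs; a more careful pipeline would resolve the flagged pairs before inferring new labels.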

Chen, H., Ji, Y., & Evans, D. (2020). Finding friends and flipping frenemies: Automatic paraphrase dataset augmentation using graph theory. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 4741–4751). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.findings-emnlp.426
