Extracting Lexically Divergent Paraphrases from Twitter

  • Xu W
  • Ritter A
  • Callison-Burch C
  • et al.

Abstract

We present MultiP (Multi-instance Learning Paraphrase Model), a new model for identifying paraphrases among the short messages on Twitter. We jointly model paraphrase relations between word pairs and between sentence pairs, and assume only sentence-level annotations during learning. Using this principled latent variable model alone, we achieve performance competitive with a state-of-the-art method that combines a latent space model with a feature-based supervised classifier. Our model also captures lexically divergent paraphrases that differ from, yet complement, those found by previous methods; combining our model with previous work significantly outperforms the state of the art. In addition, we present a novel annotation methodology that has allowed us to crowdsource a paraphrase corpus from Twitter. We make this new dataset available to the research community.
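
The idea of learning from sentence-level labels while reasoning over word pairs can be illustrated with a small sketch. It assumes an "at least one" aggregation, in which a sentence pair is labeled a paraphrase when any word pair is judged a paraphrase; this is one common multi-instance formulation, and the abstract does not spell out the exact rule. The function names, the threshold, and the toy word-pair scorer are hypothetical stand-ins, not the authors' implementation or features.

```python
from itertools import product
from typing import Sequence


def word_pair_score(w1: str, w2: str) -> float:
    """Hypothetical stand-in for a learned word-pair classifier.

    Returns 1.0 when the two words match case-insensitively, else 0.0.
    A trained model would score pairs from learned features instead.
    """
    return 1.0 if w1.lower() == w2.lower() else 0.0


def sentence_is_paraphrase(
    s1: Sequence[str], s2: Sequence[str], threshold: float = 0.5
) -> bool:
    """Aggregate word-pair decisions into a sentence-pair decision.

    Assumed rule: the sentence pair is positive if at least one
    word pair scores above the threshold (a logical OR over pairs).
    """
    return any(
        word_pair_score(a, b) > threshold for a, b in product(s1, s2)
    )
```

Under this assumption only the sentence-level outcome needs a gold label; the word-pair decisions remain latent and are inferred during training.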

Cite

Xu, W., Ritter, A., Callison-Burch, C., Dolan, W. B., & Ji, Y. (2014). Extracting Lexically Divergent Paraphrases from Twitter. Transactions of the Association for Computational Linguistics, 2, 435–448. https://doi.org/10.1162/tacl_a_00194
