An exploration of data augmentation and sampling techniques for domain-agnostic question answering


Abstract

To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pretrained language models, various data sampling strategies, and query and context paraphrases generated by back-translation. We find a simple negative sampling technique to be particularly effective, even though it is typically used for datasets that include unanswerable questions, such as SQuAD 2.0. When applied in conjunction with per-domain sampling, our XLNet (Yang et al., 2019)-based submission achieved the second-best Exact Match and F1 on the MRQA leaderboard.
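
The abstract names negative sampling and per-domain sampling but does not spell out how either is performed. The Python sketch below shows one plausible reading, under stated assumptions: long contexts are split into overlapping token windows, windows that do not contain the answer are treated as negatives and kept with some probability, and training examples are drawn evenly from each domain. All function names, parameters, and default values here (`make_windows`, `sample_with_negatives`, `sample_per_domain`, `window`, `stride`, `keep_negative_prob`) are hypothetical illustrations, not the authors' implementation.

```python
import random


def make_windows(tokens, answer_start, answer_end, window=8, stride=4):
    """Split a token list into overlapping windows and label each one.

    Assumption: a window is positive when it fully contains the answer span
    [answer_start, answer_end] (inclusive token indices), otherwise negative.
    """
    windows = []
    start = 0
    while True:
        end = min(start + window, len(tokens))
        has_answer = start <= answer_start and answer_end < end
        windows.append({"start": start, "end": end, "has_answer": has_answer})
        if end == len(tokens):
            break
        start += stride
    return windows


def sample_with_negatives(windows, keep_negative_prob, rng):
    """Keep every positive window; keep each negative with the given probability."""
    return [
        w for w in windows
        if w["has_answer"] or rng.random() < keep_negative_prob
    ]


def sample_per_domain(examples_by_domain, n_per_domain, rng):
    """Draw up to n_per_domain examples from each domain so no domain dominates."""
    batch = []
    for domain, examples in sorted(examples_by_domain.items()):
        k = min(n_per_domain, len(examples))
        batch.extend((domain, ex) for ex in rng.sample(examples, k))
    return batch
```

In this sketch, setting `keep_negative_prob` to 0 trains only on answer-bearing windows, while a value above 0 exposes the model to windows where the correct prediction is "no answer here", which is the general idea behind negative sampling even when every question in the dataset is answerable.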

Cite (APA)

Longpre, S., Lu, Y., Tu, Z., & DuBois, C. (2019). An exploration of data augmentation and sampling techniques for domain-agnostic question answering. In MRQA@EMNLP 2019 - Proceedings of the 2nd Workshop on Machine Reading for Question Answering (pp. 220–227). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d19-5829
