Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering

Abstract

We tackle the problem of question answering directly on a large document collection, combining simple "bag of words" passage retrieval with a BERT-based reader for extracting answer spans. In the context of this architecture, we present a data augmentation technique using distant supervision to automatically annotate paragraphs as either positive or negative examples to supplement existing training data, which are then used together to fine-tune BERT. We explore a number of details that are critical to achieving high accuracy in this setup: the proper sequencing of different datasets during fine-tuning, the balance between "difficult" vs. "easy" examples, and different approaches to gathering negative examples. Experimental results show that, with the appropriate settings, we can achieve large gains in effectiveness on two English and two Chinese QA datasets. We are able to achieve results at or near the state of the art without any modeling advances, which once again affirms the cliché "there's no data like more data".
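The core distant-supervision step described in the abstract, annotating retrieved paragraphs as positive or negative examples based on whether they contain the gold answer, can be sketched as follows. This is a minimal illustrative sketch under the common span-matching assumption; the function name and matching heuristic are assumptions, not the authors' actual code.

```python
def label_paragraphs(answer: str, paragraphs: list[str]) -> tuple[list[str], list[str]]:
    """Distantly supervise paragraph labels: a paragraph is a positive
    example if it contains the gold answer string, else a negative.
    (Sketch: real systems may add normalization or stricter matching.)"""
    positives, negatives = [], []
    for p in paragraphs:
        if answer.lower() in p.lower():
            positives.append(p)
        else:
            negatives.append(p)
    return positives, negatives
```

The positives and negatives produced this way would then supplement the original training data when fine-tuning the BERT reader, with the paper's reported gains hinging on how these examples are sequenced and balanced.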

APA

Xie, Y., Yang, W., Tan, L., Xiong, K., Yuan, N. J., Huai, B., … Lin, J. (2020). Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 2934–2940). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380060
