Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data

Abstract

Despite considerable progress, most machine reading comprehension (MRC) tasks still lack sufficient training data to fully exploit powerful deep neural network models with millions of parameters, and it is laborious, expensive, and time-consuming to create large-scale, high-quality MRC data through crowdsourcing. This paper focuses on generating more training data for MRC tasks by leveraging existing question-answering (QA) data. We first collect a large-scale multi-subject multiple-choice QA dataset for Chinese, ExamQA. We next use incomplete, yet relevant snippets returned by a web search engine as the context for each QA instance to convert it into a weakly-labeled MRC instance. To better use the weakly-labeled data to improve a target MRC task, we evaluate and compare several methods and further propose a self-teaching paradigm. Experimental results show that, upon state-of-the-art MRC baselines, we can obtain +5.1% in accuracy on a multiple-choice Chinese MRC dataset, C3, and +3.8% in exact match on an extractive Chinese MRC dataset, CMRC 2018, demonstrating the usefulness of the generated QA-based weakly-labeled data for different types of MRC tasks as well as the effectiveness of self-teaching. ExamQA will be available at https://dataset.org/examqa/.
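
The conversion step described in the abstract, pairing a multiple-choice QA instance with search-engine snippets to form a weakly-labeled MRC instance, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class names, field names, the `to_weak_mrc` helper, and the `max_snippets` cutoff are all hypothetical, and the snippets are assumed to have been retrieved already by some search engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAInstance:
    """A multiple-choice QA item with no accompanying passage (hypothetical schema)."""
    question: str
    options: List[str]
    answer_index: int


@dataclass
class WeakMRCInstance:
    """A QA item paired with retrieved text that serves as its context (hypothetical schema)."""
    context: str
    question: str
    options: List[str]
    answer_index: int


def to_weak_mrc(qa: QAInstance, snippets: List[str], max_snippets: int = 3) -> WeakMRCInstance:
    """Join the first few non-empty snippets into one context string.

    The label is "weak" because the snippets are incomplete and may not
    actually contain the evidence needed to answer the question.
    """
    kept = [s.strip() for s in snippets if s.strip()][:max_snippets]
    context = " ".join(kept)
    return WeakMRCInstance(context, qa.question, list(qa.options), qa.answer_index)
```

The question, options, and answer are carried over unchanged; only the context is new, which is why such instances can be produced at scale without additional annotation.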

Cite

Yu, D., Sun, K., Yu, D., & Cardie, C. (2021). Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 56–68). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-emnlp.6