Relation-Guided Pre-Training for Open-Domain Question Answering

5 citations · 50 Mendeley readers

Abstract

Answering complex open-domain questions requires understanding the latent relations between the involved entities. However, we find that existing QA datasets are extremely imbalanced across certain types of relations, which hurts generalization to questions with long-tail relations. To remedy this problem, we propose a Relation-Guided Pre-Training (RGPT-QA) framework. We first generate a relational QA dataset covering a wide range of relations from both Wikidata triplets and Wikipedia hyperlinks. We then pre-train a QA model to infer the latent relation from the question, and then conduct extractive QA to obtain the target answer entity. We demonstrate that by pre-training with the proposed RGPT-QA technique, the popular open-domain QA model Dense Passage Retriever (DPR) achieves absolute improvements of 2.2%, 2.4%, and 6.3% in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions, respectively. In particular, we show that RGPT-QA improves significantly on questions with long-tail relations.
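To make the data-generation step concrete, below is a minimal sketch (not taken from the paper) of how Wikidata-style (subject, relation, object) triples might be turned into relational QA pairs for pre-training. The triples, templates, and names such as generate_relational_qa are hypothetical illustrations of the idea described in the abstract, not the authors' actual pipeline.

```python
# Hypothetical sketch: converting knowledge-base triples into relational
# QA pre-training examples. Everything here (triples, templates, names)
# is an assumption for illustration, not the paper's implementation.

from typing import Iterator, NamedTuple


class RelationalQA(NamedTuple):
    question: str
    relation: str  # latent relation the model is pre-trained to infer
    answer: str    # target answer entity for the extractive QA step


# Toy Wikidata-style (subject, relation, object) triples.
TRIPLES = [
    ("Marie Curie", "educated_at", "University of Paris"),
    ("The Old Man and the Sea", "author", "Ernest Hemingway"),
]

# One question template per relation; a real pipeline would cover a wide
# range of relations, including long-tail ones.
TEMPLATES = {
    "educated_at": "Where was {subject} educated?",
    "author": "Who wrote {subject}?",
}


def generate_relational_qa(triples) -> Iterator[RelationalQA]:
    """Yield (question, relation, answer) pre-training examples from triples."""
    for subject, relation, obj in triples:
        template = TEMPLATES.get(relation)
        if template is None:
            continue  # skip relations we have no template for
        yield RelationalQA(
            question=template.format(subject=subject),
            relation=relation,
            answer=obj,  # the object entity becomes the extractive answer
        )


if __name__ == "__main__":
    for example in generate_relational_qa(TRIPLES):
        print(example)
```

In the paper's framing, a model pre-trained on such pairs learns both to infer the relation behind a question and to extract the answer entity, before being fine-tuned on the downstream open-domain QA benchmarks.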

References

- Wikidata: A free collaborative knowledgebase
- Natural Questions: A Benchmark for Question Answering Research
- Self-training with Noisy Student improves ImageNet classification


Citation (APA)

Hu, Z., Sun, Y., & Chang, K. W. (2021). Relation-Guided Pre-Training for Open-Domain Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 3431–3448). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-emnlp.292

