GermanQuAD and GermanDPR: Improving Non-English Question Answering and Passage Retrieval

Abstract

A major challenge of research on non-English machine reading for question answering (QA) is the lack of annotated datasets. In this paper, we present GermanQuAD, a dataset of 13,722 extractive question/answer pairs. To improve the reproducibility of the dataset creation approach and foster QA research in other languages, we summarize lessons learned and evaluate reformulation of question/answer pairs as a way to speed up the annotation process. An extractive QA model trained on GermanQuAD significantly outperforms multilingual models and also shows that machine-translated training data cannot fully substitute for hand-annotated training data in the target language. Finally, we demonstrate the wide range of applications of GermanQuAD by adapting it to GermanDPR, a training dataset for dense passage retrieval (DPR), and we train and evaluate one of the first non-English DPR models.
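
To make the adaptation from an extractive QA dataset to a retrieval training set more concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not describe how negatives were chosen, so this example simply treats passages from other records that lack the answer string as stand-in negatives. The function name squad_to_dpr, the toy records, and the field names (positive_ctxs, hard_negative_ctxs, modeled on a layout commonly used for DPR training files) are all assumptions made for illustration.

```python
from typing import Dict, List


def squad_to_dpr(records: List[Dict], num_negatives: int = 2) -> List[Dict]:
    """Turn extractive QA records into DPR-style training records (illustrative only)."""
    dpr_records = []
    for i, rec in enumerate(records):
        answer = rec["answer"]
        negatives: List[str] = []
        for j, other in enumerate(records):
            # Skip the record itself and any record sharing the same passage.
            if j == i or other["context"] == rec["context"]:
                continue
            # Assumption: a passage containing the answer string is not a safe negative.
            if answer.lower() in other["context"].lower():
                continue
            if other["context"] not in negatives:
                negatives.append(other["context"])
            if len(negatives) == num_negatives:
                break
        dpr_records.append({
            "question": rec["question"],
            "answers": [answer],
            "positive_ctxs": [{"text": rec["context"]}],
            "hard_negative_ctxs": [{"text": n} for n in negatives],
        })
    return dpr_records


# Hypothetical toy records, not taken from GermanQuAD.
toy = [
    {"question": "Was ist die Hauptstadt von Deutschland?",
     "context": "Berlin ist die Hauptstadt von Deutschland.",
     "answer": "Berlin"},
    {"question": "Welcher Fluss fließt durch Köln?",
     "context": "Köln liegt am Rhein.",
     "answer": "Rhein"},
    {"question": "Wo steht der Eiffelturm?",
     "context": "Der Eiffelturm steht in Paris.",
     "answer": "Paris"},
]
dpr = squad_to_dpr(toy)
```

Each output record pairs a question with the passage that contains its answer and with passages that do not. A real setup would more likely pick negatives with a lexical retriever over a large corpus, so that they are hard to distinguish from the positive passage.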

Citation (APA)

Möller, T., Risch, J., & Pietsch, M. (2021). GermanQuAD and GermanDPR: Improving Non-English Question Answering and Passage Retrieval. In Proceedings of the 3rd Workshop on Machine Reading for Question Answering, MRQA 2021 (pp. 42–50). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.mrqa-1.4

