Advances in NLP have yielded impressive results for the task of machine reading comprehension (MRC), with approaches having been reported to achieve performance comparable to that of humans. In this paper, we investigate whether state-of-the-art MRC models are able to correctly process Semantics Altering Modifications (SAM): linguistically-motivated phenomena that alter the semantics of a sentence while preserving most of its lexical surface form. We present a method to automatically generate and align challenge sets featuring original and altered examples. We further propose a novel evaluation methodology to correctly assess the capability of MRC systems to process these examples independent of the data they were optimised on, by discounting for effects introduced by domain shift. In a large-scale empirical study, we apply the methodology in order to evaluate extractive MRC models with regard to their capability to correctly process SAM-enriched data. We comprehensively cover 12 different state-of-the-art neural architecture configurations and four training datasets and find that - despite their well-known remarkable performance - optimised models consistently struggle to correctly process semantically altered data.
CITATION STYLE
Schlegel, V., Nenadic, G., & Batista-Navarro, R. (2021). Semantics Altering Modifications for Evaluating Comprehension in Machine Reading. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 15, pp. 13762–13770). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i15.17622