Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE and find that, on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline, Q-REAS, that manipulates quantities symbolically. Compared to the best-performing NLI model, it succeeds on numerical reasoning tests (+24.2%) but has limited verbal reasoning capabilities (-8.1%). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding.
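The abstract does not spell out how Q-REAS manipulates quantities, so the short Python sketch below is only a hypothetical illustration of what symbolic quantity comparison for entailment can look like, not the paper's actual pipeline: it extracts (value, unit) pairs with a regex and maps agreements and conflicts to the standard three-way NLI labels. The function names and the exact-match comparison rule are assumptions introduced here for illustration.

import re

# Hypothetical toy, not the paper's Q-REAS pipeline: pull (value, unit)
# pairs like "30 cars" or "5.2 km" out of a sentence with a regex.
def extract_quantities(text):
    pattern = r"(\d+(?:\.\d+)?)\s+([A-Za-z]+)"
    return [(float(v), u.lower().rstrip("s"))  # crude plural normalization
            for v, u in re.findall(pattern, text)]

# Compare each hypothesis quantity against the premise quantity with the
# same normalized unit, producing a three-way NLI label.
def entailment_label(premise, hypothesis):
    prem = {unit: value for value, unit in extract_quantities(premise)}
    for value, unit in extract_quantities(hypothesis):
        if unit not in prem:
            return "neutral"        # no comparable premise quantity
        if prem[unit] != value:
            return "contradiction"  # same unit, conflicting value
    return "entailment"             # every hypothesis quantity matches

print(entailment_label("The shop sold 30 cars in May.",
                       "The shop sold 30 cars in May."))  # entailment
print(entailment_label("The shop sold 30 cars in May.",
                       "The shop sold 25 cars in May."))  # contradiction

A real symbolic reasoner would also need to handle ranges, comparatives ("fewer than 40 cars"), unit conversion, and arithmetic composition, phenomena of the kind the EQUATE tests are designed to probe.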
Ravichander, A., Naik, A., Rosé, C., & Hovy, E. (2019). EQUATE: A benchmark evaluation framework for quantitative reasoning in natural language inference. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) (pp. 349–361). Association for Computational Linguistics. https://doi.org/10.18653/v1/k19-1033