Large-scale cloze test dataset created by teachers

Qizhe Xie; Guokun Lai; Zihang Dai; Eduard Hovy

Conference Proceedings, Open Access

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (2018) 2344-2356

DOI: 10.18653/v1/d18-1257

Citations: N/A

Readers: 160

Abstract

Cloze tests are widely adopted in language exams to evaluate students' language proficiency. In this paper, we propose the first large-scale human-created cloze test dataset, CLOTH, containing questions used in middle-school and high-school language exams. With blanks carefully created by teachers and candidate choices purposely designed to be nuanced, CLOTH requires a deeper language understanding and a wider attention span than previous automatically generated cloze datasets. We test the performance of dedicated baseline models, including a language model trained on the One Billion Word Corpus, and show that humans outperform them by a significant margin. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability to comprehend long-term context as the key bottleneck.

Cite (APA)

Xie, Q., Lai, G., Dai, Z., & Hovy, E. (2018). Large-scale cloze test dataset created by teachers. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (pp. 2344–2356). Association for Computational Linguistics. https://doi.org/10.18653/v1/d18-1257

Readers' Seniority

PhD / Post grad / Masters / Doc: 57 (79%)

Researcher: 9 (13%)

Professor / Associate Prof.: 3

Lecturer / Post doc: 3

Readers' Discipline

Computer Science: 68 (80%)

Linguistics: 7

Engineering: 6

Business, Management and Accounting: 4
