We present a Stratified MAchine Reading Test (SMART) data set for Chinese in which each question is assigned a “level” that reflects the type of reasoning needed to answer it. The data set consists of close to 40K question-answer pairs, and its stratified design allows machine reading researchers to quickly focus on the areas that are most challenging for a machine comprehension system. We further establish a baseline for future research with BERT, and present results showing that the levels we have designed correspond well with the difficulty that BERT experiences in answering these questions, as reflected by lower accuracy at higher levels. We have also collected human answers to the questions in the test portion of this data set, and show that humans and the machine face different challenges when answering these questions. This means that even though the machine is approaching human-level performance on this task, humans and the machine perform the task with very different mechanisms.
Yao, J., Feng, M., Feng, H., Wang, Z., Zhang, Y., & Xue, N. (2019). SMART: A Stratified Machine Reading Test. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11838 LNAI, pp. 67–79). Springer. https://doi.org/10.1007/978-3-030-32233-5_6