A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph

Hanyu Zhao; Sha Yuan; Jiahong Leng; Xiang Pan; Zhao Xue; Quanyue Ma; Yangxiao Liang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2021) 12869 LNAI 268-279

DOI: 10.1007/978-3-030-84186-7_18

Abstract

Machine reading comprehension (MRC) is a typical natural language processing (NLP) task that has developed rapidly in the last few years. Various reading comprehension datasets have been built to support MRC studies. However, large-scale, high-quality datasets are rare because building one is complex and labor-intensive. Moreover, most reading comprehension datasets are in English, and Chinese datasets remain scarce. In this paper, we propose an automatic method for MRC dataset generation and use it to build CMedRC, currently the largest Chinese medical reading comprehension dataset. The dataset contains 17k questions generated by our automatic method from a set of seed questions. We obtain the corresponding answers from a medical knowledge graph and manually check all of them. Finally, we test BiLSTM and BERT-based pre-trained language models (PLMs) on our dataset and provide a baseline for subsequent studies. The results suggest that automatic MRC dataset generation is a promising basis for future model improvements.
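
The abstract does not spell out the generation procedure, so the following is only a minimal sketch of one plausible reading: seed questions are treated as templates keyed by relation type, and answers are looked up in a knowledge graph stored as (head, relation, tail) triples. The `Triple` class, the `SEED_TEMPLATES` table, the `generate_qa` function, and the toy English triples are all hypothetical illustrations, not the authors' code or data, and the manual answer check described above is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph fact: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


# Hypothetical seed questions, written as templates keyed by relation type.
SEED_TEMPLATES = {
    "symptom": [
        "What are the symptoms of {head}?",
        "Which symptoms does {head} cause?",
    ],
    "treatment": ["How is {head} treated?"],
}


def generate_qa(triples, templates):
    """Pair every applicable seed template with answers read from the graph."""
    # Collect all tail entities that share the same head and relation,
    # so a question with several valid answers lists all of them.
    answers = {}
    for t in triples:
        answers.setdefault((t.head, t.relation), []).append(t.tail)

    qa_pairs = []
    for (head, relation), tails in answers.items():
        # Relations without a seed template produce no questions.
        for template in templates.get(relation, []):
            qa_pairs.append(
                {"question": template.format(head=head), "answers": list(tails)}
            )
    return qa_pairs


# Toy triples, invented purely for illustration.
toy_triples = [
    Triple("influenza", "symptom", "fever"),
    Triple("influenza", "symptom", "cough"),
    Triple("influenza", "treatment", "rest"),
    Triple("influenza", "department", "internal medicine"),
]

qa_pairs = generate_qa(toy_triples, SEED_TEMPLATES)
for pair in qa_pairs:
    print(pair["question"], pair["answers"])
```

In a pipeline of this shape, the generated pairs would still need the manual verification step the abstract mentions before being used for training or evaluation.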

Zhao, H., Yuan, S., Leng, J., Pan, X., Xue, Z., Ma, Q., & Liang, Y. (2021). A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12869 LNAI, pp. 268–279). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-84186-7_18
