CMeIE: Construction and Evaluation of Chinese Medical Information Extraction Dataset

Tongfeng Guan; Hongying Zan; Xiabing Zhou; Hongfei Xu; Kunli Zhang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12430 LNAI 270-282

DOI: 10.1007/978-3-030-60450-9_22

Abstract

In this paper, we present the Chinese Medical Information Extraction (CMeIE) dataset, consisting of 28,008 sentences, 85,282 triplets, 11 entity types, and 44 relation types derived from medical textbooks and clinical practice, constructed through several rounds of manual annotation. Additionally, we evaluate the performance of recent state-of-the-art frameworks and pre-trained language models on the joint entity and relation extraction task using the CMeIE dataset. Experimental results show that even the most advanced models leave substantial room for improvement on our dataset; currently, the best F1 score on the dataset is 58.44%. Our analysis identifies several challenges and multiple potential future research directions for this task in the medical domain.
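
The F1 score quoted in the abstract is a triplet-level metric. The abstract does not spell out the exact matching protocol, so the following is only a minimal sketch under a common assumption for joint extraction: a predicted (subject, relation, object) triplet counts as correct only if it exactly matches a gold triplet for the same sentence, and scores are micro-averaged over all sentences. The function name `micro_f1` and the example triplets (including the relation labels) are hypothetical and are not taken from the CMeIE schema.

```python
from typing import List, Set, Tuple

# A triplet is (subject, relation, object); all three must match exactly.
Triple = Tuple[str, str, str]


def micro_f1(gold: List[Set[Triple]], pred: List[Set[Triple]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), micro-averaged over sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    # True positives: predicted triplets that also appear in the gold set
    # of the same sentence.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example data (two sentences), for illustration only.
gold = [
    {("pneumonia", "symptom", "fever"), ("pneumonia", "treatment", "antibiotics")},
    {("asthma", "symptom", "wheezing")},
]
pred = [
    {("pneumonia", "symptom", "fever")},
    {("asthma", "symptom", "wheezing"), ("asthma", "treatment", "rest")},
]

precision, recall, f1 = micro_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this toy case two of the three predicted triplets are correct and two of the three gold triplets are recovered, so precision, recall, and F1 are all 2/3. Under exact matching, a triplet with the right entities but the wrong relation label earns no credit.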

Cite (APA)

Guan, T., Zan, H., Zhou, X., Xu, H., & Zhang, K. (2020). CMeIE: Construction and Evaluation of Chinese Medical Information Extraction Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12430 LNAI, pp. 270–282). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60450-9_22
