Little research has been done on Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work relies on statistical models such as Conditional Random Fields (CRFs). However, these methods do not fully exploit lexicon information or large-scale unlabeled corpora. To improve NER performance on TCM books, we propose a method based on the biLSTM-CRF model that incorporates lexicon information into the representation layer to enrich its semantic information. We compared our approach with several previous character-based and word-based methods, and experiments on the “Shanghan Lun” dataset show that our method outperforms these previous models. In addition, because no large corpus was previously available in this field, we collected 376 TCM books to construct a large-scale corpus from which to obtain pre-trained vectors. We have released the corpus and pre-trained vectors to the public.
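The abstract does not specify how lexicon information enters the representation layer. One common approach for character-based Chinese NER is to match every lexicon word against the sentence and record, for each character, which matched words it begins (B), sits in the middle of (M), ends (E), or forms on its own (S), in the style of SoftLexicon-type BMES word sets. The following is a minimal sketch under that assumption, not the authors' method. The function names `lexicon_matches` and `bmes_count_vector`, the `max_word_len` limit, and the toy lexicon are all hypothetical. In a full model, the per-character word sets would typically be turned into pooled word embeddings and concatenated with the character embedding before the biLSTM-CRF layers.

```python
from typing import Dict, List, Set

def lexicon_matches(sentence: str, lexicon: Set[str],
                    max_word_len: int = 4) -> List[Dict[str, Set[str]]]:
    """Hypothetical helper: for each character, collect the lexicon words that
    cover it, grouped by the character's position in the word:
    B (begin), M (middle), E (end), S (single-character word)."""
    features = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring starting here, up to max_word_len characters.
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                features[start]["S"].add(word)
                continue
            features[start]["B"].add(word)
            features[end - 1]["E"].add(word)
            for mid in range(start + 1, end - 1):
                features[mid]["M"].add(word)
    return features

def bmes_count_vector(feature: Dict[str, Set[str]]) -> List[int]:
    """Hypothetical simplification: number of matched words in each of B, M, E, S."""
    return [len(feature[k]) for k in "BMES"]

if __name__ == "__main__":
    # Toy lexicon and a Shanghan Lun style phrase ("taiyang disease with fever").
    toy_lexicon = {"太阳", "太阳病", "发热", "热"}
    feats = lexicon_matches("太阳病发热", toy_lexicon)
    for ch, f in zip("太阳病发热", feats):
        print(ch, bmes_count_vector(f), {k: sorted(v) for k, v in f.items() if v})
```

Here the character 阳 is both the end of 太阳 and the middle of 太阳病, so its count vector is `[0, 1, 1, 0]`; this is the kind of word-boundary signal that purely character-based taggers lack.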
Song, B., Bao, Z., Wang, Y. Z., Zhang, W., & Sun, C. (2020). Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12431 LNAI, pp. 481–489). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60457-8_39