Clinical Named Entity Recognition (CNER), the task of identifying the entity boundaries in clinical texts, is essential for many applications. Previous methods usually follow the traditional NER methods that heavily rely on language specific features (i.e. linguistics and lexicons) and high quality annotated data. However, due to the problem of Limited Availability of Annotated Data and Informal Clinical Texts, CNER becomes more challenging. In this paper, we propose a novel method that learn multiple representations for each category, namely category-multi-representation (CMR) that captures the semantic relatedness between words and clinical categories from different perspectives. CMR is learned based on a large scale unannotated corpus and a small set of annotated data, which greatly alleviates the burden of human effort. Instead of the language specific features, our proposed method uses more evidential features without any additional NLP tools, and enjoys a lightweight adaption among languages. We conduct a series of experiments to verify our new CMR features can further improve the performance of NER significantly without leveraging any external lexicons.
CITATION STYLE
Zhang, J., Li, J., Wang, S., Zhang, Y., Cao, Y., Hou, L., & Li, X. L. (2018). Category multi-representation: A unified solution for named entity recognition in clinical texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10938 LNAI, pp. 275–287). Springer Verlag. https://doi.org/10.1007/978-3-319-93037-4_22
Mendeley helps you to discover research relevant for your work.