We investigate the quality of task-specific word embeddings created from relatively small, targeted corpora. We present a comprehensive evaluation framework, comprising both intrinsic and extrinsic evaluation, that can be extended to named entities beyond drug names. The intrinsic evaluation shows that drug name embeddings created from a domain-specific document corpus outperform previously published embeddings derived from a very large general text corpus. The extrinsic evaluation applies the word embeddings to drug name recognition with a Bi-LSTM model; the results demonstrate the advantage of using domain-specific word embeddings as the only input feature, with an F1-score of 0.91. This work suggests that it may be advantageous to derive domain-specific embeddings for certain tasks even when the domain-specific corpus is of limited size.
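For readers unfamiliar with the reported metric, the F1-score is the harmonic mean of precision and recall. The sketch below is illustrative only: the abstract does not state whether scoring was done per token or per entity, so it assumes token-level binary labels (1 for a drug-name token, 0 otherwise), and the function name `f1_score` and the sample labels are hypothetical.

```python
def f1_score(gold, pred):
    """Token-level F1 for binary labels (1 = drug-name token, 0 = other).

    Assumption: token-level scoring; the paper's exact protocol may differ.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    # With no true positives, F1 is defined here as 0.0.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for six tokens, not data from the paper.
gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

In this toy example, two of the three predicted drug-name tokens are correct and two of the three true drug-name tokens are found, so precision and recall are both 2/3.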
Zhao, M., Masino, A. J., & Yang, C. C. (2018). A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity. In BioNLP 2018 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 17th BioNLP Workshop (pp. 156–160). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-2319