Using Empirically Constructed Lexical Resources for Named Entity Recognition

  • Jonnalagadda S
  • Cohen T
  • Wu S
  • et al.

This article is free to access.

Abstract

Because of privacy concerns and the expense of creating an annotated corpus, existing annotated corpora are small and might not contain enough examples for a statistical learner to extract all named entities precisely. In this work, we evaluate the value of automatically generated features based on distributional semantics for machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM) regions, and term clustering, all of which are distributional semantic features. Adding the n-nearest words feature to a baseline system produced a greater increase in F-score than adding a manually constructed lexicon. Although the need for relatively small annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons but also replace them. This holds for extracting concepts from both biomedical literature and clinical notes.
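
To make the n-nearest words idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that word vectors are plain co-occurrence counts within a fixed window and that nearness is cosine similarity; the abstract does not specify the distributional model, and the function names, the window size, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[ctx] for ctx, count in a.items() if ctx in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def n_nearest_words(word, vectors, n=3):
    """Return up to n words whose context vectors are most similar to `word`."""
    if word not in vectors:
        return []  # unseen word: no distributional evidence
    target = vectors[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:n] if score > 0]


def token_features(word, vectors, n=3):
    """Build a feature dict for one token, as a sequence labeller might consume."""
    features = {"word": word.lower()}
    neighbours = n_nearest_words(word.lower(), vectors, n)
    for rank, neighbour in enumerate(neighbours, start=1):
        features[f"nearest_{rank}"] = neighbour
    return features


# Toy unannotated corpus (hypothetical); real use would need far more text.
corpus = [
    "the patient was given aspirin for pain".split(),
    "the patient was given ibuprofen for pain".split(),
    "the doctor ordered a scan".split(),
]
vectors = build_cooccurrence_vectors(corpus, window=2)
example = token_features("Aspirin", vectors, n=1)
```

In this toy corpus "aspirin" and "ibuprofen" occur in identical contexts, so each is the other's nearest word. That is the sense in which such features can act like an empirically derived lexicon: the neighbour list groups a token with distributionally similar terms without any manual annotation.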

Cite

Jonnalagadda, S., Cohen, T., Wu, S., Liu, H., & Gonzalez, G. (2013). Using Empirically Constructed Lexical Resources for Named Entity Recognition. Biomedical Informatics Insights, 6s1, BII.S11664. https://doi.org/10.4137/bii.s11664
