Abstract
We introduce a new approach to named entity classification which we term a Priority Model. We also describe the construction of a semantic database called SemCat consisting of a large number of semantically categorized names relevant to biomedicine. We used SemCat as training data to investigate name classification techniques. We generated a statistical language model and probabilistic context-free grammars for gene and protein name classification, and compared the results with the new model. For all three methods, we used a variable-order Markov model to predict the nature of strings not represented in the training data. The Priority Model achieves an F-measure of 0.958-0.960, consistently higher than the statistical language model and probabilistic context-free grammar.
Tanabe, L., & Wilbur, W. J. (2006). A priority model for named entities. In HLT-NAACL 2006 - BioNLP 2006: Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis, Proceedings of the Workshop (pp. 33–40). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1567619.1567626