Incorporating dictionary features into conditional random fields for gene/protein named entity recognition

Hongfei Lin; Yanpeng Li; Zhihao Yang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4819 LNAI 162-173

DOI: 10.1007/978-3-540-77018-3_18

6 Citations

6 Readers

Abstract

Biomedical Named Entity Recognition (BioNER) is an important preliminary step for biomedical text mining. Previous researchers built dictionaries of gene/protein names from online databases and incorporated them into machine learning models as features, but the gains were very limited. This paper assesses the quality of four dictionaries derived from online resources and investigates the impact of two factors (dictionary coverage and noisy terms) that may explain the poor performance of dictionary features. Experiments compare the external dictionaries with a dictionary derived from the GENETAG corpus, using Conditional Random Fields (CRFs) with dictionary features. We also examine the impact separately for long names and short names. The results show that low coverage of long names and noise among short names are the main problems of current online resources, and that a high-quality dictionary could substantially improve the accuracy of BioNER. © Springer-Verlag Berlin Heidelberg 2007.
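The abstract does not say how the dictionary features are encoded, so the following is only a minimal sketch of one common approach: greedy longest-match lookup against a name dictionary, with the match position exposed to the CRF as a per-token feature. It also includes a simple coverage measure of the kind the abstract alludes to. The names `dictionary_tags`, `token_features`, `coverage`, the `B-DICT`/`I-DICT`/`O` labels, the `max_len` limit, and the toy data are all illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Name = Tuple[str, ...]  # a dictionary entry as a tuple of lowercased tokens


def dictionary_tags(tokens: List[str], dictionary: Set[Name], max_len: int = 5) -> List[str]:
    """Greedy longest-match lookup (assumed scheme, not the paper's).

    Returns one label per token: B-DICT, I-DICT, or O.
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            tags[i] = "B-DICT"
            for j in range(i + 1, i + matched):
                tags[j] = "I-DICT"
            i += matched
        else:
            i += 1
    return tags


def token_features(tokens: List[str], dictionary: Set[Name]) -> List[Dict[str, object]]:
    """Per-token feature dicts; the 'dict' key is the dictionary feature."""
    return [
        {"word.lower": tok.lower(), "word.istitle": tok.istitle(), "dict": tag}
        for tok, tag in zip(tokens, dictionary_tags(tokens, dictionary))
    ]


def coverage(gold_names: List[Name], dictionary: Set[Name]) -> float:
    """Fraction of gold-standard names found verbatim in the dictionary."""
    if not gold_names:
        return 0.0
    return sum(1 for name in gold_names if name in dictionary) / len(gold_names)


# Hypothetical toy data for illustration only.
toy_dictionary: Set[Name] = {("tumor", "necrosis", "factor"), ("p53",)}
toy_tokens = ["Tumor", "necrosis", "factor", "activates", "p53", "."]
toy_features = token_features(toy_tokens, toy_dictionary)
```

Feature dictionaries of this shape could be passed to a CRF toolkit that accepts per-token feature maps. Computing `coverage` separately for multi-token and single-token names would be one way to examine the long-name versus short-name distinction the abstract draws.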

Citation (APA)

Lin, H., Li, Y., & Yang, Z. (2007). Incorporating dictionary features into conditional random fields for gene/protein named entity recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4819 LNAI, pp. 162–173). Springer Verlag. https://doi.org/10.1007/978-3-540-77018-3_18
