Unsupervised gene/protein named entity normalization using automatically extracted dictionaries

Abstract

Gene and protein named-entity recognition (NER) and normalization are often treated as a two-step process. While the first step, NER, has received considerable attention over the last few years, normalization has received much less. We have built a dictionary-based gene and protein NER and normalization system that requires no supervised training and no human intervention to build the dictionaries from online genomics resources. We tested our system on the Genia corpus and the BioCreative Task 1B mouse and yeast corpora and achieved performance comparable to state-of-the-art systems that require supervised learning and manual dictionary creation. Our technique should also work for organisms whose naming conventions are similar to those of mouse, such as human. Further evaluation and improvement of gene/protein NER and normalization systems are somewhat hampered by the lack of larger test collections and of collections for additional organisms, such as human.
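
To make the idea of dictionary-based normalization concrete, the following is a minimal sketch, not the system described in the abstract. It assumes a dictionary has already been extracted as (identifier, synonyms) pairs; the identifiers, gene names, and the functions `normalize_key`, `build_dictionary`, and `normalize_mentions` are all hypothetical and invented for illustration. Matching here is a simple lookup that ignores case and punctuation, and synonyms shared by more than one identifier are skipped as ambiguous; the actual system may handle variants and ambiguity differently.

```python
import re
from collections import defaultdict


def normalize_key(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_dictionary(records):
    """Map each normalized synonym to the set of identifiers that use it."""
    index = defaultdict(set)
    for identifier, synonyms in records:
        for synonym in synonyms:
            index[normalize_key(synonym)].add(identifier)
    return index


def normalize_mentions(text, index, max_tokens=4):
    """Return identifiers whose synonyms match unambiguously in the text."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text)
    found = set()
    for start in range(len(tokens)):
        for length in range(1, max_tokens + 1):
            if start + length > len(tokens):
                break
            key = normalize_key(" ".join(tokens[start:start + length]))
            ids = index.get(key, set())
            # Keep only synonyms that point to exactly one identifier.
            if len(ids) == 1:
                found |= ids
    return found


# Hypothetical records; names and identifiers are invented for this sketch.
RECORDS = [
    ("X:001", ["Abc1", "alpha beta cassette 1"]),
    ("X:002", ["Def-2", "Shared"]),
    ("X:003", ["Ghi3", "Shared"]),
]

if __name__ == "__main__":
    index = build_dictionary(RECORDS)
    text = "Expression of abc-1 and DEF2 rose, while shared was unchanged."
    print(sorted(normalize_mentions(text, index)))
```

In this sketch, "abc-1" and "DEF2" resolve to single identifiers despite differences in case and hyphenation, while "shared" is dropped because two records list it as a synonym.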

Cohen, A. M. (2005). Unsupervised gene/protein named entity normalization using automatically extracted dictionaries. In ACL-ISMB 2005 - Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, Proceedings of the Workshop (pp. 17–24). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1641484.1641487
