Protein name tagging guidelines: Lessons learned

Abstract

Interest in information extraction from the biomedical literature is motivated by the need to speed up the creation of structured databases representing the latest scientific knowledge about specific objects, such as proteins and genes. This paper addresses the lack of a standard definition of the protein name tagging problem. We describe the lessons learned in developing a set of guidelines and present the first set of inter-coder results, viewed as an upper bound on system performance. Problems coders face include: (a) the ambiguity of names that can refer to either genes or proteins; (b) the difficulty of determining the exact extents of long protein names; and (c) the complexity of the guidelines. These problems have been addressed in two ways: (a) defining the tagging targets as protein named entities used in the literature to describe proteins or protein-associated or -related objects, such as domains, pathways, expression or genes; and (b) using two types of tags, protein tags and long-form tags, with the latter being used to optionally extend the boundaries of the protein tag when the name boundary is difficult to determine. Inter-coder consistency across three annotators for protein tags on 300 MEDLINE abstracts is 0.868 (F-measure). The guidelines and annotated datasets, along with automatic tools, are available for research use. Copyright © 2005 John Wiley & Sons, Ltd.
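
To make the inter-coder figure concrete, the following is a minimal sketch of how a pairwise F-measure between annotators could be computed. It rests on assumptions not stated in the abstract: tags are compared by exact span match, each tag is represented as a hypothetical (document id, start offset, end offset) tuple, and the three-way score is the unweighted mean over annotator pairs. The function names and the toy data are illustrative only and are not taken from the paper or its tools.

```python
from itertools import combinations


def f_measure(spans_a, spans_b):
    """F-measure between two annotators' tag sets under exact span matching.

    One annotator is treated as the reference and the other as the response;
    with exact matching the result is the same whichever way round they go.
    """
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # both annotators tagged nothing: trivially consistent
    matches = len(a & b)
    if matches == 0:
        return 0.0
    precision = matches / len(b)
    recall = matches / len(a)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f(annotations):
    """Unweighted mean of f_measure over all pairs of annotators (assumed scheme)."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(f_measure(x, y) for x, y in pairs) / len(pairs)


# Toy data: (doc_id, start, end) tuples, invented for illustration.
annotations = {
    "coder_a": {("d1", 0, 4), ("d1", 10, 18), ("d2", 5, 9)},
    "coder_b": {("d1", 0, 4), ("d1", 10, 18), ("d2", 5, 12)},
    "coder_c": {("d1", 0, 4), ("d2", 5, 9)},
}

if __name__ == "__main__":
    print(round(mean_pairwise_f(annotations), 3))
```

In the toy data, coder_a and coder_b disagree only on the right boundary of one name in d2, which is the kind of extent disagreement on long names that the abstract describes; under exact matching it counts as a miss for both annotators.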

Citation

Mani, I., Hu, Z., Jang, S. B., Samuel, K., Krause, M., Phillips, J., & Wu, C. H. (2005). Protein name tagging guidelines: Lessons learned. In Comparative and Functional Genomics (Vol. 6, pp. 72–76). https://doi.org/10.1002/cfg.452
