Rule-based protein term identification with help from automatic species tagging

Wang Xinglong

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4394 LNCS 288-298

DOI: 10.1007/978-3-540-70939-8_26

Abstract

In biomedical articles, a single term often refers to different protein entities. For example, an arbitrary occurrence of the term p53 might denote any of thousands of proteins across a number of species. A human annotator can resolve this ambiguity relatively easily by looking at the context and, if necessary, by searching an appropriate protein database. However, the ambiguity causes considerable trouble for a text mining system, which does not understand human language and hence cannot identify the correct protein that the term refers to. In this paper, we present a Term Identification system which automatically assigns unique identifiers, as found in a protein database, to ambiguous protein mentions in text. Unlike other solutions described in the literature, which only work on gene/protein mentions for a specific model organism, our system is able to tackle protein mentions across many species by integrating a machine-learning based species tagger. We have compared the performance of our automatic system to that of human annotators, with very promising results. © Springer-Verlag Berlin Heidelberg 2007.
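To make the idea of species-aided term identification concrete, the following is a minimal sketch and not the system described in the paper. It assumes a toy lexicon mapping each term to per-species identifiers, and it replaces the machine-learning species tagger with a simple cue-word count. The names `LEXICON`, `SPECIES_CUES`, `tag_species`, and `identify_term`, as well as the `DEMO:` identifiers, are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: term -> {species -> identifier}.
# The DEMO: identifiers are placeholders, not real database entries.
LEXICON = {
    "p53": {"human": "DEMO:0001", "mouse": "DEMO:0002"},
    "brca1": {"human": "DEMO:0003"},
}

# Hypothetical cue words standing in for a trained species tagger.
SPECIES_CUES = {
    "human": {"human", "patient", "patients"},
    "mouse": {"mouse", "mice", "murine"},
}


def tag_species(context):
    """Return the species with the most cue words in the context, or None."""
    tokens = [tok.strip(".,;()") for tok in context.lower().split()]
    counts = Counter()
    for species, cues in SPECIES_CUES.items():
        counts[species] = sum(1 for tok in tokens if tok in cues)
    species, hits = counts.most_common(1)[0]
    return species if hits > 0 else None


def identify_term(term, context):
    """Map a protein mention to an identifier, using species to disambiguate."""
    candidates = LEXICON.get(term.lower(), {})
    if not candidates:
        return None  # term is not in the lexicon
    if len(candidates) == 1:
        return next(iter(candidates.values()))  # unambiguous term
    # Ambiguous across species: fall back on the tagged species, if any.
    return candidates.get(tag_species(context))
```

In this sketch, a mention of p53 in a sentence about murine fibroblasts resolves to the mouse entry, while the same mention with no species cue in its context is left unresolved rather than guessed.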

Cite

Xinglong, W. (2007). Rule-based protein term identification with help from automatic species tagging. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4394 LNCS, pp. 288–298). https://doi.org/10.1007/978-3-540-70939-8_26
