Use of morphological analysis in protein name recognition

Kaoru Yamamoto; Taku Kudo; Akihiko Konagaya; Yuji Matsumoto

Journal Article (Open Access)

Journal of Biomedical Informatics (2004) 37(6) 471-482

DOI: 10.1016/j.jbi.2004.08.001

Abstract

Protein name recognition aims to detect every protein name appearing in a PubMed abstract. The task is not simple, as the graphic word boundary (space separator) assumed in conventional preprocessing does not necessarily coincide with the protein name boundary. Such boundary disagreement, caused by tokenization ambiguity, has usually been ignored in conventional preprocessing of general English. In this paper, we argue that boundary disagreement poses serious limitations in biomedical English text processing, not to mention protein name recognition. Our key idea for dealing with boundary disagreement is to apply techniques used in Japanese morphological analysis, where there are no word boundaries. Having evaluated the proposed method on the GENIA corpus 3.02, we obtain an F-measure of 69.01 on a strict criterion and 79.32 on a relaxed criterion. The result is comparable to other published work in protein name recognition, without resorting to manually prepared ad hoc feature engineering. Further, compared to conventional preprocessing, the use of morphological analysis as preprocessing improves the performance of protein name recognition and reduces the execution time. © 2004 Elsevier Inc. All rights reserved.
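
The boundary disagreement described in the abstract can be illustrated with a small sketch. This is not the authors' method: the regular-expression segmenter below is only a hypothetical stand-in for a morphological analyzer, and the example phrase "anti-CD4 antibody" and the function names are assumptions chosen for illustration. The sketch checks whether a protein name's character span starts and ends on token boundaries under each segmentation.

```python
import re

def whitespace_spans(text):
    """Character spans of tokens split on the space separator."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def fine_spans(text):
    """Hypothetical finer segmentation: runs of letters, runs of digits,
    or single punctuation characters (a stand-in, not a real analyzer)."""
    pattern = r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]"
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]

def aligns(span, token_spans):
    """True if the span begins at some token start and ends at some token end."""
    starts = {s for s, _ in token_spans}
    ends = {e for _, e in token_spans}
    return span[0] in starts and span[1] in ends

# Illustrative phrase (an assumption): the protein name "CD4" sits
# inside the space-delimited token "anti-CD4".
text = "anti-CD4 antibody"
start = text.index("CD4")
protein = (start, start + len("CD4"))

print(aligns(protein, whitespace_spans(text)))  # name is cut out of a larger token
print(aligns(protein, fine_spans(text)))        # name is a union of whole tokens
```

With space-delimited tokens, a tagger that labels whole tokens cannot mark "CD4" exactly, because the name begins in the middle of a token. Under the finer segmentation the name is a sequence of whole units, which is the property the abstract attributes to processing text as if it had no word boundaries.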

Yamamoto, K., Kudo, T., Konagaya, A., & Matsumoto, Y. (2004). Use of morphological analysis in protein name recognition. Journal of Biomedical Informatics, 37(6), 471–482. https://doi.org/10.1016/j.jbi.2004.08.001
