Towards Gene Recognition from Rare and Ambiguous Abbreviations using a Filtering Approach

Matthias Hartung; Roman Klinger; Matthias Zwick; Philipp Cimiano

Conference Proceedings (Open Access)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2014) 118-127

DOI: 10.3115/v1/w14-3418

Abstract

Retrieving information about highly ambiguous gene/protein homonyms is a challenge, particularly where their non-protein meanings are more frequent than their protein meaning (e.g., SAH or HF). Because such abbreviations have limited coverage in common benchmarking data sets, the performance of existing gene/protein recognition tools on these problematic cases is hard to assess. We uniformly sample a corpus of eight ambiguous gene/protein abbreviations from MEDLINE® and provide manual annotations for each mention of these abbreviations. Based on this resource, we show that available gene recognition tools, such as conditional random fields (CRF) trained on BioCreative 2 NER data or GNAT, tend to underperform on this phenomenon. We propose to extend existing gene recognition approaches by combining a CRF and a support vector machine. In a cross-entity evaluation, and without taking any entity-specific information into account, our model achieves a gain of 6 points in F1 measure over our best baseline, which checks for the occurrence of a long form of the abbreviation, and of more than 9 points over all existing tools investigated.
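
The long-form baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a mention of an abbreviation is accepted as a gene/protein only when one of its known long forms occurs somewhere in the same text. The function name `long_form_baseline`, the example sentences, and the long-form list are hypothetical and chosen only for illustration.

```python
import re


def long_form_baseline(text, abbreviation, long_forms):
    """Label every mention of `abbreviation` in `text`.

    Assumption (not taken from the paper): a mention counts as a
    gene/protein if any of the given long forms occurs anywhere in
    the same text, compared case-insensitively.
    Returns a list of (start, end, is_gene) tuples, one per mention.
    """
    lowered = text.lower()
    has_long_form = any(lf.lower() in lowered for lf in long_forms)
    # Match the abbreviation as a whole token, case-sensitively.
    pattern = re.compile(r"\b" + re.escape(abbreviation) + r"\b")
    return [(m.start(), m.end(), has_long_form) for m in pattern.finditer(text)]


# Hypothetical long form, used only for illustration.
LONG_FORMS = ["S-adenosylhomocysteine hydrolase"]

with_long_form = "SAH (S-adenosylhomocysteine hydrolase) was expressed. SAH levels rose."
without_long_form = "Patients with SAH after aneurysm rupture were included."

accepted = long_form_baseline(with_long_form, "SAH", LONG_FORMS)
rejected = long_form_baseline(without_long_form, "SAH", LONG_FORMS)
```

In this sketch both mentions in the first text are accepted because the long form is present, while the mention in the second text is rejected. A rule of this kind uses no contextual evidence beyond the long form itself, which is the gap a learned filter such as the CRF and support vector machine combination described above is meant to address.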

Citation (APA)

Hartung, M., Klinger, R., Zwick, M., & Cimiano, P. (2014). Towards Gene Recognition from Rare and Ambiguous Abbreviations using a Filtering Approach. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 118–127). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3418
