Identification of chemical entities in patent documents

Tiago Grego; Piotr Pȩzik; Francisco M. Couto; Dietrich Rebholz-Schuhmann

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5518 LNCS(PART 2) 942-949

DOI: 10.1007/978-3-642-02481-8_144

16 Citations

22 Readers

Abstract

Biomedical literature is an important source of information about chemical compounds. However, different representations and nomenclatures for chemical entities exist, which makes references to chemical entities ambiguous. Many systems already exist for gene and protein entity recognition; however, very few exist for chemical entities. The main reason for this is the lack of corpora for training named entity recognition systems and evaluating them. In this paper we present a chemical entity recognizer that uses a machine learning approach based on conditional random fields (CRF) and compare its performance with dictionary-based approaches using several terminological resources. For training and evaluation, a gold standard of manually curated patent documents was used. While the dictionary-based systems perform well in partial identification of chemical entities, the machine learning approach performs better (10% increase in F-score in comparison to the best dictionary-based system) when identifying complete entities. © 2009 Springer Berlin Heidelberg.
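The distinction the abstract draws between partial and complete identification can be illustrated with a small scoring sketch. The abstract does not state the exact matching criteria used in the paper, so the definitions below are assumptions: an entity is a character span, a complete match requires identical boundaries, and a partial match requires any overlap. The names `Span`, `overlaps`, and `f_score` are hypothetical.

```python
from typing import List, Tuple

# Assumption: an entity mention is a (start, end) character span, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True when the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(gold: List[Span], predicted: List[Span], partial: bool = False) -> float:
    """F-score of predicted spans against gold spans.

    partial=False counts a prediction only if its boundaries equal a gold span.
    partial=True counts a prediction if it overlaps any gold span.
    """
    if partial:
        matched_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
        matched_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(predicted)
        matched_pred = sum(p in gold_set for p in predicted)
        matched_gold = sum(g in pred_set for g in gold)

    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: the second prediction covers only a fragment of the gold entity.
gold = [(0, 7), (20, 35)]
predicted = [(0, 7), (20, 27)]

exact_f = f_score(gold, predicted)                  # fragment is not counted
partial_f = f_score(gold, predicted, partial=True)  # fragment is counted
```

Under these assumed definitions, a recognizer that often finds fragments of long chemical names scores well on partial matching but loses credit on complete matching, which is the kind of gap the abstract reports for the dictionary-based systems.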

Citation (APA)

Grego, T., Pȩzik, P., Couto, F. M., & Rebholz-Schuhmann, D. (2009). Identification of chemical entities in patent documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5518 LNCS, pp. 942–949). https://doi.org/10.1007/978-3-642-02481-8_144
