Evaluating named-entity recognition approaches in plant molecular biology

Huy Do; Khoat Than; Pierre Larmande

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11248 LNAI 219-225

DOI: 10.1007/978-3-030-03014-8_19

Abstract

Text mining research is becoming an important topic in biology, with the aim of extracting biological entities from scientific papers in order to extend biological knowledge. However, few thorough studies have been conducted on plant molecular biology data, especially for rice, resulting in a lack of datasets for exploiting advanced machine learning methods able to detect entities such as genes and proteins. In this article, we first developed a dataset from Oryzabase, a database of rice genes, and used it as the benchmark. Then, we evaluated the performance of two Named-Entity Recognition (NER) methods for sequence tagging: a Long Short-Term Memory (LSTM) model combined with Conditional Random Fields (CRFs), and a hybrid method based on dictionary lookup combined with some machine learning systems to improve the results. We analyzed the performance of these methods when applied to the Oryzabase dataset and improved the results. On average, the LSTM-CRF model reaches an F1 score of 86%, which makes its results the more exploitable of the two.
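The abstract contrasts a learned LSTM-CRF tagger with a dictionary-lookup-based hybrid and compares them by F1. As an illustration only, the Python sketch below shows two generic building blocks of that kind of evaluation: a greedy longest-match dictionary tagger that emits BIO tags, and an exact-match, entity-level precision/recall/F1 scorer. The BIO tagging scheme, the `GENE` label, the function names (`dictionary_tag`, `bio_to_spans`, `prf1`), the lexicon and the toy sentence are all hypothetical assumptions; this is not the authors' system, and it leaves out both the machine-learning components of the hybrid method and the LSTM-CRF model.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical span representation: (start token, end token exclusive, entity type)
Span = Tuple[int, int, str]


def dictionary_tag(tokens: List[str], lexicon: Dict[Tuple[str, ...], str]) -> List[str]:
    """Greedy longest-match dictionary lookup that emits BIO tags."""
    max_len = max((len(key) for key in lexicon), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match: Optional[Tuple[int, str]] = None
        # Try the longest candidate phrase first so multi-token names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in lexicon:
                match = (n, lexicon[key])
                break
        if match is None:
            i += 1
            continue
        n, label = match
        tags[i] = "B-" + label
        for j in range(i + 1, i + n):
            tags[j] = "I-" + label
        i += n
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from a BIO sequence.

    An I- tag that does not continue an open entity of the same type
    starts a new entity (a lenient reading of the scheme).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None and label is not None:
                spans.add((start, i, label))
            start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]
    return spans


def prf1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with exact span matching."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy lexicon and sentence, invented for illustration only.
    lexicon = {("OsMADS1",): "GENE", ("Heading", "date", "1"): "GENE"}
    tokens = ["Expression", "of", "OsMADS1", ",", "Heading", "date", "1",
              "and", "Ehd1", "was", "measured"]
    gold = ["O", "O", "B-GENE", "O", "B-GENE", "I-GENE", "I-GENE",
            "O", "B-GENE", "O", "O"]
    pred = dictionary_tag(tokens, lexicon)
    precision, recall, f1 = prf1(gold, pred)
    print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # prints P=1.00 R=0.67 F1=0.80
```

Because "Ehd1" is absent from the toy lexicon, recall drops while precision stays perfect. This coverage gap is one common reason dictionary lookup is paired with learned components, as in the hybrid method described above.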

Cite (APA)

Do, H., Than, K., & Larmande, P. (2018). Evaluating named-entity recognition approaches in plant molecular biology. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11248 LNAI, pp. 219–225). Springer Verlag. https://doi.org/10.1007/978-3-030-03014-8_19