Text mining is becoming an important research topic in biology, with the aim of extracting biological entities from scientific papers in order to extend biological knowledge. However, few thorough studies have been carried out on plant molecular biology data, especially for rice, resulting in a lack of datasets on which advanced machine learning methods can be trained to detect entities such as genes and proteins. In this article, we first developed a dataset from Oryzabase, a database of rice genes, and used it as the benchmark. Then, we evaluated the performance of two Named Entity Recognition (NER) methods for sequence tagging: a Long Short-Term Memory (LSTM) model combined with Conditional Random Fields (CRFs), and a hybrid method that combines dictionary lookup with machine learning systems to improve the results. We analyzed the performance of these methods when applied to the Oryzabase dataset and improved the results. On average, the LSTM-CRF model reached an F1 score of 86%, making its results the more exploitable of the two.
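As an illustration of how an F1 score for sequence tagging can be computed, the following is a minimal sketch that assumes BIO-style tags and exact-match, entity-level scoring. The abstract does not state which tagging scheme or matching criterion was used, so both are assumptions here, and the `GENE` label and the function names are hypothetical.

```python
def bio_spans(tags):
    """Return the set of (start, end, label) entity spans in a BIO tag sequence."""
    spans, start, label = set(), None, None
    # A trailing "O" sentinel closes any span that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold_tags, pred_tags):
    """Exact-match, entity-level precision, recall and F1 (assumed scoring scheme)."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: two gold gene mentions, only the first one predicted.
gold = ["B-GENE", "I-GENE", "O", "B-GENE", "O"]
pred = ["B-GENE", "I-GENE", "O", "O", "O"]
precision, recall, f1 = entity_f1(gold, pred)
```

In this example the single predicted span is correct (precision 1.0) but one of the two gold spans is missed (recall 0.5), giving an F1 of about 0.67.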
CITATION STYLE
Do, H., Than, K., & Larmande, P. (2018). Evaluating named-entity recognition approaches in plant molecular biology. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11248 LNAI, pp. 219–225). Springer Verlag. https://doi.org/10.1007/978-3-030-03014-8_19