Improving document prioritization for protein-protein interaction extraction using shallow linguistics and word embeddings

Abstract

Understanding biological processes, such as those associated with disease or pharmacological action, requires the analysis of large amounts of interconnected information. Protein interaction networks form part of this puzzle, and extracting this information from the scientific literature is an important but challenging task. In this work, we present a supervised classification approach for identifying and ranking literature documents that contain information regarding protein interactions. We studied the use of word embeddings together with simple chunking features, and show that combining these features with a baseline bag-of-words representation can lead to similar or even improved results compared to features based on deep linguistic parsing. When applied to the BioCreative III Article Classification Task dataset, our approach achieves an area under the precision-recall curve of 0.70 and a Matthews correlation coefficient of 0.56.
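The two evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the author's code or the official BioCreative evaluation script: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. Average precision is used here as one common step-wise estimate of the area under the precision-recall curve; the official evaluation may use a different (for example interpolated) estimate.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = relevant, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each relevant document."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: gold labels and classifier confidence scores.
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    # Assumed decision threshold of 0.5 for the binary predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print("MCC:", round(matthews_corrcoef(y_true, y_pred), 3))
    print("Average precision:", round(average_precision(y_true, scores), 3))
```

The ranking measure depends only on the order of the scores, while the correlation coefficient depends on the thresholded predictions, which is why a document prioritization task like this one is typically reported with both.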

Cite

Matos, S. (2017). Improving document prioritization for protein-protein interaction extraction using shallow linguistics and word embeddings. In Advances in Intelligent Systems and Computing (Vol. 616, pp. 43–49). Springer Verlag. https://doi.org/10.1007/978-3-319-60816-7_6
