Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction

13Citations
Citations of this article
24Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions) as defined in ENTREZ-Gene based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a MEDLINE reference. In the suggested approach we merge two independent sentence extraction strategies. The first proposed strategy (LASt) uses argumentative features, inspired by discourse-analysis models. The second extraction scheme (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence; thus providing a full ranking of all possible candidate GeneRiFs. A combination of the two approaches is proposed, which also aims at reducing the size of the selected segment by filtering out non-content bearing rhetorical phrases. Results: Based on the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy is already competitive (52.78%). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 57% (+10%). Conclusions: Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics. © 2008 Gobeill et al.; licensee BioMed Central Ltd.

Cite

CITATION STYLE

APA

Gobeill, J., Tbahriti, I., Ehrler, F., Mottaz, A., Veuthey, A. L., & Ruch, P. (2008). Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction. In BMC Bioinformatics (Vol. 9). https://doi.org/10.1186/1471-2105-9-S3-S9

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free