GO for gene documents

Abstract

Background: Annotating genes and their products with Gene Ontology (GO) codes is an important area of research. One approach is to use the information available about these genes in the biomedical literature. The goal of this paper, based on this approach, is to develop automatic annotation methods that can supplement the expensive manual annotation processes currently in place.

Results: Using a set of Support Vector Machine (SVM) classifiers, we were able to achieve F-scores of 0.49, 0.41 and 0.33 for codes of the molecular function, cellular component and biological process GO hierarchies, respectively. We find that alternative term weighting strategies do not differ from each other in performance, and that feature selection strategies reduce performance. The best thresholding strategy is one where a single threshold is picked for each hierarchy. Hierarchy level is important, especially for molecular function and biological process. The cellular component hierarchy stands apart from the other two in many respects. This may be due to fundamental differences in link semantics. This research shows that it is possible to beneficially exploit the hierarchical structures by defining and testing a relaxed criterion for classification correctness. Finally, it is possible to build classifiers for codes with very few associated documents but, as expected, a huge penalty is paid in performance.

Conclusion: The GO annotation problem is complex. Several key observations have been made, for example about topic drift, that may be important to consider in annotation strategies.

© 2007 Srinivasan and Qiu; licensee BioMed Central Ltd.
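The thresholding result mentioned in the abstract (a single threshold picked for each hierarchy) can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout, the function names (`f1_at`, `pick_threshold`, `pick_per_hierarchy`) and the choice to maximize a pooled F-score over observed scores are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: for one GO hierarchy, (classifier_score, is_correct)
# pairs pooled over every code in that hierarchy.
Scored = List[Tuple[float, bool]]


def f1_at(threshold: float, scored: Scored) -> float:
    """F-score obtained when every score >= threshold counts as a positive."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(scored: Scored) -> float:
    """Return the observed score that maximizes the pooled F-score.

    Assumes `scored` is non-empty; an empty list raises ValueError.
    """
    candidates = sorted({s for s, _ in scored})
    return max(candidates, key=lambda t: f1_at(t, scored))


def pick_per_hierarchy(data: Dict[str, Scored]) -> Dict[str, float]:
    """One threshold per hierarchy, shared by all codes in that hierarchy."""
    return {name: pick_threshold(scored) for name, scored in data.items()}


# Toy, made-up scores for illustration only.
toy: Scored = [(0.9, True), (0.8, True), (0.4, False), (0.3, True), (0.1, False)]
thresholds = pick_per_hierarchy({"molecular_function": toy})
```

In this sketch the threshold would be tuned on held-out validation scores rather than on the test documents; whether the paper does the same is not stated in the abstract.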

Cite

Srinivasan, P., & Qiu, X. Y. (2007). GO for gene documents. In BMC Bioinformatics (Vol. 8). https://doi.org/10.1186/1471-2105-8-S9-S3