dcGOR: An R Package for Analysing Ontologies and Protein Domain Annotations

Abstract

I introduce ‘dcGOR’, an open-source R package that makes it easy for the bioinformatics community to analyse ontologies and protein domain annotations, particularly those in the dcGO database. dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies, including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to reach its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structures to represent domains, ontologies, annotations, and all analytical outputs. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Taking as input a list of protein domains of interest, the package can readily carry out in-depth analyses of functional, phenotypic and disease relevance, and of network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) and on RNAs (from Rfam). The package is freely available from CRAN for easy installation, and on GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.
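To make the idea of domain-based enrichment analysis concrete, the following is a minimal sketch in Python rather than R, and it does not use or reproduce the dcGOR API. It assumes a one-sided hypergeometric test followed by Benjamini-Hochberg adjustment, which is a common choice for this kind of analysis; the abstract does not state which test or adjustment dcGOR applies. The function names, domain identifiers and term names are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n domains are drawn from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(query, background, annotations):
    """Map each ontology term to its adjusted enrichment p-value.

    query: domains of interest; background: all domains considered;
    annotations: dict of term -> iterable of annotated domains.
    """
    background = set(background)
    query = set(query) & background
    terms = sorted(annotations)
    pvals = []
    for term in terms:
        annotated = set(annotations[term]) & background
        overlap = len(query & annotated)
        pvals.append(
            hypergeom_sf(overlap, len(background), len(annotated), len(query))
        )
    return dict(zip(terms, benjamini_hochberg(pvals)))


# Toy input (hypothetical identifiers, not taken from dcGO).
background = [f"d{i}" for i in range(1, 11)]
annotations = {
    "termA": ["d1", "d2", "d3"],
    "termB": ["d3", "d4", "d5", "d6", "d7", "d8"],
}
result = enrich(["d1", "d2", "d3"], background, annotations)
```

In this toy input every query domain carries `termA`, so that term receives the smaller adjusted p-value. A real analysis would also respect the ontology hierarchy when propagating annotations, which this sketch leaves out.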

Figures

  • Table 1. A summary of ontologies, infrastructures and functions included in dcGOR.
  • Figure 1. Domain-based enrichment analysis using GOBP terms. Only the most significant 5 terms/nodes (outlined in black; explained in the bottom-right panel) are visualised along with their ancestral terms. Nodes are coloured according to adjusted p-values. doi:10.1371/journal.pcbi.1003929.g001
  • Figure 2. In-depth analysis for network-level understanding. (A) Heatmap visualisation of the semantic similarity between pairs of domains according to their annotations by Disease Ontology (DO). (B) Network representation of the pairwise domain semantic similarity. It is a weighted and undirected network, with edge thickness indicating semantic similarity between a pair of domains/nodes. Nodes are labeled by both numeric id and textual description. (C) A table listing GOMF terms and their annotated domains (used as domain seeds for random walk with restart, RWR). Notably, terms used here are only those with at least 3 annotatable domains that are also in the domain network (see Figure 2B). (D) Contact (statistical significance) network between GOMF terms in Figure 2C, as estimated by RWR on the domain network in Figure 2B. Only those significant contacts/edges (adjusted p-values < 0.1) are shown, with thickness indicating the contact strength (z-score). doi:10.1371/journal.pcbi.1003929.g002
  • Figure 3. Enrichment analysis of promiscuous Pfam domains using GOBP terms (left) and GOMF terms (right). Only the most significant terms/nodes (adjusted p-values < 0.05; outlined in black) are visualised along with their ancestral terms. Nodes are coloured according to adjusted p-values. doi:10.1371/journal.pcbi.1003929.g003
  • Figure 4. Heatmap visualisation of the GO overall semantic similarity between pairs of promiscuous Pfam domains. Domains are ordered according to hierarchical clustering by the package ‘supraHex’. doi:10.1371/journal.pcbi.1003929.g004
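
The random walk with restart (RWR) mentioned in the Figure 2 caption can be illustrated with a minimal Python sketch on a small weighted, undirected network. This is a generic power-iteration formulation, not dcGOR's implementation: the restart probability of 0.75, the function name and the toy network are assumptions made for illustration, and the permutation-based significance step that yields z-scores and adjusted p-values is not shown.

```python
def random_walk_with_restart(weights, seeds, restart=0.75, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker restarting at the seeds.

    weights: dict of node -> dict of neighbour -> edge weight (symmetric).
    seeds: set of nodes the walker restarts from, with equal probability.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = sorted(weights)
    out_sum = {u: sum(weights[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds ...
        nxt = {u: restart * p0[u] for u in nodes}
        # ... and the rest moves along edges in proportion to their weight.
        for u in nodes:
            if out_sum[u] == 0:
                continue  # isolated node: its mass is dropped in this sketch
            for v, w in weights[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_sum[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy chain a - b - c - d; edge weights stand in for semantic similarity.
network = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 0.5},
    "c": {"b": 0.5, "d": 1.0},
    "d": {"c": 1.0},
}
affinity = random_walk_with_restart(network, seeds={"a"})
```

The resulting affinity is highest at the seed and decays with network distance. Comparing such affinity vectors between the seed sets of two terms is one way a contact strength could be derived, though the exact scoring used for Figure 2D is not specified in the caption.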

Fang, H. (2014). dcGOR: An R Package for Analysing Ontologies and Protein Domain Annotations. PLoS Computational Biology, 10(10). https://doi.org/10.1371/journal.pcbi.1003929
