A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data

25Citations
Citations of this article
99Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms would yield different biclusters and further lead to distinct conclusions. Therefore, some testing and comparisons between these algorithms are strongly required. Methods: In this study, five biclustering algorithms (i.e. BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared with each other in the cases where they were used to handle two expression datasets (GDS1620 and pathway) with different dimensions in Arabidopsis thaliana (A. thaliana) GO (gene ontology) annotation and PPI (protein-protein interaction) network were used to verify the corresponding biological significance of biclusters from the five algorithms. To compare the algorithms performance and evaluate quality of identified biclusters, two scoring methods, namely weighted enrichment (WE) scoring and PPI scoring, were proposed in our study. For each dataset, after combining the scores of all biclusters into one unified ranking, we could evaluate the performance and behavior of the five biclustering algorithms in a better way. Results: Both WE and PPI scoring methods has been proved effective to validate biological significance of the biclusters, and a significantly positive correlation between the two sets of scores has been tested to demonstrate the consistence of these two methods. A comparative study of the above five algorithms has revealed that: (1) ISA is the most effective one among the five algorithms on the dataset of GDS1620 and BIMAX outperforms the other algorithms on the dataset of pathway. (2) Both ISA and BIMAX are data-dependent. The former one does not work well on the datasets with few genes, while the latter one holds well for the datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study and they may be suitable to large datasets with more genes and more conditions. (4) SAMBA is also data-independent as it performs well on two given datasets. The comparison results provide useful information for researchers to choose a suitable algorithm for each given dataset. © 2012 Li et al.; licensee BioMed Central Ltd.

Cite

CITATION STYLE

APA

Li, L., Guo, Y., Wu, W., Shi, Y., Cheng, J., & Tao, S. (2012). A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data. BioData Mining, 5(1). https://doi.org/10.1186/1756-0381-5-8

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free