Discovering multi-level structures in bio-molecular data through the Bernstein inequality

Alberto Bertoni; Giorgio Valentini

Journal Article, Open Access

BMC Bioinformatics (2008) 9(SUPPL. 2)

DOI: 10.1186/1471-2105-9-S2-S4

Abstract

Background: The unsupervised discovery of structures (i.e. clusterings) underlying data is a central issue in several branches of bioinformatics. Stability-based methods have recently been proposed to assess the reliability of a clustering procedure and to estimate the "optimal" number of clusters in bio-molecular data. A major problem with these methods is the detection of multi-level structures (e.g. hierarchical functional classes of genes) and the assessment of their statistical significance. In this context, a chi-square based statistical hypothesis test has been proposed; however, its correctness depends on some assumptions about the distribution of the data.

Results: To assess the statistical significance of multi-level structures in bio-molecular data and to discover them, a new method based on Bernstein's inequality is proposed. This approach makes no assumptions about the distribution of the data, so it can be applied reliably to a wide range of bioinformatics problems. Results with synthetic and DNA microarray data show the effectiveness of the proposed method.

Conclusions: Because of its loose assumptions, the Bernstein test is more sensitive than the chi-square test in detecting multiple structures simultaneously present in the data. It is, however, less selective, that is, subject to more false positives. A more selective variant of the Bernstein inequality-based test, obtained by adding independence assumptions, is also presented. The proposed methods can be applied to discover multiple structures and to assess their significance in different types of bio-molecular data.

© 2008 Bertoni and Valentini; licensee BioMed Central Ltd.
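For readers unfamiliar with the inequality the method is named after, the following is a minimal sketch of the classical Bernstein tail bound for the mean of independent bounded random variables. It is not the test statistic from the article, whose exact form is not given in the abstract. The function name `bernstein_tail_bound`, its parameters, and the example values are hypothetical and chosen only for illustration. The bound states that, for n independent variables with deviations from their mean bounded by M and variance at most σ², the probability that the sample mean exceeds its expectation by t or more is at most exp(-n t² / (2σ² + 2Mt/3)).

```python
import math


def bernstein_tail_bound(n, t, variance, bound):
    """Classical Bernstein upper bound on P(sample_mean - E[X] >= t).

    Assumes n independent random variables X_i with
    |X_i - E[X_i]| <= bound and Var(X_i) <= variance.
    No further assumption on the shape of the distribution is needed.
    """
    if n <= 0 or t <= 0:
        raise ValueError("n and t must be positive")
    if variance < 0 or bound <= 0:
        raise ValueError("variance must be non-negative and bound positive")
    exponent = -n * t * t / (2.0 * variance + 2.0 * bound * t / 3.0)
    return math.exp(exponent)


# Hypothetical example: 100 scores in [0, 1], so deviations from the
# mean are at most 1; the variance value is an illustrative assumption.
p_upper = bernstein_tail_bound(n=100, t=0.1, variance=0.01, bound=1.0)
```

Because the bound depends only on the range and variance of the variables, it holds for any underlying distribution, which matches the distribution-free property emphasized in the abstract.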

Citation (APA)

Bertoni, A., & Valentini, G. (2008). Discovering multi-level structures in bio-molecular data through the Bernstein inequality. BMC Bioinformatics, 9(SUPPL. 2). https://doi.org/10.1186/1471-2105-9-S2-S4
