Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering

Nikhil R. Pal; Kripamoy Aguan; Animesh Sharma; Shun Ichi Amari

Journal Article (Open Access)

BMC Bioinformatics (2007) 8

DOI: 10.1186/1471-2105-8-5

Abstract

Background: The four heterogeneous childhood cancers neuroblastoma, non-Hodgkin lymphoma, rhabdomyosarcoma, and Ewing sarcoma present a similar histology of small round blue cell tumor (SRBCT), which often leads to misdiagnosis. Identification of biomarkers for distinguishing these cancers is a well-studied problem. Existing methods typically evaluate each gene separately and do not take into account the nonlinear interaction between genes, or between the genes and the tools used to design the diagnostic prediction system. Consequently, they usually identify more genes than are necessary for prediction. We propose a general scheme for finding a small set of biomarkers to design a diagnostic system for accurate classification of the cancer subgroups. We use multilayer networks with online gene selection ability and relational fuzzy clustering to identify a small set of biomarkers for accurate classification of the training and blind test cases of a well-studied data set.
Results: Our method identified just seven biomarkers that correctly classified the four cancer subgroups in both training and blind test samples. For the same problem, others suggested 19-94 genes. These seven biomarkers include three novel genes (NAB2, LSP1 and EHD1, not identified by others) with distinct class-specific signatures and important roles in cancer biology, including cellular proliferation, transendothelial migration and trafficking of MHC class antigens. Interestingly, NAB2 is downregulated in other tumors, including non-Hodgkin lymphoma and neuroblastoma, but we observed moderate to high upregulation in a few cases of Ewing sarcoma and rhabdomyosarcoma, suggesting that NAB2 might be mutated in these tumors. These genes can discover the subgroups correctly with unsupervised learning, can differentiate non-SRBCT samples, and perform equally well with other machine learning tools, including support vector machines. These biomarkers lead to four simple, human-interpretable rules for the diagnostic task.
Conclusion: Although the proposed method is tested on an SRBCT data set, it is quite general and can be applied to other cancer data sets. Our scheme takes into account the interaction between genes, as well as that between the genes and the tool, and is thus able to find a very small gene set and to discover novel genes. Our findings suggest the possibility of developing specialized microarray chips, or of using real-time qPCR assays or antibody-based methods such as ELISA and western blot analysis, for easy and low-cost diagnosis of the subgroups. © 2007 Pal et al; licensee BioMed Central Ltd.
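
The idea of selecting genes while the classifier is being trained can be illustrated with a small sketch. The abstract does not describe the selection mechanism, so everything below is an assumption made for illustration: each input passes through a learnable gate of the form 1 - exp(-gamma^2), a single logistic unit stands in for the multilayer network, and the data are synthetic (six features, of which only the first two determine the label). The names `gate` and `train_gated_logistic` are hypothetical and do not come from the paper.

```python
import math
import random


def gate(gamma):
    # Assumed gate shape: close to 0 for small gamma, approaches 1 as |gamma| grows.
    return 1.0 - math.exp(-gamma * gamma)


def train_gated_logistic(X, y, steps=800, lr=0.2, gamma0=0.1):
    """Gradient descent on cross-entropy for a logistic unit with gated inputs.

    Weights, bias and gate parameters are learned together, so features that
    help the classifier open their gates while the others stay nearly closed.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    gamma = [gamma0] * d  # all gates start almost closed
    b = 0.0
    for _ in range(steps):
        g = [gate(v) for v in gamma]
        corr = [0.0] * d  # accumulates (prediction - label) * feature
        err_sum = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(w[j] * g[j] * xi[j] for j in range(d))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            err_sum += err
            for j in range(d):
                corr[j] += err * xi[j]
        for j in range(d):
            c = corr[j] / n
            grad_w = c * g[j]
            # Chain rule through the gate: d(gate)/d(gamma) = 2*gamma*exp(-gamma^2)
            grad_gamma = c * w[j] * 2.0 * gamma[j] * math.exp(-gamma[j] ** 2)
            w[j] -= lr * grad_w
            gamma[j] -= lr * grad_gamma
        b -= lr * err_sum / n
    return w, b, [gate(v) for v in gamma]


# Synthetic stand-in for expression data: only features 0 and 1 carry signal.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
y = [1.0 if x[0] + x[1] > 0 else 0.0 for x in X]

w, b, gates = train_gated_logistic(X, y)
ranking = sorted(range(len(gates)), key=lambda j: -gates[j])
pred = [
    1.0 if b + sum(w[j] * gates[j] * x[j] for j in range(6)) > 0 else 0.0
    for x in X
]
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
print("features ranked by gate value:", ranking)
print("training accuracy:", accuracy)
```

In a sketch like this, the features with the largest final gate values are the candidates to keep. A real pipeline would additionally need held-out evaluation, as with the blind test cases mentioned in the abstract.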
