Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering

Abstract

Background: The four heterogeneous childhood cancers (neuroblastoma, non-Hodgkin lymphoma, rhabdomyosarcoma, and Ewing sarcoma) present a similar histology of small round blue cell tumor (SRBCT), which often leads to misdiagnosis. Identification of biomarkers for distinguishing these cancers is a well-studied problem. Existing methods typically evaluate each gene separately and do not take into account the nonlinear interaction between genes or the tools that are used to design the diagnostic prediction system. Consequently, these methods usually identify a larger set of genes as necessary for prediction. We propose a general scheme for finding a small set of biomarkers to design a diagnostic system for accurate classification of the cancer subgroups. We use multilayer networks with online gene selection ability and relational fuzzy clustering to identify a small set of biomarkers for accurate classification of the training and blind test cases of a well-studied data set.

Results: Our method discerned just seven biomarkers that precisely categorized the four subgroups of cancer in both training and blind samples. For the same problem, others suggested 19-94 genes. These seven biomarkers include three novel genes (NAB2, LSP1 and EHD1, not identified by others) with distinct class-specific signatures and important roles in cancer biology, including cellular proliferation, transendothelial migration and trafficking of MHC class antigens. Interestingly, NAB2 is downregulated in other tumors, including non-Hodgkin lymphoma and neuroblastoma, but we observed moderate to high upregulation in a few cases of Ewing sarcoma and rhabdomyosarcoma, suggesting that NAB2 might be mutated in these tumors. These genes can discover the subgroups correctly with unsupervised learning, can differentiate non-SRBCT samples, and perform equally well with other machine learning tools, including support vector machines. These biomarkers lead to four simple, human-interpretable rules for the diagnostic task.

Conclusion: Although the proposed method is tested on an SRBCT data set, it is quite general and can be applied to other cancer data sets. Our scheme takes into account the interaction between genes as well as that between genes and the tool, and is thus able to find a very small set of genes and to discover novel ones. Our findings suggest the possibility of developing specialized microarray chips, or of using real-time qPCR assays or antibody-based methods such as ELISA and western blot analysis, for easy and low-cost diagnosis of the subgroups. © 2007 Pal et al; licensee BioMed Central Ltd.

Pal, N. R., Aguan, K., Sharma, A., & Amari, S. I. (2007). Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering. BMC Bioinformatics, 8. https://doi.org/10.1186/1471-2105-8-5
