Cross-Cancer Genome Analysis on Cancer Classification Using Both Unsupervised and Supervised Approaches

Abstract

The current cancer diagnosis pipeline has many problems, one of which is the alarmingly high rate of over-diagnosis in breast, prostate, and lung cancer. By quantifying gene expression levels, next-generation sequencing techniques such as RNA-Seq give researchers and clinicians a more complete view of a cell’s transcriptome. With the adoption of this new data source, cross-cancer methods for cancer diagnosis have become more viable. We use mutual information in conjunction with a Gaussian mixture model and t-SNE to evaluate the separability of cancer and non-cancer tissue samples from RNA-Seq expression data. The Gaussian mixture and t-SNE combination produced clear clustering without supervision, suggesting that tissue samples can be separated algorithmically. We then use a collection of deep neural networks to classify tissue origin and status from the gene expression of tissue samples, using genes selected with the same mutual information technique. First, we select the top 500 candidate genes without considering overlap in their predictive power. We then apply Recursive Feature Elimination (RFE) to select 200 genes, thereby accounting for covariation. Performance with the top 500 genes is only slightly better than with the 200 genes selected using RFE, and the two approaches perform similarly overall, indicating that only a small subset of genes is required to identify status and origin. This work indicates that RNA sequencing data is a useful tool for cross-cancer studies. Next steps include incorporating more non-cancer data from other datasets to decrease bias in model training.
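
The gene-selection and clustering steps described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. It runs on a small synthetic matrix standing in for RNA-Seq expression data, and every concrete choice is an assumption made for illustration: the sizes (100 features, top 30, 10 kept by RFE, in place of the paper's 500 and 200), running RFE on the mutual-information subset, the logistic regression estimator inside RFE, and the order of embedding with t-SNE before fitting the Gaussian mixture.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for an expression matrix: rows are tissue samples,
# columns are genes, y is a binary status label. All sizes are illustrative.
X, y = make_classification(
    n_samples=200, n_features=100, n_informative=10,
    n_redundant=5, random_state=0,
)

# Step 1: rank genes by mutual information with the label and keep the
# top_k highest-scoring ones, ignoring redundancy between genes.
mi_scores = mutual_info_classif(X, y, random_state=0)
top_k = 30
top_idx = np.argsort(mi_scores)[::-1][:top_k]
X_top = X[:, top_idx]

# Step 2 (assumed order): embed the selected genes in 2-D with t-SNE,
# then fit a two-component Gaussian mixture on the embedding. No labels
# are used here, so this is the unsupervised separability check.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_top)
clusters = GaussianMixture(n_components=2, random_state=0).fit_predict(embedding)

# Step 3: Recursive Feature Elimination on the top_k genes. RFE refits the
# estimator and drops the weakest genes each round, so correlated genes
# compete with each other. Logistic regression is an assumed estimator.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=5)
rfe.fit(X_top, y)
rfe_idx = top_idx[rfe.support_]  # column indices in the original matrix
```

Either `X[:, top_idx]` or `X[:, rfe_idx]` could then be passed to a classifier to compare the two gene sets, in the spirit of the comparison between the top 500 genes and the 200 RFE-selected genes.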

Cite

Zhou, J., Chen, B., & Zhou, N. (2020). Cross-Cancer Genome Analysis on Cancer Classification Using Both Unsupervised and Supervised Approaches. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12402 LNCS, pp. 206–219). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-59612-5_15
