Improved prediction of smoking status via isoform-aware RNA-seq deep learning models

Abstract

Most predictive models based on gene expression data do not use information related to gene splicing, even though splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as the prediction target, we developed deep neural network models using gene-, exon-, and isoform-level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. Models using exon and isoform quantifications clearly outperformed gene-level models when restricted to the 5 genes of a previously published prediction model. That previously published model reported a test set AUC (area under the receiver operating characteristic curve) of 0.82 in its original publication. Our exon-based models that include an exon-to-isoform mapping layer achieved a test set AUC of 0.88, which rose to 0.94 when exon quantifications from a larger set of genes were used. Isoform variability is an important source of latent information in RNA-seq data and can be used to improve clinical prediction models.
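The abstract does not describe how the exon-to-isoform mapping layer is implemented. Below is a minimal NumPy sketch of one plausible design, assuming a binary membership mask that connects each isoform unit only to the exons it contains. The isoform names, exon indices, layer sizes, and the ReLU/sigmoid choices are all hypothetical illustrations, not the authors' architecture.

```python
import numpy as np

# Hypothetical toy annotation: which exon indices belong to which isoform.
# A real model would derive this from a transcript annotation; these values are made up.
ISOFORM_EXONS = {
    "GENE1-iso1": [0, 1, 2],
    "GENE1-iso2": [0, 2],
    "GENE2-iso1": [3, 4],
}
N_EXONS = 5


def build_mask(isoform_exons, n_exons):
    """Return a (n_exons, n_isoforms) 0/1 mask: 1 where an exon is part of an isoform."""
    names = list(isoform_exons)
    mask = np.zeros((n_exons, len(names)))
    for j, name in enumerate(names):
        mask[isoform_exons[name], j] = 1.0
    return mask, names


class ExonToIsoformNet:
    """Exon quantifications -> masked isoform layer -> single sigmoid output (P(smoker))."""

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = mask
        n_exons, n_iso = mask.shape
        self.w1 = rng.normal(scale=0.1, size=(n_exons, n_iso))
        self.b1 = np.zeros(n_iso)
        self.w2 = rng.normal(scale=0.1, size=(n_iso, 1))
        self.b2 = np.zeros(1)

    def isoform_layer(self, x):
        # Multiplying by the mask zeroes every exon->isoform weight not in the annotation,
        # so each isoform unit only "sees" its own exons.
        return np.maximum(0.0, x @ (self.w1 * self.mask) + self.b1)

    def predict_proba(self, x):
        h = self.isoform_layer(x)
        logits = h @ self.w2 + self.b2
        return 1.0 / (1.0 + np.exp(-logits[:, 0]))


# Usage on 4 hypothetical samples of random exon-level values (untrained weights).
mask, names = build_mask(ISOFORM_EXONS, N_EXONS)
net = ExonToIsoformNet(mask)
x = np.random.default_rng(1).normal(size=(4, N_EXONS))
probs = net.predict_proba(x)
```

Because the forward pass uses `w1 * mask`, gradients with respect to the masked-out weights are also zero during training, so the sparsity pattern set by the annotation is preserved without any extra bookkeeping.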

Citation (APA)

Wang, Z., Masoomi, A., Xu, Z., Boueiz, A., Lee, S., Zhao, T., … Castaldi, P. J. (2021). Improved prediction of smoking status via isoform-aware RNA-seq deep learning models. PLoS Computational Biology, 17(10). https://doi.org/10.1371/journal.pcbi.1009433

Readers' Seniority

Tooltip

PhD / Post grad / Masters / Doc 2

50%

Researcher 2

50%

Readers' Discipline

Tooltip

Biochemistry, Genetics and Molecular Bi... 2

50%

Agricultural and Biological Sciences 1

25%

Business, Management and Accounting 1

25%

Article Metrics

Tooltip
Mentions
Blog Mentions: 1

Save time finding and organizing research with Mendeley

Sign up for free