CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis

Gabriel Mejía; Natasha Bloch; Pablo Arbelaez

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13574 LNCS 68-77

DOI: 10.1007/978-3-031-17266-3_7

Abstract

Automatic cancer diagnosis based on RNA-Seq profiles lies at the intersection of transcriptome analysis and machine learning. Methods developed for this task could be a valuable support in clinical practice and provide insights into the causal mechanisms of cancer. To approach this problem correctly, the largest existing resource (The Cancer Genome Atlas, TCGA) must be complemented with healthy tissue samples from the Genotype-Tissue Expression (GTEx) project. In this work, we empirically demonstrate that previous approaches to joining these databases suffer from translation biases, and we correct these biases using batch z-score normalization. Moreover, we propose CanDLE, a multinomial logistic regression model that achieves state-of-the-art performance in multilabel cancer/healthy tissue type classification (94.1% balanced accuracy) and all-vs-one cancer type detection (78.0% average max F1).
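The two quantitative ideas in the abstract, batch z-score normalization and balanced accuracy, can be illustrated with a minimal sketch. It assumes that "batch" means the source database (TCGA or GTEx) and that each gene is standardized independently within each batch using the population standard deviation. The abstract does not specify these details, nor any log transform or gene filtering, so this is not the authors' implementation. The function names batch_zscore and balanced_accuracy are hypothetical.

```python
from statistics import mean, pstdev


def batch_zscore(samples, batches):
    """Standardize every feature (gene) separately within each batch.

    samples: list of equal-length lists (rows = samples, columns = genes)
    batches: one batch label per sample, e.g. "TCGA" or "GTEx"
    Assumption: per-gene mean/std computed within each batch (population std).
    """
    n_features = len(samples[0])
    out = [list(row) for row in samples]
    for b in set(batches):
        idx = [i for i, lab in enumerate(batches) if lab == b]
        for j in range(n_features):
            col = [samples[i][j] for i in idx]
            mu = mean(col)
            # A constant gene within a batch has std 0; dividing by 1 maps it to 0.
            sd = pstdev(col) or 1.0
            for i in idx:
                out[i][j] = (samples[i][j] - mu) / sd
    return out


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare tissue types weigh as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return mean(recalls)


# Toy data: the same gene is measured on very different scales in the two sources.
X = [[1.0, 10.0], [3.0, 10.0], [100.0, 5.0], [300.0, 7.0]]
Z = batch_zscore(X, ["TCGA", "TCGA", "GTEx", "GTEx"])
print(Z)  # [[-1.0, 0.0], [1.0, 0.0], [-1.0, -1.0], [1.0, 1.0]]
print(balanced_accuracy(["A", "A", "A", "B"], ["A", "A", "B", "B"]))  # 0.8333...
```

After normalization, the first gene has the same distribution in both sources even though its raw scale differs by two orders of magnitude. A classifier therefore cannot separate samples by database of origin alone. In the toy example, plain accuracy would be 0.75, while balanced accuracy averages the per-class recalls 2/3 and 1.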

Citation (APA)

Mejía, G., Bloch, N., & Arbelaez, P. (2022). CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13574 LNCS, pp. 68–77). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-17266-3_7