Language model enables end-to-end accurate detection of cancer from cell-free DNA

Abstract

We present Affordable Cancer Interception and Diagnostics (ACID), a language model that achieves high classification performance in the diagnosis of cancer using only raw cell-free DNA (cfDNA) sequencing reads. We formulate ACID as an autoregressive language model. ACID is pretrained on language sentences obtained by concatenating raw sequencing reads and diagnostic labels. We benchmark ACID against three methods. On the testing set subjected to whole-genome sequencing, ACID significantly outperforms the best benchmarked method in the diagnosis of cancer [Area Under the Receiver Operating Characteristic Curve (AUROC), 0.924 versus 0.853; P < 0.001] and in the detection of hepatocellular carcinoma (AUROC, 0.981 versus 0.917; P < 0.001). ACID can achieve high accuracy with just 10 000 reads per sample. In addition, ACID achieves the best performance among the benchmarked methods on testing sets that were subjected to bisulfite sequencing. In summary, we present an affordable, simple yet efficient end-to-end paradigm for cancer detection using raw cfDNA sequencing reads.
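The abstract describes two mechanics: training "sentences" built by concatenating a sample's raw reads with its diagnostic label, and evaluation by AUROC. The sketch below illustrates both under stated assumptions. The separator token `[SEP]`, the label tokens `[HEALTHY]` and `[CANCER]`, and the helpers `build_sentence` and `cancer_score` are hypothetical; the abstract does not give ACID's vocabulary, tokenization, or scoring rule. `cancer_score` assumes a score is derived by comparing the log-likelihoods an autoregressive model assigns to the two label tokens after the reads. The AUROC function uses the standard Mann-Whitney formulation.

```python
import math
from typing import Sequence

# Hypothetical special tokens: the abstract does not specify ACID's vocabulary.
SEP = "[SEP]"
LABEL_TOKENS = {0: "[HEALTHY]", 1: "[CANCER]"}


def build_sentence(reads: Sequence[str], label: int) -> str:
    """Concatenate a sample's raw reads and append its diagnostic label token.

    Assumption: reads are joined with a separator and the label is the final
    token, so an autoregressive model learns to predict the label after the reads.
    """
    return f" {SEP} ".join(reads) + f" {SEP} " + LABEL_TOKENS[label]


def cancer_score(logp_cancer: float, logp_healthy: float) -> float:
    """Hypothetical scoring rule: normalize the model's log-likelihoods of the
    two label tokens into a probability of the [CANCER] label."""
    return 1.0 / (1.0 + math.exp(logp_healthy - logp_cancer))


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic: the probability that a random
    positive sample scores higher than a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(build_sentence(["ACGT", "TTGA"], 1))  # ACGT [SEP] TTGA [SEP] [CANCER]
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise AUROC loop is quadratic in sample count and is meant for clarity; for large test sets a rank-based implementation (for example scikit-learn's `roc_auc_score`) computes the same quantity more efficiently.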

Shen, H., Liu, J., Chen, K., & Li, X. (2024). Language model enables end-to-end accurate detection of cancer from cell-free DNA. Briefings in Bioinformatics, 25(2). https://doi.org/10.1093/bib/bbae053
