Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project

Dmitry Kolobkov; Satyarth Mishra Sharma; Aleksandr Medvedev; Mikhail Lebedev; Egor Kosaretskiy; Ruslan Vakhitov

Journal Article (Open Access)


Frontiers in Big Data (2024) 7

DOI: 10.3389/fdata.2024.1266031


Abstract

Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner, thus reducing the risks of data leakage. Although federated learning is increasingly used on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
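The abstract does not name the training algorithm, so the following is only a minimal sketch assuming federated averaging (FedAvg), one common federated scheme, applied to a toy linear-regression model on simulated data. The node sizes, the covariate shift between nodes, the learning rate, and all function names (make_nodes, local_update, fedavg) are hypothetical illustration choices, not the authors' setup. It illustrates two points from the abstract: raw data stay on each node (only weight vectors are averaged), and the number of communication rounds controls how close the federated model gets to a centralized one trained on pooled data.

```python
import numpy as np


def make_nodes(n_nodes=3, n_features=5, seed=0):
    """Simulate hypothetical nodes with shifted feature distributions (inter-node heterogeneity)."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=n_features)
    nodes = []
    for k in range(n_nodes):
        n = 200 * (k + 1)                               # unequal node sizes
        X = rng.normal(loc=k, size=(n, n_features))     # feature means differ per node
        y = X @ true_w + rng.normal(scale=0.1, size=n)  # same underlying relationship, small noise
        nodes.append((X, y))
    return nodes


def local_update(w, X, y, lr, local_steps):
    """Plain gradient descent on the mean squared error, using only this node's data."""
    w = w.copy()
    for _ in range(local_steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def fedavg(nodes, rounds, local_steps, lr=0.01):
    """FedAvg sketch: each round, nodes train locally and the server averages
    the returned weight vectors, weighted by node sample count."""
    n_features = nodes[0][0].shape[1]
    w = np.zeros(n_features)
    sizes = np.array([len(y) for _, y in nodes], dtype=float)
    for _ in range(rounds):  # one communication round per iteration
        local_ws = [local_update(w, X, y, lr, local_steps) for X, y in nodes]
        w = np.average(local_ws, axis=0, weights=sizes)  # only weights are shared
    return w


nodes = make_nodes()

# Centralized baseline: least squares on the pooled data (what federation avoids doing).
X_all = np.vstack([X for X, _ in nodes])
y_all = np.concatenate([y for _, y in nodes])
w_central = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

# Fewer communication rounds leave the federated model further from the centralized one.
for rounds in (2, 10, 50):
    w_fed = fedavg(nodes, rounds=rounds, local_steps=10)
    print(rounds, np.abs(w_fed - w_central).max())
```

Raising local_steps while lowering rounds trades communication for local computation; with heterogeneous nodes, many local steps can also pull each local model toward its own optimum, which is one reason communication frequency affects accuracy.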

Cite (APA)

Kolobkov, D., Mishra Sharma, S., Medvedev, A., Lebedev, M., Kosaretskiy, E., & Vakhitov, R. (2024). Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project. Frontiers in Big Data, 7. https://doi.org/10.3389/fdata.2024.1266031
