Abstract
Background: Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffers from underdetermination (p ≫ n). Millions of variants are present in each individual while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects.

Results: We demonstrate that when we provide enough training data and control the complexity of nonlinear models, a neural network outperforms additive approaches in whole exome sequencing-based inflammatory bowel disease case–control prediction. To do so, we propose a biologically meaningful sparsified neural network architecture, providing empirical evidence for positive and negative epistatic effects present in the inflammatory bowel disease pathogenesis.

Conclusions: In this paper, we show that underdetermination is likely a major driver for the apparent optimality of additive modeling in clinical genetics today.
Verplaetse, N., Passemiers, A., Arany, A., Moreau, Y., & Raimondi, D. (2023). Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease. Genome Biology, 24(1). https://doi.org/10.1186/s13059-023-03064-y