Genome-wide association studies (GWAS) have linked thousands of genetic variants to susceptibility to many common human diseases. However, the genetic basis of a disease is often heterogeneous, which poses a substantial challenge for GWAS. We propose a feature construction method that uses a genetic algorithm (GA) to recognize the heterogeneous risk effects of different groups of genetic variables. Multiple GA-based feature selection runs collect an ensemble of high-performing feature subsets. From this ensemble we generate a feature co-selection network, in which nodes represent genetic variables and edges represent their co-selection frequencies. A new synthetic feature, the community risk score (CRS), is created for each network community. The CRS quantifies the risk of a community of variables and allows for more effective heterogeneity analysis. We applied our method to two colorectal cancer GWAS datasets, one for training and the other for validation. We ran the GA-based feature selection on the training dataset, constructed the co-selection network, and created a CRS for each community in the network. Using the CRSs and clustering algorithms on the validation dataset, we identified three colorectal cancer subtypes. Functional enrichment analysis of our results further highlighted gastric cancer-related genes, tumor suppressors, and DNA methylation genes.
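The co-selection network described in the abstract can be illustrated with a minimal sketch. It assumes the ensemble is available as a list of feature subsets and that an edge weight is the fraction of subsets in which two variables appear together. The abstract does not specify the normalization, so that choice, the function name `co_selection_edges`, and the toy variable names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_selection_edges(subsets):
    """Map each variable pair to the fraction of subsets containing both.

    Assumption: co-selection frequency is normalized by the number of
    subsets in the ensemble; the source does not state the normalization.
    """
    counts = Counter()
    for subset in subsets:
        # Sort so each unordered pair has one canonical key.
        for u, v in combinations(sorted(set(subset)), 2):
            counts[(u, v)] += 1
    n = len(subsets)
    return {pair: c / n for pair, c in counts.items()}


# Toy ensemble of high-performing feature subsets (hypothetical names).
ensemble = [
    {"snp_a", "snp_b", "snp_c"},
    {"snp_a", "snp_b"},
    {"snp_b", "snp_c"},
    {"snp_a", "snp_b", "snp_d"},
]
edges = co_selection_edges(ensemble)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The resulting weighted edge list could then be passed to a community detection routine. Computing a CRS for each community is not sketched here because the abstract does not define the score.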
Sha, Z., Chen, Y., & Hu, T. (2022). Genetic heterogeneity analysis using genetic algorithm and network science. In GECCO 2022 Companion - Proceedings of the 2022 Genetic and Evolutionary Computation Conference (pp. 763–766). Association for Computing Machinery, Inc. https://doi.org/10.1145/3520304.3529027