Linking non-coding sequence variants identified in genome-wide association studies to the pathophysiology of complex diseases remains challenging because annotations are lacking in non-coding regions. To address this, we developed DIVAN, a novel feature selection and ensemble learning framework that identifies disease-specific risk variants by leveraging a comprehensive collection of genome-wide epigenomic profiles across cell types and factors, along with other static genomic features. DIVAN accurately and robustly recognizes non-coding disease-specific risk variants across multiple testing scenarios. Among all the features, histone marks, especially those associated with repressed chromatin, are often more informative than others.
Chen, L., Jin, P., & Qin, Z. S. (2016). DIVAN: Accurate identification of non-coding disease-specific risk variants using multi-omics profiles. Genome Biology, 17(1). https://doi.org/10.1186/s13059-016-1112-z