Abstract
State-of-the-art methods assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, yet with modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogenic non-coding variants associated with monogenic Mendelian diseases. In addition to interspecies conservation, a comprehensive set of recent and ongoing purifying selection signals in humans is explored, accounting for lineage-specific regulatory elements. Supervised learning using gradient tree boosting on such features achieves a high predictive performance and overcomes positional bias. NCBoost performs consistently across diverse learning and independent testing data sets and outperforms other existing reference methods.
Author supplied keywords
Cite
CITATION STYLE
Caron, B., Luo, Y., & Rausell, A. (2019). NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans. Genome Biology, 20(1). https://doi.org/10.1186/s13059-019-1634-2
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.