Determining the pathogenicity and functional impact (i.e. gain-of-function, GOF, or loss-of-function, LOF) of a variant is vital for unraveling the genetic-level mechanisms of human diseases. To provide a 'one-stop' framework for the accurate identification of the pathogenicity and functional impact of variants, we developed a two-stage deep-learning-based computational solution, termed VPatho, which was trained using a total of 9619 pathogenic GOF/LOF and 138 026 neutral variants curated from various databases. In total, 138 variant-level, 262 protein-level and 103 genome-level features were extracted to construct the VPatho models. VPatho consists of two stages: (i) a random under-sampling multi-scale residual neural network (ResNet) with a newly defined weighted-loss function (RUS-Wg-MSResNet), which predicts variants' pathogenicity on the gnomAD_NV + GOF/LOF dataset; and (ii) an XGBOD model, which predicts the functional impact of the given variants. Benchmarking experiments demonstrated that RUS-Wg-MSResNet achieved its highest prediction performance when the loss weights were calculated from the ratio of neutral to pathogenic variants. Independent tests showed that both RUS-Wg-MSResNet and XGBOD achieved outstanding performance. Moreover, when assessed using variants from the CAGI6 competition, RUS-Wg-MSResNet achieved superior performance compared to state-of-the-art predictors. The trained XGBOD models were further applied in a blind test to the whole LOF dataset downloaded from gnomAD, which identified 31 nonLOF variants that were previously labeled as LOF/uncertain variants. As an implementation of the developed approach, a VPatho webserver is publicly available at http://csbio.njust.edu.cn/bioinf/vpatho/ to facilitate community-wide efforts for profiling and prioritizing query variants with respect to their pathogenicity and functional impact.
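The abstract does not give the exact form of the weighted loss, so the following is only a minimal sketch of the two ideas named for stage (i): random under-sampling of the majority (neutral) class, and class weights derived from the neutral-to-pathogenic ratio. The function names, the `ratio` parameter and the use of a weighted binary cross-entropy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def random_under_sample(X, y, ratio=1.0, seed=0):
    """Keep every pathogenic row (label 1) and a random subset of neutral rows (label 0).

    `ratio` is a hypothetical knob: neutral rows kept per pathogenic row.
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    keep_neg = rng.choice(neg, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([pos, keep_neg]))
    return X[idx], y[idx]


def ratio_class_weights(y):
    """Assumed weighting: pathogenic class weighted by n_neutral / n_pathogenic."""
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == 0))
    return {0: 1.0, 1: n_neg / n_pos}


def weighted_bce(y_true, p_pred, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalised by the sum of the sample weights."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    w = np.where(y_true == 1, weights[1], weights[0])
    per_sample = -(y_true * np.log(p) + (1 - y_true) * np.log(1.0 - p))
    return float(np.sum(w * per_sample) / np.sum(w))


if __name__ == "__main__":
    # Toy data only: 8 neutral and 2 pathogenic variants with 2 made-up features.
    y = np.array([0] * 8 + [1] * 2)
    X = np.arange(20, dtype=float).reshape(10, 2)
    weights = ratio_class_weights(y)
    X_bal, y_bal = random_under_sample(X, y, ratio=1.0)
    print(weights, X_bal.shape, weighted_bce(y, np.full(10, 0.5), weights))
```

Under this assumed weighting, the training counts quoted above (138 026 neutral versus 9619 pathogenic variants) would give the pathogenic class a weight of roughly 14.3 before any under-sampling is applied.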
CITATION STYLE
Ge, F., Li, C., Iqbal, S., Muhammad, A., Li, F., Thafar, M. A., … Yu, D. J. (2023). VPatho: a deep learning-based two-stage approach for accurate prediction of gain-of-function and loss-of-function variants. Briefings in Bioinformatics, 24(1). https://doi.org/10.1093/bib/bbac535