ClinVar is a web platform that stores ∼789,000 genetic associations with complex diseases. A partial set of these cataloged genetic associations has challenged clinicians and geneticists, often lead-ing to conflicting interpretations or uncertain clinical impact significance. In this study, we addressed the (re)classification of genetic variants by AmazonForest, which is a random-forest-based pathogenic-ity metaprediction model that works by combining functional impact data from eight prediction tools. We evaluated the performance of representation learning algorithms such as autoencoders to propose a better strategy. All metaprediction models were trained with ClinVar data, and genetic variants were annotated with eight functional impact predictors cataloged with SnpEff/SnpSift. AmazonForest implements the best random forest model with a one hot data-encoding strategy, which shows an Area Under ROC Curve of ≥0.93. AmazonForest was employed for pathogenicity prediction of a set of ∼101,000 genetic variants of uncertain significance or conflict of interpretation. Our findings revealed ∼24,000 variants with high pathogenic probability (RFprob ≥ 0.9). In addition, we show results for Alzheimer’s Disease as a demonstration of its application in clinical interpretation of genetic variants in complex diseases. Lastly, AmazonForest is available as a web tool and R object that can be loaded to perform pathogenicity predictions.
CITATION STYLE
Palheta, H. G. A., Gonçalves, W. G., Brito, L. M., Dos Santos, A. R., Matsumoto, M. D. R., Ribeiro-Dos-santos, Â., & de Araújo, G. S. (2022). AmazonForest: In Silico Metaprediction of Pathogenic Variants. Biology, 11(4). https://doi.org/10.3390/biology11040538
Mendeley helps you to discover research relevant for your work.