A variant by any name: Quantifying annotation discordance across tools and clinical databases

Jennifer L. Yen; Sarah Garcia; Aldrin Montana; Jason Harris; Stephen Chervitz; Massimo Morra; John West; Richard Chen; Deanna M. Church

Journal Article, Open Access

Genome Medicine (2017) 9(1)

DOI: 10.1186/s13073-016-0396-7

Abstract

Background: Clinical genomic testing is dependent on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position to resources that rely on transcript- or protein-based descriptions.

Methods: We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor, and Variation Reporter) that generate transcript and protein-based variant nomenclature from genomic coordinates according to guidelines by the Human Genome Variation Society (HGVS). Our evaluation was based on transcript-controlled comparisons to a manually curated set of 126 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. We further evaluated the concordance between annotations generated by SnpEff and Variant Effect Predictor and those in major germline and cancer databases: ClinVar and COSMIC, respectively.

Results: We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Using our ground truth set of variants, constructed specifically to identify challenging events, accuracy was between 80 and 90% for coding changes and between 50 and 70% for protein changes, for 114 to 126 variants. Exact concordance for SNV syntax was over 99.5% between ClinVar and both Variant Effect Predictor and SnpEff, but less than 90% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88% and less than 15% for insertions. Across the tools and datasets, there was a wide range of different but equivalent expressions describing protein variants.

Conclusions: Our results reveal significant inconsistency in variant representation across tools and databases. While some of these syntax differences may be clear to a clinician, they can confound variant matching, an important step in variant classification. These results highlight the urgent need for the adoption of, and adherence to, uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care.
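
To illustrate how "different but equivalent expressions" can confound exact-string variant matching, here is a minimal Python sketch. It is not the method used in the study. It covers only one kind of syntax difference, one-letter versus three-letter amino acid codes in simple protein substitutions, and the function names `normalize_protein_sub` and `same_protein_sub` are hypothetical helpers introduced for this example. The variant p.V600E / p.Val600Glu is used purely as a familiar illustration and is not taken from the paper's test set.

```python
import re

# Standard one-letter to three-letter amino acid codes.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Matches only simple substitutions written with one-letter codes, e.g. p.V600E.
ONE_LETTER_SUB = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")


def normalize_protein_sub(desc):
    """Hypothetical helper: rewrite p.V600E-style strings as p.Val600Glu.

    Any description that is not a simple one-letter substitution
    (indels, frameshifts, already three-letter forms) is returned unchanged.
    """
    desc = desc.strip()
    m = ONE_LETTER_SUB.match(desc)
    if not m:
        return desc
    ref, pos, alt = m.groups()
    if ref not in AA3 or alt not in AA3:
        return desc
    return f"p.{AA3[ref]}{pos}{AA3[alt]}"


def same_protein_sub(a, b):
    """Compare two protein descriptions after the limited normalization above."""
    return normalize_protein_sub(a) == normalize_protein_sub(b)


# Exact string comparison treats equivalent descriptions as different.
print("p.V600E" == "p.Val600Glu")                  # False
# Comparison after normalization treats them as the same variant.
print(same_protein_sub("p.V600E", "p.Val600Glu"))  # True
```

This sketch handles only amino acid code style. Insertions and deletions, where the abstract reports the lowest concordance, can differ in ways that simple string rewriting does not resolve, so a real pipeline would need a dedicated HGVS-aware library and the underlying reference sequence.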

Citation (APA style)

Yen, J. L., Garcia, S., Montana, A., Harris, J., Chervitz, S., Morra, M., … Church, D. M. (2017). A variant by any name: Quantifying annotation discordance across tools and clinical databases. Genome Medicine, 9(1). https://doi.org/10.1186/s13073-016-0396-7
