Genome-wide SNP calling from genotyping by sequencing (GBS) data: A comparison of seven pipelines and two sequencing technologies

Abstract

Next-generation sequencing (NGS) has revolutionized plant and animal research in many ways, including new methods of high-throughput genotyping. Genotyping-by-sequencing (GBS) has been demonstrated to be a robust and cost-effective genotyping method capable of producing thousands to millions of SNPs across a wide range of species. Undoubtedly, the greatest barrier to its broader use is the challenge of data analysis. Herein we describe a comprehensive comparison of seven GBS bioinformatics pipelines developed to process raw GBS sequence data into SNP genotypes. We compared five pipelines requiring a reference genome (TASSEL-GBS v1 and v2, Stacks, IGST, and Fast-GBS) and two de novo pipelines that do not require a reference genome (UNEAK and Stacks). Using Illumina sequence data from a set of 24 re-sequenced soybean lines, we performed SNP calling with these pipelines and compared the GBS SNP calls with the re-sequencing data to assess their accuracy. Fewer SNPs were called without a reference genome (13k to 24k) than with one (25k to 54k), while accuracy was high (92.3 to 98.7%) for all but one pipeline (TASSEL-GBSv1, 76.1%). Among pipelines offering high accuracy (>95%), Fast-GBS called the greatest number of polymorphisms (close to 35,000 SNPs + indels) and yielded the highest accuracy (98.7%). Using Ion Torrent sequence data for the same 24 lines, we compared the performance of Fast-GBS with that of TASSEL-GBSv2. Fast-GBS again called more polymorphisms (25.8k vs 22.9k), and these proved more accurate (95.2% vs 91.1%). Typically, SNP catalogues called from the same sequencing data using different pipelines overlapped extensively (79-92% overlap). In contrast, overlap between SNP catalogues obtained using the same pipeline but different sequencing technologies was less extensive (∼50-70%).
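The abstract reports two kinds of figures: genotype accuracy against re-sequencing data and overlap between SNP catalogues. The sketch below shows one plausible way such figures could be computed. It is not the authors' code, and the abstract does not state the exact definitions used. The assumptions here are that accuracy is the share of non-missing GBS genotypes matching the re-sequencing genotype at loci present in both datasets, and that overlap is the share of one catalogue's loci also found in the other. The names `genotype_concordance`, `catalogue_overlap`, and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Locus = Tuple[str, int]                          # (chromosome, position)
Calls = Dict[Locus, Dict[str, Optional[str]]]    # locus -> sample -> genotype (None = missing)


def genotype_concordance(gbs: Calls, truth: Calls) -> float:
    """Fraction of non-missing GBS genotypes that match the truth set.

    Assumed definition: only loci and samples present and non-missing
    in both datasets are compared.
    """
    compared = 0
    matched = 0
    for locus, samples in gbs.items():
        truth_samples = truth.get(locus)
        if truth_samples is None:
            continue  # locus absent from the truth set
        for sample, genotype in samples.items():
            truth_genotype = truth_samples.get(sample)
            if genotype is None or truth_genotype is None:
                continue  # skip missing calls on either side
            compared += 1
            if genotype == truth_genotype:
                matched += 1
    if compared == 0:
        raise ValueError("no comparable genotypes")
    return matched / compared


def catalogue_overlap(a: Iterable[Locus], b: Iterable[Locus]) -> float:
    """Share of loci in catalogue `a` that also appear in catalogue `b`."""
    loci_a, loci_b = set(a), set(b)
    if not loci_a:
        raise ValueError("catalogue `a` is empty")
    return len(loci_a & loci_b) / len(loci_a)


# Toy data, invented for illustration only.
gbs_calls: Calls = {
    ("Gm01", 100): {"line1": "A/A", "line2": "G/G"},
    ("Gm01", 250): {"line1": "C/C", "line2": None},
    ("Gm02", 40): {"line1": "T/T", "line2": "T/T"},
}
reseq_calls: Calls = {
    ("Gm01", 100): {"line1": "A/A", "line2": "G/G"},
    ("Gm01", 250): {"line1": "C/T", "line2": "C/C"},
}

accuracy = genotype_concordance(gbs_calls, reseq_calls)
overlap = catalogue_overlap(gbs_calls, reseq_calls)
print(f"accuracy={accuracy:.3f} overlap={overlap:.3f}")
```

Note that this overlap measure is asymmetric: `catalogue_overlap(a, b)` and `catalogue_overlap(b, a)` differ whenever the catalogues differ in size, so a comparison should state which catalogue is the denominator.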

Figures

  • Table 1. Number of SNPs and indels detected among 24 soybean lines using seven different bioinformatics pipelines on Illumina reads. The time and amount of memory needed to run each pipeline are also provided.
  • Table 2. Accuracy of GBS SNP data derived from the Illumina platform using different bioinformatics pipelines.
  • Table 3. Degree of overlap among SNP loci called using Fast-GBS and six other bioinformatics pipelines.
  • Fig 1. Venn diagram representing the degree of overlap among SNP loci called using seven bioinformatics pipelines. The percentages indicate the estimated accuracy for all groups of SNPs (unique or shared).
  • Fig 2. Systematic approach used to investigate the possible causes of unique inaccurate SNP calls.
  • Table 4. Number and characteristics of unique inaccurate SNPs called by different pipelines.
  • Table 5. Number of SNPs and indels detected among 24 soybean lines using Ion Torrent reads and two different bioinformatics pipelines.
  • Table 6. Accuracy of SNP data derived using Ion Torrent reads and two different bioinformatics pipelines.

Citation (APA)

Torkamaneh, D., Laroche, J., & Belzile, F. (2016). Genome-wide SNP calling from genotyping by sequencing (GBS) data: A comparison of seven pipelines and two sequencing technologies. PLoS ONE, 11(8). https://doi.org/10.1371/journal.pone.0161333
