WhopGenome: High-speed access to whole-genome variation and sequence data in R

Abstract

The statistical programming language R has become a de facto standard for the analysis of many types of biological data, and is well suited for the rapid development of new algorithms. However, variant call data from population-scale resequencing projects are typically too large to be read and processed efficiently with R's built-in I/O capabilities. WhopGenome can efficiently read whole-genome variation data stored in the widely used variant call format (VCF) into several R data types. VCF files can be accessed either on local hard drives or on remote servers. WhopGenome can associate variants with annotations, such as those available from the UCSC genome browser, and can accelerate reading by filtering loci according to user-defined criteria. WhopGenome can also read other Tabix-indexed files, and it can create indices that allow fast selective access to FASTA-formatted sequence files.
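The abstract does not describe WhopGenome's own FASTA index format. The following is a rough illustration of how an index enables fast selective access to a FASTA file. This Python sketch builds a per-sequence index modelled on the samtools faidx `.fai` layout (sequence name, length, byte offset of the first base, bases per line, bytes per line). It then uses that index to seek directly to a region. The names `build_fasta_index` and `fetch` are hypothetical and are not part of WhopGenome's API. The sketch also assumes that every sequence uses a constant line length, with only its last line allowed to be shorter, and that the file contains no blank lines.

```python
import io

def build_fasta_index(handle):
    """Map each sequence name to (length, offset, line_bases, line_width).

    Hypothetical helper modelled on the samtools faidx .fai fields.
    `handle` must be a binary stream; lines are assumed to have a fixed width.
    """
    index = {}
    name = None
    length = offset = line_bases = line_width = 0
    pos = 0  # byte position of the start of the current line
    for raw in handle:  # binary iteration keeps the trailing newline
        line_len = len(raw)
        if raw.startswith(b">"):
            if name is not None:
                index[name] = (length, offset, line_bases, line_width)
            name = raw[1:].split()[0].decode()  # name = first word of header
            length = line_bases = line_width = 0
            offset = pos + line_len  # first base follows the header line
        else:
            bases = len(raw.rstrip(b"\r\n"))
            if line_bases == 0:  # first sequence line defines the layout
                line_bases, line_width = bases, line_len
            length += bases
        pos += line_len
    if name is not None:
        index[name] = (length, offset, line_bases, line_width)
    return index

def fetch(handle, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) of sequence `name`."""
    length, offset, line_bases, line_width = index[name]
    end = min(end, length)
    if start >= end:
        return ""

    def byte_pos(i):
        # Full lines before base i, plus the column of base i within its line.
        return offset + (i // line_bases) * line_width + i % line_bases

    first, last = byte_pos(start), byte_pos(end - 1)
    handle.seek(first)  # jump straight to the region, no scan needed
    chunk = handle.read(last - first + 1)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()

if __name__ == "__main__":
    fh = io.BytesIO(b">chr1 test\nACGTACGTAC\nGTACGTACGT\nAAAA\n>chr2\nTTTTGGGG\nCC\n")
    idx = build_fasta_index(fh)
    print(idx["chr1"])                   # (24, 11, 10, 11)
    print(fetch(fh, idx, "chr1", 8, 13))  # ACGTA (spans a line break)
```

Each line holds a fixed number of bases, so the byte position of any base follows arithmetically from the index entry. A region can therefore be read with a single seek rather than by scanning the file from the start.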

Citation (APA)

Wittelsbürger, U., Pfeifer, B., & Lercher, M. J. (2015). WhopGenome: High-speed access to whole-genome variation and sequence data in R. Bioinformatics, 31(3), 413–415. https://doi.org/10.1093/bioinformatics/btu636
