Parallel R - Data Analysis in the Distributed World

Q. Ethan McCallum; Stephen Weston

Book


(2011), 122


Abstract

It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you’re in the business of crunching Big Data. This concise book introduces several strategies for using R to analyze large datasets. You’ll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools: how to find them, how to use them, when they work well, and when they don’t. With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs. You can also offload work to multiple machines to get past R’s memory barrier.

- Snow: works well in a traditional cluster environment
- Multicore: popular for multiprocessor and multicore computers
- Parallel: part of the upcoming R 2.14.0 release
- R+Hadoop: provides low-level access to a popular form of cluster computing
- RHIPE: uses Hadoop’s power with R’s language and interactive shell
- Segue: lets you use Elastic MapReduce as a backend for lapply-style operations

Cite (APA)

McCallum, Q. E., & Weston, S. (2011). Parallel R - Data Analysis in the Distributed World. O’Reilly Media. Retrieved from http://shop.oreilly.com/product/0636920021421.do
