Parallel R - Data Analysis in the Distributed World

  • McCallum Q
  • Weston S

Abstract

It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You’ll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don’t. With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or by offloading work to multiple machines to get around R’s memory barrier.

  • Snow: works well in a traditional cluster environment
  • Multicore: popular for multiprocessor and multicore computers
  • Parallel: part of the upcoming R 2.14.0 release
  • R+Hadoop: provides low-level access to a popular form of cluster computing
  • RHIPE: uses Hadoop’s power with R’s language and interactive shell
  • Segue: lets you use Elastic MapReduce as a backend for lapply-style operations
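The core idea in the abstract, replacing a serial lapply() call with a parallel equivalent, can be sketched with the parallel package it mentions. This is a minimal illustration, not code from the book: the function square() and the vector inputs are hypothetical stand-ins for real per-item work, and the choice of two workers is an arbitrary assumption.

```r
# Serial lapply() versus two parallel equivalents from the "parallel"
# package, which ships with R from version 2.14.0 onward.
library(parallel)

# Hypothetical stand-in for real per-item work.
square <- function(x) {
  x^2
}

inputs <- 1:8

# Serial baseline: one R process, one core.
serial <- lapply(inputs, square)

# Fork-based (the Multicore approach): mclapply() forks worker processes.
# Forking is unavailable on Windows, where mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
forked <- mclapply(inputs, square, mc.cores = n_cores)

# Socket cluster (the Snow approach): starts separate R worker processes,
# ships the function and data to them, and collects the results in order.
cl <- makeCluster(2)
clustered <- parLapply(cl, inputs, square)
stopCluster(cl)
```

Because mclapply() depends on forking, it only runs in parallel on Unix-alikes. The socket cluster created by makeCluster() works on every platform and can also be given a vector of host names to spread workers across machines.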

Citation (APA)

McCallum, Q. E., & Weston, S. (2011). Parallel R: Data analysis in the distributed world. O’Reilly Media. Retrieved from http://shop.oreilly.com/product/0636920021421.do
