We discuss three principles learned from experience with the National Scalable Cluster Project. First, storing, managing, and mining massive data require systems that exploit parallelism; this can be achieved with shared-nothing clusters and careful attention to I/O paths. Second, exploiting data parallelism at the file and record level maps data-intensive problems efficiently onto clusters and is particularly well suited to data mining. Finally, the repetitive nature of data mining demands special attention to data layout on the hardware and to software access patterns, while maintaining a storage schema easily derived from the legacy form of the data.
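The file- and record-level data parallelism described above can be illustrated with a minimal sketch (not the project's actual software): independent workers each mine one file-sized partition with no shared state, mirroring a shared-nothing cluster, and their partial results are merged afterward. The function names and the toy data are assumptions for illustration only.

```python
from multiprocessing import Pool

def mine_partition(records):
    # Hypothetical per-partition mining step: each worker scans its own
    # partition independently (no shared state) and returns partial counts.
    counts = {}
    for r in records:
        counts[r] = counts.get(r, 0) + 1
    return counts

def merge(partials):
    # Combine the per-partition partial results into a global answer.
    total = {}
    for p in partials:
        for key, count in p.items():
            total[key] = total.get(key, 0) + count
    return total

if __name__ == "__main__":
    # Stand-in for file partitions laid out across cluster nodes.
    partitions = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"]]
    with Pool(3) as pool:
        partials = pool.map(mine_partition, partitions)
    print(merge(partials))  # {'a': 3, 'b': 2, 'c': 3}
```

Because each partition is processed independently, the same pattern scales from processes on one node to many nodes, provided the data layout keeps each worker's I/O path local.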
Citation
Grossman, R., & Hollebeek, R. (2002). The National Scalable Cluster Project: Three Lessons about High Performance Data Mining and Data Intensive Computing (pp. 853–874). https://doi.org/10.1007/978-1-4615-0005-6_23