Abstract
Google and its competitors have created a new class of large-scale computer systems to support Internet search. These “Data-Intensive Super Computing” (DISC) systems differ from conventional supercomputers in their focus on data: they acquire and maintain continually changing data sets, in addition to performing large-scale computations over the data. With the massive amounts of data arising from such diverse sources as telescope imagery, medical records, online transaction records, and web pages, DISC systems have the potential to achieve major advances in science, health care, business efficiencies, and information access. DISC opens up many important research topics in system design, resource management, programming models, parallel algorithms, and applications. By engaging the academic research community in these issues, we can more systematically, and in a more open forum, explore fundamental aspects of a societally important style of computing.

When a teenage boy wants to find information about his idol by using Google with the search query “Britney Spears,” he unleashes the power of several hundred processors operating on a data set of over 200 terabytes. Why, then, can’t a scientist seeking a cure for cancer invoke large amounts of computation over a terabyte-sized database of DNA microarray data at the click of a button? Recent papers on parallel programming by researchers at Google [13] and Microsoft [19] present the results of using up to 1800 processors to perform computations accessing up to 10 terabytes of data. How can university researchers demonstrate the credibility of their work without having comparable computing facilities available?
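The parallel programming work cited as [13] is Google’s MapReduce model, one of the programming models the abstract alludes to. As a minimal single-process sketch (not part of the original abstract; real DISC systems distribute the map, shuffle, and reduce phases across thousands of machines), a word-count computation in that style might look like:

from itertools import groupby

# Toy, in-memory simulation of the MapReduce abstraction from [13].
# Only the programming model is shown; distribution, fault tolerance,
# and disk-based shuffling are what DISC-scale systems add on top.

def map_fn(doc_id, text):
    # Emit a (word, 1) pair for every word in the document.
    for word in text.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Sum the partial counts produced for one word.
    return (word, sum(counts))

def mapreduce(documents, map_fn, reduce_fn):
    # Map phase: apply map_fn to every input record.
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_fn(doc_id, text))
    # Shuffle phase: group intermediate pairs by key.
    intermediate.sort(key=lambda kv: kv[0])
    grouped = groupby(intermediate, key=lambda kv: kv[0])
    # Reduce phase: apply reduce_fn to each key and its values.
    return [reduce_fn(k, (v for _, v in pairs)) for k, pairs in grouped]

docs = {"d1": "data intensive super computing",
        "d2": "data intensive computing at scale"}
print(mapreduce(docs, map_fn, reduce_fn))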
Citation
Bryant, R. E. (2007). Data-intensive supercomputing: The case for DISC. SC, (CMU-CS-07-128), 1–22. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.9328&rep=rep1&type=pdf