Scheduling high performance data mining tasks on a data grid environment

Abstract

Increasingly, the datasets used for data mining are becoming huge and physically distributed. Since the distributed knowledge discovery process is both data- and computation-intensive, the Grid is a natural platform for deploying a high-performance data mining service. The focus of this paper is on the core services of such a Grid infrastructure. In particular, we concentrate on the design and implementation of a specialized broker that is aware of data source locations and of the resource needs of data mining tasks. Allocation and scheduling decisions are made on the basis of performance cost metrics and models that exploit knowledge about previous executions and use sampling to acquire estimates of execution behavior.

Cite

Orlando, S., Palmerini, P., Perego, R., & Silvestri, F. (2002). Scheduling high performance data mining tasks on a data grid environment. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2400, pp. 375–384). Springer Verlag. https://doi.org/10.1007/3-540-45706-2_49
