Large-scale applications that require executing very large numbers of tasks are only feasible through parallelism. In this work we present a system that automatically handles large numbers of experiments and data in the context of machine learning. Our system controls all experiments, including re-submission of failed jobs, and relies on available resource managers to spawn jobs across pools of machines. Our results show that we can manage a very large number of experiments, using a reasonable amount of idle CPU cycles, with very little user intervention. © Springer-Verlag 2003.
Dutra, I., Page, D., Costa, V. S., Shavlik, J., & Waddell, M. (2004). Toward automatic management of embarrassingly parallel applications. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2790, 509–516. https://doi.org/10.1007/978-3-540-45209-6_73