Pilot factory - A Condor-based system for scalable Pilot Job generation in the Panda WMS framework


This article is free to access.

Abstract

The Panda Workload Management System is designed around the concept of the Pilot Job - a "smart wrapper" for the payload executable that can probe the environment on the remote worker node before pulling down the payload from the server and executing it. This design allows for improved logging and monitoring capabilities as well as flexibility in workload management. In a Grid environment (such as the Open Science Grid), Panda Pilot Jobs are submitted to remote sites via mechanisms that ultimately rely on Condor-G. As our experience has shown, when a large number of Panda jobs are simultaneously routed to a particular remote site, the increased load that Pilot Job submission places on the head node of the cluster may lead to an overall lack of scalability. We have developed a Condor-inspired solution to this problem that uses a schedd-based glidein, whose mission is to redirect pilots to the native batch system. Once a glidein schedd is installed and running, it can be used in exactly the same way as a local schedd; from the user's perspective, Pilots submitted this way are therefore quite similar to jobs submitted to the local Condor pool. © 2010 IOP Publishing Ltd.
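The Pilot Job idea described in the abstract (probe the worker node first, then pull a payload and execute it) can be illustrated with a minimal Python sketch. This is not the Panda pilot code: the function names `probe_environment`, `run_pilot`, and `demo_server`, the free-disk check, and the shape of the report are all assumptions made for illustration, and the "server" is a hypothetical in-process callable rather than a real Panda endpoint.

```python
import os
import platform
import shutil
import subprocess
import sys


def probe_environment(min_free_bytes=0):
    """Collect basic facts about the worker node before requesting work.

    The checks here (hostname, Python version, free disk space) are
    illustrative assumptions, not the checks a real Panda pilot performs.
    """
    free = shutil.disk_usage(os.getcwd()).free
    return {
        "hostname": platform.node(),
        "python": platform.python_version(),
        "free_bytes": free,
        "ok": free >= min_free_bytes,
    }


def run_pilot(fetch_payload, min_free_bytes=0):
    """Probe the node, then pull one payload and execute it.

    fetch_payload is a hypothetical stand-in for the request to the
    server: it receives the probe report and returns an argv list,
    or None when no work is available.
    """
    report = probe_environment(min_free_bytes)
    if not report["ok"]:
        # The node failed the probe, so no payload is requested at all.
        return {"status": "rejected", "report": report}

    command = fetch_payload(report)
    if command is None:
        return {"status": "idle", "report": report}

    # Run the payload and capture its output for logging and monitoring.
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "status": "finished" if result.returncode == 0 else "failed",
        "returncode": result.returncode,
        "stdout": result.stdout,
        "report": report,
    }


def demo_server(report):
    """Hypothetical in-process 'server' that always hands out one trivial job."""
    return [sys.executable, "-c", "print('payload ran')"]


if __name__ == "__main__":
    outcome = run_pilot(demo_server)
    print(outcome["status"], outcome["stdout"].strip())
```

Because the payload is requested only after the probe succeeds, a node that fails the check never receives work. This late binding is what lets the wrapper report on the environment before any payload is committed to it.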

Cite

Chiu, P. H., & Potekhin, M. (2010). Pilot factory - A Condor-based system for scalable Pilot Job generation in the Panda WMS framework. In Journal of Physics: Conference Series (Vol. 219). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/219/6/062041
