Policy Adaptation for Parallel Job Scheduling
Abstract: The parallel job scheduler is a key part of high-performance computing resource management software. This scheduler uses a scheduling policy to help decide which jobs to start first. The choice of policy has an important impact on user-centric metrics, such as the average waiting time or average job slowdown.
In this presentation, we will show how to make this scheduling policy adaptive to the workload. Through an experimental study, we will answer the following questions:
- Is there a best scheduling policy in general?
- Can a good policy be selected automatically?
- Is simulation of the system helpful, when applicable?
- Can a policy be learned on the fly, in an online setting, and how?
We will describe a successful approach to policy adaptivity based on multi-armed bandits.
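
To make the bandit idea concrete, here is a minimal sketch in which each scheduling policy is treated as one arm and the scheduler picks a policy for each scheduling period. It is an illustration under stated assumptions, not the method from the presentation: the abstract does not say which bandit algorithm is used, so UCB1 is swapped in as a standard choice; the policy names (FCFS, SJF, LJF) and the reward values in [0, 1] are hypothetical stand-ins for a normalized user-centric metric, such as a rescaled negative average waiting time.

```python
import math


class UCB1PolicySelector:
    """UCB1 bandit where each arm is a scheduling policy (illustrative only)."""

    def __init__(self, policies, exploration=2.0):
        self.policies = list(policies)
        self.exploration = exploration              # constant inside the confidence bonus
        self.counts = {p: 0 for p in self.policies}   # times each policy was used
        self.means = {p: 0.0 for p in self.policies}  # running mean reward per policy
        self.total = 0                              # total number of updates

    def select(self):
        # Use every policy once before relying on confidence bounds.
        for p in self.policies:
            if self.counts[p] == 0:
                return p

        def score(p):
            bonus = math.sqrt(
                self.exploration * math.log(self.total) / self.counts[p]
            )
            return self.means[p] + bonus

        return max(self.policies, key=score)

    def update(self, policy, reward):
        # Incremental mean update; reward is assumed to lie in [0, 1].
        self.counts[policy] += 1
        self.total += 1
        n = self.counts[policy]
        self.means[policy] += (reward - self.means[policy]) / n


def run_demo(rounds=200):
    # Hypothetical, fixed rewards: in a real system the reward would be
    # measured from the jobs scheduled during the period.
    rewards = {"FCFS": 0.2, "SJF": 0.8, "LJF": 0.1}
    selector = UCB1PolicySelector(rewards)
    for _ in range(rounds):
        policy = selector.select()
        selector.update(policy, rewards[policy])
    return selector


if __name__ == "__main__":
    demo = run_demo()
    print(demo.counts)
```

In this toy setting the selector ends up using the highest-reward policy most often while still occasionally trying the others. With a real workload the rewards are noisy and may drift over time, which is where the choice of bandit algorithm and reward definition matters.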