In the 10 years since the Quest high-performance computing cluster was first introduced, usage of computational services has continued to grow and expand across Northwestern, including more research areas with diverse computing needs. To keep up with these evolving demands on computing infrastructure, Quest needed to change.
After careful consideration, the teams in Northwestern IT that support Quest decided to move to a “fair share” scheduling policy. With fair share, as you run more jobs on the cluster, your priority to run jobs is scaled down and then gradually recovers over time. This new scheduling policy enables us to remove hard limits on compute hours for basic (free) allocations, meaning researchers are not working from a fixed pool of resources and cannot run out of compute hours. The fair share policy also ensures equitable usage across researchers.
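For readers curious how priority can be "scaled down" and then "recover," here is a minimal sketch in Python. It assumes the classic Slurm-style fair-share formula (factor = 2^(-usage/shares)) with usage that decays by a half-life. The 7-day half-life, the 10% share, and the function names are illustrative assumptions, not Quest's actual configuration.

```python
def decayed_usage(usage: float, elapsed_days: float, half_life_days: float = 7.0) -> float:
    """Past usage counts for less over time: it halves every half_life_days.

    The 7-day default is a hypothetical value chosen for illustration.
    """
    return usage * 0.5 ** (elapsed_days / half_life_days)


def fair_share_factor(normalized_usage: float, normalized_shares: float) -> float:
    """Classic fair-share factor between 0 and 1 (higher means higher priority).

    normalized_usage: fraction of recent cluster usage attributed to the account.
    normalized_shares: fraction of the cluster the account is entitled to.
    """
    return 2.0 ** (-normalized_usage / normalized_shares)


if __name__ == "__main__":
    shares = 0.10        # hypothetical: account is entitled to 10% of the cluster
    heavy_usage = 0.40   # hypothetical: account recently consumed 40% of usage

    # Right after a burst of heavy usage, priority is scaled down.
    print(fair_share_factor(heavy_usage, shares))

    # Two half-lives later, with no new jobs, usage has decayed and priority recovers.
    later_usage = decayed_usage(heavy_usage, elapsed_days=14)
    print(fair_share_factor(later_usage, shares))
```

In this sketch an account that has used nothing has a factor of 1.0, and an account that has used exactly its share has a factor of 0.5. No account is ever cut off; heavy users simply wait longer in the queue until their recorded usage decays.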
The scheduler on Quest was also changed from Moab to Slurm on May 1, 2019. I’m excited about this change, as Slurm is used by a majority of academic institutions, with good reason. We decided to move to Slurm after a thorough review of the available schedulers. There are many benefits to Slurm, including the following:
- Since Slurm is more widely used, transitions for our faculty, students, and postdocs to and from Northwestern should be easier.
- Slurm provides improved features and flexibility that better position Quest to meet the growing and changing needs of Northwestern’s researchers, including flexibility in allocation models, tools to allow more jobs to land on the cluster, fewer lost jobs, and even cloud integration.
- There will be less downtime and fewer scheduler restarts, since Slurm allows us to make live configuration changes.
- Improved resiliency, reliability, and responsiveness to better support the thousands of jobs submitted.
- Support available from a wide community and experts, such as SchedMD.
While the moves to fair share and Slurm introduced several changes for both the researchers using Quest and those who support it, the result is that we are better positioned to improve and sustain the productivity of the Northwestern research community.
I am grateful for the team of Northwestern IT staff who worked diligently on this project and ensured the smoothest possible transition to the new scheduler through new documentation, numerous trainings, and thoughtful planning. Our Northwestern research community also actively participated in preparations for this change, providing us with initial feedback during the pilot cluster phase and using the Slurm testing cluster in impressive numbers to prepare for the changes.
Thank you to our research community and to the dedicated Northwestern IT staff who made this change a success! I am looking forward to providing our researchers with better tools to do their research in the years to come.