Non-exclusive Scheduling in Torque/Maui

Torque (PBS), the cluster resource manager, defaults to allocating any single compute server to only one job at a time. This makes sense most of the time, especially for parallel jobs which use MPI or another message-passing library to handle communication between processes on different servers. If a server were shared by multiple jobs, contention for the network card could slow all of them down significantly. However, when the jobs are single-threaded or embarrassingly parallel, exclusive allocation can waste a lot of resources: for example, a single-threaded job tying up an eight-core server.

Torque, and the associated scheduler Maui, can be configured to allow non-exclusive allocation so that single-threaded jobs can share a compute node. But while it’s pretty simple to accomplish (two config settings), it’s very hard to find any documentation on how to do it, as I discovered last week. So I’ll share my research.

First, make sure that the nodes file in TORQUE_HOME/server_priv/ specifies the number of virtual processors on each node. For example:


node1 np=8
node2 np=8
node3 np=4

np can be the number of physical cores on the node, but doesn’t have to be. Second, make sure that your MAUI_HOME/maui.cfg includes the following line:

NODEACCESSPOLICY SHARED

That’s it! One thing to note, however, is that Torque’s allocation policy always starts at the first available slot. So if you submit a non-exclusive job with 2 independent threads, and there are 4 servers available, both threads will still run on the same server. This differs from Grid Engine, which defaults to non-exclusive scheduling (and it’s a pain to make it exclusive!) and also defaults to a round-robin policy that spreads slots across servers.
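
To make the difference between the two placement behaviours concrete, here is a small toy simulation in Python. It is only a sketch of the idea, not code from Torque, Maui, or Grid Engine: the function names (parse_nodes, first_fit, round_robin) are my own invention, it reuses the three-node example file from above, and it assumes a node with no np= entry has a single slot.

```python
# Toy model of slot placement. Hypothetical helper names; this does not
# call or reproduce any Torque, Maui, or Grid Engine code.

def parse_nodes(text):
    """Parse lines like 'node1 np=8' into an ordered list of (name, slots)."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        slots = 1  # assumption: one slot when np= is not given
        for field in fields[1:]:
            if field.startswith("np="):
                slots = int(field[3:])
        nodes.append((name, slots))
    return nodes


def first_fit(nodes, tasks):
    """Fill every free slot on a node, in file order, before moving on."""
    placement = []
    for name, slots in nodes:
        while slots > 0 and len(placement) < tasks:
            placement.append(name)
            slots -= 1
    return placement


def round_robin(nodes, tasks):
    """Hand out one slot per node per pass, cycling through the nodes."""
    free = {name: slots for name, slots in nodes}
    placement = []
    while len(placement) < tasks and any(free.values()):
        for name, _ in nodes:
            if len(placement) == tasks:
                break
            if free[name] > 0:
                placement.append(name)
                free[name] -= 1
    return placement


NODES_FILE = """\
node1 np=8
node2 np=8
node3 np=4
"""

nodes = parse_nodes(NODES_FILE)
print(first_fit(nodes, 2))    # ['node1', 'node1']
print(round_robin(nodes, 2))  # ['node1', 'node2']
```

With two tasks, the first-fit model packs both onto node1, which matches the behaviour described above, while the round-robin model spreads them over node1 and node2.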

Trivial, I know, but you would not believe how hard it was to figure out. This is why documentation is important!

Questions, comments, interesting anecdotes? Tweet to me at @ajdecon, or send me an email at ajdecon@ajdecon.org.