Administration tips and tricks
Revision as of 16:21, 1 October 2019
Mutual dependencies of services
- Problem
After a reboot or a power cycle/failure, the compute nodes' local scheduler daemon is started too early: the global filesystem is not ready yet, and the first job on those nodes fails.
When the local scheduler daemon starts, it will most likely report the node to its master daemon as "ready to receive jobs". If the mounts of the remote filesystems have been initiated but have not finished yet, the first job(s) will fail because of missing directories and files.
You could now write node-local checker scripts that try to read from or write to the mount points, with all the bells and whistles, for example:
timeout ... touch /mount/point/tmp/$(uname -n).checker
Or you could write fine-grained systemd dependencies (with PathExists= or DirectoryNotEmpty=).
All of these will inevitably fail if the shared filesystem takes longer than expected to become operational.
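For illustration only, a node-local checker of the kind described above might look like the following minimal sketch. The function name check_shared_fs, the default mount point /mount/point and the 10-second limit are hypothetical; the sketch assumes timeout(1) from GNU coreutils. The fixed limit is exactly its weak spot: if the filesystem needs longer, the check simply reports failure.

```shell
# Hypothetical node-local checker (the approach discussed above).
# Usage: check_shared_fs [MOUNTPOINT]
# Returns 0 if a probe file can be created and removed under
# MOUNTPOINT/tmp within 10 seconds, 1 otherwise.
check_shared_fs() {
    cf_mnt=${1:-/mount/point}              # assumed default mount point
    cf_probe="$cf_mnt/tmp/$(uname -n).checker"
    # timeout(1) kills touch if the mount hangs instead of failing fast;
    # a missing directory makes touch fail immediately.
    timeout 10 touch "$cf_probe" 2>/dev/null || return 1
    # Clean up the probe file; the function returns rm's exit status.
    rm -f "$cf_probe"
}
```

A boot-time wrapper could call it as check_shared_fs /mount/point && systemctl start slurmd, which still only works if the filesystem becomes usable within the chosen limit.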
- Suggestion
Try to "turn the tables": check whether your shared filesystem supports any kind of "Now I am really ready and operational" callback or signal. Then let your shared filesystem start your local scheduler daemon once everything is ready.
In the case of GPFS, you can define "user callbacks", which are triggered locally on each node at certain events. Creating such a callback (using systemd and slurmd as an example):
mmaddcallback YourNameOfCB --command /bin/systemctl --parms "start slurmd" --event startup -N all,my,compute,node,classes,as,defined,in,GPFS
The event startup is in fact GPFS's "local full readiness" state. The callback will thus be called on each node only after that node has completed all of its GPFS joining and mounting steps, in effect running
/bin/systemctl start slurmd
On your nodes, simply disable the systemd unit of your local scheduler's daemon:
systemctl disable slurmd
and watch the next reboot bring up GPFS in an orderly fashion, followed by slurmd.