# Setup SLURM Server
Run every step in this section on the head node.
Switch to the head node if you are not already on it.
Install munge.
yum install -y munge munge-libs perl-Switch numactl
Install the SLURM packages.
yum install -y flight-slurm flight-slurm-slurmctld flight-slurm-devel flight-slurm-perlapi flight-slurm-torque flight-slurm-slurmd flight-slurm-example-configs flight-slurm-libpmi
Open the file /opt/flight/opt/slurm/etc/slurm.conf and add this information:
ControlMachine=chead1
SlurmUser=nobody
SlurmctldPort=6827
SlurmdPort=6828
AuthType=auth/munge
StateSaveLocation=/opt/flight/opt/slurm/var/spool/slurm.state
SlurmdSpoolDir=/opt/flight/opt/slurm/var/spool/slurmd.spool
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/opt/flight/opt/slurm/var/run/slurmctld.pid
ClusterName=mycluster1
SlurmdPidFile=/opt/flight/opt/slurm/var/run/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=2
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_Memory
SlurmctldDebug=3
SlurmctldLogFile=/opt/flight/opt/slurm/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/opt/flight/opt/slurm/var/log/slurmd.log
JobCompType=jobcomp/none
NodeName=cnode[01-02]
PartitionName=all Nodes=ALL Default=YES MaxTime=UNLIMITED
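To confirm that the options were saved, the file can be checked for the expected keys. The following is a minimal sketch, not part of the original procedure: missing_slurm_keys is a hypothetical helper name, and the match is case-insensitive on the assumption that slurm.conf option names are not case-sensitive.

```shell
# Hypothetical helper: print each requested key that has no "Key=" line
# in the given Slurm configuration file. Prints nothing if all are present.
missing_slurm_keys() {
    conf="$1"
    shift
    for key in "$@"; do
        # Match the key at the start of a line, ignoring leading whitespace and case.
        grep -qiE "^[[:space:]]*${key}=" "$conf" || echo "$key"
    done
}

# Example call on the head node (path taken from the step above):
# missing_slurm_keys /opt/flight/opt/slurm/etc/slurm.conf \
#     ControlMachine ClusterName AuthType NodeName PartitionName
```

An empty result means every listed key appears in the file; it does not validate the values.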
Create directories for SLURM.
mkdir -p /opt/flight/opt/slurm/var/{log,run,spool/slurm.state}
Set the owner of the directories.
chown -R nobody: /opt/flight/opt/slurm/var/{log,run,spool}
Generate a random 64-character alphanumeric string to use as the munge key.
tr -dc A-Za-z0-9 </dev/urandom | head -c 64 ; echo ''
Copy the 64-character string from the terminal, open the file /etc/munge/munge.key in an editor, and paste it in.
Set the owner of the munge key.
chown munge: /etc/munge/munge.key
Set permissions on the munge key.
chmod 400 /etc/munge/munge.key
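As an alternative to copying and pasting, the key can be written straight to the file. This is a minimal sketch under stated assumptions, not part of the original procedure: write_munge_key is a hypothetical helper name, and LC_ALL=C is added on the assumption that tr should treat the random input as plain bytes. It keeps the trailing newline that the echo in the generation command produces, so the file is 65 bytes, matching the size shown in the Testing section.

```shell
# Hypothetical helper: write 64 random alphanumeric characters plus a
# trailing newline to the given path.
write_munge_key() {
    key_file="$1"
    (
        # If the file is newly created, make it readable by its owner only.
        umask 077
        { LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 64; echo; } >"$key_file"
    )
}

# Example call on the head node, run as root:
# write_munge_key /etc/munge/munge.key
```

After calling it, the chown and chmod commands above still need to be run to give the key its final owner and permissions.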
Start and enable munge and SLURM.
systemctl start munge
systemctl enable munge
systemctl start flight-slurmctld
systemctl enable flight-slurmctld
Now that the SLURM server is set up, set up the SLURM clients on the other nodes.
# Testing
If all was successful, then the following should be the case on the head node:
The command systemctl status munge shows the service as active with no errors.
The command systemctl status flight-slurmctld shows the service as active with no errors.
The file /opt/flight/opt/slurm/etc/slurm.conf has all the necessary options set. An example file is given in the instructions.
The /opt/flight/opt/slurm/var directory exists, and contains these three directories:
[root@chead1 (mycluster1) ~]# ls /opt/flight/opt/slurm/var/
log run spool
The /opt/flight/opt/slurm/var directory has these permissions:
[root@chead1 (mycluster1) ~]# ls -l /opt/flight/opt/slurm/var/
total 0
drwxr-xr-x. 2 nobody nobody 27 Sep 20 14:22 log
drwxr-xr-x. 2 nobody nobody 27 Sep 20 14:22 run
drwxr-xr-x. 3 nobody nobody 25 Sep 20 14:11 spool
The munge key (/etc/munge/munge.key) should be the same on all nodes.
The munge key (/etc/munge/munge.key) should have these permissions:
[root@chead1 (mycluster1) ~]# ls -l /etc/munge/munge.key
-r--------. 1 munge munge 65 Sep 20 14:18 /etc/munge/munge.key
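The owner and permission checks above can also be scripted. The following is a minimal sketch, not part of the original instructions: check_owner_mode is a hypothetical helper name, and it assumes the GNU coreutils version of stat (the -c option), which is what a yum-based system normally provides.

```shell
# Hypothetical helper: succeed only if the path has the expected owner
# and octal permission mode (for example: munge 400).
check_owner_mode() {
    path="$1"
    want_owner="$2"
    want_mode="$3"
    # %U is the owner's user name, %a the permissions in octal (GNU stat).
    got=$(stat -c '%U %a' "$path") || return 1
    [ "$got" = "$want_owner $want_mode" ]
}

# Example calls on the head node, using the values from the listings above:
# check_owner_mode /etc/munge/munge.key munge 400 || echo "munge key: unexpected owner or mode"
# for d in log run spool; do
#     check_owner_mode "/opt/flight/opt/slurm/var/$d" nobody 755 || echo "$d: unexpected owner or mode"
# done
```

This covers only ownership and permissions. Whether the munge key matches on all nodes still has to be compared separately, for example by comparing a checksum of the file on each node.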