Basic Slurm Commands
Salloc
Salloc requests resources for an interactive job on the cluster. Once the allocation is granted, you can use a shell to run commands, or launch a script or application interactively.
Example usage:
salloc --nodes=1 --ntasks=1 --cpus-per-task=8 --time=1:00:00 --partition=a100-xl --gres=gpu:1 --mem=16GB --job-name=interactive-test
The command line arguments describe the job configuration and are listed in detail on the Slurm website.
See full salloc docs on the Slurm website
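Salloc can also be given a command to run instead of opening a shell; the allocation is released when that command exits. The sketch below wraps this in a small helper. The function name run_on_node is made up for illustration, the resource flags are copied from the example above and should be adjusted for your cluster, and srun is used on the assumption that the command should execute on the allocated compute node rather than on the machine where salloc was invoked.

```shell
# Hypothetical helper: run one command inside a short interactive allocation.
# The flags mirror the salloc example above; change them to match your cluster.
run_on_node() {
    salloc --nodes=1 --ntasks=1 --cpus-per-task=8 --time=0:10:00 \
        --partition=a100-xl --gres=gpu:1 --mem=16GB \
        srun "$@"
}

# Usage (from a login node):
#   run_on_node hostname
```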
Sbatch
Sbatch submits a script to be run as a batch job on the cluster. This allocates resources for a job, similar to salloc, but you do not get a shell into the job and cannot (easily) interact with it once it has started.
Before running sbatch, you must write a job script. For example, run nano myjob and paste in the following text:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
#SBATCH --partition=a100-xl
#SBATCH --mem=16GB
#SBATCH --job-name=test-sbatch
#SBATCH --output=myjob_%j.out
#SBATCH --error=myjob_%j.err
#SBATCH --gres=gpu:1
# do stuff here ...
The #SBATCH lines describe parameters used to allocate and configure the job. E.g. #SBATCH --nodes=1 tells the scheduler to allocate one node for the job. These are mostly identical to the command line options for salloc.
The --output and --error options name the files where STDOUT and STDERR will be written, respectively. The symbol %j in the output and error file names will be replaced by the job ID.
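As a rough illustration of what could replace the "# do stuff here ..." placeholder, the lines below print some context about the job before the real work starts. SLURM_JOB_ID, SLURM_JOB_NODELIST and SLURM_CPUS_PER_TASK are environment variables that Slurm sets inside a job (the last one only when --cpus-per-task is given). The function name print_job_context is hypothetical, and the fallback values after ":-" are only there so the snippet also runs outside of Slurm.

```shell
# Hypothetical helper: record basic job context in the job's output file.
# The ":-" fallbacks are used only when a variable is unset (outside Slurm).
print_job_context() {
    echo "job_id=${SLURM_JOB_ID:-not-in-slurm}"
    echo "nodes=${SLURM_JOB_NODELIST:-unknown}"
    echo "cpus_per_task=${SLURM_CPUS_PER_TASK:-1}"
}

print_job_context

# The real workload would follow here.
```

Because STDOUT is redirected by --output, these lines end up at the top of the myjob_<jobid>.out file.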
To submit the myjob script you wrote, run the following command:
sbatch myjob
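When a job is submitted from another script, it is often handy to keep its job ID for later squeue or sacct calls. A minimal sketch, assuming the --parsable option of sbatch, which prints just the job ID (followed by ";" and the cluster name on multi-cluster setups) instead of the usual confirmation message. The function name submit_job is hypothetical.

```shell
# Hypothetical helper: submit a script and print only the job ID.
submit_job() {
    # --parsable output is "jobid" or "jobid;cluster"; keep the first field.
    sbatch --parsable "$@" | cut -d ';' -f 1
}

# Usage:
#   jobid=$(submit_job myjob)
#   echo "Submitted job $jobid"
```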
See full sbatch docs on the Slurm website
Squeue
Squeue is used to view the Slurm queue. This includes jobs which are currently running, pending, configuring, etc. By default it will not show jobs which have completed or failed.
See full squeue docs on the Slurm website
Example usage:
squeue
You can show jobs for a specific user with:
squeue --user=$USER
# or, equivalently, to show only your own jobs
squeue --me
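The squeue output can also be filtered by job state. As a sketch, the helper below counts your pending jobs, assuming the --noheader and --states options of squeue. The function name count_pending is made up for this example, and a pending job array may appear as a single line.

```shell
# Hypothetical helper: count the current user's pending jobs.
count_pending() {
    # --noheader drops the column titles, so each output line is one entry.
    squeue --me --noheader --states=PENDING | wc -l | tr -d ' '
}

# Usage:
#   echo "Pending jobs: $(count_pending)"
```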
Sinfo
Sinfo shows partition and node information.
You can list partitions using:
sinfo
See full sinfo docs on the Slurm website
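For a per-node view of a single partition, sinfo accepts the --partition, --Node and --long options. The sketch below defaults to the a100-xl partition name used in the examples above; the function name partition_nodes is hypothetical.

```shell
# Hypothetical helper: list the nodes of one partition in long format.
partition_nodes() {
    # $1: partition name (defaults to the partition used on this page)
    sinfo --partition="${1:-a100-xl}" --Node --long
}

# Usage:
#   partition_nodes
#   partition_nodes some-other-partition
```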
Sacct
Sacct shows accounting and historical information for Slurm jobs. Unlike squeue, which only shows active jobs, sacct lets you get information about jobs which have completed or failed.
For example, to get information about a specific job:
sacct -j 1234
where 1234 is replaced by the job ID.
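The columns sacct prints can be chosen with --format. A small sketch, assuming the field names JobID, JobName, State, Elapsed, MaxRSS and ExitCode (the full list is printed by sacct --helpformat); the function name job_summary is hypothetical.

```shell
# Hypothetical helper: show a compact accounting summary for one job.
job_summary() {
    # $1: job ID, e.g. 1234
    sacct -j "$1" --format=JobID,JobName,State,Elapsed,MaxRSS,ExitCode
}

# Usage:
#   job_summary 1234
```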
Useful snippets
List detailed information about a job:
scontrol show jobid -dd $YOUR_JOBID_HERE
Cancel all running/queued jobs that you have submitted:
scancel -u `whoami`
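Scancel can also be restricted to a subset of your jobs. A sketch, assuming the --state and --name filters of scancel; both function names are hypothetical, and test-sbatch in the usage line is the job name from the sbatch example above.

```shell
# Hypothetical helper: cancel only jobs that have not started yet.
cancel_pending() {
    scancel --user="$USER" --state=PENDING
}

# Hypothetical helper: cancel your jobs with a given job name.
cancel_by_name() {
    # $1: job name as set with --job-name
    scancel --user="$USER" --name="$1"
}

# Usage:
#   cancel_pending
#   cancel_by_name test-sbatch
```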
See full sacct docs on the Slurm website