Job Submission
There are two basic ways to submit jobs to SLURM:
- Interactive - In this method you get an allocation of resources, then log in to the node directly and use it interactively.
- Batch - In this method you submit your job with a job script defining all of the steps you want to accomplish. This method is best for large numbers of jobs that you have already tested.
Interactive Job Submission
salloc runs an interactive job on the cluster. You can request a shell to run commands, or submit a script or app interactively.
Example usage:
[smy190@ip-10-37-171-122 ~]$ salloc --nodes=1 --ntasks=1 --cpus-per-task=8 --time=1:00:00 --partition=g6-xl --gres=gpu:1 --mem=16GB --job-name=interactive-test
salloc: Granted job allocation 464
salloc: Waiting for resource configuration
Note: On the AWS cluster, nodes must be provisioned behind the scenes before the job can be dispatched. This can take ~5 minutes.
The command-line arguments describe the job configuration; see the full salloc documentation on the SLURM website for a detailed list.
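Once the allocation is granted, you can open a shell on the allocated node from within the salloc session. A minimal sketch (whether salloc drops you directly onto the compute node, or onto a login node from which you launch tasks, depends on the cluster's configuration):
srun --pty bash    # start an interactive shell on the allocated compute node
hostname           # confirm which node you are on
exit               # leave the shell; exiting the salloc session releases the allocation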
Batch Job Submission
sbatch submits a script to be run as a batch job on the cluster. This allocates a job, similar to salloc, but you do not get a shell into the job and cannot (easily) control the job once started.
Before running sbatch, you must write a job script, e.g. nano my_slurm.job. Below is an example job script.
Job Script Example
Note: There are many useful SLURM environment variables; consider using them in your job script (see the snippet after the example below).
#!/bin/bash
#SBATCH --job-name=my_job # Job name
#SBATCH --output=%x_%j.o # Output file (%x expands to Job name, %j expands to job ID)
#SBATCH --error=%x_%j.e # Error file
#SBATCH --ntasks=1 # Number of tasks (processes)
#SBATCH --cpus-per-task=4 # Number of CPU cores per task
#SBATCH --mem=4G # Total memory per node
#SBATCH --time=00:30:00 # Time limit (hh:mm:ss)
#SBATCH --partition=urcdtest-med # Partition name
# Load necessary modules
module load openmpi
# Run your application
python my_script.py
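As a sketch of the environment variables mentioned in the note above, lines like the following could be added to the script; each variable shown is set by SLURM inside the job:
# A few SLURM-provided environment variables, available inside the job
echo "Job ID:        $SLURM_JOB_ID"
echo "Job name:      $SLURM_JOB_NAME"
echo "Node list:     $SLURM_JOB_NODELIST"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
# For example, match the thread count of an OpenMP program to the allocation
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK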
The #SBATCH directives describe parameters used to allocate and configure the job. E.g. #SBATCH --nodes=1 tells the scheduler to allocate one node for the job. These are mostly identical to the command-line options for salloc. output and error name the files where STDOUT and STDERR will be written, respectively. In the output and error file names, %j is replaced by the job ID and %x by the job name.
Submit the job
sbatch my_slurm.job
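If the submission succeeds, sbatch prints the assigned job ID (the ID below is illustrative), and you can monitor the job with squeue:
Submitted batch job 465
squeue -u $USER    # list your pending and running jobs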
Job Dependencies
Submit a job that starts only after another completes:
sbatch --dependency=afterok:JOBID my_dependent_job.slurm
This schedules my_dependent_job.slurm to start only if job JOBID finishes successfully (i.e., exits with code 0).
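In practice, the job ID can be captured programmatically with sbatch's --parsable flag, which prints only the job ID. A sketch with illustrative script names:
JOBID=$(sbatch --parsable my_first_job.slurm)    # capture the job ID of the first job
sbatch --dependency=afterok:$JOBID my_dependent_job.slurm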
Advanced Tips
- Resource Optimization: Adjust --cpus-per-task and --mem according to your job's requirements for optimal resource use.
- Array Jobs: Submit many similar jobs at once using a job array, e.g. sbatch --array=0-9 my_slurm.job (see the sketch below).
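A minimal array job script might look like the following sketch (the script name and input files are illustrative; each task reads its own index from SLURM_ARRAY_TASK_ID):
#!/bin/bash
#SBATCH --job-name=array_test
#SBATCH --output=%x_%A_%a.o    # %A expands to the array job ID, %a to the task index
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --array=0-9            # run ten tasks, indices 0 through 9

# Each task processes a different input file, selected by its array index
python my_script.py --input input_${SLURM_ARRAY_TASK_ID}.txt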