Job Submission and Execution
Don’t use the #SBATCH --exclusive parameter as it is already included in the Slurm configuration.
Use the #SBATCH --mem= parameter only on qfat. On cpu_ queues, whole nodes are allocated.
Accelerated nodes (gpu_ queues) are each divided into eight parts with corresponding memory.
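For illustration, a job on the qfat partition might request memory along these lines (the value is purely illustrative; consult the cluster documentation for the fat node's actual capacity):

```bash
#SBATCH --partition qfat
#SBATCH --mem=2048G    # illustrative value, not the node's actual capacity
```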
Introduction
Slurm workload manager is used to allocate and access Karolina’s, Barbora’s and Complementary systems’ resources.
A man page exists for all Slurm commands, as well as the --help command option,
which provides a brief summary of options.
Slurm documentation and man pages are also available online.
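For example, for sbatch:

```console
$ man sbatch
$ sbatch --help
```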
Getting Partition Information
Display partitions/queues on system:
```console
$ sinfo -s
PARTITION     AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
qcpu*         up     2-00:00:00  1/191/0/192     cn[1-192]
qcpu_biz      up     2-00:00:00  1/191/0/192     cn[1-192]
qcpu_exp      up     1:00:00     1/191/0/192     cn[1-192]
qcpu_free     up     18:00:00    1/191/0/192     cn[1-192]
qcpu_long     up     6-00:00:00  1/191/0/192     cn[1-192]
qcpu_preempt  up     12:00:00    1/191/0/192     cn[1-192]
qgpu          up     2-00:00:00  0/8/0/8         cn[193-200]
qgpu_biz      up     2-00:00:00  0/8/0/8         cn[193-200]
qgpu_exp      up     1:00:00     0/8/0/8         cn[193-200]
qgpu_free     up     18:00:00    0/8/0/8         cn[193-200]
qgpu_preempt  up     12:00:00    0/8/0/8         cn[193-200]
qfat          up     2-00:00:00  0/1/0/1         cn201
qdgx          up     2-00:00:00  0/1/0/1         cn202
qviz          up     8:00:00     0/2/0/2         vizserv[1-2]
```

The NODES(A/I/O/T) column summarizes the node count per state, where A/I/O/T stands for allocated/idle/other/total.
The example output is from the Barbora cluster.
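If you only need selected columns, sinfo's output can be customized; for example, using standard sinfo format specifiers (partition, availability, time limit, node count, state):

```console
$ sinfo -p qcpu -o "%P %a %l %D %t"
```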
A graphical representation of the clusters’ usage, partitions, nodes, and jobs can be found:
- for Karolina at https://extranet.it4i.cz/rsweb/karolina
- for Barbora at https://extranet.it4i.cz/rsweb/barbora
- for Complementary Systems at https://extranet.it4i.cz/rsweb/compsys
On the Karolina cluster:
- all cpu queues/partitions provide full node allocation, i.e. whole nodes are allocated to the job
- other queues/partitions (gpu, fat, viz) provide partial node allocation

See Karolina Slurm Specifics for details; a sketch of a partial allocation follows below.
On the Barbora cluster, all queues/partitions provide full node allocation, i.e. whole nodes are allocated to the job.
On Complementary systems, only some queues/partitions provide full node allocation; see the Complementary systems documentation for details.
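For illustration, a partial allocation on the qgpu partition might be requested along these lines (a minimal sketch using the generic Slurm --gpus option; the exact resource-selection options for Karolina are described in Karolina Slurm Specifics):

```console
$ salloc -A PROJECT-ID -p qgpu --gpus 1
```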
Running Interactive Jobs
Sometimes you may want to run your job interactively, for example for debugging, to run your commands one by one from the command line.
Run an interactive job - queue qcpu_exp, one node by default, one task by default:

```console
$ salloc -A PROJECT-ID -p qcpu_exp
```

Run an interactive job on four nodes, 128 tasks per node (recommended value for the Karolina CPU partition, based on the node core count), with a two-hour time limit:

```console
$ salloc -A PROJECT-ID -p qcpu -N 4 --ntasks-per-node 128 -t 2:00:00
```

Run an interactive job with X11 forwarding:

```console
$ salloc -A PROJECT-ID -p qcpu_exp --x11
```

To finish the interactive job, use the Ctrl+D (^D) control sequence.
Do not use srun to initiate interactive jobs; subsequent srun or mpirun invocations would block forever.
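A typical interactive session might look like this (a sketch; the job ID and node name are illustrative):

```console
$ salloc -A PROJECT-ID -p qcpu_exp -N 1
salloc: Granted job allocation 12345
$ srun hostname
cn101.barbora.it4i.cz
$ exit    # or Ctrl+D to release the allocation
```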
Running Batch Jobs
Batch jobs are the standard way of running jobs and utilizing HPC clusters.
Job Script
Create an example job script called script.sh with the following content:
```bash
#!/usr/bin/bash
#SBATCH --job-name MyJobName
#SBATCH --account PROJECT-ID
#SBATCH --partition qcpu
#SBATCH --nodes 4
#SBATCH --ntasks-per-node 128
#SBATCH --time 12:00:00

ml purge
ml OpenMPI/4.1.4-GCC-11.3.0

srun hostname | sort | uniq -c
```

The script will:
- use the bash shell interpreter
- use MyJobName as the job name
- use project PROJECT-ID for job access and accounting
- use the partition/queue qcpu
- use 4 nodes
- use 128 tasks per node - value used by MPI
- set the job time limit to 12 hours
- load the appropriate module
- run the command; srun serves as Slurm’s native way of executing MPI-enabled applications, hostname is used in the example just for the sake of simplicity
Use the #SBATCH --exclude=<node_name_list> directive to exclude specific nodes from your job, e.g.: #SBATCH --exclude=cn001,cn002,cn003.
The submit directory will be used as the working directory of the submitted job,
so there is no need to change directories in the job script.
Alternatively, you can specify the job working directory using the sbatch --chdir (or -D for short) option.
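For example (the path is illustrative):

```console
$ sbatch --chdir /scratch/project/PROJECT-ID/my_work_dir script.sh
```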
Srun Over mpirun
While mpirun can be used to run parallel jobs on our Slurm-managed clusters, we recommend using srun for better integration with Slurm’s scheduling and resource management. srun ensures more efficient job execution and resource control by leveraging Slurm’s features directly, and it simplifies the process by reducing the need for additional configurations often required with mpirun.
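In practice, this means launching the MPI binary with srun inside the job script (the binary name is illustrative):

```bash
# Slurm-native launcher, preferred on our clusters
srun ./my_mpi_app

# instead of the generic MPI launcher:
# mpirun ./my_mpi_app
```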
Job Submit
Submit the batch job:

```console
$ cd my_work_dir
$ sbatch script.sh
```

A path to script.sh (relative or absolute) should be given if the job script is in a different location than the job working directory.
By default, job output is stored in a file called slurm-JOBID.out and contains both the job's standard output and error output.
This can be changed using the sbatch options --output (or -o for short) and --error (-e).
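For example, in the job script (%j expands to the job ID; the file names are illustrative):

```bash
#SBATCH --output=myjob-%j.out
#SBATCH --error=myjob-%j.err
```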
Example output of the job:

```console
128 cn017.karolina.it4i.cz
128 cn018.karolina.it4i.cz
128 cn019.karolina.it4i.cz
128 cn020.karolina.it4i.cz
```

Job Environment Variables
Slurm provides useful information to the job via environment variables.
Environment variables are available on all nodes allocated to the job when accessed via Slurm-supported means (srun, compatible mpirun).
See all Slurm variables:

```console
$ set | grep ^SLURM
```

Commonly used variables are:
| variable name | description | example |
|---|---|---|
| SLURM_JOB_ID | job id of the executing job | 593 |
| SLURM_JOB_NODELIST | nodes allocated to the job | cn[101-102] |
| SLURM_JOB_NUM_NODES | number of nodes allocated to the job | 2 |
| SLURM_STEP_NODELIST | nodes allocated to the job step | cn101 |
| SLURM_STEP_NUM_NODES | number of nodes allocated to the job step | 1 |
| SLURM_JOB_PARTITION | name of the partition | qcpu |
| SLURM_SUBMIT_DIR | submit directory | /scratch/project/open-xx-yy/work |
See relevant Slurm documentation for details.
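For example, a job script can use these variables directly (a minimal sketch):

```bash
echo "Job ${SLURM_JOB_ID} runs on ${SLURM_JOB_NUM_NODES} node(s): ${SLURM_JOB_NODELIST}"
```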
Get job nodelist:

```console
$ echo $SLURM_JOB_NODELIST
cn[101-102]
```

Expand nodelist to list of nodes:

```console
$ scontrol show hostnames
cn101
cn102
```

Job Management
Getting Job Information
Show all jobs on the system:

```console
$ squeue
```

Show my jobs:

```console
$ squeue --me
JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
  104      qcpu interact     user  R  1:48      2 cn[101-102]
```

Show job details for a specific job:

```console
$ scontrol show job JOBID
```

Show job details for the executing job from within the job session:

```console
$ scontrol show job $SLURM_JOBID
```

Show my jobs using a long output format which includes the time limit:

```console
$ squeue --me -l
```

Show my jobs in the running state:

```console
$ squeue --me -t running
```

Show my jobs in the pending state:

```console
$ squeue --me -t pending
```

Show jobs for a given project:

```console
$ squeue -A PROJECT-ID
```

Job States
The most common job states are (in alphabetical order):
| Code | Job State | Explanation |
|---|---|---|
| CA | CANCELLED | Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated. |
| CD | COMPLETED | Job has terminated all processes on all nodes with an exit code of zero. |
| CG | COMPLETING | Job is in the process of completing. Some processes on some nodes may still be active. |
| F | FAILED | Job terminated with non-zero exit code or other failure condition. |
| NF | NODE_FAIL | Job terminated due to failure of one or more allocated nodes. |
| OOM | OUT_OF_MEMORY | Job experienced out of memory error. |
| PD | PENDING | Job is awaiting resource allocation. |
| PR | PREEMPTED | Job terminated due to preemption. |
| R | RUNNING | Job currently has an allocation. |
| RQ | REQUEUED | Completing job is being requeued. |
| SI | SIGNALING | Job is being signaled. |
| TO | TIMEOUT | Job terminated upon reaching its time limit. |
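For jobs that have already finished, the state can also be queried from the accounting database with sacct, assuming job accounting is enabled on the cluster:

```console
$ sacct -j JOBID --format=JobID,JobName,State,ExitCode
```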
Modifying Jobs
In general:
```console
$ scontrol update JobId=JOBID ATTR=VALUE
```

Modify a job's time limit:

```console
$ scontrol update JobId=JOBID timelimit=4:00:00
```

Set/modify a job's comment:

```console
$ scontrol update JobId=JOBID Comment='The best job ever'
```

Deleting Jobs
Delete a job by job ID:
```console
$ scancel JOBID
```

Delete all my jobs:

```console
$ scancel --me
```

Delete all my jobs in interactive mode, confirming every action:

```console
$ scancel --me -i
```

Delete all my running jobs:

```console
$ scancel --me -t running
```

Delete all my pending jobs:

```console
$ scancel --me -t pending
```

Delete all my pending jobs for a project PROJECT-ID:

```console
$ scancel --me -t pending -A PROJECT-ID
```

Troubleshooting
Invalid Account
```console
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
```
Possible causes:
- Invalid account (i.e. project) was specified in job submission.
- User does not have access to given account/project.
- Given account/project does not have access to given partition.
- Access to given partition was retracted due to the project’s allocation exhaustion.
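To check which accounts and partitions your user is associated with, you can query the Slurm accounting database, assuming user-level queries are permitted on the cluster:

```console
$ sacctmgr show associations user=$USER format=Account,Partition
```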

