Slurm¶
The UPPMAX clusters are a shared resource. To ensure fair use, UPPMAX uses a scheduling system, which decides when and on which node each calculation runs. The software used is called Slurm.
Why not write SLURM?
Indeed, Slurm started as an abbreviation of 'Simple Linux Utility for Resource Management'. However, the Slurm homepage uses 'Slurm' to describe the tool, hence we use Slurm too.
This page describes how to use Slurm in general. See optimizing jobs for how to optimize Slurm jobs, and Slurm troubleshooting for how to fix Slurm errors.
For information specific to clusters, see:
Slurm Commands¶
The Slurm system is accessed using the following commands:
- `interactive`: start an interactive session. This is described in-depth for Bianca and Rackham
- `sbatch`: submit and run a batch job script
- `srun`: typically used inside batch job scripts for running parallel jobs (see examples further down)
- `scancel`: cancel one or more of your jobs
```mermaid
flowchart TD
    subgraph sub_inside[IP inside SUNET]
      subgraph sub_cluster_env[Cluster environment]
        login_node(User on login node)
        interactive_node(User on interactive node)
        computation_node(Computation node):::calculation_node
      end
    end
    login_node --> |move user, interactive|interactive_node
    login_node ==> |submit jobs, sbatch|computation_node
    computation_node -.-> |can become| interactive_node
```
The different types of nodes an UPPMAX cluster has. White nodes: nodes a user can interact with. Blue nodes: nodes a user cannot interact with. The thick edge shows the topic of this page: how to submit jobs to a computation node.
Job parameters¶
This section describes how to specify a Slurm job:
- Getting started redirects to the cluster-specific pages
- Partitions specify the type of job
Getting started¶
To let Slurm schedule a job, one uses `sbatch`, for example:
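A minimal sketch of such a call (the script name `my_job_script.sh` is a placeholder; `p2012999` is the example project code used elsewhere on this page):

```bash
# Submit the job script to the Slurm queue under the given project
sbatch -A p2012999 my_job_script.sh
```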
Minimal and complete examples of using `sbatch` are described in the respective cluster guides:
Partitions¶
Partitions are a way to tell what type of job you are submitting, e.g. whether it needs to reserve a whole node or only part of one.
To let Slurm schedule a job using a partition, use the `--partition` (or `-p`) flag, for example:
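A sketch of both spellings of the flag (script name and project code are placeholders):

```bash
# Request the 'core' partition with the long flag
sbatch -A p2012999 --partition core my_job_script.sh

# Equivalently, with the short flag
sbatch -A p2012999 -p core my_job_script.sh
```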
These are the partition names and their descriptions:
| Partition name | Description |
| --- | --- |
| `core` | Use one or more cores |
| `node` | Use a full node's set of cores |
| `devel` | Development job |
| `devcore` | Development job |
The `core` partition¶
The `core` partition allows one to use one or more cores.
Here is the minimal use for one core:
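A minimal sketch (script name and project code are placeholders; one core is the default on this partition):

```bash
# Request one core on the 'core' partition
sbatch -A p2012999 -p core my_job_script.sh
```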
To specify multiple cores, use `--ntasks` (or `-n`), for example:
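A sketch with two cores (placeholder script name and project code):

```bash
# Request two cores on the 'core' partition
sbatch -A p2012999 -p core -n 2 my_job_script.sh
```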
Here, two cores are used.
What is the relation between `ntasks` and the number of cores?
Indeed, the `--ntasks` flag only specifies the number of tasks.
However, by default, the number of tasks per core is set to one,
so requesting `n` tasks also allocates `n` cores.
One can make this link explicit by using:
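A sketch of making the mapping explicit with Slurm's `--ntasks-per-core` flag (script name and project code are placeholders):

```bash
# Explicitly request one task per core alongside the task count
sbatch -A p2012999 -p core -n 2 --ntasks-per-core=1 my_job_script.sh
```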
This is especially important if you might adjust core usage of the job to be something less than a full node.
The `node` partition¶
Whenever `-p node` is specified, an entire node is used, no matter how many cores are requested with `-n [no_of_cores]`.
For example, some bioinformatics tools show a minimal increase in performance when given more than 8-10 cores per job; in this case, specify `-p core -n 8` to ensure that only 8 cores (less than a single node) are allocated for such a job.
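A sketch of reserving a whole node (placeholder script name and project code):

```bash
# Reserve a full node, regardless of how many cores the job actually uses
sbatch -A p2012999 -p node my_job_script.sh
```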
The `devel` partition¶
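The `devel` partition is intended for short development and test jobs. A sketch of requesting it (the script name, project code, and 15-minute walltime are placeholder values; check your cluster's guide for the actual limits):

```bash
# Short development job on the 'devel' partition
sbatch -A p2012999 -p devel -t 00:15:00 my_job_script.sh
```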
The `devcore` partition¶
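The `devcore` partition is for development jobs that use one or more cores rather than a whole node. A sketch (placeholder values as before):

```bash
# Development job using two cores on the 'devcore' partition
sbatch -A p2012999 -p devcore -n 2 -t 00:15:00 my_job_script.sh
```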
Specifying job parameters¶
Whether you use the UPPMAX clusters interactively or in batch mode, you always have to specify a few things, like number of cores needed, running time etc. These things can be specified in two ways:
Either as flags sent to the different Slurm commands (`sbatch`, `srun`, the `interactive` command, etc.), like so:
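As a sketch, the same parameters that appear in the job script below can be passed directly as command-line flags (the script name is a placeholder):

```bash
# Project, partition, core count, walltime and job name given as flags
sbatch -A p2012999 -p core -n 1 -t 12:00:00 -J some_job_name my_job_script.sh
```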
or, when using the `sbatch` command, they can be specified inside the job script file itself, by using special `#SBATCH` comments, for example:
```bash
#!/bin/bash -l
#SBATCH -A p2012999
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 12:00:00
#SBATCH -J some_job_name
```
If doing this, then one will only need to start the script like so, without any flags:
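For instance, assuming the script above was saved as `my_job_script.sh` (a placeholder name):

```bash
# All job parameters are read from the #SBATCH comments in the script
sbatch my_job_script.sh
```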
How can I see how many resources my project has used?
Use projplot.