Skip to content

Compute nodes, Slurm and debugging jobs on Bianca

More Slurm and other advanced UPPMAX techniques

  • A closer look at Slurm
  • Using the GPUs
  • Debugging
  • Job efficiency with the jobstats tool
  • Advanced job submission

The Slurm Workload Manager

More on sbatch

Recap:

sbatch -A sens2023598 -t 10:00 -p core -n 10 my_job.sh
slurm batch project name max runtime partition ("job type") #cores job script

More on time limits

  • Format -t dd-hh:mm:ss
  • Examples and variants on syntax

    • 0-00:10:00 = 00:10:00 = 10:00 = 10
    • 0-12:00:00 = 12:00:00
    • 3-00:00:00 = 3-0
    • 3-12:10:15

Job walltime

When you have no idea how long a program will take to run, what should you book?

A: very long time, e.g. 10-00:00:00

When you have an idea of how long a program would take to run, what should you book?

A: overbook by 50%

More on partitions

  • -p core

    • “core” is the default partition
    • ≤ 16 cores on Bianca
    • a script or program written without any thought on parallelism will use 1 core
  • -p node

    • if you wish to book full node(s)

Quick testing

  • The “devel” partition
    • max 2 nodes per job
    • up to 1 hour in length
    • only 1 at a time
    • -p devcore, -p devel
Any free nodes in the devel partition? Check status with
  • sinfo -p devel
  • jobinfo -p devel
  • more on these tools later
  • High priority queue for short jobs

    • 4 nodes
    • up to 15 minutes
    • --qos=short

Debugging or complicated workflows

  • Interactive jobs

    • handy for debugging a code or a script by executing it line by line or for using programs with a graphical user interface
    • salloc -n 80 -t 03:00:00 -A sens2023598
    • interactive -n 80 -t 03:00:00 -A sens2023598

    • up to 12 hours

    • useful together with the --begin=<time> flag
    • salloc -A snic2022-22-50 --begin=2022-02-17T08:00:00

    • asks for an interactive job that will start earliest tomorrow at 08:00

Parameters in the job script or the command line?

  • Command line parameters override script parameters
  • A typical script may be:
#!/bin/bash
#SBATCH -A sens2023598
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 24:00:00

Just a quick test:

sbatch -p devcore -t 00:15:00 jobscript.sh
Hands-on #1: sbatch/jobinfo
  • login to Bianca
  • find out which projects you’re a member of using projinfo
  • submit a short (10 min) test job; note the job ID
  • find out if there are any free nodes in the devel partition
  • submit a new job to use the devel partition
  • write in the HackMD when you’re done

Memory in core or devcore jobs

  • -n X
  • Bianca: 8GB per core
  • Slurm reports the available memory in the prompt at the start of an interactive job

More flags

  • -J <jobname>
  • email:

    • --mail-type=BEGIN,END,FAIL,TIME_LIMIT_80
  • out/err redirection:

    • --output=slurm-%j.out and —-error=slurm-%j.err

      • by default, where %j will be replaced by the job ID
    • --output=my.output.file

    • --error=my.error.file

Monitoring jobs

  • jobinfo - a wrapper around squeue

    • lists running and pending jobs
    • jobinfo -u username
    • jobinfo -A sens2023598
    • jobinfo -u username --state=running
    • jobinfo -u username --state=pending
  • You may also use the squeue command.

    • This will give you a list of jobs in the present project (possibly other users within the project)

Get a view of the whole queue, including all projects

  • Use the command bianca_combined_jobinfo (queued jobs of all projects)
  • That makes it easier to see how the resources are used and what the odds are that you can start your job soon!

Monitoring and modifying jobs

  • scontrol

    • scontrol show job [jobid]
  • possible to modify the job details after the job has been submitted; some options, like maximum runtime, may be modified (=shortened) even after the job started

    • scontrol update JobID=jobid QOS=short
    • scontrol update JobID=jobid TimeLimit=1-00:00:00
    • scontrol update JobID=jobid NumNodes=10
    • scontrol update JobID=jobid Features=mem1TB

When a job goes wrong

  • scancel [jobid]

    • -u username - to cancel all your jobs
    • -t [state] - cancel pending or running jobs
    • -n name - cancel jobs with a given name
    • -i - ask for confirmation

Priority

  • Roughly:

    • The first job of the day has elevated priority
    • Other normal jobs run in the order of submission (subject to scheduling)
    • Projects exceeding their allocation get successively into the lower priority category
    • Bonus jobs run after the jobs in the higher priority categories
  • In practice:

    • submit early = run early
    • bonus jobs always run eventually, but may need to wait until the night or weekend
    • In detail: jobinfo
Hands-on #2: sbatch/squeue/scancel/scontrol/jobinfo
  • submit a new job; note the job ID
  • check all your running jobs
  • what is the priority or your recently-submitted job?
  • submit a new job to run for 24h; note the job ID
  • modify the name of the job to “wrongjob”
  • cancel your job with name “wrongjob”

Determining job efficiency

  • jobstats - custom-made UPPMAX tool

Job efficiency

  • jobstats - a tool in the fight for productivity

    • it works only for jobs longer than 5-15 minutes
    • -r jobid - check running jobs
    • A project - check all recent jobs of a given project
    • p jobid - produce a CPU and memory usage plot
  • Jobstats user guide

Hands-on #3: jobstats
    • Firstly, find some job IDs from this month
    • Run finishedjobinfo -m username
    • Write down the IDs from some interesting jobs
    • Generate the images:

    Generate jobstats plots for your jobs

    $ jobstats -p ID1 ID2 ID3
    
  • Look at the images

$ eog *png &
  • Which of the plots
    • Show good CPU or memory usage?
    • Indicate that the job requires a fat node?

Different flavours of Slurm: Job script examples and workflows

Simple workflow

#!/bin/bash
#SBATCH -J jobname
#SBATCH -A sens2023598
#SBATCH -p core
#SBATCH -n 10
#SBATCH -t 10:00:00

module load software/version
module load python/3.9.5

./my-script.sh
./another-script.sh
./myprogram.exe

Job dependencies

  • sbatch jobscript.sh submitted job with jobid1
  • sbatch anotherjobscript.sh submitted job with jobid2
  • --dependency=afterok:jobid1:jobid2 job will only start running after the successful end of jobs jobid1:jobid2
  • very handy for clearly defined workflows
  • You may also use --dependency=afternotok:jobid in case you’d like to resubmit a failed job, OOM (out of memory) for example, to a node with a higher memory: -C mem215GB or -C mem512GB

I/O intensive jobs: $SNIC_TMP

#!/bin/bash
#SBATCH -J jobname
#SBATCH -A sens2023598
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 10:00:00

module load bioinfotools
module load bwa/0.7.17 samtools/1.14

export SRCDIR=$HOME/path-to-input

cp $SRCDIR/foo.pl $SRCDIR/bar.txt $SNIC_TMP/.
cd $SNIC_TMP

./foo.pl bar.txt

cp *.out $SRCDIR/path-to-output/.

OpenMP or multi-threaded job

#!/bin/bash
#SBATCH -A sens2023598
#SBATCH --exclusive
#SBATCH -p node
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH -t 01:00:00

module load uppasd
export OMP_NUM_THREADS=20

sd > out.log

GPU nodes on Bianca

All Bianca GPU nodes have at least 256 GB RAM (fat nodes) with 16 CPU cores.

NVIDIA A100 40 GB

10 nodes with two NVIDIA A100 GPUs each. 20 GPUs in total.

In order to avoid GPU misuse, a project cannot request more than 7 of these GPU nodes in total.

Example job script:

#SBATCH --gpus=2            #number of GPUs requested
#SBATCH --gpus-per-node=2   #number of GPUs per node

nvidia-smi

NVIDIA T4 16 GB

17 nodes with one NVIDIA T4 GPU each.

Example job script:

#SBATCH --gpus=t4:1            #number of GPUs requested
#SBATCH --gpus-per-node=t4:1   #number of GPUs per node

nvidia-smi

Running on several nodes: MPI jobs

#!/bin/bash -l
#SBATCH -J rsptjob
#SBATCH —mail-type=FAIL
#SBATCH -A sens2023598
#SBATCH -t 00-07:00:00
#SBATCH -p node
#SBATCH -N 4
### for jobs shorter than 15 min (max 4 nodes):
###SBATCH --qos=short

module load RSPt/2021-10-04
export RSPT_SCRATCH=$SNIC_TMP

srun -n 80 rspt

rm -f apts dmft_lock_file e_entropy efgArray.dat.0 efgData.out.0 energy_matrices eparm_last interstitialenergy jacob1 jacob2 locust.* out_last pot_last rspt_fft_wisdom.* runs.a symcof_new

Job arrays

  • Submit many jobs at once with the same or similar parameters
  • Use $SLURM_ARRAY_TASK_ID in the script in order to find the correct path
#!/bin/bash
#SBATCH -A sens2023598
#SBATCH -p node
#SBATCH -N 2
#SBATCH -t 01:00:00
#SBATCH -J jobarray
#SBATCH --array=0-19
#SBATCH --mail-type=ALL,ARRAY_TASKS

# SLURM_ARRAY_TASK_ID tells the script which iteration to run
echo $SLURM_ARRAY_TASK_ID

cd /pathtomydirectory/dir_$SLURM_ARRAY_TASK_ID/

srun -n 40 my-program
env
  • You may use scontrol to modify some of the job arrays.

Snakemake and Nextflow

  • Conceptually similar, but with different flavours
  • First define steps, each with an input, an output, and a command that transforms the input into output
  • Then just ask for the desired output and the system will handle the rest
  • Snakemake hackathon (re-occurring event)
  • Nextflow training
Hands-on #4: make it your own
  • use 2 or 3 of the sample job scripts as a starting point for your own job script
  • tweak them so that you run something closer to your research; or just feel free to experiment
  • paste at least one of the examples in the HackMD
  • great if you could add a comment what the job script is about

Where to go from here?