Commit 8e3bfd0

codeharrisgkaf89
authored and committed
Add instructions on how to use job arrays in HPC
1 parent b1b8876 commit 8e3bfd0

2 files changed

+66
-1
lines changed

docs/jobs/arrays.md

Lines changed: 65 additions & 1 deletion
@@ -1 +1,65 @@
1-
# Job Arrays
1+
# Job Arrays in HPC systems
2+
3+
In HPC systems, cluster policy may enforce job submission limits in order to protect the scheduler from overload.
4+
5+
When you want to submit multiple jobs that share the same initial options (e.g. QOS, time limit) but use different input parameters, the naive approach is to generate, manually or programmatically, one script per parameter set and submit each with its own `sbatch` job allocation. Doing so can quickly hit the cluster limits and risks having your job submissions rejected.
6+
7+
If you are planning to execute job campaigns that require more than 10 job allocations per minute, consider [GNU parallel](/jobs/gnu-parallel/). If your campaign needs fewer than 10 job allocations per minute, consider using [Job Arrays](https://slurm.schedmd.com/job_array.html).
8+
9+
Job arrays provide a mechanism for submitting and managing collections of similar jobs quickly and easily, while still giving you fine control over the maximum number of simultaneously running tasks from the job array.
10+
11+
## Using Job Arrays
12+
13+
[Job Arrays](https://slurm.schedmd.com/job_array.html) are supported for `batch` jobs. Specify the array index values with the `--array` (or `-a`) option, either as a directive inside the Slurm script, `#SBATCH --array=<start_index>-<end_index>:<increment>`, or directly on the `sbatch` command line, e.g. `sbatch --array=1-100 array_script.sh`.
14+
15+
16+
The option argument can take any of the forms listed first below; the placeholders it is built from are described after them:
17+
18+
- a range of index values `--array=0-31`
19+
- specific array index values `--array=1,3,5,7`
20+
- a range with an optional step size `--array=1-7:2` (step size 2)
21+
- `<start_index>` an integer >= 0 that defines the Task ID of the first job in the array
22+
- `<end_index>` an integer > `<start_index>` that defines the Task ID of the last job in the array
23+
- `<increment>` an integer > 0 that specifies the increment (step size) between Task IDs; it defaults to 1 if not specified
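
To make the index syntax concrete, the following is a minimal sketch that lists the Task IDs a simple `<start_index>-<end_index>[:<increment>]` specification would generate. The function `expand_array_spec` is a hypothetical helper written for illustration; it is not part of Slurm, and it does not handle comma-separated lists or the `%` separator.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): print the Task IDs generated by
# a specification of the form <start_index>-<end_index>[:<increment>].
expand_array_spec() {
    local spec=$1
    local range=${spec%%:*}     # part before ':' (e.g. "1-7")
    local step=1                # the increment defaults to 1
    if [[ $spec == *:* ]]; then
        step=${spec##*:}        # part after ':' (e.g. "2")
    fi
    local start=${range%%-*}
    local end=${range##*-}
    # seq FIRST INCREMENT LAST prints one ID per line; join them with spaces.
    seq "$start" "$step" "$end" | paste -sd' ' -
}

expand_array_spec 1-7:2      # step size 2
expand_array_spec 10-30:10   # the specification used in the script of this page
```

Running the helper before submitting is a cheap way to check how many tasks a specification will create.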
24+
25+
26+
```
#!/bin/bash --login
#SBATCH --job-name=array_script
# Array indices 10, 20 and 30: three array tasks in total
#SBATCH --array=10-30:10
#SBATCH --partition=batch
#SBATCH --qos=normal
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16
#SBATCH --time=02:00:00
# %A expands to the array job ID, %a to the array task index
#SBATCH --output=%A_%a.out
#SBATCH --error=%A_%a.err

# Use the array index as the input parameter: the test duration in seconds
declare test_duration=${SLURM_ARRAY_TASK_ID}

srun \
  --nodes=1 \
  --ntasks=1 \
  stress-ng \
    --cpu "${SLURM_CPUS_PER_TASK}" \
    --timeout "${test_duration}"

```
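
When the input parameters are not simple numbers, a common pattern is to use the array index to select one line from a parameter file. The following is a minimal sketch under stated assumptions: `param_for_task` is a hypothetical helper, the file is assumed to hold one parameter set per line, and the job is assumed to be submitted with `--array=1-N`, where N is the number of lines.

```shell
#!/bin/bash
# Hypothetical helper: print line number <task_id> of a parameter file.
param_for_task() {
    local params_file=$1
    local task_id=$2
    sed -n "${task_id}p" "$params_file"
}

# Demonstration with a temporary file standing in for a real parameter file.
demo_file=$(mktemp)
printf '%s\n' '--size 100' '--size 200' '--size 300' > "$demo_file"
param_for_task "$demo_file" 2
rm -f "$demo_file"

# Inside a batch script the lookup could be written as follows
# (params.txt and my_program are placeholder names):
#   params=$(param_for_task params.txt "${SLURM_ARRAY_TASK_ID}")
#   srun ./my_program ${params}
```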
49+
50+
Additionally, you can specify the maximum number of concurrently running tasks from the job array by using a `%` separator. For example, `--array=0-31%4` limits the number of simultaneously running tasks from this job array to 4. Note that the minimum index value is zero and the maximum index value is set by a Slurm configuration parameter (`MaxArraySize` minus one).
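
As an illustration of the `%` separator, the sketch below composes a throttled `--array` value and prints the resulting submission command as a dry run instead of executing it. The function `throttled_array_spec` is a hypothetical helper, and `array_script.sh` is the script name used earlier on this page.

```shell
#!/bin/bash
# Hypothetical helper: compose an --array value with a concurrency limit,
# in the form <first>-<last>%<max_running>.
throttled_array_spec() {
    local first=$1
    local last=$2
    local max_running=$3
    printf '%s-%s%%%s\n' "$first" "$last" "$max_running"
}

array_spec=$(throttled_array_spec 0 31 4)
# Dry run: print the command rather than submitting the job.
echo "sbatch --array=${array_spec} array_script.sh"
```

Removing the `echo` and the surrounding quotes would submit the job on a system where `sbatch` is available.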
51+
52+
53+
54+
??? info "Additional environment variables for Job Arrays"
55+
Job arrays will have additional environment variables set
56+
57+
- `SLURM_ARRAY_JOB_ID`: job ID of the array
58+
- `SLURM_ARRAY_TASK_ID`: job array index value
59+
- `SLURM_ARRAY_TASK_COUNT`: the number of tasks in the job array
60+
- `SLURM_ARRAY_TASK_MAX`: the highest job array index value
61+
- `SLURM_ARRAY_TASK_MIN`: the lowest job array index value
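
A minimal sketch of how a batch script might report these variables is shown below. The function `describe_array_task` is a hypothetical helper; the `:-` defaults are an assumption added so that the snippet also runs outside a Slurm job, where the variables are not set.

```shell
#!/bin/bash
# Hypothetical helper: summarise the job array variables of the current task.
# Each variable falls back to "unset" when the script runs outside Slurm.
describe_array_task() {
    local job_id=${SLURM_ARRAY_JOB_ID:-unset}
    local task_id=${SLURM_ARRAY_TASK_ID:-unset}
    local count=${SLURM_ARRAY_TASK_COUNT:-unset}
    local min=${SLURM_ARRAY_TASK_MIN:-unset}
    local max=${SLURM_ARRAY_TASK_MAX:-unset}
    echo "array job ${job_id}: task ${task_id} of ${count} (indices ${min}..${max})"
}

describe_array_task
```

Writing this line at the top of each task's output file makes it easier to match log files to array indices.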
62+
63+
64+
65+

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -103,6 +103,7 @@ nav:
103103
- Launcher Scripts Examples: 'slurm/launchers.md'
104104
- GNU parallel: 'jobs/gnu-parallel.md'
105105
# - Affinity: 'jobs/affinity.md'
106+
- Job Arrays: 'jobs/arrays.md'
106107
# - (Multi-)GPU Jobs: 'jobs/gpu.md'
107108
# - Memory Management: 'jobs/memory.md'
108109
# - Best Practices: 'jobs/best-practices.md'
