|
1 |
| -# Job Arrays |
| 1 | +# Job Arrays in HPC systems |
| 2 | + |
| 3 | +In HPC systems, cluster policy may enforce job submission limits in order to protect the scheduler from overload. |
| 4 | + |
| 5 | +When you want to submit multiple jobs that share the same options (e.g. QOS, time limit) but differ in their input parameters, the naive approach is to generate, manually or programmatically, one script per parameter set and submit each with its own `sbatch` job allocation. Doing so can quickly hit the cluster limits and risks having your job submissions rejected. |
| 6 | + |
| 7 | +If you plan to run job campaigns that require more than 10 job allocations per minute, consider [GNU parallel](/jobs/gnu-parallel/). If you need fewer than 10 job allocations per minute, consider using [Job Arrays](https://slurm.schedmd.com/job_array.html). |
| 8 | + |
| 9 | +Job arrays provide a mechanism for submitting and managing collections of similar jobs quickly and easily, while still giving you fine control over the maximum number of tasks from the job array that run simultaneously. |
| 10 | + |
| 11 | +## Using Job Arrays |
| 12 | + |
| 13 | +[Job Arrays](https://slurm.schedmd.com/job_array.html) are supported for `batch` jobs. Specify the array index values with the `--array` or `-a` option, either as a comment inside the Slurm script, `#SBATCH --array=<start_index>-<end_index>:<increment>`, or directly on the command line when you run `sbatch`, for example `sbatch --array=1-100 array_script.sh`. |
| 14 | + |
| 15 | + |
| 16 | +The option argument can take any of the following forms: |
| 17 | + |
| 18 | +- a range of index values `--array=0-31` |
| 19 | +- specific array index values `--array=1,3,5,7` |
| 20 | +- a range with an optional step size `--array=1-7:2` (step size 2) |
| 21 | +- `<start_index>` an integer ≥ 0 that defines the task ID of the first job in the array |
| 22 | +- `<end_index>` an integer ≥ `<start_index>` that defines the task ID of the last job in the array |
| 23 | +- `<increment>` an integer > 0 that specifies the step size between task IDs; it defaults to 1 if not specified |
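
To make the `<start_index>-<end_index>:<increment>` form concrete, the following is a minimal sketch of how such a specification expands into task IDs. The function `expand_array_spec` is a hypothetical helper written for illustration; it is not part of Slurm and only handles the simple range form, not comma-separated lists or the `%` limit.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm command): print the task IDs that a
# "<start>-<end>[:<step>]" array specification would produce.
expand_array_spec() {
    spec=$1
    range=${spec%%:*}              # part before ':', e.g. "10-30"
    case $spec in
        *:*) step=${spec##*:} ;;   # explicit increment after ':'
        *)   step=1 ;;             # increment defaults to 1
    esac
    start=${range%%-*}
    end=${range##*-}
    ids=""
    i=$start
    while [ "$i" -le "$end" ]; do
        ids="${ids:+$ids }$i"      # append with a separating space
        i=$((i + step))
    done
    printf '%s\n' "$ids"
}

expand_array_spec 10-30:10   # prints: 10 20 30
expand_array_spec 1-7:2      # prints: 1 3 5 7
```

With `--array=10-30:10`, as in the script that follows, the array therefore contains three tasks whose `SLURM_ARRAY_TASK_ID` values are 10, 20 and 30.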
| 24 | + |
| 25 | + |
| 26 | +``` |
| 27 | +#!/bin/bash --login |
| 28 | +#SBATCH --job-name=array_script |
| 29 | +#SBATCH --array=10-30:10 |
| 30 | +#SBATCH --partition=batch |
| 31 | +#SBATCH --qos=normal |
| 32 | +#SBATCH --nodes=4 |
| 33 | +#SBATCH --ntasks-per-node=8 |
| 34 | +#SBATCH --cpus-per-task=16 |
| 35 | +#SBATCH --time=02:00:00 |
| 36 | +#SBATCH --output=%A_%a.out |
| 37 | +#SBATCH --error=%A_%a.err |
| 38 | +
|
| 39 | +declare test_duration=${SLURM_ARRAY_TASK_ID} |
| 40 | +
|
| 41 | + srun \ |
| 42 | + --nodes=1 \ |
| 43 | + --ntasks=1 \ |
| 44 | + stress-ng \ |
| 45 | + --cpu ${SLURM_CPUS_PER_TASK} \ |
| 46 | + --timeout "${test_duration}" |
| 47 | +
|
| 48 | +``` |
| 49 | + |
| 50 | +Additionally, you can limit the number of tasks from the job array that run concurrently by using a `%` separator. For example, `--array=0-31%4` limits the number of simultaneously running tasks from this job array to 4. Note that the minimum index value is zero and the maximum index value is determined by the Slurm configuration parameter `MaxArraySize` (MaxArraySize minus one). |
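
As a rough sketch of how these limits can be checked in practice, the snippet below derives the largest usable index from `MaxArraySize` and shows a throttled submission in comments. The helper `max_array_index` is a hypothetical name introduced here, and the parsing assumes `scontrol show config` prints a line of the form `MaxArraySize = <value>`. The query is skipped on machines where `scontrol` is not installed.

```shell
#!/bin/sh
# Hypothetical helper: highest usable array index for a given MaxArraySize
# (indices start at zero, so the limit is MaxArraySize minus one).
max_array_index() {
    echo $(( $1 - 1 ))
}

# Query the scheduler only when Slurm's scontrol is available.
if command -v scontrol >/dev/null 2>&1; then
    max_size=$(scontrol show config | awk -F' *= *' '/^MaxArraySize/ {print $2}')
    if [ -n "$max_size" ]; then
        echo "Largest allowed array index: $(max_array_index "$max_size")"
    fi
fi

# Throttled submission, to be run on the cluster:
#   sbatch --array=0-31%4 array_script.sh
# The limit of a queued or running array can be changed later with:
#   scontrol update JobId=<job_id> ArrayTaskThrottle=8
```

Choosing a small value after `%` keeps the campaign from occupying too many nodes at once, at the cost of a longer overall turnaround.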
| 51 | + |
| 52 | + |
| 53 | + |
| 54 | +??? info "Additional environment variables for Job Arrays" |
| 55 | + Job arrays have the following additional environment variables set: |
| 56 | + |
| 57 | + - `SLURM_ARRAY_JOB_ID`: job ID of the array |
| 58 | + - `SLURM_ARRAY_TASK_ID`: job array index value |
| 59 | + - `SLURM_ARRAY_TASK_COUNT`: the number of tasks in the job array |
| 60 | + - `SLURM_ARRAY_TASK_MAX`: the highest job array index value |
| 61 | + - `SLURM_ARRAY_TASK_MIN`: the lowest job array index value |
| 62 | + |
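
A common use of these variables is to let each task pick its own input from a shared list. The sketch below assumes a hypothetical three-entry parameter list and a matching `--array=1-3`. The names `params` and `param_for_task` are illustrative only, and the fallback values let the snippet run outside of Slurm as a dry run.

```shell
#!/bin/sh
# Hypothetical parameter list; in practice this could be a file with one
# entry per line, read with the same sed expression.
params="alpha
beta
gamma"

# Return the entry on the line whose number equals the given task ID.
param_for_task() {
    printf '%s\n' "$params" | sed -n "${1}p"
}

# Outside Slurm the variable is unset, so fall back to 1 for a dry run.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
param=$(param_for_task "$task_id")

echo "Task ${task_id} of array job ${SLURM_ARRAY_JOB_ID:-unset} processes: ${param}"
```

Because every task runs the same script, the array index is the only thing that distinguishes one task from another, so it is used here as the line number of the task's input.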
| 63 | + |
| 64 | + |
| 65 | + |