* remove drain and resume functionality
* allow install and runtime taskbooks to be used directly
* fix linter complaints
* fix slurmctld state
* move common tasks to pre.yml
* remove unused openhpc_slurm_service
* fix ini_file use for some community.general versions
* fix var precedence in molecule test13
* fix var precedence in all molecule tests
* fix slurmd always starting on control node
* move install to install-ohpc.yml
* remove unused ohpc_slurm_services var
* add install-generic for binary-only install
* distinguish between system and user slurm binaries for generic install
* remove support for CentOS7 / OpenHPC
* remove post-configure, not needed as of slurm v20.02
* add openmpi/IMB-MPI1 by default for generic install
* allow removal of slurm.conf options
* update README
* enable openhpc_extra_repos for both generic and ohpc installs
* README tweak
* add openhpc_config_files parameter
* change library_dir to lib_dir
* fix perms
* fix/silence linter warnings
* remove packages only required for hpctests
* document openhpc_config_files restart behaviour
* bugfix missing newline in slurm.conf
* make path for slurm.conf configurable
* make slurm.conf template src configurable
* symlink slurm user tools so monitoring works
* fix slurm directories
* fix slurmdbd path for non-default slurm.conf paths
* default gres.conf to correct directory
* document <absent> for openhpc_config
* minor merge diff fixes
* Fix EPEL not getting installed
* build RL9.3 container images with systemd
* allow use on image containing slurm binaries
* prepend slurm binaries to PATH instead of symlinking
* ensure cgroup.conf is always next to slurm.conf and allow overriding template
* Add group.node_params to partitions/groups. (#182) (#185)
* Add group.node_params to partitions/groups. (#182)
Allows Features, etc., to be added to partitions.
* update SelectType from legacy to current default (#167)
---------
Co-authored-by: Kurt Bendl <[email protected]>
* update readme
* fixup mode parameters
* tidy slurmd restart line
---------
Co-authored-by: Kurt Bendl <[email protected]>
README.md: 33 additions & 15 deletions
@@ -2,35 +2,35 @@
 
 # stackhpc.openhpc
 
-This Ansible role installs packages and performs configuration to provide an OpenHPC v2.x Slurm cluster.
+This Ansible role installs packages and performs configuration to provide a Slurm cluster. By default this uses packages from [OpenHPC](https://openhpc.community/) but it can also use user-provided Slurm binaries.
 
 As a role it must be used from a playbook, for which a simple example is given below. This approach means it is totally modular with no assumptions about available networks or any cluster features except for some hostname conventions. Any desired cluster filesystem or other required functionality may be freely integrated using additional Ansible roles or other approaches.
 
 The minimal image for nodes is a RockyLinux 8 GenericCloud image.
 
+## Task files
+This role provides four task files which can be selected by using the `tasks_from` parameter of Ansible's `import_role` or `include_role` modules:
+- `main.yml`: Runs `install-ohpc.yml` and `runtime.yml`. Default if no `tasks_from` parameter is used.
+- `install-ohpc.yml`: Installs repos and packages for OpenHPC.
+- `install-generic.yml`: Installs systemd units etc. for user-provided binaries.
+- `runtime.yml`: Slurm/service configuration.
+
 ## Role Variables
 
+Variables only relevant for `install-ohpc.yml` or `install-generic.yml` task files are marked as such below.
+
 `openhpc_extra_repos`: Optional list. Extra Yum repository definitions to configure, following the format of the Ansible
-[yum_repository](https://docs.ansible.com/ansible/2.9/modules/yum_repository_module.html) module. Respected keys for
-each list element:
-* `name`: Required
-* `description`: Optional
-* `file`: Required
-* `baseurl`: Optional
-* `metalink`: Optional
-* `mirrorlist`: Optional
-* `gpgcheck`: Optional
-* `gpgkey`: Optional
-
-`openhpc_slurm_service_enabled`: boolean, whether to enable the appropriate slurm service (slurmd/slurmctld).
+`openhpc_slurm_service_enabled`: Optional boolean, whether to enable the appropriate slurm service (slurmd/slurmctld). Default `true`.
 
 `openhpc_slurm_service_started`: Optional boolean. Whether to start slurm services. If set to false, all services will be stopped. Defaults to `openhpc_slurm_service_enabled`.
 
 `openhpc_slurm_control_host`: Required string. Ansible inventory hostname (and short hostname) of the controller e.g. `"{{ groups['cluster_control'] | first }}"`.
 
 `openhpc_slurm_control_host_address`: Optional string. IP address or name to use for the `openhpc_slurm_control_host`, e.g. to use a different interface than is resolved from `openhpc_slurm_control_host`.
 
-`openhpc_packages`: additional OpenHPC packages to install.
+`openhpc_packages`: Optional list. Additional OpenHPC packages to install (`install-ohpc.yml` only).
 
 `openhpc_enable`:
 * `control`: whether to enable control host
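The task-file split documented above can be driven from a playbook via `tasks_from`. A minimal sketch, assuming a host group named `openhpc` (the group name and the choice of the generic install are illustrative, not from the role's docs):

```yaml
# Sketch: run the generic install and then runtime configuration.
# The group name "openhpc" is an assumption for this example.
- hosts: openhpc
  become: true
  tasks:
    - name: Install Slurm from user-provided binaries
      ansible.builtin.include_role:
        name: stackhpc.openhpc
        tasks_from: install-generic.yml

    - name: Configure Slurm and services
      ansible.builtin.include_role:
        name: stackhpc.openhpc
        tasks_from: runtime.yml
```

Omitting `tasks_from` (or using `tasks_from: main.yml`) gives the default OpenHPC-package behaviour described above.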
@@ -44,7 +44,15 @@ each list element:
 
 `openhpc_login_only_nodes`: Optional. If using "configless" mode specify the name of an ansible group containing nodes which are login-only nodes (i.e. not also control nodes), if required. These nodes will run `slurmd` to contact the control node for config.
 
-`openhpc_module_system_install`: Optional, default true. Whether or not to install an environment module system. If true, lmod will be installed. If false, you can either supply your own module system or go without one.
+`openhpc_module_system_install`: Optional, default true. Whether or not to install an environment module system. If true, lmod will be installed. If false, you can either supply your own module system or go without one (`install-ohpc.yml` only).
+
+`openhpc_generic_packages`: Optional. List of system packages to install, see `defaults/main.yml` for details (`install-generic.yml` only).
+
+`openhpc_sbin_dir`: Optional. Path to slurm daemon binaries such as `slurmctld`, default `/usr/sbin` (`install-generic.yml` only).
+
+`openhpc_bin_dir`: Optional. Path to Slurm user binaries such as `sinfo`, default `/usr/bin` (`install-generic.yml` only).
+
+`openhpc_lib_dir`: Optional. Path to Slurm libraries, default `/usr/lib64/slurm` (`install-generic.yml` only).
 
 ### slurm.conf
 
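For a generic install where Slurm was built into a non-default prefix, the three path variables above would typically be set together. A sketch, assuming an `/opt/slurm` prefix (the prefix and subdirectory layout are illustrative assumptions):

```yaml
# group_vars sketch for install-generic.yml; the /opt/slurm prefix is assumed.
openhpc_sbin_dir: /opt/slurm/sbin        # daemon binaries: slurmctld, slurmd
openhpc_bin_dir: /opt/slurm/bin          # user binaries: sinfo, squeue, ...
openhpc_lib_dir: /opt/slurm/lib64/slurm  # Slurm plugin libraries
```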
@@ -122,6 +130,16 @@ that this is *not the same* as the Ansible `omit` [special variable](https://doc
 
 `openhpc_state_save_location`: Optional. Absolute path for Slurm controller state (`slurm.conf` parameter [StateSaveLocation](https://slurm.schedmd.com/slurm.conf.html#OPT_StateSaveLocation))
 
+`openhpc_slurmd_spool_dir`: Optional. Absolute path for slurmd state (`slurm.conf` parameter [SlurmdSpoolDir](https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir))
+
+`openhpc_slurm_conf_template`: Optional. Path of Jinja template for the `slurm.conf` configuration file. Default is the `slurm.conf.j2` template in the role. **NB:** The required templating is complex; if just setting specific parameters use `openhpc_config` instead.
+
+`openhpc_slurm_conf_path`: Optional. Path to template the `slurm.conf` configuration file to. Default `/etc/slurm/slurm.conf`.
+
+`openhpc_gres_template`: Optional. Path of Jinja template for the `gres.conf` configuration file. Default is the `gres.conf.j2` template in the role.
+
+`openhpc_cgroup_template`: Optional. Path of Jinja template for the `cgroup.conf` configuration file. Default is the `cgroup.conf.j2` template in the role.
+
 #### Accounting
 
 By default, no accounting storage is configured. OpenHPC v1.x and un-updated OpenHPC v2.0 clusters support file-based accounting storage which can be selected by setting the role variable `openhpc_slurm_accounting_storage_type` to `accounting_storage/filetxt`<sup id="accounting_storage">[1](#slurm_ver_footnote)</sup>. Accounting for OpenHPC v2.1 and updated OpenHPC v2.0 clusters requires the Slurm database daemon, `slurmdbd` (although job completion may be a limited alternative, see [below](#Job-accounting)). To enable accounting:
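The changelog entries above ("allow removal of slurm.conf options", "document `<absent>` for openhpc_config") suggest `openhpc_config` can both override and delete templated parameters. A hedged sketch of what that might look like; the specific parameter names chosen here are illustrative:

```yaml
# Sketch only: per the changelog, a value of "<absent>" removes an option
# from the templated slurm.conf rather than setting it.
openhpc_config:
  SlurmctldDebug: debug        # override a templated default (illustrative)
  ReturnToService: "<absent>"  # drop this option from slurm.conf entirely
```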
   mode: "0644" # perms/ownership based off src from ohpc package
   owner: root
   group: root
@@ -96,15 +103,6 @@
   register: ohpc_cgroup_conf
   # NB uses restart rather than reload as this is needed in some cases
 
-- name: Remove local tempfile for slurm.conf templating
-  ansible.builtin.file:
-    path: "{{ _slurm_conf_tmpfile.path }}"
-    state: absent
-  when: _slurm_conf_tmpfile.path is defined
-  delegate_to: localhost
-  changed_when: false # so molecule doesn't fail
-  become: no
-
 - name: Ensure Munge service is running
   service:
     name: munge
@@ -129,7 +127,9 @@
   changed_when: true
   when:
     - openhpc_slurm_control_host in ansible_play_hosts
-    - hostvars[openhpc_slurm_control_host].ohpc_slurm_conf.changed or hostvars[openhpc_slurm_control_host].ohpc_cgroup_conf.changed or hostvars[openhpc_slurm_control_host].ohpc_gres_conf.changed # noqa no-handler
+    - hostvars[openhpc_slurm_control_host].ohpc_slurm_conf.changed or
+      hostvars[openhpc_slurm_control_host].ohpc_cgroup_conf.changed or