At least two environments should be created using cookiecutter, which will derive from the `site` base environment:
111
+
At least two environments should be created using cookiecutter, which will
112
+
derive from the `site` base environment:
112
113
- `production`: production environment
113
114
- `staging`: staging environment
114
115
115
116
A `dev` environment should also be created if considered required, or this can
116
117
be left until later.
117
118
118
-
In general only the `inventory/groups` file in the `site` environment is needed -
119
-
it can be modified as required to
120
-
enable features for all environments at the site.
119
+
In general only the `inventory/groups` file in the `site` environment is needed
120
+
- it can be modified as required to enable features for all environments at the
121
+
site.
121
122
122
123
To avoid divergence of configuration all possible overrides for group/role
123
124
vars should be placed in `environments/site/inventory/group_vars/all/*.yml`
@@ -135,7 +136,8 @@ and referenced from the `site` and `production` environments, e.g.:
135
136
136
137
When setting OpenTofu configurations:
137
138
- Environment-specific variables (`cluster_name`) should be hardcoded
138
-
as arguments into the cluster module block at `environments/$ENV/tofu/main.tf`.
139
+
as arguments into the cluster module block at
140
+
`environments/$ENV/tofu/main.tf`.
139
141
- Environment-independent variables (e.g. maybe `cluster_net` if the
140
142
same is used for staging and production) should be set as *defaults*
141
143
in `environments/site/tofu/variables.tf`, and then don't need to
@@ -278,13 +280,13 @@ Once it completes you can log in to the cluster using:
278
280
either for a specific environment within the cluster module block in
279
281
`environments/$ENV/tofu/main.tf`, or as the site default by changing the
280
282
default in `environments/site/tofu/variables.tf`.
281
-
283
+
282
284
For a development environment allowing OpenTofu to manage the volumes using
283
285
the default value of `"manage"` for those variables is usually appropriate, as
284
286
it allows for multiple clusters to be created with this environment.
285
-
286
-
If no home volume at all is required because the home directories are provided
287
-
by a parallel filesystem (e.g. manila) set
287
+
288
+
If no home volume at all is required because the home directories are
289
+
provided by a parallel filesystem (e.g. manila) set
288
290
289
291
home_volume_provisioning = "none"
290
292
@@ -302,21 +304,23 @@ Once it completes you can log in to the cluster using:
302
304
303
305
- Consider whether Prometheus storage configuration is required. By default:
304
306
- A 200GB state volume is provisioned (but see above)
305
-
- The common environment [sets](../environments/common/inventory/group_vars/all/prometheus.yml)
306
-
a maximum retention of 100 GB and 31 days
307
+
- The common environment
308
+
[sets](../environments/common/inventory/group_vars/all/prometheus.yml) a
309
+
maximum retention of 100 GB and 31 days.
307
310
These may or may not be appropriate depending on the number of nodes, the
308
311
scrape interval, and other uses of the state volume (primarily the `slurmctld`
309
-
state and the `slurmdbd` database). See [docs/monitoring-and-logging](./monitoring-and-logging.md)
310
-
for more options.
312
+
state and the `slurmdbd` database). See
313
+
[docs/monitoring-and-logging](./monitoring-and-logging.md) for more options.
311
314
312
315
- Configure Open OnDemand - see [specific documentation](openondemand.md) which
313
316
notes specific variables required.
314
317
315
318
- Configure Open OnDemand - see [specific documentation](openondemand.md).
316
319
317
-
- Remove the `demo_user` user from `environments/$ENV/inventory/group_vars/all/basic_users.yml`.
318
-
Replace the `hpctests_user` in `environments/$ENV/inventory/group_vars/all/hpctests.yml` with
319
-
an appropriately configured user.
320
+
- Remove the `demo_user` user from
321
+
`environments/$ENV/inventory/group_vars/all/basic_users.yml`. Replace the
322
+
`hpctests_user` in `environments/$ENV/inventory/group_vars/all/hpctests.yml`
323
+
with an appropriately configured user.
320
324
321
325
- Consider whether having (read-only) access to Grafana without login is OK. If
322
326
not, remove `grafana_auth_anonymous` in
@@ -325,15 +329,21 @@ Once it completes you can log in to the cluster using:
325
329
- A production deployment may have more complex networking requirements than
326
330
just a simple network. See the [networks docs](networks.md) for details.
327
331
328
-
- If floating IPs are required for login nodes, create these in OpenStack and add the IPs into
329
-
the OpenTofu `login` definition.
330
-
331
-
- Consider enabling topology aware scheduling. This is currently only supported if your cluster does not include any baremetal nodes. This can be enabled by:
332
-
1. Creating Availability Zones in your OpenStack project for each physical rack
333
-
2. Setting the `availability_zone` fields of compute groups in your OpenTofu configuration
334
-
3. Adding the `compute` group as a child of `topology` in `environments/$ENV/inventory/groups`
335
-
4. (Optional) If you are aware of the physical topology of switches above the rack-level, override `topology_above_rack_topology` in your groups vars
336
-
(see [topology docs](../ansible/roles/topology/README.md) for more detail)
332
+
- If floating IPs are required for login nodes, create these in OpenStack and
333
+
add the IPs into the OpenTofu `login` definition.
334
+
335
+
- Consider enabling topology aware scheduling. This is currently only supported
336
+
if your cluster does not include any baremetal nodes. This can be enabled by:
337
+
1. Creating Availability Zones in your OpenStack project for each physical
338
+
rack
339
+
2. Setting the `availability_zone` fields of compute groups in your
340
+
OpenTofu configuration
341
+
3. Adding the `compute` group as a child of `topology` in
342
+
`environments/$ENV/inventory/groups`
343
+
4. (Optional) If you are aware of the physical topology of switches above
344
+
the rack-level, override `topology_above_rack_topology` in your groups
345
+
vars (see [topology docs](../ansible/roles/topology/README.md) for more
346
+
detail)
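
Step 3 of the topology list can be sketched as below. This assumes the `groups` file uses Ansible's INI inventory format, where a `[group:children]` section lists child groups. The rest of the file's contents are not shown here and may differ at your site.

```ini
# environments/$ENV/inventory/groups
# Sketch only: makes the existing `compute` group a child of `topology`.
[topology:children]
compute
```

If the file already has a `[topology:children]` section, add `compute` to it rather than declaring the section a second time.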
337
347
338
348
- Consider whether mapping of baremetal nodes to ironic nodes is required. See