- Environment-specific variables (`cluster_name`) should be hardcoded as
  arguments into the cluster module block at `environments/$ENV/tofu/main.tf`.
- Environment-independent variables (e.g. maybe `cluster_net` if the same is
  used for staging and production) should be set as *defaults* in
  `environments/site/tofu/variables.tf`, and then don't need to be passed in
  to the module.

OpenTofu configurations should be defined in the `site` environment and used
as a module from the other environments. This can be done with the
cookie-cutter generated configurations:
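For example, a per-environment `main.tf` might call the site configuration as a module. This is a sketch only: the `source` path and all values are illustrative, not generated output.

```hcl
# environments/production/tofu/main.tf -- illustrative sketch only
module "cluster" {
  # Source the shared site configuration as a module; this relative path
  # is an assumption about the generated layout.
  source = "../../site/tofu/"

  # Environment-specific variables are hardcoded as module arguments:
  cluster_name = "production"

  # Environment-independent variables (e.g. cluster_net) are omitted here,
  # so the defaults from environments/site/tofu/variables.tf apply.
}
```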
either for a specific environment within the cluster module block in
`environments/$ENV/tofu/main.tf`, or as the site default by changing the
default in `environments/site/tofu/variables.tf`.

For a development environment, allowing OpenTofu to manage the volumes using
the default value of `"manage"` for those variables is usually appropriate, as
it allows for multiple clusters to be created with this environment.

If no home volume at all is required because the home directories are
provided by a parallel filesystem (e.g. manila) set

    home_volume_provisioning = "none"
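To show where such a setting lives, a sketch following the site-module layout (the `source` path and cluster name are illustrative):

```hcl
# environments/production/tofu/main.tf -- illustrative sketch only
module "cluster" {
  source       = "../../site/tofu/"
  cluster_name = "production"

  # Home directories come from a parallel filesystem, so no home volume:
  home_volume_provisioning = "none"
}
```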
- Consider whether Prometheus storage configuration is required. By default:
  - A 200GB state volume is provisioned (but see above)
  - The common environment
    [sets](../environments/common/inventory/group_vars/all/prometheus.yml) a
    maximum retention of 100 GB and 31 days.

These may or may not be appropriate depending on the number of nodes, the
scrape interval, and other uses of the state volume (primarily the `slurmctld`
state and the `slurmdbd` database). See
[docs/monitoring-and-logging](./monitoring-and-logging.md) for more options.
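If different limits are wanted, they could be overridden in the environment's group vars. A sketch only: the variable names below are assumptions and should be checked against the common environment's `prometheus.yml` before use.

```yaml
# environments/$ENV/inventory/group_vars/all/prometheus.yml
# Variable names are assumed, not confirmed -- verify against the common
# environment's prometheus.yml.
prometheus_storage_retention: "62d"        # keep two months of metrics
prometheus_storage_retention_size: "150GB" # stay within the state volume
```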

- Configure Open OnDemand - see [specific documentation](openondemand.md) which
  notes specific variables required.

- Remove the `demo_user` user from
  `environments/$ENV/inventory/group_vars/all/basic_users.yml`. Replace the
  `hpctests_user` in `environments/$ENV/inventory/group_vars/all/hpctests.yml`
  with an appropriately configured user.
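As a sketch of what those files might contain (the list variable and entry fields are assumptions to be checked against the `basic_users` role defaults; the user and key are placeholders):

```yaml
# environments/$ENV/inventory/group_vars/all/basic_users.yml
# (variable and field names are assumptions -- check the basic_users role)
basic_users_users:
  - name: alice                       # hypothetical site user
    uid: 2001
    public_key: "ssh-ed25519 AAA..."  # illustrative placeholder

# environments/$ENV/inventory/group_vars/all/hpctests.yml
# (hpctests_user is the variable named in the text above)
hpctests_user: alice
```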

- Consider whether having (read-only) access to Grafana without login is OK. If
  not, remove `grafana_auth_anonymous` in
- A production deployment may have more complex networking requirements than
  just a simple network. See the [networks docs](networks.md) for details.

- If floating IPs are required for login nodes, create these in OpenStack and
  add the IPs into the OpenTofu `login` definition.

- Consider enabling topology aware scheduling. This is currently only supported
  if your cluster does not include any baremetal nodes. This can be enabled by:
  1. Creating Availability Zones in your OpenStack project for each physical
     rack
  2. Setting the `availability_zone` fields of compute groups in your
     OpenTofu configuration
  3. Adding the `compute` group as a child of `topology` in
     `environments/$ENV/inventory/groups`
  4. (Optional) If you are aware of the physical topology of switches above
     the rack-level, override `topology_above_rack_topology` in your groups
     vars (see [topology docs](../ansible/roles/topology/README.md) for more
     detail)
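Step 3 above could be sketched in the ini-format inventory file as:

```ini
# environments/$ENV/inventory/groups -- add compute as a child of topology
[topology:children]
compute
```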
- Consider whether mapping of baremetal nodes to ironic nodes is required. See