Commit ac6b256

Formatting changes
1 parent cda5d97 commit ac6b256

1 file changed

+35
-25
lines changed

docs/production.md

Lines changed: 35 additions & 25 deletions
@@ -108,16 +108,17 @@ requires instance deletion/recreation.
108108

109109
### Environments structure
110110

111-
At least two environments should be created using cookiecutter, which will derive from the `site` base environment:
111+
At least two environments should be created using cookiecutter, which will
112+
derive from the `site` base environment:
112113
- `production`: production environment
113114
- `staging`: staging environment
114115

115116
A `dev` environment should also be created if considered required, or this can
116117
be left until later.
117118

118-
In general only the `inventory/groups` file in the `site` environment is needed -
119-
it can be modified as required to
120-
enable features for all environments at the site.
119+
In general only the `inventory/groups` file in the `site` environment is needed
120+
- it can be modified as required to enable features for all environments at the
121+
site.
121122

122123
To avoid divergence of configuration all possible overrides for group/role
123124
vars should be placed in `environments/site/inventory/group_vars/all/*.yml`
@@ -135,7 +136,8 @@ and referenced from the `site` and `production` environments, e.g.:
135136
136137
When setting OpenTofu configurations:
137138
- Environment-specific variables (`cluster_name`) should be hardcoded
138-
as arguments into the cluster module block at `environments/$ENV/tofu/main.tf`.
139+
as arguments into the cluster module block at
140+
`environments/$ENV/tofu/main.tf`.
139141
- Environment-independent variables (e.g. maybe `cluster_net` if the
140142
same is used for staging and production) should be set as *defaults*
141143
in `environments/site/tofu/variables.tf`, and then don't need to
@@ -278,13 +280,13 @@ Once it completes you can log in to the cluster using:
278280
either for a specific environment within the cluster module block in
279281
`environments/$ENV/tofu/main.tf`, or as the site default by changing the
280282
default in `environments/site/tofu/variables.tf`.
281-
283+
282284
For a development environment allowing OpenTofu to manage the volumes using
283285
the default value of `"manage"` for those variables is usually appropriate, as
284286
it allows for multiple clusters to be created with this environment.
285-
286-
If no home volume at all is required because the home directories are provided
287-
by a parallel filesystem (e.g. manila) set
287+
288+
If no home volume at all is required because the home directories are
289+
provided by a parallel filesystem (e.g. manila) set
288290

289291
home_volume_provisioning = "none"
290292

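The two locations named above for setting this variable could look roughly as follows. This is a minimal sketch, not the appliance's actual files: the module `source` path, the `cluster_name` value, and the variable's `type` are assumptions for illustration. Only the variable names and file locations come from the text.

```hcl
# environments/$ENV/tofu/main.tf
# Per-environment setting: pass the value as an argument to the cluster module.
module "cluster" {
  source = "../../site/tofu/" # assumed module path

  cluster_name             = "mycluster" # hypothetical, environment-specific
  home_volume_provisioning = "none"      # home directories come from e.g. manila
}
```

```hcl
# environments/site/tofu/variables.tf
# Site-wide default: applies to every environment that does not override it.
variable "home_volume_provisioning" {
  type    = string # assumed type
  default = "none"
}
```

Only one of the two is needed. Setting the site default keeps staging and production from diverging, while the module argument suits a value that genuinely differs per environment.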
@@ -302,21 +304,23 @@ Once it completes you can log in to the cluster using:
302304

303305
- Consider whether Prometheus storage configuration is required. By default:
304306
- A 200GB state volume is provisioned (but see above)
305-
- The common environment [sets](../environments/common/inventory/group_vars/all/prometheus.yml)
306-
a maximum retention of 100 GB and 31 days
307+
- The common environment
308+
[sets](../environments/common/inventory/group_vars/all/prometheus.yml) a
309+
maximum retention of 100 GB and 31 days.
307310
These may or may not be appropriate depending on the number of nodes, the
308311
scrape interval, and other uses of the state volume (primarily the `slurmctld`
309-
state and the `slurmdbd` database). See [docs/monitoring-and-logging](./monitoring-and-logging.md)
310-
for more options.
312+
state and the `slurmdbd` database). See
313+
[docs/monitoring-and-logging](./monitoring-and-logging.md) for more options.
311314

312315
- Configure Open OnDemand - see [specific documentation](openondemand.md) which
313316
notes specific variables required.
314317

315318
- Configure Open OnDemand - see [specific documentation](openondemand.md).
316319

317-
- Remove the `demo_user` user from `environments/$ENV/inventory/group_vars/all/basic_users.yml`.
318-
Replace the `hpctests_user` in `environments/$ENV/inventory/group_vars/all/hpctests.yml` with
319-
an appropriately configured user.
320+
- Remove the `demo_user` user from
321+
`environments/$ENV/inventory/group_vars/all/basic_users.yml`. Replace the
322+
`hpctests_user` in `environments/$ENV/inventory/group_vars/all/hpctests.yml`
323+
with an appropriately configured user.
320324

321325
- Consider whether having (read-only) access to Grafana without login is OK. If
322326
not, remove `grafana_auth_anonymous` in
@@ -325,15 +329,21 @@ Once it completes you can log in to the cluster using:
325329
- A production deployment may have more complex networking requirements than
326330
just a simple network. See the [networks docs](networks.md) for details.
327331

328-
- If floating IPs are required for login nodes, create these in OpenStack and add the IPs into
329-
the OpenTofu `login` definition.
330-
331-
- Consider enabling topology aware scheduling. This is currently only supported if your cluster does not include any baremetal nodes. This can be enabled by:
332-
1. Creating Availability Zones in your OpenStack project for each physical rack
333-
2. Setting the `availability_zone` fields of compute groups in your OpenTofu configuration
334-
3. Adding the `compute` group as a child of `topology` in `environments/$ENV/inventory/groups`
335-
4. (Optional) If you are aware of the physical topology of switches above the rack-level, override `topology_above_rack_topology` in your groups vars
336-
(see [topology docs](../ansible/roles/topology/README.md) for more detail)
332+
- If floating IPs are required for login nodes, create these in OpenStack and
333+
add the IPs into the OpenTofu `login` definition.
334+
335+
- Consider enabling topology aware scheduling. This is currently only supported
336+
if your cluster does not include any baremetal nodes. This can be enabled by:
337+
1. Creating Availability Zones in your OpenStack project for each physical
338+
rack
339+
2. Setting the `availability_zone` fields of compute groups in your
340+
OpenTofu configuration
341+
3. Adding the `compute` group as a child of `topology` in
342+
`environments/$ENV/inventory/groups`
343+
4. (Optional) If you are aware of the physical topology of switches above
344+
the rack-level, override `topology_above_rack_topology` in your groups
345+
vars (see [topology docs](../ansible/roles/topology/README.md) for more
346+
detail)
337347

338348
- Consider whether mapping of baremetal nodes to ironic nodes is required. See
339349
[PR 485](https://github.com/stackhpc/ansible-slurm-appliance/pull/485).
