Commit 5aed4a0

Formatting changes
1 parent cda5d97 commit 5aed4a0

1 file changed

+33
-32
lines changed

docs/production.md

Lines changed: 33 additions & 32 deletions
@@ -108,16 +108,17 @@ requires instance deletion/recreation.
 
 ### Environments structure
 
-At least two environments should be created using cookiecutter, which will derive from the `site` base environment:
+At least two environments should be created using cookiecutter, which will
+derive from the `site` base environment:
 - `production`: production environment
 - `staging`: staging environment
 
 A `dev` environment should also be created if considered required, or this can
 be left until later.
 
-In general only the `inventory/groups` file in the `site` environment is needed -
-it can be modified as required to
-enable features for all environments at the site.
+In general only the `inventory/groups` file in the `site` environment is needed
+- it can be modified as required to enable features for all environments at the
+site.
 
 To avoid divergence of configuration all possible overrides for group/role
 vars should be placed in `environments/site/inventory/group_vars/all/*.yml`
@@ -133,14 +134,6 @@ and referenced from the `site` and `production` environments, e.g.:
 import_playbook: "{{ lookup('env', 'APPLIANCES_ENVIRONMENT_ROOT') }}/../site/hooks/pre.yml"
 ```
 
-When setting OpenTofu configurations:
-- Environment-specific variables (`cluster_name`) should be hardcoded
-as arguments into the cluster module block at `environments/$ENV/tofu/main.tf`.
-- Environment-independent variables (e.g. maybe `cluster_net` if the
-same is used for staging and production) should be set as *defaults*
-in `environments/site/tofu/variables.tf`, and then don't need to
-be passed in to the module.
-
 OpenTofu configurations should be defined in the `site` environment and used
 as a module from the other environments. This can be done with the
 cookie-cutter generated configurations:
@@ -278,13 +271,13 @@ Once it completes you can log in to the cluster using:
 either for a specific environment within the cluster module block in
 `environments/$ENV/tofu/main.tf`, or as the site default by changing the
 default in `environments/site/tofu/variables.tf`.
-
+
 For a development environment allowing OpenTofu to manage the volumes using
 the default value of `"manage"` for those varibles is usually appropriate, as
 it allows for multiple clusters to be created with this environment.
-
-If no home volume at all is required because the home directories are provided
-by a parallel filesystem (e.g. manila) set
+
+If no home volume at all is required because the home directories are
+provided by a parallel filesystem (e.g. manila) set
 
 home_volume_provisioning = "none"
 
@@ -302,21 +295,23 @@ Once it completes you can log in to the cluster using:
 
 - Consider whether Prometheus storage configuration is required. By default:
 - A 200GB state volume is provisioned (but see above)
-- The common environment [sets](../environments/common/inventory/group_vars/all/prometheus.yml)
-a maximum retention of 100 GB and 31 days
+- The common environment
+[sets](../environments/common/inventory/group_vars/all/prometheus.yml) a
+maximum retention of 100 GB and 31 days.
 These may or may not be appropriate depending on the number of nodes, the
 scrape interval, and other uses of the state volume (primarily the `slurmctld`
-state and the `slurmdbd` database). See [docs/monitoring-and-logging](./monitoring-and-logging.md)
-for more options.
+state and the `slurmdbd` database). See
+[docs/monitoring-and-logging](./monitoring-and-logging.md) for more options.
 
 - Configure Open OnDemand - see [specific documentation](openondemand.md) which
 notes specific variables required.
 
 - Configure Open OnDemand - see [specific documentation](openondemand.md).
 
-- Remove the `demo_user` user from `environments/$ENV/inventory/group_vars/all/basic_users.yml`.
-Replace the `hpctests_user` in `environments/$ENV/inventory/group_vars/all/hpctests.yml` with
-an appropriately configured user.
+- Remove the `demo_user` user from
+`environments/$ENV/inventory/group_vars/all/basic_users.yml`. Replace the
+`hpctests_user` in `environments/$ENV/inventory/group_vars/all/hpctests.yml`
+with an appropriately configured user.
 
 - Consider whether having (read-only) access to Grafana without login is OK. If
 not, remove `grafana_auth_anonymous` in
@@ -325,15 +320,21 @@ Once it completes you can log in to the cluster using:
 - A production deployment may have a more complex networking requirements than
 just a simple network. See the [networks docs](networks.md) for details.
 
-- If floating IPs are required for login nodes, create these in OpenStack and add the IPs into
-the OpenTofu `login` definition.
-
-- Consider enabling topology aware scheduling. This is currently only supported if your cluster does not include any baremetal nodes. This can be enabled by:
-1. Creating Availability Zones in your OpenStack project for each physical rack
-2. Setting the `availability_zone` fields of compute groups in your OpenTofu configuration
-3. Adding the `compute` group as a child of `topology` in `environments/$ENV/inventory/groups`
-4. (Optional) If you are aware of the physical topology of switches above the rack-level, override `topology_above_rack_topology` in your groups vars
-(see [topology docs](../ansible/roles/topology/README.md) for more detail)
+- If floating IPs are required for login nodes, create these in OpenStack and
+add the IPs into the OpenTofu `login` definition.
+
+- Consider enabling topology aware scheduling. This is currently only supported
+if your cluster does not include any baremetal nodes. This can be enabled by:
+1. Creating Availability Zones in your OpenStack project for each physical
+rack
+2. Setting the `availability_zone` fields of compute groups in your
+OpenTofu configuration
+3. Adding the `compute` group as a child of `topology` in
+`environments/$ENV/inventory/groups`
+4. (Optional) If you are aware of the physical topology of switches above
+the rack-level, override `topology_above_rack_topology` in your groups
+vars (see [topology docs](../ansible/roles/topology/README.md) for more
+detail)
 
 - Consider whether mapping of baremetal nodes to ironic nodes is required. See
 [PR 485](https://github.com/stackhpc/ansible-slurm-appliance/pull/485).
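
For readers of this diff, step 3 of the topology list in the last hunk (adding the `compute` group as a child of `topology`) might look like the fragment below in `environments/$ENV/inventory/groups`. This is only a sketch and is not part of this commit. It assumes the groups file uses Ansible's INI inventory syntax and that no `[topology:children]` section is already defined in that environment.

```ini
# Hypothetical fragment of environments/$ENV/inventory/groups.
# Makes every host in the `compute` group a member of `topology`.
[topology:children]
compute
```

If such a section already exists in the environment, `compute` would be added as an extra line beneath it rather than declaring the section a second time.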
