Commit 95005f9

Address various issues with production docs (#747)
* doc prom storage
* doc ondemand certs
* describe site groups and groups upgrade
* note extra image build needs config updating for new image
1 parent 64ddd42 commit 95005f9

3 files changed, +29 -3 lines changed

docs/openondemand.md

Lines changed: 7 additions & 1 deletion
```diff
@@ -31,7 +31,13 @@ The above functionality is configured by running the `ansible/portal.yml` playbo
 See the [ansible/roles/openondemand/README.md](../ansible/roles/openondemand/README.md) for more details on the variables described below.
 
 The following variables have been given default values to allow Open OnDemand to work in a newly created environment without additional configuration, but generally should be overridden in `environments/site/inventory/group_vars/all/` with site-specific values:
-- `openondemand_servername` - this must be defined for both `openondemand` and `grafana` hosts (when Grafana is enabled). Default is `ansible_host` (i.e. the IP address) of the first host in the `openondemand` group.
+- `openondemand_servername` - this must be defined for both `openondemand` and
+`grafana` hosts (when Grafana is enabled). The default is `ansible_host` (i.e.
+the IP address) of the first host in the `openondemand` group. For production
+environments this should probably be a DNS name.
+- `openondemand_ssl_cert` and `openondemand_ssl_cert_key` - by default a
+self-signed certificate is generated, which should probably be replaced for
+production environments.
 - `openondemand_auth` and any corresponding options. Defaults to `basic_pam`.
 - `openondemand_desktop_partition` and `openondemand_jupyter_partition` if the corresponding inventory groups are defined. Defaults to the first compute group defined in the `compute` OpenTofu variable in `environments/$ENV/tofu`.

```
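The two Open OnDemand entries above name values that usually need overriding before production use. A minimal pre-deployment check could flag them. The sketch below is hypothetical: `is_ip_literal` and `openondemand_production_warnings` are invented here and are not part of the appliance, and it assumes the site overrides from `environments/site/inventory/group_vars/all/` have already been loaded into a plain dictionary (e.g. with a YAML parser).

```python
import ipaddress


def is_ip_literal(value: str) -> bool:
    """Return True if value is a literal IPv4/IPv6 address rather than a DNS name."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def openondemand_production_warnings(site_vars: dict) -> list[str]:
    """Hypothetical lint over site-level overrides; variable names are from the docs."""
    warnings = []
    servername = site_vars.get("openondemand_servername")
    if servername is None:
        # Not overridden: the default is ansible_host of the first openondemand host.
        warnings.append("openondemand_servername not set; default is an IP address")
    elif is_ip_literal(str(servername)):
        warnings.append(
            f"openondemand_servername {servername!r} is an IP address; "
            "a DNS name is usually preferable for production"
        )
    # If either of these is not overridden, the generated self-signed certificate is used.
    missing = [
        name
        for name in ("openondemand_ssl_cert", "openondemand_ssl_cert_key")
        if name not in site_vars
    ]
    if missing:
        warnings.append("not overridden (self-signed certificate): " + ", ".join(missing))
    return warnings
```

The certificate check only tests whether the variables are set in the site overrides, not whether the referenced certificate is valid or trusted.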

docs/production.md

Lines changed: 15 additions & 1 deletion
````diff
@@ -24,6 +24,10 @@ production-ready deployments.
 inventory = ../common/inventory,../site/inventory,inventory
 ```
 
+In general only the `site` environment will need an `inventory/groups` file -
+this is templated out by cookiecutter and should be modified as required to
+enable features for all environments at the site.
+
 - To avoid divergence of configuration all possible overrides for group/role
 vars should be placed in `environments/site/inventory/group_vars/all/*.yml`
 unless the value really is environment-specific (e.g. DNS names for
@@ -127,7 +131,17 @@ and referenced from the `site` and `production` environments, e.g.:
 set the "attach" options and run `tofu apply` again - this should show there
 are no changes planned.
 
-- Configure Open OnDemand - see [specific documentation](openondemand.md).
+- Consider whether Prometheus storage configuration is required. By default:
+- A 200GB state volume is provisioned (but see above)
+- The common environment [sets](../environments/common/inventory/group_vars/all/prometheus.yml)
+a maximum retention of 100 GB and 31 days
+These may or may not be appropriate depending on the number of nodes, the
+scrape interval, and other uses of the state volume (primarily the `slurmctld`
+state and the `slurmdbd` database). See [docs/monitoring-and-logging](./monitoring-and-logging.md)
+for more options.
+
+- Configure Open OnDemand - see [specific documentation](openondemand.md) which
+notes specific variables required.
 
 - Remove the `demo_user` user from `environments/$ENV/inventory/group_vars/all/basic_users.yml`

````
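Whether the default 100 GB / 31 day retention fits a given cluster can be estimated with the rule of thumb from the Prometheus storage documentation: needed disk space is roughly retention time × ingested samples per second × bytes per sample, at about 1-2 bytes per sample. The sketch below assumes a fixed number of series per node and ignores the WAL and the other users of the state volume (`slurmctld` state, the `slurmdbd` database). The function names and the example cluster figures are hypothetical.

```python
# Rough Prometheus disk-usage estimate using the rule of thumb
#   needed_disk = retention_seconds * samples_per_second * bytes_per_sample
# All inputs are assumptions to be replaced with site-specific figures.

GIB = 1024**3  # Prometheus retention size units are powers of 2 (1GB = 1024**3 bytes)


def estimate_tsdb_bytes(nodes: int, series_per_node: int, scrape_interval_s: float,
                        retention_days: float, bytes_per_sample: float = 2.0) -> float:
    """Estimate on-disk TSDB size for a purely time-based retention."""
    samples_per_second = nodes * series_per_node / scrape_interval_s
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample


def effective_retention_days(nodes: int, series_per_node: int, scrape_interval_s: float,
                             retention_days: float, retention_size_bytes: float,
                             bytes_per_sample: float = 2.0) -> float:
    """Prometheus drops old data at whichever limit (time or size) is hit first."""
    bytes_per_day = estimate_tsdb_bytes(nodes, series_per_node, scrape_interval_s,
                                        1, bytes_per_sample)
    days_until_size_cap = retention_size_bytes / bytes_per_day
    return min(retention_days, days_until_size_cap)


if __name__ == "__main__":
    # Hypothetical cluster: 100 nodes, ~1000 series each, 30 s scrape interval.
    total = estimate_tsdb_bytes(100, 1000, 30, 31)
    print(f"~{total / GIB:.1f} GiB for 31 days")
    days = effective_retention_days(100, 1000, 30, 31, 100 * GIB)
    print(f"effective retention: {days:.1f} days")
```

If the size cap is reached well before the time limit, either the cap (and possibly the state volume) needs raising or the scrape interval or series count needs reducing.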

docs/upgrades.md

Lines changed: 7 additions & 1 deletion
```diff
@@ -50,6 +50,10 @@ All other commands should be run on the Ansible deploy host.
 site-specific configuration. In general changes to existing functionality will aim to be
 backward compatible. Alteration of site-specific configuration will usually only be
 necessary to use new functionality or where functionality has been upstreamed as above.
+Note that the `environments/common/layouts/everything` file contains all possible
+groups which can be used to enable features; diff this against your e.g.
+`environments/site/inventory/groups` file to see new features which you may
+wish to enable in the latter file.
 
 Make changes as necessary.
 
@@ -60,7 +64,9 @@ All other commands should be run on the Ansible deploy host.
 
 Note that some releases may not include new images. In this case use the image from the latest previous release with new images.
 
-1. If required, build an "extra" image with local modifications, see [docs/image-build.md](./image-build.md).
+1. If an "extra" image build with local modifications is required, update the
+Packer build configuration to use the above new image and run a build. See
+[docs/image-build.md](./image-build.md).
 
 1. Modify your site-specific environment to use this image, e.g. via `cluster_image_id` in `environments/$SITE_ENV/tofu/variables.tf`.

```
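The upgrade note suggests diffing `environments/common/layouts/everything` against the site `inventory/groups` file. As a complement to a plain `diff`, the hypothetical script below lists group names that appear as INI section headers (`[name]`, `[name:children]` or `[name:vars]`) in the first file but not in the second. It assumes both files use Ansible's INI inventory format; the script and its function names are not part of the appliance.

```python
import re
import sys
from pathlib import Path

# Matches "[group]", "[group:children]" or "[group:vars]"; commented-out lines do not match.
SECTION = re.compile(r"^\s*\[([^\]:]+)(?::(?:children|vars))?\]\s*$")


def group_names(inventory_text: str) -> set[str]:
    """Collect group names declared as section headers in an INI inventory."""
    names = set()
    for line in inventory_text.splitlines():
        match = SECTION.match(line)
        if match:
            names.add(match.group(1).strip())
    return names


def new_groups(everything_text: str, site_groups_text: str) -> list[str]:
    """Groups defined in the 'everything' layout but absent from the site groups file."""
    return sorted(group_names(everything_text) - group_names(site_groups_text))


if __name__ == "__main__" and len(sys.argv) == 3:
    everything = Path(sys.argv[1]).read_text()
    site_groups = Path(sys.argv[2]).read_text()
    for name in new_groups(everything, site_groups):
        print(name)
```

For example, `python new_groups.py environments/common/layouts/everything environments/site/inventory/groups` (script name hypothetical). It compares section names only, so a group already present in the site file whose `:children` membership differs is not reported; reading the full `diff` is still worthwhile.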
