jcscottiii
Collaborator

This commit introduces a new monitoring alert policy for Cloud Run jobs managed by the single_stage_go_workflow module.

The alert fires with "ERROR" severity if a job execution fails and that job has no successful executions within a 23-hour period. This lets us detect and respond to persistent job failures proactively instead of relying on the existing dashboard.
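
For readers unfamiliar with this kind of policy, here is a rough sketch of what such an alert could look like. It is not the exact resource in this change. It only models the "no successful executions within 23 hours" half of the condition, using an absence condition on the Cloud Run job completion metric; how the failure half is combined is not shown in this description. The resource name, display names, `var.job_name`, and the alignment settings are assumptions made for illustration.

```hcl
# Illustrative sketch only; names and the exact filter are assumptions.
resource "google_monitoring_alert_policy" "job_no_recent_success" {
  display_name = "Cloud Run job has no recent successful execution"
  combiner     = "OR"
  severity     = "ERROR"

  conditions {
    display_name = "No successful execution in 23 hours"

    condition_absent {
      # Match only successful completions of one job.
      # var.job_name is a hypothetical input.
      filter = join(" AND ", [
        "resource.type = \"cloud_run_job\"",
        "resource.labels.job_name = \"${var.job_name}\"",
        "metric.type = \"run.googleapis.com/job/completed_execution_count\"",
        "metric.labels.result = \"succeeded\"",
      ])

      # 23 hours expressed in seconds.
      duration = "82800s"

      aggregations {
        alignment_period   = "3600s"
        per_series_aligner = "ALIGN_SUM"
      }
    }
  }

  # Where alerts are sent; see the variable described below.
  notification_channels = var.notification_channel_ids
}
```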

To support this, a notification_channel_ids variable has been added and plumbed through the Terraform configuration to specify where alerts should be sent.
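
As a minimal sketch of the plumbing, assuming the variable is a list of notification channel IDs: the type, default, description, module label, and module path below are guesses for illustration, and the module's other inputs are omitted.

```hcl
# Declared at each level the value passes through (type is an assumption).
variable "notification_channel_ids" {
  description = "Monitoring notification channels that receive job failure alerts."
  type        = list(string)
  default     = []
}

# Hypothetical caller passing the value down to the module.
module "example_workflow" {
  source = "./modules/single_stage_go_workflow" # hypothetical path

  notification_channel_ids = var.notification_channel_ids
  # ...other module inputs omitted...
}
```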

Additionally, this commit includes the following changes:

  • Adds a tf-lint command to the Makefile to validate Terraform configurations.
  • Removes the unused workflows/web-features-repo/workflows.yaml.tftpl file.
