feat(monitoring): Add failure alerts for Cloud Run jobs #1894

jcscottiii · 2025-10-03T21:02:26Z

This commit introduces a new monitoring alert policy for Cloud Run jobs managed by the single_stage_go_workflow module.

The alert fires with "ERROR" severity if a job execution fails and that job has had no successful executions within a 23-hour period. This lets us detect and respond to persistent job failures proactively instead of relying on the existing dashboard.

To support this, a notification_channel_ids variable has been added and plumbed through the Terraform configuration to specify where alerts should be sent.
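For illustration only, here is a minimal sketch of what such a policy could look like. It is not the actual diff in this PR. The resource name, the `job_name` variable, the variable type, the one-hour alignment period, and the two-condition structure are all assumptions. The sketch pairs a "failed execution seen" condition with a "no successful execution for 23 hours" absence condition on the Cloud Run job metric `run.googleapis.com/job/completed_execution_count`.

```hcl
# Hypothetical sketch; names and structure are assumptions, not the PR's code.

variable "notification_channel_ids" {
  description = "Notification channel IDs that receive the alerts."
  type        = list(string) # assumed type
  default     = []
}

variable "job_name" {
  description = "Cloud Run job to watch (hypothetical helper variable)."
  type        = string
}

resource "google_monitoring_alert_policy" "job_failure" {
  display_name = "Cloud Run job failing: ${var.job_name}"
  combiner     = "AND" # both conditions must hold for the alert to fire
  severity     = "ERROR"

  # Condition 1: at least one failed execution in the alignment window.
  conditions {
    display_name = "Failed execution observed"
    condition_threshold {
      filter          = "resource.type = \"cloud_run_job\" AND resource.labels.job_name = \"${var.job_name}\" AND metric.type = \"run.googleapis.com/job/completed_execution_count\" AND metric.labels.result = \"failed\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "0s"
      aggregations {
        alignment_period   = "3600s"
        per_series_aligner = "ALIGN_SUM"
      }
    }
  }

  # Condition 2: no successful execution reported for 23 hours (82800s).
  conditions {
    display_name = "No successful execution in 23h"
    condition_absent {
      filter   = "resource.type = \"cloud_run_job\" AND resource.labels.job_name = \"${var.job_name}\" AND metric.type = \"run.googleapis.com/job/completed_execution_count\" AND metric.labels.result = \"succeeded\""
      duration = "82800s"
      aggregations {
        alignment_period   = "3600s"
        per_series_aligner = "ALIGN_SUM"
      }
    }
  }

  notification_channels = var.notification_channel_ids
}
```

A caller of the module would then pass the channel IDs down, for example `notification_channel_ids = [google_monitoring_notification_channel.oncall.id]`, where `oncall` is again a hypothetical name.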

Additionally, this commit includes the following changes:

- Adds a tf-lint command to the Makefile to validate Terraform configurations.
- Removes the unused workflows/web-features-repo/workflows.yaml.tftpl file.


jcscottiii added 2 commits October 3, 2025 21:01

lint fix

c566198
