Conversation

@cabutlermit cabutlermit commented Nov 3, 2025

Purpose and background context

The prior PR was just about getting a functional ZIP file into AWS for initial testing in CloudWatch Synthetic Canaries with lots of manual steps. This PR is all about automating the build process for our Dev environment (and prepping for the Stage and Prod deployment workflows).

There are NO changes to the JavaScript application itself; this is all about deployment.

Examples

If you would like to see this in action, there are two deployed Canaries in Dev1, one that is for testing and one that is part of the APT infrastructure. In Dev1, navigate to CloudWatch and look for the Synthetic Canaries in the left navigation bar (CloudWatch Synthetic Canaries). The js-check-lambda-function-url was built by the mitlib-tf-sandbox-examples repository as a PoC. The apt-dev-function-url-check was built by the mitlib-tf-workloads-apt repository as the first real use of a Canary.

If you navigate to CloudWatch Metrics (All metrics) you will see the CloudWatchSynthetics custom namespace and you can drill into that to see the metrics that are produced by the Canaries.

How can a reviewer manually see the effects of these changes?

  1. Check out the branch and work through the various make targets, both at the root of the repository and in the lambda_function_url_checker folder. All of the make commands should work with no errors.
  2. Following the instructions in the main README, test the generate-matrix.sh script locally to see that it will generate the correct set of folders (just one) to pass to the 2nd job in the workflow.
  3. Check the Actions tab on this repository and review the actions that have run for this branch.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Why these changes are being introduced:
Some testing was done in a separate feature branch to work out the
details of how we will handle automated deployment using GitHub Actions.
It turns out that a simpler repository folder structure will make this
a little easier.

How this addresses that need:
* Remove the `nodejs` subfolder from the lambda_function_url_checker
and flatten the files/folders slightly
* Update the test code to find all the dependencies correctly in the
new folder structure
* Update the package.json to reflect the new file/folder structure
* Create a .runtime file for the developer to designate the expected
runtime for the application in CloudWatch Synthetic Canary
* Create a Makefile in the application folder for application-specific
commands/targets
* Update the main Makefile to simply call subfolder Makefiles instead
* Clean up the CI workflow file slightly (addressing some typos) and
set the trigger for PRs to main (and manual triggers in the UI)
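
For illustration, the `.runtime` file is just a one-line designation that the deploy tooling can read. This is a hypothetical sketch; the runtime string below is only an example value, not necessarily what this repository pins:

```shell
# Hypothetical sketch: a .runtime file holds a single Synthetics runtime
# identifier. The value here is illustrative -- use whichever runtime the
# canary actually targets (see the AWS Synthetics runtime version list).
workdir="$(mktemp -d)"
echo "syn-nodejs-puppeteer-9.1" > "${workdir}/.runtime"
RUNTIME="$(cat "${workdir}/.runtime")"
echo "Designated runtime: ${RUNTIME}"
```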

Side effects of this change:
None.

Related Jira Tickets:
* https://mitlibraries.atlassian.net/browse/IN-1532
Why these changes are being introduced:
This is the first pass at a deployment workflow in GitHub Actions to
push the ZIP file to the S3 bucket and set SSM Parameter Store
parameters so that the Terraform repository using this function can
set the correct values.

How this is being implemented:
* Create a new dev-deploy.yml workflow
* Create a new generate-matrix.sh bash script to generate the matrix of
deployment jobs

More details on the dev-deploy workflow:
* Use the generate-matrix.sh script to generate the list of top level
folders that have changes
* For each folder list in the matrix, run a deployment job to zip the
application, push the ZIP to the shared-files S3 bucket, and update two
SSM Parameter Store parameters with the key to the ZIP and the expected
runtime in CloudWatch Synthetic Canaries
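
As a hedged sketch of that per-folder deployment job (not the actual workflow): the bucket name, ZIP key layout, and SSM parameter paths below are hypothetical placeholders, and the AWS CLI calls are echoed rather than executed so the sketch runs without credentials:

```shell
# Hypothetical sketch of the per-folder deploy step. The bucket, key
# layout, and SSM parameter names are placeholders, not the real ones.
FOLDER="lambda_function_url_checker"
RUNTIME="syn-nodejs-puppeteer-9.1"   # in reality, read from ${FOLDER}/.runtime
ZIP_KEY="canaries/${FOLDER}.zip"

# Echo the commands instead of running them, so this is safe to execute
# outside AWS.
echo "zip -r ${FOLDER}.zip ${FOLDER}/"
echo "aws s3 cp ${FOLDER}.zip s3://<shared-files-bucket>/${ZIP_KEY}"
echo "aws ssm put-parameter --name /canaries/${FOLDER}/zip-key --value ${ZIP_KEY} --overwrite"
echo "aws ssm put-parameter --name /canaries/${FOLDER}/runtime --value ${RUNTIME} --overwrite"
```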

More details on the generate-matrix.sh script:
* This was written with assistance from GitHub Copilot (but is mostly
an update to the matrix method that was first used by DLS in the
archived mitlib-terraform repository)
* It was moved to a shell script to allow for easier local testing to
ensure that it was actually doing what was expected without having to
run it multiple times in GitHub as an Action
* It creates a list of top-level folders that have changes (from a
git diff), accounting for the various ways the Action could be running
(first push on new branch, triggered by a PR, triggered manually in the
GitHub UI)
* It exports the list to the GITHUB_OUTPUT environment so that the
Action can use this information to build the matrix, running a job for
as many folders as have updated code
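
A simplified sketch of that idea (not the actual generate-matrix.sh, which also handles the different trigger contexts and reads from a real git diff):

```shell
# Sketch: reduce a list of changed file paths to the unique set of
# top-level folders and emit them as a JSON array for GITHUB_OUTPUT.
# The sample paths below are hypothetical stand-ins for a git diff.
changed_paths="lambda_function_url_checker/index.js
lambda_function_url_checker/package.json
.github/workflows/dev-deploy.yml"

# Keep only paths inside a folder, take the first path component,
# skip dotfolders like .github, and de-duplicate.
folders=$(printf '%s\n' "$changed_paths" | grep '/' | cut -d/ -f1 | grep -v '^\.' | sort -u)

# Build a JSON array by hand (the real script could also use jq).
matrix=$(printf '%s\n' $folders | sed 's/.*/"&"/' | paste -sd, -)
echo "matrix=[${matrix}]"
# In the workflow, that line would be appended to "$GITHUB_OUTPUT".
```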

Side effects:
None.

Related Jira tickets:
* https://mitlibraries.atlassian.net/browse/IN-1531
Why these changes are being introduced:
To make testing easier in the near future for the stage & prod
deployment workflows, we create stub workflows for now so that they
will exist in the `main` branch and allow for easier testing when we
finalize the steps.

How this addresses that need:
* Create a stage-deploy.yml workflow with just a stub set of steps
* Create a prod-deploy.yml workflow with just a stub set of steps

These two workflows will generate a matrix and also attempt to log in
to AWS with OIDC but will not create any ZIP files.

Side effects of this change:
None.
* Update the main README with more context and details for this repo
* Update the lambda_function_url_checker README
@cabutlermit cabutlermit requested a review from ghukill November 4, 2025 17:16
@cabutlermit cabutlermit marked this pull request as ready for review November 4, 2025 17:16
@cabutlermit

@ghukill I figured you might enjoy a little break from USE to take a look at this repository for building ZIP files for Synthetic Canaries. Might also be worth looking at the first PR in this repo that Jeremy reviewed (for a little historical perspective).

Hopefully the PR description is sufficient, but happy to have a detailed conversation about this here in the PR...

@ghukill ghukill self-assigned this Nov 6, 2025
@ghukill ghukill left a comment


Very near a full approval, but left a couple of README suggestions and a question about making the lambda URL callable locally for testing and development.

But I must say, really like the overall architecture. At least for me, it was critical to understand that each canary may cover multiple projects, and that "all" this repository does is define them and then build zip files via GitHub Actions. Nice work!

Reviewer comment:

I wonder if you could perform chmod +x on the file and git commit that? I had to do that locally to get the script to run and I see it performed here in the actions.

Author reply:

I wondered about that. I couldn't remember if the "execute" flag was passed properly into GHA (it probably is, but I should probably test this more thoroughly).

Author reply:

I did run chmod +x on the script, so I'll be curious to see if that is preserved through git push and git pull for starters. I'm not quite ready to remove that line from the workflow yet...

Reviewer comment:

After you provided an overview and context, I think this README reads really well. It's dense, but the information one needs is there.

If I had one suggestion -- something I think would have helped me make a critical leap of understanding -- it would be to add a "Development" section. In this section, I think you could touch on some things:

  • creating a new "canary" is as simple as creating a root level folder (repeating, but that's okay)
  • each canary may perform checking for one or many applications; it provides a method but is agnostic to what application it's used for
  • when this repository builds, we update any modified canary zip file(s) in S3, thereby providing updated builds for the actual projects' terraform.

You likely say all this in the README already, but I just have this hunch that a "Development" section could be helpful. I'd hope to walk away from that section knowing:

  1. canaries are potentially one-to-many for applications
  2. I'm basically building a little canary lambda blueprint (zip file) that other terraform projects will utilize

Author reply:

Thank you -- this is exactly the kind of feedback I was hoping for: where are the gaps in the framing information so that it will be easier for someone to pick this up reasonably easily.

Author reply:

README updated. Curious to know if the new version has sufficient context.

Comment on lines 5 to 7
## Overview

This is a CloudWatch Synthetic Canary application that is JavaScript/Node.js/Puppeteer-based. The following file structure is expected for an application like this:
Reviewer comment:

I might propose in the "Overview" section here, explaining what this canary does. I realize the first line basically explains this,

A generic-ish CloudWatch Synthetic Canary to health-check Lambda Function URLs using HTTP POST.

.... but I still think it could be helpful. Might be an opportunity to have language like,

"This canary can be used to test any deployed Lambda that has a Function URL. This is achieved by making an HTTP POST request to the lambda's Function URL, with the URL and payload configured by env vars in that lambda's terraform project. Broadly speaking, a 200 response is all that's required to demonstrate the lambda is not idle."

Again, maybe this is touched on elsewhere? But I think it could be helpful in the first couple lines you encounter with the README.

Author reply:

README updated.

Reviewer comment:

Commenting here on the lambda_function_url_checker's main index.js file.

As mentioned in our discussion, I wonder if it might be worth making this thing testable locally? Less about the synthetics + metrics integrations, and more about the Lambda Function URL requesting.

For example, the index.js file could pick up when WORKSPACE=local and then skip all the synthetics and metrics stuff.

I threw this at AI and got back an updated index.js that did just that, with what look like pretty minimal changes. I was able to confirm this for the Dev1 APT lambda:

Env vars in .env file:

NODE_ENV=test
WORKSPACE=local
CANARY_URL=<URL HERE>
CANARY_PAYLOAD={"action":"ping","challenge_secret":"<SECRET HERE>","verbose":"true"}

Invoked:

WORKSPACE=local node lambda_function_url_checker/index.js
# [dotenv] injecting env (3) from .env -- tip: 🔐 encrypt with Dotenvx: https://dotenvx.com
# Checking Lambda Function URL: <URL HERE> with payload {"action":"ping","challenge_secret":"<SECRET>","verbose":"true"}
# Received HTTP status code: 200
# Check completed in 154 ms

This might even lend itself to a Makefile command then, which is entirely driven by the env vars present (just like in terraform / deployed context!).

Please consider this purely optional for the PR itself.

Author reply:

I gave this a shot with some simple conditionals. It feels slightly kludgy, but it was the most obvious path forward to me (you should have seen the over-engineered attempts that the LLMs were proposing...). And, there is now a make local-exec to run it locally.

Why these changes are being introduced:
This should address most of the issues raised in the PR conversation
and code comments.

How this addresses that need:
* Update the root README
* Update the lambda_function_url_checker README
* Update the generate-matrix.sh with `chmod +x` to default it to
executable
* Update index.js to allow for a local run (with some conditional logic
to bypass Synthetics when running locally)
* Update the lambda_function_url_checker Makefile with a local-exec
target that will run the app locally

Side effects of this change:
None.
@cabutlermit cabutlermit requested a review from ghukill November 7, 2025 20:59
@ghukill ghukill left a comment


Looks great! Thanks for the additions.

I was also able to run the make local-exec for the function URL pinger canary.

Nice work!

Comment on lines +21 to +40
* Creating a new canary is as simple as creating a root level folder
* A single canary may perform checking for one or many applications; it provides a method but is agnostic to what application it's used to check
* Each canary in this repository (that is, each top-level folder) will have a few hard requirements:
* A `.runtime` file in the folder that designates the expected Synthetics Canary runtime (see [AWS: Synthetics runtime versions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries_Library.html))
* A `Makefile` with (at a minimum) `install`, `update`, and `test` targets (as these are expected by the root [Makefile](./Makefile))
* A `zipmanifest.txt` file that designates how the ZIP will be built for loading into a Synthetics Canary (different Synthetic Canary runtimes have different requirements for the structure of the ZIP file)

## Deployment Overview

Deploying (or building) in this repository simply means copying a ZIP file of an application to a shared S3 bucket in AWS. That ZIP file is like a blueprint (or template) application that can be loaded into a Synthetic Canary that is built by Terraform to monitor an HTTP/HTTPS endpoint. It will be helpful to have both `zip` and `dev-publish` targets in the canary folder `Makefile` so that it will be easy for the developer to deploy the ZIP to Dev1 for testing in AWS.

The rest of the deployment process is handled by GitHub Actions, following the MITL typical GitHub-flow:

* Opening a PR to `main` will deploy the ZIP to our dev AWS account (and create/update SSM Parameters)
* Merging a PR to `main` will deploy the ZIP to our stage AWS account (and create/update SSM Parameters)
* Tagging a release on `main` will deploy the ZIP to our prod AWS account (and create/update SSM Parameters)

## General Overview

The general idea of this repository is to de-couple the actual CloudWatch Synthetic Canary application code from the Terraform repositories that will deploy the Canary to monitor web services. Since the application code is decoupled from the Terraform repositories, it means that we can write fewer applications that are more generic (they are like blueprints or templates). We can customize the generic applications in each Terraform repository that builds the Canary infrastructure by using environment variables that can be set by Terraform. This requires that we export some critical information from the deployment process for this repository to SSM Parameter Store so that the dependent Terraform repositories can read this information to build the infrastructure for the Canary correctly.
Reviewer comment:

:chef's kiss:

Thanks for adding this! I feel like it's filling the gap I had on my first pass.

Comment on lines +90 to +94
if (process.env.WORKSPACE === "local") {
await _checkUrl(url, data);
} else {
await synthetics.executeStep("check_lambda_function_url", async () => {
await _checkUrl(url, data);
Reviewer comment:

Looks good to me! This is feeling like all we need for a quick developer test locally.

@cabutlermit cabutlermit merged commit 8fe0da8 into main Nov 10, 2025
3 checks passed