Add container cleanup service on stop #732

Open
susienme wants to merge 15 commits into google:main from susienme:sigterm-to-containers

@susienme
Collaborator

@susienme susienme commented Apr 1, 2026

Create a systemd service to send SIGTERM to running containerd tasks during shutdown/service stop to allow for graceful termination.

@Sibcgh
Collaborator

Sibcgh commented Apr 1, 2026

/gcbrun

@Sibcgh Sibcgh requested a review from alexmwu April 2, 2026 21:26
@susienme susienme requested review from jkl73 April 2, 2026 22:05
@jkl73
Contributor

jkl73 commented Apr 3, 2026

Could you add an image test under https://github.com/google/go-tpm-tools/blob/main/launcher/image/test?

We may also need a new test workload container that can catch the signal.

Contributor

@alexmwu alexmwu left a comment

Comment thread launcher/image/container-cleanup.service Outdated
Comment thread launcher/image/container-cleanup.service Outdated
Contributor

Any reason we can't do this from the launcher? https://github.com/containerd/containerd/blob/v1.7.30/process.go#L33-L54

Collaborator Author

We decided to not rely on the launcher because "container process and the runner process are mostly detached, we want the container process to continue running even the runner process get terminated for whatever reasons." (from the internal bug)

In other words, we consider the case when the launcher is not running.

susienme added 15 commits April 13, 2026 15:02
Create a systemd service to send SIGTERM to running containerd tasks
during shutdown/service stop to allow for graceful termination.
Also delete the comments added in the previous commit
When all workloads handle SIGTERM before the timeout, the system
can stop earlier.
Add a new integration test to verify that a workload receives SIGTERM for a graceful shutdown.
Map substitutions to environment variables in `script` blocks to prevent them from evaluating to empty strings and causing exit code 2 errors in `gcloud` commands.
Also, increase the timeout to 10 minutes
Running a UDP listener inside a Confidential Space (CS) VM is complex because the hardened environment restricts direct access to host devices (like `/dev/ttyS0` used to dump logs) and prevents standard troubleshooting of container networking.

To simplify the test, this commit switches the monitor to a standard GCE VM. The VM dynamically selects the latest x86_64 Debian image and the smallest available machine type meeting the minimum requirements of 2 vCPUs and 1 GB of memory (equivalent to `e2-micro`).
Dynamic lookup failed because some machine families (like `e4`) are not available within the project due to quota limits.
Pass `tee-cmd` as an array to fix the parsing error.
- Added `allow_cmd_override` label to the workload Dockerfile to permit `tee-cmd` usage.
- Redirected the `container-cleanup.service` output to `/dev/ttyS0` for serial console visibility.
…se power button press

Because the hardened image lacks `systemd-logind`, the new power button listener service takes over watching `/dev/input/eventX`, which `logind` previously handled. When it detects a power button press (for example, on VM stop), it asks systemd to stop services, including the listener itself. The listener's ExecStop script then sends SIGTERM to all containers.
@susienme susienme force-pushed the sigterm-to-containers branch from 9cb862b to df36b2c Compare April 13, 2026 22:32
@susienme
Collaborator Author

re: the comment about the image test

During end-to-end testing, we found that the previous approach (solely using an ExecStop script) did not work in the hardened image because nothing was watching for power button press events. This is typically handled by systemd-logind (including in the debug image), but that service is not running in the hardened image.

To address this issue, we introduced the power button listener service, which identifies the correct event device, listens to it, and sends a power-off event to systemd so that systemd can propagate the shutdown. The listener service is itself stopped as part of that shutdown, at which point its ExecStop script runs and sends SIGTERM to all containers.

4 participants