Skip to content

Conversation

@adamthomason
Copy link
Contributor

Description of changes:

When using jailer with Daemonize: true, the jailer parent process exits immediately after forking the child firecracker process. This caused a race condition where the SDK's process monitoring goroutine would detect the parent exit and send to errCh before waitForSocket() could establish the socket connection, causing startVMM to fail prematurely.

This change modifies the process monitoring logic to treat a clean exit (status=0) of a daemonized jailer parent as expected behavior rather than an error condition. The goroutine now returns early in this case, allowing waitForSocket() and other initialization logic to complete normally.

The actual firecracker child process continues running in daemon mode and creates the API socket as expected.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

When using jailer with Daemonize: true, the jailer parent process exits
immediately after forking the child firecracker process. This caused a
race condition where the SDK's process monitoring goroutine would detect
the parent exit and send to errCh before waitForSocket() could establish
the socket connection, causing startVMM to fail prematurely.

This change modifies the process monitoring logic to treat a clean exit
(status=0) of a daemonized jailer parent as expected behavior rather than
an error condition. The goroutine now returns early in this case, allowing
waitForSocket() and other initialization logic to complete normally.

The actual firecracker child process continues running in daemon mode and
creates the API socket as expected.

Signed-off-by: Adam Thomason <[email protected]>
@adamthomason adamthomason requested a review from a team as a code owner July 23, 2025 10:17
…rom the Shutdown receiver to trigger startVMM to continue when using daemonized jailer process.

Signed-off-by: Adam Thomason <[email protected]>
@adamthomason
Copy link
Contributor Author

adamthomason commented Jul 24, 2025

The initial commit for this PR caused issues with cleanup, as the startVMM cleanup methods were never called due to immediate return. I've implemented an additional "shutdownCh" channel on the Machine struct which is closed when the Shtudown receiver is called.

Copy link
Contributor

@sondavidb sondavidb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the contribution! The changes look fine to me, ideally we'd add this to out testing suite but I can't really think of a good way to test this race condition. (Maybe someone else might have an idea for this?)

@adamthomason
Copy link
Contributor Author

Thanks for the review @sondavidb. Agreed, it would be nice for it to be added to tests but I too couldn't quite think of a nice way of doing this.

@sondavidb sondavidb merged commit ed6ff32 into firecracker-microvm:main Aug 18, 2025
18 checks passed
@adamthomason adamthomason deleted the fix/startvmm-jailer-race-condition branch August 21, 2025 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants