cmd, pkg/utils: Stop the container once the last session finishes #1679

debarshiray · 2025-07-06T12:14:07Z

Currently, once a Toolbx container gets started with 'podman start', as
part of the 'enter' or 'run' commands, it doesn't stop unless the host
is shut down or someone explicitly calls 'podman stop'. This becomes
annoying if someone tries to remove the container, because commands like
'podman rm' don't work without the '--force' flag, even if all active
'enter' and 'run' sessions have ended.

A system of reference counting based on advisory file locks has been
used to automatically exit the container's entry point once all the
active sessions have ended. Two locks are used - a global lock that's
common for all containers, and a local lock that's specific to each
container. The initialization stamp file is conveniently used as the
local lock.

The 'enter' and 'run' sessions acquire shared file locks, and the
container's entry point acquires exclusive ones. All attempts at
acquiring the locks are blocking unless otherwise noted.

The global lock is acquired at the beginning of 'enter' and 'run',
before they inspect the container and negotiate the path to the local
lock (i.e., the initialization stamp file) with the entry point, which
creates it. Once 'enter' and 'run' know the local lock, they acquire it
and only then release the global one.

The Toolbx container's entry point tries to acquire the global lock as
it creates the initialization stamp file (i.e., the local lock). This
waits for the 'enter' and 'run' invocations to receive the location of
the local lock, acquire it and release the global lock. Once the entry
point acquires the global lock, it releases it and waits, trying to
acquire the local lock.

This sequence of acquiring and releasing the locks lets the entry point
track the state of the 'enter' and 'run' invocations. The entry point
should only try to acquire the local lock after the 'enter' and 'run'
invocations have acquired it, which they do before invoking
'podman exec'.

The entry point is able to acquire the local lock only after all 'enter'
and 'run' sessions end and release their local locks.

At this point, a new 'enter' or 'run' invocation might be in the process
of starting. Both sides need to be careful not to race against each
other and end up in an invalid state, e.g., a 'podman start' being
invoked against a container whose entry point is just about to exit, or
a 'podman exec' being invoked against a container whose entry point is
about to exit or has already exited.

Therefore, the entry point makes a non-blocking attempt to acquire the
global lock while holding the local one. If it fails, then it's because
a new 'enter' or 'run' was invoked and is in the process of negotiating
the path to the local lock with the entry point. In this case, the
entry point releases the local lock and goes back to trying to acquire
the global lock, as it did when creating the initialization stamp file
(i.e., the local lock). If it succeeds, then no new 'enter' or 'run' is
in the process of starting, and the entry point can exit.

If this system of reference counting is simplified to just the global
lock, then the entry points of all Toolbx containers will exit only
after all the 'enter' and 'run' sessions across all Toolbx containers
have ended. The local lock makes it possible to do this for each
container separately.

This system will not work without the global lock. Without it, there
will be races if a new 'enter' or 'run' is invoked just as the last of
the previous batch of sessions ends, letting the entry point acquire the
local lock and prepare to exit.

Sometimes, a Toolbx container's entry point is started directly with
'podman start', without going through the 'enter' or 'run' commands, for
debugging. Care was taken to detect this case by making a non-blocking
attempt to acquire the global lock from the entry point before creating
the initialization stamp file (i.e., the local lock).

If it fails, then it's because an 'enter' or 'run' is waiting for the
container to get initialized by the entry point, and things proceed as
described above. If it succeeds, then it's because the entry point was
started directly. In this case, the entry point releases the global
lock and, after creating the initialization stamp file, waits for a
timeout before trying to acquire any other locks, to give the user time
to invoke 'enter' or 'run'. A timeout of 25 seconds is used, as is the
default for D-Bus method calls [1] and when waiting for the entry point
to initialize the container.

[1] https://docs.gtk.org/gio/property.DBusProxy.g-default-timeout.html

#114

softwarefactory-project-zuul · 2025-07-06T13:36:46Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/e263472a82e04155be04d544fadc8dfa

debarshiray · 2025-07-06T19:05:44Z

This pull request can't be merged until the work to enable Ptyxis to use toolbox(1) to enter Toolbx containers is finished (e.g., #1675), and Ptyxis is switched over.

softwarefactory-project-zuul · 2025-07-06T20:08:08Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/8caa2c8c154c46cd9e24d8ea8db78061

softwarefactory-project-zuul · 2025-07-06T21:30:28Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/f93b77ecc3654e978971df1147215789

softwarefactory-project-zuul · 2025-07-06T22:54:47Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/59935450ad084353aca80efef5065f95

✔️ unit-test SUCCESS in 5m 31s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 35s
✔️ unit-test-restricted SUCCESS in 5m 45s
✔️ system-test-fedora-rawhide-commands-options SUCCESS in 1h 18m 37s
❌ system-test-fedora-rawhide-runtime-environment-arch-fedora FAILURE in 1h 05m 15s
✔️ system-test-fedora-rawhide-runtime-environment-ubuntu SUCCESS in 17m 09s
✔️ system-test-fedora-42-commands-options SUCCESS in 1h 15m 01s
✔️ system-test-fedora-42-runtime-environment-arch-fedora SUCCESS in 1h 04m 06s
✔️ system-test-fedora-42-runtime-environment-ubuntu SUCCESS in 17m 12s
✔️ system-test-fedora-41-commands-options SUCCESS in 1h 16m 25s
✔️ system-test-fedora-41-runtime-environment-arch-fedora SUCCESS in 1h 05m 36s
✔️ system-test-fedora-41-runtime-environment-ubuntu SUCCESS in 16m 24s

debarshiray · 2025-07-07T09:24:02Z

recheck

softwarefactory-project-zuul · 2025-07-07T10:44:09Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/62f009693f4f4cafb39ccbdaf274eb53

debarshiray · 2025-07-07T14:30:58Z

Weird. One of the tests for the HOSTNAME environment variable is randomly failing:

fedora-rawhide | not ok 86 environment variables: HOSTNAME inside the default container
fedora-rawhide | # tags: arch-fedora runtime-environment
fedora-rawhide | # (from function `assert_success' in file test/system/libs/bats-assert/src/assert.bash, line 114,
fedora-rawhide | #  in test file test/system/220-environment-variables.bats, line 383)
fedora-rawhide | #   `assert_success' failed
fedora-rawhide | #
fedora-rawhide | # -- command failed --
fedora-rawhide | # status : 1
fedora-rawhide | # output :
fedora-rawhide | # --
fedora-rawhide | #

fedora-42 | not ok 89 environment variables: HOSTNAME inside RHEL 8.10
fedora-42 | # tags: arch-fedora runtime-environment
fedora-42 | # (from function `assert_success' in file test/system/libs/bats-assert/src/assert.bash, line 114,
fedora-42 | #  in test file test/system/220-environment-variables.bats, line 416)
fedora-42 | #   `assert_success' failed
fedora-42 | #
fedora-42 | # -- command failed --
fedora-42 | # status : 1
fedora-42 | # output :
fedora-42 | # --
fedora-42 | #

debarshiray · 2025-07-07T14:31:31Z

recheck

softwarefactory-project-zuul · 2025-07-07T15:52:42Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/7507c2dad7ea425c8380548726e772c9

softwarefactory-project-zuul · 2025-07-10T23:25:04Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/4772014078864541be1c7ac3a3613139

softwarefactory-project-zuul · 2025-07-14T13:09:36Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/04c4a4bb1dd64dc8b603db63a84a450a

softwarefactory-project-zuul · 2025-07-22T21:45:20Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/ea4f816f4e2e4c9eb5632d67d3d84533

✔️ unit-test SUCCESS in 5m 39s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 59s
✔️ unit-test-restricted SUCCESS in 5m 35s
❌ system-test-fedora-rawhide-commands-options FAILURE in 6m 20s
❌ system-test-fedora-rawhide-runtime-environment-arch-fedora FAILURE in 17m 18s
❌ system-test-fedora-rawhide-runtime-environment-ubuntu FAILURE in 6m 24s
✔️ system-test-fedora-42-commands-options SUCCESS in 1h 14m 59s
✔️ system-test-fedora-42-runtime-environment-arch-fedora SUCCESS in 1h 01m 50s
✔️ system-test-fedora-42-runtime-environment-ubuntu SUCCESS in 17m 19s
✔️ system-test-fedora-41-commands-options SUCCESS in 1h 16m 24s
✔️ system-test-fedora-41-runtime-environment-arch-fedora SUCCESS in 1h 04m 07s
✔️ system-test-fedora-41-runtime-environment-ubuntu SUCCESS in 17m 22s

softwarefactory-project-zuul · 2025-07-23T00:15:53Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/e27b1f4e08c744c78f63408669c6ce62

✔️ unit-test SUCCESS in 5m 40s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 43s
✔️ unit-test-restricted SUCCESS in 5m 51s
❌ system-test-fedora-rawhide-commands-options FAILURE in 6m 20s
❌ system-test-fedora-rawhide-runtime-environment-arch-fedora FAILURE in 16m 55s
❌ system-test-fedora-rawhide-runtime-environment-ubuntu FAILURE in 6m 12s
✔️ system-test-fedora-42-commands-options SUCCESS in 1h 15m 31s
✔️ system-test-fedora-42-runtime-environment-arch-fedora SUCCESS in 1h 04m 56s
✔️ system-test-fedora-42-runtime-environment-ubuntu SUCCESS in 16m 48s
✔️ system-test-fedora-41-commands-options SUCCESS in 1h 16m 50s
✔️ system-test-fedora-41-runtime-environment-arch-fedora SUCCESS in 1h 05m 56s
✔️ system-test-fedora-41-runtime-environment-ubuntu SUCCESS in 17m 37s

softwarefactory-project-zuul · 2025-07-23T01:45:33Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/82b11e80f70c4a7193fca6b2b4ea3026

✔️ unit-test SUCCESS in 5m 52s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 35s
✔️ unit-test-restricted SUCCESS in 5m 34s
❌ system-test-fedora-rawhide-commands-options FAILURE in 6m 32s
❌ system-test-fedora-rawhide-runtime-environment-arch-fedora FAILURE in 16m 48s
❌ system-test-fedora-rawhide-runtime-environment-ubuntu FAILURE in 6m 29s
✔️ system-test-fedora-42-commands-options SUCCESS in 1h 15m 30s
✔️ system-test-fedora-42-runtime-environment-arch-fedora SUCCESS in 1h 05m 25s
✔️ system-test-fedora-42-runtime-environment-ubuntu SUCCESS in 17m 32s
✔️ system-test-fedora-41-commands-options SUCCESS in 1h 15m 19s
✔️ system-test-fedora-41-runtime-environment-arch-fedora SUCCESS in 1h 05m 48s
✔️ system-test-fedora-41-runtime-environment-ubuntu SUCCESS in 16m 55s

softwarefactory-project-zuul · 2025-08-08T00:04:22Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/9d129b9fc7fa42d09b9c6b34f68b1775


softwarefactory-project-zuul · 2025-08-08T14:08:00Z

Build failed.
https://softwarefactory-project.io/zuul/t/local/buildset/2f5c6542feac489cb4fb16d0da448fe0

✔️ unit-test SUCCESS in 2m 24s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 3m 49s
✔️ unit-test-restricted SUCCESS in 2m 14s
❌ system-test-fedora-rawhide-commands-options FAILURE in 50m 19s
✔️ system-test-fedora-rawhide-runtime-environment-arch-fedora SUCCESS in 40m 53s
✔️ system-test-fedora-rawhide-runtime-environment-ubuntu SUCCESS in 14m 36s
❌ system-test-fedora-42-commands-options FAILURE in 1h 20m 32s
✔️ system-test-fedora-42-runtime-environment-arch-fedora SUCCESS in 1h 12m 10s
✔️ system-test-fedora-42-runtime-environment-ubuntu SUCCESS in 17m 21s
❌ system-test-fedora-41-commands-options FAILURE in 1h 20m 43s
✔️ system-test-fedora-41-runtime-environment-arch-fedora SUCCESS in 1h 08m 31s
✔️ system-test-fedora-41-runtime-environment-ubuntu SUCCESS in 21m 05s

Toolbx started out with a MAJOR.MINOR.MICRO versioning scheme, e.g., 0.0.1, 0.0.2, etc. A NANO version was reserved for releases to address brown paper bag bugs [1] or other critical issues, and for release candidates. For example, a few releases between 0.0.98 and 0.1.0 used the MAJOR.MINOR.MICRO.NANO versioning scheme to act as an extended set of release candidates for the dot-zero 0.1.0 release.

The MAJOR.MINOR.MICRO versioning scheme was meant to indicate the nascent nature of the Toolbx project and the ideas behind it when it first started in August 2018. It's been seven years since then, and both the project and the ideas that it implements are a lot more mature and widely adopted. So much so that there are a few independent reimplementations today [2,3].

In version 0.0.90, Toolbx switched from a POSIX shell implementation to a Go implementation. The practice of bundling and statically linking the Go dependencies sometimes makes it necessary to update the dependencies to address security bugs or other critical issues. It's more convenient to do this as part of an upstream release than through downstream patches by distributors. Hence, it will be helpful for downstream distributors, especially those that offer long-term support, to have targeted bug-fix releases that only have the critical dependency updates or other critical fixes, and nothing else.

To address this situation, future releases will default to not having a MICRO version and use a MAJOR.MINOR versioning scheme. A MICRO version will be reserved for the same purposes that a NANO version was reserved for until now. It's easier to read and remember a shorter MAJOR.MINOR version than a longer one, and it appropriately conveys the maturity of the project. When a MICRO version is needed, it will also be easier to read and remember than a longer one with a NANO version.

As per this new scheme, the next release will be version 0.2.

[1] https://www.computer-dictionary-online.org/definitions-b/brown-paper-bag-bug
[2] https://github.com/89luca89/distrobox/
[3] https://github.com/openSUSE/microos-toolbox/

containers#1703

Until 1.0.0 or 1.0 is released, which won't happen in the near future, the MAJOR version will always be 0, and the MINOR version can't be 0 after the release of 0.1.0. Similarly, the MICRO version can't be 0 after the release of 0.1.1, until 0.2.0 is released. Future releases will default to not having a MICRO version and use a MAJOR.MINOR versioning scheme. A MICRO version will be reserved for the same purposes that a NANO version was reserved for until now, and it will never be 0.

Tighten the regular expression used to check the version to match this present reality. It can be revisited when 1.0 is eventually released.

containers#1703
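As an illustration of the constraints above, here is a minimal Go sketch. It is not the regular expression in the Toolbx tree, and `versionRegexp` is a hypothetical name. It encodes three rules: MAJOR is always 0, MINOR is never 0, and an optional MICRO is never 0. As a sketch, it also accepts 0.1, which was never released in that form.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRegexp (illustrative): MAJOR is 0, MINOR is a non-zero number, and
// an optional MICRO is a non-zero number. NANO versions are not accepted.
var versionRegexp = regexp.MustCompile(`^0\.[1-9][0-9]*(\.[1-9][0-9]*)?$`)

func main() {
	for _, v := range []string{"0.2", "0.1.1", "0.2.1", "0.1.0", "0.0.99", "1.0", "0.2.0.1"} {
		fmt.Printf("%-8s %v\n", v, versionRegexp.MatchString(v))
	}
}
```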


Currently, once a Toolbx container gets started with 'podman start', as part of the 'enter' or 'run' commands, it doesn't stop unless the host is shut down or someone explicitly calls 'podman stop'. This becomes annoying if someone tries to remove the container, because commands like 'podman rm' don't work without the '--force' flag, even if all active 'enter' and 'run' sessions have ended, and the lingering entry points of those containers can be considered a waste of resources.

A system of reference counting based on advisory file locks has been used to automatically exit the container's entry point once all the active sessions have ended. Two locks are used - a global lock that's common for all containers, and a local lock that's specific to each container. The initialization stamp file is conveniently used as the local lock.

The 'enter' and 'run' sessions acquire shared file locks, and the container's entry point acquires exclusive ones. All attempts at acquiring the locks are blocking unless otherwise noted.

The global lock is acquired at the beginning of 'enter' and 'run', before they inspect the container and negotiate the path to the local lock (i.e., the initialization stamp file) with the entry point, and the local lock is created by the entry point. Once 'enter' and 'run' know the local lock, they acquire it and only then release the global one.

The Toolbx container's entry point tries to acquire the global lock as it creates the initialization stamp file (i.e., the local lock). This waits for the 'enter' and 'run' invocations to receive the location of the local lock, acquire it and release the global lock. Once the entry point acquires the global lock, it releases it and waits, trying to acquire the local lock.

This sequence of acquiring and releasing the locks lets the entry point track the state of the 'enter' and 'run' invocations. The entry point should only try to acquire the local lock after the 'enter' and 'run' invocations have acquired it, which they do before invoking 'podman exec'. The entry point is able to acquire the local lock only after all 'enter' and 'run' sessions end and release their local locks.

At this point, a new 'enter' or 'run' invocation might be in the process of starting. Both sides need to be careful not to race against each other and end up in an invalid state, e.g., a 'podman start' being invoked against a container whose entry point is just about to exit, or a 'podman exec' being invoked against a container whose entry point is about to exit or has already exited.

Therefore, the entry point makes a non-blocking attempt to acquire the global lock while holding the local one. If it fails, then it's because a new 'enter' or 'run' was invoked and is in the process of negotiating the path to the local lock with the entry point. In this case, the entry point releases the local lock and goes back to trying to acquire the global lock, as it did when creating the initialization stamp file (i.e., the local lock). If it succeeds, then no new 'enter' or 'run' is in the process of starting, and the entry point can exit.

If this system of reference counting is simplified to just the global lock, then the entry points of all Toolbx containers will exit only after all the 'enter' and 'run' sessions across all Toolbx containers have ended. The local lock makes it possible to do this for each container separately.

This system will not work without the global lock. Without it, there will be races if a new 'enter' or 'run' is invoked just as the last of the previous batch of sessions ends, letting the entry point acquire the local lock and prepare to exit.

Sometimes, a Toolbx container's entry point is started directly with 'podman start', without going through the 'enter' or 'run' commands, for debugging. Care was taken to detect this case by making a non-blocking attempt to acquire the global lock from the entry point before creating the initialization stamp file (i.e., the local lock). If it fails, then it's because an 'enter' or 'run' is waiting for the container to get initialized by the entry point, and things proceed as described above. If it succeeds, then it's because the entry point was started directly. In this case, the entry point releases the global lock and, after creating the initialization stamp file, waits for a timeout before trying to acquire any other locks, to give the user time to invoke 'enter' or 'run'. A timeout of 25 seconds is used, as is the default for D-Bus method calls [1] and when waiting for the entry point to initialize the container.

A variation of this system of reference counting could use the advisory file locks only in the 'enter' and 'run' commands, and invoke 'podman inspect --format {{.ExecIDs}} ...' after each 'podman exec' to find out if there are any remaining sessions [2]. This was not done because each podman(1) invocation is sufficiently expensive, and there is a desire to keep them to a minimum in the 'enter' and 'run' commands, because these are the most frequently used commands and users expect them to be as lean as possible [3,4].

A totally different approach could be to pass an AF_UNIX socket to the Toolbx container through the NOTIFY_SOCKET environment variable and 'podman create --sdnotify container ...', and do the reference counting by sending messages from the host to the entry point before and after each 'podman exec' [2]. One downside is that the reference counting will break if the host process crashes before sending the message to decrement the count after a 'podman exec' ends. Another downside is that it becomes complicated to directly call 'podman start', without going through the 'enter' or 'run' commands, for debugging.

[1] https://docs.gtk.org/gio/property.DBusProxy.g-default-timeout.html
[2] containers/podman#26589
[3] Commit 4536e2c8c28f6c4f, containers#813, containers#654
[4] Commit 74d4fcf00c6ec3d1, containers#1491, containers#1070

containers#114


softwarefactory-project-zuul · 2025-08-08T22:52:48Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/1647ea108c9f4d1a8a8771cd86dbe355

softwarefactory-project-zuul · 2025-08-11T02:41:59Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/0ce8837e27e14e3197c8bfa70b8b6a89

softwarefactory-project-zuul · 2025-08-11T12:03:45Z

Build succeeded.
https://softwarefactory-project.io/zuul/t/local/buildset/606872de5acc4256ab155a99b5e875a8

debarshiray mentioned this pull request Jul 6, 2025

Containers remain running after exiting #114

Open

debarshiray force-pushed the wip/rishi/automatically-stop-containers-3 branch from c7c8158 to 0bf4ee2 July 6, 2025 20:10

debarshiray force-pushed the wip/rishi/automatically-stop-containers-3 branch from ea2db5d to d410d09 July 10, 2025 22:03

debarshiray force-pushed the wip/rishi/automatically-stop-containers-3 branch from 3578ee3 to 0882df7 July 22, 2025 20:27

debarshiray force-pushed the wip/rishi/automatically-stop-containers-3 branch from 77203e8 to 0da5949 August 7, 2025 22:16

.mailmap: Canonicalize Mario's name

7fa2303

From now on, masch <[email protected]> will show up as Mario Sebastian Chacon <[email protected]>. containers#1703

debarshiray force-pushed the wip/rishi/automatically-stop-containers-3 branch from 0da5949 to 669298f August 8, 2025 12:45

debarshiray added 6 commits August 8, 2025 21:11

Prepare 0.2

e3ce0bc

containers#1703

cmd/run: Shuffle some code around

f37cf63

A subsequent commit will use this to stop the Toolbx container once the last 'enter' or 'run' session finishes. containers#114

cmd/initContainer: Remove the initialization stamp file when exiting

a79037c

containers#114

debarshiray force-pushed the wip/rishi/automatically-stop-containers-3 branch from 669298f to f37cf63 August 8, 2025 21:22
