Commit 369cc79

cmd, pkg/utils: Stop the container once the last session finishes
Currently, once a Toolbx container gets started with 'podman start', as part of the 'enter' or 'run' commands, it doesn't stop unless the host is shut down or someone explicitly calls 'podman stop'. This becomes annoying if someone tries to remove the container, because commands like 'podman rm' don't work without the '--force' flag even if all active 'enter' and 'run' sessions have ended, and the lingering entry points of those containers can be considered a waste of resources.

A system of reference counting based on advisory file locks is used to automatically exit the container's entry point once all the active sessions have ended.

Two locks are used: a global lock that's common to all containers, and a local lock that's specific to each container. The initialization stamp file is conveniently used as the local lock. The 'enter' and 'run' sessions acquire shared file locks and the container's entry point acquires exclusive ones. All attempts at acquiring the locks are blocking unless otherwise noted.

The global lock is acquired at the beginning of 'enter' and 'run', before they inspect the container and negotiate the path to the local lock (i.e., the initialization stamp file) with the entry point, and before the entry point creates the local lock. Once the local lock is known to 'enter' and 'run', they acquire it and only then release the global lock.

The Toolbx container's entry point tries to acquire the global lock as it creates the initialization stamp file (i.e., the local lock). This waits for the 'enter' and 'run' invocations to receive the location of the local lock, acquire it and release the global lock. Once the entry point acquires the global lock, it releases it, and waits trying to acquire the local lock. This sequence of acquiring and releasing the locks lets the entry point track the state of the 'enter' and 'run' invocations.
It should only try to acquire the local lock after the 'enter' and 'run' invocations have acquired it, which they do before invoking 'podman exec'.

The entry point is able to acquire the local lock after all 'enter' and 'run' sessions end and release their local locks. At this point, a new 'enter' or 'run' invocation might be in the process of starting. Both sides need to be careful not to race against each other and end up in an invalid state, e.g., a 'podman start' being invoked against a container whose entry point is just about to exit, or a 'podman exec' being invoked against a container whose entry point is about to exit or has already exited.

Therefore, the entry point makes a non-blocking attempt to acquire the global lock while holding the local lock. If it fails, then it's because a new 'enter' or 'run' was invoked that is in the process of negotiating the path to the local lock with the entry point. In this case, the entry point releases the local lock and goes back to trying to acquire the global lock, as it did when creating the initialization stamp file (i.e., the local lock). If it succeeds, then no new 'enter' or 'run' is in the process of starting, and the entry point can exit.

If this system of reference counting is simplified to just the global lock, then all the entry points of all Toolbx containers will exit only after all the 'enter' and 'run' sessions across all Toolbx containers have ended. The local lock makes it possible to do this for each container separately.

This system will not work without the global lock. Without it, there will be races if a new 'enter' or 'run' is invoked just as the last of the previous batch of sessions ends, letting the entry point acquire the local lock and prepare to exit.

Sometimes, a Toolbx container's entry point is started directly with 'podman start', without going through the 'enter' or 'run' commands, for debugging.
Care was taken to detect this case by making a non-blocking attempt to acquire the global lock from the entry point before creating the initialization stamp file (i.e., the local lock). If it fails, then it's because an 'enter' or 'run' is waiting for the container to get initialized by the entry point, and things proceed as described above. If it succeeds, then it's because the entry point was started directly. In this case, the entry point releases the global lock, and adds a timeout after creating the initialization stamp file before trying to acquire any other locks, to give the user time to invoke 'enter' or 'run'. A timeout of 25 seconds is used, which is the default for D-Bus method calls [1] and is also what's used when waiting for the entry point to initialize the container.

A variation of this system of reference counting could use the advisory file locks only in the 'enter' and 'run' commands, and invoke 'podman inspect --format {{.ExecIDs}} ...' after each 'podman exec' to find out if there are any remaining sessions [2]. This was not done because each podman(1) invocation is sufficiently expensive and there is a desire to keep them to a minimum in the 'enter' and 'run' commands, because these are the most frequently used commands and users expect them to be as lean as possible [3,4].

A totally different approach could be to pass an AF_UNIX socket to the Toolbx container through the NOTIFY_SOCKET environment variable and 'podman create --sdnotify container ...', and do the reference counting by sending messages from the host to the entry point before and after each 'podman exec' [2]. One downside is that the reference counting will break if the host process crashes before sending the message to decrement the count after a 'podman exec' ends. Another downside is that it becomes complicated to directly call 'podman start', without going through the 'enter' or 'run' commands, for debugging.

[1] https://docs.gtk.org/gio/property.DBusProxy.g-default-timeout.html

[2] containers/podman#26589

[3] Commit 4536e2c8c28f6c4f
    #813
    #654

[4] Commit 74d4fcf00c6ec3d1
    #1491
    #1070
    #114
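
The core of the reference counting described in the message above can be sketched in isolation. This is a minimal illustration, assuming Linux flock(2) semantics, and not the code from this commit: lockFile, isUnused and demo are hypothetical names, and lockFile only stands in for utils.Flock, whose implementation is not part of this diff. Sessions hold shared locks on the stamp file, and a non-blocking exclusive attempt fails with EWOULDBLOCK until the last shared lock is released.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// lockFile is a hypothetical helper, not Toolbx's utils.Flock. It opens
// path (creating it if needed) and applies flock(2) with the given
// operation. Closing the returned file releases the lock.
func lockFile(path string, how int) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDONLY, 0o600)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), how); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

// isUnused makes a non-blocking attempt at an exclusive lock. The attempt
// fails with EWOULDBLOCK for as long as any session holds a shared lock.
func isUnused(path string) (bool, error) {
	f, err := lockFile(path, syscall.LOCK_EX|syscall.LOCK_NB)
	if errors.Is(err, syscall.EWOULDBLOCK) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, f.Close()
}

// demo reports whether the stamp file looked unused while two sessions
// were active, and again after both sessions had ended.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "refcount")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	stamp := filepath.Join(dir, "initialized-stamp")

	// Two sessions each take a shared lock; shared locks do not conflict.
	s1, err := lockFile(stamp, syscall.LOCK_SH)
	if err != nil {
		return false, false, err
	}
	defer s1.Close()
	s2, err := lockFile(stamp, syscall.LOCK_SH)
	if err != nil {
		return false, false, err
	}
	defer s2.Close()

	whileActive, err := isUnused(stamp)
	if err != nil {
		return false, false, err
	}

	// Both sessions end, which drops the "reference count" to zero.
	s1.Close()
	s2.Close()

	afterExit, err := isUnused(stamp)
	return whileActive, afterExit, err
}

func main() {
	whileActive, afterExit, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("unused while sessions are active:", whileActive)
	fmt.Println("unused after sessions ended:", afterExit)
}
```

On Linux this should report false while the sessions are active and true afterwards. The kernel drops a lock when its holder exits, so a crashed session cannot leave the count stuck, which is the failure mode the message attributes to the NOTIFY_SOCKET alternative.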
1 parent 0882df7 commit 369cc79

3 files changed: +184 -0 lines changed

src/cmd/initContainer.go

Lines changed: 114 additions & 0 deletions
@@ -17,6 +17,7 @@
 package cmd

 import (
+	"context"
 	"errors"
 	"fmt"
 	"io/ioutil"
@@ -25,6 +26,7 @@ import (
 	"path/filepath"
 	"strconv"
 	"strings"
+	"syscall"
 	"time"

 	"github.com/containers/toolbox/pkg/shell"
@@ -352,6 +354,28 @@ func initContainer(cmd *cobra.Command, args []string) error {
 		return err
 	}

+	referenceCountGlobalLock, err := utils.GetReferenceCountGlobalLock(targetUser)
+	if err != nil {
+		return err
+	}
+
+	var waitForRun bool
+	if referenceCountGlobalLockFile, err := utils.Flock(referenceCountGlobalLock,
+		syscall.LOCK_EX|syscall.LOCK_NB); err == nil {
+		waitForRun = true
+		if err := referenceCountGlobalLockFile.Close(); err != nil {
+			logrus.Debugf("Releasing global reference count lock: %s", err)
+			return utils.ErrFlockRelease
+		}
+	}
+
+	parentCtx := context.Background()
+	waitForExitCtx, waitForExitCancel := context.WithCancelCause(parentCtx)
+	defer waitForExitCancel(errors.New("clean-up"))
+
+	detectWhenContainerIsUnsedAsync(waitForExitCancel, initializedStamp, referenceCountGlobalLock, waitForRun)
+	done := waitForExitCtx.Done()
+
 	logrus.Debugf("Creating initialization stamp %s", initializedStamp)

 	initializedStampFile, err := os.Create(initializedStamp)
@@ -371,6 +395,10 @@ func initContainer(cmd *cobra.Command, args []string) error {

 	for {
 		select {
+		case <-done:
+			cause := context.Cause(waitForExitCtx)
+			logrus.Debugf("Exiting entry point: %s", cause)
+			return nil
 		case event := <-tickerDaily.C:
 			handleDailyTick(event)
 		case event := <-watcherForHostEvents:
@@ -737,6 +765,55 @@ func createSymbolicLink(existingTarget, newLink string) error {
 	return nil
 }

+func detectWhenContainerIsUnsedAsync(cancel context.CancelCauseFunc,
+	initializedStamp, referenceCountGlobalLock string,
+	waitForRun bool) {
+
+	go func() {
+		if waitForRun {
+			logrus.Debugf("This entry point was not started by 'toolbox enter' or 'toolbox run'")
+			logrus.Debugf("Waiting for 'toolbox enter' or 'toolbox run'")
+			time.Sleep(25 * time.Second)
+		}
+
+		for {
+			logrus.Debugf("Waiting for 'podman exec' to begin")
+			if err := waitForExecToBegin(referenceCountGlobalLock); err != nil {
+				if errors.Is(err, utils.ErrFlockRelease) {
+					cancel(err)
+				} else {
+					logrus.Debugf("Waiting for 'podman exec' to begin: %s", err)
+					logrus.Debug("This entry point will not exit when the container is unused")
+				}
+
+				return
+			}
+
+			logrus.Debugf("Waiting for the container to be unused")
+			if err := waitForContainerToBeUnused(initializedStamp,
+				referenceCountGlobalLock); err != nil {
+				if errors.Is(err, syscall.EWOULDBLOCK) {
+					logrus.Debug("Detected potentially new use of the container")
+					continue
+				}
+
+				if errors.Is(err, utils.ErrFlockRelease) {
+					cancel(err)
+				} else {
+					logrus.Debugf("Waiting for the container to be unused: %s", err)
+					logrus.Debug("This entry point will not exit when the container is unused")
+				}
+
+				return
+			}
+
+			cause := errors.New("all 'podman exec' sessions exited")
+			cancel(cause)
+			return
+		}
+	}()
+}
+
 func getDelayEntryPoint() (time.Duration, bool) {
 	valueString := os.Getenv("TOOLBX_DELAY_ENTRY_POINT")
 	if valueString == "" {
@@ -1066,6 +1143,43 @@ func updateTimeZoneFromLocalTime() error {
 	return nil
 }

+func waitForExecToBegin(referenceCountGlobalLock string) error {
+	referenceCountGlobalLockFile, err := utils.Flock(referenceCountGlobalLock, syscall.LOCK_EX)
+	if err != nil {
+		return err
+	}
+
+	if err := referenceCountGlobalLockFile.Close(); err != nil {
+		logrus.Debugf("Releasing global reference count lock: %s", err)
+		return utils.ErrFlockRelease
+	}
+
+	return nil
+}
+
+func waitForContainerToBeUnused(initializedStamp, referenceCountGlobalLock string) error {
+	referenceCountLocalLockFile, err := utils.Flock(initializedStamp, syscall.LOCK_EX)
+	if err != nil {
+		if errors.Is(err, syscall.EWOULDBLOCK) {
+			panicMsg := fmt.Sprintf("unexpected %T: %s", err, err)
+			panic(panicMsg)
+		}
+
+		return err
+	}
+
+	if _, err := utils.Flock(referenceCountGlobalLock, syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
+		if err := referenceCountLocalLockFile.Close(); err != nil {
+			logrus.Debugf("Releasing local reference count lock: %s", err)
+			return utils.ErrFlockRelease
+		}
+
+		return err
+	}
+
+	return nil
+}
+
 func writeTimeZone(timeZone string) error {
 	const etcTimeZone = "/etc/timezone"

src/cmd/run.go

Lines changed: 58 additions & 0 deletions
@@ -243,6 +243,35 @@ func runCommand(container string,
 		}
 	}

+	logrus.Debug("Acquiring global reference count lock")
+
+	referenceCountGlobalLock, err := utils.GetReferenceCountGlobalLock(currentUser)
+	if err != nil {
+		return err
+	}
+
+	referenceCountGlobalLockFile, err := utils.Flock(referenceCountGlobalLock, syscall.LOCK_SH)
+	if err != nil {
+		logrus.Debugf("Acquiring global reference count lock: %s", err)
+
+		var errFlock *utils.FlockError
+
+		if errors.As(err, &errFlock) {
+			if errors.Is(err, utils.ErrFlockAcquire) {
+				err = utils.ErrFlockAcquire
+			} else if errors.Is(err, utils.ErrFlockCreate) {
+				err = utils.ErrFlockCreate
+			} else {
+				panicMsg := fmt.Sprintf("unexpected %T: %s", err, err)
+				panic(panicMsg)
+			}
+		}
+
+		return err
+	}
+
+	defer referenceCountGlobalLockFile.Close()
+
 	logrus.Debugf("Inspecting container %s", container)
 	containerObj, err := podman.InspectContainer(container)
 	if err != nil {
@@ -345,6 +374,35 @@ func runCommand(container string,
 	}

 	logrus.Debugf("Container %s is initialized", container)
+	logrus.Debug("Acquiring local reference count lock")
+
+	referenceCountLocalLockFile, err := utils.Flock(initializedStamp, syscall.LOCK_SH)
+	if err != nil {
+		logrus.Debugf("Acquiring local reference count lock: %s", err)
+
+		var errFlock *utils.FlockError
+
+		if errors.As(err, &errFlock) {
+			if errors.Is(err, utils.ErrFlockAcquire) {
+				err = utils.ErrFlockAcquire
+			} else if errors.Is(err, utils.ErrFlockCreate) {
+				err = utils.ErrFlockCreate
+			} else {
+				panicMsg := fmt.Sprintf("unexpected %T: %s", err, err)
+				panic(panicMsg)
+			}
+		}
+
+		return err
+	}
+
+	defer referenceCountLocalLockFile.Close()
+
+	logrus.Debug("Releasing global reference count lock")
+	if err := referenceCountGlobalLockFile.Close(); err != nil {
+		logrus.Debugf("Releasing global reference count lock: %s", err)
+		return utils.ErrFlockRelease
+	}

 	environ := append(cdiEnviron, p11KitServerEnviron...)
 	if err := runCommandWithFallbacks(container,
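
The hand-over in runCommand above (hold the global lock until the local lock is acquired) is what lets the entry point's final check notice a session that is still starting. The single-process sketch below illustrates that interaction, assuming Linux flock(2) semantics. lockFile, canExit and demo are hypothetical names rather than Toolbx functions, and unlike the commit's waitForContainerToBeUnused, canExit releases both locks before returning.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// lockFile is a hypothetical stand-in for utils.Flock: it opens path
// (creating it if needed) and applies flock(2) with the given operation.
func lockFile(path string, how int) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDONLY, 0o600)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), how); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

// canExit sketches the entry point's final check: take the local lock
// exclusively, then make a non-blocking attempt at the global lock. A
// failure with EWOULDBLOCK means a new session is in the process of starting.
func canExit(global, local string) (bool, error) {
	l, err := lockFile(local, syscall.LOCK_EX)
	if err != nil {
		return false, err
	}
	defer l.Close()

	g, err := lockFile(global, syscall.LOCK_EX|syscall.LOCK_NB)
	if errors.Is(err, syscall.EWOULDBLOCK) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	defer g.Close()
	return true, nil
}

// demo returns the result of canExit while a session is starting, and
// again after that session has ended.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "handover")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	global := filepath.Join(dir, "global.lock")
	local := filepath.Join(dir, "initialized-stamp")

	// A new session starts by taking the global lock in shared mode.
	g, err := lockFile(global, syscall.LOCK_SH)
	if err != nil {
		return false, false, err
	}
	defer g.Close()

	// The local lock is free, but the pending session is visible
	// through the global lock, so the entry point must not exit.
	duringStart, err := canExit(global, local)
	if err != nil {
		return false, false, err
	}

	// The session takes the local lock and only then releases the global.
	l, err := lockFile(local, syscall.LOCK_SH)
	if err != nil {
		return false, false, err
	}
	g.Close()
	// 'podman exec' would run here; afterwards the session ends.
	l.Close()

	afterEnd, err := canExit(global, local)
	return duringStart, afterEnd, err
}

func main() {
	duringStart, afterEnd, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("can exit while a session is starting:", duringStart)
	fmt.Println("can exit after the session ended:", afterEnd)
}
```

On Linux this should report false for the first check and true for the second. Dropping the global lock from the sketch would let the first check succeed, which is the race the commit message describes.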

src/pkg/utils/utils.go

Lines changed: 12 additions & 0 deletions
@@ -179,6 +179,8 @@ var (

 	ErrFlockCreate = errors.New("failed to create lock file")

+	ErrFlockRelease = errors.New("failed to release lock")
+
 	ErrImageWithoutBasename = errors.New("image does not have a basename")
 )

@@ -493,6 +495,16 @@ func GetP11KitServerSocketLock(targetUser *user.User) (string, error) {
 	return p11KitServerSocketLock, nil
 }

+func GetReferenceCountGlobalLock(targetUser *user.User) (string, error) {
+	toolbxRuntimeDirectory, err := GetRuntimeDirectory(targetUser)
+	if err != nil {
+		return "", err
+	}
+
+	referenceCountGlobalLock := filepath.Join(toolbxRuntimeDirectory, "container-reference-count.lock")
+	return referenceCountGlobalLock, nil
+}
+
 func GetRuntimeDirectory(targetUser *user.User) (string, error) {
 	if runtimeDirectories == nil {
 		runtimeDirectories = make(map[string]string)
