@l0crian1 (Contributor) commented Sep 2, 2025

Change summary

When a user restarts a container directly with "podman container restart <container>" instead of the VyOS op-mode command, podman and the quadlet configuration go out of sync, and the container can no longer be recovered.

vyos@PE2:~$ sudo podman container restart alpine1
WARN[0010] StopSignal SIGTERM failed to stop container alpine1 in 10 seconds, resorting to SIGKILL
ERRO[0010] Cleaning up container 0721bc4a562ffe28276b203adc17fa79e7269a574a100c72c71137a285fe8157: unmounting container 0721bc4a562ffe28276b203adc17fa79e7269a574a100c72c71137a285fe8157 storage: cleaning up container 0721bc4a562ffe28276b203adc17fa79e7269a574a100c72c71137a285fe8157 storage: unmounting container 0721bc4a562ffe28276b203adc17fa79e7269a574a100c72c71137a285fe8157 root filesystem: removing mount point "/usr/lib/live/mount/persistence/container/storage/overlay/42a8fff3d1a21fa4bd5c592dc017c39463dc4a24a9c75e94793ae88cef682ed8/merged": directory not empty
Error: crun: executable file `/bin/sh` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found

Restarting the container afterwards with the correct op-mode command then fails:

vyos@PE2:~$ restart container alpine1
Job for vyos-container-alpine1.service failed because the control process exited with error code.
See "systemctl status vyos-container-alpine1.service" and "journalctl -xeu vyos-container-alpine1.service" for details.

Deleting the config and trying to reapply it doesn't work:

vyos@PE2# delete container name alpine1
vyos@PE2# commit

vyos@PE2# set container name alpine1 image 'alpine'
vyos@PE2# set container name alpine1 allow-host-network
vyos@PE2# commit

[ container ]
Traceback (most recent call last):
  File "/usr/libexec/vyos/services/vyos-configd", line 156, in run_script
    script.apply(c)
  File "/usr/libexec/vyos/conf_mode/container.py", line 618, in apply
    cmd(f'systemctl restart vyos-container-{name}.service')
  File "/usr/lib/python3/dist-packages/vyos/utils/process.py", line 189, in cmd
    raise OSError(code, feedback)
PermissionError: [Errno 1] failed to run command: None systemctl restart vyos-container-alpine1.service
returned:
exit code: 1

[[container]] failed
Commit failed
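
The PermissionError in the traceback above is probably not a real permissions problem. Python instantiates an OSError built with errno 1 (EPERM) as its PermissionError subclass, so a helper that raises OSError(exit_code, message) will report any command that exits with status 1 this way. A minimal sketch of that behaviour follows; run_or_raise is a hypothetical stand-in, not the actual vyos.utils.process.cmd:

```python
import subprocess
import sys


def run_or_raise(argv):
    """Hypothetical cmd()-style helper: raise OSError(exit_code, message) on failure."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # OSError(1, ...) comes back as PermissionError because errno 1 is EPERM
        raise OSError(result.returncode, f'failed to run command: {" ".join(argv)}')
    return result.stdout


try:
    # Any command that exits with status 1 triggers the mapping
    run_or_raise([sys.executable, '-c', 'raise SystemExit(1)'])
except OSError as exc:
    print(type(exc).__name__, exc.errno)
```

The exception type therefore only reflects the exit code of the systemctl call; the underlying cause has to be read from the unit's journal.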

Even restarting won't correct the issue. The logs show that the previous overlay layer directory still contains data, so when the quadlet tries to recreate the container, it fails with a "directory not empty" error:

Sep 02 11:59:49 PE2 systemd[1]: Starting VyOS Container alpine1...
Sep 02 11:59:49 PE2 podman[6739]: time="2025-09-02T11:59:49Z" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
Sep 02 11:59:49 PE2 podman[6739]: time="2025-09-02T11:59:49Z" level=warning msg="Unmounting container \"alpine1\" while attempting to delete storage: removing mount point \"/usr/lib/live/mount/persistence/container/storage/overlay/42a8fff3d1a21fa4bd5c592dc017c39463dc4a24a9c75e94793ae88cef682ed8/merged\": directory not empty"
Sep 02 11:59:49 PE2 podman[6739]: Error: removing storage for container "alpine1": removing mount point "/usr/lib/live/mount/persistence/container/storage/overlay/42a8fff3d1a21fa4bd5c592dc017c39463dc4a24a9c75e94793ae88cef682ed8/merged": directory not empty
Sep 02 11:59:49 PE2 systemd[1]: vyos-container-alpine1.service: Control process exited, code=exited, status=125/n/a
Sep 02 11:59:49 PE2 systemd[1]: vyos-container-alpine1.service: Failed with result 'exit-code'.
Sep 02 11:59:49 PE2 systemd[1]: Failed to start VyOS Container alpine1.
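
The path that podman could not remove can be pulled out of journal lines like these programmatically. A rough sketch, assuming the message format shown above; the regular expression and the name stale_mount_points are illustrative and not part of VyOS:

```python
import re

# Matches: removing mount point "<path>": directory not empty
# The quotes may be backslash-escaped when they sit inside podman's msg="..." field.
MOUNT_RE = re.compile(r'removing mount point \\?"(?P<path>[^"\\]+)\\?": directory not empty')


def stale_mount_points(log_lines):
    """Return the unique mount points podman failed to remove, in first-seen order."""
    seen = []
    for line in log_lines:
        match = MOUNT_RE.search(line)
        if match and match.group('path') not in seen:
            seen.append(match.group('path'))
    return seen


sample = [
    'podman[6739]: Error: removing storage for container "alpine1": removing mount point '
    '"/usr/lib/live/mount/persistence/container/storage/overlay/'
    '42a8fff3d1a21fa4bd5c592dc017c39463dc4a24a9c75e94793ae88cef682ed8/merged": directory not empty',
]
print(stale_mount_points(sample))
```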

The function added in this PR cleans up the stale overlay layer so that the container can be reinitialized correctly.
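
To illustrate the idea only, the sketch below empties a leftover merged directory under a storage root. It is not the function from this PR: the helper name, its arguments, and the choice to delete only the directory's contents are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def clean_stale_merged_dir(storage_root, layer_id):
    """Remove leftover entries under overlay/<layer_id>/merged (hypothetical helper)."""
    merged = Path(storage_root) / 'overlay' / layer_id / 'merged'
    if not merged.is_dir():
        return 0
    removed = 0
    for entry in merged.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory: remove recursively
        else:
            entry.unlink()         # regular file or symlink
        removed += 1
    return removed


# Demo against a throwaway directory tree instead of real podman storage
with tempfile.TemporaryDirectory() as root:
    merged = Path(root) / 'overlay' / 'abc123' / 'merged'
    (merged / 'etc').mkdir(parents=True)
    (merged / 'etc' / 'hosts').write_text('127.0.0.1 localhost\n')
    (merged / 'leftover.txt').write_text('stale\n')
    print(clean_stale_merged_dir(root, 'abc123'))  # count of top-level entries removed
    print(any(merged.iterdir()))                   # whether anything is left
```

On a live system such a step would only be safe once the container is stopped and the directory is no longer an active mount.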

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes)
  • Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • Other (please describe):

Related Task(s)

https://vyos.dev/T6673

Related PR(s)

How to test / Smoketest result

Create a container:

set container name alpine1 image 'alpine'
set container name alpine1 allow-host-network

Restart the container with podman commands:

vyos@PE2:~$ sudo podman container restart alpine1
WARN[0010] StopSignal SIGTERM failed to stop container alpine1 in 10 seconds, resorting to SIGKILL
ERRO[0010] Cleaning up container 0721bc4a562ffe28276b203adc17fa79e7269a574a100c72c71137a285fe8157: unmounting container 0721bc4a562ffe28276b203adc17fa79e7269a574a100c72c71137a285fe8157 storage: cleaning up container 0721bc4a562ffe28276b203adc17fa79e7269a574a100c72c71137a285fe8157 storage: unmounting container 0721bc4a562ffe28276b203adc17fa79e7269a574a100c72c71137a285fe8157 root filesystem: removing mount point "/usr/lib/live/mount/persistence/container/storage/overlay/42a8fff3d1a21fa4bd5c592dc017c39463dc4a24a9c75e94793ae88cef682ed8/merged": directory not empty
Error: crun: executable file `/bin/sh` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found

Attempt to restart the container using the VyOS op-mode command:

Before:

vyos@PE2:~$ restart container alpine1
Job for vyos-container-alpine1.service failed because the control process exited with error code.
See "systemctl status vyos-container-alpine1.service" and "journalctl -xeu vyos-container-alpine1.service" for details.

After:

vyos@PE2:~$ restart container alpine1
Container "alpine1" restarted!

Verify container is present:

vyos@PE2:~$ show container
CONTAINER ID  IMAGE                                 COMMAND               CREATED        STATUS        PORTS       NAMES
6ccb774230dd  docker.io/library/alpine:latest       /bin/sh               3 seconds ago  Up 3 seconds              alpine1

Checklist:

  • I have read the CONTRIBUTING document
  • I have linked this PR to one or more Phabricator Task(s)
  • I have run the components SMOKETESTS if applicable
  • My commit headlines contain a valid Task id
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

Fixed issue with podman and systemd sync when restarting
containers with 'podman restart' command.

github-actions bot commented Sep 2, 2025

👍
No issues in PR Title / Commit Title

- Placed podman storage directory in vyos/defaults.py
- Replaced repeated declarations with vyos.defaults.directories['podman_storage']
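
A sketch of what this centralisation might look like. The dictionary below is a self-contained stand-in for vyos.defaults.directories, the path is the storage root that appears in the podman errors in this PR, and overlay_layer_dir is a hypothetical helper:

```python
import os

# Stand-in for the shared defaults dictionary (assumed shape, single key shown)
directories = {
    'podman_storage': '/usr/lib/live/mount/persistence/container/storage',
}


def overlay_layer_dir(layer_id):
    """Build a layer's overlay directory from the one shared constant."""
    return os.path.join(directories['podman_storage'], 'overlay', layer_id)


print(overlay_layer_dir('42a8fff3d1a21fa4bd5c592dc017c39463dc4a24a9c75e94793ae88cef682ed8'))
```

Keeping the path in one place means a change to the storage location only has to be made once.
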
@dmbaturin changed the title from "Container: T6673: Fix restart of containers with podman" to "container: T6673: Fix restart of containers with podman" Sep 15, 2025

@dmbaturin (Member) left a comment

The fix logic looks sensible to me.

@sever-sever requested a review from Copilot September 15, 2025 12:21

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR fixes an issue where containers become unrecoverable when users incorrectly restart them using direct podman commands instead of VyOS commands. The fix adds logic to clean up stale overlay layers when container restart fails.

  • Adds a clean_layer() function to remove corrupted container overlay layers
  • Updates the container restart operation to attempt layer cleanup on initial failure
  • Adds the podman storage directory path to the VyOS defaults configuration
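
The retry flow described in these points can be sketched roughly as follows. This is an illustration under assumptions, not the code from src/op_mode/container.py: restart_with_cleanup and its run and clean parameters are hypothetical, and only the unit name pattern and the systemctl restart call are taken from the output shown in this PR.

```python
import subprocess
from types import SimpleNamespace


def restart_with_cleanup(name, run=subprocess.run, clean=lambda name: None):
    """Try a restart; if it fails, run a cleanup step and retry once."""
    argv = ['systemctl', 'restart', f'vyos-container-{name}.service']
    if run(argv).returncode == 0:
        return True
    clean(name)  # e.g. remove the stale overlay layer
    return run(argv).returncode == 0


# Demo with fakes so nothing touches systemd: the restart fails until cleanup has run
state = {'cleaned': False, 'calls': 0}


def fake_run(argv):
    state['calls'] += 1
    return SimpleNamespace(returncode=0 if state['cleaned'] else 1)


def fake_clean(name):
    state['cleaned'] = True


print(restart_with_cleanup('alpine1', run=fake_run, clean=fake_clean))
```

Passing the runner and the cleanup step as parameters keeps the flow testable without a running system.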

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File                      Description
src/op_mode/container.py  Implements layer cleanup functionality and integrates it into the restart operation
python/vyos/defaults.py   Adds podman storage directory path constant

Co-authored-by: Copilot <[email protected]>

CI integration 👍 passed!

Details

CI logs

  • CLI Smoketests (no interfaces) 👍 passed
  • CLI Smoketests VPP 👍 passed
  • CLI Smoketests (interfaces only) 👍 passed
  • Config tests 👍 passed
  • Config tests VPP 👍 passed
  • RAID1 tests 👍 passed
  • TPM tests 👍 passed

@c-po added the bp/circinus (Create automatic backport for circinus) label Sep 18, 2025

@c-po (Member) left a comment

Changes addressed

@c-po merged commit 0989889 into vyos:current Sep 18, 2025
17 of 18 checks passed
@vyosbot added the mirror-initiated (This PR initiated for mirror sync workflow) and mirror-completed labels and removed the mirror-initiated label Sep 18, 2025
Labels: bp/circinus (Create automatic backport for circinus), current, mirror-completed, rebase

4 participants