Commit 16b45c8

rootfs: make pivot_root(2) dance handle initramfs case

While pivot_root(2) normally refuses to pivot a mount if you are running
with / as initramfs (because initramfs doesn't have a parent mount), you
can create a bind-mount and make that a new root to work around this
problem. This hack is fairly well known and is used all over the place
(see [1,2]) but until now we have forced users to have a far less secure
configuration with --no-pivot.

There are some minor issues with this trick (the initramfs sticks around
at the top of the mount tree, but is completely masked) but they don't
really matter for containers.

[1]: containers/bubblewrap#592 (comment)
[2]: https://aconz2.github.io/2024/07/29/container-from-initramfs.html

Signed-off-by: Aleksa Sarai <[email protected]>
1 parent 9112335 commit 16b45c8

1 file changed: +31 -2 lines changed


libcontainer/rootfs_linux.go

Lines changed: 31 additions & 2 deletions
@@ -1079,8 +1079,37 @@ func pivotRoot(rootfs string) error {
 		return &os.PathError{Op: "fchdir", Path: "fd " + strconv.Itoa(newroot), Err: err}
 	}
 
-	if err := unix.PivotRoot(".", "."); err != nil {
-		return &os.PathError{Op: "pivot_root", Path: ".", Err: err}
+	pivotErr := unix.PivotRoot(".", ".")
+	if errors.Is(pivotErr, unix.EINVAL) {
+		// If pivot_root(2) failed with -EINVAL, one of the possible reasons is
+		// that we are in early boot and trying pivot_root on top of the
+		// initramfs (which isn't allowed because initramfs/rootfs doesn't have
+		// a parent mount).
+		//
+		// Traditionally, users were told to pass --no-pivot-root (which used a
+		// chroot instead) but this is very insecure (even with the hardenings
+		// we've put into our chroot() wrapper).
+		//
+		// A much better solution is to create a bind-mount of the target and
+		// chroot into it, resulting in a parented mount that pivot_root(2)
+		// will accept. One minor issue is that the mount will still exist (and
+		// in the case of an init system like systemd, this will result in
+		// wasted memory, so they have to do some hacks to clear the initramfs)
+		// but the mount is masked in a much more safe way than chroot() so
+		// this is still much better.
+		if err := unix.Mount(".", ".", "", unix.MS_BIND|unix.MS_REC, ""); err != nil {
+			err := &os.PathError{Op: "bind mount over self", Path: rootfs, Err: err}
+			return fmt.Errorf("error during fallback for failed pivot_root (%w): %w", pivotErr, err)
+		}
+		if err := unix.Chroot("."); err != nil {
+			err := &os.PathError{Op: "chroot into bind-mount", Path: rootfs, Err: err}
+			return fmt.Errorf("error during fallback for failed pivot_root (%w): %w", pivotErr, err)
+		}
+		// Re-try the pivot_root().
+		pivotErr = unix.PivotRoot(".", ".")
+	}
+	if pivotErr != nil {
+		return &os.PathError{Op: "pivot_root", Path: rootfs, Err: pivotErr}
 	}
 
 	// Currently our "." is oldroot (according to the current kernel code).
