feat: Clarify Rootless Runtime Requirements #4022
Containers are a wonderful Rube Goldberg machine of Linux internals and configuration [1]. This is especially true for container runtimes that support a "rootless" mode, where spawned processes are subject to constraints and limits that are not present when the runtime executes as root.

This change clarifies the existing documentation for launching KinD with a rootless runtime. The guidance is split into logical sections, providing context and justification for each recommended host change. Callouts are made for changes that impact networking components, such as Ingress and Gateway controllers. These generally appear to push default performance guardrails for user containers/processes.

[1] https://en.wikipedia.org/wiki/Rube_Goldberg_machine

Assisted-by: Cursor
Signed-off-by: Adam Kaplan <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: adambkaplan
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment
Welcome @adambkaplan!
Hi @adambkaplan. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label.
I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
> Gateway API has an [Ingress migration guide](https://gateway-api.sigs.k8s.io/guides/migrating-from-ingress/).
> **WARNING**: If you are using a [rootless container runtime], ensure your host is
> properly configured before creating the KinD cluster. Most Ingress and Gateway controllers will
This paragraph does not seem meaningful; no program works unless it is properly configured.
This is not specific to rootless, nor to ingress/gateway controllers.
kind's installation instructions and quickstart assume a rootful container runtime. There is no additional configuration required beyond installing said runtime and the `kind` CLI. See https://github.com/kubernetes-sigs/kind/blob/main/site/content/_index.md and https://github.com/kubernetes-sigs/kind/blob/main/site/content/docs/user/quick-start.md.
On Fedora (42) with podman (rootless), `kind create cluster` creates a control plane with no reported issues. You can even run most operators/helm charts and not have any problems. Most end users on Fedora would not think to consult the rootless guidance until problems arise. And in my experience, Ingress is where things fall apart in ways that are incredibly hard to debug.
> On Fedora (42) with podman (rootless), kind create cluster creates a control plane with no reported issues.

For me, `kind create cluster` with Podman failed on a clean installation of Fedora 42 (ARM).
- `/etc/subuid` and `/etc/subgid` were not configured
- `kind create cluster` fails with `ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"`
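For readers hitting the same failure, a minimal sketch of a pre-flight check for the subordinate ID ranges mentioned above. The function names `has_subid_entry` and `check_subids` are hypothetical helpers, not part of kind or Podman, and the check assumes entries are keyed by login name (the files may also be keyed by numeric UID, which this sketch does not handle).

```sh
#!/bin/sh
# has_subid_entry USER FILE
# Succeeds when FILE has a "USER:start:count" line for USER.
has_subid_entry() {
    user="$1"
    file="$2"
    [ -r "$file" ] || return 1
    awk -F: -v u="$user" \
        '$1 == u && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { found = 1 }
         END { exit !found }' "$file"
}

# check_subids USER [SUBUID_FILE] [SUBGID_FILE]
# Reports whether USER has a range in both files; returns non-zero if not.
check_subids() {
    user="$1"
    subuid="${2:-/etc/subuid}"
    subgid="${3:-/etc/subgid}"
    status=0
    for f in "$subuid" "$subgid"; do
        if has_subid_entry "$user" "$f"; then
            echo "ok: $user has a range in $f"
        else
            echo "missing: no range for $user in $f"
            status=1
        fi
    done
    return "$status"
}

# Example: check_subids "$(id -un)"
```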
I had to run `sudo usermod -aG systemd-journal $(whoami)` and re-login to allow `podman logs` to be functional.
Didn't you need that?
kind can auto-detect the [docker], [podman], or [nerdctl] installed and choose the available one. If you want to turn off the auto-detect, use the environment variable `KIND_EXPERIMENTAL_PROVIDER=docker`, `KIND_EXPERIMENTAL_PROVIDER=podman` or `KIND_EXPERIMENTAL_PROVIDER=nerdctl` to
select the runtime.

> **NOTE**: In some distributions (ex: Fedora), the container runtime operates in
No, the default mode depends on the runtime implementation, not on the host distribution.
Regardless of the host distribution, Podman and nerdctl default to rootless, while Docker defaults to rootful.
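Since the mode depends on the runtime rather than the distribution, it can be checked directly. The sketch below is illustrative only: `parse_rootless` and `runtime_is_rootless` are hypothetical helper names, and it assumes `podman info` exposes `.Host.Security.Rootless` and that rootless Docker lists `name=rootless` among its security options. nerdctl is left out because its output format was not verified here.

```sh
#!/bin/sh
# parse_rootless PROVIDER INFO_OUTPUT
# Interprets the output of the provider's info command.
# Returns 0 for rootless, 1 for rootful, 2 for an unsupported provider.
parse_rootless() {
    case "$1" in
        podman)
            [ "$2" = "true" ] ;;
        docker)
            case "$2" in
                *name=rootless*) return 0 ;;
                *) return 1 ;;
            esac ;;
        *)
            echo "unsupported provider: $1" >&2
            return 2 ;;
    esac
}

# runtime_is_rootless PROVIDER
# Queries the installed CLI, then delegates to parse_rootless.
runtime_is_rootless() {
    case "$1" in
        podman) out=$(podman info --format '{{.Host.Security.Rootless}}') || return 2 ;;
        docker) out=$(docker info --format '{{.SecurityOptions}}') || return 2 ;;
        *) out="" ;;
    esac
    parse_rootless "$1" "$out"
}

# Example: runtime_is_rootless podman && echo "podman is running rootless"
```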
Thanks for this - I am not familiar with nerdctl's capabilities. My experience is Fedora + Podman, and wanted to hedge wrt Podman on other Linux distributions. Will correct in a follow-up commit.
Also, depending on the host configuration, the following steps might be needed:
Your host may also need to enable [cgroup delegation](https://systemd.io/CGROUP_DELEGATION/) for daemon-based controller runtimes.
This is not required for daemonless runtimes, such as podman. Note that this procedure may
Untrue. What's the source of this misinformation? Is this from some hallucinating LLM?
No, just a human trying to figure out how all this stuff works. I don't have this setting enabled, and my kind clusters (Fedora 42 + podman) seem to behave well once the other guidance in this article is adopted.
My key questions are:
- When are systemd user services involved when running rootless kind clusters?
- What are the consequences of not enabling cgroup delegation for user services here?
I don't think an LLM (Gemini in this case) is hallucinating when it claims "Podman itself doesn't have a hard requirement on systemd (it can run without it)." Correct me if I'm wrong: if `kind create cluster` merely spawns containers by invoking Podman directly, then there is no systemd user service involved. If true, then this recommendation a) does nothing for kind, and b) has potentially undesirable side effects for other user-scoped systemd services.
Invoking podman containers through `podman.socket` and the API service (a daemon) is a separate matter. There the user-scoped service does the right thing and enables cgroup delegation:
$ cat /usr/lib/systemd/user/podman.service
[Unit]
Description=Podman API Service
Requires=podman.socket
After=podman.socket
Documentation=man:podman-system-service(1)
StartLimitIntervalSec=0
[Service]
Delegate=true
Type=exec
KillMode=process
Environment=LOGGING="--log-level=info"
ExecStart=/usr/bin/podman $LOGGING system service
[Install]
WantedBy=default.target
I would hope that rootless Docker and rootless nerdctl/containerd do similar things for their systemd services these days, in which case maybe the guidance here is obsolete?
systemd seems to have begun enabling `cpu` delegation by default:
systemd/systemd@b8df7f8
This is a matter of the default configuration of systemd, not of Docker/Podman/nerdctl.
Delegation no longer seems to need manual configuration for systemd >= 252
(Ubuntu >= 23.04, Debian >= 12, RHEL >= 9, ...).
Confirmed with a clean installation of Ubuntu 25.04 (systemd 257) with Rootless Docker.
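A host can be checked against that observation before touching any systemd drop-ins. This is a rough sketch under stated assumptions: `systemd_major` and `has_controller` are hypothetical helpers, the 252 threshold is taken from the comment above rather than verified independently, and the cgroup path in the usage comment assumes cgroup v2 with a systemd user session.

```sh
#!/bin/sh
# systemd_major VERSION_LINE
# Prints the major version from the first line of `systemctl --version`,
# e.g. "systemd 257 (257.4-1)" gives 257.
systemd_major() {
    # Intentionally unquoted: split the line into words.
    set -- $1
    # Drop everything from the first non-digit onwards.
    echo "${2%%[!0-9]*}"
}

# has_controller NAME CONTROLLER_LIST
# Succeeds when NAME is a whole word in the space-separated list.
has_controller() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on a live host:
#   major=$(systemd_major "$(systemctl --version | head -n 1)")
#   list=$(cat "/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers")
#   [ "$major" -ge 252 ] && has_controller cpu "$list" && echo "cpu is delegated"
```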
1. As root, create the directory `/etc/systemd/system/user@.service.d/` if it does not already exist

```sh
sudo mdkir -p /etc/systemd/system/user@.service.d/
```
typo: mdkir
Isn't it obvious that `mkdir` is needed when the directory does not exist?
Convenience for users - code blocks are easy to copy/paste.
### Networking

- Create `/etc/modules-load.d/iptables.conf` with the following content:
Containers running in rootless mode are not typically loaded with host-level iptable modules.
Disagree with "typically"
My Linux experience is almost exclusively with Fedora. I am not familiar with what kernel mods are loaded by default in other distributions.
A clean installation of Ubuntu 25.04 seems to load `ip_tables` by default.
- Create `/etc/modules-load.d/iptables.conf` with the following content:
Containers running in rootless mode are not typically loaded with host-level iptable modules.
This breaks the behavior of most Ingress and Gateway controllers.
Not specific to these controllers
Better to phrase as "This breaks the behavior of many networking components, such as Ingress and Gateway controllers"?
Containers running in rootless mode are not typically loaded with host-level iptable modules.
This breaks the behavior of most Ingress and Gateway controllers.

To load the iptable modules into the KinD containers, do the following:
No, kernel modules are loaded to the host kernel, not into "the KinD containers"
My question - why isn't this procedure needed for rootful runtimes? How are these kernel modules getting loaded dynamically?
As with the other modifications here, these are system-level changes that may have undesirable side effects. I think it's helpful if end users understand the "deeper why" behind these changes. For all I know, this is a feature gap that could be fixed somewhere in the stack.
Because the modules are loaded on demand in the case of rootful
```
fs.inotify.max_user_instances = 512
```
2. Restart your system for these changes to take effect.
Why not `sysctl --system`?
Will add in a follow-up commit.
s/KinD/kind/g
- If using podman, be aware that by default there is a [limit](https://docs.podman.io/en/v4.3/markdown/options/pids-limit.html#pids-limit-limit) to the number of pids that can be created. This can cause problems like nginx workers inside a container not spawning correctly.
- If you want to disable this limit, edit your `containers.conf` file (generally located in `/etc/containers/containers.conf`). Note that this could cause things like pid exhaustion to happen on the host machine. Alternatively, change `0` to your desired new limit:
2. Restart your system to ensure these changes take effect.
`sudo systemctl restart systemd-modules-load.service` should suffice.
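After restarting `systemd-modules-load.service`, it may help to confirm that the modules named in the drop-in actually ended up in the host kernel. A minimal sketch, assuming the one-module-per-line format with `#` or `;` comments; `missing_modules` is a hypothetical helper name. Modules built into the kernel do not appear in `/proc/modules`, so they would be reported as missing even though they are available.

```sh
#!/bin/sh
# missing_modules CONF [PROC_MODULES]
# Prints every module listed in CONF that is absent from PROC_MODULES
# (default: /proc/modules, which lists loaded modules, name first).
missing_modules() {
    conf="$1"
    loaded="${2:-/proc/modules}"
    # Skip blank lines and comment lines starting with '#' or ';'.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*[#;]' "$conf" |
    while read -r mod; do
        # The kernel reports module names with underscores, not hyphens.
        mod=$(echo "$mod" | tr - _)
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$loaded"; then
            echo "$mod"
        fi
    done
}

# Example: missing_modules /etc/modules-load.d/iptables.conf
```

An empty result means every listed module is loaded, which on the host side is all that the drop-in is meant to achieve.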
## Host requirements

### cgroups v2
Suggested change:
- ### cgroups v2
+ ### cgroup v2
For consistency with the other occurrences
Starting with kind 0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/), [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) and [Rootless nerdctl](https://github.com/containerd/nerdctl/blob/main/docs/rootless.md) can be used as the node provider of kind.

## Provider requirements

?
Markdown style consistency - I like having an empty line after a heading. This doesn't appear to impact the site rendering by Hugo (see the deploy preview).
Indeed, if we ever decided to enable a markdown linter (highly unlikely) it would complain about there not being a blank line between headers, code blocks, etc.
That said, unrelated changes to the file do make it slightly harder to review, but...
Contains several pieces of misinformation
2. Restart your system for these changes to take effect.
### Allow Unprivileged Binding to HTTP(S) Ports
Not specific to HTTP(S)
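The limit applies to every port below the kernel's unprivileged threshold, not only 80 and 443. A small sketch of how that threshold could be inspected; `unprivileged_port_start` and `can_bind_unprivileged` are hypothetical helper names, and the sketch assumes a Linux host that exposes `net.ipv4.ip_unprivileged_port_start` under `/proc/sys`.

```sh
#!/bin/sh
# unprivileged_port_start
# Prints the lowest port that unprivileged processes may bind on this host.
unprivileged_port_start() {
    cat /proc/sys/net/ipv4/ip_unprivileged_port_start
}

# can_bind_unprivileged PORT [START]
# Succeeds when PORT is at or above the threshold. START defaults to the
# live kernel value; passing it explicitly keeps the check testable.
can_bind_unprivileged() {
    port="$1"
    start="${2:-$(unprivileged_port_start)}"
    [ "$port" -ge "$start" ]
}

# Example:
#   for p in 53 80 443 8080; do
#       can_bind_unprivileged "$p" || echo "port $p needs a lower threshold"
#   done
```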
Some broader context behind this PR: I run on Fedora 42 with podman, which is rootless by default. It is just a process; no backend daemon or systemd service is required.
Today `kind create cluster` just works ™️ - until you try to use a Gateway or Ingress controller. I spent practically a full working day hitting failure after failure getting NGINX or Contour working. Some problems were straightforward to understand (can't bind to privileged port 80 by default, duh). Others were impossible-to-debug "connection reset by peer" errors, or arbitrary/flaky behavior.
Following the guidance in the rootless article got this working, sans enabling cgroup delegation for all user services. I would have saved myself a lot of trouble if I had been directed to this page from the main home page, the quickstart guide, or the Ingress article.
Fixes #3451