adambkaplan

Containers are a wonderful Rube Goldberg machine of Linux internals and configuration [1]. This is especially true for container runtimes that support a "rootless" mode, where spawned processes are subject to constraints and limits that are not present when the runtime executes as root.

This change clarifies the existing documentation for launching KinD with a rootless runtime. The guidance is split into logical sections, each providing context and justification for the recommended host change. Changes that affect networking components, such as Ingress and Gateway controllers, are called out explicitly. These components generally appear to push against the default performance guardrails for user containers and processes.

[1] https://en.wikipedia.org/wiki/Rube_Goldberg_machine

Fixes #3451

Assisted-by: Cursor
Signed-off-by: Adam Kaplan <[email protected]>
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adambkaplan
Once this PR has been reviewed and has the lgtm label, please assign stmcginnis for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 30, 2025
@k8s-ci-robot
Contributor

Welcome @adambkaplan!

It looks like this is your first PR to kubernetes-sigs/kind 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kind has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 30, 2025
@k8s-ci-robot
Contributor

Hi @adambkaplan. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 30, 2025
@aojea
Contributor

aojea commented Sep 30, 2025

/ok-to-test
/assign @AkihiroSuda

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 30, 2025
> Gateway API has an [Ingress migration guide](https://gateway-api.sigs.k8s.io/guides/migrating-from-ingress/).
> **WARNING**: If you are using a [rootless container runtime], ensure your host is
> properly configured before creating the KinD cluster. Most Ingress and Gateway controllers will
Member

This paragraph does not seem meaningful; no program works unless it is properly configured.
This is not specific to rootless, nor to ingress/gateway controllers.

Author

kind's installation instructions and quickstart assume a rootful container runtime. There is no additional configuration required beyond installing said runtime and the kind CLI. See https://github.com/kubernetes-sigs/kind/blob/main/site/content/_index.md and https://github.com/kubernetes-sigs/kind/blob/main/site/content/docs/user/quick-start.md.

On Fedora (42) with podman (rootless), kind create cluster creates a control plane with no reported issues. You can even run most operators/helm charts and not have any problems. Most end users on Fedora would not think to consult the rootless guidance until problems arise. And in my experience, Ingress is where things fall apart in ways that are incredibly hard to debug.

Member

> On Fedora (42) with podman (rootless), kind create cluster creates a control plane with no reported issues.

For me, kind create cluster with Podman failed on a clean installation of Fedora 42 (ARM):

  • /etc/subuid and /etc/subgid were not configured
  • kind create cluster fails with ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

Member

I had to run sudo usermod -aG systemd-journal $(whoami) and re-login to allow podman logs to be functional.

Didn't you need that?
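
The two prerequisites raised in this thread (subordinate ID ranges and `systemd-journal` membership) can be checked before running `kind create cluster`. The following is a minimal sketch, not part of the PR: the helper names `has_subid_range` and `in_group_list` are hypothetical, and it assumes entries in `/etc/subuid` and `/etc/subgid` are keyed by user name rather than numeric UID.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeeds if FILE has a subordinate ID range for USER.
# Lines in /etc/subuid and /etc/subgid have the form "name:start:count".
# Assumption: entries keyed by numeric UID are not handled here.
has_subid_range() {
  local user="$1" file="$2"
  [ -r "$file" ] && awk -F: -v u="$user" \
    '$1 == u && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { found = 1 } END { exit !found }' "$file"
}

# Hypothetical helper: succeeds if GROUP appears in a space-separated list.
in_group_list() {
  local group="$1" list="$2"
  case " ${list} " in
    *" ${group} "*) return 0 ;;
    *) return 1 ;;
  esac
}

user="$(id -un)"
for f in /etc/subuid /etc/subgid; do
  if has_subid_range "$user" "$f"; then
    echo "ok: $f has a range for $user"
  else
    echo "missing: no range for $user in $f"
  fi
done

if in_group_list systemd-journal "$(id -Gn)"; then
  echo "ok: $user is in systemd-journal"
else
  echo "note: $user is not in systemd-journal"
fi
```

The script only reads state; it does not modify `/etc/subuid`, `/etc/subgid`, or group membership.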

kind can auto-detect the [docker], [podman], or [nerdctl] installed and choose the available one. If you want to turn off the auto-detect, use the environment variable `KIND_EXPERIMENTAL_PROVIDER=docker`, `KIND_EXPERIMENTAL_PROVIDER=podman` or `KIND_EXPERIMENTAL_PROVIDER=nerdctl` to
select the runtime.

> **NOTE**: In some distributions (ex: Fedora), the container runtime operates in
Member

No, the default mode depends on the runtime implementation, not on the host distribution.
Regardless of the host distribution, Podman and nerdctl default to rootless, and Docker defaults to rootful.

Author

Thanks for this - I am not familiar with nerdctl's capabilities. My experience is Fedora + Podman, and wanted to hedge wrt Podman on other Linux distributions. Will correct in a follow-up commit.
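
Since the mode depends on the runtime rather than the distribution, it can be queried from the runtime itself. The sketch below is illustrative only: `docker_opts_rootless` is a hypothetical helper, nerdctl is left out, and the script assumes that `podman info --format '{{.Host.Security.Rootless}}'` and the `name=rootless` entry in Docker's `SecurityOptions` behave as they do in current releases.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the text printed by
# `docker info --format '{{.SecurityOptions}}'` contains the rootless option.
docker_opts_rootless() {
  case "$1" in
    *name=rootless*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v podman >/dev/null 2>&1; then
  # Podman reports "true" or "false" for this field.
  echo "podman rootless: $(podman info --format '{{.Host.Security.Rootless}}' 2>/dev/null)"
fi

if command -v docker >/dev/null 2>&1; then
  opts="$(docker info --format '{{.SecurityOptions}}' 2>/dev/null || true)"
  if docker_opts_rootless "$opts"; then
    echo "docker rootless: true"
  else
    echo "docker rootless: false (or the daemon is unreachable)"
  fi
fi
```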


Also, depending on the host configuration, the following steps might be needed:
Your host may also need to enable [cgroup delegation](https://systemd.io/CGROUP_DELEGATION/) for daemon-based controller runtimes.
This is not required for daemonless runtimes, such as podman. Note that this procedure may
Member

Untrue. What's the source of this misinformation? Is this from some hallucinating LLM?

Author

No, just a human trying to figure out how all this stuff works. I don't have this setting enabled, and my kind clusters (Fedora 42 + podman) seem to behave well once the other guidance in this article is adopted.

My key questions are:

  1. When are systemd user services involved when running rootless kind clusters?
  2. What are the consequences of not enabling cgroup delegation for user services here?

I don't think an LLM (Gemini in this case) is hallucinating when it claims "Podman itself doesn't have a hard requirement on systemd (it can run without it)." Correct me if I'm wrong: if kind create cluster merely spawns containers by invoking Podman directly, then there is no systemd user service involved. If true, then this recommendation a) does nothing for kind, and b) has potentially undesirable side effects for other user-scoped systemd services.

Invoking podman containers through podman.socket and the API service (a daemon) is a separate matter. There the user-scoped service does the right thing and enables cgroup delegation:

$ cat /usr/lib/systemd/user/podman.service 
[Unit]
Description=Podman API Service
Requires=podman.socket
After=podman.socket
Documentation=man:podman-system-service(1)
StartLimitIntervalSec=0

[Service]
Delegate=true
Type=exec
KillMode=process
Environment=LOGGING="--log-level=info"
ExecStart=/usr/bin/podman $LOGGING system service

[Install]
WantedBy=default.target

I would hope that rootless Docker and rootless nerdctl/containerd do similar things for their systemd services these days, in which case maybe the guidance here is obsolete?

Member

@AkihiroSuda AkihiroSuda Oct 2, 2025

systemd seems to have begun to enable cpu delegation by default:
systemd/systemd@b8df7f8

This is a matter of systemd's default configuration, not of Docker/Podman/nerdctl.

Member

@AkihiroSuda AkihiroSuda Oct 2, 2025

Delegation no longer seems to need manual configuration for systemd >= 252
(Ubuntu >= 23.04, Debian >= 12, RHEL >= 9, ...).

Confirmed with a clean installation of Ubuntu 25.04 (systemd 257) with Rootless Docker.
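
A host can be compared against the systemd >= 252 threshold mentioned above, and the controllers actually delegated to the user session can be inspected. This is a rough sketch under stated assumptions: `systemd_at_least` is a hypothetical helper, the version is taken from the second field of the first line of `systemctl --version`, and the `cgroup.controllers` path assumes cgroup v2 with a running systemd user session.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the first line of `systemctl --version`
# output (e.g. "systemd 257 (257.4-1ubuntu3)") reports at least MIN.
# Returns 2 when no numeric version can be parsed.
systemd_at_least() {
  local min="$1" first_line="$2" ver
  ver="$(printf '%s\n' "$first_line" | awk 'NR == 1 { print $2 }')"
  case "$ver" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$ver" -ge "$min" ]
}

if command -v systemctl >/dev/null 2>&1; then
  if systemd_at_least 252 "$(systemctl --version | head -n 1)"; then
    echo "systemd >= 252: manual delegation is probably unnecessary"
  else
    echo "older or unrecognized systemd: the drop-in may still be needed"
  fi
fi

# Controllers delegated to the current user's systemd instance (cgroup v2).
ctrl="/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers"
if [ -r "$ctrl" ]; then
  echo "delegated controllers: $(cat "$ctrl")"
else
  echo "cannot read $ctrl (no systemd user session, or not cgroup v2?)"
fi
```

If `cpu` is missing from the delegated controllers on an older systemd, that is the case the delegation drop-in is meant to address.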

1. As root, create the directory `/etc/systemd/system/[email protected]/` if it does not already exist

```sh
sudo mdkir -p /etc/systemd/system/[email protected]/
Member

typo: mdkir

Member

@AkihiroSuda AkihiroSuda Oct 1, 2025

Isn't it obvious that mkdir is needed when the directory does not exist?

Author

Convenience for users - code blocks are easy to copy/paste.

### Networking

- Create `/etc/modules-load.d/iptables.conf` with the following content:
Containers running in rootless mode are not typically loaded with host-level iptable modules.
Member

Disagree with "typically"

Author

My Linux experience is almost exclusively with Fedora. I am not familiar with what kernel mods are loaded by default in other distributions.

Member

A clean installation of Ubuntu 25.04 seems to load ip_tables by default


- Create `/etc/modules-load.d/iptables.conf` with the following content:
Containers running in rootless mode are not typically loaded with host-level iptable modules.
This breaks the behavior of most Ingress and Gateway controllers.
Member

Not specific to these controllers

Author

Better to phrase as "This breaks the behavior of many networking components, such as Ingress and Gateway controllers"?

Containers running in rootless mode are not typically loaded with host-level iptable modules.
This breaks the behavior of most Ingress and Gateway controllers.

To load the iptable modules into the KinD containers, do the following:
Member

No, kernel modules are loaded to the host kernel, not into "the KinD containers"

Author

My question - why isn't this procedure needed for rootful runtimes? How are these kernel modules getting loaded dynamically?

As with the other modifications here, these are system-level changes that may have undesirable side effects. I think it's helpful if end users understand the "deeper why" behind these changes. For all I know, this is a feature gap that could be fixed somewhere in the stack.

Member

Because the modules are loaded on demand in the case of rootful

fs.inotify.max_user_instances = 512
```
2. Restart your system for these changes to take effect.
Member

Why not sysctl --system ?

Author

Will add in a follow-up commit.
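
For reference, the reboot-free path suggested above might look like the following sketch. The helper `sysctl_at_least` and the `PROC_ROOT` override are hypothetical (the override exists only so the function can be exercised without root); the key and the value 512 come from the diff excerpt under review.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the live value of a sysctl KEY is at least WANT.
# Returns 2 when the key cannot be read.
sysctl_at_least() {
  local key="$1" want="$2" root="${PROC_ROOT:-/proc/sys}" path cur
  # Map the dotted key to its /proc/sys path, e.g. fs.inotify.x -> fs/inotify/x.
  path="${root}/$(printf '%s' "$key" | tr . /)"
  [ -r "$path" ] || return 2
  cur="$(cat "$path")"
  [ "$cur" -ge "$want" ]
}

# Reload every sysctl.d drop-in without rebooting (needs root):
#   sudo sysctl --system

if sysctl_at_least fs.inotify.max_user_instances 512; then
  echo "fs.inotify.max_user_instances is already >= 512"
else
  echo "value is below 512 or unreadable; add the drop-in, then run 'sudo sysctl --system'"
fi
```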

Member

s/KinD/kind/g


- If using podman, be aware that by default there is a [limit](https://docs.podman.io/en/v4.3/markdown/options/pids-limit.html#pids-limit-limit) to the number of pids that can be created. This can cause problems like nginx workers inside a container not spawning correctly.
- If you want to disable this limit, edit your `containers.conf` file (generally located in `/etc/containers/containers.conf`). Note that this could cause things like pid exhaustion to happen on the host machine. Alternatively, change `0` to your desired new limit:
2. Restart your system to ensure these changes take effect.
Member

sudo systemctl restart systemd-modules-load.service should suffice
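
After restarting `systemd-modules-load.service`, the result can be verified against `/proc/modules`. This is a minimal sketch: `module_loaded` is a hypothetical helper, and only `ip_tables` (the module named earlier in this thread) is checked; extend the loop with whatever modules your `iptables.conf` lists.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeeds if NAME is listed in a /proc/modules-style file.
# Modules compiled into the kernel never appear there, so a miss is not conclusive.
module_loaded() {
  local name="$1" file="${2:-/proc/modules}"
  [ -r "$file" ] && awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Re-read /etc/modules-load.d/*.conf without rebooting (needs root):
#   sudo systemctl restart systemd-modules-load.service

for mod in ip_tables; do
  if module_loaded "$mod"; then
    echo "loaded: $mod"
  else
    echo "not listed: $mod"
  fi
done
```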


## Host requirements

### cgroups v2
Member

Suggested change
- ### cgroups v2
+ ### cgroup v2

For consistency with the other occurrences

Starting with kind 0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/), [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) and [Rootless nerdctl](https://github.com/containerd/nerdctl/blob/main/docs/rootless.md) can be used as the node provider of kind.

## Provider requirements

Member

?

Author

Markdown style consistency - I like having an empty line after a heading. This doesn't appear to impact the site rendering by Hugo (see the deploy preview).

Contributor

Indeed, if we ever decided to enable a markdown linter (highly unlikely) it would complain about there not being a blank line between headers, code blocks, etc.

That said, unrelated changes to the file do make it slightly harder to review, but...

Member

@AkihiroSuda AkihiroSuda left a comment

Contains several pieces of misinformation

2. Restart your system for these changes to take effect.
### Allow Unprivileged Binding to HTTP(S) Ports
Member

Not specific to HTTP(S)
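
To illustrate why the restriction is not specific to ports 80 and 443: on Linux, every port below the value of `net.ipv4.ip_unprivileged_port_start` requires privileges to bind. The sketch below assumes that this sysctl is what the section under review adjusts (the excerpt does not show it); `can_bind_unprivileged` and the sample port list are hypothetical.

```sh
#!/usr/bin/env bash
# Hypothetical helper: succeeds if an unprivileged process may bind PORT,
# given START, the value of net.ipv4.ip_unprivileged_port_start.
can_bind_unprivileged() {
  local port="$1" start="$2"
  [ "$port" -ge "$start" ]
}

start_file=/proc/sys/net/ipv4/ip_unprivileged_port_start
if [ -r "$start_file" ]; then
  start="$(cat "$start_file")"
  # Sample ports only; any port below the threshold is affected.
  for port in 53 80 443 8080; do
    if can_bind_unprivileged "$port" "$start"; then
      echo "port $port: bindable without privileges"
    else
      echo "port $port: needs privileges (threshold is $start)"
    fi
  done
fi
```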

Author

@adambkaplan adambkaplan left a comment

Some broader context behind this PR: I run Fedora 42 with podman, which is rootless by default. It is just a process; no backend daemon or systemd service is required.

Today kind create cluster just works ™️ - until you try to use a Gateway or Ingress controller. I spent practically a full working day hitting failure after failure getting NGINX or Contour working. Some problems were straightforward to understand (can't bind to privileged port 80 by default, duh). Others were impossible-to-debug "connection reset by peer" errors, or arbitrary/flaky behavior.

Following the guidance in the rootless article got this working, sans enabling cgroup delegation for all user services. I would have saved myself a lot of trouble if I had been directed to this page from the main home page, the quickstart guide, or the Ingress article.

Successfully merging this pull request may close these issues.

Flaky ingress behavior using ingress-nginx and rootless podman
5 participants