From 1494d7d4830bab12ed0a5216d9760bf0a3f17d92 Mon Sep 17 00:00:00 2001
From: Adam Kaplan
Date: Tue, 30 Sep 2025 17:59:02 -0400
Subject: [PATCH] feat: Clarify Rootless Runtime Requirements

Containers are a wonderful Rube Goldberg machine of Linux internals and
configuration [1]. This is especially true for container runtimes that
support a "rootless" mode, where spawned processes are subject to
constraints and limits that are not present when the runtime executes as
root.

This change clarifies the existing documentation for launching KinD with
a rootless runtime. The guidance is split into logical sections,
providing context and justification for each recommended host change.

Callouts are made for changes that impact networking components, such as
Ingress and Gateway controllers. These generally appear to push default
performance guardrails for user containers/processes, or require access
to privileged components of a Linux system.

Additional host requirements were added based on community review. Some
of these are met by running more recent versions of popular Linux
distributions, with recommended minimum versions for Ubuntu, Fedora, and
Arch Linux. For those running older versions or other distributions,
specific instructions were added to enable cgroup v2 and systemd CPU
delegation.

[1] https://en.wikipedia.org/wiki/Rube_Goldberg_machine

Assisted-by: Cursor
Signed-off-by: Adam Kaplan
Co-authored-by: Akihiro Suda
---
 site/content/docs/user/ingress.md     |   5 +
 site/content/docs/user/quick-start.md |   5 +-
 site/content/docs/user/rootless.md    | 221 ++++++++++++++++++++++----
 3 files changed, 198 insertions(+), 33 deletions(-)

diff --git a/site/content/docs/user/ingress.md b/site/content/docs/user/ingress.md
index 3feafb56a2..5bede046e9 100644
--- a/site/content/docs/user/ingress.md
+++ b/site/content/docs/user/ingress.md
@@ -21,6 +21,10 @@ Ingress exposes HTTP and HTTPS routes from outside the cluster to services withi
 > **NOTE**: You may also want to consider using [Gateway API](https://gateway-api.sigs.k8s.io/) instead of Ingress.
 > Gateway API has an [Ingress migration guide](https://gateway-api.sigs.k8s.io/guides/migrating-from-ingress/).
 
+> **WARNING**: If you are using a [rootless container runtime], ensure your host is
+> properly configured before creating the KIND cluster. Most Ingress and Gateway controllers will
+> not work if these steps are skipped.
+
 ### Create Cluster
 
 #### Option 1: LoadBalancer
@@ -139,3 +143,4 @@ curl localhost/bar
 
 [LoadBalancer]: /docs/user/loadbalancer/
 [Cloud Provider KIND]: /docs/user/loadbalancer/
+[rootless container runtime]: /docs/user/rootless/
diff --git a/site/content/docs/user/quick-start.md b/site/content/docs/user/quick-start.md
index d105822312..86262ea428 100644
--- a/site/content/docs/user/quick-start.md
+++ b/site/content/docs/user/quick-start.md
@@ -160,6 +160,9 @@ More usage can be discovered with `kind create cluster --help`.
 
 kind can auto-detect the [docker], [podman], or [nerdctl] installed and choose the available one. If you want to turn off the auto-detect, use the environment variable `KIND_EXPERIMENTAL_PROVIDER=docker`, `KIND_EXPERIMENTAL_PROVIDER=podman` or `KIND_EXPERIMENTAL_PROVIDER=nerdctl` to select the runtime.
 
+> **NOTE**: podman and nerdctl operate in [rootless mode](/docs/user/rootless) by default. Extra
+> setup is needed for KIND clusters to be fully functional.
+
 ## Interacting With Your Cluster
 
 After [creating a cluster](#creating-a-cluster), you can use [kubectl][kubectl]
@@ -501,4 +504,4 @@ kind, the Kubernetes cluster itself, etc.
 [Private Registries]: /docs/user/private-registries
 [customize control plane with kubeadm]: https://kubernetes.io/docs/setup/independent/control-plane-flags/
 [access multiple clusters]: https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/
-[release notes]: https://github.com/kubernetes-sigs/kind/releases
\ No newline at end of file
+[release notes]: https://github.com/kubernetes-sigs/kind/releases
diff --git a/site/content/docs/user/rootless.md b/site/content/docs/user/rootless.md
index 5b341b6fa6..ec44ef5712 100644
--- a/site/content/docs/user/rootless.md
+++ b/site/content/docs/user/rootless.md
@@ -9,57 +9,214 @@ menu:
 Starting with kind 0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/), [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) and [Rootless nerdctl](https://github.com/containerd/nerdctl/blob/main/docs/rootless.md) can be used as the node provider of kind.
 
 ## Provider requirements
+
 - Docker: 20.10 or later
 - Podman: 3.0 or later
 - nerdctl: 1.7 or later
 
 ## Host requirements
-The host needs to be running with cgroup v2.
-Make sure that the result of the `docker info` command contains `Cgroup Version: 2`.
-If it prints `Cgroup Version: 1`, try adding `GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"` to `/etc/default/grub` and
-running `sudo update-grub` to enable cgroup v2.
 
 Also, depending on the host configuration, the following steps might be needed:
 
+### cgroup v2
+
+The host needs to be running with cgroup v2, which is the default for many Linux distributions:
+
+- Ubuntu: 21.10 and later.
+- Fedora: 31 and later.
+- Arch: April 2021 release and later.
+
+You can verify the cgroup version used by your container runtime with the following procedure:
+
+- `docker`: Run `docker info` and look for `Cgroup Version: 2` in the output.
+- `podman`: Run `podman info` and look for `cgroupVersion: v2` in the output.
+- `nerdctl`: Run `nerdctl info` and look for `Cgroup Version: 2` in the output.
+
+If the `info` output prints `Cgroup Version: 1` or equivalent, try the following to enable cgroup v2:
+
+1. In `/etc/default/grub`, add the line `GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"`
+2. Run `sudo update-grub`, then reboot to enable cgroup v2.
+
+Your host will also need to enable [cgroup delegation](https://systemd.io/CGROUP_DELEGATION/) of the `cpu` controller for
+user services. This is enabled by default for distributions running `systemd` version 252 and higher.
+
+To enable cgroup delegation for all the controllers, do the following:
+
+1. Check your version of `systemd` by running `systemctl --version`. If the output prints
+   `systemd 252` or higher, no further action is needed. Example output below from a Fedora host:
+
+   ```sh
+   $ systemctl --version
+   systemd 257 (257.9-2.fc42)
+   ```
+
+2. For systems with older versions of `systemd`, first create the directory
+   `/etc/systemd/system/user@.service.d/` if it is not present.
+
+   ```sh
+   sudo mkdir -p /etc/systemd/system/user@.service.d/
+   ```
+
+3. Next, create the file `/etc/systemd/system/user@.service.d/delegate.conf` with the following content:
+
+   ```ini
+   [Service]
+   Delegate=yes
+   ```
+
+4. Reload systemd for these changes to take effect:
+
+   ```sh
+   sudo systemctl daemon-reload
+   ```
+
+5. If using docker, restart the user docker daemon:
+
+   ```sh
+   systemctl --user restart docker
+   ```
+
+### Networking
+
+Containers running in rootless mode may not have the host-level iptables kernel modules loaded.
+This breaks the behavior of most networking components, such as Ingress and Gateway controllers.
+
+To load the iptables modules, do the following:
+
+1. First, use `lsmod` to check which kernel modules are currently loaded on
+   your system. Use `grep` to find which iptables modules are loaded:
+
+   ```sh
+   lsmod | grep "ip.*table"
+   ```
+
+2. Check the output for the following kernel modules:
+   - `ip6_tables`
+   - `ip6table_nat`
+   - `ip_tables`
+   - `iptable_nat`
+
+3. If one or more of the kernel modules above are not present, your system needs to load them at
+   startup. First, run the following command to list these modules in a module loading configuration:
+
+   ```sh
+   sudo tee /etc/modules-load.d/iptables.conf > /dev/null <<'EOF'
+   ip6_tables
+   ip6table_nat
+   ip_tables
+   iptable_nat
+   EOF
+   ```
-- Create `/etc/systemd/system/user@.service.d/delegate.conf` with the following content, and then run `sudo systemctl daemon-reload`:
+4. Check that the new module loading configuration is correct. You should see the following output:
 
-  ```ini
-  [Service]
-  Delegate=yes
-  ```
+   ```sh
+   $ cat /etc/modules-load.d/iptables.conf
+   ip6_tables
+   ip6table_nat
+   ip_tables
+   iptable_nat
+   ```
 
-  (This is not enabled by default because ["the runtime impact of
-  [delegating the "cpu" controller] is still too
-  high"](https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZMKLS7SHMRJLJ57NZCYPBAQ3UOYULV65/).
-  Beware that changing this configuration may affect system
-  performance.)
+5. Next, restart the `systemd-modules-load` service to make these changes effective immediately:
 
-  Please note that:
+   ```sh
+   sudo systemctl restart systemd-modules-load.service
+   ```
 
-  - `/etc/systemd/system/user@.service.d/` directory needs to be created if not already present on your host
-  - If using Docker and it was already running when this step was done, a restart is needed for the changes to take
-    effect
-    {{< codeFromInline lang="bash" >}}
-    systemctl --user restart docker
-    {{< /codeFromInline >}}
+6. Alternatively, restart your system to ensure these changes take effect.
 
-- Create `/etc/modules-load.d/iptables.conf` with the following content:
+### Increase PID Limits
 
-  ```
-  ip6_tables
-  ip6table_nat
-  ip_tables
-  iptable_nat
-  ```
+KIND nodes are represented as individual containers on their hosts. Runtimes such as podman set
+default [process id limits](https://docs.podman.io/en/v4.3/markdown/options/pids-limit.html#pids-limit-limit)
+that may be too low for the node or for a pod running on the node. The Ingress NGINX Controller is
+[particularly susceptible](https://github.com/kubernetes-sigs/kind/issues/3451) to this issue.
 
-- If using podman, be aware that by default there is a [limit](https://docs.podman.io/en/v4.3/markdown/options/pids-limit.html#pids-limit-limit) to the number of pids that can be created. This can cause problems like nginx workers inside a container not spawning correctly.
-  - If you want to disable this limit, edit your `containers.conf` file (generally located in `/etc/containers/containers.conf`). Note that this could cause things like pid exhaustion to happen on the host machine. Alternatively, change `0` to your desired new limit:
+To increase the PID limit, do the following:
+
+1. If using podman, edit your `containers.conf` file (generally located in
+   `/etc/containers/containers.conf` or `~/.config/containers/containers.conf`) to increase the PIDs
+   limit to a desired value (default 4096 on most systems):
 
     ```ini
     [containers]
-    pids_limit = 0
+    pids_limit = 65536
     ```
 
+2. Recreate the KIND cluster for these changes to take effect:
+
+   ```sh
+   kind delete cluster && kind create cluster
+   ```
+
+### Increase inotify Limits
+
+As documented in [known issues](/docs/user/known-issues/#pod-errors-due-to-too-many-open-files), pods may
+fail by reaching inotify watch and instance limits. Ingress controllers such as NGINX and Contour
+are particularly susceptible to this issue.
+
+To increase the inotify limits, do the following:
+
+1. As root, create a `.conf` file in `/etc/sysctl.d` that increases the `fs.inotify` max user settings:
+
+   ```
+   fs.inotify.max_user_watches = 524288
+   fs.inotify.max_user_instances = 512
+   ```
+
+2. Reload `sysctl` for these changes to take effect:
+
+   ```sh
+   sudo sysctl --system
+   ```
+
+Alternatively, restart your system for these changes to take effect.
+
+
+### Allow Binding to Privileged Ports
+
+If you use the `extraPortMappings` method to provide ingress to your KIND cluster, you can allow
+the KIND node container to bind to ports 80 and 443 on the host. User containers cannot bind to
+ports below 1024 by default as they are considered privileged.
+
+You can avoid this issue by binding the node to a non-privileged host port, such as 8080 or 8443:
+
+```yaml
+# kind config.yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+- role: control-plane
+  extraPortMappings:
+  - containerPort: 80
+    hostPort: 8080
+    protocol: TCP
+  - containerPort: 443
+    hostPort: 8443
+    protocol: TCP
+```
+
+Note that with this configuration, requests to your cluster ingress will need to add the
+appropriate port number. In the example above, HTTP requests must use `localhost:8080` in the URL.
+
+To allow a KIND node to bind to ports 80 and/or 443 on the host, do the following:
+
+1. As root, create a `.conf` file in `/etc/sysctl.d` that lowers the privileged port start number:
+
+   ```
+   # Allow unprivileged binding to HTTP port 80
+   # Use 443 if you only need binding to the default HTTPS port
+   net.ipv4.ip_unprivileged_port_start=80
+   ```
+
+2. Reload `sysctl` for these changes to take effect:
+
+   ```sh
+   sudo sysctl --system
+   ```
+
+Alternatively, restart your system for these changes to take effect.
+
+
 ## Restrictions
 
 The restrictions of Rootless Docker apply to kind clusters as well.
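
Review note: the host checks this patch documents could also be scripted. The following is a minimal sketch and not part of the change itself. The function names (`is_cgroup_v2`, `missing_modules`, `can_bind_port`) are hypothetical, and it assumes the cgroup filesystem is mounted at `/sys/fs/cgroup`. Modules compiled into the kernel do not appear in `lsmod` output, so `missing_modules` may report a module that is in fact available.

```sh
#!/bin/sh
# Hypothetical helpers for checking a host against the rootless requirements.
# None of these functions are part of kind; the names are illustrative only.

# Succeeds when the given cgroup mount point uses the unified (v2) hierarchy.
# A cgroup v2 mount exposes a cgroup.controllers file at its root.
is_cgroup_v2() {
  [ -f "${1:-/sys/fs/cgroup}/cgroup.controllers" ]
}

# Reads `lsmod`-style output on stdin and prints every required iptables
# module whose name does not appear in the first column.
missing_modules() {
  loaded=$(awk '{print $1}')
  for mod in ip6_tables ip6table_nat ip_tables iptable_nat; do
    printf '%s\n' "$loaded" | grep -qxF "$mod" || printf '%s\n' "$mod"
  done
}

# Succeeds when a port can be bound without privileges.
# $1 = port to bind, $2 = value of net.ipv4.ip_unprivileged_port_start
can_bind_port() {
  [ "$1" -ge "$2" ]
}
```

On a real host these might be driven with `is_cgroup_v2`, `lsmod | missing_modules`, and `can_bind_port 80 "$(sysctl -n net.ipv4.ip_unprivileged_port_start)"`. Keeping the functions free of direct system calls makes them easy to exercise against canned input.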