This is a collection of scripts and templates to help create a Kubernetes cluster on metal (and virtual) servers in Hetzner.
- Hybrid Cluster
- HA control plane running on 3 VMs (Talos)
- N worker nodes running on Metal servers (Talos)
- Load Balancer for Control Plane
- Load Balancer for Ingress
- vSwitch (connecting VMs and Metal servers)
.
├── config
│ ├── cluster_config.yaml
│ ├── cluster_nodes_index.yaml
│ ├── discovery
│ ├── secrets
│ └── talos
├── config_templates
│ ├── cluster_config.yaml
│ ├── cluster_nodes_index.yaml
│ └── talos
├── README.md
├── requirements.txt
├── scripts
│ ├── config.py
│ ├── hetzner_robot.py
│ └── install-talos-metal.py
└── storage
- config - folder that stores all config files related to a cluster. You might want to handle it as a distinct git repo.
- config/secrets - where all the Talos secrets for the cluster are stored. DO NOT ADD to git!
- config_templates - template files used by the scripts to bootstrap the ./config content
- scripts/config.py - main Python script
- scripts/install-talos-metal.py - script that handles installation of Talos on metal servers over SSH
git clone [email protected]:bunnyshell/open-talos-hetzner-builder
cd open-talos-hetzner-builder
mise trust
mise i
uv venv
uv pip install -r requirements.txt
# source .venv/bin/activate

The script can manage the needed vSwitch if provided with Hetzner Robot Webservice credentials.
Alternatively, you can create and manage the vSwitch manually from the Robot UI.
To create a Webservice/app user in Hetzner Robot navigate to robot.hetzner.com -> Settings -> Webservice and app settings.
HCLOUD_TOKEN="__________________________________"
HETZNER_ROBOT_USER="__________________________________"
HETZNER_ROBOT_PASSWORD="__________________________________"

This script will create the config folder, its subfolders and draft config files. It will also create the Talos secrets.
uv run scripts/config.py init

Set the cluster name, endpoint, hostname and Talos version.
Optionally, edit the Hetzner zone, datacenter, cp-server-type and robot-vlan-tag.
The Talos schematic is used to build the Talos server image for each cluster node. We have 2 types of nodes: worker (metal) and controlplane (VMs).
Edit config/talos/schematic.yaml and make sure you include your required Talos extensions.
Next, run:
uv run scripts/config.py schematic

This will calculate the Talos schematic ID and save it to config/cluster_config.yaml
In order to install Talos on all servers, we need:
- a Talos config file for all control plane nodes. (All control plane nodes share the same Talos config)
- a Talos config file for each worker node. (Each worker node has its own Talos config file because of disk IDs)
Run this command to render Talos config files:
uv run scripts/config.py render

The Talos config files are stored in config/secrets/nodes/
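To illustrate why each worker gets its own file, here is a minimal sketch that copies a shared base config and sets a per-node install disk. This is not the logic of scripts/config.py: the function name `render_worker_config`, the sample base config and the disk paths are all hypothetical. The only real name is `machine.install.disk`, the Talos machine config field that selects the install disk.

```python
import copy


def render_worker_config(base_config: dict, install_disk: str) -> dict:
    """Return a copy of base_config with this node's install disk set."""
    config = copy.deepcopy(base_config)  # never mutate the shared base
    install = config.setdefault("machine", {}).setdefault("install", {})
    install["disk"] = install_disk
    return config


# Hypothetical shared settings and per-node disk IDs (placeholders only).
base = {"machine": {"type": "worker", "install": {}}}
disks = {
    1: "/dev/disk/by-id/example-disk-a",
    2: "/dev/disk/by-id/example-disk-b",
}

# One config per worker, keyed by the worker's index.
worker_configs = {
    index: render_worker_config(base, disk) for index, disk in disks.items()
}
```

Control plane nodes can share a single file because nothing in their config differs from node to node.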
The cluster needs a vSwitch to connect all metal servers in a private network. You have 2 choices to create the vSwitch:
- Use the Robot Web UI to create a vSwitch (or let the script create it: uv run scripts/config.py vswitch).
- Connect the metal servers to this vSwitch.
- Make sure you edit config/cluster_config.yaml and fill in:
  - the VLAN tag
  - the vSwitch ID (filled in automatically by uv run scripts/config.py vswitch)
The script will create the vSwitch (with the tag specified in the config/cluster_config.yaml file) and save the vSwitch ID (in the same file).
uv run scripts/config.py vswitch

The cluster needs a Virtual Network (similar to a VPC) and dedicated subnets for metal and virtual servers. The metal subnet needs to be exposed to the vSwitch (so that metal and virtual servers can communicate over the private network). This command handles all these requirements:
uv run scripts/config.py net
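The layout this step aims for can be sketched with Python's standard `ipaddress` module: one network, split into two non-overlapping subnets, one for the VMs and one for the metal servers. The `10.0.0.0/16` range, the `/24` subnet size and the `plan_subnets` helper are assumptions for illustration only. The actual ranges are whatever scripts/config.py and your cluster config define.

```python
import ipaddress


def plan_subnets(network_cidr: str, new_prefix: int = 24):
    """Split a network and return (network, vm_subnet, metal_subnet)."""
    network = ipaddress.ip_network(network_cidr)
    subnets = network.subnets(new_prefix=new_prefix)
    vm_subnet = next(subnets)     # first block: virtual servers
    metal_subnet = next(subnets)  # second block: exposed to the vSwitch
    return network, vm_subnet, metal_subnet


# Assumed example range, not taken from the project.
network, vm_subnet, metal_subnet = plan_subnets("10.0.0.0/16")

# Both subnets must sit inside the network and must not overlap.
assert vm_subnet.subnet_of(network) and metal_subnet.subnet_of(network)
assert not vm_subnet.overlaps(metal_subnet)
```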
The cluster needs one load balancer dedicated to the control plane. All control plane nodes will be added to this LB. Run:
uv run scripts/config.py cp-lb
Create DNS records pointing the cluster hostname (defined as cluster.hostname in config/cluster_config.yaml) to the IP address or hostname of the control plane LB (created in the previous step).
Run dig or nslookup to make sure the hostname of the cluster is properly resolved by the DNS system.
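If you prefer to script this check, here is a minimal sketch using only the Python standard library. The helper names `resolved_ips` and `points_to` are made up for this example, and the hostname and IP in the usage note are placeholders, not project values.

```python
import socket


def resolved_ips(hostname: str, resolver=socket.getaddrinfo) -> set:
    """Return the set of IP addresses the hostname resolves to."""
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # the IP address is the first element of sockaddr.
    return {info[4][0] for info in resolver(hostname, None)}


def points_to(hostname: str, lb_ip: str, resolver=socket.getaddrinfo) -> bool:
    """True if the hostname resolves to lb_ip, False otherwise."""
    try:
        return lb_ip in resolved_ips(hostname, resolver)
    except socket.gaierror:
        # The name does not resolve (yet), e.g. DNS has not propagated.
        return False
```

For example, `points_to("k8s.example.com", "203.0.113.10")` should return `True` once the record has propagated. The `resolver` parameter exists only so the check can be exercised without network access.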
Hetzner does not support Talos out of the box, so we need to create a snapshot of Talos to use when creating VMs.
uv run scripts/config.py hcloud-image

If you want to upload the image manually, skip this step and set the hetzner.hcloud-image-id value in config/cluster_config.yaml to the desired image ID.
The cluster uses 3 control plane nodes. These are VMs in HCloud. Run this command:
uv run scripts/config.py cp-nodes

export CP1=___IP_OF_CONTROL_PLANE_SERVER_1___
export TALOSCONFIG=$(realpath config/secrets/talosconfig.yaml)
talosctl dashboard -n $CP1 -e $CP1
# wait for the CP1 server to boot
# bootstrap Kubernetes on CP1
talosctl bootstrap -n $CP1 -e $CP1
Get the kubeconfig.yaml file
export KUBECONFIG=$(pwd)/config/secrets/kubeconfig.yaml
talosctl kubeconfig -n $CP1 -e $CP1

Wait for the other nodes to join the cluster:
kubectl get nodes

At this point the cluster should have 3 control plane nodes, but NO CNI, so the nodes will be `NotReady`.
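To check this state from a script instead of reading the table, the sketch below summarizes the JSON output of `kubectl get nodes -o json`. The function names are made up for this example. The label `node-role.kubernetes.io/control-plane` and the `Ready` node condition are standard Kubernetes fields.

```python
import json
import subprocess

CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"


def summarize_nodes(nodes_json: str) -> dict:
    """Count total, control plane and Ready nodes in a kubectl node list."""
    items = json.loads(nodes_json)["items"]
    summary = {"total": len(items), "control_plane": 0, "ready": 0}
    for node in items:
        labels = node["metadata"].get("labels", {})
        if CONTROL_PLANE_LABEL in labels:
            summary["control_plane"] += 1
        conditions = node.get("status", {}).get("conditions", [])
        # A node is Ready only if its "Ready" condition has status "True".
        if any(c["type"] == "Ready" and c["status"] == "True" for c in conditions):
            summary["ready"] += 1
    return summary


def fetch_nodes_json() -> str:
    """Run kubectl (which honors $KUBECONFIG) and return its JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Before a CNI is installed, `summarize_nodes(fetch_nodes_json())` should report 3 control plane nodes and 0 ready nodes.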
We need to have a consistent naming/numbering scheme for the worker nodes. To do this, we write the config/cluster_nodes_index.yaml file.
index:
  1: __ip_of_worker_node_1__
  2: __ip_of_worker_node_2__
  3: __ip_of_worker_node_3__

Reboot each metal node in rescue mode (using the Robot interface). Make sure you configure an SSH key for access to the server while it is in rescue mode.
Next, run a similar command for each node (pass the node's __server_number__ value to -i):
# Example for worker node 1
uv run scripts/install-talos-metal.py -k ~/ssh-key -u root -i 1
# Example for worker node 2
uv run scripts/install-talos-metal.py -k ~/ssh-key -u root -i 2
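With many workers, the per-node commands can be generated from the index instead of typed by hand. The sketch below builds one command per entry using the same -k, -u and -i flags as the examples above. The `install_commands` helper is hypothetical, and the index is written inline as a dict: loading config/cluster_nodes_index.yaml is left out so the example needs no YAML library.

```python
def install_commands(index: dict, ssh_key: str, user: str = "root") -> list:
    """Build one install command (as an argv list) per worker in the index."""
    commands = []
    for server_number in sorted(index):
        commands.append([
            "uv", "run", "scripts/install-talos-metal.py",
            "-k", ssh_key,
            "-u", user,
            "-i", str(server_number),
        ])
    return commands


# Same shape as the "index" mapping in config/cluster_nodes_index.yaml;
# the values are placeholders.
index = {
    1: "__ip_of_worker_node_1__",
    2: "__ip_of_worker_node_2__",
    3: "__ip_of_worker_node_3__",
}

# Print the commands for review before running them.
for argv in install_commands(index, "~/ssh-key"):
    print(" ".join(argv))
```

Printing the commands first, rather than executing them, keeps the install step deliberate, since it overwrites the disks on each server.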
- Install CNI
- Install CSI