Introduction
I would like to use clevis to decrypt my ZFS root partition on several machines.
Using 2 VMs, I tried to test if this is at all possible, and I think I've come pretty far, but I still keep getting the password prompt. I have some ideas on how to approach this further, but I could use some help figuring out where to look next.
Any help is greatly appreciated.
(Summary at the bottom)
Given how far I've come, adding "out-of-the-box" ZFS support to clevis doesn't strike me as a lot of work. If I can get it to work, I might work on a PR for that myself.
Use case
I have a use case in mind with two PCs (a desktop and a Raspberry Pi; the Pi will use LUKS) that act as mutual tang servers (i.e. both can be rebooted remotely, just not at the same time), plus a laptop that uses either tang server when it is on the same network and a passphrase when it is not.
What I did so far
Setup
I created a CentOS 8.2 VM with a root partition on natively encrypted ZFS (using dracut and systemd-boot) and cloned it twice to make 2 machines:
tang: the server hosting the keys, IP: 192.168.122.18
clevis: the server asking for decryption keys, IP: 192.168.122.242
For the ZFS pool layout I used this guide by OpenZFS, which I adapted for CentOS.
For the bootloader (systemd-boot) I used this page on the ArchWiki.
And for zfs-mount-generator I used this page on the ArchWiki.
[root@clevis:~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 1.87G 13.1G 192K /
rpool/ROOT 1.56G 13.1G 192K none
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195 1.55G 13.1G 1.43G /
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/srv 368K 13.1G 192K /srv
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/tmp 784K 13.1G 400K /tmp
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/usr 1016K 13.1G 192K /usr
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/usr/local 824K 13.1G 520K /usr/local
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var 78.2M 13.1G 192K /var
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/games 288K 13.1G 192K /var/games
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib 74.2M 13.1G 29.5M /var/lib
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/AccountsService 192K 13.1G 192K /var/lib/AccountsService
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/NetworkManager 632K 13.1G 292K /var/lib/NetworkManager
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/dnf 1.56M 13.1G 1.06M /var/lib/dnf
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/flatpak 192K 13.1G 192K /var/lib/flatpak
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/rpm 40.1M 13.1G 37.2M /var/lib/rpm
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/lib/rpm-state 392K 13.1G 232K /var/lib/rpm-state
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/log 2.33M 13.1G 1.83M /var/log
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/spool 1012K 13.1G 348K /var/spool
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/spool/mail 352K 13.1G 192K /var/spool/mail
rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195/var/www 192K 13.1G 192K /var/www
rpool/USERDATA 320M 13.1G 192K /
rpool/USERDATA/home_1b9910ca-f889-4b64-8942-a139e62b1195 319M 13.1G 200K /home
rpool/USERDATA/home_1b9910ca-f889-4b64-8942-a139e62b1195/myuser 318M 13.1G 318M /home/myuser
rpool/USERDATA/root_1b9910ca-f889-4b64-8942-a139e62b1195 796K 13.1G 480K /root
Install tang
I followed the guide by Red Hat to install and set up tang.
[root@tang:~]# dnf install -y tang
[root@tang:~]# semanage port -a -t tangd_port_t -p tcp 7500
# This should probably be limited to only the local subnet, luckily all servers are currently behind NAT
[root@tang:~]# firewall-cmd --add-port=7500/tcp
[root@tang:~]# firewall-cmd --add-port=7500/tcp --permanent
[root@tang:~]# systemctl enable tangd.socket
# add the override for port 7500 (see the RedHat guide)
[root@tang:~]# systemctl edit tangd.socket
[root@tang:~]# systemctl daemon-reload
[root@tang:~]# systemctl start tangd.socket
[root@tang:~]# /usr/libexec/tangd-keygen /var/db/tang
# We need to save this for later
[root@tang:~]# tang-show-keys 7500
sN8bs7tkHqdKQii2DNmqYz6nluQ
Install clevis
[root@clevis:~]# dnf install -y clevis
Setting clevis properties on ZFS dataset
I'm not sure how it's stored when using LUKS, but I am assuming the necessary value (the JWE) doesn't need to be encrypted (otherwise you would still need to enter keys manually).
Encrypting the password
Verify that we're using the correct password.
[root@clevis:~]# echo -n 'testpass' | zfs load-key -n rpool
1 / 1 key(s) successfully verified
I'm using the IP, since I'm not sure if something like /etc/hosts is available in the initramfs where clevis will be run.
# Using the thumbprint we got earlier
[root@clevis:~]# echo -n 'testpass' | clevis encrypt tang '{"url": "http://192.168.122.18:7500", "thp": "sN8bs7tkHqdKQii2DNmqYz6nluQ"}' > password.jwe
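The configuration passed to clevis encrypt tang is plain JSON, so the URL and thumbprint can be kept in variables instead of being retyped for each machine. A minimal sketch; tang_config is a hypothetical helper introduced here, not part of clevis, and it does no JSON escaping:

```shell
# Hypothetical helper (not part of clevis): print the JSON config that
# `clevis encrypt tang` takes, given a tang URL and a key thumbprint.
# Assumes neither value contains characters that need JSON escaping.
tang_config() {
    printf '{"url": "%s", "thp": "%s"}\n' "$1" "$2"
}
```

It could then be used as: echo -n 'testpass' | clevis encrypt tang "$(tang_config http://192.168.122.18:7500 sN8bs7tkHqdKQii2DNmqYz6nluQ)" > password.jwe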
Store the JWE as a ZFS property
I'm using ZFS's User Properties[1] to save this value:
# Explicitly specify that we'd like to decrypt this, something like autodecrypt=yes or onboot=yes or when=onboot might be better.
# A property setting an order might also be useful when using multiple pools/datasets e.g. latchset.clevis:priority=0
zfs set latchset.clevis:decrypt=yes rpool
zfs set "latchset.clevis:jwe=$(cat password.jwe)" rpool
# we should not need to decrypt child datasets unless explicitly specified with `zfs set latchset.clevis:decrypt=yes rpool/some/dataset` and another `latchset.clevis:jwe`
# Therefore we skip inherited values (i.e. only check locally set ones)
[root@clevis:~]# zfs get latchset.clevis:decrypt -s local
NAME PROPERTY VALUE SOURCE
rpool latchset.clevis:decrypt yes local
[root@clevis:~]# zfs get latchset.clevis:jwe -s local
NAME PROPERTY VALUE SOURCE
rpool latchset.clevis:jwe [long JWE string] local
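If an ordering property such as the latchset.clevis:priority idea mentioned in the comments were added, the datasets could be sorted before unlocking. A rough sketch; the property does not exist yet and order_by_priority is a hypothetical helper name:

```shell
# Read "dataset<TAB>priority" lines on stdin (the shape printed by
# `zfs get -H -o name,value -s local latchset.clevis:priority`, where the
# property itself is only a proposal) and print the dataset names in
# ascending numeric priority order.
order_by_priority() {
    sort -t "$(printf '\t')" -k2,2n | cut -f1
}
```

Usage would look like: zfs get -H -o name,value -s local latchset.clevis:priority | order_by_priority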
Check if it's correctly stored:
# This currently assumes only one dataset has decrypt=yes set, this should be made more flexible.
[root@clevis:~]# zfs get -H latchset.clevis:decrypt -s local | awk '$3=="yes"{print $1}' | xargs zfs list -H -o latchset.clevis:jwe > zfs-out.jwe
# -Z ignores newline at EOF differences
[root@clevis:~]# diff -Z password.jwe zfs-out.jwe && echo 'identical'
identical
Test clevis for decryption
[root@clevis:~]# zfs get -H latchset.clevis:decrypt -s local | awk '$3=="yes"{print $1}' | xargs -I POOLNAME sh -c "zfs list -H -o latchset.clevis:jwe POOLNAME | clevis decrypt | zfs load-key -n POOLNAME"
1 / 1 key(s) successfully verified
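The one-liner above could be made more flexible by splitting it into a filter and a loop, so that every dataset with decrypt=yes is checked. This is only a sketch; clevis_datasets and verify_clevis_keys are hypothetical names, and it assumes each selected dataset has its own locally set latchset.clevis:jwe:

```shell
# Read `zfs get -H -s local latchset.clevis:decrypt` output on stdin
# (tab-separated: name, property, value, source) and print the names of
# the datasets whose value is "yes".
clevis_datasets() {
    awk -F '\t' '$3 == "yes" { print $1 }'
}

# For every dataset name on stdin, decrypt its JWE with clevis and let
# `zfs load-key -n` verify the result without actually loading the key.
# Returns non-zero if any dataset fails verification.
verify_clevis_keys() {
    rc=0
    while IFS= read -r ds; do
        # </dev/null keeps zfs list from touching the loop's stdin
        zfs list -H -o latchset.clevis:jwe "$ds" </dev/null \
            | clevis decrypt \
            | zfs load-key -n "$ds" || rc=1
    done
    return "$rc"
}
```

Usage: zfs get -H latchset.clevis:decrypt -s local | clevis_datasets | verify_clevis_keys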
Updating initramfs
Add dracut config
Add extra config for dracut
[root@clevis:~]# tail /etc/dracut.conf.d/*
==> /etc/dracut.conf.d/20-network.conf <==
kernel_cmdline=" ip=192.168.122.242 netmask=255.255.255.0 gateway=192.168.122.1 nameserver=192.168.122.1 "
==> /etc/dracut.conf.d/30-clevis.conf <==
add_dracutmodules+=" clevis "
==> /etc/dracut.conf.d/50-zfs.conf <==
add_dracutmodules+=" zfs "
Add network settings to bootloader
[root@clevis:~]# cat /boot/loader/entries/centos.conf
title CentOS 8 ZFS
version zfs-4.18.0-193.14.2.el8_2.x86_64
linux /vmlinuz-4.18.0-193.14.2.el8_2.x86_64
initrd /initramfs-4.18.0-193.14.2.el8_2.x86_64.img
options rd.auto=1 ip=192.168.122.242 netmask=255.255.255.0 gateway=192.168.122.1 nameserver=192.168.122.1 root=ZFS=rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195 rw
Update systemd zfs-load-key-rpool
N.B.: I'm not sure if editing this service is the correct approach, since the boot process keeps asking me to enter the password by hand. The output of the systemd-cat echo commands also does not appear. I think this file (the original) is auto-generated by systemd/dracut somehow, and I need a way to hook into that.
Run systemctl edit zfs-load-key-rpool.service, and enter the following:
[Service]
ExecStart=
ExecStart=/bin/sh -c 'set -eu;keystatus="$$(/sbin/zfs get -H -o value keystatus "rpool")";[ "$$keystatus" = "unavailable" ] || exit 0; systemd-cat echo '######## trying clevis #########'; /sbin/zfs list -H -o latchset.clevis:jwe rpool | /bin/clevis decrypt | /sbin/zfs load-key rpool && exit 0; systemd-cat echo '###### trying password ######'; c>
This does have the hardcoded pool name ("rpool") set, but that was already the case.
In more readable form:
set -eu;
keystatus="$$(/sbin/zfs get -H -o value keystatus "rpool")";
[ "$$keystatus" = "unavailable" ] || exit 0;
systemd-cat echo '######## trying clevis #########';
/sbin/zfs list -H -o latchset.clevis:jwe rpool | /bin/clevis decrypt | /sbin/zfs load-key rpool && exit 0;
systemd-cat echo '###### trying password ######';
count=0;
while [ $$count -lt 3 ];
do systemd-ask-password --id="zfs:rpool" "Enter passphrase for rpool:" | /sbin/zfs load-key "rpool" && exit 0
count=$$((count + 1));
done;
exit 1;
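For testing outside the unit file, the same flow can be written as a standalone shell function. In a unit file `$$` is systemd's escape for a literal `$`, so a plain script uses a single `$`. This is a sketch rather than the generated service's actual code; unlock_dataset and the ASK_CMD override are hypothetical names introduced here:

```shell
# Command used for the interactive fallback. ASK_CMD is a hypothetical
# override so the flow can be exercised on a machine without systemd.
: "${ASK_CMD:=systemd-ask-password}"

# Try clevis first, then fall back to prompting up to three times.
unlock_dataset() {
    ds=$1

    # Nothing to do if the key is already loaded.
    [ "$(zfs get -H -o value keystatus "$ds")" = "unavailable" ] || return 0

    echo "trying clevis for $ds" >&2
    if zfs list -H -o latchset.clevis:jwe "$ds" | clevis decrypt | zfs load-key "$ds"; then
        return 0
    fi

    echo "trying passphrase for $ds" >&2
    count=0
    while [ "$count" -lt 3 ]; do
        "$ASK_CMD" --id="zfs:$ds" "Enter passphrase for $ds:" | zfs load-key "$ds" && return 0
        count=$((count + 1))
    done
    return 1
}
```

Calling unlock_dataset rpool from a root shell with the key unloaded shows which of the two paths is taken, independent of how the service is generated.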
Inspect the old value next to the new overridden value:
[root@clevis:~]# systemctl cat zfs-load-key-rpool.service
Install dracut modules
[root@clevis:~]# dnf install -y clevis-dracut zfs-dracut
Update Initramfs
# grep used for brevity
[root@clevis:~]# dracut -vf |& grep 'module:\|img\|zfs\|clevis'
dracut: zfsexpandknowledge: pool rpool has device /dev/disk/by-partlabel/rpool (which resolves to /dev/vda3)
dracut: zfsexpandknowledge: block devices backing ZFS dataset /: /dev/vda3
dracut: zfsexpandknowledge: host device /dev/vda1
dracut: zfsexpandknowledge: host device /dev/vda3
dracut: zfsexpandknowledge: device /dev/vda of type zfs_member
dracut: zfsexpandknowledge: device /dev/vda3 of type zfs_member
dracut: zfsexpandknowledge: device /dev/vda1 of type vfat
dracut: zfsexpandknowledge: pool rpool has device /dev/disk/by-partlabel/rpool (which resolves to /dev/vda3)
dracut: zfsexpandknowledge: block devices backing ZFS dataset /: /dev/vda3
dracut: zfsexpandknowledge: host device /dev/vda1
dracut: zfsexpandknowledge: host device /dev/vda3
dracut: zfsexpandknowledge: device /dev/vda of type zfs_member
dracut: zfsexpandknowledge: device /dev/vda3 of type zfs_member
dracut: zfsexpandknowledge: device /dev/vda1 of type vfat
dracut: *** Including module: bash ***
dracut: *** Including module: systemd ***
dracut: *** Including module: systemd-initrd ***
dracut: *** Including module: nss-softokn ***
dracut: *** Including module: rngd ***
dracut: *** Including module: i18n ***
dracut: *** Including module: network-legacy ***
dracut: *** Including module: network ***
dracut: *** Including module: ifcfg ***
dracut: *** Including module: drm ***
dracut: *** Including module: plymouth ***
dracut: *** Including module: clevis ***
dracut: *** Including module: prefixdevname ***
dracut: *** Including module: crypt ***
dracut: *** Including module: dm ***
dracut: *** Including module: kernel-modules ***
dracut: *** Including module: kernel-modules-extra ***
dracut: *** Including module: kernel-network-modules ***
dracut: *** Including module: qemu ***
dracut: *** Including module: zfs ***
dracut: *** Including module: rootfs-block ***
dracut: *** Including module: terminfo ***
dracut: *** Including module: udev-rules ***
dracut: *** Including module: biosdevname ***
dracut: *** Including module: dracut-systemd ***
dracut: *** Including module: usrmount ***
dracut: *** Including module: base ***
dracut: *** Including module: fs-lib ***
dracut: *** Including module: microcode_ctl-fw_dir_override ***
dracut: microcode_ctl module: mangling fw_dir
dracut: *** Including module: shutdown ***
dracut: *** Creating image file '/boot/initramfs-4.18.0-193.14.2.el8_2.x86_64.img' ***
dracut: *** Creating initramfs image file '/boot/initramfs-4.18.0-193.14.2.el8_2.x86_64.img' done ***
Test it
[root@clevis:~]# systemctl reboot
The password prompt still appears, it seems like there is still something missing.
Summary
This is what I know after testing with 2 VMs:
What works:
Having access to JWE at boot time
By making the JWE value available at boot time in ZFS metadata: zfs list -o name,latchset.clevis:jwe rpool
Network connection at boot
This is done by adding the kernel_cmdline value (from /etc/dracut.conf.d/20-network.conf) to the bootloader's options.
It now responds to pings while waiting for the ZFS passphrase.
Manually booting
By adding rd.break=pre-mount to the bootloader's options, I am able to boot manually using clevis instead of typing the passphrase myself.
# for some reason rpool is imported without altroot set, so we reimport it to set it.
~# zpool export rpool
~# zpool import rpool -R /sysroot
# Test the key
~# zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key -n rpool
1 / 1 key(s) successfully verified
# load the key
~# zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key rpool
# mount /
~# zfs mount rpool/ROOT/centos_1b9910ca-f889-4b64-8942-a139e62b1195
# mount child datasets
~# zfs mount -a
# Booting resumes here
~# systemctl switch-root /sysroot
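Any automated version of the manual steps above depends on the tang server being reachable by the time the key is needed, which may be slightly after the network comes up. A minimal retry sketch follows. wait_for_tang and probe_tang are hypothetical helpers, and it is an assumption that curl is present in the initramfs. The /adv path is tang's advertisement endpoint:

```shell
# Probe a tang server by fetching its advertisement; succeeds only on an
# HTTP success status. Assumes curl is available where this runs.
probe_tang() {
    curl -sf -o /dev/null "$1/adv"
}

# wait_for_tang URL [TRIES] [DELAY_SECONDS]
# Retry the probe until it succeeds or TRIES attempts have been made.
wait_for_tang() {
    url=$1
    tries=${2:-10}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        probe_tang "$url" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example: wait_for_tang http://192.168.122.18:7500 && zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key rpool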
What does not work yet:
I still need to find a way to make sure clevis is actually used within the initramfs, i.e. have it run zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key rpool
Possible solutions
Order of dracut modules being loaded
I don't think it matters, as long as clevis, zfs and the network connection are all available before zfs load-key rpool is run.
Missing parameters/configuration
There might be something needed to tell dracut to run:
zfs list -H -o latchset.clevis:jwe rpool | clevis decrypt | zfs load-key rpool
instead of:
systemd-ask-password --id="zfs:rpool" "Enter passphrase for rpool:" | zfs load-key "rpool"
This might be dracut configuration (i.e. in /etc/dracut.conf.d/) or an extra kernel parameter (i.e. options in /boot/loader/entries/centos.conf).
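One untested way to hook this in might be a small custom dracut module that installs a pre-mount hook. The sketch below is a guess at what its module-setup.sh could look like. The module directory name, the hook script name zfs-clevis-unlock.sh (which would contain the clevis pipeline above) and the priority 80 are all assumptions, and I don't know whether a pre-mount hook runs before the zfs module asks for the passphrase on a systemd-based initramfs:

```shell
#!/bin/bash
# Hypothetical /usr/lib/dracut/modules.d/91zfs-clevis/module-setup.sh

# Return 255 so the module is only included when explicitly requested,
# e.g. through add_dracutmodules+=" zfs-clevis ".
check() {
    return 255
}

# Other dracut modules that must be in the initramfs alongside this one.
depends() {
    echo clevis zfs network
    return 0
}

# inst_hook and $moddir are provided by dracut when it sources this file.
# The priority (80) is a guess meant to run ahead of other pre-mount hooks.
install() {
    inst_hook pre-mount 80 "$moddir/zfs-clevis-unlock.sh"
}
```

After adding the module, the initramfs would need to be rebuilt with dracut -vf, and the "Including module" lines should then list it.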
zfs-mount-generator
It might be possible that zfs-mount-generator is making this harder than it needs to be. Maybe using zfs-mount.service will help?
References
[1]: See man zfs | less +"/^ User Properties", or the Oracle documentation on User Properties, which also applies on Linux.