Overview
Relying solely on network-online.target is problematic, as documented in several places.
The uupd.{timer,service} units depend on network-online.target, but that target does not mean what it sounds like: reaching it does not guarantee that the network is ready for traffic. See https://systemd.io/NETWORK_ONLINE/.
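For reference, the conventional way for a unit to opt in to network-online.target is a dependency like the drop-in below (the file path here is illustrative, not taken from the uupd packaging). Even with this ordering in place, systemd's documentation warns that the target only reflects the network manager's notion of "online", not working DNS or reachable endpoints:

```ini
# Illustrative drop-in, e.g. /etc/systemd/system/uupd.service.d/network.conf.
# Per https://systemd.io/NETWORK_ONLINE/ this only orders the unit after the
# network manager reports "online"; it does not guarantee usable connectivity.
[Unit]
Wants=network-online.target
After=network-online.target
```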
I should mention that I use Wi-Fi exclusively, not wired Ethernet.
When opening the laptop lid (Wake from Sleep), I keep seeing errors like this:
Jan 15 04:57:11 nitro5 uupd[157219]: {"level":"ERROR","msg":"Hardware checks failed","error":"Network, returned error: network not online"}
Hence the need for the Retry with Backoff pattern to be implemented in the uupd code itself.
Note that the Circuit Breaker pattern might also be beneficial for situations such as Cloudflare introducing a change that blocks connectivity to dependent assets.
Example Use Case
I have written software that is similarly scheduled by systemd, where the solution was to implement the Retry with Backoff pattern to prevent false-positive reporting.
Those systemd units also rely on network-online.target and, because my requirements allowed for it, set the BindsTo=gnome-session.target property, following advice I found in a forum somewhere, although I was never able to make that advice work for me. I do realize that uupd's requirements do not allow for assuming an authenticated GNOME session.
Once I implemented the Retry with Backoff pattern in the rule (written in Python) that needs to use ssh reliably, I started to see log entries like the following, and resilience improved.
Feb 02 05:24:36 nitro5 upd_monitor.sh[390387]: 2026-02-02 05:24:36,796 - steps - WARNING - process_rule - Worker-04: <class '__mp_main__.ProcessNonZeroRetcodeError'>, Retrying in 3 seconds...
Feb 02 05:24:36 nitro5 upd_monitor.sh[390387]: <class '__mp_main__.ProcessNonZeroRetcodeError'>, Retrying in 6 seconds...
Feb 02 05:24:36 nitro5 upd_monitor.sh[390387]: 2026-02-02 05:24:36,796 - steps - DEBUG - process_rule - Worker-04: Processing pi-cluster-health ... done
Feb 02 05:24:36 nitro5 upd_monitor.sh[390326]: 2026-02-02 05:24:36,796 - steps - INFO - process - Received result from Worker-04
Feb 02 05:24:36 nitro5 upd_monitor.sh[390326]: 2026-02-02 05:24:36,831 - utils - DEBUG - _ - process done
Feb 02 05:24:36 nitro5 systemd[3708]: upd-indicator-monitor.service: Consumed 3.901s CPU time, 656.2M memory peak.
Questions
- Does this resonate with the team's sensibilities regarding industry best practices?
- Do you agree that the uupd Go code is the correct architectural location for implementing these kinds of reliability/resilience policies?
- Is the Wake from Sleep event something you want to officially support? As I mentioned elsewhere, using ujust update as a manual workaround has worked fine for me throughout 2025.
Disclaimer
Although I have quite a bit of experience in my day job guiding teams to implement reliability and resilience policies in their application architecture, I will not pretend to claim a similar level of familiarity here. That is, my experience tells me such policies are needed, but not how to implement them in a Universal Blue system component.
My hope is to encourage discussion, not to complain.
Current Version
$ sudo bootc status
● Booted image: ghcr.io/ublue-os/bluefin-dx-nvidia-open:stable
Digest: sha256:43279a9dfad55057c3f977a33b5d0b64e3e659fb6565c371f449af190dfc20ca (amd64)
Version: 43.20260127 (2026-01-27T01:52:09Z)
Related Issue
Note that part of the problem was resolved with #126.