We monitor our servers with the Prometheus node exporter, which provides a node_timex_offset_seconds metric, itself derived from the kernel's timex.offset field (see timex.h).
As far as I can tell, ntpsec and timesyncd update that field, but neither chrony nor ntpd-rs does. Or at least the node exporter systematically reports this field as being zero. I'm not sure what value that should be since this is of course server-dependent (at least as far as I understand it), but I'm pretty sure the offset is never exactly zero.
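To check the field without going through the node exporter, here is a minimal sketch that reads the same kernel structure directly. It assumes a 64-bit Linux host with the usual glibc struct timex layout (the ctypes definition below is my transcription of timex.h, not something taken from ntpd-rs or the node exporter), and the helper names offset_seconds and read_timex are made up for illustration. As far as I understand adjtimex(2), the offset is in nanoseconds when the STA_NANO status bit is set and in microseconds otherwise, which is the conversion applied here.

```python
import ctypes
import ctypes.util
import os

STA_NANO = 0x2000  # status bit: offset is in nanoseconds instead of microseconds


class Timeval(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


class Timex(ctypes.Structure):
    # Assumed layout: struct timex on 64-bit Linux (see timex.h).
    _fields_ = [
        ("modes", ctypes.c_uint),
        ("offset", ctypes.c_long),
        ("freq", ctypes.c_long),
        ("maxerror", ctypes.c_long),
        ("esterror", ctypes.c_long),
        ("status", ctypes.c_int),
        ("constant", ctypes.c_long),
        ("precision", ctypes.c_long),
        ("tolerance", ctypes.c_long),
        ("time", Timeval),
        ("tick", ctypes.c_long),
        ("ppsfreq", ctypes.c_long),
        ("jitter", ctypes.c_long),
        ("shift", ctypes.c_int),
        ("stabil", ctypes.c_long),
        ("jitcnt", ctypes.c_long),
        ("calcnt", ctypes.c_long),
        ("errcnt", ctypes.c_long),
        ("stbcnt", ctypes.c_long),
        ("tai", ctypes.c_int),
        ("_reserved", ctypes.c_int * 11),
    ]


def offset_seconds(offset, status):
    """Convert a raw timex.offset value to seconds, honouring STA_NANO."""
    divisor = 1e9 if status & STA_NANO else 1e6
    return offset / divisor


def read_timex():
    """Query the kernel with modes == 0, which is read-only and needs no privileges."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    tx = Timex()
    if libc.adjtimex(ctypes.byref(tx)) == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return tx
```

On a Linux host, `tx = read_timex(); print(offset_seconds(tx.offset, tx.status))` should print the value that ends up in node_timex_offset_seconds, so running it under each NTP daemon would show whether the zero comes from the kernel field itself.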
Here is, for example, the output of ntp-ctl on my home server:
anarcat@marcos:~$ ntp-ctl status
Synchronization status:
Dispersion: 0.000947s, Delay: 0.029867s
Stratum: 5
Sources:
ntpd-rs.pool.ntp.org:123/51.161.47.242:123 (1): -0.000387±0.001606(±0.044428)s
poll interval: 256s, missing polls: 0
root dispersion: 0.000015s, root delay:0.000015s
ntpd-rs.pool.ntp.org:123/216.128.178.20:123 (2): +0.002955±0.002456(±0.030494)s
poll interval: 1024s, missing polls: 0
root dispersion: 0.008835s, root delay:0.019379s
ntpd-rs.pool.ntp.org:123/208.81.1.244:123 (3): +0.009297±0.001234(±0.050555)s
poll interval: 1024s, missing polls: 0
root dispersion: 0.023193s, root delay:0.033997s
ntpd-rs.pool.ntp.org:123/167.160.187.12:123 (4): +0.002160±0.001569(±0.028479)s
poll interval: 1024s, missing polls: 0
root dispersion: 0.000763s, root delay:0.001389s
Servers:
Here, I think the "offset" should be 0.000947s, or about 947µs. But I don't actually know: this interface in the Linux kernel is not well documented, to say the least...
Is it possible that ntpd-rs is not reporting those numbers correctly to the Linux kernel?
In our case, we have the following alerts we use to monitor for clock errors on our servers (which currently never fire when using ntpsec). We worry that one of those alerts would stop working and fail to detect certain error conditions:
- alert: HostClockSkew
  expr: ((node_timex_offset_seconds > 0.05 and deriv(node_timex_offset_seconds[5m]) >= 0) or (node_timex_offset_seconds < -0.05 and deriv(node_timex_offset_seconds[5m]) <= 0)) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Host clock skew on {{ $labels.alias }}"
    description: |
      The kernel's clock is skewed by more than 0.05s ({{ $value | humanizeDuration }})
      and continuing to drift. Ensure NTP is configured correctly on {{ $labels.alias }}.
    playbook: "https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/incident-response#host-clock-desynchronized"
- alert: HostClockNotSynchronizing
  expr: (min_over_time(node_timex_sync_status[1m]) == 0 and node_timex_maxerror_seconds >= 16) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "NTP is failing to synchronize the host clock on {{ $labels.alias }}"
    description: |
      Clock not synchronising. Ensure NTP is configured and running properly on {{ $labels.alias }}.
    playbook: "https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/incident-response#host-clock-desynchronized"
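To make the concern concrete, here is a rough sketch of the HostClockSkew condition as a plain function. It only mirrors the comparison part of the expression above (the "for: 15m" hold time and the join with node_uname_info are not modelled), and the name host_clock_skew is made up for illustration.

```python
def host_clock_skew(offset, deriv, threshold=0.05):
    """True when the offset is beyond the threshold and not moving back toward zero."""
    drifting_ahead = offset > threshold and deriv >= 0
    drifting_behind = offset < -threshold and deriv <= 0
    return drifting_ahead or drifting_behind
```

With an offset that is always reported as zero, host_clock_skew(0.0, 0.0) is False no matter how far off the clock actually is, so that alert would silently stop detecting skew.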
We've reviewed ntpd-rs as a replacement for timesyncd and ntpsec (and also compared it with chrony), and it seems pretty darn good, by the way! See my comment here: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41936#note_3394545