Bug#964444: systemd-timesyncd: time synchronization suddenly stopped working

Thu Jul 16 14:48:27 BST 2020

Control: retitle -1 systemd-timesyncd: poll interval should not be increased during network failures
Control: severity -1 wishlist

On 2020-07-08 21:16:43 +0200, Michael Biebl wrote:
> Am 08.07.20 um 12:35 schrieb Vincent Lefevre:
> > There's something suspicious in the "timedatectl timesync-status"
> > output above:
> > 
> >>>>        Server: (null) (3.debian.pool.ntp.org)   
> > 
> > Currently I get:
> > 
> >        Server: 193.52.136.2 (0.debian.pool.ntp.org)
> > 
> > So, why "(null)" above instead of the IP address?
> 
> Dunno, maybe it's a result of the failure to resolve the names via DNS.
> But this does look fishy indeed.

I've done some tests, and I think that the "(null)" was temporary,
but IMHO the way such failures are handled is rather bad (see below).

Note: The 5-second delta might be unrelated. Since the connection
was bad, there may have been some inaccuracy. This might be due to the
"maximum acceptable root distance" RootDistanceMaxSec of 5 seconds,
but the documentation is unclear: for instance, it does not say what
the consequences of this setting are. A 5-second inaccuracy may be OK
after a very long period without synchronization, but it would not be
acceptable if the machine had been correctly synchronized recently.
It is also possible that the wifi hotspot I was using at that time was
providing an NTP server[*] and that this server did not have the
correct time (again, possibly because of the bad connection).

[*] I currently have an Ethernet connection, and I was able to check
that the NTP server provided by DHCP is honored.
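For reference, in NTP the root distance is an upper bound on a server's error relative to its primary reference, computed from values the server reports. The Python sketch below assumes that RootDistanceMaxSec is compared against the simplified form root_delay/2 + root_dispersion (only the server-reported part of the RFC 5905 root distance). The function names are hypothetical and this is not timesyncd's code. Under that assumption, a server whose clock may be several seconds off can still be accepted:

```python
# ASSUMPTION: root distance ~ root_delay / 2 + root_dispersion (server-reported
# terms only; local delay, dispersion and jitter terms are left out).
ROOT_DISTANCE_MAX_SEC = 5.0  # the RootDistanceMaxSec value mentioned above


def root_distance(root_delay: float, root_dispersion: float) -> float:
    """Upper bound (in seconds) on the server's error w.r.t. its reference."""
    return root_delay / 2 + root_dispersion


def server_acceptable(root_delay: float, root_dispersion: float,
                      max_sec: float = ROOT_DISTANCE_MAX_SEC) -> bool:
    """Hypothetical check: accept the server if its root distance is small enough."""
    return root_distance(root_delay, root_dispersion) <= max_sec


print(server_acceptable(0.010, 0.002))  # healthy server, ~7 ms: True
print(server_acceptable(1.0, 4.0))      # up to 4.5 s of error, still accepted: True
print(server_acceptable(2.0, 4.5))      # 5.5 s exceeds the limit: False
```

If the real check works this way, a 5-second limit would allow exactly the kind of multi-second error discussed above, which is why the consequences of this setting would be worth documenting.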

What I did was the following: I reconnected to the wifi hotspot of
one of my phones while its mobile data was disabled. Then, from what
I could see with timedatectl, systemd-timesyncd switched to the
fallback NTP servers:

FallbackNTPServers=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org

and it tried them one after the other, apparently with cached
IP addresses. For 2.debian.pool.ntp.org, I could see that it
tried at least 3 IP addresses. After some time, it ended up
with "(null) (3.debian.pool.ntp.org)", as in the output quoted above.
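A toy Python model of that order, for illustration only: each name is tried in turn, every cached address of a name is tried before moving on, and a name with no address left is shown as "(null)". The cache contents are made-up documentation addresses (192.0.2.0/24) and the function names are hypothetical; this mirrors what was observed, not how systemd-timesyncd is implemented.

```python
from typing import Dict, List, Optional, Tuple

FALLBACK = ["0.debian.pool.ntp.org", "1.debian.pool.ntp.org",
            "2.debian.pool.ntp.org", "3.debian.pool.ntp.org"]

# Hypothetical cache: documentation addresses, not real pool servers.
CACHE: Dict[str, List[str]] = {
    "0.debian.pool.ntp.org": ["192.0.2.1"],
    "1.debian.pool.ntp.org": ["192.0.2.2"],
    "2.debian.pool.ntp.org": ["192.0.2.3", "192.0.2.4", "192.0.2.5"],
    # nothing cached for 3.debian.pool.ntp.org, and DNS is unreachable
}


def try_order(servers: List[str],
              cache: Dict[str, List[str]]) -> List[Tuple[str, Optional[str]]]:
    """Return the (name, address) pairs in the order they would be tried."""
    order: List[Tuple[str, Optional[str]]] = []
    for name in servers:
        addresses = cache.get(name, [])
        if not addresses:
            order.append((name, None))  # name known, but no address to use
        for address in addresses:
            order.append((name, address))
    return order


def status_line(name: str, address: Optional[str]) -> str:
    """Mimic the 'Server:' field; a missing address is shown as (null)."""
    return f"{address if address is not None else '(null)'} ({name})"


for name, address in try_order(FALLBACK, CACHE):
    print("Server:", status_line(name, address))
```

The last line printed is `Server: (null) (3.debian.pool.ntp.org)`, i.e. the state the status output got stuck in.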

Here's what I got:

zira:~> timedatectl timesync-status
       Server: 62.210.244.146 (2.debian.pool.ntp.org)
Poll interval: 2min 8s (min: 32s; max 34min 8s)      
[...]

zira:~> timedatectl timesync-status
       Server: (null) (3.debian.pool.ntp.org)   
Poll interval: 4min 16s (min: 32s; max 34min 8s)
[...]

zira:~> timedatectl timesync-status
       Server: (null) (3.debian.pool.ntp.org)   
Poll interval: 8min 32s (min: 32s; max 34min 8s)
[...]

Then I reenabled the mobile data, and after some time:

zira:~> timedatectl timesync-status
       Server: 37.187.122.11 (0.debian.pool.ntp.org)
Poll interval: 17min 4s (min: 32s; max 34min 8s)    
[...]
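The poll intervals shown above are successive doublings (2min 8s, 4min 16s, 8min 32s, 17min 4s), and the configured maximum of 34min 8s is exactly 64 times the 32s minimum, i.e. one more doubling after the last reading. A small Python check of that arithmetic, parsing only the `min` and `s` units that appear here (it says nothing about what triggers each doubling inside systemd-timesyncd):

```python
import re


def parse_span(text: str) -> int:
    """Convert a span such as '34min 8s' (min/s units only) to seconds."""
    units = {"min": 60, "s": 1}
    return sum(int(value) * units[unit]
               for value, unit in re.findall(r"(\d+)(min|s)", text))


observed = ["2min 8s", "4min 16s", "8min 32s", "17min 4s"]
seconds = [parse_span(s) for s in observed]
print(seconds)  # [128, 256, 512, 1024]

# Each reading is exactly twice the previous one.
assert all(b == 2 * a for a, b in zip(seconds, seconds[1:]))

poll_min, poll_max = parse_span("32s"), parse_span("34min 8s")
print(poll_min, poll_max, poll_max // poll_min)  # 32 2048 64
# One more doubling after 17min 4s reaches the maximum, 34min 8s.
assert 2 * seconds[-1] == poll_max
```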

The fact that the poll interval keeps increasing even when there
are connection issues makes it less likely that synchronization
will succeed. When I reported the bug, it was 34min 8s (i.e. the
maximum in its default configuration). I suppose that a workaround
would be to lower the PollIntervalMaxSec value, but this would also
affect normal operation.

BTW, this behavior should be documented (the man pages say nothing
about how the poll interval evolves). I also think that failures
should be logged in some way.
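To make the request concrete, here is a hypothetical Python sketch contrasting the doubling seen above with a policy that lengthens the interval only after a successful synchronization, resets it to the minimum on failure, and logs each failure. The function and logger names are made up; this is not timesyncd's implementation.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("poll-sketch")  # hypothetical logger name

POLL_MIN = 32    # seconds ("min: 32s" in the output above)
POLL_MAX = 2048  # seconds ("max 34min 8s")


def observed_next(current: int, success: bool) -> int:
    """What the readings above look like: doubling whatever the outcome."""
    return min(current * 2, POLL_MAX)


def proposed_next(current: int, success: bool) -> int:
    """Hypothetical policy: only a successful sync may lengthen the interval."""
    if success:
        return min(current * 2, POLL_MAX)
    log.warning("NTP synchronization failed; next attempt in %d s", POLL_MIN)
    return POLL_MIN


outcomes = [False, False, False, True]  # network down three times, then back
for policy in (observed_next, proposed_next):
    interval, history = 128, []
    for ok in outcomes:
        interval = policy(interval, ok)
        history.append(interval)
    print(policy.__name__, history)
```

Starting from 2min 8s, the first policy climbs to 34min 8s during the outage, while the second stays at 32s and only starts growing again once a request succeeds. Resetting to the minimum is just one choice: keeping the interval unchanged on failure would also match the retitled request, but it would leave the interval stuck at 34min 8s if failures start after the maximum has been reached.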

I could report upstream bugs (behavior + documentation + logging).

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)