Bug#947936: chrony: Does (still) not start properly on boot on buster
Vincent Blut
vincent.debian at free.fr
Sun Jan 12 22:08:43 GMT 2020
On 2020-01-12T20:41+0100, Michael Biebl wrote:
>Am 12.01.20 um 20:15 schrieb Santiago Vila:
>> My theory is that this is some kind of race condition.
>>
>> I initially cloned the machine from another one where this happened.
>>
>> Then I discovered that the problem also happens (randomly) in a brand
>> new machine if I copy the journal from the "bad" machine.
>>
>> However, there is nothing special about the journal (or at least
>> "journalctl --verify" says it's ok), except maybe that it's several
>> megabytes long.
>>
>> Could it be that systemd spends some time processing the journal at
>> boot time and this is what triggers the race condition?
>
>On my buster system:
>
>
>> an 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Looking at job systemd-timesyncd.service/stop conflicted_by=yes
>> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Looking at job systemd-timesyncd.service/start conflicted_by=no
>> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Fixing conflicting jobs systemd-timesyncd.service/stop,systemd-timesyncd.service/start by deleting job systemd-timesyncd.service/start
>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Looking at job chrony.service/start conflicted_by=no
>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Looking at job chrony.service/stop conflicted_by=no
>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Fixing conflicting jobs chrony.service/start,chrony.service/stop by deleting job chrony.service/stop
>> Jan 12 20:04:46 debian systemd[1]: Found redundant job systemd-timesyncd.service/stop, dropping from transaction.
>
>
>Those lines are missing on GCE system. Instead I see
>> Jan 12 17:02:01 d1 systemd[1]: Keeping job systemd-timesyncd.service/start because of sysinit.target/start
>> Jan 12 17:02:01 d1 systemd[1]: Keeping job chrony.service/stop because of systemd-timesyncd.service/start
>> Jan 12 17:02:01 d1 systemd[1]: systemd-timesyncd.service: Installed new job systemd-timesyncd.service/start as 119
>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Job 82 chrony.service/start finished, result=canceled
>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Installed new job chrony.service/stop as 121
>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Job 121 chrony.service/stop finished, result=done
>
>
>The problem is, that the transaction is computed *before*
>ConditionFileIsExecutable=!/usr/sbin/chronyd is evaluated (conditions
>are evaluation just before the binary is executed) and it might indeed
>depend on the ordering, in which the jobs are scheduled.
Makes sense. Thanks for debugging this.
>So, the simplest fix would be to drop the line
>> Conflicts=systemd-timesyncd.service openntpd.service ntp.service ntpsec.service
>from chrony.service.
>This way, systemd will schedule the start of both services.
>chrony.service will be started properly and for
>systemd-timesyncd.service the ConditionFileIsExecutable will kick in.
If there is no risk of regression, then I’m all for making this change.
>I'm sorry, that I first asked chrony (and other NTP implementations) to
>add a Conflicts=systemd-timesyncd.service, hoping I could get rid of the
>/lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
>hack.
Don’t feel sorry Michael, that was even requested by systemd upstream.
From systemd changelog:
“… Alternative NTP implementations should add a
Conflicts=systemd-timesyncd.service to their unit files to take over and
replace systemd's NTP default functionality.”
>But it appears, this hack is actually much more reliable then using
>Conflicts=
>See
>https://github.com/systemd/systemd/issues/7104#issuecomment-456755980
>which explains the problem. We'd need some kind of asymmetrical conflicts.
We probably should not keep this hack forever, but instead let timedated
read known NTP implementation unit names from
usr/lib/systemd/ntp-units.d/*.list since this has been reintroduced in
systemd 243. I added the necessary bits in chrony 3.5-5.
By the way, it would be great to know if this issue can be triggered by
upgrading chrony to the 3.5-5 version (and with systemd >= 243-1).
>I guess at this point it is best to ask chrony, ntp, openntpd, ntpsec
>and virtualbox [1] to drop the Conflicts= line again.
>Maybe we should even do that for buster via a stable upload?
>
>Thoughts?
Seems good.
>Michael
Cheers,
Vincent
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-systemd-maintainers/attachments/20200112/86abf3ec/attachment.sig>
More information about the Pkg-systemd-maintainers
mailing list