Bug#947936: chrony: Does (still) not start properly on boot on buster
Michael Biebl
biebl at debian.org
Sun Jan 12 22:24:46 GMT 2020
On 12.01.20 at 23:08, Vincent Blut wrote:
> On 2020-01-12T20:41+0100, Michael Biebl wrote:
>> On 12.01.20 at 20:15, Santiago Vila wrote:
>>> My theory is that this is some kind of race condition.
>>>
>>> I initially cloned the machine from another one where this happened.
>>>
>>> Then I discovered that the problem also happens (randomly) in a brand
>>> new machine if I copy the journal from the "bad" machine.
>>>
>>> However, there is nothing special about the journal (or at least
>>> "journalctl --verify" says it's ok), except maybe that it's several
>>> megabytes long.
>>>
>>> Could it be that systemd spends some time processing the journal at
>>> boot time and this is what triggers the race condition?
>>
>> On my buster system:
>>
>>
>>> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Looking
>>> at job systemd-timesyncd.service/stop conflicted_by=yes
>>> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Looking
>>> at job systemd-timesyncd.service/start conflicted_by=no
>>> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Fixing
>>> conflicting jobs
>>> systemd-timesyncd.service/stop,systemd-timesyncd.service/start by
>>> deleting job systemd-timesyncd.service/start
>>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Looking at job
>>> chrony.service/start conflicted_by=no
>>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Looking at job
>>> chrony.service/stop conflicted_by=no
>>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Fixing conflicting
>>> jobs chrony.service/start,chrony.service/stop by deleting job
>>> chrony.service/stop
>>> Jan 12 20:04:46 debian systemd[1]: Found redundant job
>>> systemd-timesyncd.service/stop, dropping from transaction.
>>
>>
>> Those lines are missing on the GCE system. Instead I see:
>>> Jan 12 17:02:01 d1 systemd[1]: Keeping job
>>> systemd-timesyncd.service/start because of sysinit.target/start
>>> Jan 12 17:02:01 d1 systemd[1]: Keeping job chrony.service/stop
>>> because of systemd-timesyncd.service/start
>>> Jan 12 17:02:01 d1 systemd[1]: systemd-timesyncd.service: Installed
>>> new job systemd-timesyncd.service/start as 119
>>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Job 82
>>> chrony.service/start finished, result=canceled
>>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Installed new job
>>> chrony.service/stop as 121
>>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Job 121
>>> chrony.service/stop finished, result=done
>>
>>
>> The problem is that the transaction is computed *before*
>> ConditionFileIsExecutable=!/usr/sbin/chronyd is evaluated (conditions
>> are evaluated just before the binary is executed), and the outcome might
>> indeed depend on the order in which the jobs are scheduled.
>
> Makes sense. Thanks for debugging this.
>
>> So, the simplest fix would be to drop the line
>>> Conflicts=systemd-timesyncd.service openntpd.service ntp.service
>>> ntpsec.service
>> from chrony.service.
>> This way, systemd will schedule the start of both services.
>> chrony.service will be started properly and for
>> systemd-timesyncd.service the ConditionFileIsExecutable will kick in.
>
> If there is no risk of regression, then I’m all for making this change.
I don't see a risk of a regression, given that systemd in buster ships
disable-with-time-daemon.conf.
So if we drop the Conflicts= line again, we'd basically be back to the
situation we had in stretch.
That said, we should probably upload this change to sid first, to give it
some wider exposure.
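
For reference, a rough sketch of what that drop-in looks like. The install
path and the comments are assumptions written from memory, and the shipped
file lists further daemons that are omitted here; only the chronyd
condition is the one quoted earlier in this thread.

```ini
# Assumed path (not confirmed in this thread):
# /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
[Unit]
# The leading "!" negates the check, so this condition holds only when
# /usr/sbin/chronyd is NOT an executable file. With chrony installed,
# the start of systemd-timesyncd.service is skipped.
ConditionFileIsExecutable=!/usr/sbin/chronyd
```

A condition that does not hold does not fail the unit; the start is simply
skipped when the job runs, which is also why it has no influence on how the
transaction is computed.
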
> We probably should not keep this hack forever, but instead let timedated
> read known NTP implementation unit names from
> usr/lib/systemd/ntp-units.d/*.list since this has been reintroduced in
> systemd 243. I added the necessary bits in chrony 3.5-5.
Hm, I don't see how this change in timedated would actually help in this
situation.
Support for ntp-units.d in timedated/timedatectl just enables that if
you use "timedatectl set-ntp true|false" it will prefer alternatives if
installed.
We don't actually use "timedatectl set-ntp true|false" in our
maintainer scripts though (and I don't think we should).