Bug#947936: chrony: Does (still) not start properly on boot on buster
Vincent Blut
vincent.debian at free.fr
Sun Jan 12 23:05:54 GMT 2020
On 2020-01-12T23:24+0100, Michael Biebl wrote:
>Am 12.01.20 um 23:08 schrieb Vincent Blut:
>> On 2020-01-12T20:41+0100, Michael Biebl wrote:
>>> Am 12.01.20 um 20:15 schrieb Santiago Vila:
>>>> My theory is that this is some kind of race condition.
>>>>
>>>> I initially cloned the machine from another one where this happened.
>>>>
>>>> Then I discovered that the problem also happens (randomly) in a brand
>>>> new machine if I copy the journal from the "bad" machine.
>>>>
>>>> However, there is nothing special about the journal (or at least
>>>> "journalctl --verify" says it's ok), except maybe that it's several
>>>> megabytes long.
>>>>
>>>> Could it be that systemd spends some time processing the journal at
>>>> boot time and this is what triggers the race condition?
>>>
>>> On my buster system:
>>>
>>>
>>>> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Looking
>>>> at job systemd-timesyncd.service/stop conflicted_by=yes
>>>> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Looking
>>>> at job systemd-timesyncd.service/start conflicted_by=no
>>>> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Fixing
>>>> conflicting jobs
>>>> systemd-timesyncd.service/stop,systemd-timesyncd.service/start by
>>>> deleting job systemd-timesyncd.service/start
>>>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Looking at job
>>>> chrony.service/start conflicted_by=no
>>>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Looking at job
>>>> chrony.service/stop conflicted_by=no
>>>> Jan 12 20:04:46 debian systemd[1]: chrony.service: Fixing conflicting
>>>> jobs chrony.service/start,chrony.service/stop by deleting job
>>>> chrony.service/stop
>>>> Jan 12 20:04:46 debian systemd[1]: Found redundant job
>>>> systemd-timesyncd.service/stop, dropping from transaction.
>>>
>>>
>>> Those lines are missing on the GCE system. Instead I see:
>>>> Jan 12 17:02:01 d1 systemd[1]: Keeping job
>>>> systemd-timesyncd.service/start because of sysinit.target/start
>>>> Jan 12 17:02:01 d1 systemd[1]: Keeping job chrony.service/stop
>>>> because of systemd-timesyncd.service/start
>>>> Jan 12 17:02:01 d1 systemd[1]: systemd-timesyncd.service: Installed
>>>> new job systemd-timesyncd.service/start as 119
>>>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Job 82
>>>> chrony.service/start finished, result=canceled
>>>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Installed new job
>>>> chrony.service/stop as 121
>>>> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Job 121
>>>> chrony.service/stop finished, result=done
>>>
>>>
>>> The problem is that the transaction is computed *before*
>>> ConditionFileIsExecutable=!/usr/sbin/chronyd is evaluated (conditions
>>> are evaluated just before the binary is executed), so the outcome may
>>> indeed depend on the order in which the jobs are scheduled.
>>
>> Makes sense. Thanks for debugging this.
>>
>>> So, the simplest fix would be to drop the line
>>>> Conflicts=systemd-timesyncd.service openntpd.service ntp.service
>>>> ntpsec.service
>>> from chrony.service.
>>> This way, systemd will schedule the start of both services.
>>> chrony.service will be started properly and for
>>> systemd-timesyncd.service the ConditionFileIsExecutable will kick in.
>>
>> If there is no risk of regression, then I’m all for making this change.
>
>I don't see a risk for a regression, given that systemd in buster ships
>disable-with-time-daemon.conf.
>So if we drop the Conflicts again, we'd basically be back to the
>situation we had in stretch.
Not for chrony and openntpd. Both were already in conflict with
systemd-timesyncd.service in Stretch.
>That said, we should probably upload this change to sid first to give it
>some wider exposure.
Sure.
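For illustration only, the proposed change amounts to removing a single directive from the unit file. A minimal sketch, assuming an abbreviated, made-up unit text: only the Conflicts= line comes from this thread, UNIT_TEXT and drop_directive are hypothetical names, and the real chrony.service contains more than this.

```python
# Hypothetical, abbreviated stand-in for chrony.service.
# Only the Conflicts= line is taken from this thread; the rest is made up.
UNIT_TEXT = """\
[Unit]
Description=chrony (hypothetical excerpt)
Conflicts=systemd-timesyncd.service openntpd.service ntp.service ntpsec.service

[Service]
ExecStart=/usr/sbin/chronyd
"""


def drop_directive(unit_text: str, key: str) -> str:
    """Return unit_text without any 'key=...' lines; other lines are kept as-is."""
    kept = [
        line
        for line in unit_text.splitlines()
        if not line.strip().startswith(key + "=")
    ]
    return "\n".join(kept) + "\n"


patched = drop_directive(UNIT_TEXT, "Conflicts")
print(patched)
```

With the directive gone, nothing in this unit asks systemd to stop systemd-timesyncd.service; whether timesyncd then stays inactive depends on the condition shipped in disable-with-time-daemon.conf, as discussed above.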
>> We probably should not keep this hack forever, but instead let
>> timedated read known NTP implementation unit names from
>> /usr/lib/systemd/ntp-units.d/*.list, since this has been reintroduced
>> in systemd 243. I added the necessary bits in chrony 3.5-5.
>
>Hm, I don't see how this change in timedated would actually help in this
>situation.
>Support for ntp-units.d in timedated/timedatectl just means that
>"timedatectl set-ntp true|false" will prefer alternatives if they are
>installed.
Doesn’t systemd-timesyncd look for foreign services in ntp-units.d/ when
starting?
I thought that was the case and that it remained inactive in case an
NTP implementation with a higher priority was found there.
>We don't actually use "timedatectl set-ntp true|false" in our
>maintainer scripts though (and I don't think we should).
Agreed.
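
As a rough way to check whether a given boot hit the failure analysed above, one could look for a chrony.service start job that finished with result=canceled in systemd's debug output. A minimal sketch, assuming log lines shaped like the GCE excerpt quoted earlier and already joined onto a single line; JOB_RE, canceled_starts and SAMPLE are names made up for this example.

```python
import re

# Matches the tail of lines such as:
#   "chrony.service: Job 82 chrony.service/start finished, result=canceled"
JOB_RE = re.compile(r"Job (\d+) (\S+)/(\w+) finished, result=(\w+)")


def canceled_starts(lines):
    """Yield (unit, job_id) for every start job that finished as canceled."""
    for line in lines:
        m = JOB_RE.search(line)
        if m and m.group(3) == "start" and m.group(4) == "canceled":
            yield m.group(2), int(m.group(1))


# Two lines from the GCE excerpt in this thread, unwrapped.
SAMPLE = [
    "Jan 12 17:02:01 d1 systemd[1]: chrony.service: "
    "Job 82 chrony.service/start finished, result=canceled",
    "Jan 12 17:02:01 d1 systemd[1]: chrony.service: "
    "Job 121 chrony.service/stop finished, result=done",
]

hits = list(canceled_starts(SAMPLE))
print(hits)  # [('chrony.service', 82)]
```

On the "good" buster system quoted above, no such line appears for chrony.service, so the list would be empty there.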