[Debian-ha-maintainers] Bug#887563: corosync prerm will stop pacemaker and not start it again
Nish Aravamudan
nish.aravamudan at canonical.com
Fri Apr 20 17:40:32 BST 2018
Hi Ferenc!
On Fri, Apr 20, 2018 at 7:59 AM, Ferenc Wágner <wferi at niif.hu> wrote:
> Control: fixed -1 2.4.4-1
>
> Nishanth Aravamudan <nish.aravamudan at canonical.com> writes:
>
>> I believe this is because the prerm of corosync.service [...]
>> unconditionally stops corosync for all Debian and Ubuntu releases
>> (as the init script is installed even if unused by systemd). When
>> corosync stops, pacemaker fails to connect to corosync (and the
>> pacemaker systemd unit file specifies that pacemaker Requires corosync)
>> and also stops.
>>
>> When the postinst for corosync runs [...] corosync will start, but
>> there is no connection between corosync starting (systemd or SysV) and
>> pacemaker.
>
> Right.
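To make the failure sequence concrete, here is a toy Python model of the upgrade path described above. It is a simplified illustration, not systemd's job engine. The only dependency it models is pacemaker.service Requires=corosync.service, and the helper names (stop, start, upgrade_compat9) are hypothetical.

```python
# Toy model of the corosync upgrade path described in this bug.
# Assumption: a simplified illustration, not systemd's real job engine.
# The only dependency modeled is pacemaker.service Requires=corosync.service.

# Reverse dependencies: units that declare Requires= on the key unit.
REQUIRED_BY = {"corosync": ["pacemaker"]}


def stop(unit, active):
    """Stop `unit`, first stopping every unit that Requires= it."""
    for dependent in REQUIRED_BY.get(unit, []):
        stop(dependent, active)
    active.discard(unit)


def start(unit, active):
    """Start only `unit`; nothing propagates a start to its dependents."""
    active.add(unit)


def upgrade_compat9(active):
    """Old maintainer scripts: prerm stops corosync, postinst starts it."""
    stop("corosync", active)   # old package's prerm
    start("corosync", active)  # new package's postinst
    return active


print(sorted(upgrade_compat9({"corosync", "pacemaker"})))  # ['corosync']
```

Pacemaker goes down as a side effect of the prerm, and because no start ever propagates from corosync back to pacemaker, it stays down after the postinst.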
>
>> I think there are two necessary changes to the packaging/upstream to fix
>> this:
>>
>> 1) The systemd unit file should indicate pacemaker is PartOf corosync,
>> which will propagate restarts of corosync to pacemaker. This will also
>> propagate stops, but as mentioned above, pacemaker already stops when
>> corosync stops, so I think it's harmless.
>
> How would this help? Currently pacemaker.service Requires
> corosync.service, which is a stronger (stricter) constraint than PartOf
> would be if I read systemd.unit(5) correctly.
You are right, and I'm sorry for not updating the Debian bug sooner --
we ended up moving to "BindsTo" not "PartOf" to resolve this in
Ubuntu.
I spent some time reading the manpage myself and this is how I
interpret the relevant section(s):
    Requires=
        Configures requirement dependencies on other units. If this unit
        gets activated, the units listed here will be activated as well.
        ...
Since pacemaker.service has Requires=corosync.service, this means that
when pacemaker is started, corosync is started too (and, IIRC, since
pacemaker.service also has an After=corosync.service, systemd will
start corosync.service first).
This does not imply anything further, though, and in the default
package configuration, pacemaker has a *hard* dependency on corosync
(afaict). Thus, the first line of the next paragraph in the manpage is
relevant:
    Note that this dependency type does not imply that the other unit
    always has to be in active state when this unit is running.
This section also mentions the use of BindsTo=; however, per the
manpage, that only affects the stopping of units.
Finally, from PartOf:
    PartOf=
        Configures dependencies similar to Requires=, but limited to
        stopping and restarting of units. When systemd stops or restarts
        the units listed here, the action is propagated to this unit.
So, actually, PartOf is *different* from Requires, and in the case of
pacemaker and corosync's dependency relationship, helps reflect the
other half of the requirements, but not the existing ones :)
What we want to express (I believe) is:
1) Corosync can be started/stopped on its own
2) If pacemaker is started, corosync must be started
3) If corosync is restarted, pacemaker should be restarted
pacemaker.service Requires=corosync.service says "When pacemaker is
started, corosync should be started. When corosync is stopped,
pacemaker is stopped."
pacemaker.service PartOf=corosync.service says "When corosync is
restarted, pacemaker is restarted. When corosync is stopped, pacemaker
is stopped."
pacemaker.service BindsTo=corosync.service says "Every state
transition corosync goes through, pacemaker will also go through."
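The three summaries above can be encoded as a small lookup table so that checking requirements 2) and 3) becomes mechanical. This is only a sketch of my reading of systemd.unit(5) as summarized here, not of systemd itself: each directive is read as a line in pacemaker.service pointing at corosync.service, only the stop/restart transitions are encoded for BindsTo=, an event a summary does not mention is simply absent, and the names SUMMARIES, consequences and check are hypothetical.

```python
# Each directive is read as `pacemaker.service <Directive>=corosync.service`.
# Entries restate the summaries in this message, not systemd internals;
# an event a summary does not mention is absent rather than "no effect".
SUMMARIES = {
    "Requires": {
        ("start", "pacemaker"): ("start", "corosync"),
        ("stop", "corosync"): ("stop", "pacemaker"),
    },
    "PartOf": {
        ("restart", "corosync"): ("restart", "pacemaker"),
        ("stop", "corosync"): ("stop", "pacemaker"),
    },
    "BindsTo": {  # only the transitions relevant to requirement 3
        ("restart", "corosync"): ("restart", "pacemaker"),
        ("stop", "corosync"): ("stop", "pacemaker"),
    },
}


def consequences(directives, event):
    """Union of the stated consequences of `event` over `directives`."""
    return {SUMMARIES[d][event] for d in directives if event in SUMMARIES[d]}


def check(directives):
    """Evaluate requirements 2) and 3) for a set of directives."""
    return {
        "2) starting pacemaker starts corosync":
            ("start", "corosync") in consequences(directives, ("start", "pacemaker")),
        "3) restarting corosync restarts pacemaker":
            ("restart", "pacemaker") in consequences(directives, ("restart", "corosync")),
    }


for combo in [{"Requires"}, {"PartOf"}, {"Requires", "BindsTo"}]:
    print(sorted(combo), check(combo))
```

Under these summaries Requires= covers 2) while PartOf= and BindsTo= cover 3), which is the "other half" point above; combining Requires= with one of the others satisfies both.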
>> Additionally, the SysV init file should be updated to check if the
>> pacemaker SysV status was running before stopping corosync in the
>> restart path and start pacemaker as well after starting corosync.
>
> I don't intend to go there. If you stop Corosync under Pacemaker,
> Pacemaker will fail and the node will be fenced. Systemd helps with
> this by cleanly stopping Pacemaker (and any other service declaring a
> Requires relation to Corosync) beforehand; SysV init has no comparable
> mechanisms. And you can't expect the Corosync init script to take care of
> all possible dependent services (Pacemaker, DLM, cLVM, corosync-notifyd,
> whatever). This is part of the reason why I don't really support SysV
> init in the HA stack.
Yeah, I'm fine with this; I only mentioned it for completeness wrt.
the ordering.
>> 2) d/rules should call dh_installinit with --restart-after-upgrade. This
>> is the default in compat >= 10 (2.4.2-3 is still at 9). That will change
>> the prerm and postinst to not stop/start the service on upgrade, but
>> simply restart it in the postinst (removals will still stop the
>> service).
>
> Corosync 2.4.4-1 has switched to compat 11, so this is done.
Great!
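A toy model of why both changes are needed: the compat 9 stop/start sequence versus the restart-after-upgrade behaviour, assuming pacemaker.service also carries a dependency that propagates restarts (PartOf= or BindsTo=, as discussed above). The helper names are hypothetical, restart propagation is modeled only one level deep (enough for these two units), and this is not how dpkg or systemd are implemented.

```python
# Assumption: pacemaker.service declares a dependency on corosync.service that
# propagates both stop and restart (PartOf=/BindsTo=, per this thread).
PROPAGATES_TO = {"corosync": ["pacemaker"]}


def stop(unit, active):
    """Stop `unit` after stopping every unit its stop propagates to."""
    for dep in PROPAGATES_TO.get(unit, []):
        stop(dep, active)
    active.discard(unit)


def start(unit, active):
    """Start only `unit`."""
    active.add(unit)


def restart(unit, active):
    """Restart `unit` and the direct dependents that were running."""
    was_running = [d for d in PROPAGATES_TO.get(unit, []) if d in active]
    stop(unit, active)
    start(unit, active)
    for dep in was_running:
        start(dep, active)


def upgrade(active, restart_after_upgrade):
    """Service actions taken by the maintainer scripts on upgrade."""
    if restart_after_upgrade:
        # compat >= 10 default: prerm leaves it running, postinst restarts it
        restart("corosync", active)
    else:
        # compat 9 default: prerm stops the service, postinst starts it
        stop("corosync", active)
        start("corosync", active)
    return active


print(sorted(upgrade({"corosync", "pacemaker"}, restart_after_upgrade=False)))
print(sorted(upgrade({"corosync", "pacemaker"}, restart_after_upgrade=True)))
```

Even with restart propagation in place, the compat 9 path leaves pacemaker down, because a stop followed by a separate start is not a restart.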
>> Now, neither of these actually fixes the existing packages, unfortunately;
>> the old packages will stop corosync on the upgrade to a fixed package and thus
>> stop pacemaker. I have no idea if there actually is any way to fix this
>> for existing packages, since the 'old' prerm will be invoked by dpkg on
>> the upgrade path.
>
> I don't find this too serious a problem. Inconvenient, yes, but if
> you're running Corosync, then you probably have a highly available setup
> where even a prolonged node outage does not lead to service interruption.
> Your monitoring system delivers a warning, you start Pacemaker or reboot
> and everything is back to normal.
Agreed, and in theory, if a fix lands, it's the last time this happens
:) Note that in Ubuntu we added a special run-time flag between
pacemaker and corosync to indicate that corosync needs to restart
pacemaker; we will drop it once 18.04 is released. This requires
lockstep uploads of the two packages (with appropriate Breaks, etc.).
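How the Ubuntu flag is implemented isn't spelled out here, so the following is purely a hypothetical Python illustration of the general idea: record that pacemaker was running before corosync goes down, and start it again once corosync is back. The flag file name, its location (a temporary directory standing in for something like /run), and which maintainer script performs each step are all assumptions, not the actual Ubuntu packaging.

```python
import os
import tempfile

# Hypothetical flag file; a temporary directory stands in for /run here.
FLAG = os.path.join(tempfile.mkdtemp(), "corosync-restart-pacemaker")


def before_corosync_stop(active):
    """Hypothetical prerm step: remember whether pacemaker was running."""
    if "pacemaker" in active:
        open(FLAG, "w").close()  # create an empty marker file
    active.discard("pacemaker")  # pacemaker goes down with corosync
    active.discard("corosync")


def after_corosync_start(active):
    """Hypothetical postinst step: restore pacemaker if the flag says so."""
    active.add("corosync")
    if os.path.exists(FLAG):
        active.add("pacemaker")
        os.remove(FLAG)  # one-shot: consume the flag


def upgrade_with_flag(active):
    before_corosync_stop(active)
    after_corosync_start(active)
    return active


print(sorted(upgrade_with_flag({"corosync", "pacemaker"})))  # ['corosync', 'pacemaker']
print(sorted(upgrade_with_flag({"corosync"})))               # ['corosync']
```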
Thanks,
Nish