[Debian-ha-maintainers] Bug#887563: corosync prerm will stop pacemaker and not start it again

Ferenc Wágner wferi at niif.hu
Sat Apr 21 23:19:37 BST 2018


Nish Aravamudan <nish.aravamudan at canonical.com> writes:

> I spent some time reading the manpage myself and this is how I
> interpret the relevant section(s):
>
>      Requires=
>            Configures requirement dependencies on other units. If this unit
>            gets activated, the units listed here will be activated as well.
> ...
>
> This means, since pacemaker.service Requires=corosync.service, that
> when pacemaker is started, corosync is started (and, iirc, since
> pacemaker.service also has an After=corosync.service, systemd will
> start corosync.service first).

Agreed.

> This does not imply anything further

Not agreed (if you mean "Requires" under "this").  Version 232-25+deb9u2
of the systemd.unit man page continues with:

  If one of the other units gets deactivated or its activation fails,
  this unit will be deactivated.

> and in the default package configuration, pacemaker has a *hard*
> dependency on corosync (afaict).

Even stronger: Pacemaker always has a hard dependency on Corosync in
Debian, because we don't compile in support for any other messaging
layer.  This means Pacemaker can't start without Corosync and exits
immediately if Corosync is stopped under it.

> Thus, the first line of the next paragraph in the manpage is relevant:
>
>   Note that this dependency type does not imply that the other unit
>   always has to be in active state when this unit is running

I couldn't find this text, but my understanding is that systemd won't do
anything if a Required unit suddenly stops on its own (no systemd job is
involved in that case).  Reacting to such a stop is what BindsTo adds to
Requires, but in our case it does not matter anyway, as Pacemaker
immediately exits if Corosync stops.

This exit is not particularly pretty, though.  The pacemakerd master
daemon notices the loss of the Corosync connection and commences a
regular shutdown of the constituent daemons.  However, most daemons
themselves also notice the loss of the Corosync connection (or the exit
of their peer daemons) and exit with various failure codes.  In the end,
pacemakerd seems to ignore these errors and exits successfully.

This successful exit is strange in my opinion, but at least it does not
let the Restart=on-failure setting of Pacemaker muddy the waters even
more.

>   PartOf=
>     Configures dependencies similar to Requires=, but limited to
>     stopping and restarting of units.  When systemd stops or restarts
>     the units listed here, the action is propagated to this unit.
>
> So, actually, PartOf is *different* than Requires, and in the case of
> pacemaker and corosync's dependency relationship, helps reflect the
> other half of the requirements, but not the existing ones :)

I can't follow you here.  My reading is that PartOf is weaker than
Requires, because starting a unit that is PartOf something does not
start that something itself.  However, stopping that something stops
every unit that is PartOf it, just as if it were Required.

> What we want to express (I believe) is:
>
> 1) Corosync can be started/stopped on its own
> 2) If pacemaker is started, corosync must be started

Yes.  And if Corosync is stopped, Pacemaker must be stopped beforehand.

> 3) If corosync is restarted, pacemaker should be restarted

Rather: if Corosync is restarted, Pacemaker must be stopped beforehand
and started afterwards.  Otherwise you'll get fenced.

> pacemaker.service Requires=corosync.service says "When pacemaker is
> started, corosync should be started. When corosync is stopped,
> pacemaker is stopped."

Yes.  Though proper ordering is necessary as well, so Pacemaker needs an
After=corosync.service directive.  With that in place, Pacemaker is
stopped before Corosync is restarted, and started afterwards.  So this
achieves 1), 2) and my version of 3), which is exactly what I want.
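
As a minimal sketch of that combination, written as a drop-in for
pacemaker.service: the file path is only an assumption for illustration
and is not taken from the package, which may well carry these directives
in the unit file itself.

```ini
# Hypothetical drop-in path (an assumption, not from the package):
#   /etc/systemd/system/pacemaker.service.d/corosync-dep.conf
[Unit]
# Starting Pacemaker pulls in Corosync; stopping or restarting Corosync
# through systemd propagates to Pacemaker as well.
Requires=corosync.service
# Ordering: Pacemaker starts after Corosync and is stopped before it.
After=corosync.service
```

With both lines present, a restart of corosync.service should stop
Pacemaker first and start it again afterwards.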

> pacemaker.service PartOf=corosync.service says "When corosync is
> restarted, pacemaker is restarted. When corosync is stopped, pacemaker
> is stopped."

These are also true when Pacemaker Requires=corosync.service.  But the
start constraint is not present with PartOf.
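
For contrast only, here is a sketch of what the PartOf variant would
look like.  It is not a recommendation, and the ordering line is my
assumption of what one would pair it with.

```ini
[Unit]
# Stop and restart of Corosync propagate to Pacemaker, but starting
# Pacemaker does NOT pull in Corosync.
PartOf=corosync.service
# Assumed ordering, same as in the Requires= case.
After=corosync.service
```

Requirement 2) is not covered by this fragment, which is why it cannot
replace Requires on its own.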

> pacemaker.service BindsTo=corosync.service says "Every state
> transition corosync goes through, pacemaker will also go through."

I'd rather say: BindsTo implies all constraints Requires does, and
another one, which is very different in nature: "that this unit is
stopped when any of the units listed suddenly disappears."  Note that
all other dependencies are between systemd *jobs*, while this one
involves a *state*.
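
A sketch of the BindsTo variant, for illustration only; as said above,
the extra constraint should not matter for Pacemaker, which exits on its
own when Corosync goes away.

```ini
[Unit]
# Everything Requires= gives, plus: if corosync.service leaves the
# active state without a systemd job (it "suddenly disappears"),
# this unit is stopped too.
BindsTo=corosync.service
# Ordering is still configured separately.
After=corosync.service
```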
-- 
Regards,
Feri


