[Pkg-xen-devel] Bug#452721: Bug#452721: #452721 moreinfo?

Andy Smith andy at strugglers.net
Mon Sep 27 18:13:04 BST 2021


Hi Elliott,

On Sun, Sep 26, 2021 at 08:07:58PM -0700, Elliott Mitchell wrote:
> During a full downtime when all VMs were fully shut down, this effect
> can be achieved by including numbers in the filename.  Say
> /etc/xen/auto/0_ldap.cfg, /etc/xen/auto/1_fileserver.cfg,
> /etc/xen/auto/9_everything_else.cfg.

I also do this to control start up order, though I use a prefix of
NNN-.

The main missing functionality from my point of view is not being
able to control the order of save/shutdown. As you say, the script
for saving everything or shutting everything down just reads all
existing domids and acts on them one by one in increasing order.

I think the "auto" directory is a pretty good and simple interface,
so how about using it for save/shutdown as well? So, instead of just
enumerating all running domids, enumerate all files in
/etc/xen/auto/ in REVERSE order, parsing the name of the domain out
of each one and doing the action on that name. When all files have
been exhausted, THEN do the action on any remaining running domains.

This has the advantages of:

- still working even if administrator does not use ordering in
  /etc/xen/auto. Filename format there does not change from what it
  is now, where ordering is already possible but is optional.

- being quite obvious behaviour - save/shutdown order is reverse of
  start order.
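
A minimal sketch of that ordering, as an illustration only and not how
the existing scripts are written. It assumes each file in the "auto"
directory contains a `name = "..."` line, and that the caller supplies
the running guest domains (Domain-0 excluded) as (domid, name) pairs.
The function names `domain_name_from_cfg` and `shutdown_order` are
hypothetical.

```python
import re


def domain_name_from_cfg(cfg_text):
    """Return the value of the first `name = "..."` line, or None."""
    match = re.search(r'^\s*name\s*=\s*["\']([^"\']+)["\']',
                      cfg_text, re.MULTILINE)
    return match.group(1) if match else None


def shutdown_order(auto_cfgs, running):
    """Return domain names in the order the save/shutdown action would run.

    auto_cfgs: dict mapping a filename in the auto directory to its text
    running:   list of (domid, name) pairs for running guest domains
    """
    running_names = {name for _, name in running}
    order = []
    # Reverse of start order: walk the auto filenames in reverse sorted order.
    for filename in sorted(auto_cfgs, reverse=True):
        name = domain_name_from_cfg(auto_cfgs[filename])
        if name in running_names and name not in order:
            order.append(name)
    # Any running domain not listed in auto: increasing domid, as happens now.
    for _, name in sorted(running):
        if name not in order:
            order.append(name)
    return order
```

With no numeric prefixes in the auto directory the result is still a
complete list of running domains, so nothing is left without an action.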

That seems like a good minimal improvement, but if one wanted to
explicitly control save/shutdown order then perhaps the next
enhancement could be an /etc/xen/shutdown/ directory with similar
purpose to the "auto" one? i.e.:

1. Enumerate files in "shutdown" directory in reverse order, getting
   name from each and doing shutdown action on it

2. If there were no files there, instead use "auto" directory for
   this purpose

3. Then do shutdown action on every remaining running domain as
   usual

Again this still results in everything getting a shutdown action if
administrator does not want to do any of this.

It's an open question for me whether step 2 (falling back to
enumerating "auto" directory) only happens when "shutdown" directory
is empty or if it should happen all of the time.

If you had a dom0 with 100 domains on it but only wanted to control
the order of a few of them, without fallback you would need to copy
ALL the links from auto to shutdown and then change their ordering
because otherwise this would shut down the ones you specified and
then do all the rest in domid order like it does right now.

WITH fallback, you'd get the few you wanted to control done in the
order you expect and then you'd get the order from "auto", which is
appealing but does mean it's going to try to shut down again some
that are already shut down. If there is a relatively quick "is a
domain by this name still running?" check then maybe that's
workable.
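
The two fallback behaviours can be compared with a small sketch. It is
only an illustration under stated assumptions: the domain names have
already been parsed out of the files in each directory and are passed
in sorted filename order, and the running guest domains are passed as
(domid, name) pairs. The name `shutdown_plan` and the `always_fallback`
switch are hypothetical. The "is it still running?" check is modelled
by skipping names that are not running or were already handled.

```python
def shutdown_plan(shutdown_names, auto_names, running, always_fallback=False):
    """Return domain names in the order the shutdown action would run.

    shutdown_names: domain names from the "shutdown" directory, file order
    auto_names:     domain names from the "auto" directory, file order
    running:        list of (domid, name) pairs for running guest domains
    always_fallback: if True, consult "auto" even when "shutdown" has files
    """
    running_names = {name for _, name in running}
    plan = []

    def add(names):
        for name in names:
            # Skip domains that are not running or were already handled.
            if name in running_names and name not in plan:
                plan.append(name)

    # Step 1: files in the "shutdown" directory, reverse order.
    add(reversed(shutdown_names))
    # Step 2: fall back to "auto", either always or only when
    # "shutdown" is empty.
    if always_fallback or not shutdown_names:
        add(reversed(auto_names))
    # Step 3: every remaining running domain, increasing domid.
    add(name for _, name in sorted(running))
    return plan
```

With `always_fallback=False`, a dom0 that lists only a few domains in
"shutdown" gets those first and the rest in domid order. With
`always_fallback=True` the rest follow the reverse of the "auto" order,
and the duplicate check stops a domain from being acted on twice.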

> If the hypervisor is rebooted and VMs are saved to /var/lib/xen/save;
> they will be paused in identifier order, but saved by domain name.  When
> scanning /var/lib/xen/save, `xendomains` goes by filename which means VMs
> are restored in a distinct (and often problematic) order.
> 
> A minimal solution would be for `xendomains` to save VMs in
> /var/lib/xen/save <domId>-<name> and then use `sort -n` during restore.

If by this you mean it would be good if the "save all" action picked
the filename from the filename in the "auto" directory, to replicate
that directory's ordering, then I agree.

If however you mean the actual Xen domid of the running domain then
I'm not sure what that would buy us. If I had a domain with a
filename of 010-ldap0.cfg it might get started first and have domid
1, but if I then reboot it and it has domid 99, I wouldn't want it
saved as /var/lib/xen/save/99-ldap0. I'd still want it saved as
/var/lib/xen/save/010-ldap0.
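
To make the naming idea concrete, here is a rough sketch that assumes
the save file is named after the domain's file in the "auto" directory
with the extension removed, and that restore walks the save directory
in sorted filename order. The helper names `save_basename` and
`restore_order` are hypothetical, and the mapping from auto filename to
domain name is taken as given.

```python
import os


def save_basename(domain, auto_files):
    """Pick the save file name for a running domain.

    auto_files: dict mapping a filename in the auto directory to the
                domain name configured in that file
    """
    for filename in sorted(auto_files):
        if auto_files[filename] == domain:
            # 010-ldap0.cfg -> 010-ldap0, whatever the current domid is.
            return os.path.splitext(filename)[0]
    # Domains without an auto entry keep their plain name.
    return domain


def restore_order(save_basenames):
    """Restore in sorted filename order, mirroring the auto directory."""
    return sorted(save_basenames)
```

Because the name comes from the auto directory and not from the domid,
a domain that has been rebooted and now has domid 99 is still saved
under the same name as before.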

> A better approach would be to have a LSB style header specifying
> dependencies to flag VMs which should be saved or shutdown late,
> and VMs which should be saved or shutdown early.
> 
> A ridiculous overkill solution might be to turn the /etc/xen/*.cfg
> files into full init scripts.

I don't think that we should be proposing to change the config
language of upstream Xen or diverge from how domains are usually
configured with upstream Xen. I think that we can get a lot of
improvement without modifying the format of the config files and
only by changing how the start and shutdown scripts work.

At the moment domain start and shutdown is serial in nature and can
take a long time. I don't know if there is any scope for improving
that in scripts, or whether it's an upstream conversation, either
way not for this bug. But because of the lengthy process I do have
an interest in starting my important domains first and shutting them
down last.

Presently I am handling this by numbering the links in the auto
directory, and using my own script that saves or shuts things down
in the order I want.

I can see how this could be improved but I'm not sure it's worth
spending a large amount of effort on it and/or coming up with
a complicated solution.

I have multiple dom0s so where I have concerns about an essential
service being unavailable I take steps to make that service
redundant and then I don't have to care so much about whether the
domain for that service is shut down 1st or 100th.

While being able to control ordering of shutdown would be NICE, it
seems like this would be catering to the administrator of a single
dom0 that can't make services redundant. This raises the question
of what such administrators are doing about the risk of their one
dom0 host becoming unavailable and all its domains with it.

I also feel that trying to add dependency logic into the
configuration is stepping into territory best left to actual cluster
management software, that says what order things should start/stop
in, how many copies of them need to run, where they can be allowed
to run for redundancy purposes, etc.

Thanks,
Andy


