[Pkg-xen-devel] Bug#452721: Bug#452721: #452721 moreinfo?

Elliott Mitchell ehem+debian at m5p.com
Mon Sep 27 22:16:53 BST 2021


On Mon, Sep 27, 2021 at 05:13:04PM +0000, Andy Smith wrote:
> On Sun, Sep 26, 2021 at 08:07:58PM -0700, Elliott Mitchell wrote:
> > During a full downtime when all VMs were fully shut down, this effect
> > can be achieved by including numbers in the filename.  Say
> > /etc/xen/auto/0_ldap.cfg, /etc/xen/auto/1_fileserver.cfg,
> > /etc/xen/auto/9_everything_else.cfg.
> 
> I also do this to control start up order, though I use a prefix of
> NNN-.
> 
> The main missing functionality from my point of view is not being
> able to control the order of save/shutdown. As you say the script
> for saving everything or shutting everything down just does a read
> of all existing domids and does the action on them one by one in
> increasing order.

Seems we're running into the same problems and coming up with the same
first-tier workaround; now we all need a common, complete solution.


> I think the "auto" directory is a pretty good and simple interface,
> so how about using it for save/shutdown as well? So, instead of just
> enumerating all running domids, enumerate all files in
> /etc/xen/auto/ in REVERSE order, parsing the name of the domain out
> of each one and doing the action on that name. When all files have
> been exhausted, THEN do the action on any remaining running domains.
> 
> This has the advantages of:
> 
> - still working even if administrator does not use ordering in
>   /etc/xen/auto. Filename format there does not change from what it
>   is now, where ordering is already possible but is optional.
> 
> - being quite obvious behaviour - save/shutdown order is reverse of
>   start order.

This though requires something which understands the format of those
files, can retrieve name or uuid, and then resolve that to something
suitable for `xl {save|shutdown}`.  Alternatively this requires
`xl {save|shutdown}` to be able to select the target domain based on the
configuration file (documentation reads like this might be halfway
implemented).

Additionally this needs a tool to identify domains which are NOT listed
in /etc/xen/auto/ then do save/shutdown on them first.
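
As a rough sketch of what such a tool would have to do (not how
`xendomains` works today): pull the name out of each file in
/etc/xen/auto/, walk the files in reverse order, and put running domains
which are not listed at the front.  The function names are made up for
illustration, and the regular expression assumes the name is written as a
plain `name = "..."` line, which is a simplification of the real
configuration syntax.

```python
import re

# Simplifying assumption: the domain name appears as a plain
# `name = "..."` (or single-quoted) line in the configuration file.
_NAME_RE = re.compile(r"""^\s*name\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def domain_name(cfg_text):
    """Return the domain name found in a config file's text, or None."""
    match = _NAME_RE.search(cfg_text)
    return match.group(1) if match else None


def shutdown_order(auto_cfgs, running):
    """Compute a save/shutdown order.

    auto_cfgs: dict mapping file name in /etc/xen/auto/ to its text.
    running:   iterable of names of currently running domains.

    Running domains not listed in auto_cfgs come first, then listed
    domains in reverse file-name order.  Domain-0 is never included.
    """
    running = set(running)
    listed = []
    for fname in sorted(auto_cfgs, reverse=True):
        name = domain_name(auto_cfgs[fname])
        if name in running and name not in listed:
            listed.append(name)
    unlisted = sorted(
        n for n in running if n not in listed and n != "Domain-0"
    )
    return unlisted + listed
```

The resulting names could then be handed one at a time to
`xl {save|shutdown}`.  Getting the list of running names (say, from
`xl list`) is left out of the sketch.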


> That seems like a good minimal improvement, but if one wanted to
> explicitly control save/shutdown order then perhaps the next
> enhancement could be an /etc/xen/shutdown/ directory with similar
> purpose to the "auto" one? i.e.:
> 
> 1. Enumerate files in "shutdown" directory in reverse order, getting
>    name from each and doing shutdown action on it
> 
> 2. If there were no files there, instead use "auto" directory for
>    this purpose
> 
> 3. Then do shutdown action on every remaining running domain as
>    usual
> 
> Again this still results in everything getting a shutdown action if
> administrator does not want to do any of this.
> 
> It's an open question for me whether step 2 (falling back to
> enumerating "auto" directory) only happens when "shutdown" directory
> is empty or if it should happen all of the time.

This strikes me (note, I am NOT a Debian maintainer) as likely to involve
too much work for too little gain.  For complex setups this won't be
enough, for simple setups this will be overkill.


> > If the hypervisor is rebooted and VMs are saved to /var/lib/xen/save;
> > they will be paused in identifier order, but saved by domain name.  When
> > scanning /var/lib/xen/save, `xendomains` goes by filename which means VMs
> > are restored in a distinct (and often problematic) order.
> > 
> > A minimal solution would be for `xendomains` to save VMs in
> > /var/lib/xen/save <domId>-<name> and then use `sort -n` during restore.
> 
> If by this you mean it would be good if the "save all" action picked
> the filename from the filename in the "auto" directory, to replicate
> that directory's ordering, then I agree.
> 
> If however you mean the actual Xen domid of the running domain then
> I'm not sure what that would buy us. If I had a domain with a
> filename of 010-ldap0.cfg it might get started first and have domid
> 1, but then I reboot it and it has domid 99, I wouldn't want it
> saved as /var/lib/xen/save/99-ldap0, I'd still want it saved as
> /var/lib/xen/save/010-ldap0.

Minimal meaning very simple to implement, but very limited.

The idea is domains which start later get higher domain Ids.  As long as
crucial domains rarely get restarted, they will tend to keep low domain
Ids.  This fails when a crucial domain gets restarted late due to some
reason, but this might capture enough low-hanging fruit to be worthwhile.
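
To illustrate why the numeric sort matters, here is a small sketch of the
restore ordering, assuming save files are named <domId>-<name> as
proposed above.  The function name is made up, and files without a
numeric prefix are simply placed last, which is my own choice rather than
anything `xendomains` does.

```python
def restore_order(save_files):
    """Order save-file names of the form '<domId>-<name>' by numeric domId.

    This mirrors what `sort -n` would do for the numeric prefix.  Names
    without a numeric prefix are placed last, in alphabetical order.
    """
    def key(fname):
        prefix, _, rest = fname.partition("-")
        if prefix.isascii() and prefix.isdigit():
            return (0, int(prefix), rest)
        return (1, 0, fname)

    return sorted(save_files, key=key)
```

A plain alphabetical sort would restore 10-web before 2-ldap; comparing
the prefix as a number keeps the domain with the lower Id first.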


> > A better approach would be to have a LSB style header specifying
> > dependencies to flag VMs which should be saved or shutdown late,
> > and VMs which should be saved or shutdown early.
> > 
> > A ridiculous overkill solution might be to turn the /etc/xen/*.cfg
> > files into full init scripts.
> 
> I don't think that we should be proposing to change the config
> language of upstream Xen or diverge from how domains are usually
> configured with upstream Xen. I think that we can get a lot of
> improvement without modifying the format of the config files and
> only by changing how the start and shutdown scripts work.
> 
> At the moment domain start and shutdown is serial in nature and can
> take a long time. I don't know if there is any scope for improving
> that in scripts, or whether it's an upstream conversation, either
> way not for this bug. But because of the lengthy process I do have
> an interest in starting my important domains first and shutting them
> down last.

I'm pretty sure #452721 is tagged "upstream" since the `xendomains`
script originates from the Xen project.  If a solution is likely to be
pushed back to the Xen project, then nearly anything is on the table.
It is just an issue of how much time is needed.

What I was suggesting was NOT to modify the configuration format.  The
idea was a program could treat the domain configuration as if it was a
script (get it from argv[0]), then simply implement start/stop (roughly
system(`xl create $0`)).

Ultimately I suspect the domain configuration files need to add an
"init_handler" setting for specifying a program to be used for
start/stop.  Then "init_config" setting for configuring that program.

If this is saved in the runtime configuration (`xl list -l`), then
unhandled domains are readily identified by the lack of this
configuration.
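
A sketch of that check, under loud assumptions: "init_handler" does not
exist in Xen today, and where it would show up in the `xl list -l` JSON
is a guess (here, next to the name under "config" -> "c_info", which is
the layout as I recall it and should be verified against real output).
The function name is made up as well.

```python
import json


def unhandled_domains(xl_list_json, key="init_handler"):
    """Return names of domains whose runtime config lacks `key`.

    xl_list_json: text in the style of `xl list -l` output, assumed to
    be a JSON list of objects with "domid" and "config" -> "c_info".
    The `key` setting is hypothetical; no such option exists today.
    """
    names = []
    for dom in json.loads(xl_list_json):
        if dom.get("domid") == 0:
            continue  # skip dom0, it is never saved or shut down here
        c_info = dom.get("config", {}).get("c_info", {})
        if key not in c_info:
            # Fall back to the numeric Id if no name is recorded.
            names.append(c_info.get("name", str(dom.get("domid"))))
    return names
```

Anything this returns would get the current default treatment, while
domains carrying the setting would be passed to their handler.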


> While being able to control ordering of shutdown would be NICE, it
> seems like this would be catering to the administrator of a single
> dom0 that can't make services redundant. This raises the question
> of what are such administrators doing about the risk of their one
> dom0 host becoming unavailable and all its domains with it?

I suspect this crowd are the ones Debian should be catering most to.
Large enough to have some fairly complicated needs, but small enough not
to have a full IT department.  There are also a very large number of
people in this category.


> I also feel that trying to add dependency logic into the
> configuration is stepping into territory best left to actual cluster
> management software, that says what order things should start/stop
> in, how many copies of them need to run, where they can be allowed
> to run for redundancy purposes, etc.

True, though a little bit would help many people.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg at m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445


