[Nut-upsuser] Perform synchronous task before shutdown routine

Sun Jun 13 14:38:49 BST 2021

Hello,

Just to clarify: you have one VM acting as a NUT server and talking to the
UPS somehow (networked? pass-through USB/serial media?), and another machine
(physical? VM?) acting as the file server for all VMs. Does that include
the VM with NUT? And there is also the VMWare server itself? Are the VMs
also NUT clients?

If the file server is physical, is it connected to the VMWare server directly
or through another powered device such as a switch (and is that switch
UPS-protected)? The same question applies to a possible networked connection
to the UPS.

Is there a particular reason not to make your file server the NUT server as
well (all the more so if it is a separate physical machine)?

When the NUT ecosystem shuts down due to a critical power situation (roughly:
too few power sources remain online, and the others are on battery and
reporting low battery), there is a relatively short timeframe in which the
upsmon (or upssched, etc.) clients on secondary ("slave") systems shut down
first. When they are all gone, or a configured time limit elapses, the
client on the primary ("master") system begins its own shutdown. If the setup
involves a sufficiently smart UPS, it is also told to power off, wait for
"wall power" to return, and then power back on (perhaps only after
recharging to a sufficiently safe level).
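To make the primary-side wait concrete, here is a simplified Python model of it. It is only a sketch of the idea, not upsmon's actual code: `count_secondaries` is a hypothetical callable standing in for however the server learns how many secondaries are still logged in, and `hostsync` is named after the HOSTSYNC setting in upsmon.conf (15 seconds by default), which plays this role.

```python
import time
from typing import Callable


def wait_for_secondaries(
    count_secondaries: Callable[[], int],
    hostsync: float = 15.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until no secondaries remain or `hostsync` seconds elapse.

    Returns True if every secondary disconnected in time, False if the
    timeout expired first. Either way the caller (the primary) then
    proceeds with its own shutdown.
    """
    deadline = clock() + hostsync
    while True:
        if count_secondaries() == 0:
            return True  # all secondaries have gone away
        if clock() >= deadline:
            return False  # stop waiting; the primary shuts down anyway
        sleep(poll_interval)
```

The timeout is the important design point: one hung secondary must not keep the primary (and the UPS power-off request) waiting until the battery is flat.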

So in the setup you propose,
1) power goes critical
2) your VMs (except the NUT server VM) begin to shut down, either as NUT
clients or via vim-cmd, esxcli or similar scripting, probably
requiring open-vm-tools or similar in the guests to process the shutdown
request gracefully (I'm not sure there is a "virtual ACPI power button" in VMWare).
3) all VMs are down, and a time-sensitive end-game occurs:
* NUT server VM tells itself to shut down (and tells the UPS to power off,
hopefully with a sufficiently large timeout if it has a way to set that),
* File server is told to go down (and yank the disk from NUT server VM?)
* VMWare server is told to go down (and yank CPU/RAM/... from NUT server VM)

At least with SSH allowed on the VMWare server and some vim-cmd scripting, or
possibly with VMware's PowerShell tooling (never used that myself), you should
be able to find which VMs rely on the datastore or "volume" served by the
file server, and which of those VMs are running. If you script that into
the shutdown routine of the file server, and if the shutdown timeframe is
not dictated by its OS (e.g. disable the timeouts in systemd or make them
very long), you can block the file server shutdown until no VMs served
from its disks are running. Assuming the network connection survives long
enough, the file server can then detect that your NUT VM went down and
proceed with its own shutdown, which (as for every secondary client) began
when power went critical.
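A blocking hook for the file server's shutdown could look like the sketch below. It assumes SSH from the file server to the ESXi host and the same `vim-cmd` commands as above; `ESXI_HOST` and `FILESERVER_DATASTORE` are hypothetical placeholders. It approximates "relies on the file server" by the datastore holding each VM's .vmx file, so VMs with extra disks on that datastore but a .vmx elsewhere would need a closer look. If SSH fails, `subprocess` raises, and a real hook would have to decide what that means during an outage.

```python
import re
import subprocess
import time
from typing import Callable, List, Optional

# Hypothetical placeholders: the ESXi host, and the datastore name under
# which the file server's export is mounted on that host.
ESXI_HOST = "root@esxi.example.lan"
FILESERVER_DATASTORE = "nfs-fs"

# One row of `vim-cmd vmsvc/getallvms`: Vmid, display name, "[datastore] path.vmx".
VM_ROW = re.compile(r"^(\d+)\s+(.+?)\s+\[([^\]]+)\]\s+(\S+)")


def run_on_esxi(command: str) -> str:
    """Run a command in the ESXi shell over SSH and return its stdout."""
    result = subprocess.run(
        ["ssh", ESXI_HOST, command], capture_output=True, text=True, check=True
    )
    return result.stdout


def running_vms_on_datastore(run: Callable[[str], str], datastore: str) -> List[str]:
    """Vmids of powered-on VMs whose .vmx file lives on `datastore`."""
    running = []
    for line in run("vim-cmd vmsvc/getallvms").splitlines():
        match = VM_ROW.match(line)
        if not match or match.group(3) != datastore:
            continue
        vmid = match.group(1)
        if "Powered on" in run(f"vim-cmd vmsvc/power.getstate {vmid}"):
            running.append(vmid)
    return running


def block_until_datastore_idle(
    run: Callable[[str], str] = run_on_esxi,
    datastore: str = FILESERVER_DATASTORE,
    poll: float = 10.0,
    max_wait: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once no VM on `datastore` is powered on, False on timeout.

    With max_wait=None this waits indefinitely, which only makes sense if
    the OS shutdown timeout has been lifted as well.
    """
    start = clock()
    while running_vms_on_datastore(run, datastore):
        if max_wait is not None and clock() - start >= max_wait:
            return False
        sleep(poll)
    return True
```

On a systemd-based file server this would presumably be run from the ExecStop= of a unit ordered before the file-sharing services, with TimeoutStopSec=infinity (or a long finite value) so that systemd does not kill it.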

Maybe similar scripting is feasible in the NUT client that runs on the VMWare
server itself, to drive its shutdown after all VMs have gone down. Otherwise
the file server, if it has a session to check VM states anyway, could tell
the VMWare server to initiate its shutdown as the file server goes down.
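The last step, the file server telling the VMWare server to power off, might be sketched as follows. This assumes SSH access and that `esxcli vm process list` prints one "World ID:" line per running VM; the maintenance-mode and poweroff invocations are assumptions to verify with `esxcli system shutdown poweroff --help` on your ESXi version, and `ESXI_HOST` is a hypothetical placeholder.

```python
import subprocess
from typing import Callable

ESXI_HOST = "root@esxi.example.lan"  # hypothetical placeholder

# Assumed ESXi commands: poweroff via esxcli expects the host in maintenance mode.
POWEROFF_SEQUENCE = [
    "esxcli system maintenanceMode set --enable true",
    'esxcli system shutdown poweroff --reason "UPS critical, all VMs down"',
]


def run_on_esxi(command: str) -> str:
    """Run a command in the ESXi shell over SSH and return its stdout."""
    result = subprocess.run(
        ["ssh", ESXI_HOST, command], capture_output=True, text=True, check=True
    )
    return result.stdout


def running_vm_count(run: Callable[[str], str]) -> int:
    """Count running VMs: `esxcli vm process list` shows a 'World ID:' per VM."""
    output = run("esxcli vm process list")
    return sum(1 for line in output.splitlines() if line.strip().startswith("World ID:"))


def power_off_host(run: Callable[[str], str] = run_on_esxi) -> bool:
    """Power off the ESXi host, but only if no VM is running any more.

    Returns False (and sends nothing else) if a VM is still up, so the
    caller can retry later instead of yanking CPU/RAM from under it.
    """
    if running_vm_count(run) > 0:
        return False
    for command in POWEROFF_SEQUENCE:
        run(command)
    return True
```

The guard re-checks the VM list right before powering off, so a late-running NUT VM is not cut off even if this is called a bit too early.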

But it all looks like a lot of ropes hanging around, waiting for stuff
to go wrong, and during an outage (I/O stress to flush the disks, delays
in orderly service-stack shutdowns: DB users first, databases next, ...)
things are very likely to go wrong :)

It feels like the physical machine, likely your file server, is better
positioned to be the NUT server and so shut down last, after all its
clients have gone down or the timeout expired or the battery die<snip>....
At least in such a setup it relies only on connectivity between upsd and
the upsmon instances, and on the number of clients still alive (recent
heartbeat, connection not terminated via the protocol); possible loss of
connectivity during a known power outage is something NUT already has logic for.
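The "number of clients still alive" is something upsd can report over its network protocol. A minimal Python sketch, assuming upsd listens on its default TCP port 3493 and that the `GET NUMLOGINS <upsname>` request answers with `NUMLOGINS <upsname> <value>`; the UPS name is whatever you configured in ups.conf. Note that the primary's own upsmon login is presumably counted too.

```python
import socket

NUT_PORT = 3493  # default upsd TCP port


def parse_numlogins(reply: str, ups: str) -> int:
    """Parse upsd's 'NUMLOGINS <upsname> <value>' reply to GET NUMLOGINS."""
    parts = reply.split()
    if len(parts) != 3 or parts[0] != "NUMLOGINS" or parts[1] != ups:
        # e.g. 'ERR UNKNOWN-UPS' or 'ERR ACCESS-DENIED'
        raise ValueError(f"unexpected upsd reply: {reply.strip()!r}")
    return int(parts[2])


def get_numlogins(ups: str, host: str = "localhost", port: int = NUT_PORT,
                  timeout: float = 5.0) -> int:
    """Ask upsd how many upsmon clients are currently logged in to `ups`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"GET NUMLOGINS {ups}\n".encode("ascii"))
        with sock.makefile("r", encoding="ascii") as reader:
            reply = reader.readline()
        sock.sendall(b"LOGOUT\n")
    return parse_numlogins(reply, ups)
```

For example, `get_numlogins("myups")` on the file server would show secondaries dropping off as they shut down, which is the same signal the primary relies on when the NUT server sits on the machine that goes down last.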

Jim

On Sat, Jun 12, 2021 at 11:15 PM Arnaldo Viegas de Lima <
arnaldo at viegasdelima.com> wrote:

> Hi,
>
> I’m setting up NUT to run on a VMWare server (running in a VM) to shut down
> the server as well as the companion file server (running Linux), which will
> be running upsmon in slave mode. All VMs' disks come from the file server,
> so all VMs must terminate properly before the file server does.
>
> The order of the tasks needed for the proper shutdown is:
>
> 1- Shut down all running VMs (except the one controlling the UPS) and
> confirm they are down
> 2- When all VMs are terminated, signal the slaves (FSD)
> 3- When the file server is down, properly terminate VMWare.
>
> Any ideas on how to sync these events?
>
> Thanks in advance,
>
> Arnaldo.
> _______________________________________________
> Nut-upsuser mailing list
> Nut-upsuser at alioth-lists.debian.net
> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser
>