[Nut-upsuser] Possible to trigger commands based on slave disconnection?

Sun Nov 22 21:53:36 GMT 2020

On November 22, 2020 6:56:28 PM UTC, "Kevin P. Fleming" <kevin at km6g.us> wrote:
>Here's my situation:
>
>APC Smart-UPS 750 powering a rack of equipment. Rack includes a NAS
>(Debian-based) which is connected to the USB port of the UPS and is
>happily running NUT. Rack also includes a router, switch, and various
>other devices, including some Raspberry Pi computers which use the NAS
>as their storage (via NFS) and are powered via PoE (from the switch).
>
>When NUT tells the UPS to shut down, that of course will remove power
>from everything in the rack. When mains power returns, the UPS will
>apply power to everything in the rack. However, I *don't* want power
>applied to the RPis, because the NAS takes quite some time (60-90
>seconds) to fully boot, and they won't come up properly if they try
>too early.
>
>In addition, there is another RPi in another location, not connected
>to this UPS, but which also uses the NAS as its file storage and thus
>must be shut down if the NAS goes down.
>
>At this point I have NUT running on all the systems, and they all
>cleanly shut down when the UPS is heading into 'low battery' mode.
>What I would like to additionally do is send commands to my network
>switches to 'turn off the outlets' for the RPis (disable PoE on the
>ports) as each one disconnects from the NUT master. I can do this over
>SSH quite easily, if I can figure out a way to trigger the SSH command
>to be executed.
>
>I can also add commands to enable these ports in the NAS startup
>sequence, at a point late enough for it to be safe.
>
>So, the question is: is it possible on the NUT master to trigger a
>command to be run when a slave disconnects? If so, I'd check the "FSD"
>flag at that moment, and if the FSD flag is set then that means the
>power is being shut down, so I should disable the PoE port for that
>slave.
>
>I'll also need some way to turn the port back on if NUT entered FSD
>mode but then didn't actually shut down (due to mains power returning
>before the shutdown sequence was completed), so that I don't leave the
>RPis shut down.

From a NUT perspective, I think it could be interesting to add drivers that abstract a PoE switch as an ePDU (maybe separate facades for SSH or serial-OOB or SNMP with write access, but the same core).

Possibly the ports' power and comms can be turned off/on separately, at least with some vendors, which may need separate commands on the ePDU shim side. You might also have to save the switch config for ports to stay down after a reboot. Otherwise it may make sense to put the switch onto a separately manageable outlet group of the UPS, so you can turn that one on some time after the full boot, either by a command from the server or after a delay counted by the UPS itself.

NFS is quite a sturdy protocol, or at least it was in the original Sun implementation; we had it running cross-continent over VPN with systems sometimes rebooting (the server should persist state somehow, though). On the remote RPis you may want to play with client mount options like 'nointr' and/or 'hard', so that client processes effectively freeze on I/O requests until the server becomes functional again. A common (and maybe now default) setup may be the opposite of that, to allow for timeouts and application-driven fault handling.

I don't think upsd currently has a way to call detailed shutdown logic, other than what you specify in SHUTDOWNCMD (in upsmon.conf). That may be a script which does the complex logic of tracking accessibility and shutdowns for the RPis, and eventually (all clients down, timeout, UPS charge too low, ...) tells the NFS server and UPS to go down.
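
A rough sketch of the "wait for the clients, then give up after a timeout" part of such a script, in Python. Everything here is an assumption for illustration: the function names, the default timings, and the use of ping as the liveness probe (the flags shown are those of Linux iputils ping). It only decides when it is safe to proceed; powering off ports and halting the server would come after it.

```python
import subprocess
import time
from typing import Callable, Iterable, Set


def ping_alive(host: str) -> bool:
    """Hypothetical probe: one ICMP echo, 1 second wait (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_clients_down(
    hosts: Iterable[str],
    is_alive: Callable[[str], bool] = ping_alive,
    timeout_s: float = 120.0,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Poll until every host stops answering or the timeout expires.

    Returns the hosts that were still alive when polling stopped
    (an empty set if all of them went down in time).
    """
    remaining = set(hosts)
    deadline = clock() + timeout_s
    while remaining:
        # Keep only the hosts that still respond to the probe.
        remaining = {h for h in remaining if is_alive(h)}
        if not remaining or clock() >= deadline:
            break
        sleep(poll_s)
    return remaining
```

The script named in SHUTDOWNCMD could call wait_for_clients_down(["rpi1", "rpi2"]) (host names made up) and then go on to shut down the NAS whether or not the returned set is empty, perhaps logging the stragglers.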

Given Debian and systemd, you might define a template to make unit instances (like rpi-power@13.service) that would power on the port named by that instance when started and power it off when stopped, and that might check whether the client is alive in hooks such as ExecStartPre. Maybe some systemd versions have a more explicit health monitoring framework. That is easy to plug into dependency chains (start after NUT and NFS, stop before them).
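
As an illustration of what such a unit instance could run, here is a minimal Python helper that turns "on|off PORT" into an SSH call to the switch. The SSH target and the switch command syntax are placeholders I made up, since every vendor's CLI differs; only the ssh invocation itself (with BatchMode so it fails instead of prompting) is standard.

```python
import subprocess
import sys
from typing import List

# Hypothetical values: replace with your switch's address and CLI syntax.
SWITCH = "admin@switch.example"
POE_COMMAND = "poe port {port} {state}"


def build_poe_command(port: int, enable: bool) -> List[str]:
    """Build the ssh argv that enables or disables PoE on one switch port."""
    if port < 1:
        raise ValueError("port must be a positive integer")
    state = "enable" if enable else "disable"
    remote = POE_COMMAND.format(port=port, state=state)
    # BatchMode=yes makes ssh fail rather than prompt for a password.
    return ["ssh", "-o", "BatchMode=yes", SWITCH, remote]


def main(argv: List[str]) -> int:
    """Entry point: argv is [program, 'on'|'off', PORT]; returns an exit code."""
    if len(argv) != 3 or argv[1] not in ("on", "off"):
        print("usage: rpi-power on|off PORT", file=sys.stderr)
        return 2
    try:
        port = int(argv[2])
        cmd = build_poe_command(port, argv[1] == "on")
    except ValueError as err:
        print(f"rpi-power: {err}", file=sys.stderr)
        return 2
    return subprocess.run(cmd).returncode
```

Installed as, say, /usr/local/bin/rpi-power (with a final sys.exit(main(sys.argv)) line added), a Type=oneshot template unit with RemainAfterExit=yes could use "ExecStart=/usr/local/bin/rpi-power on %i" and "ExecStop=/usr/local/bin/rpi-power off %i", where %i expands to the instance name (13 in the example above).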

The typical stance on FSD is that it is a "forced shutdown" (often forced externally, as a flag raised by the UPS according to its policy): once started, you should not abort it and leave systems in an undefined midway state. So whatever happens to power, you go down and reboot cleanly. Frameworks like systemd or SMF do take some unpredictability out, but on a distributed rack you'd still have to design very carefully to abort a shutdown (change the runlevel if it is not too late, and maybe make upsd, or even a depleted UPS, believe the FSD state is no longer relevant so that it does not trigger another shutdown) and then start everything in order.

Hope this helps,
Jim Klimov

--
Typos courtesy of K-9 Mail on my Android