[Nut-upsdev] Questions about failover architecture
Jim Klimov
jimklimov+nut at gmail.com
Mon May 12 11:21:05 BST 2025
> Interesting concept. It would seem obvious that all networking has to be
> on UPS to have a reasonable setup.
>
> I was trying to ask for real examples of situations/problems people
> have. I do get the theoretical point.
Got it. One thing that did bite me when I was a young sysadmin was not so
trivial: the UPSes were nice and fancy with SNMP monitoring/management, and
powered a rack of (indeed redundant) servers, with their gear on an ePDU fed
from an STS/ATS which used any live input from the two UPSes it got. And
the late-endgame UPS shutdowns sometimes did not work.
Some theorizing and troubleshooting later, the problem turned out to be that
the `ups.conf` entry relied on a DNS name for the UPS, and the networking
infra servers were among those to go down - so when the late hook (what is
now the `nutshutdown` script) found a killpower file and started
`snmp-ups -k -a upsname` to tell the device to power off/cycle, it could
not resolve the name in `port` to connect to.
IIRC the solution applicable there was to make the DHCP/DNS/routing server
also the NUT server (as far as UPS management is concerned), so it
always knew the name. That held unless it killed the DNS service and
name-caching daemon first. So ultimately the IP address for the UPS was
bolted down as static (non-DHCP, or using a persistent DHCP reservation),
and that IP was stored in `/etc/hosts`, and then it all became reliable for
shutdowns.
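
For illustration, a check along these lines could flag `ups.conf` entries
that would still need live DNS at shutdown time. This is only a rough
sketch, not anything shipped with NUT: the function names, the sample UPS
names and the addresses are made up, and it assumes `port` is either `host`
or `host:port` (bracketed or IPv6 forms are not handled).

```python
import ipaddress
import re


def parse_ups_conf(text):
    """Tiny parser for ups.conf-style text: [section] headers, key = value lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplistic comment stripping
        if not line:
            continue
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = header.group(1).strip()
            sections[current] = {}
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip().strip('"')
    return sections


def host_part(port):
    """Assumption: port is 'host' or 'host:port'; anything else is returned as-is."""
    if port.count(":") == 1:
        return port.split(":", 1)[0]
    return port


def is_ip_literal(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def hosts_entries(hosts_text):
    """Collect every name (not the address) listed in /etc/hosts-style text."""
    names = set()
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        names.update(fields[1:])
    return names


def risky_ports(conf_text, hosts_text=""):
    """Return {upsname: port} for snmp-ups entries that depend on DNS."""
    known = hosts_entries(hosts_text)
    risky = {}
    for name, opts in parse_ups_conf(conf_text).items():
        if opts.get("driver") != "snmp-ups":
            continue
        port = opts.get("port", "")
        host = host_part(port)
        if not is_ip_literal(host) and host not in known:
            risky[name] = port
    return risky


# Hypothetical sample data, for illustration only.
SAMPLE_CONF = """
[rack-ups-a]
    driver = snmp-ups
    port = ups-a.example.org
[rack-ups-b]
    driver = snmp-ups
    port = 192.0.2.11
[rack-ups-c]
    driver = snmp-ups
    port = ups-c
"""
SAMPLE_HOSTS = "192.0.2.12 ups-c\n"

if __name__ == "__main__":
    print(risky_ports(SAMPLE_CONF, SAMPLE_HOSTS))
```

With the sample data only `rack-ups-a` is reported, since the other two use
either an IP literal or a name pinned in the hosts text.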
I think it was then, some 20 years ago, that I wanted to multiplex this
nice snmp-ups driver with lots of reported details, and the serial-port
out-of-band driver that was kind of dumb but I could reliably call for
shutdowns.
There were several similar deployments; on some I may have monitored with
both drivers independently, with the SNMP one declared as providing a zero
power value to its host, and the serial one being the real thing as far as
feeding that server went.
So certainly there are non-trivial problems in this area, and workarounds
that hold for decades at least for that use-case... "there's nothing
sturdier than a temporary solution" ;)
But having a holistic solution would be nicer.
Jim
On Sat, May 10, 2025 at 1:44 PM Greg Troxel <gdt at lexort.com> wrote:
> Jim Klimov <jimklimov+nut at gmail.com> writes:
>
> > I'd say this is not so much about "often enough" (or it would have been
> > addressed earlier), but neither are ethernet cable/port outages yet people
> > do LACP/trunking/bonding anyway. And "out of band" serial consoles for good
> > measure.
>
> Sure, but that's for things whose failure is a much bigger deal. I'd
> say that for power, the thing people would do is have two UPS units,
> with two monitoring computers, powering dual power strips with computers
> having redundant supplies.
>
> > I assume issues that may be relevant are loss of networking (SNMP et al)
> > during a power outage, when some switch goes off, and the "out of band"
> > link can be used anyway to command the UPS to power cycle.
>
> Interesting concept. It would seem obvious that all networking has to
> be on UPS to have a reasonable setup.
>
> I was trying to ask for real examples of situations/problems people
> have. I do get the theoretical point.
>
> > Another problem is that UPS controllers often do expose different sets of
> > data points or with different precision over various protocols. Even with
> > SNMP I saw vendor MIBs and IETF standard trees show the same data
> > differently (e.g. as integer and as two-digit-after-the-dot floats)
> > simultaneously. Merging this info (even if just for read-only queries)
> > would be quite a practical benefit.
>
> I suppose, and it's a lot of work. Probably architecturally there would
> need to be a driver plumbing object that can be hooked up to N drivers
> and acts like 1 driver, and does the merging. It would need to have
> configurable strategies as there is no good single answer for how to
> merge e.g. something with integer volts at a 5s rate and VVV.VV at a 60s
> rate.
>
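
To make the "N drivers acting like 1 driver, with configurable strategies"
idea above a bit more concrete, here is a rough Python sketch. It is not
NUT code: the `Reading` type, the strategy names and the `serial-drv`
driver name are all hypothetical, and it only handles numeric values.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    driver: str    # which driver reported the value
    value: float
    decimals: int  # digits after the decimal point the source reports
    age_s: float   # seconds since the reading was taken


def freshest(readings):
    """Strategy: prefer the most recently updated reading."""
    return min(readings, key=lambda r: r.age_s)


def most_precise(readings):
    """Strategy: prefer the reading with the most decimal digits."""
    return max(readings, key=lambda r: r.decimals)


def precise_if_fresh(max_age_s):
    """Strategy factory: most precise among recent readings, else freshest."""
    def strategy(readings):
        fresh = [r for r in readings if r.age_s <= max_age_s]
        return most_precise(fresh) if fresh else freshest(readings)
    return strategy


def merge(per_driver, strategies, default=freshest):
    """Merge {driver: {variable: Reading}} into {variable: Reading}.

    `strategies` maps a variable name to the strategy used for it;
    variables without an entry fall back to `default`.
    """
    by_var = {}
    for readings in per_driver.values():
        for var, reading in readings.items():
            by_var.setdefault(var, []).append(reading)
    return {var: strategies.get(var, default)(rs) for var, rs in by_var.items()}


# Hypothetical sample: integer volts refreshed often vs VVV.VV refreshed rarely.
SAMPLE = {
    "serial-drv": {
        "input.voltage": Reading("serial-drv", 230.0, 0, 3.0),
    },
    "snmp-ups": {
        "input.voltage": Reading("snmp-ups", 230.17, 2, 45.0),
        "battery.charge": Reading("snmp-ups", 98.0, 0, 45.0),
    },
}

if __name__ == "__main__":
    merged = merge(SAMPLE, {"input.voltage": precise_if_fresh(30)})
    for var, reading in sorted(merged.items()):
        print(var, reading.value, "from", reading.driver)
```

With a 30s freshness limit the integer reading wins for `input.voltage`
because the two-decimal one is too old; raising the limit to 60s flips the
choice, which is exactly why the strategy would have to be configurable
per variable.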