[Nut-upsdev] Questions about failover architecture
Jim Klimov
jimklimov+nut at gmail.com
Mon May 12 16:40:29 BST 2025
Good point, I guess I thought it knew which UPSes are getting the FSD
treatment.
Maybe this should also get revised eventually (e.g. a "primary" monitoring
system not fed by the UPS it is wired to should never FSD itself due to the
outage, but should ensure that UPS gets powered off or power-cycled when no
"secondary" clients remain).
Then maybe not much is needed at this time, and we'd revisit this as we hit
any issues.
One point I remain concerned about is the UPS getting confused about
receiving the same commands over different protocols/media in short
sequence. Or similar but conflicting commands because one driver knows how
to request "load.off" and another knows about "shutdown.stayoff" or
"shutdown.reboot", leading to different code paths in the UPS controller -
can't exclude some firmware getting wedged about this.
In this case `upsdrvctl shutdown` should somehow know to only try one of
the drivers at a time (perhaps `sdorder` can be useful to at least somehow
separate these attempts in time), or to just start both "real" drivers in a
mode with `allow_killpower` (but not with `-k`) so that the
failover/multiplex driver can decide which one to call, and when/whether to
call another. The tool uses a common NUT config parser, so might be taught
about special driver names (failover) and their port values as inputs for
its work or lack thereof. Probably "upsdrvctl shutdown realdriver" should
work as it did, but "upsdrvctl shutdown failoverdriver" or "upsdrvctl
shutdown" (for all UPSes) should do these tricks.
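
To make the idea a bit more concrete, here is a rough Python sketch of the
planning step only. It is not how `upsdrvctl` works today. It assumes that
driver sockets are named `<driver>-<section>` (as the `port` values in the
example config in this thread suggest), that the proposed driver is literally
called `failover`, and that `send_shutdown` is a hypothetical callback
standing in for whatever actually talks to a driver. The config parser is
deliberately simplified.

```python
FAILOVER_DRIVER = "failover"  # assumption: driver name from the thread's example


def parse_ups_conf(text):
    """Parse a simplified ups.conf into {section: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections[current] = {}
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip().strip('"')
    return sections


def shutdown_plan(sections, target=None):
    """Return groups of sections; each group is tried one driver at a time."""
    # Assumed socket naming: "<driver>-<section>", e.g. "dummy-ups-ups".
    by_socket = {
        f"{cfg['driver']}-{name}": name
        for name, cfg in sections.items()
        if "driver" in cfg
    }

    def is_failover(name):
        return sections[name].get("driver") == FAILOVER_DRIVER

    def members(name):
        ports = sections[name].get("port", "").split(",")
        return [by_socket[p.strip()] for p in ports if p.strip() in by_socket]

    # "upsdrvctl shutdown realdriver" keeps working as before;
    # "upsdrvctl shutdown failoverdriver" expands to the drivers behind it.
    if target is not None:
        return [members(target)] if is_failover(target) else [[target]]

    groups = []
    covered = set()
    for name in sections:
        if is_failover(name):
            groups.append((name, members(name)))
            covered.update(members(name))
    for name in sections:
        if not is_failover(name) and name not in covered:
            groups.append((name, [name]))
    # Separate the attempts in time by sdorder (treated as 0 when unset).
    groups.sort(key=lambda g: int(sections[g[0]].get("sdorder", 0)))
    return [devices for _, devices in groups]


def run_plan(plan, send_shutdown):
    """Within each group, stop at the first driver whose shutdown succeeds."""
    chosen = []
    for group in plan:
        chosen.append(next((n for n in group if send_shutdown(n)), None))
    return chosen
```

With the three-section example config from this thread, a plain "shutdown"
would yield a single group containing `ups` and `ups2`, and the second driver
would only be asked if the first one fails.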
Jim
On Mon, May 12, 2025 at 5:25 PM Sebastian Kuttnig via Nut-upsdev <
nut-upsdev at alioth-lists.debian.net> wrote:
> P.S. To articulate better what I am unclear about from your message:
>
> `nutshutdown` seems to run `@SBINDIR@/upsdrvctl shutdown`.
> From my understanding, this would command - all - UPSes in `ups.conf` to
> shut down.
> So this would already include the UPS monitored by any
> failover/multiplexing driver.
> In contrast, `upsdrvctl shutdown` would start up all these drivers again,
> respectively.
>
> What kind of specific orchestration would be required for the "proxying"
> driver?
>
> My initial understanding was it would simply not support a shutdown
> command itself
> and `upsdrvctl` would command all supporting UPS to shutdown/start
> regardless of it.
> So similar to the clone* drivers, which seem not to have any special
> handling there.
>
> Thanks for the detailed responses so far, I am still exploring some areas
> of the NUT ecosystem. :-)
>
> Sebastian
>
> On Mon, May 12, 2025 at 16:02, Sebastian Kuttnig <
> sebastian.kuttnig at gmail.com> wrote:
>
>> Hello Jim,
>>
>> I am following the direction of the `clone` drivers, although using the
>> modern `upsdrvquery.c` API.
>> Essentially I am parsing the protocol lines read directly from the
>> driver sockets, as the clone drivers do.
>>
>> As for ups.conf, I start up as follows without problems:
>>
>> [ups]
>> driver = dummy-ups
>> port = /etc/nut/5E.dev
>>
>> [ups2]
>> driver = dummy-ups
>> port = /etc/nut/APC.dev
>>
>> [failover]
>> driver = failover
>> port = dummy-ups-ups,dummy-ups-ups2
>>
>> Is this what you had in mind?
>> Appreciate any pointers regarding the `upsdrvctl` and `nutshutdown`
>> specifics.
>>
>> Sebastian
>>
>> On Mon, May 12, 2025 at 15:53, Jim Klimov <
>> jimklimov+nut at gmail.com> wrote:
>>
>>> Sounds great, thanks for the update!
>>>
>>> For communications with the other drivers (from failover or
>>> multiplexor), I suggest using the local driver socket like the clone*
>>> drivers do: this removes a dependency on `upsd` both for the service
>>> startup (no chicken-and-egg issue of drivers before upsd, but upsd before
>>> the failover driver) and for FSD end-game (after all services stopped, we
>>> just need to start the drivers to talk to the UPS, no need for upsd so they
>>> can see each other).
>>>
>>> Speaking of the latter, the `nutshutdown` script (or `upsdrvctl`) may
>>> need an update to know to start those additional drivers. Or perhaps do
>>> them one by one in case of shutdown command specifically (or any command
>>> generally), until one succeeds.
>>>
>>> Jim
>>>
>>>
>>> On Mon, May 12, 2025 at 3:22 PM Sebastian Kuttnig via Nut-upsdev <
>>> nut-upsdev at alioth-lists.debian.net> wrote:
>>>
>>>> Hello again,
>>>>
>>>> I can report back that my failover driver is progressing nicely, and a
>>>> lot of it seems to overlap with what could also be useful, and maybe
>>>> eventually used, for a future multiplexing driver.
>>>>
>>>> Basically, my failover logic takes driver names (sockets) in
>>>> comma-separated form from the `port` variable and keeps track of these
>>>> drivers, monitoring their information and failing over where necessary.
>>>> Basic configuration looks like:
>>>>
>>>> [failover]
>>>> driver = failover
>>>> port = dummy-ups-ups,dummy-ups-ups2
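
The selection step described above (take the comma-separated `port` list,
track each driver, fail over when needed) could be sketched roughly as below.
This is an illustration in Python, not the actual driver code: the
`last_seen` timestamps and the `timeout` value are assumptions introduced
here to stand in for whatever health tracking the real driver does.

```python
def parse_port(port_value):
    """Split the failover `port` value into an ordered list of socket names."""
    return [name.strip() for name in port_value.split(",") if name.strip()]


def pick_active(order, last_seen, now, timeout=15.0):
    """Return the first driver in `port` order whose data is still fresh.

    last_seen maps socket name -> time of the last successful read
    (hypothetical bookkeeping); timeout is an assumed staleness limit
    in seconds. Returns None when no driver is usable.
    """
    for driver in order:
        seen = last_seen.get(driver)
        if seen is not None and now - seen <= timeout:
            return driver
    return None
```

For example, with both drivers fresh the first entry of `port` would be
chosen, and the second would only take over once the first goes stale.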
>>>>
>>>> I could well picture a multiplexing driver accepting a similar format,
>>>> merging variables of both drivers and resolving conflicts by port
>>>> argument order (`dummy-ups-ups`, then `dummy-ups-ups2`) in its most
>>>> basic form.
>>>>
>>>> Additionally, this could be extended with preference arguments, such as:
>>>>
>>>> prefer.ups.status = dummy-ups-ups
>>>> prefer.battery.voltage.nominal = dummy-ups-ups2
>>>>
>>>> Such definitions would take precedence over the port argument order, for
>>>> more granular control. Format-wise, this could be similar to what is used
>>>> in `ups.conf` for `default.<variable>` or `override.<variable>`.
>>>>
>>>> If either driver were to drop offline, the other driver could take over
>>>> with its full set of variables, regardless of other set preferences.
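
The merge rules sketched above (port order as the default tie-breaker,
`prefer.<variable>` overrides, and a full takeover when one driver is
offline) could look roughly like this in Python. The function name, the
dictionary shapes and the `online` set are assumptions made for illustration
only; no such multiplexing driver exists yet.

```python
def merge_variables(sources, order, prefer=None, online=None):
    """Merge per-driver variables into one view.

    sources: {driver: {variable: value}} as read from each driver
    order:   drivers in `port` argument order (earlier wins conflicts)
    prefer:  {variable: driver}, mirroring "prefer.<variable> = <driver>"
    online:  set of drivers currently reachable (None means all of them)
    """
    prefer = prefer or {}
    live = [d for d in order if d in sources and (online is None or d in online)]

    merged = {}
    # Apply drivers last to first, so earlier entries overwrite later ones.
    for driver in reversed(live):
        merged.update(sources[driver])

    # Preferences beat port order, but only while the preferred driver is
    # online and actually reports that variable.
    for variable, driver in prefer.items():
        if driver in live and variable in sources[driver]:
            merged[variable] = sources[driver][variable]
    return merged
```

If the preferred driver for a variable is offline, its preference is simply
skipped, so the remaining driver supplies its full set of variables.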
>>>>
>>>> Just a rough sketch of what I have in mind. Time permitting, I'll start
>>>> working on this at some point after I finish my failover explorations.
>>>>
>>>> Sebastian
>>>>
>>>> On Mon, May 12, 2025 at 14:16, Greg Troxel via Nut-upsdev <
>>>> nut-upsdev at alioth-lists.debian.net> wrote:
>>>>
>>>>> Wow, that's quite the tale!
>>>>>
>>>>> I take away from this:
>>>>>
>>>>> There is a real example of wanting to merge two information sources.
>>>>>
>>>>> It's very complicated.
>>>>>
>>>>> Anybody wishing to succeed in a very complicated situation needs to
>>>>> really pay attention, to twice as many things as they thought when
>>>>> they started.
>>>>>
>>>>> It's unclear how to generalize from this to a solution that will work
>>>>> for the next person.
>>>>>
>>>>> But if someone wants to write something that is an aggregating driver
>>>>> (looks like a driver, talks to N drivers), and do so in a way that
>>>>> doesn't cause any significant pain for others, that seems like a fine
>>>>> thing for them to do.
>>>>>
>>>>> I would suggest having some sort of config file that for each variable
>>>>> says which driver to prefer, and some kind of timeout for not available
>>>>> to flip to the backup. I guess for starters, one could configure two
>>>>> drivers in "fancier/less-reliable" and "old-school" slots, and prefer
>>>>> fancier for all except shutdown and status.
>>>>>
>>>>> I fear that the next layer is merging status from two where they don't
>>>>> quite match.
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Nut-upsdev mailing list
>>>>> Nut-upsdev at alioth-lists.debian.net
>>>>> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsdev