[Nut-upsdev] Questions about failover architecture

Jim Klimov jimklimov+nut at gmail.com
Mon May 12 16:54:31 BST 2025


Thinking about it, the service-aware nut-driver-enumerator script has ways to
set dependencies of a driver on other units in the system, depending on
driver type. A couple of lines added there would be useful to ensure the
failover/multiplex driver(s) start after the "real" driver units they
scrape data from, and it would be a "soft" dependency (a real driver unit
may crash without impacting the failover/multiplexor).
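As a rough illustration of such a "soft" dependency: in systemd terms it could be `Wants=` plus `After=` on the real driver units, since `Wants=` lets the failover unit start, and keep running, even if a wanted unit fails or is stopped. The C sketch below only formats such a drop-in. It assumes each real driver runs as a `nut-driver@<section>.service` instance named after its `ups.conf` section (the enumerator may escape unusual names, so names with a space, slash or backslash are simply rejected here), and `format_soft_deps` is a hypothetical helper, not part of the actual nut-driver-enumerator shell script.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf + *used.
 * Returns 0 on success, -1 if the result would not fit in buf. */
static int append(char *buf, size_t len, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int w;

    if (*used >= len)
        return -1;
    va_start(ap, fmt);
    w = vsnprintf(buf + *used, len - *used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= len - *used)
        return -1;
    *used += (size_t)w;
    return 0;
}

/* Hypothetical helper: format a systemd drop-in ordering a failover unit
 * after the real driver units named by sections[]. Wants= (not Requires=)
 * keeps it a "soft" dependency. Names that would need systemd escaping
 * are rejected rather than escaped.
 * Returns the length written, or -1 on error or if buf is too small. */
int format_soft_deps(const char *const sections[], size_t n,
                     char *buf, size_t len)
{
    static const char *const keys[] = { "Wants=", "After=" };
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (sections[i][0] == '\0' || strpbrk(sections[i], " /\\") != NULL)
            return -1;
    }
    if (append(buf, len, &used, "[Unit]\n") < 0)
        return -1;
    for (size_t k = 0; k < 2; k++) {
        if (append(buf, len, &used, "%s", keys[k]) < 0)
            return -1;
        for (size_t i = 0; i < n; i++) {
            if (append(buf, len, &used, "%snut-driver@%s.service",
                       i > 0 ? " " : "", sections[i]) < 0)
                return -1;
        }
        if (append(buf, len, &used, "\n") < 0)
            return -1;
    }
    return (int)used;
}
```

For real sections `ups` and `ups2`, this yields a `[Unit]` section whose `Wants=` and `After=` lines both list `nut-driver@ups.service nut-driver@ups2.service`.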

But I guess this (and a myriad of other nuances that would pop up) can also be
tackled after an actual driver appears :)

Jim


On Mon, May 12, 2025 at 5:40 PM Jim Klimov <jimklimov+nut at gmail.com> wrote:

> Good point, I guess I thought it knew which UPSes are getting the FSD
> treatment.
>
> Maybe this should also get revised eventually (e.g. a "primary" monitoring
> system not fed by the UPS it is wired to should never FSD itself due to the
> outage, but should ensure that UPS gets powered off or power-cycled when no
> "secondary" clients remain).
>
> Then maybe not much is needed at this time, and we'd revisit this as we
> hit any issues.
>
> One point I remain concerned about is the UPS getting confused by
> receiving the same commands over different protocols/media in quick
> succession. Or by similar but conflicting commands, because one driver
> knows how to request "load.off" and another knows about "shutdown.stayoff"
> or "shutdown.reboot", leading to different code paths in the UPS
> controller; I can't rule out some firmware getting wedged by this.
>
> In this case `upsdrvctl shutdown` should somehow know to only try one of
> the drivers at a time (perhaps `sdorder` can be useful to at least somehow
> separate these attempts in time), or to just start both "real" drivers in a
> mode with `allow_killpower` (but not with `-k`) so that the
> failover/multiplex driver can decide which one to call, and when/whether to
> call another. The tool uses a common NUT config parser, so it might be taught
> about special driver names (failover) and their port values as inputs for
> its work or lack thereof. Probably "upsdrvctl shutdown realdriver" should
> work as it did, but "upsdrvctl shutdown failoverdriver" or "upsdrvctl
> shutdown" (for all UPSes) should do these tricks.
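The "one driver at a time" idea could look roughly like this minimal C sketch. `struct member`, `shutdown_fn`, `shutdown_plan` and `shutdown_first_success` are hypothetical names, not existing `upsdrvctl` code; reusing `sdorder` (which `ups.conf` documents as 0s first, then 1s and so on, with -1 meaning excluded) to order the members of one failover group is only the suggestion above, not current behaviour.

```c
#include <stddef.h>

/* Hypothetical description of one "real" driver behind a failover section. */
struct member {
    const char *name;   /* socket name, e.g. "dummy-ups-ups" */
    int sdorder;        /* as in ups.conf: 0s first, then 1s...; -1 = excluded */
};

/* Hypothetical callback: ask one driver to shut the UPS down; 0 = success. */
typedef int (*shutdown_fn)(const struct member *m, void *ctx);

/* Fill order[] (room for n entries) with indices of members taking part in
 * the shutdown, sorted by ascending sdorder. The insertion sort is stable,
 * so equal sdorder values keep their port order. Returns how many. */
size_t shutdown_plan(const struct member *m, size_t n, size_t *order)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        if (m[i].sdorder < 0)
            continue;                    /* excluded from the sequence */
        size_t j = k++;
        while (j > 0 && m[order[j - 1]].sdorder > m[i].sdorder) {
            order[j] = order[j - 1];     /* shift later entries right */
            j--;
        }
        order[j] = i;
    }
    return k;
}

/* Try the planned members one at a time and stop at the first success.
 * Returns the index of the member that succeeded, or -1 if none did. */
int shutdown_first_success(const struct member *m, size_t n, size_t *order,
                           shutdown_fn fn, void *ctx)
{
    size_t k = shutdown_plan(m, n, order);

    for (size_t i = 0; i < k; i++) {
        if (fn(&m[order[i]], ctx) == 0)
            return (int)order[i];        /* one request accepted: stop here */
    }
    return -1;
}
```

Stopping at the first success means the UPS sees a single shutdown request, rather than "load.off" from one driver followed by "shutdown.stayoff" from another.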
>
> Jim
>
>
>
> On Mon, May 12, 2025 at 5:25 PM Sebastian Kuttnig via Nut-upsdev <
> nut-upsdev at alioth-lists.debian.net> wrote:
>
>> P.S. To articulate better what I am unclear about from your message:
>>
>> `nutshutdown` seems to run `@SBINDIR@/upsdrvctl shutdown`.
>> From my understanding, this would command all UPSes in `ups.conf` to
>> shut down.
>> So this would already include the UPSes monitored by any
>> failover/multiplexing driver.
>> For that, `upsdrvctl shutdown` would start up each of these drivers again.
>>
>> What kind of specific orchestration would be required for the "proxying"
>> driver?
>>
>> My initial understanding was that it would simply not support a shutdown
>> command itself, and `upsdrvctl` would command all supporting UPSes to
>> shut down/start regardless of it.
>> So it would be similar to the clone* drivers, which seem not to have any
>> special handling there.
>>
>> Thanks for the detailed responses so far, I am still exploring some areas
>> of the NUT ecosystem. :-)
>>
>> Sebastian
>>
>> On Mon, May 12, 2025 at 16:02, Sebastian Kuttnig <
>> sebastian.kuttnig at gmail.com> wrote:
>>
>>> Hello Jim,
>>>
>>> I am following the direction of the `clone` drivers, although using the
>>> modern `upsdrvquery.c` API.
>>> Essentially, I am parsing the protocol lines read directly from the
>>> driver sockets, as the clone drivers do.
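For reference, handling one such protocol line could look like the minimal C sketch below. It assumes the `SETINFO <varname> "<value>"` form that drivers send on their local socket, with `\"` and `\\` escaped inside the value; `parse_setinfo` is a hypothetical name, and the clone drivers and `upsdrvquery.c` may tokenize lines differently.

```c
#include <string.h>

/* Parse a driver-socket line of the form
 *     SETINFO ups.status "OL CHRG"
 * into var and value, undoing backslash escapes inside the quoted value.
 * Returns 0 on success, -1 for any other or malformed line (a caller would
 * handle lines such as DUMPDONE or PONG separately). */
int parse_setinfo(const char *line, char *var, size_t varlen,
                  char *val, size_t vallen)
{
    static const char prefix[] = "SETINFO ";
    const char *p, *sp;
    size_t n = 0;

    if (varlen == 0 || vallen == 0)
        return -1;
    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return -1;
    p = line + sizeof prefix - 1;

    /* variable name runs up to the next space */
    sp = strchr(p, ' ');
    if (sp == NULL || sp == p || (size_t)(sp - p) >= varlen)
        return -1;
    memcpy(var, p, (size_t)(sp - p));
    var[sp - p] = '\0';

    /* value is enclosed in double quotes */
    p = sp + 1;
    if (*p++ != '"')
        return -1;
    while (*p != '\0' && *p != '"') {
        if (*p == '\\' && p[1] != '\0')
            p++;                    /* keep the escaped character literally */
        if (n + 1 >= vallen)
            return -1;              /* value too long for the buffer */
        val[n++] = *p++;
    }
    if (*p != '"')
        return -1;                  /* unterminated value */
    val[n] = '\0';
    return 0;
}
```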
>>>
>>> As for ups.conf, I start up as follows without problems:
>>>
>>> [ups]
>>> driver = dummy-ups
>>> port = /etc/nut/5E.dev
>>>
>>> [ups2]
>>> driver = dummy-ups
>>> port = /etc/nut/APC.dev
>>>
>>> [failover]
>>> driver = failover
>>> port = dummy-ups-ups,dummy-ups-ups2
>>>
>>> Is this what you had in mind?
>>> Appreciate any pointers regarding the `upsdrvctl` and `nutshutdown`
>>> specifics.
>>>
>>> Sebastian
>>>
>>> On Mon, May 12, 2025 at 15:53, Jim Klimov <
>>> jimklimov+nut at gmail.com> wrote:
>>>
>>>> Sounds great, thanks for the update!
>>>>
>>>> For communications with the other drivers (from failover or
>>>> multiplexor), I suggest using the local driver socket like the clone*
>>>> drivers do: this removes a dependency on `upsd` both for the service
>>>> startup (no chicken-and-egg issue of drivers before upsd, but upsd before
>>>> the failover driver) and for FSD end-game (after all services stopped, we
>>>> just need to start the drivers to talk to the UPS, no need for upsd so they
>>>> can see each other).
>>>>
>>>> Speaking of the latter, the `nutshutdown` script (or `upsdrvctl`) may
>>>> need an update to know to start those additional drivers. Or perhaps to
>>>> try them one by one for the shutdown command specifically (or for any
>>>> command generally), until one succeeds.
>>>>
>>>> Jim
>>>>
>>>>
>>>> On Mon, May 12, 2025 at 3:22 PM Sebastian Kuttnig via Nut-upsdev <
>>>> nut-upsdev at alioth-lists.debian.net> wrote:
>>>>
>>>>> Hello again,
>>>>>
>>>>> I can report back that my failover driver is progressing nicely, and a
>>>>> lot of things seem to overlap with what could very well also be useful
>>>>> and maybe used eventually for a future multiplexing driver.
>>>>>
>>>>> Basically, my failover logic takes driver names (sockets) in
>>>>> comma-separated form from the `port` variable and keeps track of these
>>>>> drivers, monitoring their information and failing over where necessary.
>>>>> Basic configuration looks like:
>>>>>
>>>>>   [failover]
>>>>>   driver = failover
>>>>>   port = dummy-ups-ups,dummy-ups-ups2
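For illustration, splitting such a `port` value into individual driver socket names might look like the minimal C sketch below; `split_port_list` and `MAX_NAME` are hypothetical, and the actual driver may rely on NUT's own string helpers instead.

```c
#include <ctype.h>
#include <string.h>

#define MAX_NAME 64   /* hypothetical limit on one socket name */

/* Split a comma-separated list such as "dummy-ups-ups,dummy-ups-ups2"
 * into names[], trimming surrounding blanks and skipping empty items.
 * Returns the number of names, or -1 if there are more than max names
 * or one of them does not fit in MAX_NAME bytes. */
int split_port_list(const char *port, char names[][MAX_NAME], size_t max)
{
    size_t count = 0;
    const char *p = port;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        if (end == NULL)
            end = p + strlen(p);

        /* trim blanks around this item */
        const char *s = p, *e = end;
        while (s < e && isspace((unsigned char)*s))
            s++;
        while (e > s && isspace((unsigned char)e[-1]))
            e--;

        if (e > s) {
            size_t len = (size_t)(e - s);
            if (count >= max || len >= MAX_NAME)
                return -1;
            memcpy(names[count], s, len);
            names[count][len] = '\0';
            count++;
        }
        p = (*end == ',') ? end + 1 : end;
    }
    return (int)count;
}
```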
>>>>>
>>>>> I could well picture a multiplexing driver accepting a similar format,
>>>>> merging variables of both drivers and resolving conflicts by port
>>>>> argument order (`dummy-ups-ups`, then `dummy-ups-ups2`) in its most
>>>>> basic form.
>>>>>
>>>>> Additionally, this could be extended with preference arguments, such as:
>>>>>
>>>>>   prefer.ups.status = dummy-ups-ups
>>>>>   prefer.battery.voltage.nominal = dummy-ups-ups2
>>>>>
>>>>> Such definitions would take precedence over the port argument order,
>>>>> for more granular control. This could be similar, format-wise, to what
>>>>> is used in `ups.conf` for `default.<variable>` or `override.<variable>`.
>>>>>
>>>>> If either driver were to drop offline, the other driver could take over
>>>>> with its full set of variables, regardless of other set preferences.
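The precedence described here (an explicit `prefer.<variable>` first, then `port` order, with drivers that dropped offline skipped) can be sketched as a small lookup. This is a minimal C sketch: the `prefer.*` keys are only the proposal above, not an existing `ups.conf` option, and `struct source`, `struct preference` and `pick_source` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one source driver, kept in `port` order. */
struct source {
    const char *name;              /* socket name, e.g. "dummy-ups-ups" */
    bool online;                   /* false once the driver dropped offline */
    const char *const *vars;       /* NULL-terminated list of variables it has */
};

/* One hypothetical "prefer.<var> = <source>" setting. */
struct preference {
    const char *var;
    const char *source;
};

static bool provides(const struct source *s, const char *var)
{
    for (const char *const *v = s->vars; v != NULL && *v != NULL; v++) {
        if (strcmp(*v, var) == 0)
            return true;
    }
    return false;
}

/* Return the index of the source whose value of var should be published,
 * or -1 if no online source has it. */
int pick_source(const struct source *src, size_t nsrc,
                const struct preference *pref, size_t npref, const char *var)
{
    /* 1. An explicit preference wins while that driver is online and has var. */
    for (size_t i = 0; i < npref; i++) {
        if (strcmp(pref[i].var, var) != 0)
            continue;
        for (size_t j = 0; j < nsrc; j++) {
            if (strcmp(src[j].name, pref[i].source) == 0
                && src[j].online && provides(&src[j], var))
                return (int)j;
        }
    }
    /* 2. Otherwise, or if the preferred driver is offline: port order. */
    for (size_t j = 0; j < nsrc; j++) {
        if (src[j].online && provides(&src[j], var))
            return (int)j;
    }
    return -1;
}
```

With `prefer.battery.voltage.nominal = dummy-ups-ups2`, that variable comes from the second driver while both are online, and from `dummy-ups-ups` once the second one drops offline, matching the takeover rule above.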
>>>>>
>>>>> Just a rough sketch of what I have in mind. Time permitting, I'll start
>>>>> working on this at some point after I finish my failover explorations.
>>>>>
>>>>> Sebastian
>>>>>
>>>>> On Mon, May 12, 2025 at 14:16, Greg Troxel via Nut-upsdev <
>>>>> nut-upsdev at alioth-lists.debian.net> wrote:
>>>>>
>>>>>> Wow, that's quite the tale!
>>>>>>
>>>>>> I take away from this:
>>>>>>
>>>>>>   There is a real example of wanting to merge two information sources.
>>>>>>
>>>>>>   It's very complicated.
>>>>>>
>>>>>>   Anybody wishing to succeed in a very complicated situation needs to
>>>>>>   really pay attention, to twice as many things as they thought when
>>>>>>   they started.
>>>>>>
>>>>>>   It's unclear how to generalize from this to a solution that will work
>>>>>>   for the next person.
>>>>>>
>>>>>> but if someone wants to write something that is an aggregating driver
>>>>>> (looks like a driver, talks to N drivers), and do so in a way that
>>>>>> doesn't cause any significant pain for others, that seems like a fine
>>>>>> thing for them to do.
>>>>>>
>>>>>> I would suggest having some sort of config file that, for each
>>>>>> variable, says which driver to prefer, and some kind of timeout for
>>>>>> "not available" to flip to the backup.  I guess for starters, one could
>>>>>> configure two drivers in "fancier/less-reliable" and "old-school"
>>>>>> slots, and prefer fancier for all except shutdown and status.
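The timeout rule combined with the two-slot idea might be sketched as follows. Assumptions, all hypothetical: each slot records when it last delivered fresh data (drivers may also report stale data themselves, which is not modelled here), the 15-second timeout is arbitrary, and `ups.status`, `load.off` and `shutdown.*` stand in for "status and shutdown". `pick_slot` and the other names are illustrative only.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

enum slot { SLOT_FANCY = 0, SLOT_OLDSCHOOL = 1 };

#define STALE_AFTER 15.0   /* seconds; arbitrary example value */

/* Status and shutdown-related names prefer the old-school driver
 * (assumed set: ups.status, load.off, shutdown.*). */
static bool oldschool_first(const char *name)
{
    return strcmp(name, "ups.status") == 0
        || strcmp(name, "load.off") == 0
        || strncmp(name, "shutdown.", 9) == 0;
}

/* last_seen[s] is when slot s last delivered fresh data (0 = never). */
static bool fresh(const time_t last_seen[2], enum slot s, time_t now)
{
    return last_seen[s] != 0 && difftime(now, last_seen[s]) <= STALE_AFTER;
}

/* Pick the slot to serve a variable or command from: the preferred slot
 * while it is fresh, else the backup if that one is fresh, else -1. */
int pick_slot(const char *name, const time_t last_seen[2], time_t now)
{
    enum slot want = oldschool_first(name) ? SLOT_OLDSCHOOL : SLOT_FANCY;
    enum slot backup = (want == SLOT_FANCY) ? SLOT_OLDSCHOOL : SLOT_FANCY;

    if (fresh(last_seen, want, now))
        return (int)want;
    if (fresh(last_seen, backup, now))
        return (int)backup;
    return -1;
}
```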
>>>>>>
>>>>>> I fear that the next layer is merging status from two drivers where
>>>>>> they don't quite match.
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Nut-upsdev mailing list
>>>>>> Nut-upsdev at alioth-lists.debian.net
>>>>>> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsdev
>>>>>>
>>
>

