[Nut-upsdev] Questions about failover architecture
Sebastian Kuttnig
sebastian.kuttnig at gmail.com
Mon May 12 17:01:50 BST 2025
Definitely a valid point for any multiplexing driver; for failover probably
less so, as individual devices are involved, each of which needs shutting
down to successfully kill all power.
Maybe, for the multiplexing issue, a settable "nokill" flag on any individual
driver could instruct upsdrvctl to exclude that duplicate driver from a
wholesale "upsdrvctl shutdown", so that the user can decide which driver
handles the shutdown best.
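A minimal C sketch of how the proposed "nokill" flag might filter a wholesale shutdown. The flag does not exist in NUT today, and `struct ups_entry` and `select_for_wholesale_shutdown()` are hypothetical names used only for illustration, not upsdrvctl code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one ups.conf section; "nokill" is the flag
 * proposed above and is NOT an existing NUT setting. */
struct ups_entry {
    const char *name;
    const char *driver;
    bool nokill;
};

/* Collect pointers to the entries a wholesale "upsdrvctl shutdown"
 * would act on, skipping any marked nokill. Returns how many were
 * stored in out[] (out must have room for n entries). */
size_t select_for_wholesale_shutdown(const struct ups_entry *all, size_t n,
                                     const struct ups_entry **out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (!all[i].nokill)
            out[k++] = &all[i];
    }
    return k;
}
```

With a multiplexer section flagged `nokill`, only the two "real" drivers would be selected, so each physical UPS is commanded once.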
But yes, probably an issue for another day. :-)
Again, many thanks for the insights and discussion!
Sebastian
On Mon, 12 May 2025 at 17:54, Jim Klimov <jimklimov+nut at gmail.com> wrote:
> Thinking of it, the service-aware nut-driver-enumerator script has ways to
> set dependencies of a driver on other units in the system, depending on
> driver type. A couple of lines added there would be useful to ensure the
> failover/multiplex driver(s) start after the "real" driver units they
> scrape data from, and it would be a "soft" dependency (real driver unit may
> crash without impacting the failover/multiplexor).
>
> But I guess this (and a myriad other nuances that would pop up) can also
> be tackled after an actual driver appears :)
>
> Jim
>
>
> On Mon, May 12, 2025 at 5:40 PM Jim Klimov <jimklimov+nut at gmail.com>
> wrote:
>
>> Good point, I guess I thought it knew which UPSes are getting the FSD
>> treatment.
>>
>> Maybe this should also get revised eventually (e.g. a "primary"
>> monitoring system not fed by the UPS it is wired to should never FSD itself
>> due to the outage, but should ensure that UPS gets powered off or
>> power-cycled when no "secondary" clients remain).
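The "primary" idea above can be sketched as a small decision function. This is a hypothetical illustration only: the enum, `primary_decide()` and its inputs are invented names, and real upsmon behaviour involves more states and timeouts. Both branches first wait for secondaries to finish; the difference is that a primary not fed by the UPS powers off (or power-cycles) the UPS instead of shutting itself down.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a "primary" monitoring system. */
enum primary_action {
    PRIMARY_WAIT,          /* keep running, nothing to do yet */
    PRIMARY_SHUTDOWN_SELF, /* classic FSD: this host loses power too */
    PRIMARY_KILL_UPS       /* only power off or power-cycle the UPS */
};

/* ups_critical: the UPS reached a critical state (e.g. on battery, low).
 * fed_by_ups: whether this host draws power from the UPS it monitors.
 * secondaries_left: secondary clients still connected. */
enum primary_action primary_decide(bool ups_critical, bool fed_by_ups,
                                   int secondaries_left)
{
    if (!ups_critical)
        return PRIMARY_WAIT;
    /* In either case, let the secondaries shut down first. */
    if (secondaries_left > 0)
        return PRIMARY_WAIT;
    /* A primary not fed by this UPS stays up and just kills UPS power. */
    return fed_by_ups ? PRIMARY_SHUTDOWN_SELF : PRIMARY_KILL_UPS;
}
```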
>>
>> Then maybe not much is needed at this time, and we'd revisit this as we
>> hit any issues.
>>
>> One point I remain concerned about is the UPS getting confused about
>> receiving the same commands over different protocols/media in quick
>> succession, or similar but conflicting commands because one driver knows
>> how to request "load.off" and another knows about "shutdown.stayoff" or
>> "shutdown.reboot", leading to different code paths in the UPS controller.
>> I can't rule out some firmware getting wedged by this.
>>
>> In this case `upsdrvctl shutdown` should somehow know to only try one of
>> the drivers at a time (perhaps `sdorder` can be useful to at least somehow
>> separate these attempts in time), or to just start both "real" drivers in a
>> mode with `allow_killpower` (but not with `-k`) so that the
>> failover/multiplex driver can decide which one to call, and when/whether to
>> call another. The tool uses a common NUT config parser, so might be taught
>> about special driver names (failover) and their port values as inputs for
>> its work or lack thereof. Probably "upsdrvctl shutdown realdriver" should
>> work as it did, but "upsdrvctl shutdown failoverdriver" or "upsdrvctl
>> shutdown" (for all UPSes) should do these tricks.
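On the `sdorder` idea: the ups.conf(5) man page describes upsdrvctl shutting down all UPSes with `sdorder` 0 first, then 1, 2 and so on, with -1 excluding a UPS from the shutdown sequence. The sketch below only models that ordering, under that reading; `struct sd_entry` and `build_shutdown_sequence()` are hypothetical names, not upsdrvctl code. Giving the two "real" drivers different `sdorder` values would separate their attempts in time.

```c
#include <stddef.h>

struct sd_entry {
    const char *name;
    int sdorder;   /* as in ups.conf; negative (documented: -1) = excluded */
};

/* Build the shutdown sequence: drop excluded entries, then stable-sort
 * the rest by ascending sdorder so all 0s come first, then 1s, etc.
 * Entries sharing an sdorder keep their ups.conf order. Returns count;
 * out must have room for n entries. */
size_t build_shutdown_sequence(const struct sd_entry *in, size_t n,
                               struct sd_entry *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].sdorder < 0)
            continue;
        /* Insertion step: shift entries with a larger sdorder right. */
        size_t j = k;
        while (j > 0 && out[j - 1].sdorder > in[i].sdorder) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = in[i];
        k++;
    }
    return k;
}
```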
>>
>> Jim
>>
>>
>>
>> On Mon, May 12, 2025 at 5:25 PM Sebastian Kuttnig via Nut-upsdev <
>> nut-upsdev at alioth-lists.debian.net> wrote:
>>
>>> P.S. To articulate better what I am unclear about from your message:
>>>
>>> `nutshutdown` seems to run `@SBINDIR@/upsdrvctl shutdown`.
>>> From my understanding, this would command all UPS devices in `ups.conf`
>>> to shut down.
>>> So this would already include the UPS devices monitored by any
>>> failover/multiplexing driver.
>>> In turn, `upsdrvctl shutdown` would start each of these drivers again.
>>>
>>> What kind of specific orchestration would be required for the "proxying"
>>> driver?
>>>
>>> My initial understanding was that it would simply not support a shutdown
>>> command itself, and `upsdrvctl` would command all supporting UPS devices
>>> to shut down/start regardless of it, similar to the clone* drivers, which
>>> do not seem to have any special handling there.
>>>
>>> Thanks for the detailed responses so far, I am still exploring some
>>> areas of the NUT ecosystem. :-)
>>>
>>> Sebastian
>>>
>>> On Mon, 12 May 2025 at 16:02, Sebastian Kuttnig <
>>> sebastian.kuttnig at gmail.com> wrote:
>>>
>>>> Hello Jim,
>>>>
>>>> I am following the direction of the `clone` drivers, although using the
>>>> modern `upsdrvquery.c` API.
>>>> Essentially, I am parsing the protocol lines read directly from the
>>>> driver sockets, as the clone drivers do.
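A sketch of parsing one variable update read from a driver socket. It assumes lines of the form `SETINFO <varname> "<value>"` with backslash escapes inside the quotes; treat that format, and the helper `parse_setinfo()`, as assumptions to check against `upsdrvquery.c` and the clone drivers rather than as a specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Parse one protocol line, ASSUMING the form
 *   SETINFO <varname> "<value>"
 * with backslash-escaped characters inside the quoted value.
 * Copies the name and unescaped value into the caller's buffers and
 * returns true; returns false for other verbs or malformed/oversized input. */
bool parse_setinfo(const char *line, char *var, size_t varsz,
                   char *val, size_t valsz)
{
    static const char verb[] = "SETINFO ";
    if (valsz == 0 || strncmp(line, verb, sizeof(verb) - 1) != 0)
        return false;
    const char *p = line + sizeof(verb) - 1;

    /* The variable name runs up to the next space. */
    const char *sp = strchr(p, ' ');
    if (sp == NULL || sp == p || (size_t)(sp - p) >= varsz)
        return false;
    memcpy(var, p, (size_t)(sp - p));
    var[sp - p] = '\0';

    p = sp + 1;
    if (*p != '"')
        return false;
    p++;
    size_t k = 0;
    while (*p != '\0' && *p != '"') {
        if (*p == '\\' && p[1] != '\0')
            p++;                 /* take the escaped character literally */
        if (k + 1 >= valsz)
            return false;        /* value too long for the buffer */
        val[k++] = *p++;
    }
    if (*p != '"')
        return false;            /* unterminated quoted value */
    val[k] = '\0';
    return true;
}
```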
>>>>
>>>> As for ups.conf, I start up as follows without problems:
>>>>
>>>> [ups]
>>>> driver = dummy-ups
>>>> port = /etc/nut/5E.dev
>>>>
>>>> [ups2]
>>>> driver = dummy-ups
>>>> port = /etc/nut/APC.dev
>>>>
>>>> [failover]
>>>> driver = failover
>>>> port = dummy-ups-ups,dummy-ups-ups2
>>>>
>>>> Is this what you had in mind?
>>>> Appreciate any pointers regarding the `upsdrvctl` and `nutshutdown`
>>>> specifics.
>>>>
>>>> Sebastian
>>>>
>>>> On Mon, 12 May 2025 at 15:53, Jim Klimov <
>>>> jimklimov+nut at gmail.com> wrote:
>>>>
>>>>> Sounds great, thanks for the update!
>>>>>
>>>>> For communications with the other drivers (from failover or
>>>>> multiplexor), I suggest using the local driver socket like the clone*
>>>>> drivers do: this removes a dependency on `upsd` both for the service
>>>>> startup (no chicken-and-egg issue of drivers before upsd, but upsd before
>>>>> the failover driver) and for FSD end-game (after all services stopped, we
>>>>> just need to start the drivers to talk to the UPS, no need for upsd so they
>>>>> can see each other).
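Talking to a driver's local socket needs only plain POSIX calls. A minimal sketch, assuming each driver listens on a Unix stream socket named `<driver>-<upsname>` under the NUT state path, which is what port values like `dummy-ups-ups` suggest. The state path varies by build and packaging, so `/var/run/nut` in the test is an assumption, and both helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Build "<statepath>/<sockname>", ASSUMING the driver socket naming
 * described above. Returns 0 on success, -1 if it does not fit. */
int driver_socket_path(char *buf, size_t bufsz, const char *statepath,
                       const char *sockname)
{
    int n = snprintf(buf, bufsz, "%s/%s", statepath, sockname);
    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}

/* Open a stream connection to a driver's local socket (no upsd needed).
 * Returns a connected descriptor, or -1 on any failure. */
int driver_socket_connect(const char *path)
{
    struct sockaddr_un sa;
    if (strlen(path) >= sizeof(sa.sun_path))
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);   /* length checked above */

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because the connection goes straight to the driver, it works the same at service startup and during the FSD end-game, whether or not upsd is running.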
>>>>>
>>>>> Speaking of the latter, the `nutshutdown` script (or `upsdrvctl`) may
>>>>> need an update to know to start those additional drivers. Or perhaps do
>>>>> them one by one in case of shutdown command specifically (or any command
>>>>> generally), until one succeeds.
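The "one by one, until one succeeds" approach, as a hypothetical sketch: the command transport is abstracted as a callback (`send_cmd_fn`, an invented type) so the example does not depend on any real NUT API. Stopping at the first success is what keeps the UPS from receiving the same, or a conflicting, command over a second path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: ask one underlying driver to run a command
 * (e.g. "shutdown.return"); returns true if it reported success. */
typedef bool (*send_cmd_fn)(const char *driver, const char *cmd, void *ctx);

/* Try each driver in order and stop at the first success.
 * Returns the index of the driver that succeeded, or -1 if none did. */
int send_to_first_success(const char *const *drivers, size_t n,
                          const char *cmd, send_cmd_fn send, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        if (send(drivers[i], cmd, ctx))
            return (int)i;
    }
    return -1;
}
```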
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>> On Mon, May 12, 2025 at 3:22 PM Sebastian Kuttnig via Nut-upsdev <
>>>>> nut-upsdev at alioth-lists.debian.net> wrote:
>>>>>
>>>>>> Hello again,
>>>>>>
>>>>>> I can report back that my failover driver is progressing nicely, and a
>>>>>> lot of things seem to overlap with what could very well also be useful,
>>>>>> and maybe used eventually, for a future multiplexing driver.
>>>>>>
>>>>>> Basically, my failover logic takes driver names (sockets) in
>>>>>> comma-separated form from the `port` variable and keeps track of these
>>>>>> drivers, monitoring their information and failing over where necessary.
>>>>>> A basic configuration looks like:
>>>>>>
>>>>>> [failover]
>>>>>> driver = failover
>>>>>> port = dummy-ups-ups,dummy-ups-ups2
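Splitting the comma-separated `port` value into driver socket names could look like this minimal sketch. `split_port()` is a hypothetical helper; it modifies the string in place and skips empty items.

```c
#include <stddef.h>
#include <string.h>

/* Split a port value such as "dummy-ups-ups,dummy-ups-ups2" into
 * driver socket names, in place (commas become NUL terminators).
 * Empty items are skipped. Returns the number of names stored in
 * out[], at most max. */
size_t split_port(char *port, char **out, size_t max)
{
    size_t n = 0;
    char *p = port;
    while (n < max && p != NULL) {
        char *comma = strchr(p, ',');
        if (comma != NULL)
            *comma = '\0';
        if (*p != '\0')
            out[n++] = p;
        p = comma ? comma + 1 : NULL;
    }
    return n;
}
```

The resulting order is also the priority order the failover logic would walk.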
>>>>>>
>>>>>> I could well picture a multiplexing driver accepting a similar format,
>>>>>> merging variables of both drivers and resolving conflicts by port
>>>>>> argument order (`dummy-ups-ups`, then `dummy-ups-ups2`) in its most
>>>>>> basic form.
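The most basic multiplexing rule, "first source in port order wins", as a hypothetical sketch. `struct kv`, `struct source` and `merge_by_port_order()` are invented names. Sources marked offline are skipped, so a lower-priority driver fills in automatically.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One variable as seen from one underlying driver. */
struct kv { const char *name; const char *value; };

/* One driver's current variable set, in port order. */
struct source {
    const struct kv *vars;
    size_t nvars;
    bool online;   /* false when its socket is gone or data went stale */
};

static const char *source_get(const struct source *s, const char *name)
{
    for (size_t i = 0; i < s->nvars; i++)
        if (strcmp(s->vars[i].name, name) == 0)
            return s->vars[i].value;
    return NULL;
}

/* Resolve one variable: the first online source (in port order) that
 * has it wins. Returns NULL if no online source has it. */
const char *merge_by_port_order(const struct source *srcs, size_t n,
                                const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (!srcs[i].online)
            continue;
        const char *v = source_get(&srcs[i], name);
        if (v != NULL)
            return v;
    }
    return NULL;
}
```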
>>>>>>
>>>>>> Additionally, this could be extended with preference arguments, such as:
>>>>>>
>>>>>> prefer.ups.status = dummy-ups-ups
>>>>>> prefer.battery.voltage.nominal = dummy-ups-ups2
>>>>>>
>>>>>> Such definitions would take precedence over the port argument order, for
>>>>>> more granular control. Format-wise, this could be similar to the
>>>>>> `default.<variable>` or `override.<variable>` settings used in `ups.conf`.
>>>>>>
>>>>>> If either driver were to drop offline, the other driver could take over
>>>>>> with its full set of variables, regardless of other set preferences.
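Layering the proposed `prefer.<var>` settings on top of port order, with the fallback just described when the preferred driver is offline. Again a sketch under assumptions: `struct pref`, `preferred_index()` and `resolve_var()` are hypothetical, and the settings are assumed to be already parsed out of `ups.conf`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical "prefer.<var> = <source>" setting. */
struct pref {
    const char *var;     /* e.g. "battery.voltage.nominal" */
    const char *source;  /* e.g. "dummy-ups-ups2" */
};

/* Map a variable's preference to an index into the port-ordered source
 * names; -1 if none is set or it names an unknown source. */
int preferred_index(const struct pref *prefs, size_t nprefs, const char *var,
                    const char *const *names, size_t n)
{
    for (size_t i = 0; i < nprefs; i++) {
        if (strcmp(prefs[i].var, var) != 0)
            continue;
        for (size_t j = 0; j < n; j++)
            if (strcmp(names[j], prefs[i].source) == 0)
                return (int)j;
    }
    return -1;
}

/* values[i]: the variable's value from source i (NULL if absent);
 * online[i]: whether source i is reachable. The preference wins only
 * while its source is online; otherwise fall back to port order. */
const char *resolve_var(const char *const *values, const bool *online,
                        size_t n, int preferred)
{
    if (preferred >= 0 && (size_t)preferred < n &&
        online[preferred] && values[preferred] != NULL)
        return values[preferred];
    for (size_t i = 0; i < n; i++)
        if (online[i] && values[i] != NULL)
            return values[i];
    return NULL;
}
```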
>>>>>>
>>>>>> Just a rough sketch of what I have in mind. Time permitting, I'll start
>>>>>> working on this at some point after I finish my failover explorations.
>>>>>>
>>>>>> Sebastian
>>>>>>
>>>>>> On Mon, 12 May 2025 at 14:16, Greg Troxel via Nut-upsdev <
>>>>>> nut-upsdev at alioth-lists.debian.net> wrote:
>>>>>>
>>>>>>> Wow, that's quite the tale!
>>>>>>>
>>>>>>> I take away from this:
>>>>>>>
>>>>>>> There is a real example of wanting to merge two information
>>>>>>> sources.
>>>>>>>
>>>>>>> It's very complicated.
>>>>>>>
>>>>>>> Anybody wishing to succeed in a very complicated situation needs to
>>>>>>> really pay attention, to twice as many things as they thought when
>>>>>>> they started.
>>>>>>>
>>>>>>> It's unclear how to generalize from this to a solution that will work
>>>>>>> for the next person.
>>>>>>>
>>>>>>> But if someone wants to write something that is an aggregating driver
>>>>>>> (looks like a driver, talks to N drivers), and do so in a way that
>>>>>>> doesn't cause any significant pain for others, that seems like a fine
>>>>>>> thing for them to do.
>>>>>>>
>>>>>>> I would suggest having some sort of config file that says, for each
>>>>>>> variable, which driver to prefer, and some kind of timeout after which
>>>>>>> an unavailable driver triggers a flip to the backup. I guess for
>>>>>>> starters, one could configure two drivers in "fancier/less-reliable"
>>>>>>> and "old-school" slots, and prefer the fancier one for all except
>>>>>>> shutdown and status.
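The two-slot starting point with a staleness timeout, sketched hypothetically: the fancier driver is preferred except for `ups.status` and `shutdown.*` commands, and a slot whose last update is older than the timeout hands over to the other one. The slot names, `choose_slot()` and tracking a last-update time per driver are assumptions for illustration.

```c
#include <string.h>
#include <time.h>

enum slot { SLOT_FANCY = 0, SLOT_OLDSCHOOL = 1 };

/* Prefer the fancier driver for everything except status and
 * shutdown commands, which go to the old-school one. */
static enum slot preferred_slot(const char *name)
{
    if (strcmp(name, "ups.status") == 0 ||
        strncmp(name, "shutdown.", strlen("shutdown.")) == 0)
        return SLOT_OLDSCHOOL;
    return SLOT_FANCY;
}

/* last_update[s]: when slot s last delivered data. Use the preferred
 * slot unless its data is older than timeout_s, then flip to the other
 * slot; return -1 if both are stale. */
int choose_slot(const char *name, const time_t last_update[2], time_t now,
                double timeout_s)
{
    int want = (int)preferred_slot(name);
    int other = 1 - want;
    if (difftime(now, last_update[want]) <= timeout_s)
        return want;
    if (difftime(now, last_update[other]) <= timeout_s)
        return other;
    return -1;
}
```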
>>>>>>>
>>>>>>> I fear that the next layer is merging status from two drivers where
>>>>>>> they don't quite match.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Nut-upsdev mailing list
>>>>>>> Nut-upsdev at alioth-lists.debian.net
>>>>>>> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsdev
>>>>>>>
>>>
>>