[Nut-upsdev] Enhanced driver troubleshooting
Jim Klimov
jimklimov+nut at gmail.com
Mon Apr 24 13:30:38 BST 2023
As a follow-up, that R&D led to several other issues and PRs, now mostly
merged and summarized in
https://github.com/networkupstools/nut/issues/1781#issuecomment-1519824607
(with a few points still awaiting completion).
They added new features to the NUT master branch, including:
* drivers/upsdrvquery.{c,h}: a mini-client for the socket protocol, so drivers
(and `upsdrvctl`) can talk to the already-running driver instance, if any,
instead of sending signals and guessing whether they were handled; this
mini-client supports SET and INSTCMD with TRACKING and timeouts on
POSIX and WIN32 builds
* This mini-client is used to enable `programname -c reload-or-error`
commands on both platforms, allowing callers to either reload driver
configurations "on the fly" where supported (e.g. `debug_min` setting;
more can be revised by specific driver maintainers) or to return an
exit-code stating that a driver restart is required to apply the
configuration change
* The exit-code is integrated with `nut-driver-enumerator` service
(systemd/SMF) to decide whether a detected change in an `ups.conf` section
requires re-definition (and restart) of a service instance for the
`nut-driver` (which was the knee-jerk reaction to any change until now,
since we only track checksums of config sections and do not know what
exactly changed), or if a live reload suffices - improving driver uptime
during troubleshooting investigations
* On both platforms, clients to the socket protocol (e.g.
`server/sockdebug(.exe)`) can now `SET driver.debug NUM` to change
verbosity on the fly
* On both platforms, clients to the socket protocol can `SET
driver.flag.allow_killpower 1` (a failsafe that can be pre-configured via
`ups.conf`) and then `INSTCMD driver.killpower` to request that the running
driver perform the "force shutdown" equivalent of the earlier `drivername -k`.
In fact, handling of `drivername -k` itself was changed to try the running
driver (if any) first, and to fall back to the old approach of possibly
killing off that driver process and starting a new device connection to
request its poweroff only if the INSTCMD did not report successful handling
(success meaning that its `upsdrv_shutdown()` method returned - some are
no-ops, a few are infinite loops waiting for power state). As a consequence,
OS integrations can revise just how the shutdown is handled - e.g. whether
the nut-driver services should be stopped at all before the endgame (`upsmon`
and `upsd` may well be stopped, as the socket protocol does not depend on them).
* Currently on POSIX platforms only (the WIN32 equivalent is to be completed
later), some more commands, and notably signals, are available for
configuration reloading (including `reload-or-restart` for driver-wrapping
service instances) and troubleshooting.
* Note that `upsdrvctl` is limited to devices persistently defined in
`ups.conf`, but drivers called directly can also signal their instances that
were spawned with `-s TempUPSname` for experiments.
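To illustrate the `reload-or-error` flow from the list above, here is a
minimal Python sketch (not NUT code) of how a wrapper might try a live reload
first and restart the driver only when the reload is refused. It assumes that
exit code 0 means the reload was applied and that any other exit code means a
restart is required; the actual exit codes are not spelled out in this post.
The function and constant names are hypothetical, and the caller supplies
both command lines.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical result labels, used only by this sketch.
RELOADED = "reloaded"
RESTART_REQUIRED = "restart-required"


def apply_config_change(
    reload_cmd: Sequence[str],
    restart_cmd: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Try a live reload first; restart only if the reload is refused.

    Assumption: exit code 0 from reload_cmd means the change was applied
    on the fly, and any other exit code means a restart is needed.
    """
    result = run(list(reload_cmd), check=False)
    if result.returncode == 0:
        return RELOADED
    # The reload was not applied, so fall back to restarting the driver.
    run(list(restart_cmd), check=False)
    return RESTART_REQUIRED
```

A caller could pass something like `["usbhid-ups", "-a", "myups", "-c",
"reload-or-error"]` as `reload_cmd` and a service-manager restart command for
the matching `nut-driver` instance as `restart_cmd`; the driver and device
names here are placeholders. The `run` parameter exists only so the logic can
be exercised without a real driver.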
That was a fun if bumpy ride :)
Hope this helps,
Jim Klimov
On Wed, Apr 19, 2023 at 5:18 PM Jim Klimov <jimklimov+nut at gmail.com> wrote:
> Cheers,
>
> With https://github.com/networkupstools/nut/pull/1906 and
> https://github.com/networkupstools/nut/pull/1912 (and probably some more
> later), I've been enhancing the NUT driver framework with support for live
> reload of configuration, primarily to acknowledge changes to debug_min
> setting in ups.conf - but made in a generic fashion that specific drivers
> may take advantage of for their `addvar`'ed options handling (as a separate
> effort from users/maintainers of those drivers).
>
> Live testing and other forms of review would be welcome :)
>
> Jim
>
>