[Nut-upsdev] Question about hardware failures / FSD

João Luis Meloni Assirati jlmassir at gmail.com
Thu Oct 1 21:35:46 BST 2020


On Wed, Sep 30, 2020 at 3:37 AM Roger Price <roger at rogerprice.org> wrote:
>
> On Tue, 29 Sep 2020, João Luis Meloni Assirati wrote:
>
> > The UPS I am developing a driver for is able to report several flags for
> > critical hardware conditions, like overheat, overload, inverter failure,
> > output short etc. What should be the correct policy of operation when such a
> > condition occurs? I think that a UPS in such a condition is not reliable
> > and therefore a system shutdown should be called. However, the developer's
> > manual and all other drivers I inspected seem to call the FSD flag only when
> > there is a shutdown already in progress (say, a countdown register is active).
>
> I attach a list of the current NUT status changes.  Of interest to you will be
> 16 and 17 since they introduce new status names.  They are used by a heartbeat
> mechanism based on the dummy-ups driver.  I suggest you introduce new status
> names and new status changes.  These will be monitored by a
> upsmon+upssched+upssched-cmd setup extended to cover the new statuses.
>
> I suggest you choose generic names which can be used by others in the future.
> For example HW1, HW2, ... for the hardware error conditions with a
> correspondence table for your UPS in the documentation.  Other drivers might
> have other correspondence tables.
>
> Although maybe OVERHEAT and OVERLOAD are already sufficiently generic.  It's up
> to you.
>
> Roger
>
> EVENTS based on upsd status changes
>
>    1. None->ALARM ALARM->None
>       The UPS has raised/dropped the ALARM signal.
>    2. None->BOOST BOOST->None
>       The UPS is now boosting/not boosting the output voltage.
>    3. None->BYPASS BYPASS->None
>       The UPS is/is not now bypassing its own batteries.
>    4. None->CAL CAL->None
>       The UPS is/is not now in calibration mode.
>    5. None->CHRG CHRG->None
>       The UPS is/is not now recharging its batteries.
>    6. None->DISCHRG DISCHRG->None
>       The UPS is/is not now discharging its batteries.
>    7. None->LB LB->None
>       The driver says the UPS battery charge is now low/no longer low.
>    8. None->OFF OFF->None
>       The driver says the UPS is/is not now OFF.
>    9. OL->OB OB->OL
>       The UPS is now on battery/no longer on battery.
>   10. None->OVER OVER->None
>       The UPS is/is not now in status [OVER].
>   11. None->RB RB->None
>       The UPS needs/no longer needs to have its battery replaced.
>   12. None->TEST TEST->None
>       The UPS is/is not now performing a test.
>   13. None->TRIM TRIM->None
>       The UPS is now trimming/not trimming the output voltage.
>
> Other EVENTS monitored by upsmon, upssched, upssched-cmd
>
>   14. LIVE->DEAD DEAD->LIVE
>       Communication with the UPS is now lost/restored.
>   15. None->FSD FSD->None
>       The UPS is/is not now in Forced ShutDown mode.
>   16. None->TICK TICK->None
>       A heartbeat UPS has/has not generated a [TICK].
>   17. None->TOCK TOCK->None
>       A heartbeat UPS has/has not generated a [TOCK].
>   18. TIMEOUT(my-timer)
>       Timer “my-timer” has completed.

Thank you, that was very helpful.
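
To check my reading of the list, here is a minimal Python sketch of how
the None->X / X->None events could be derived from two successive status
strings. It is only an illustration under my own assumptions: the
function name status_events is hypothetical and not part of NUT, and I
assume the status is a space-separated list of names such as "OL CHRG".

```python
def status_events(old_status, new_status):
    """Return the status-change events between two status strings."""
    old, new = set(old_status.split()), set(new_status.split())
    events = []
    # OL and OB are reported as a pair (item 9 in the list).
    if "OL" in old and "OB" in new:
        events.append("OL->OB")
    elif "OB" in old and "OL" in new:
        events.append("OB->OL")
    # Every other name either appears (None->X) or disappears (X->None).
    for name in sorted((old ^ new) - {"OL", "OB"}):
        if name in new:
            events.append(f"None->{name}")
        else:
            events.append(f"{name}->None")
    return events


print(status_events("OL CHRG", "OB DISCHRG LB"))
# ['OL->OB', 'CHRG->None', 'None->DISCHRG', 'None->LB']
```

A new name such as HW1 would then show up as None->HW1 with no change
to the logic.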

Are there also guidelines for alarms? The Developer Guide says there
are no official alarms yet and I should ask here. I think that alarms
should cover not only serious hardware problems, but also important
messages like "you should perform a runtime calibration".
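
To make the question concrete, below is a rough Python sketch of the
kind of correspondence table suggested above. Everything in it is an
assumption for illustration: the bit values, the HW_FLAGS table, the
decode_flags function and the message texts are hypothetical, and the
assignment of HW1/HW2 to particular faults is arbitrary. It also
assumes that ALARM is added to the status whenever any alarm text is
present.

```python
# Hypothetical flag bits; a real UPS protocol defines its own layout.
HW_FLAGS = {
    0x01: ("OVERHEAT", "UPS overheated"),
    0x02: ("OVERLOAD", "UPS overloaded"),
    0x04: ("HW1", "Inverter failure"),
    0x08: ("HW2", "Output short circuit"),
}


def decode_flags(register):
    """Return (status names, alarm messages) for the bits set in register."""
    statuses, alarms = [], []
    for bit, (status, message) in sorted(HW_FLAGS.items()):
        if register & bit:
            statuses.append(status)
            alarms.append(message)
    # Assumption: any active alarm also raises the generic ALARM status.
    if alarms:
        statuses.insert(0, "ALARM")
    return statuses, alarms


print(decode_flags(0x05))
```

A non-critical message such as a calibration reminder could be one more
row in the table, which is why I am asking whether alarms are meant to
cover that case too.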

Thank you,
João Luis.


