[Nut-upsdev] Question about hardware failures / FSD
João Luis Meloni Assirati
jlmassir at gmail.com
Thu Oct 1 21:35:46 BST 2020
On Wed, Sep 30, 2020 at 3:37 AM Roger Price <roger at rogerprice.org> wrote:
>
> On Tue, 29 Sep 2020, João Luis Meloni Assirati wrote:
>
> > The UPS I am developing a driver for is able to report several flags for
> > critical hardware conditions, like overheat, overload, inverter failure,
> > output short etc. What should be the correct policy of operation when such a
> > condition occurs? I think that a UPS in such a condition is not reliable
> > and therefore a system shutdown should be called. However, the developer's
> > manual and all other drivers I inspected seem to call the FSD flag only when
> > there is a shutdown already in progress (say, a countdown register is active).
>
> I attach a list of the current NUT status changes. Of interest to you will be
> 16 and 17 since they introduce new status names. They are used by a heartbeat
> mechanism based on the dummy-ups driver. I suggest you introduce new status
> names and new status changes. These will be monitored by a
> upsmon+upssched+upssched-cmd setup extended to cover the new statuses.
>
> I suggest you choose generic names which can be used by others in the future.
> For example HW1, HW2, ... for the hardware error conditions with a
> correspondence table for your UPS in the documentation. Other drivers might
> have other correspondence tables.
>
> Although maybe OVERHEAT and OVERLOAD are already sufficiently generic. It's up
> to you.
>
> Roger
>
> EVENTS based on upsd status changes
>
> 1. None->ALARM ALARM->None
> The UPS has raised/dropped the ALARM signal.
> 2. None->BOOST BOOST->None
> The UPS is now boosting/not boosting the output voltage.
> 3. None->BYPASS BYPASS->None
> The UPS is/is not now bypassing its own batteries.
> 4. None->CAL CAL->None
> The UPS is/is not now in calibration mode.
> 5. None->CHRG CHRG->None
> The UPS is/is not now recharging its batteries.
> 6. None->DISCHRG DISCHRG->None
> The UPS is/is not now discharging its batteries.
> 7. None->LB LB->None
> The driver says the UPS battery charge is now low/no longer low.
> 8. None->OFF OFF->None
> The driver says the UPS is/is not now OFF.
> 9. OL->OB OB->OL
> The UPS is now on battery/no longer on battery.
> 10. None->OVER OVER->None
> The UPS is/is not now in status [OVER].
> 11. None->RB RB->None
> The UPS needs/no longer needs to have its battery replaced.
> 12. None->TEST TEST->None
> The UPS is/is not now performing a test.
> 13. None->TRIM TRIM->None
> The UPS is now trimming/not trimming the output voltage.
>
> Other EVENTS monitored by upsmon, upssched, upssched-cmd
>
> 14. LIVE->DEAD DEAD->LIVE
> Communication with the UPS is now lost/restored.
> 15. None->FSD FSD->None
> The UPS is/is not now in Forced ShutDown mode.
> 16. None->TICK TICK->None
> A heartbeat UPS has/has not generated a [TICK].
> 17. None->TOCK TOCK->None
> A heartbeat UPS has/has not generated a [TOCK].
> 18. TIMEOUT(my-timer)
> Timer “my-timer” has completed.
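
The status changes in the quoted list (None->X, X->None, and the OL/OB pair) can be derived by comparing two successive status strings. What follows is only a minimal sketch in Python, assuming the status is a space-separated list of tokens such as "OL CHRG"; the function name status_events is hypothetical and is not part of NUT.

```python
def status_events(previous: str, current: str) -> list[str]:
    """Return event names for status tokens that appeared or disappeared.

    Both arguments are space-separated status strings such as "OL CHRG".
    """
    old, new = set(previous.split()), set(current.split())
    appeared, disappeared = new - old, old - new
    events = []

    # OL and OB are reported as a single paired change (item 9 in the list).
    if "OB" in appeared and "OL" in disappeared:
        events.append("OL->OB")
    elif "OL" in appeared and "OB" in disappeared:
        events.append("OB->OL")

    # Every other token is reported as None->TOKEN or TOKEN->None.
    for token in sorted(appeared - {"OL", "OB"}):
        events.append(f"None->{token}")
    for token in sorted(disappeared - {"OL", "OB"}):
        events.append(f"{token}->None")
    return events


print(status_events("OL CHRG", "OB DISCHRG LB"))
```

A new status name such as OVERHEAT would need no special handling here: it would simply show up as None->OVERHEAT when set and OVERHEAT->None when cleared.
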
Thank you, that was very helpful.
Are there also guidelines for alarms? The Developer Guide says there
are no official alarms yet and I should ask here. I think that alarms
should cover not only serious hardware problems, but also important
messages like "you should perform a runtime calibration".
Thank you,
João Luis.
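
The correspondence table Roger suggests could look roughly like the sketch below. Everything in it is an assumption for illustration: the bit positions, the HW_FLAGS table, and the function names are hypothetical, and the choice to append ALARM to the status whenever any flag is set is one possible policy, not an established NUT guideline.

```python
# Hypothetical bit layout. The real flags come from the UPS protocol document.
HW_FLAGS = {
    0x01: "OVERHEAT",
    0x02: "OVERLOAD",
    0x04: "HW1",  # e.g. inverter failure on this hypothetical model
    0x08: "HW2",  # e.g. output short on this hypothetical model
}


def decode_flags(register: int) -> list[str]:
    """Return the generic names of all hardware flags set in the register."""
    return [name for bit, name in HW_FLAGS.items() if register & bit]


def build_status(base: str, register: int) -> str:
    """Append ALARM to the base status when any hardware flag is set."""
    tokens = base.split()
    if decode_flags(register):
        tokens.append("ALARM")
    return " ".join(tokens)


print(decode_flags(0x05))
print(build_status("OL", 0x05))
```

Keeping the table separate from the decoding logic means another driver could reuse the generic names with its own bit assignments.
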