[Nut-upsuser] Low battery unexpected shutdown

Marc Franquesa marc.franquesa at gmail.com
Thu Apr 23 14:26:31 BST 2020


Hi, recently I observed that during a power outage my NUT setup doesn't
shut down properly. Indeed, it never reaches the LowBattery state to notify
clients and initiate their shutdown. Reviewing my setup and logs, I think I'm
facing multiple problems and I'm unable to pinpoint the real root cause.

First, let me focus on my scenario and current setup:

In normal status (ONLINE, fully charged) my UPS runtime is about 20 minutes.
This is some of the relevant info:

Charge                    100%
Charge Low                20%
Runtime                   20m37s
Type                      PbAc
Manufacturer              EATON
Model                     Ellipse ECO 1200
Serial                    000000000
Power                     25VA
Frequency Nominal         50Hz
Voltage                   230.0V
Voltage Nominal           230V
Delay Shutdown            25s
Delay Start               40s
Firmware                  02
Load                      23%
Power Nominal             1200VA
Status                    OL

During a power outage it used to last about 15 minutes and then
initiate the shutdown. However, reviewing the logs of my systems:

Apr 19 12:04:23 - OnBattery
Apr 19 12:10:21 - Power from UPS cut, all systems got suddenly powered off
(no graceful shutdown)
Apr 19 12:16:46 - Power restored, UPS feeds power again to the systems and
they boot up.

So this indicates the UPS only lasted about 6 minutes.
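
As a quick check of that figure, here is a minimal sketch that computes the intervals from the three timestamps above. The year is an assumption taken from the date of this message, since the log stamps do not carry one, and the variable names are only illustrative.

```python
from datetime import datetime

def parse(stamp, year=2020):
    # Syslog-style stamps have no year; 2020 is assumed from the mail date.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

on_battery = parse("Apr 19 12:04:23")  # OnBattery
power_cut = parse("Apr 19 12:10:21")   # UPS output cut
restored = parse("Apr 19 12:16:46")    # mains power back

time_on_battery = power_cut - on_battery
time_without_power = restored - power_cut

print(time_on_battery)     # 0:05:58
print(time_without_power)  # 0:06:25
```

So the UPS held the load for just under 6 minutes, against the roughly 20 minutes of runtime it reports when fully charged.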

Also, I have two UPS devices configured:
- UPSdevice: the real UPS
- HeartBeat: a virtual dummy device to monitor proper NUT communication (see
http://rogerprice.org/NUT/NUT.html#HEARTBEAT).

Looking at the logs when the event starts:

Apr 19 12:04:23 laney upsmon[756]: UPS UPSdevice at power.srv.l3jane.net: On
battery.
Apr 19 12:04:23 laney upssched[797]: New timer: onbattwarn (30 seconds)
Apr 19 12:04:53 laney upssched[797]: Event: onbattwarn
Apr 19 12:08:24 laney upsmon[756]: UPS HeartBeat at power.srv.l3jane.net: On
battery.
Apr 19 12:08:24 laney upssched[797]: Cancelling timer: heartbeat-failure

UPSdevice uses the timer "onbattwarn", while HeartBeat uses a separate
timer, "heartbeat-failure". So I suppose that when HeartBeat cancels
'heartbeat-failure', the onbattwarn timer is still running.

The second concern is quickly spotted:

Apr 19 12:09:19 laney usbhid-ups[703]: libusb_get_interrupt: error
submitting URB: No such device
Apr 19 12:09:24 laney upsd[753]: Data for UPS [UPSdevice] is stale - check
driver
Apr 19 12:09:26 laney upsd[753]: UPS [UPSdevice] data is no longer stale

It seems that during the 'onbattwarn' event, NUT lost communication with the
UPS for about 6 seconds. That is too short to trigger any 'COMM_BAD' event,
so no notification about it is sent; it only shows up in the logs.
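
To put numbers on that gap, the following is a rough sketch that parses the three log lines quoted above (rejoined here, since the mail wrapped them) and measures the intervals. The regular expression and the helper name parse_line are my own illustration, not part of NUT, and the year is again assumed.

```python
import re
from datetime import datetime

LOG = """\
Apr 19 12:09:19 laney usbhid-ups[703]: libusb_get_interrupt: error submitting URB: No such device
Apr 19 12:09:24 laney upsd[753]: Data for UPS [UPSdevice] is stale - check driver
Apr 19 12:09:26 laney upsd[753]: UPS [UPSdevice] data is no longer stale
"""

# timestamp, host, process[pid]: message
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([\w-]+)\[(\d+)\]: (.*)$")

def parse_line(line, year=2020):
    """Return (datetime, process, message), or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    stamp, _host, proc, _pid, msg = m.groups()
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, proc, msg

events = [e for e in map(parse_line, LOG.splitlines()) if e is not None]

first_error = events[0][0]
went_stale = next(t for t, _p, m in events if "is stale" in m)
recovered = next(t for t, _p, m in events if "no longer stale" in m)

error_to_recovery = (recovered - first_error).total_seconds()
stale_window = (recovered - went_stale).total_seconds()

print(f"first USB error to recovery: {error_to_recovery:.0f}s")
print(f"flagged stale by upsd for:   {stale_window:.0f}s")
```

By these stamps, upsd flagged the data as stale for only 2 seconds, and the whole episode from the first USB error to recovery took 7 seconds.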

However, as stated above, just 1 minute later (12:10:21) power is
totally lost, even from the UPS, with no trace of LowBattery events, comm
errors or the like in the logs. The only exception is an email message that
upssched sent me (although no log entry for the event appears). This email reports:

Power ups UPSdevice battery-low notification
UPS:         UPSdevice at power.srv.l3jane.net
Notice type: LOWBATT
Message:     battery-low
Sun, 19 Apr 2020 12:10:27 +0200

Charge                    73%
Charge Low                20%
Runtime                   15m52s
Type                      PbAc
Manufacturer              EATON
Model                     Ellipse ECO 1200
Serial                    000000000
Type                      Ups
Power                     25VA
Frequency Nominal         50Hz
Voltage                   230.0V
Voltage Nominal           230V
Delay Shutdown            20s
Delay Start               30s
Firmware                  02
Load                      21%
Power Nominal             1200VA
Status                    FSD
ALARM                     OB LB

So it was indeed in LowBattery and FSD. However, note that the
current charge was 73%, far above the 20% low-battery threshold, so why did
it set LB when there was enough battery to keep running? Is this a comm
problem, a device issue, or a driver issue?
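
The inconsistency can be expressed as a small check on the report. This is only a sketch: it parses a few fields copied from the email above and flags the case where LB is set while the charge is still above "Charge Low". The helper names are hypothetical, and the check only looks at charge; the UPS or driver may apply other criteria (for example a runtime threshold) that the report does not show.

```python
import re

REPORT = """\
Charge                    73%
Charge Low                20%
Runtime                   15m52s
Status                    FSD
ALARM                     OB LB
"""

def parse_report(text):
    """Split each 'Label    value' line on the run of spaces between them."""
    fields = {}
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1]
    return fields

def percent(value):
    # "73%" -> 73
    return int(value.rstrip("%"))

fields = parse_report(REPORT)
charge = percent(fields["Charge"])
charge_low = percent(fields["Charge Low"])
lb_flagged = "LB" in fields["ALARM"].split()

# LB raised while the charge is still above the low-charge threshold.
suspicious = lb_flagged and charge > charge_low

print(f"charge={charge}% low={charge_low}% LB={lb_flagged} suspicious={suspicious}")
```

For this report the check flags the LB as suspicious, which is exactly the situation I cannot explain.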

So my main questions are:
A) Can upssched introduce race conditions in which cancelling one timer
also cancels other, unrelated timers?
B) Could my UPS be faulty, or is this just a USB driver issue?

Any other hint, information or idea is welcome.

Regards