[Nut-upsuser] MGE pulsar evolution 3000 discharges without nut noticing
Arjen de Korte
nut+users at de-korte.org
Wed Feb 20 08:28:15 UTC 2008
> I had a very strange occurrence in my basement datacenter this early AM.
>
> the following was logged by upslog:
>
> 20080219 023452 100.0 123.0 049.0 [OL CHRG] NA 60.00
> 20080219 023522 100.0 122.8 049.0 [OL CHRG] NA 60.00
> 20080219 023552 098.0 123.0 049.0 [OL DISCHRG] NA 60.00
> 20080219 023622 097.0 123.0 049.0 [OL DISCHRG] NA 60.00
>
> ...
>
> 20080219 063429 019.0 122.6 049.0 [OL DISCHRG] NA 60.00
> 20080219 063459 019.0 122.8 049.0 [OL DISCHRG] NA 60.00
> 20080219 183013 100.0 121.1 050.0 [OL CHRG] NA 60.00
> 20080219 183043 100.0 120.9 049.0 [OL CHRG] NA 60.00
>
> the good news is the UPS held up for four hours.
Nothing wrong here. It was probably running a battery test, since the
input power seems to have been present at all times (it is reporting OL).
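
To check this against a whole log file, the lines can be split into
fields. The following is only a sketch: it assumes the log was written
with upslog's default format (date, time, battery charge, input voltage,
load, status, temperature, input frequency), and the function names are
made up for illustration.

```python
import re

# Assumed column order (upslog default format): date, time, battery charge (%),
# input voltage (V), load (%), [status flags], temperature, input frequency (Hz).
LINE_RE = re.compile(
    r"^(\d{8}) (\d{6}) (\S+) (\S+) (\S+) \[([^\]]*)\] (\S+) (\S+)$"
)

def parse_upslog_line(line):
    """Split one upslog line into a dict; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    date, time, charge, volts, load, status, temp, freq = m.groups()
    return {
        "date": date,
        "time": time,
        "charge": float(charge),
        "input_voltage": float(volts),
        "load": float(load),
        "status": status.split(),
        "temperature": temp,  # kept as text, the log above shows "NA"
        "frequency": float(freq),
    }

def on_line_while_discharging(rec):
    """True when input power is present (OL) but the battery is draining."""
    return "OL" in rec["status"] and "DISCHRG" in rec["status"]

rec = parse_upslog_line("20080219 023552 098.0 123.0 049.0 [OL DISCHRG] NA 60.00")
print(rec["status"], rec["input_voltage"], on_line_while_discharging(rec))
```

A record that is on line, with a normal input voltage, and discharging is
what a battery test would be expected to look like in this log.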
> the bad news is that nut completely ignored that the UPS was
> discharging, and none of my connected machines shut down properly.
It didn't ignore it; the UPS just wasn't critical at any time during this
test. By default, NUT will only start a shutdown sequence when the UPS
becomes critical (i.e., no input power available and batteries low). From
the above, I conclude that at no time was the input power lost, nor were
the batteries low. To make NUT behave otherwise, you will have to
configure a couple of things.
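
As a rough illustration of the default rule described above (this is a
simplified sketch, not the actual upsmon code; the function name is
hypothetical, and treating the FSD flag as critical is an assumption based
on how forced shutdowns are normally handled):

```python
def is_critical(status):
    """Simplified rule: critical means on battery (OB) and low battery (LB).

    A forced shutdown flag (FSD) is also treated as critical here; that
    part is an assumption of this sketch.
    """
    flags = set(status.split())
    if "FSD" in flags:
        return True
    return "OB" in flags and "LB" in flags

# The first three statuses appear in the logs in this thread.
# "OB DISCHRG LB" is a hypothetical status added for contrast.
for s in ["OL CHRG", "OL DISCHRG", "OB DISCHRG", "OB DISCHRG LB"]:
    print(f"{s!r}: critical={is_critical(s)}")
```

Under this rule, [OL DISCHRG] never triggers a shutdown, no matter how
far the charge drops.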
> I
> know that the UPS driver was running, because if it was not, upsd would
> have complained loudly and frequently. I know upsd was running (on the
> connected server as well as all the clients), because if it was not,
> upsmon would have complained loudly and frequently.
I don't see any problems here.
> there were no indications in my logs that nut was paying any attention
> to the situation.
Yes, it was. It was reporting [OL DISCHRG] all the time. There was just
nothing NUT needed to do.
> why did my UPS decide to discharge its battery when there appears to be
> no power outage? (just looking for suggestions and guesses here from
> other MGE users...)
Most probably, because of an (automated) battery test. Or someone
initiated one, but given your surprised reaction, I guess it wasn't you.
> why did nut not see that my UPS was discharging and the battery
> percentage running down and take appropriate action? "OL DISCHRG"
> status in the logs seems nonsensical.
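This makes perfect sense: OL means the input power is present, DISCHRG
means the battery is discharging, and both are true during a battery
test. This is all documented (in the FAQ, for instance).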
> I have only seen this condition a
> few times before in my logs:
>
> 20061002 002628 094.0 079.6 045.0 [OB DISCHRG] NA 60.00
> 20061002 002658 093.0 096.8 045.0 [OL DISCHRG BOOST] NA 60.00
> 20061002 002728 093.0 088.3 045.0 [OL DISCHRG] NA 60.00
> 20061002 002758 093.0 000.0 045.0 [OB DISCHRG] NA 00.00
>
> Oct 2 00:22:59 aragorn upsmon[234]: UPS mge3000 at localhost on battery
> Oct 2 00:26:45 aragorn upsmon[234]: UPS mge3000 at localhost on line power
> Oct 2 00:27:35 aragorn upsmon[234]: UPS mge3000 at localhost on battery
>
> nut saw this one, and shut everything down, as expected.
Sure, here you see that the input power is actually lost (the line state
flipping back and forth between OL and OB).
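
The difference between the two incidents can be made visible by counting
the OL/OB changes in the status column. This is only a sketch with
hypothetical helper names; the status strings are copied from the logs
quoted in this thread.

```python
def line_state(status):
    """Return "OB" or "OL" from a status string, or None if neither is set."""
    flags = status.split()
    if "OB" in flags:
        return "OB"
    if "OL" in flags:
        return "OL"
    return None

def count_transitions(statuses):
    """Count how often the line state changes between consecutive samples."""
    states = [line_state(s) for s in statuses]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)

# Status column of the 2006-10-02 excerpt: input power really was lost.
outage = ["OB DISCHRG", "OL DISCHRG BOOST", "OL DISCHRG", "OB DISCHRG"]
# Status column of the first four 2008-02-19 lines: on line throughout.
test_run = ["OL CHRG", "OL CHRG", "OL DISCHRG", "OL DISCHRG"]

print(count_transitions(outage), count_transitions(test_run))
```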
> 20061214 224604 100.0 122.4 054.0 [OL CHRG] NA 60.00
> 20061214 224634 099.0 123.7 056.0 [OL DISCHRG] NA 60.00
> 20061214 224704 097.0 000.0 054.0 [OB DISCHRG] NA 00.00
>
> Dec 14 22:46:45 aragorn upsmon[234]: UPS mge3000 at localhost on battery
> Dec 14 22:58:41 aragorn upsmon[234]: UPS mge3000 at localhost battery is low
> Dec 14 22:58:41 aragorn upsd[226]: Client master at 127.0.0.1 set FSD on UPS [mge3000]
> Dec 14 22:58:57 aragorn upsmon[234]: Host sync timer expired, forcing shutdown
> Dec 14 22:58:57 aragorn upsmon[234]: Executing automatic power-fail shutdown
> Dec 14 22:58:57 aragorn upsmon[234]: Auto logout and shutdown proceeding
> Dec 14 22:59:02 aragorn upsd[226]: Host 127.0.0.1 disconnected (read failure)
> Dec 14 22:59:08 aragorn upsd[226]: Signal 15: exiting
And here the batteries were probably already low, so NUT decided the UPS
was critical and that it was time to shut down.
[...]
> ups.test.interval: 10080
There you go. This seems to be an awfully short interval (three hours).
Again, read up on the FAQ and come back if anything is not clear to
you.
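
For reference, the arithmetic behind the "three hours" figure, as a small
sketch. It assumes that ups.test.interval is expressed in seconds, which
is what the estimate above implies; the variable names are illustrative
only.

```python
# Value reported by the UPS in the quoted variable dump.
test_interval = 10080

# Assumption: the unit is seconds.
SECONDS_PER_HOUR = 3600

hours = test_interval / SECONDS_PER_HOUR
print(f"{test_interval} s is about {hours:.1f} hours")
```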
Best regards, Arjen
--
Eindhoven - The Netherlands
Key fingerprint - 66 4E 03 2C 9D B5 CB 9B 7A FE 7E C1 EE 88 BC 57