[Nut-upsdev] Bug#441342: Nut can kill power to UPSs that never went on battery

Arjen de Korte nut+devel at de-korte.org
Sat Sep 15 06:56:27 UTC 2007



>> You may not be aware of it, but this is where the root of the problem
>> lies. Normally, when a UPS is told to shut down, two things can happen:
>>
>> 1) The input power is gone and the UPS powers off and switches back on
>> when the power returns.
>>
>> 2) Input power is still available and the UPS cycles the power so that
>> the systems that receive power from it can restart.
>
> In this case, strictly speaking, neither of those two conditions were a
> cause.

But it *is* the reason why NUT isn't working as expected. You would never
have noticed anything abnormal if the UPS'es had restarted automatically,
as they are supposed to after being sent a forced shutdown. In this
respect, the apc-hid subdriver is *very* broken. With a real power outage,
you would see the same problem of systems not coming back online, so this
is not limited to setups where NUT monitors many different UPS'es.

>> Apparently, #2 is not happening for you and from looking at the driver,
>> I can understand why. The shutdown sequence in this driver is not doing
>> what it is supposed to do. This needs fixing, but since I don't have an
>> APC UPS, I can't do that for you. We'll have to track down the developer
>> that wrote this driver, to correct this.
> I have pulled the UPS in question out of production and I should have time
> next week to attach a spare PC to it and get nut talking to it. For
> starters, I'll want to look into why the Smart-UPS's aren't starting back
> up a few seconds after a forced shutdown.

See the above. You will need to look into the shutdown function in
'apc-hid.c' to correct this. If you can pull the development version of
usbhid-ups from the SVN trunk, I can help with that. The stable version
doesn't provide enough debugging information to do that remotely.

>> There is definitely something broken here too. Under no circumstance
>> should a driver indicate both 'on battery' and 'low battery' if the power
>> is not actually out.
> I don't know for certain if this is applicable to the APC Smart-UPS 1500,
> but I think there is a hardware limitation with many UPSs in that they are
> not able to communicate when an on-battery condition is due to a self-test
> or mains failure.

That isn't a problem, since it wouldn't report anything. :-)

Generally speaking though, most of the time this is caused either by
mapping the UPS variables to the wrong NUT values, or by the UPS indicating
that the AC failed while a battery test is running. We can correct the
first and work around the second if needed, so this is the second thing
that is currently broken in the apc-hid subdriver.
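
As a rough illustration of the workaround mentioned above (this is not the
actual apc-hid code, and every name in it is hypothetical), a driver could
refuse to report 'on battery' while it knows a battery test is running. The
sketch assumes the UPS exposes a reliable "test running" flag. Note that it
would also mask a real mains failure that happens during the test.

```python
def nut_status(ac_present, low_battery, test_running):
    """Build a NUT-style status string from raw UPS flags.

    All parameter names are hypothetical; a real driver would read
    these values from the device's HID reports.
    """
    # Some units report "AC failed" during a battery test. Treat the
    # mains as present while a test is known to be running.
    on_line = ac_present or test_running
    tokens = ["OL" if on_line else "OB"]
    # Low battery is reported independently of the power source.
    if low_battery:
        tokens.append("LB")
    return " ".join(tokens)
```

With this mapping, 'OB LB' can only appear when the AC is reported as
failed and no test is running.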

[...]

> While I see where you are coming from, I maintain that this isn't proper
> behavior. If the power never went out to the UPS in question, then it's
> safer to assume that power will not go out to that UPS (and risk an
> unclean shutdown of those hosts) than to assume that that UPS must be shut
> down (and guarantee the unavailability of those hosts).

The latter should not happen, or at least it should not persist. What I'm
worried about is that what you suggest opens a race condition that doesn't
exist in the present setup. We support configurations where many different
UPS'es work in parallel (various models and even from different vendors).
Not all of them give near-instant notification when the power goes out.
Some devices/drivers will take tens of seconds to notice that. Had the NUT
server assumed they were still on line power when it went down, that would
lead to a nasty surprise later on.

[...]

> If a nut server simply isn't running, then client hosts won't refuse to
> mount their disks read-write on startup. I don't see any reason to be so
> paranoid in one situation but not another.

It would be possible to do this though, and if memory serves, there are
actually people doing this. We even suggest something similar in the
documentation, to delay startup until the batteries have recharged
sufficiently to allow the systems to shut down cleanly in the event power
fails again.
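
A minimal sketch of such a startup delay, assuming the 'upsc' client is
installed and the UPS reports 'battery.charge'. The UPS name
'myups@localhost', the threshold and the polling interval are placeholders,
and this is not the script from the documentation.

```python
import subprocess
import time


def read_charge(ups="myups@localhost"):
    """Ask upsc for battery.charge; the UPS name is a placeholder."""
    result = subprocess.run(
        ["upsc", ups, "battery.charge"],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def wait_for_charge(minimum, read=read_charge, interval=30.0, sleep=time.sleep):
    """Block until the reported charge reaches `minimum` percent.

    Returns the number of readings taken. `read` and `sleep` can be
    replaced, which makes the loop easy to try without a UPS.
    """
    polls = 1
    while read() < minimum:
        sleep(interval)
        polls += 1
    return polls
```

A boot script could call wait_for_charge(80) before starting the services
that must not be interrupted by an unclean shutdown.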

>> In a setup with a single NUT server and multiple UPS'es, it is
>> impossible to deal with situations where some of the monitored UPS'es
>> receive power from the mains and the one powering the NUT server does
>> not.
> I agree, it is impossible to do perfectly. There are pitfalls to both
> approaches. I believe the current approach violates the principle of least
> surprise.

The surprise you see is that the systems didn't restart. That is the #1
problem in the apc-hid subdriver. What we should also warn about more
clearly is that the NUT client-server architecture isn't robust in setups
where some UPS'es may be receiving power and some are not. In case of a
three phase mains, this might mean that you need three NUT servers (one
for each phase), if you use single phase UPS'es.
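
To make the one-server-per-phase idea concrete, here is a small planning
sketch. It only groups UPS'es by the phase that feeds them. The UPS names
and phase labels are made up, and nothing in it talks to NUT itself.

```python
from collections import defaultdict


def servers_by_phase(ups_phase):
    """Group UPS names by mains phase.

    `ups_phase` maps a UPS name to a phase label (both hypothetical).
    Each resulting group would get its own NUT server, powered from a
    UPS on that same phase.
    """
    groups = defaultdict(list)
    for ups, phase in sorted(ups_phase.items()):
        groups[phase].append(ups)
    return dict(groups)
```

For example, servers_by_phase({"ups-a": "L1", "ups-b": "L2", "ups-c": "L1"})
puts ups-a and ups-c under one server and ups-b under another.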

Best regards, Arjen



