[Nut-upsuser] upsmon+snmp-ups does not shut down system

William Seligman seligman at nevis.columbia.edu
Mon Jan 9 18:23:37 UTC 2012


On 1/9/12 9:53 AM, Arnaud Quette wrote:

> 2012/1/6 William Seligman <seligman at nevis.columbia.edu>
> 
>> I've googled and RTFM'ed, but still can't solve this one. I hope you folks
>> can.
>>
>> This affects my entire computer cluster, but let's start simple: I've got
>> a computer running NUT; OS is Scientific Linux 5.5; kernel 
>> 2.6.18-274.12.1.el5xen. It connects to an APC SMART-UPS via an APC
>> SmartCard using the snmp-ups driver. It generally works: upsmon will detect
>> if the battery is low (I get an e-mail message); I can control the UPS,
>> inspect its variables, set variables, issue commands, and so on.
> 
> If "On battery" and "Low battery" are both detected, there should be no
> issue.
> 
>> There's just one thing that does not happen: when the UPS goes critical, 
>> the computer does not shut down. The upsmon daemon does not display any 
>> messages, does not write to the syslog, does not send e-mail, etc.; even
>> though I've configured it to do so in upsmon.conf.
>>
>> I've tried nut-2.2.2, nut-2.4.3, and nut-2.6.2, and the symptom is the
>> same.
> 
> Using the latest version, when possible, is always a good idea.

Installing nut-2.6.2 on a Scientific Linux 5.5 system was a bit difficult, and
played havoc with my regular yum updates. After I've finished debugging this
problem, I'm going to completely reinstall the OS to make sure I've got a
consistent set of RPMs.

>> I tried issuing a "graceful reboot" command via the APC SmartCard's web and
>> telnet interface. It made no difference; the system still did not shut
>> down.
>>
>> Now let's extend the problem to my cluster: I have a variety of different 
>> computers, all running Scientific Linux 5.5, connecting through different 
>> switches, connecting to different flavors of APC SMART-UPSes, via 
>> SmartCards, each ranging in age from six months to five years. They all
>> exhibit this same symptom, as I painfully discovered during a recent power
>> outage: they all sent me e-mail when the UPSes went to low battery, but
>> none turned off when the UPS went critical. Given the range of hardware
>> involved, this must be a common software problem.
>> 
>> The systems will shut down properly if I do "upsmon -c fsd", so it doesn't 
>> appear to be a permissions problem.
>> 
>> I don't think this is the upsdrv_shutdown() issue described in the snmp-ups
>> man page; I do not care if the UPS shuts down when the computer does, nor
>> do I want it to. I just want upsmon to shut down the system when the UPS
>> goes critical.
>>
>> Here are my config files; the system is tanya, its UPS is tanya-ups. Any
>> advice?
>>
>> ups.conf:
>>
>> [tanya-ups]
>>        driver = snmp-ups
>>        port = tanya-ups
>>        community = private
>>        mibs = apcc
>>
>> upsd.conf:
>>
>> # LISTEN 0.0.0.0 3493
>>
>> upsd.users:
>>
>> [admin]
>>        password = nowayjose
>>        actions = SET
>>        instcmds = all
>>        upsmon master
>>
> 
> It's also a good idea to separate monitoring and administrative users,
> i.e.:
> [admin]
>        password = XXX
>        actions = SET
>        instcmds = all
> 
> [monuser]
>        password = XXX
>        upsmon master
> 
>> upsmon.conf:
>>
>> MONITOR tanya-ups@localhost 1 admin nowayjose master
>> MINSUPPLIES 1
>> SHUTDOWNCMD "/sbin/shutdown -h +0"
>> NOTIFYCMD /home/bin/notify.sh # sends me e-mail
>> POLLFREQ 5
>> POLLFREQALERT 5
>> HOSTSYNC 15
>> DEADTIME 15
>> POWERDOWNFLAG /etc/killpower
>> NOTIFYFLAG ONLINE       SYSLOG
>> NOTIFYFLAG ONBATT       SYSLOG+WALL
>> NOTIFYFLAG LOWBATT      SYSLOG+WALL
>> NOTIFYFLAG FSD          SYSLOG+WALL+EXEC
>> NOTIFYFLAG COMMOK       SYSLOG
>> NOTIFYFLAG COMMBAD      SYSLOG
>> NOTIFYFLAG SHUTDOWN     SYSLOG+WALL+EXEC
>> NOTIFYFLAG REPLBATT     SYSLOG+WALL+EXEC
>> NOTIFYFLAG NOCOMM       SYSLOG
>> NOTIFYFLAG NOPARENT     SYSLOG+WALL
>> RBWARNTIME 43200
>> NOCOMMWARNTIME 300
>> FINALDELAY 5
> 
> Your config seems fine.
> An interesting test would be to stop upsmon, but keep snmp-ups and upsd
> running, then discharge your UPS to confirm that you indeed get
> ups.status == "OB LB", which triggers the call to upsmon.conf->SHUTDOWNCMD.
> Note that you need both "OB" and "LB", since you may have "low battery" and
> be "online" at the same time!

This is a good idea, and I ran the test. I disconnected the UPS, and
periodically checked the output of:

upsc tanya-ups@localhost ups.status

Eventually this command returned "OB LB" as you said. But upsmon did nothing. I
waited and eventually the UPS shut power to the system in a hard crash.

So the UPS is sending the correct signals, and snmp-ups is reporting the correct
status. Is there anything else I can check to trace the cause of the problem?
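
For anyone who wants to repeat this test with timestamps, so the moment "OB LB"
appears can be lined up against the syslog, here is a rough sketch of the
polling I did by hand. It is not part of NUT. The function names (is_critical,
read_status, watch) are made up for illustration, it assumes Python 3.7 or
later, and it assumes upsc is on the PATH. The 5-second interval just mirrors
the POLLFREQ in my upsmon.conf.

```python
import subprocess
import time


def is_critical(status: str) -> bool:
    """Return True only when ups.status carries both the OB and LB flags."""
    flags = status.split()
    return "OB" in flags and "LB" in flags


def read_status(ups: str = "tanya-ups@localhost") -> str:
    """Ask upsd for ups.status through the upsc client (assumed on PATH)."""
    result = subprocess.run(
        ["upsc", ups, "ups.status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def watch(ups: str = "tanya-ups@localhost", interval: float = 5.0) -> None:
    """Print a timestamped status line every `interval` seconds until OB LB."""
    while True:
        status = read_status(ups)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), status, flush=True)
        if is_critical(status):
            # Hypothetical marker line; only this script prints it, not NUT.
            print("OB and LB both set: upsmon would be expected to act now")
            break
        time.sleep(interval)
```

Calling watch() while the UPS discharges gives a log that can be compared with
whatever upsmon does (or, in my case, does not) write at the same moment.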

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
