[Nut-upsuser] upsmon+snmp-ups does not shut down system
William Seligman
seligman at nevis.columbia.edu
Mon Jan 9 18:23:37 UTC 2012
On 1/9/12 9:53 AM, Arnaud Quette wrote:
> 2012/1/6 William Seligman <seligman at nevis.columbia.edu>
>
>> I've googled and RTFM'ed, but still can't solve this one. I hope you folks
>> can.
>>
>> This affects my entire computer cluster, but let's start simple: I've got
>> a computer running NUT; OS is Scientific Linux 5.5; kernel
>> 2.6.18-274.12.1.el5xen. It connects to an APC SMART-UPS via an APC
>> SmartCard using the snmp-ups driver. It generally works: upsmon will detect
>> if the battery is low (I get an e-mail message); I can control the UPS,
>> inspect its variables, set variables, issue commands, and so on.
>
> If "On battery" and "Low battery" are both detected, there should be no
> issue.
>
>> There's just one thing that does not happen: when the UPS goes critical,
>> the computer does not shut down. The upsmon daemon does not display any
>> messages, does not write to the syslog, does not send e-mail, etc.; even
>> though I've configured it to do so in upsmon.conf.
>>
>> I've tried nut-2.2.2, nut-2.4.3, and nut-2.6.2, and the symptom is the
>> same.
>
> Using the latest version, when possible, is always a good idea.
Installing nut-2.6.2 on a Scientific Linux 5.5 system was a bit difficult, and
played havoc with my regular yum updates. After I've finished debugging this
problem, I'm going to completely reinstall the OS to make sure I've got a
consistent set of RPMs.
>> I tried issuing a "graceful reboot" command via the APC SmartCard's web and
>> telnet interface. It made no difference; the system still did not shut
>> down.
>>
>> Now let's extend the problem to my cluster: I have a variety of different
>> computers, all running Scientific Linux 5.5, connecting through different
>> switches, connecting to different flavors of APC SMART-UPSes, via
>> SmartCards, each ranging in age from six months to five years. They all
>> exhibit this same symptom, as I painfully discovered during a recent power
>> outage: they all sent me e-mail when the UPSes went to low battery, but
>> none turned off when the UPS went critical. Given the range of hardware
>> involved, this must be a common software problem.
>>
>> The systems will shut down properly if I do "upsmon -c fsd", so it doesn't
>> appear to be a permissions problem.
>>
>> I don't think this is the upsdrv_shutdown() issue described in the snmp-ups
>> man page; I do not care if the UPS shuts down when the computer does, nor
>> do I want it to. I just want upsmon to shut down the system when the UPS
>> goes critical.
>>
>> Here are my config files; the system is tanya, its UPS is tanya-ups. Any
>> advice?
>>
>> ups.conf:
>>
>> [tanya-ups]
>> driver = snmp-ups
>> port = tanya-ups
>> community = private
>> mibs = apcc
>>
>> upsd.conf:
>>
>> # LISTEN 0.0.0.0 3493
>>
>> upsd.users:
>>
>> [admin]
>> password = nowayjose
>> actions = SET
>> instcmds = all
>> upsmon master
>>
>
> It's also a good idea to separate monitoring and administrative users.
> I.e.:
> [admin]
> password = XXX
> actions = SET
> instcmds = all
>
> [monuser]
> password = XXX
> upsmon master
>
>> upsmon.conf:
>>
>> MONITOR tanya-ups@localhost 1 admin nowayjose master
>> MINSUPPLIES 1
>> SHUTDOWNCMD "/sbin/shutdown -h +0"
>> NOTIFYCMD /home/bin/notify.sh # sends me e-mail
>> POLLFREQ 5
>> POLLFREQALERT 5
>> HOSTSYNC 15
>> DEADTIME 15
>> POWERDOWNFLAG /etc/killpower
>> NOTIFYFLAG ONLINE SYSLOG
>> NOTIFYFLAG ONBATT SYSLOG+WALL
>> NOTIFYFLAG LOWBATT SYSLOG+WALL
>> NOTIFYFLAG FSD SYSLOG+WALL+EXEC
>> NOTIFYFLAG COMMOK SYSLOG
>> NOTIFYFLAG COMMBAD SYSLOG
>> NOTIFYFLAG SHUTDOWN SYSLOG+WALL+EXEC
>> NOTIFYFLAG REPLBATT SYSLOG+WALL+EXEC
>> NOTIFYFLAG NOCOMM SYSLOG
>> NOTIFYFLAG NOPARENT SYSLOG+WALL
>> RBWARNTIME 43200
>> NOCOMMWARNTIME 300
>> FINALDELAY 5
>
> Your config seems fine.
> An interesting test would be to stop upsmon, but keep snmp-ups and
> upsd running, then discharge your UPS and check that you do indeed get
> ups.status == "OB LB", which triggers the call to upsmon.conf->SHUTDOWNCMD.
> Note that you need both "OB" and "LB", since you may have "low battery" and
> be "online" at the same time!
This is a good idea, and I ran the test. I disconnected the UPS, and
periodically checked the output of:
upsc tanya-ups@localhost ups.status
Eventually this command returned "OB LB", as you said. But upsmon did nothing. I
waited, and eventually the UPS cut power to the system, causing a hard crash.
So the UPS is sending the correct signals, and snmp-ups is reporting the correct
status. Is there anything else I can check to trace the cause of the problem?
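For anyone repeating this test, the periodic check can be scripted rather than
typed by hand. This is only a minimal sketch, assuming a POSIX shell and that
upsc is on the PATH; the function names is_critical and watch_ups are made up
for illustration and are not part of NUT:

```shell
#!/bin/sh

# Succeed (return 0) only when the status string contains both the
# OB (on battery) and LB (low battery) flags as separate words.
is_critical() {
    have_ob=0
    have_lb=0
    for flag in $1; do
        case "$flag" in
            OB) have_ob=1 ;;
            LB) have_lb=1 ;;
        esac
    done
    [ "$have_ob" -eq 1 ] && [ "$have_lb" -eq 1 ]
}

# Poll ups.status every few seconds and print a timestamped line for
# each reading; return once OB and LB are reported together.
# $1 = UPS name as upsc expects it, $2 = poll interval in seconds (default 5).
watch_ups() {
    ups="$1"
    interval="${2:-5}"
    while :; do
        status=$(upsc "$ups" ups.status 2>/dev/null)
        echo "$(date '+%H:%M:%S') ups.status=$status"
        if is_critical "$status"; then
            echo "critical: OB and LB are both set"
            return 0
        fi
        sleep "$interval"
    done
}
```

Calling "watch_ups tanya-ups@localhost 5" after sourcing the file gives a
timestamped record of when the status changed, which can then be compared
against the syslog to see whether upsmon reacted at that moment.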
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/