[Nut-upsuser] upsmon+snmp-ups does not shut down system
Arnaud Quette
aquette.dev at gmail.com
Wed Jan 11 11:43:09 UTC 2012
2012/1/9 William Seligman <seligman at nevis.columbia.edu>
> On 1/9/12 9:53 AM, Arnaud Quette wrote:
>
> > 2012/1/6 William Seligman <seligman at nevis.columbia.edu>
> >
> >> I've googled and RTFM'ed, but still can't solve this one. I hope you
> >> folks can.
> >>
> >> This affects my entire computer cluster, but let's start simple: I've
> >> got a computer running NUT; OS is Scientific Linux 5.5; kernel
> >> 2.6.18-274.12.1.el5xen. It connects to an APC SMART-UPS via an APC
> >> SmartCard using the snmp-ups driver. It generally works: upsmon will
> >> detect if the battery is low (I get an e-mail message); I can control
> >> the UPS, inspect its variables, set variables, issue commands, and so on.
> >
> > If "On battery" and "Low battery" are both detected, there should be no
> > issue.
> >
> >> There's just one thing that does not happen: when the UPS goes critical,
> >> the computer does not shut down. The upsmon daemon does not display any
> >> messages, does not write to the syslog, does not send e-mail, etc.; even
> >> though I've configured it to do so in upsmon.conf.
> >>
> >> I've tried nut-2.2.2, nut-2.4.3, and nut-2.6.2, and the symptom is the
> >> same.
> >
> > Using the latest version, when possible, is always a good idea.
>
> Installing nut-2.6.2 on a Scientific Linux 5.5 system was a bit difficult,
> and played havoc with my regular yum updates. After I've finished debugging
> this problem, I'm going to completely reinstall the OS to make sure I've
> got a consistent set of RPMs.
>
You might have preferred to rebuild an SRPM, such as this one:
http://zid-luxinst.uibk.ac.at/linux/rpm2html/fedora/14/i386/updates/nut-2.6.2-1.fc14.i686.html
> >> I tried issuing a "graceful reboot" command via the APC SmartCard's web
> >> and telnet interface. It made no difference; the system still did not
> >> shut down.
> >>
> >> Now let's extend the problem to my cluster: I have a variety of
> >> different computers, all running Scientific Linux 5.5, connecting
> >> through different switches, connecting to different flavors of APC
> >> SMART-UPSes, via SmartCards, each ranging in age from six months to five
> >> years. They all exhibit this same symptom, as I painfully discovered
> >> during a recent power outage: they all sent me e-mail when the UPSes
> >> went to low battery, but none turned off when the UPS went critical.
> >> Given the range of hardware involved, this must be a common software
> >> problem.
> >>
> >> The systems will shut down properly if I do "upsmon -c fsd", so it
> >> doesn't appear to be a permissions problem.
> >>
> >> I don't think this is the upsdrv_shutdown() issue described in the
> >> snmp-ups man page; I do not care if the UPS shuts down when the computer
> >> does, nor do I want it to. I just want upsmon to shut down the system
> >> when the UPS goes critical.
> >>
> >> Here are my config files; the system is tanya, its UPS is tanya-ups. Any
> >> advice?
> >>
> >> ups.conf:
> >>
> >> [tanya-ups]
> >> driver = snmp-ups
> >> port = tanya-ups
> >> community = private
> >> mibs = apcc
> >>
> >> upsd.conf:
> >>
> >> # LISTEN 0.0.0.0 3493
> >>
> >> upsd.users:
> >>
> >> [admin]
> >> password = nowayjose
> >> actions = SET
> >> instcmds = all
> >> upsmon master
> >>
> >
> > It's also a good idea to separate monitoring and administrative users,
> > i.e.:
> > [admin]
> > password = XXX
> > actions = SET
> > instcmds = all
> >
> > [monuser]
> > password = XXX
> > upsmon master
> >
> >> upsmon.conf:
> >>
> >> MONITOR tanya-ups@localhost 1 admin nowayjose master
> >> MINSUPPLIES 1
> >> SHUTDOWNCMD "/sbin/shutdown -h +0"
> >> NOTIFYCMD /home/bin/notify.sh # sends me e-mail
> >> POLLFREQ 5
> >> POLLFREQALERT 5
> >> HOSTSYNC 15
> >> DEADTIME 15
> >> POWERDOWNFLAG /etc/killpower
> >> NOTIFYFLAG ONLINE SYSLOG
> >> NOTIFYFLAG ONBATT SYSLOG+WALL
> >> NOTIFYFLAG LOWBATT SYSLOG+WALL
> >> NOTIFYFLAG FSD SYSLOG+WALL+EXEC
> >> NOTIFYFLAG COMMOK SYSLOG
> >> NOTIFYFLAG COMMBAD SYSLOG
> >> NOTIFYFLAG SHUTDOWN SYSLOG+WALL+EXEC
> >> NOTIFYFLAG REPLBATT SYSLOG+WALL+EXEC
> >> NOTIFYFLAG NOCOMM SYSLOG
> >> NOTIFYFLAG NOPARENT SYSLOG+WALL
> >> RBWARNTIME 43200
> >> NOCOMMWARNTIME 300
> >> FINALDELAY 5
> >
> > Your config seems fine.
> > An interesting test would be to stop upsmon, but keep snmp-ups and upsd
> > running, then discharge your UPS and check that you do get
> > ups.status == "OB LB", which triggers the call to upsmon.conf->SHUTDOWNCMD.
> > Note that you need both "OB" and "LB", since you may have "low battery"
> > and be "online" at the same time!
>
> This is a good idea, and I ran the test. I disconnected the UPS, and
> periodically checked the output of:
>
> upsc tanya-ups@localhost ups.status
>
> Eventually this command returned "OB LB" as you said. But upsmon did
> nothing. I waited and eventually the UPS shut power to the system in a
> hard crash.
>
Ouch, mea culpa!
I was too brief in my answer and forgot to tell you the obvious: disconnect
your computer from the UPS first, to avoid such a crash.
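To save checking by hand next time, the polling you did with upsc could be
scripted roughly as below. This is only a sketch: the function names
is_critical and watch_status and the 5-second interval are my own
assumptions, and the UPS name is taken from your config. It reports the
moment both OB and LB appear in ups.status, which is the condition upsmon is
expected to act on.

```shell
#!/bin/sh
# Sketch only: UPS name comes from this thread, everything else is assumed.
UPS="tanya-ups@localhost"

# Succeeds only when the status string carries both the OB and LB flags.
is_critical() {
    case " $1 " in
        *" OB "*) ;;            # on battery: keep checking
        *) return 1 ;;
    esac
    case " $1 " in
        *" LB "*) return 0 ;;   # low battery as well: critical
        *) return 1 ;;
    esac
}

# Poll upsd through upsc and print each status with a timestamp.
watch_status() {
    while :; do
        status=$(upsc "$UPS" ups.status 2>/dev/null)
        echo "$(date '+%H:%M:%S') ups.status=$status"
        if is_critical "$status"; then
            echo "OB and LB both set: upsmon should now run SHUTDOWNCMD"
        fi
        sleep 5
    done
}
```

Source the file and call watch_status in one terminal while the UPS
discharges; stop it with Ctrl-C.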
> So the UPS is sending the correct signals, and snmp-ups is reporting the
> correct status. Is there anything else I can check to trace the cause of
> the problem?
>
Indeed, though there is still an issue, as you reported initially.
Could you run this test again, but this time:
- disconnect your server from the UPS,
- start upsmon in debug mode. If it's already running, just call
  "upsmon -c stop ; upsmon -DDDDD"
and send us the output, at least from the point where it should see the
"OB LB" condition, so we can see what's going on.
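To make the output easy to send, something like the following could keep a
copy of the debug trace in a file. This is a sketch under assumptions: the
log path and the function name capture_upsmon_debug are mine, and it relies
on upsmon staying in the foreground and writing its debug messages to stderr
when started with -D.

```shell
#!/bin/sh
# Sketch only: the log path is an assumption, pick any writable location.
LOG=/tmp/upsmon-debug.log

capture_upsmon_debug() {
    upsmon -c stop                    # stop the running daemon, if any
    sleep 2                           # give it a moment to exit
    upsmon -DDDDD 2>&1 | tee "$LOG"   # show the trace and keep a copy
}
```

Run capture_upsmon_debug as root, let the UPS discharge until "OB LB" shows
up, then attach the log file to your reply.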
cheers,
Arnaud
--
Linux / Unix Expert R&D - Eaton - http://powerquality.eaton.com
Network UPS Tools (NUT) Project Leader - http://www.networkupstools.org/
Debian Developer - http://www.debian.org
Free Software Developer - http://arnaud.quette.free.fr/