[Nut-upsuser] NUT with Cyber Power 700 AVR
Rob Donovan
hikerman2005-nut at yahoo.com
Mon Aug 30 18:42:17 UTC 2010
>> 1) syslog errors every 20+ minutes or so like : Aug 7 10:21:03 ben
>> usbhid-ups[3321]: libusb_get_string: error sending control message:
>> Broken pipe
>
> Not a cause of concern. It is a way of telling that the UPS is
> currently not able to handle a command. Most likely this is due to the
> UPS doing some internal housekeeping functions and the little
> microcontroller inside is not able to handle a command. We will
> probably suppress this in future NUT versions, as it is a common cause
> of false alarms.
>
>> 2) syslog errors on a similar timescale like : Aug 7 08:17:40 ben
>> kernel: [40170.402789] usb 2-1.2: usbfs: USBDEVFS_CONTROL failed cmd
>> usbhid-ups rqt 161 rq 1 len 8 ret -110
>
> Same here. The kernel is informing you that the UPS didn't respond to
> a command (110 = ETIMEDOUT). The cause is most likely the same as the
> above and not a cause of concern either. Unlike the above message,
> there is nothing we can do about this as it is logged by the kernel.
Good to know. Thanks for the reply.
>
>> 3) The machine spontaneously shutdown this morning due to a "low
>> battery" condition. However, 80 minutes later I noticed the UPS
>> battery was at 100%. I don't think it can charge that fast, so I
>> think this must have been a communication error.
>
> I'm not so sure about that. Don't overestimate the accuracy of the
> battery charge gauge on the UPS. It could be that it is just voltage
> based, which means that it will indicate full charge long before the
> battery is actually full. It could also mean that the battery is bad.
> This may cause nearly instant shutdowns when the mains fails (when the
> battery is under load) while it looks like the battery is (almost)
> full with the mains present (and the battery is not under load).
> Running a battery test usually reveals what is going on.
>
> Best regards, Arjen
Fair points, but I think the battery is good. I've run several
on-battery shutdowns lasting 90s+ by flipping the breaker (using
upssched to initiate shutdown after 60s) and that works fine. I ran a
longer test for 2 or 3 minutes once and watched the UPS's displayed
estimated run time count down from 76 minutes as you'd expect it to.
The UPS is brand new. Also, I suspect there was no power cut - in the
past I've had to reset my stove clock after a power cut, and I don't
recall having to do that this time. I have my system set up to shut down
after 5 mins on battery, rather than wait for a lowbatt condition, so I
doubt the low batt could have been reached due to a power cut.... unless
perhaps it was a night of successive 4 minute power cuts or, given the
stove, 4 minute low-voltage conditions. I guess we'll never know for
sure. In any case, this was enough for me to abandon 2.4.3 and try
Cyberpower's own offering. That suffered from curious delays of its own,
which I wasn't happy with given that its eventual power-off is timer
based rather than signal based as in nut. And thence back to nut 2.2.2...
So, I went back to nut 2.2.2 under Debian Lenny with both MAXAGE and
DEADTIME set to 150s. This worked OK for 10 days, with the odd type (2)
error from above, and the odd stale data error [aside : it is my
understanding that data must now be stale for 150s for upsmon to log a
stale data warning to syslog, since upsd doesn't pass on the stale data
condition until MAXAGE is reached. So for 30 lots of 5s polls the data
is stale... then it shows up in syslog and, this is what's weird,
in almost every case it resolves itself 2s later.... just like it did
when MAXAGE was 15s.] After 10 days it went into a stale data condition
that continued all night.... until I stopped it by restarting nut in the
morning.
Since restarting nut seemed to fix the problem I decided to make
upssched restart nut on a NOCOMM condition. I'll briefly describe how I
did that here in case others are interested:
I set
    NOTIFYFLAG NOCOMM SYSLOG+WALL+EXEC
in upsmon.conf in the usual way, with
    NOTIFYCMD /sbin/upssched
set to call upssched. In upssched.conf I set
    CMDSCRIPT /sbin/upsschedcmd
and
    AT NOCOMM * EXECUTE restart
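A quick way to confirm that the directives above are really present, and not commented out, might look like the sketch below. The function name has_directive is made up for illustration, and the /etc/nut paths in the usage comment are an assumption based on Debian's packaging, so adjust them for your install.

```shell
# has_directive FILE PATTERN
# Succeeds if FILE has a line that starts (after optional whitespace)
# with the extended regex PATTERN. Commented-out lines start with '#'
# and so never match.
has_directive() {
    grep -Eq "^[[:space:]]*$2" "$1"
}

# Usage (paths are an assumption, not taken from the post):
#   has_directive /etc/nut/upsmon.conf   'NOTIFYFLAG[[:space:]]+NOCOMM[[:space:]]+.*EXEC'
#   has_directive /etc/nut/upsmon.conf   'NOTIFYCMD[[:space:]]+/sbin/upssched'
#   has_directive /etc/nut/upssched.conf 'CMDSCRIPT[[:space:]]+/sbin/upsschedcmd'
```

Each call returns 0 when the directive is found, so it can be chained with && in a small sanity-check script.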
/sbin/upsschedcmd is my command script, the relevant portion of which is:
    #!/bin/bash
    # This script is called by upssched on a UPS event.
    # This script is designed to be run by user nut.
    case "$1" in
        restart)
            /sbin/upsrestart.x
            ;;
    esac
upsrestart.x is built from the following C code, compiled using the gcc
line in the comment, and chowned/chmoded to have the
ownership/permissions given in the same comment:
    #include <stdio.h>
    #include <unistd.h>
    /*
       This program is designed to restart nut.
       The binary file permissions should be -rwsr-xr-- root:nut
       gcc -g -Wall -o upsrestart.x upsrestart.c
    */
    int main (int argc, char *argv[])
    {
        /* fixed command line: the wrapper can only ever restart nut */
        char *arg[] = { "/etc/init.d/nut", "restart", (char *) NULL };
        /* small, known environment instead of the caller's */
        char *env[] = { "USER=root", "PATH=/usr/sbin:/usr/bin:/sbin:/bin",
                        "HOME=/root", (char *) NULL };
        execve (arg[0], arg, env);
        /* if execve() returns there has been an error */
        perror ("upsrestart.c : error calling execve()");
        return 1;
    }
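The build and install steps implied by the comment in the C source can be sketched as below. The gcc line and the -rwsr-xr-- root:nut target come from that comment; the octal form 4754 is simply its numeric equivalent, and the helper name mode_ok is hypothetical. The commands that need root are left as comments so the snippet is harmless to paste.

```shell
# Steps implied by the comment in upsrestart.c (run as root):
#   gcc -g -Wall -o upsrestart.x upsrestart.c
#   chown root:nut upsrestart.x
#   chmod 4754 upsrestart.x      # 4754 is -rwsr-xr--

# mode_ok FILE: succeed only if FILE has exactly mode 4754.
# Uses GNU stat (-c), as found on Debian.
mode_ok() {
    [ "$(stat -c '%a' "$1")" = "4754" ]
}

# Usage: mode_ok /sbin/upsrestart.x && echo "permissions look right"
```

This only checks the mode bits. Ownership would still need a look with ls -l.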
What happens is that upssched runs /sbin/upsschedcmd as user nut, which
runs the setuid program upsrestart.x as nut which runs /etc/init.d/nut
restart as effective user root, restarting nut and, it appears so far,
reestablishing connection with the UPS. Since this runs on NOCOMM,
default timeout 300s, that becomes the max time your system can't talk
to your UPS. Since I have DEADTIME set to 150s, a stale UPS that was
last known to be on battery will trigger a shutdown before the NOCOMM
restart takes effect.
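The ordering of the two timers can be spelled out with a trivial sketch. The 150s and 300s figures are from the text above. The variable name NOCOMMWARNTIME for the 300s default is an assumption about which upsmon setting is meant, as is the idea that both timers start counting from the same moment. first_action is a made-up helper.

```shell
# Values from the post; NOCOMMWARNTIME as the name of the 300s
# setting is an assumption.
DEADTIME=150
NOCOMMWARNTIME=300

# first_action DEADTIME NOCOMMWARNTIME
# Prints which timer expires first for a UPS that went stale while
# on battery, assuming both timers start together.
first_action() {
    if [ "$1" -lt "$2" ]; then
        echo "shutdown"
    else
        echo "restart"
    fi
}

first_action "$DEADTIME" "$NOCOMMWARNTIME"
```

With DEADTIME raised above the NOCOMM timeout, the restart would get its chance first.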
The binary wrapper is necessary because Linux ignores setuid bits
applied to scripts. Furthermore, modern versions of bash drop setuid
privileges on startup, unless called with -p. The /etc/init.d/nut
script uses /bin/sh. The above works on Debian because, according to
the "system" man page (of all places :) : "Debian uses a modified bash
which does not do this when invoked as sh". On other flavours of Linux
you may need to tweak the first line of /etc/init.d/nut to prevent it
dropping privileges.
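To see which interpreter is actually involved on a given machine, something along these lines may help. It is only a sketch: shebang_of is a hypothetical helper, and readlink -f is the GNU coreutils form, so other systems may need a different command.

```shell
# shebang_of FILE: print the interpreter path named on FILE's first
# line (prints nothing if there is no #! line).
shebang_of() {
    head -n 1 "$1" | sed -n 's/^#![[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# What /bin/sh really is on this machine (bash, dash, ...).
readlink -f /bin/sh

# Usage: shebang_of /etc/init.d/nut
```

If the init script names /bin/sh and /bin/sh resolves to a shell that drops privileges, that is the line to tweak.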
I think the above is safe because the binary can only restart nut,
nothing else, and can only be run by root or nut. I'm not exactly a
security expert though, so I might be wrong.
Anyway, I set up the above 10 days ago, and this morning it triggered. I
have it configured to send me an email too. It sent one email, and
restarted nut successfully. Comms were reestablished. The only thing
that didn't go entirely according to plan is that the old upsmon stuck
around as a defunct nut process and a running root process. I don't
know why they didn't die, but they were easily killed off later
manually. It was definitely better to get one email and comms
reestablished after 5 minutes than 70 emails and no communications all
night.
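Spotting leftover upsmon processes after a restart could be done with a small filter like the one below. This is a sketch, not what was used here: filter_upsmon is a made-up name, and the ps options in the usage comment assume procps ps as shipped on Debian.

```shell
# filter_upsmon: read "pid user stat comm" lines on stdin and print
# pid, user and state for every upsmon process. State Z means defunct.
filter_upsmon() {
    awk '$4 == "upsmon" { print $1, $2, $3 }'
}

# Usage (procps ps, as on Debian):
#   ps -eo pid=,user=,stat=,comm= | filter_upsmon
```

Comparing the listed PIDs against the freshly started upsmon shows which ones are stale and need killing by hand.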
best
/rob