[Nut-upsuser] NUT with Cyber Power 700 AVR

Mon Aug 30 18:42:17 UTC 2010

>> 1) syslog errors every 20+ minutes or so like :  Aug  7 10:21:03 ben 
>> usbhid-ups[3321]: libusb_get_string: error sending control message: 
>> Broken pipe
>
> Not a cause of concern. It is a way of telling that the UPS is 
> currently not able to handle a command. Most likely this is due to the 
> UPS doing some internal housekeeping functions and the little 
> microcontroller inside is not able to handle a command. We will 
> probably suppress this in future NUT versions, as it is a common cause 
> of false alarms.
>
>> 2) syslog errors on a similar timescale like : Aug  7 08:17:40 ben 
>> kernel: [40170.402789] usb 2-1.2: usbfs: USBDEVFS_CONTROL failed cmd 
>> usbhid-ups rqt 161 rq 1 len 8 ret -110
>
> Same here. The kernel is informing you that the UPS didn't respond to 
> a command (110 = ETIMEDOUT). The cause is most likely the same as the 
> above and not a cause of concern either. Unlike the above message, 
> there is nothing we can do about this as it is logged by the kernel.
Good to know.  Thanks for the reply.
>
>> 3) The machine spontaneously shutdown this morning due to a "low 
>> battery" condition.  However, 80 minutes later I noticed the UPS 
>> battery was at 100%.  I don't think it can charge that fast, so I 
>> think this must have been a communication error.
>
> I'm not so sure about that. Don't overestimate the accuracy of the 
> battery charge gauge on the UPS. It could be that it is just voltage 
> based, which means that it will indicate full charge long before the 
> battery is actually full. It could also mean that the battery is bad. 
> This may cause nearly instant shutdowns when the mains fails (when the 
> battery is under load) while it looks like the battery is (almost) 
> full with the mains present (and the battery is not under load). 
> Running a battery test usually reveals what is going on.
>
> Best regards, Arjen
Fair points, but I think the battery is good.  I've run several 
on-battery shutdowns lasting 90s+ by flipping the breaker (using 
upssched to initiate shutdown after 60s) and that works fine.  I ran a 
longer test for 2 or 3 minutes once and watched the UPS's displayed 
estimated run time count down from 76 minutes as you'd expect it to.  
The UPS is brand new.  Also, I suspect there was no power cut - in the 
past I've had to reset my stove clock after a power cut, and I don't 
recall having to do that this time.  I have my system set up to shut down 
after 5 mins on battery, rather than wait for a lowbatt condition, so I 
doubt the low batt could have been reached due to a power cut.... unless 
perhaps it was a night of successive 4 minute power cuts or, given the 
stove, 4 minute low-voltage conditions.  I guess we'll never know for 
sure... in any case, this was enough for me to abandon 2.4.3 and try 
Cyberpower's own offering, which suffered from curious delays itself 
which I wasn't happy with given that the eventual power-off is timer 
based rather than signal based as in nut, and thence back to nut 2.2.2...

So, I went back to nut 2.2.2 under Debian Lenny with both MAXAGE and  
DEADTIME set to 150s.  This worked OK for 10 days, with the odd type (2) 
error from above, and the odd stale data error [aside : it is my 
understanding that data must now be stale for 150s for upsmon to log a 
stale data warning to syslog, since upsd doesn't pass on the stale data 
condition until MAXAGE is reached.  So for 30 lots of 5s polls the data 
is stale... then it shows up in syslog, and, and this is what's weird, 
in almost every case it resolves itself 2s later.... just like it did 
when MAXAGE was 15s.]  After 10 days it went into a stale data condition 
that continued all night.... until I stopped it by restarting nut in the 
morning.

Since restarting nut seemed to fix the problem I decided to make 
upssched restart nut on a NOCOMM condition.  I'll briefly describe how I 
did that here in case others are interested:

I set

NOTIFYFLAG NOCOMM       SYSLOG+WALL+EXEC

in upsmon.conf in the usual way with

NOTIFYCMD /sbin/upssched

set to call upssched.  In upssched.conf I set

CMDSCRIPT /sbin/upsschedcmd

and

AT NOCOMM   * EXECUTE restart

/sbin/upsschedcmd is my command script, the relevant portion of which is :

#!/bin/bash
# This script is called by upssched on a UPS event. 
# This script is designed to be run by user nut.

case "$1" in
  restart)
    /sbin/upsrestart.x
    ;;
esac

upsrestart.x is the following C code, compiled using the gcc line in the 
comment, and chowned/chmoded to have the ownership/permissions in the 
2nd comment :

#include <stdio.h>
#include <unistd.h>

/*

This program is designed to restart nut.
The binary file permissions should be -rwsr-xr-- root:nut

gcc -g -Wall -o upsrestart.x upsrestart.c

*/

int main (int argc, char *argv[])
{

  char *arg[] = { "/etc/init.d/nut", "restart", (char *) NULL };

  // fixed, minimal environment so nothing is inherited from the caller
  char *env[] = { "USER=root", "PATH=/usr/sbin:/usr/bin:/sbin:/bin", 
"HOME=/root", (char *) NULL };

  execve (arg[0], arg, env);

  // if execve() returns there has been an error; perror() reports why

  perror("upsrestart.c : error calling execve()");

  return(1);

}
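
The ownership and permissions named in the comment could be applied with something like the following. This is only a sketch: install_wrapper is a made-up helper name, not part of nut, and the final stat call assumes GNU coreutils. Note that chown comes before chmod, because Linux clears the setuid bit whenever a file's ownership changes.

```shell
#!/bin/sh
# install_wrapper is a hypothetical helper name, not part of nut.
# Usage: install_wrapper FILE OWNER GROUP
install_wrapper() {
    file=$1 owner=$2 group=$3
    # chown first: Linux clears the setuid bit whenever ownership changes.
    chown "$owner:$group" "$file" || return 1
    # 4754 = -rwsr-xr-- : setuid, rwx for owner, r-x for group, r-- for others.
    chmod 4754 "$file" || return 1
    # Print the resulting mode and ownership for a visual check (GNU stat).
    stat -c '%A %U:%G %n' "$file"
}
```

Run as root after compiling, the call would be: install_wrapper /sbin/upsrestart.x root nut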

What happens is that upssched runs /sbin/upsschedcmd as user nut, which 
runs the setuid program upsrestart.x as nut which runs /etc/init.d/nut 
restart as effective user root, restarting nut and, it appears so far, 
reestablishing connection with the UPS.  Since this runs on NOCOMM, 
default timeout 300s, that becomes the max time your system can't talk 
to your UPS.  Since I have DEADTIME set to 150s, a stale UPS that was 
last known to be on battery will shut down before the NOCOMM restart 
takes effect. 

The binary wrapper is necessary because Linux ignores setuid bits 
applied to scripts.  Furthermore, modern versions of bash drop setuid 
privileges on startup, unless called with -p.  The /etc/init.d/nut 
script uses /bin/sh.  The above works on Debian because, according to 
the "system" man page (of all places :) : "Debian uses a modified bash 
which does not do this when invoked as sh".  On other flavours of Linux 
you may need to tweak the first line of /etc/init.d/nut to prevent it 
dropping privileges.
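
To see which shell actually interprets the init script on a given system, something like the following may help. It is a minimal sketch: script_interpreter is a made-up helper name, and the readlink -f call in the usage line assumes GNU coreutils.

```shell
#!/bin/sh
# script_interpreter is a hypothetical helper, not part of nut.
# Print the interpreter path from the "#!" line of the script given as $1.
# Prints nothing if the first line is not a "#!" line.
script_interpreter() {
    sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$1"
}
```

For example, readlink -f "$(script_interpreter /etc/init.d/nut)" follows any symlinks, so it shows whether /bin/sh is really bash or some other shell on that system.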

I think the above is safe because the binary can only restart nut, 
nothing else, and can only be run by root or nut.  I'm not exactly a 
security expert though, so I might be wrong.

Anyway, I set up the above 10 days ago, and this morning it triggered.  I 
have it configured to send me an email too.  It sent one email, and 
restarted nut successfully.  Comms were reestablished.  The only thing 
that didn't go entirely according to plan is that the old upsmon stuck 
around as a defunct nut process and a running root process.  I don't 
know why they didn't die, but they were easily killed off later 
manually.  It was definitely better to get one email and comms 
reestablished after 5 minutes than 70 emails and no communications all 
night.

best
/rob