[Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status

Fri Aug 14 22:50:49 UTC 2015

[please keep the list CC'd, thanks]

> On Aug 14, 2015, at 10:27 AM, Bob Brown <bbrown at harpercollege.edu> wrote:
> 
> Would it put NA in the log files if maxage was too short?

I think so. I haven't tried it, but if snmp-ups polls every 30 seconds and MAXAGE is 15, then at a randomly chosen moment there is a 50% chance that the data is more than 15 seconds old. In practice, the driver and upsd were probably started around the same time, and differences in timing between upsd and the driver let them occasionally drift out of sync. (Increasing MAXAGE is probably the simplest way around that.)
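
As a rough check on that 50% figure, here is a back-of-the-envelope sketch in Python. It is not how upsd decides staleness internally. It assumes that fresh data arrives exactly once per pollfreq seconds and that a poll takes no time, and the function name stale_fraction is made up for illustration:

```python
def stale_fraction(pollfreq: float, maxage: float) -> float:
    """Estimate the fraction of time the newest data is older than maxage.

    Simplified model (an assumption, not upsd's actual logic): an update
    arrives exactly every `pollfreq` seconds and takes no time, so the age
    of the data sweeps uniformly from 0 up to `pollfreq` between updates.
    """
    if pollfreq <= 0:
        raise ValueError("pollfreq must be positive")
    if maxage < 0:
        raise ValueError("maxage must not be negative")
    # Data is older than maxage only during the last (pollfreq - maxage)
    # seconds of each polling cycle, if that span is positive at all.
    return max(0.0, (pollfreq - maxage) / pollfreq)


if __name__ == "__main__":
    # Defaults discussed in this thread: pollfreq 30, MAXAGE 15.
    print(stale_fraction(30, 15))
    # MAXAGE raised to 1.5x pollfreq.
    print(stale_fraction(30, 45))
```

Under this model, any MAXAGE at or above pollfreq gives a stale fraction of zero. Real polls take time and can be retried, which is an argument for leaving some margin above pollfreq.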

The usbhid-ups USB driver (which tends to get a lot more testing these days) splits polling into high- and low-priority updates, so that important variables get updated frequently (many times in one MAXAGE period), and other variables get updated later. Successful high-priority updates are enough to clear the "data stale" flag. There are comments in the snmp-ups driver that recommend implementing the same scheme there, but nobody has written that code yet.

>  Does it sound like a network delay and that's why it put NA and marked the data as stale?

Packet loss between the driver and the SNMP card in the UPS is more likely than delay (the MAXAGE and pollfreq arguments are in seconds), but as I mentioned, the current version of NUT retries a few times, so ultimately, packet loss shouldn't be a problem.

> -Bob
> 
> -----Original Message-----
> From: Charles Lepple [mailto:clepple at gmail.com] 
> Sent: Friday, August 14, 2015 8:28 AM
> To: Bob Brown
> Cc: nut-upsuser at lists.alioth.debian.org
> Subject: Re: [Nut-upsuser] occassional problem wtih upslog and apc ups units via snmp ... [NA] status
> 
> On Aug 14, 2015, at 8:53 AM, Bob Brown <bbrown at harpercollege.edu> wrote:
>> 
>> I see this in syslog:
>> Aug 12 03:57:22 orion upsd[18274]: Data for UPS [a102-apc-10] is stale - check driver
> 
> I don't have a SNMP UPS to test with, and this might be different for NUT 2.4.1, but the current defaults seem to make stale data more likely in your case.
> 
> In http://www.networkupstools.org/docs/man/snmp-ups.html#_extra_arguments the default for "pollfreq" is 30 seconds, and in http://www.networkupstools.org/docs/man/upsd.conf.html the "MAXAGE" default is 15 seconds. So you might want to either increase MAXAGE, or decrease pollfreq.
> 
> I would recommend making MAXAGE at least pollfreq (possibly 1.5x larger), because decreasing pollfreq might run up against the amount of time that it takes to poll all of the SNMP variables from the UPS. There are some retries and timeouts in snmp-ups which make this difficult to guess without seeing the debug logs. You might check the list archives to see if anyone has suggestions on what pollfreq and MAXAGE values worked for them.
> 
>> (one for each UPS).
>> 
>> Then I see a bunch of messages like this:
>> 
>> Aug 12 03:58:02 orion upsd[18274]: write() failed for ::1: Broken pipe
> 
> This just means that the client (upslog, probably) disconnected before upsd expected the session to end. The upsd-to-driver connection is a Unix-domain socket (and the driver-to-UPS connection is UDP).
> 
>> Aug 12 03:58:02 orion snmpd[2872]: Connection from UDP: [127.0.0.1]:50940
>> 
> 
> Might be unrelated, unless I misunderstood your NUT setup (upslog -> upsd -> snmp-ups -> UPS). The snmpd daemon and the UPS comms card are SNMP servers ("agents" in SNMP parlance, IIRC), and snmp-ups is a SNMP client.
> 
> -- 
> Charles Lepple
> clepple at gmail
> 
> 
> 

-- 
Charles Lepple
clepple at gmail