[Nut-upsuser] Tripplite SNMPwebcard communication lost and established randomly
jgould at cddiagnostics.com
Tue Mar 31 13:07:55 UTC 2015
I missed that snmp-ups sets pollfreq to 30 by default. I'd hope that it would set DEADTIME to an appropriate value as well then, no? Is there a way to query what the current settings or the defaults are?
I'm going to set POLLFREQ=30 and DEADTIME=90 and see if that has any effect, and I'll check the logs. If that doesn't get me anywhere, I'll try simulating the polling with snmpwalk as you mentioned.
From: Charles Lepple [mailto:clepple at gmail.com]
Sent: Monday, March 30, 2015 11:20 PM
To: Jason Gould
Cc: NUT Users
Subject: Re: [Nut-upsuser] Tripplite SNMPwebcard communication lost and established randomly
[please use Reply-All to keep the list CC'd, thanks]
On Mar 30, 2015, at 11:04 PM, Jason Gould <jgould at cddiagnostics.com> wrote:
Thanks. I should also mention that the FreeNAS server is set up with an LACP lagg.
Also, the issue still occurs even when pollfreq is not set (left at its default).
It looks like the default for pollfreq is actually 30 on snmp-ups, which means you would want DEADTIME to be at least 2x that so that a single stale-data event does not trigger the COMMBAD message, but the default DEADTIME is only 15. As you can imagine, choosing default values that work across all UPS models is somewhat tricky.
FreeNAS has an "additional parameters" field for adding lines to upsmon.conf. I verified that it does actually write them to the config file.
Good to know.
The messages I posted are just what pops up in a PuTTY session. Is there any debugging output or log I can look at to get a better idea of what is happening when this occurs?
NUT drivers write to syslog; you might need to check with the FreeNAS folks to see where those messages end up. On a regular BSD box, it might be /var/log/messages.
As for snmpwalk, is there a command I can leave running to detect the problem? It happens randomly, so unless I sit in front of a computer for 12 hours, ready to act at just the right time, I won't catch it.
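As a sketch of how to pull the NUT entries out of a busy syslog file, the Python 3 function below keeps only lines tagged by snmp-ups, upsmon, or upsd. It assumes the messages land in a plain-text file such as /var/log/messages and that syslog prefixes each message with the usual "program[pid]:" tag; the path and the tag list are assumptions to adjust for FreeNAS.

```python
import re

# Assumption: syslog writes to a plain-text file; /var/log/messages is the usual
# FreeBSD default, but FreeNAS may route messages elsewhere.
DEFAULT_LOG = "/var/log/messages"

# NUT programs whose messages are of interest (adjust as needed).
NUT_TAGS = ("snmp-ups", "upsmon", "upsd")

# Matches the "program[pid]:" or "program:" tag that precedes each syslog message.
TAG_RE = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in NUT_TAGS) + r")(?:\[\d+\])?:")


def nut_log_lines(path=DEFAULT_LOG):
    """Return the lines of a syslog file that were written by NUT programs."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if TAG_RE.search(line)]
```

Calling nut_log_lines() (or passing another path) returns just the matching lines, which makes it easier to line up the driver's complaints with the timestamps of the COMMBAD and COMMOK emails.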
I'd run it in a loop (maybe with a 'sleep 30' to mimic the polling interval of NUT), and check the log file every so often. snmpwalk should have more descriptive error messages than snmp-ups, but they both use the same underlying library, so YMMV. There are a few flags that you can add to snmpwalk to adjust the error output.
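One way to leave snmpwalk running unattended is a small Python 3.7+ loop like the one below: it walks the RFC 1628 UPS-MIB subtree (1.3.6.1.2.1.33) every 30 seconds and appends a timestamped OK or FAIL line to a log file. The host address, community string, and log file name are hypothetical placeholders; the walk uses SNMP v2c (as tried in ups.conf), and the sketch assumes Net-SNMP's snmpwalk is on the PATH and exits with a non-zero status when a walk fails or times out.

```python
import subprocess
import time
from datetime import datetime

# Placeholders (hypothetical): the SNMPwebcard's IP address and read community.
HOST = "192.0.2.10"
COMMUNITY = "public"
UPS_MIB_OID = "1.3.6.1.2.1.33"  # RFC 1628 UPS-MIB subtree
INTERVAL = 30                    # seconds between walks, like the snmp-ups default pollfreq
LOGFILE = "snmpwalk-check.log"   # hypothetical log file name


def build_command(host, community, oid):
    """Assemble an SNMP v2c walk of one subtree."""
    return ["snmpwalk", "-v", "2c", "-c", community, host, oid]


def walk_once(host=HOST, community=COMMUNITY, oid=UPS_MIB_OID, timeout=120):
    """Run one walk; return (ok, detail) where detail is an error or a line count."""
    try:
        result = subprocess.run(build_command(host, community, oid),
                                capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, "snmpwalk not found on PATH"
    except subprocess.TimeoutExpired:
        return False, f"walk took longer than {timeout}s"
    if result.returncode != 0:
        return False, result.stderr.strip() or f"exit status {result.returncode}"
    return True, f"{len(result.stdout.splitlines())} lines"


def monitor(iterations=None, walker=walk_once, logfile=LOGFILE, interval=INTERVAL):
    """Walk repeatedly (forever if iterations is None), logging one line per attempt."""
    done = 0
    while iterations is None or done < iterations:
        ok, detail = walker()
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(logfile, "a") as log:
            log.write(f"{stamp} {'OK' if ok else 'FAIL'} {detail}\n")
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)
```

Calling monitor() with no arguments loops until interrupted, so it can be left running overnight under nohup or tmux; afterwards, searching the log for FAIL and comparing those timestamps with the COMMBAD emails shows whether plain snmpwalk sees the same gaps. The walker, logfile, and interval parameters exist only so the loop can be exercised without a live card.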
I also looked at the Cisco router and it isn't showing any packet loss.
It is possible that packet loss is happening on the SNMPwebcard. Those sorts of cards typically do not have fast CPUs. But the snmpwalk results should shed some more light on this.
On Mar 30, 2015 8:30 PM, Charles Lepple <clepple at gmail.com> wrote:
On Mar 30, 2015, at 10:41 AM, Jason Gould <jgould at cddiagnostics.com> wrote:
I just started using NUT and am having a problem, though I'm not sure NUT is even the cause.
I'm running the latest FreeNAS 9.3 release.
Network UPS Tools upsmon 2.7.2.
The UPS is a Tripp Lite SU6000RT4UHVPM with the SNMPwebcard installed.
I'm trying to use the SNMPwebcard, not USB or Serial. I've updated the SNMPwebcard to the latest firmware version (see below).
Thanks, this is good background information.
In FreeNAS I was able to create a connection to the UPS simply by entering its IP address as the Port and selecting the driver "Various ups 3 (various) SNMP - RFC 1628 (snmp-ups)". After that, querying it with "upsc tripplite" just worked, and everything looked fine. However, after a few hours I started getting emails and console messages saying communication was lost (COMMBAD), followed shortly by messages saying it was established again. This happens over and over. I thought I might need to change some settings in ups.conf, so I tried raising pollfreq to 30, setting snmp_version to v2c, and a few other adjustments, but the behavior persists.
There are two stages here: the snmp-ups driver polls every 'pollfreq' seconds, and upsmon sends a warning if a UPS has been stale for longer than DEADTIME seconds (as set in upsmon.conf). So if you increase pollfreq, you will probably want to increase DEADTIME as well. (I'm not sure how FreeNAS exposes this setting.) The upsmon.conf documentation recommends multiplying pollfreq by three to get DEADTIME. You may want to go higher than that, depending on how long the interval is between COMMBAD and COMMOK (the messages you quoted only have one-minute resolution).
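The rule of thumb above is simple arithmetic; the hypothetical helpers below only make it explicit for the snmp-ups default pollfreq of 30 seconds. suggested_deadtime() multiplies pollfreq by three and adds an optional safety margin, and meets_single_miss_floor() checks the "at least 2x pollfreq" floor mentioned in the follow-up reply. Neither function models upsmon's exact timing.

```python
def suggested_deadtime(pollfreq, factor=3, margin=0):
    """Rule of thumb: DEADTIME = pollfreq * factor, plus an optional margin (seconds)."""
    if pollfreq <= 0 or factor <= 0 or margin < 0:
        raise ValueError("pollfreq and factor must be positive; margin must be >= 0")
    return pollfreq * factor + margin


def meets_single_miss_floor(pollfreq, deadtime):
    """True if DEADTIME is at least twice the driver's pollfreq."""
    return deadtime >= 2 * pollfreq


# snmp-ups default pollfreq is 30 s; upsmon's default DEADTIME is 15 s.
print(suggested_deadtime(30))            # 90
print(meets_single_miss_floor(30, 15))   # False
print(meets_single_miss_floor(30, 90))   # True
```

For the default pollfreq of 30, the factor-of-three rule gives 90 seconds, the same DEADTIME chosen in the later message, while the stock DEADTIME of 15 does not even meet the 2x floor.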
Unfortunately, it looks like snmp-ups goes into COMMBAD mode if even one of its SNMP queries fails (and it queries a lot of OIDs), so you would need to rely on DEADTIME to filter out any transient packet loss. You might also see whether your firewall or switch can prioritize SNMP packets using QoS settings. I am not familiar with how many times Net-SNMP retries, but SNMP is UDP-based, so the transport itself does not retransmit lost packets.
You could also try running 'snmpwalk' by hand against the SNMPwebcard to see whether it experiences the same packet loss. If it does, I think you have a case for Tripp Lite support.
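To see why a single failed OID query matters so much, consider a simplified model in which each SNMP query in a poll cycle fails independently with probability p, even after Net-SNMP's retries. The chance that a cycle of n queries contains at least one failure is then 1 - (1 - p)^n. The loss rate and query count below are hypothetical illustrations, not measurements from this card.

```python
def p_cycle_has_failure(p_query_loss, n_queries):
    """Probability that at least one of n independent queries fails: 1 - (1 - p)**n."""
    if not 0.0 <= p_query_loss <= 1.0:
        raise ValueError("p_query_loss must be between 0 and 1")
    return 1.0 - (1.0 - p_query_loss) ** n_queries


def expected_failed_cycles(p_query_loss, n_queries, hours, pollfreq=30):
    """Expected number of poll cycles containing at least one failure over `hours`."""
    cycles = int(hours * 3600 // pollfreq)
    return cycles * p_cycle_has_failure(p_query_loss, n_queries)


# Hypothetical inputs: 0.1% loss per query, 50 queries per cycle, 12 hours.
print(f"{p_cycle_has_failure(0.001, 50):.4f}")
print(f"{expected_failed_cycles(0.001, 50, 12):.1f}")
```

With these made-up numbers, roughly 4.9% of cycles would contain a failure, which over 12 hours of 30-second polling (1440 cycles) comes to about 70 stale-data events. A per-query loss rate that looks negligible can therefore still produce frequent COMMBAD messages, which is why leaning on DEADTIME to ride out isolated failures makes sense.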
clepple at gmail