[Nut-upsdev] Wait for network delay

Jim Klimov jimklimov at cos.ru
Tue Feb 18 17:56:55 UTC 2014


On 2014-02-16 16:36, Elliot Dierksen wrote:
> On 2/5/2014 8:24 AM, Charles Lepple wrote:
>> On Feb 4, 2014, at 10:48 PM, Elliot Dierksen wrote:
>>
>>> NUT will complain endlessly about communication errors and never
>>> establish SNMP communication with my APC UPS
>> Hmm, at first glance, I read the "complain endlessly" part as a figure
>> of speech, and figured SNMP would get there eventually since it's UDP.
>> But if you have to stop and restart NUT, that is a different story

> Feb 16 10:15:40 freenas upsmon[2939]: UPS [CR-UPS]: connect failed:
> Connection failure: Connection refused
> Feb 16 10:16:15 freenas last message repeated 7 times
> Feb 16 10:18:17 freenas last message repeated 24 times
> Feb 16 10:19:38 freenas last message repeated 16 times

I believe (having recently had a similar experience) that the following
happens:
1) Your OS boots and begins bringing up a complicated network setup,
which needs some time to converge and actually work.
2) NUT starts up: upsdrv, upsd and upsmon.
3) The drivers have a startup timeout (45 sec by default, IIRC),
and snmp-ups does not make it in time. So upsdrv fails, upsd has no
UPS data to publish, and upsmon has nothing to watch, though it
keeps trying and "complains endlessly".

Possible ways out:
1) Restart NUT as you do, or as I do optionally in the attached script
(it checks connectivity to the configured UPSes and, if any are
missing, schedules itself to retry in a minute via the "at" utility).
Feel free to adopt the script into the project, if deemed fit :)
2) Retry indefinitely, with delays, until the driver finds the UPS.
Done in a "brute-force" manner, this delays your OS startup indefinitely
and might even deadlock (e.g. if NUT is asked to start before
networking).
3) Retry indefinitely, but in the background, as a driver daemon. This
makes sense because a driver that was initially "connected" and then
lost the connection (e.g. the networking gear or the UPS management card
was restarted) does retry and ultimately finds the UPS again
without a manual reload of NUT.
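
To illustrate the difference between (2) and (3), here is a minimal
sketch of a bounded retry helper in plain sh. The function name
retry_until_ok and its arguments are hypothetical and not part of NUT;
the init-script path in the usage comment is likewise only an assumed
example. Run in the foreground it behaves like (2); pushed into the
background with "&" it approximates (3) without blocking the boot.

```shell
#!/bin/sh
# Hypothetical helper (not part of NUT): run a command until it succeeds
# or the number of attempts is exhausted.
# Usage: retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
    _rt_max="$1"
    _rt_delay="$2"
    shift 2
    _rt_try=1
    while [ "$_rt_try" -le "$_rt_max" ]; do
        # Stop as soon as the command reports success
        if "$@"; then
            return 0
        fi
        # Sleep only if another attempt will follow
        if [ "$_rt_try" -lt "$_rt_max" ]; then
            sleep "$_rt_delay"
        fi
        _rt_try=$((_rt_try + 1))
    done
    return 1
}

# Example (assumed path; adjust to your layout): up to 30 attempts,
# one minute apart, in the background so the boot is not delayed.
# retry_until_ok 30 60 /etc/init.d/upsdrv start &
```

Capping the number of attempts avoids the deadlock risk of a truly
infinite loop, at the cost of giving up if the network takes longer
than expected.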

I am not sure whether an option like (3) is available in 2.7.x. The
attached script was developed and tested on a Linux system with
nut-2.6.3, and the idea (and an earlier implementation) dates way back.
This version should probably work on Solaris as well (not yet tested),
though I can't vouch for FreeNAS and other platforms.

HTH,
//Jim Klimov

-------------- next part --------------
#!/bin/sh
#
# chkconfig: 2345 55 89
# description: The UPS monitor and shutdown controller for delayed startup retries
# Copyright (C) 2000-2014 by Jim Klimov
# $Id: ups-delayed,v 1.1 2014/02/18 17:52:49 jim Exp $

SELF="$0"
AT_DELAY="now +1 min"

### Guess the locations of needed programs and config files
### (among variants typical for various distribution layouts)
for F in /etc/init.d/upsdrv /usr/local/etc/init.d/upsdrv ; do
    [ x"$UPSDRV" = x ] && [ -s "$F" -a -x "$F" ] && UPSDRV="$F"
done
[ x"$UPSDRV" = x -o ! -x "$UPSDRV" ] && \
    echo "Missing UPSDRV init-script" && exit 1

for F in /etc/init.d/upsd /usr/local/etc/init.d/upsd ; do
    [ x"$UPSD" = x ] && [ -s "$F" -a -x "$F" ] && UPSD="$F"
done
[ x"$UPSD" = x -o ! -x "$UPSD" ] && \
    echo "Missing UPSD init-script" && exit 1

for F in /etc/init.d/upsmon /usr/local/etc/init.d/upsmon ; do
    [ x"$UPSMON" = x ] && [ -s "$F" -a -x "$F" ] && UPSMON="$F"
done
[ x"$UPSMON" = x -o ! -x "$UPSMON" ] && \
    echo "Missing UPSMON init-script" && exit 1

for F in /etc/nut/ups.conf /etc/ups/ups.conf ; do
    [ x"$UPSDRV_CONF" = x ] && [ -s "$F" ] && UPSDRV_CONF="$F"
done
[ x"$UPSDRV_CONF" = x -o ! -s "$UPSDRV_CONF" ] && \
    echo "Missing UPSDRV_CONF" && exit 1

for F in /usr/bin/upsc /usr/local/bin/upsc /usr/local/ups/bin/upsc ; do
    [ x"$UPSC" = x ] && [ -s "$F" -a -x "$F" ] && UPSC="$F"
done
[ x"$UPSC" = x -o ! -x "$UPSC" ] && \
    echo "Missing UPSC client" && exit 1

for F in /var/lib/nut/var/lib/upsd \
    /var/state/nut /var/state/ups /var/lib/nut /var/lib/ups; do
    [ x"$STATEDIR" = x ] && [ -d "$F" ] && STATEDIR="$F"
done
[ x"$STATEDIR" = x -o ! -d "$STATEDIR" ] && \
    echo "Missing state directory for sockets and pidfiles" && exit 1

# Print a timestamped message to stderr and broadcast it to logged-in users
echo_wall() {
	echo "`date`: $*" >&2
	echo "`date`: $*" | wall
}

# Queue another "start-now" run of this script via at(1) after $AT_DELAY
sched() {
	echo_wall "Scheduling delayed startup of UPS monitoring (at '$AT_DELAY')"
	echo ${SELF} start-now | at $AT_DELAY
}

# At boot: if NUT is not fully up yet, schedule a delayed retry and return
start() {
	status > /dev/null 2>&1 || sched
}

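# Check the three daemons and query every UPS listed in ups.conf;
# returns non-zero if any check failed
status() {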
	RES=0
	echo "=== Daemon states:"
	${UPSDRV} status || RES=$?
	${UPSD} status || RES=$?
	${UPSMON} status || RES=$?

	for UPS in \
	    `egrep '^\[' ${UPSDRV_CONF} | sed 's,^ *\[\(.*\)\] *,\1,g'`; do
	    echo "=== Querying ${UPS}@localhost:"
	    ${UPSC} ${UPS}@localhost || RES=$?
	done

	echo "=== Scheduled restarter job?"
	atq 2>&1

	if [ x"$VERBOSE" != x ]; then
	    echo ""
	    echo "=== Running processes and statedir ($STATEDIR) contents:"
	    ps -ef | grep -v grep | egrep '[ /]ups|nut'
	    ls -la ${STATEDIR}
	    date
	    echo ""
	fi

	echo "=== Overall result:"
	if [ "$RES" = 0 ]; then
		echo "Status:	[--OK--]"
	else
		echo "Status:	[-FAIL-]" >&2
	fi

	return $RES
}

# Stop all NUT services and remove stale pidfiles and driver sockets
cleanup() {
	echo "=== Trying to clean-up the NUT state directory $STATEDIR from stale files..."
	${UPSMON} stop

	${UPSD} stop
	rm -f ${STATEDIR}/upsd.pid

	${UPSDRV} stop
	for UPS in \
	    `egrep '^\[' ${UPSDRV_CONF} | sed 's,^ *\[\(.*\)\] *,\1,g'`; do
	    rm -f "${STATEDIR}"/*-"${UPS}" "${STATEDIR}"/*-"${UPS}".pid
	done
}

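# Try to (re)start the drivers, then upsd and upsmon; on failure,
# clean up and reschedule this script
start_now() {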
	echo_wall "Trying to delayed-start the UPS monitoring now"
	${UPSDRV} status || \
		${UPSDRV} restart

	sleep 5

	${UPSDRV} status || { cleanup; sched; exit 1; }

	${UPSD} restart
	${UPSMON} restart

	sleep 5

	( status || { cleanup; sched; exit 1; } ) | wall
}

case "$1" in
    start)	start ;;
    status)	[ "$2" = "-v" ] && VERBOSE=1
		status ;;
    start-now)	start_now ;;
    stop)	cleanup ;;
    restart)	cleanup; start_now ;;
    *)		echo "Usage: $0 {start|status [-v]|start-now|stop|restart}" >&2
		exit 1 ;;
esac


