[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling

Tim Richards tims_tank at hotmail.com
Sun Feb 5 23:25:23 UTC 2017


Hello List,

Any suggestions to solve the following would be most appreciated.

Setup: Active/passive two-node cluster. Two UPSes (APC Smart-UPS 1500 C) with USB communication cables cross connected (i.e. UPS-webserver1 is monitored by webserver2, and vice versa) to allow for stonith/fencing.
OS: openSUSE Leap 42.2
NUT version 2.7.1-2.41-x86_64
Fencing agent: external/nut

Problem: When power fails to a single UPS, both nodes are shut down. The node with the still-powered UPS comes back up, but requires manual intervention to keep it providing services. I would like only the node with the "On Battery" UPS to shut down.
The problem with restoring services seems to be that NUT on the node that comes back up will not restart until the other node restarts.
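
To confirm which UPS is actually on battery while testing, the `ups.status` variable of each UPS can be queried with `upsc`. The following is only a minimal sketch: the helper name `on_battery` is hypothetical, it looks only for the standard `OB` token, and the UPS and host names in the usage comment are the ones from this post.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a ups.status string contains the
# "OB" (on battery) token, e.g. "OB DISCHRG"; fails for e.g. "OL CHRG".
on_battery() {
    # Intentionally unquoted so the status string splits into tokens.
    for token in $1; do
        if [ "$token" = "OB" ]; then
            return 0
        fi
    done
    return 1
}

# Usage on a live system (requires a running upsd on the queried host):
#   status=$(upsc ups-webserver2@webserver1 ups.status)
#   if on_battery "$status"; then echo "ups-webserver2 is on battery"; fi
```

Running the check against both UPSes from one node would show whether the second UPS ever reports `OB` during a single-UPS power failure.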

Stonith and my upssched-cmd script both use

upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot

or

upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot

as appropriate. When the cluster software (Pacemaker/Corosync) uses one of the above commands as part of a fencing operation, only the target node is shut down and its UPS's outlets are power-cycled. When NUT issues one of the above commands via my upssched-cmd script, both nodes shut down and the outlets of both UPSes are power-cycled.
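
Because the USB cables are cross connected, the UPS that powers a node is always served by upsd on the other node, and NUT addresses it as `upsname@hostname`. The naming scheme behind the two commands above can be written as a small helper. This is only a sketch: the function name `ups_target_for` is hypothetical, and the node and user names are the ones used in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the NUT "ups@host" identifier for the UPS
# that powers the given node. Each UPS is named after the node it powers
# and is reachable through upsd on the peer node (cross-connected USB).
ups_target_for() {
    case "$1" in
        webserver1) echo "ups-webserver1@webserver2" ;;
        webserver2) echo "ups-webserver2@webserver1" ;;
        *) echo "unknown node: $1" >&2; return 1 ;;
    esac
}

# Usage (would power-cycle the outlets feeding webserver2):
#   target=$(ups_target_for webserver2)
#   upscmd -u "${target%%@*}-master" -p mypassword "$target" shutdown.reboot
```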

This problem should be very rare, but it would be better to cover it rather than not.

Power failure and restoration to both UPSes (the most common problem for me) works well. I use upssched to set the same timers after power failure on each system. They receive simultaneous shutdown commands, which they obey. When power returns they both come back up.

Stonith/fencing via the external/nut stonith resource agent works.

Thanks,
Tim.



My config files

ups.conf

On webserver1
[ups-webserver2]
        driver = usbhid-ups
        port = auto
        desc = "APC Smart-UPS C 1000/1500va"
        vendorid = 051d

On webserver2
[ups-webserver1]
        driver = usbhid-ups
        port = auto
        desc = "APC Smart-UPS C 1000/1500va"
        vendorid = 051d


nut.conf

MODE=netserver


upsd.conf

Webserver1
LISTEN 127.0.0.1 3493
LISTEN ::1 3493
LISTEN 192.168.1.21 3493

Webserver2
LISTEN 127.0.0.1 3493
LISTEN ::1 3493
LISTEN 192.168.1.22 3493



upsd.users

defines users (special settings required for stonith to work)

On webserver1
[ups-webserver2-slave]
        password = mypassword
        actions = SET
        instcmds = ALL
        upsmon slave

[ups-webserver2-master]
        password = mypassword
        actions = SET
        actions = FSD
        instcmds = ALL
        upsmon master


On webserver2
[ups-webserver1-slave]
        password = mypassword
        actions = SET
        instcmds = ALL
        upsmon slave

[ups-webserver1-master]
        password = mypassword
        actions = SET
        actions = FSD
        instcmds = ALL
        upsmon master

upsmon.conf

Webserver1
MONITOR ups-webserver1@webserver2 1 ups-webserver1-master mypassword master
MONITOR ups-webserver2@localhost 0 ups-webserver2-slave mypassword slave

Webserver2
MONITOR ups-webserver2@webserver1 1 ups-webserver2-master mypassword master
MONITOR ups-webserver1@localhost 0 ups-webserver1-slave mypassword slave



upssched also needs the following in upsmon.conf
upsmon.conf

NOTIFYCMD            /usr/sbin/upssched
NOTIFYFLAG ONLINE    SYSLOG+WALL+EXEC
NOTIFYFLAG ONBATT    SYSLOG+WALL+EXEC
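
upsmon only runs NOTIFYCMD (and therefore upssched) for events whose NOTIFYFLAG includes EXEC, so it can be worth checking each flag list. A minimal sketch, assuming the flags are joined with `+` as in the lines above; the helper name `has_exec` is hypothetical and it inspects a single flag string, not a whole upsmon.conf.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a NOTIFYFLAG value such as
# "SYSLOG+WALL+EXEC" contains the EXEC flag, fails otherwise.
has_exec() {
    # Wrap the value in "+" so EXEC matches only as a whole flag.
    case "+$1+" in
        *+EXEC+*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   if has_exec "SYSLOG+WALL+EXEC"; then echo "upssched will be called"; fi
```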


Configure 'upssched' by editing upssched.conf
upssched.conf

webserver1
CMDSCRIPT /bin/upssched-cmd
PIPEFN /var/lib/ups/upssched/upssched.pipe
LOCKFN /var/lib/ups/upssched/upssched.lock
AT ONBATT ups-webserver2@localhost START-TIMER onbatt-ups-webserver2 600
AT ONLINE ups-webserver2@localhost CANCEL-TIMER onbatt-ups-webserver2

webserver2
CMDSCRIPT /bin/upssched-cmd
PIPEFN /var/lib/ups/upssched/upssched.pipe
LOCKFN /var/lib/ups/upssched/upssched.lock
AT ONBATT ups-webserver1@localhost START-TIMER onbatt-ups-webserver1 600
AT ONLINE ups-webserver1@localhost CANCEL-TIMER onbatt-ups-webserver1



Edit /bin/upssched-cmd
/bin/upssched-cmd

webserver1
# $1 is the timer name passed in by upssched
case "$1" in
        onbatt-ups-webserver1)
                logger -t upssched-cmd "UPS-Webserver1 has gone on battery."
                ;;
        onbatt-ups-webserver2)
                logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
                # Power-cycle the outlets of the UPS feeding webserver2
                /usr/bin/upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot
                ;;
        *)
                logger -t upssched-cmd "Unrecognized command: $1"
                ;;
esac

Webserver2
# $1 is the timer name passed in by upssched
case "$1" in
        onbatt-ups-webserver1)
                logger -t upssched-cmd "UPS-Webserver1 has gone on battery."
                # Power-cycle the outlets of the UPS feeding webserver1
                /usr/bin/upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot
                ;;
        onbatt-ups-webserver2)
                logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
                ;;
        *)
                logger -t upssched-cmd "Unrecognized command: $1"
                ;;
esac




