[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
Tim Richards
tims_tank at hotmail.com
Sun Feb 5 23:25:23 UTC 2017
Hello List,
Any suggestions to solve the following would be most appreciated.
Setup: Active/Passive two-node cluster. Two UPSes (APC Smart-UPS 1500 C) with USB communication cables cross-connected (i.e. UPS-webserver1 is monitored by webserver2, and vice versa) to allow for stonith/fencing.
OS OpenSuse Leap 42.2
NUT version 2.7.1-2.41-x86_64
Fencing agent: external/nut
Problem: When power fails to a single UPS, both nodes are shut down. The node with the still-powered UPS comes back up, but requires manual intervention to keep it providing services. I would like only the node with the "On Battery" UPS to shut down.
The problem with restoring services seems to be that NUT on the node that comes back up will not restart until the other node restarts.
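For anyone reproducing this, the state of each UPS can be read with upsc on the node that holds its USB cable. The following is a minimal sketch, assuming the UPS names used in this post; the helper name ups_state is hypothetical and not part of NUT. It only interprets a status string passed to it, so it can be tried without a UPS attached.

```shell
# Classify a NUT ups.status string such as "OL", "OB DISCHRG" or "OB LB".
# ups_state is a hypothetical helper name, not part of NUT.
ups_state() {
    case " $1 " in
        *" OB "*) echo "on-battery" ;;  # OB = on battery
        *" OL "*) echo "online" ;;      # OL = on line power
        *)        echo "unknown" ;;
    esac
}

# Example use on webserver1 (needs a running upsd, so left as a comment):
#   ups_state "$(upsc ups-webserver2@localhost ups.status)"
```

Running the commented example on both nodes during a single-UPS outage should show which side reports on-battery.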
Stonith and my upssched-cmd script both use
upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot
or
upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot
as appropriate. When the cluster software (Pacemaker/Corosync) uses one of the above commands as part of a fencing operation, only the target node is shut down, and its UPS's outlets are power-cycled. When NUT, via my upssched-cmd script, issues one of the above commands, both nodes shut down and both of their UPSes' outlets are power-cycled.
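Because of the cross-connection, the command for a given node always goes to the upsd running on its peer. As a sketch of that mapping only, the hypothetical helper fence_cmd below prints (and does not run) the upscmd line for a target node, in the ups@host form that upscmd expects. The node names, user names and password are the ones from this post.

```shell
# Print (do not run) the upscmd line that power-cycles the UPS feeding TARGET.
# fence_cmd is a hypothetical helper; names and password follow this post.
fence_cmd() {
    target=$1
    case $target in
        webserver1) host=webserver2 ;;  # ups-webserver1 is cabled to webserver2
        webserver2) host=webserver1 ;;  # ups-webserver2 is cabled to webserver1
        *) echo "unknown node: $target" >&2; return 1 ;;
    esac
    echo "upscmd -u ups-${target}-master -p mypassword ups-${target}@${host} shutdown.reboot"
}

# Example: show the command that would fence webserver2.
#   fence_cmd webserver2
```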
This problem should be very rare, but it would be better to cover it rather than not.
Power failure and resupply to both UPSes (the most common problem for me) works well. I use upssched to set the same timers after power failure on each system. They receive simultaneous shutdown commands, which they obey. When power returns they both come back up.
Stonith/Fencing via the external/nut stonith resource agent works.
Thanks,
Tim.
My config files
ups.conf
On webserver1
[ups-webserver2]
driver = usbhid-ups
port = auto
desc = "APC Smart-UPS C 1000/1500va"
vendorid = 051d
On webserver2
[ups-webserver1]
driver = usbhid-ups
port = auto
desc = "APC Smart-UPS C 1000/1500va"
vendorid = 051d
nut.conf
MODE=netserver
upsd.conf
Webserver1
LISTEN 127.0.0.1 3493
LISTEN ::1 3493
LISTEN 192.168.1.21 3493
Webserver2
LISTEN 127.0.0.1 3493
LISTEN ::1 3493
LISTEN 192.168.1.22 3493
upsd.users
defines users (special settings required for stonith to work)
On webserver1
[ups-webserver2-slave]
password = mypassword
actions = SET
instcmds = ALL
upsmon slave
[ups-webserver2-master]
password = mypassword
actions = SET
actions = FSD
instcmds = ALL
upsmon master
On webserver2
[ups-webserver1-slave]
password = mypassword
actions = SET
instcmds = ALL
upsmon slave
[ups-webserver1-master]
password = mypassword
actions = SET
actions = FSD
instcmds = ALL
upsmon master
upsmon.conf
Webserver1
MONITOR ups-webserver1@webserver2 1 ups-webserver1-master mypassword master
MONITOR ups-webserver2@localhost 0 ups-webserver2-slave mypassword slave
Webserver2
MONITOR ups-webserver2@webserver1 1 ups-webserver2-master mypassword master
MONITOR ups-webserver1@localhost 0 ups-webserver1-slave mypassword slave
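In a MONITOR line, the number after the system name is the power value: how many of this host's power supplies that UPS feeds, with 0 meaning the UPS is watched but does not power this host. As a small sketch, the hypothetical function feeding_upses below lists the UPSes in an upsmon.conf-style file that actually feed the local node. It assumes the system names are written in ups@host form.

```shell
# List the systems from MONITOR lines whose power value is greater than 0.
# feeding_upses is a hypothetical helper. $1: path to an upsmon.conf-style file
feeding_upses() {
    awk '$1 == "MONITOR" && $3 + 0 > 0 { print $2 }' "$1"
}

# Example (path is an assumption; adjust to where your distribution keeps it):
#   feeding_upses /etc/ups/upsmon.conf
```

With the webserver1 lines above, this would print only ups-webserver1@webserver2.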
upsmon.conf also needs the following so that events are passed to upssched:
upsmon.conf
NOTIFYCMD /usr/sbin/upssched
NOTIFYFLAG ONLINE SYSLOG+WALL+EXEC
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
Configure 'upssched' by editing upssched.conf
upssched.conf
webserver1
CMDSCRIPT /bin/upssched-cmd
PIPEFN /var/lib/ups/upssched/upssched.pipe
LOCKFN /var/lib/ups/upssched/upssched.lock
AT ONBATT ups-webserver2@localhost START-TIMER onbatt-ups-webserver2 600
AT ONLINE ups-webserver2@localhost CANCEL-TIMER onbatt-ups-webserver2
webserver2
CMDSCRIPT /bin/upssched-cmd
PIPEFN /var/lib/ups/upssched/upssched.pipe
LOCKFN /var/lib/ups/upssched/upssched.lock
AT ONBATT ups-webserver1@localhost START-TIMER onbatt-ups-webserver1 600
AT ONLINE ups-webserver1@localhost CANCEL-TIMER onbatt-ups-webserver1
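upsmon only runs NOTIFYCMD (here upssched) for events whose NOTIFYFLAG includes EXEC, so an AT rule for an event without EXEC never fires. A rough sketch of a consistency check between the two files follows. The function name missing_exec is hypothetical, and the check only looks at the NOTIFYFLAG and AT lines, nothing else.

```shell
# Print each event that upssched.conf has an AT rule for but that upsmon.conf
# does not flag with EXEC. missing_exec is a hypothetical helper.
# $1: upsmon.conf-style file, $2: upssched.conf-style file
missing_exec() {
    awk '
        mode == "flags" && $1 == "NOTIFYFLAG" && $3 ~ /(^|[+])EXEC([+]|$)/ { ok[$2] = 1 }
        mode == "at" && $1 == "AT" && !($2 in ok) && !seen[$2]++ { print $2 }
    ' mode=flags "$1" mode=at "$2"
}

# Example (paths are assumptions):
#   missing_exec /etc/ups/upsmon.conf /etc/ups/upssched.conf
```

No output means every event used in an AT rule is handed to upssched.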
Edit /bin/upssched-cmd
/bin/upssched-cmd
webserver1
case $1 in
    onbatt-ups-webserver1)
        logger -t upssched-cmd "UPS-Webserver1 has gone on battery."
        ;;
    onbatt-ups-webserver2)
        logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
        /usr/bin/upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot
        ;;
    *)
        logger -t upssched-cmd "Unrecognized command: $1"
        ;;
esac
Webserver2
case $1 in
    onbatt-ups-webserver1)
        logger -t upssched-cmd "UPS-Webserver1 has gone on battery."
        /usr/bin/upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot
        ;;
    onbatt-ups-webserver2)
        logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
        ;;
    *)
        logger -t upssched-cmd "Unrecognized command: $1"
        ;;
esac