[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
Tim Richards
tims_tank at hotmail.com
Mon Feb 13 13:08:09 UTC 2017
Charles,
Thanks for your reply. Indeed you may be right that the NUT fencing agent might be written with networked UPSes in mind, as healthy nodes could use the network to issue "fence" orders to remove unhealthy ones. I will post here if I find more info.
The problem with the resupply of services is that NUT doesn't restart on the node that comes back up. To recap: I pull the power on one UPS, and both nodes shut down. The remaining mains-connected UPS power cycles its outlets, which reboots its node. Because the node has just started, it wants all of its services to be healthy before providing them. This includes the fencing agent, which relies on NUT, which hasn't started. So the node doesn't start the rest of its services (Apache, MySQL, Samba).
Relevant log entries.
Feb 13 23:11:42 xinetd[1647] Reading included configuration file: /etc/xinetd.d/cups-lpd [file=/etc/xinetd.d/cups-lpd] [line=17]
Feb 13 23:11:42 systemd[1] Starting LSB: UPS monitoring software (deprecated, remote/local)...
Feb 13 23:11:43 usbhid-ups[2093] Startup successful
Feb 13 23:11:43 upsd[1932] Starting NUT UPS drivers ..done
Feb 13 23:11:43 upsd[2104] not listening on 192.168.1.22 port 3493
Feb 13 23:11:43 upsd[2104] listening on ::1 port 3493
Feb 13 23:11:43 upsd[2104] listening on 127.0.0.1 port 3493
Feb 13 23:11:43 upsd[2104] no listening interface available
Feb 13 23:11:43 startproc[2095] startproc: exit status of parent of /usr/sbin/upsd: 1
Feb 13 23:11:43 usbhid-ups[2093] Signal 15: exiting
Feb 13 23:11:43 upsd[1932] Starting NUT UPS server ..failed
Feb 13 23:11:43 systemd[1] upsd.service: Control process exited, code=exited status=7
Feb 13 23:11:43 systemd[1] Failed to start LSB: UPS monitoring software (deprecated, remote/local).
Feb 13 23:11:43 systemd[1] upsd.service: Unit entered failed state.
Feb 13 23:11:43 systemd[1] upsd.service: Failed with result 'exit-code'.
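The "not listening on 192.168.1.22" line suggests, though it does not prove, that the address was not usable at the moment upsd started. As a rough way to check this on the surviving node, here is a minimal Python sketch. The helper names failed_listeners and can_bind are made up for illustration and are not part of NUT; the script only pulls the address out of a log line like the one above and tries to bind a TCP socket to it.

```python
import re
import socket

# Matches upsd lines such as "not listening on 192.168.1.22 port 3493".
NOT_LISTENING = re.compile(r"upsd\[\d+\]:? not listening on (\S+) port (\d+)")


def failed_listeners(log_text):
    """Return (address, port) pairs that upsd reported it could not listen on."""
    return [(m.group(1), int(m.group(2))) for m in NOT_LISTENING.finditer(log_text)]


def can_bind(address, port=0):
    """Return True if a TCP socket can be bound to address right now.

    Port 0 asks the kernel for any free port, so this tests only whether the
    address is configured locally, not whether a particular port is free.
    """
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((address, port))
        return True
    except OSError:
        return False


if __name__ == "__main__":
    sample = "Feb 13 23:11:43 upsd[2104] not listening on 192.168.1.22 port 3493"
    for address, port in failed_listeners(sample):
        state = "available" if can_bind(address) else "not available"
        print(f"{address} (upsd port {port}): {state}")
```

If the address shows as not available while the second node is down, that would point at the network interface rather than at NUT itself, but that is only an assumption until tested.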
I can manually bring the surviving node's services back up by removing the requirement that Stonith services are enabled. However, I cannot get NUT to restart until I restart the second node.
Regards,
Tim.
-----Original Message-----
From: Charles Lepple [mailto:clepple at gmail.com]
Sent: Sunday, 12 February 2017 11:57 AM
To: Tim Richards
Cc: nut-upsuser Mailing List
Subject: Re: [Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
On Feb 10, 2017, at 5:48 PM, Tim Richards <tims_tank at hotmail.com> wrote:
>
> I am trying to kill two birds with one stone, that is UPS protection from power failure and cluster node fencing (Stonith) with the UPS ability to cut power to a node. Somebody has done this, as there exists a fencing agent using NUT in the Pacemaker/Corosync (Linux-HA cluster software), I just don't know the best way to go about it.
Some UPS models have more than one serial port, or have a network adapter which can support multiple monitoring systems (via SNMP or HTTP/XML). Is it possible that the NUT fencing agent was written with that case in mind? That would mean that neither node would depend on the other for UPS status.
Can you elaborate on the "resupply of services problem"? With cross-connected UPSes (and only a single comm port per UPS), I am not sure if you can achieve both goals when only one UPS loses power.
(I don't think this sort of setup has been discussed much on the NUT lists, although it certainly sounds like an interesting way to use NUT. If you do find out more about how the NUT fencing agent was intended to be configured, perhaps from the fencing software lists or forums, feel free to post that here as well.)
--
- Charles Lepple
https://ghz.cc/charles/