[Nut-upsuser] Edge case in our NUT deployment, asking for guidance
Arthur Desplanches
adesplanches at buf.com
Thu Apr 21 16:53:23 BST 2022
Hi,
Thanks for the confirmation regarding the FSD flag.
I tried to use nut-ipmipsu for this, since the PSUs were picked up by
nut-scanner on my test server. But I wasn't able to configure the
driver correctly, or maybe the current version isn't yet compatible
with my chassis. I'll work on that over the next few days. If that
doesn't pan out, I'll dig into upssched to see if a solution lies there.
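
For reference, the PSU check described in the original message (quoted
below) could be sketched roughly as follows. This is not the actual
script, only a minimal illustration in Python. It assumes that
`ipmitool sdr type "Power Supply"` prints pipe-separated lines such as
"PS1 Status | C8h | ok | 10.1 | Presence detected"; sensor names and
state wording vary between BMC vendors, so the matched strings are
assumptions, and the function names are hypothetical.

```python
import subprocess

# Substrings assumed to indicate a PSU without input power (vendor-dependent).
FAULT_WORDS = ("ac lost", "input lost", "failure")


def psu_states(sdr_text):
    """Map each PSU sensor name to True if it appears present and powered."""
    states = {}
    for line in sdr_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # not a sensor line in the assumed format
        name, status, reading = fields[0], fields[2], fields[4].lower()
        present = "presence detected" in reading
        faulted = any(word in reading for word in FAULT_WORDS)
        states[name] = status == "ok" and present and not faulted
    return states


def should_shut_down(sdr_text):
    """Stay up only if at least two PSUs are seen and all have power."""
    states = psu_states(sdr_text)
    return not (len(states) >= 2 and all(states.values()))


def read_sdr():
    """Query the BMC; any failure yields empty output, which means shut down."""
    try:
        result = subprocess.run(
            ["ipmitool", "sdr", "type", "Power Supply"],
            capture_output=True, text=True, timeout=30, check=True,
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError):
        return ""


if __name__ == "__main__":
    print("shutdown" if should_shut_down(read_sdr()) else "stay up")
```

The decision function is kept separate from the IPMI call so that every
failure mode (IPMI not answering, a single PSU, one PSU without power)
falls through to "shut down", matching the behaviour described below.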
Thanks
Arthur Desplanches
On 4/17/22 01:13, Jim Klimov wrote:
> As far as I know the FSD flag by design can only be raised; many
> phrases refer to it as "latching" - much for the same reasons as you
> outlined: people usually want the datacenter in a predictable
> hands-off state. If something begins to shut down due to critical
> power state of the UPS, everything should power-cycle and come up
> together and in order. So the only way to clear FSD is to restart the
> daemons raising it.
>
> Note some UPSes and their smart drivers would treat as critical any
> situation where battery charge is under a certain threshold - even if
> online and charging at the moment, since the UPS is too depleted for a
> safe shutdown if power is lost again.
>
> I wonder if you can fiddle with the nut-ipmipsu driver for your case.
> NUT has a way to treat a blade chassis as an ePDU for the blades. Maybe
> you can get upsmon to monitor a UPS and the other PSU on redundant-PSU
> systems.
>
> Also see if some smarter scripting with upssched (as the handler of
> signals from upsmon for complex situations) can help...
>
> Hope this helps,
> Jim Klimov
>
> On Fri, Apr 15, 2022, 14:17 Arthur Desplanches <adesplanches at buf.com>
> wrote:
>
> Hi,
>
> I'm working on deploying NUT for our main server room, and I've put
> myself in a situation where there could be an edge case that I don't
> really like, and so I'm asking for a bit of guidance.
>
> Most of the deployment is fairly standard according to the
> documentation (big thanks for its exhaustiveness, by the way); the
> main change is my shutdown script. It checks whether we have power on
> both PSUs using IPMI, and if so, it doesn't shut down (in any other
> case it does: IPMI doesn't work, only one PSU is receiving power, only
> one PSU exists on the machine, etc.). We did this because about 90% of
> our machines have dual redundant PSUs, with one on the UPS and the
> other on mains but on a separate circuit. So we could have a situation
> where the UPS loses power, but we still have some on a secondary
> circuit.
>
> We chose to accept the fact that if the secondary circuit loses power
> after our NUT server has sent a forced shutdown sequence, we may have
> a bad shutdown at that point.
>
> What could happen in this situation is that a machine that is a
> nut-server (A) still has the FSD flag set (because it didn't shut
> itself down) even after power comes back and some machines restart.
> In this case, upsmon on the freshly started machines will see the
> flag and shut them down again.
>
> Our current workaround would be to be aware of this and restart
> nut-server and then nut-monitor on machine A before starting back up
> any of its clients that are currently down.
>
> Is there a better way to handle this edge case? Or a better way to
> articulate this? Maybe a way to automatically clear the FSD flag?
>
> Thanks for the help
> Arthur
>
> --
> Arthur Desplanches
> Sysadmin @ BUF Compagnie (buf.com)
--
Arthur Desplanches
Sysadmin