[Nut-upsuser] Shutdown the servers first, keep the network running
Greg Troxel
gdt at lexort.com
Sat Nov 23 13:13:13 GMT 2024
Dan Langille via Nut-upsuser <nut-upsuser at alioth-lists.debian.net>
writes:
> I have an idea for my shutdown process at home. My goal: maximize the network run-time. At present, the UPS has a run-time of about 57 minutes.
>
> This is my idea:
>
> * shutdown the servers after 15 minutes of downtime (for me, that's when battery.runtime hits 40)
> * leave the network gear (switches, firewall, wifi) running so I can continue with Internet access
>
> Optionally:
> * when we get down to 10 minutes, let everything else shutdown
>
> The goal: I can keep working from my home office - there's a separate UPS up there.
I live in a town with a high ratio of (trees that could fall on a line)
/ (electric meters), and thus we have a fairly large number of outages,
even though our power company does a great job.
I will second Kelly's point that once power has been out 5 minutes it is
unlikely to be back soon. I have been keeping track, and basically
- There are a lot of 3-5s outages. I believe these are faults that
clear themselves (squirrel stops conducting :-( or branch falls the
rest of the way) or have cleared by the time the recloser recloses.
Sometimes it is 5s out, 2s on, 5s out, 2s on, 5s out, back on.
Sometimes that and then just out. Often just out cleanly, and
sometimes pretty messy.
- There was one outage recently of just about 3 minutes. I don't
understand what happened, but it was apparently substation-wide. I
am guessing some protection tripped and because they were there it
could be brought up again fast. I have no memory of this ever
happening before.
- There was a scheduled outage at 0300, when it was well below
freezing, to remove branches from a 115 kV line, that lasted only
13m. A huge round of applause for the guy in the bucket truck who
did not damage the transmission line! Also, I had a 19m outage,
which I suspect was also planned (if not announced) as part of
restoring others after damage.
- After that, I think the fastest was 28m, and 40-70m typical, for
things that were "minor". There have been changes to distribution
protection and these are rarer; I think reclosers are effective at
ensuring that close-to-fault protection devices open, enabling the
rest to stay on.
- Plus some longer ones (multiple hours to small numbers of days),
resulting from more serious damage, from trees on wires to broken
poles.
So aside from the single 3m outage, I would have told you that once
power has been out for 30s, it's going to be 30m at least, maybe more.
But given that, the 5m guidance sounds good.
The other thing is that I more or less believe that running the UPS all
the way out is probably rougher on batteries than shutting down when it
claims 10m. But I also believe that UPS service is really tough on
batteries and they seem to be reliably in need of replacement at 4 years.
And, almost every battery I have pulled from a UPS (which I do when it
becomes troubled) has been messed up, usually a shorted cell or a very
weak cell. Whereas batteries proactively pulled after 5y from a FiOS
ONT are often ok. So I am not at all sure that trying to be nice to
the batteries is a good strategy.
So I would recommend:
- shut down servers after 5 minutes of outage
- shut down firewall and killpower when runtime <10m (or maybe 5m)
- have some way to start servers, such as switch controllable by
firewall
- once you have a way to bring servers back hands off, consider server
shutdown at 30s of outage, and restore after 15m of no outage
- go over all your non-server stuff and think about whether you can
reduce usage
- log outages and also transfer-to-battery events. Log remaining
runtime vs time so you can see what the mapping is from reported
runtime to actual runtime
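
To make the first two recommendations concrete, here is a minimal sketch of the decision logic in Python. It is only an illustration, not a NUT configuration: the function name, the action strings, and the constants are all made up for this example, and it assumes the caller supplies the on-battery state, the seconds elapsed since the outage began, and the reported runtime in seconds (NUT reports battery.runtime in seconds). Wiring it up to upsmon/upssched or to upsc output is left out.

```python
# Thresholds from the recommendations above; tune them to your own outage log.
SERVER_SHUTDOWN_AFTER_S = 5 * 60   # shut servers down after 5 minutes of outage
KILLPOWER_RUNTIME_S = 10 * 60      # shut everything down below 10 minutes of runtime


def decide(on_battery: bool, outage_seconds: float, runtime_seconds: float) -> str:
    """Return a (hypothetical) action name for the current UPS state.

    on_battery      -- True while the UPS is running from battery
    outage_seconds  -- time since the transfer to battery
    runtime_seconds -- remaining runtime as reported by the UPS
    """
    if not on_battery:
        # Power is present; nothing to do.
        return "none"
    if runtime_seconds < KILLPOWER_RUNTIME_S:
        # Low reported runtime wins over everything else:
        # shut down the firewall and the rest, then kill power.
        return "shutdown_all_and_killpower"
    if outage_seconds >= SERVER_SHUTDOWN_AFTER_S:
        # The outage has lasted long enough that it is unlikely to end soon;
        # shed the server load and keep the network gear running.
        return "shutdown_servers"
    # Short outage so far; ride it out.
    return "wait"


if __name__ == "__main__":
    # 6 minutes into an outage with 45 minutes of reported runtime left.
    print(decide(True, 6 * 60, 45 * 60))
```

The low-runtime check comes first on purpose: if the battery is already weak when the outage starts, everything goes down right away instead of waiting for the 5-minute mark.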
Your outage patterns may be different, so I may be off about the precise
timings. I suspect though, that there is a gulf between "protection
device restores power in seconds" and "truck roll". 28m for drive to
fault, visually inspect, decide it's cleared, replace fuse, is amazing
and only happens if the people are in the office next to the truck, and
even then it needs more luck.