[Nut-upsdev] Asking hard questions about the NUT architecture

Tue May 29 23:57:22 UTC 2007

Carlos Rodrigues <carlos.efr at mail.telepac.pt>:
> Journaling isn't magical. No amount of "fixing" is going to change that.

Agreed.  But UPS-controlled shutdown isn't magical either.  Nothing is.
The question is where our effort is best put.  I'm arguing that effort 
spent on UPS-controlled shutdown is inefficient.

> Well, I guess a UPS would fix at least the problems related to
> unexpected corruption due to power outages, which is much better than
> not having an UPS.

No argument there.  It's not the utility of UPSes I'm questioning.

> > Because normally the only response needed is to sync the file
> > descriptors, which is what happens anyway.  The few apps that get
> > SIGPWR wrong are probably screwing the pooch on SIGTERM too, which is
> > how they see a UPS-controlled shutdown.
> 
> Every application running as a daemon has a way to shut itself down.
> Whether it does that by handling SIGTERM or some other mechanism is
> completely irrelevant. What matters is that doing
> "/etc/init.d/someservice stop" does the trick, and a UPS triggered
> shutdown is going to do just that.

*blink*  *blink*

And a SIGPWR shutdown *isn't* going to do that?  

Um...what planet is your Unix from?  This behavior has been in POSIX 
since like 1987 or something.  The only issue is whether the grace period
between SIGPWR and when the voltage actually goes too low to be useful is
long enough.

> Even if you could have something that triggered an interrupt once the
> power went below some threshold, that thing would still have to be
> fast and accurate. Monitor chips like those supported by lm_sensors
> are neither, and I believe SMC cards aren't either.

Aha!  Finally something like a factual counterargument!  OK, you're 
right: if the voltage supervisor takes forever to notice that the 
power is going south, the grace period could get squeezed to nothing.  
But as I said to Pete, my systems have not behaved as though that's true
since before the turn of the century.

Heck, I've accidentally joggled a power-cord loose while typing to an
Emacs buffer and rebooted without losing even the very last character.
Surprised me first time it happened, because I started out c.1983
administering VAXen on which power glitches frequently meant you were
going to get all close and cuddly with fsdb for a couple hours.
Doesn't surprise me any more.

> But it doesn't end here... the work that the system would have to do
> to sync all there is to be synced would take far longer than the
> available "critical" window. Even if we ignore the fact that there
> would be no way to know if we are just writing garbage from RAM (like
> someone else mentioned). Try stopping an idle mysql instance one of
> these days to see what I mean... And then try stopping VMware (for
> something that takes a *really* long time)...

In principle you've got a point.  It certainly *could* be the case that
checkpointing MySQL or VMware takes too long.  It certainly *could* be 
the case that RAM flakes out first and corrupts your FS image.  But I've
never seen screwage for which this was a least hypothesis.

What we need is data.  Do you have any?
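Both objections reduce to a simple time budget. The hold-up time after mains loss has to cover the supervisor's detection latency plus the time needed to flush. The back-of-envelope sketch below makes that budget explicit. The class and field names are hypothetical, and the figures in the comments are placeholders, not measurements of any real hardware.

```python
from dataclasses import dataclass

@dataclass
class PowerFailBudget:
    """Hypothetical power-fail timing figures, all in milliseconds."""
    holdup_ms: float  # how long the supply stays in spec after mains loss
    detect_ms: float  # delay before the voltage supervisor raises the alarm
    flush_ms: float   # time needed to push dirty buffers to stable storage

    def slack_ms(self) -> float:
        """Time left after detection and flush; negative means data at risk."""
        return self.holdup_ms - self.detect_ms - self.flush_ms

    def is_safe(self) -> bool:
        return self.slack_ms() >= 0

# Placeholder figures only: PowerFailBudget(20, 2, 10).slack_ms() is 8.
```

With the same placeholder hold-up and flush times, a slow supervisor (`detect_ms=15`) drives the slack to -5 ms. That is the squeezed grace period conceded above. Measured values for real supervisors, and for flushing something like an idle MySQL instance, would settle the question.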

> There's a lot of stuff out there that "make zero configuration
> impossible". That's just the way it is.

I regard that as a bug to be fixed, not an excuse for inertia.

> And I know the value of my time, yes. I also know the value of my
> money, and the (environmental) cost of disposing of hardware in
> perfectly working order.

So, where did this myth arise that I want you to dispose of anything?

> But the fact is, NUT's complexity isn't legacy driven.

Nonsense.  It absolutely is.  I've looked at the dataflows.  Starting
from the udev-hotplug agent I wrote for gpsd (which solves an even
trickier version of the problem, because you can't actually deduce a
GPS's type from the USB vendor-id/product-id pair) I'm quite confident
I could write a zero-configuration lightweight monitor for
single-USB-UPS/single-computer installations in Python using fewer
than ten working days.
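For the single-USB-UPS case, the detection step of such a monitor can be sketched as a scan of Linux sysfs. Sysfs exposes each USB device's vendor and product IDs as hex strings in `idVendor` and `idProduct` files under `/sys/bus/usb/devices`. This is an illustration under stated assumptions. It is not the monitor proposed here, and it is not NUT code. `UPS_TABLE`, `read_hex` and `find_upses` are hypothetical names. The ID-to-driver pairings are assumed for illustration only. `usbhid-ups` is a real NUT driver, but any pairing should be checked against NUT's own hardware list before use.

```python
import os
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL table: (vendor-id, product-id) -> NUT driver name.
# Entries are illustrative assumptions, not an authoritative mapping.
UPS_TABLE: Dict[Tuple[int, int], str] = {
    (0x051D, 0x0002): "usbhid-ups",  # assumed: an APC USB model
    (0x0764, 0x0501): "usbhid-ups",  # assumed: a CyberPower USB model
}

def read_hex(path: str) -> Optional[int]:
    """Read a sysfs ID attribute such as "051d" as an integer, or None."""
    try:
        with open(path) as f:
            return int(f.read().strip(), 16)
    except (OSError, ValueError):
        return None

def find_upses(sysfs_root: str = "/sys/bus/usb/devices") -> List[Tuple[str, str]]:
    """Return (device-entry, driver) for each USB device listed in UPS_TABLE."""
    found: List[Tuple[str, str]] = []
    try:
        entries = sorted(os.listdir(sysfs_root))
    except OSError:
        return found  # no sysfs (or wrong root): nothing detected
    for name in entries:
        dev = os.path.join(sysfs_root, name)
        vid = read_hex(os.path.join(dev, "idVendor"))
        pid = read_hex(os.path.join(dev, "idProduct"))
        if vid is None or pid is None:
            continue  # interface entries (e.g. "1-1:1.0") carry no ID files
        driver = UPS_TABLE.get((vid, pid))
        if driver is not None:
            found.append((name, driver))
    return found
```

The argument above assumes that, unlike a GPS, a UPS's vendor/product pair is enough to choose a driver. That assumption is what makes a plain table lookup plausible here. A udev hotplug rule could call a scan like this when a device appears, so no configuration step is needed.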

I'm not going to do it, though, without knowing the NUT project
would actually *use* it.  At this point that is far from established,
though Arnaud does seem to grasp the issues and the need better than anyone
else who has responded.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>