[Nut-upsdev] Asking hard questions about the NUT architecture

oss-list-ups at technorama.net oss-list-ups at technorama.net
Tue May 29 20:51:57 UTC 2007


On Tue, May 29, 2007 at 11:43:49AM -0400, Eric S. Raymond wrote:
> TROUBLING QUESTIONS:
> 
> Let's start with a simple question that strikes at the root of my discontent: 
> 
> 1. In an era when file systems are normally journaled and hardened so they 
> recover clean from power failures, what point is there in having your UPS
> initiate a controlled shutdown N seconds beforehand?

Did you forget about RAID?

The task of a RAID is to maintain an invariant between the data and the 
redundant information it stores. These invariants provide the ability to 
recover data in the case of a disk failure. For RAID-1, this means that 
each mirrored block contains the same data. For parity schemes, such as 
RAID-5, this means that the parity block for each stripe stores the 
exclusive-or of its associated data blocks.

However, because the blocks reside on more than one disk, updates cannot 
be applied atomically. Hence, maintaining these invariants in the face 
of failure is challenging. If a crash occurs during a write to an array, 
its blocks may be left in an inconsistent state. Perhaps only one mirror 
was successfully written to disk, or a data block may have been written 
without its parity update.
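
As a rough illustration of the parity case, here is a small Python sketch. It 
is not how any real RAID implementation is written; the block contents and the 
helper names (parity, stripe_consistent) are made up for the example. It shows 
a stripe whose data block was rewritten but whose parity update was lost, and 
what a later rebuild from that stripe produces.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks byte by byte (the RAID-5 style invariant)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_consistent(data_blocks, parity_block):
    """True if the stored parity matches the XOR of the data blocks."""
    return parity(data_blocks) == parity_block

# One stripe: three data blocks (made-up contents) plus their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(data)
consistent_before = stripe_consistent(data, p)

# Simulated crash: the new contents of block 0 reached the disk,
# but the matching parity write never did.
torn = [b"\xaa\xbb"] + data[1:]
consistent_after_crash = stripe_consistent(torn, p)

# If the disk holding block 1 now fails, it is rebuilt from the
# surviving blocks and the stale parity, which gives the wrong data.
rebuilt = parity([torn[0], torn[2], p])
rebuilt_matches_original = (rebuilt == data[1])
```

With an intact stripe the same rebuild, parity([data[0], data[2], p]), gives 
back block 1 exactly.  With the torn stripe it does not, and nothing at the 
filesystem layer can tell.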


Some ATA/SATA drives lie about disk writes (meaning the journal may not 
be updated).

RAM is often the first thing to experience problems during a low power 
condition.  Random data may end up on the disk during a power failure.

Databases need to shut down properly or they may end up corrupt.

Any application that modifies more than one file can end up in an 
inconsistent state after a power failure.


Journaling file systems can't help any of the above.


> 
> I sure don't see enough of one to justify a multi-layer architecture
> involving three concurrent processes, four configuration files, and a
> partridge in a pear tree.  NUT, as it is now, seems to me to be a
> textbook case of massive overengineering and overkill based on
> outdated assumptions.

Is this the 80's?  Do you have enough RAM for 3 processes?  NUT follows 
the Unix philosophy:
    Write programs that do one thing and do it well.
    Write programs to work together.
    Write programs to handle text streams, because that is a universal interface.


Many of the drivers can't be tested by the core NUT developers since 
they don't have the hardware.  If a single program crashes, how do you 
know if it's the driver portion, the monitoring portion, etc.?


> One possible reply is that filesystem hardening sometimes fails.  But
> that objection implies the right solution, which is to fix the
> filesystem hardening rather than messing around with compensatory
> kludges in userspace.

Perhaps you should spend your time hardening the hardware so that NUT 
and UPSes in general are obsolete.


> At modern disk-write speeds there is plenty
> of time for journaling OS buffers between SIGPWR and when the heads 
> no longer get juice.

See "RAM" above.


> Now I'll anticipate another possible reply: even if modern OS/filesystem
> layers have made UPS-controlled shutdowns rather pointless, there's
> still some case for NUT as a tool for logging the state history of
> your powerlines and the UPS itself. 

> Under this assumption, upsmon and its config files would go away but 
> there would still be a role for upsd and the drivers.

Maybe you should use Google (try "ext3 corruption") instead of making 
assumptions.

There are many ways that a filesystem can be corrupted after power 
failure.  The purpose of a UPS and NUT is to guard against them.


> The rest of this note will be founded on this assumption, because
> without it NUT would be *entirely* pointless.

...

> Which brings me to my next question: 
> 
> 2. Why should we care about contact-closure UPSes any more?  
> 
> NUT carries around a huge load of tools and documentation designed
> for a class of UPSes that can't be health-monitored.  All they can
> do is initiate shutdown before the actual power drop, which as I've 
> argued above is nowadays unnecessary.

See above.


> It follows that NUT could drop all support for contact-closure and
> "dumb" UPSes tomorrow and it wouldn't make a damn bit of difference
> to anybody, at least nobody in the Linux world.  Ext3 is everywhere
> now, and the odds that Linux distros will drop back to a non-hardened
> filesystem in the future are nil.

NUT is Linux only?  Maybe that should be mentioned on the web page.


> 3.  Why should we care about 'smart' serial UPSes any more?
> 
> I've already shown that supporting dumb UPSes is silly under modern
> conditions.  Supporting anything RS232, in particular smart serial
> UPSes, seems almost as pointless in 2007. USB UPSes are ubiquitous
> and cheap; when I recently went shopping for a UPS I found a
> USB-capable unit for $29.99.  Belkin no longer manufactures any
> RS232-only units at all and APC hasn't shipped a new RS232-only design
> since at least 2004.

A $30 UPS tends to fail more often than the power does.

Show me where I can find a 2000VA UPS for $30 to replace the one 
that I have.

> And there are good reasons to drop RS232 if it's only a legacy feature.
> Supporting RS232 greatly complicates the NUT codebase, documentation,
> and installation procedure, because those ports are hard to configure
> and don't announce themselves at device-connection time.

If you don't have an RS232 UPS then you don't need to configure a driver 
for it.

Removing drivers may be your personal preference but it has nothing to 
do with NUT configuration, maintenance or development.


> Concentrating on USB devices would have implications that are user-visible.
> In particular, it should mean the entire manual-configuration
> apparatus of NUT (everything in /etc/nut) could be made to go away.

Then how do we make network UPSes work?  How are those automatically 
configured?

Having the hardware drivers abstracted away from the monitoring is good 
design.  Not everything is USB and USB may be replaced in the future 
just like serial ports have partially been replaced today.


> 1. A drivers kit (ups-drivers).

Maybe you should suggest that to the Linux kernel developers first.

Why should the drivers be separate?  NUT is useless without drivers.

Do you mean RS232 drivers should be separate?  Why?  Do they take up too 
much disk space?  Is build time a problem?  Are we back in 1980?

Extra drivers sitting on the disk can only help people who need them.

It looks like you only want to make it a PITA to work with anything you 
consider "old".


> 2. A HAL interface for the drivers (ups-hal, depending on ups-drivers).
> 
> 3. For non-HAL environments, the aforementioned zero-configuration monitor
>    driven by hotplug requests (nutless, depending on ups-driver).
> 
> A good first step would be to at least package the drivers separately 
> from upsd and above, and to warn all NUT users that the upper layer
> will be going away in the future.

Bad ideas.  All of them.  Except for auto configuration of USB devices.



