[Nut-upsdev] Asking hard questions about the NUT architecture

Eric S. Raymond esr at thyrsus.com
Tue May 29 20:07:45 UTC 2007


Charles Lepple <clepple at gmail.com>:
> Can you elaborate a little on how this SIGPWR shutdown works?

Sure.  First, the underlying hardware detects that power is going
down. How this is done varies; normally your motherboard has some
sort of threshold detector that picks up on a voltage drop from the
power supply and raises a hardware-level interrupt (often a pin assert
on the processor).

(See, for example, the "PCI Local Bus Specification" and
"PCI Bus Power Management Interface Specification" at http://pcisig.org.
Also see http://www.national.com/pf/LM/LMC6953.html for a description of
a typical PCI 2.1-compliant voltage-supervisor chip.)

> Specifically, what sends the signal, and what happens when it is
> received?

Your kernel catches the hardware-level interrupt and shotguns SIGPWR
at every process on the system. What happens then depends on
whether the apps have explicit SIGPWR handlers; the default
includes performing the equivalent of a sync on each file descriptor
the process has open.
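The post shows no code, so what follows is only a minimal sketch of what an explicit SIGPWR handler in an application might look like. It assumes Linux, where SIGPWR is defined, and a hypothetical application that keeps the data it cares about in a single descriptor; the names `data_fd`, `install_sigpwr_handler` and `power_failure_seen` are illustrative and not part of NUT or any library. Note that the Linux signal(7) man page lists the default action for SIGPWR as terminating the process, so installing a handler like this is how an application takes control of the flush itself.

```c
#define _GNU_SOURCE          /* expose sigaction() and SIGPWR even under strict -std= modes */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical application state: the one descriptor whose data matters. */
static int data_fd = -1;
/* Set by the handler; volatile sig_atomic_t is the type C guarantees is safe here. */
static volatile sig_atomic_t power_failing = 0;

/* Runs when SIGPWR arrives.  fsync() is on POSIX's list of
   async-signal-safe functions, so calling it from a handler is allowed. */
static void on_sigpwr(int signo)
{
    int saved_errno = errno;      /* don't clobber errno for the interrupted code */
    (void)signo;
    power_failing = 1;
    if (data_fd >= 0)
        (void)fsync(data_fd);     /* push this file's dirty buffers to the device */
    errno = saved_errno;
}

/* Install the handler for fd; returns 0 on success, -1 on failure. */
int install_sigpwr_handler(int fd)
{
#ifdef SIGPWR
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpwr;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;     /* resume interrupted slow syscalls afterwards */
    data_fd = fd;
    return sigaction(SIGPWR, &sa, NULL);
#else
    (void)fd;
    return -1;                    /* this platform does not define SIGPWR */
#endif
}

/* Lets the main loop notice the signal and begin its own orderly wind-down. */
int power_failure_seen(void)
{
    return power_failing != 0;
}
```

Keeping the handler to a flag plus one async-signal-safe call is deliberate: anything heavier (closing files, logging through stdio) belongs in the main loop once `power_failure_seen()` returns true.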

The sync calls tell the filesystems beneath the VFS layer that
somebody thinks core buffers are about to fall down go boom.  Ext3
and other modern filesystems immediately journal out uncompleted
I/O requests.

At some point before the supply voltage drops below the usable level,
your hard drives notice this, go "TILT!" and park their heads.

Your system responds safely to a power outage when the time required
to complete outstanding I/O is less than the interval between SIGPWR
and the moment the heads get parked.
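That safety condition can be written as a one-line comparison. The sketch below assumes, purely for illustration, that outstanding I/O drains at a steady write rate; the function names and the figures in the usage note are hypothetical, not measurements from any real system.

```c
#include <stdbool.h>

/* Seconds needed to flush outstanding I/O, assuming (hypothetically) that
   it drains at a steady rate. */
double drain_time_s(double pending_bytes, double write_bytes_per_s)
{
    return pending_bytes / write_bytes_per_s;
}

/* The condition from the text: the flush must finish inside the window
   between SIGPWR and the drives parking their heads. */
bool survives_outage(double pending_bytes, double write_bytes_per_s,
                     double window_s)
{
    return drain_time_s(pending_bytes, write_bytes_per_s) < window_s;
}
```

For example, with 1 MB outstanding, a 50 MB/s write rate and a 50 ms window, the flush takes 20 ms and fits; with 10 MB outstanding it needs 200 ms and does not.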

What's different from a decade ago is three things:

1) At the front end, voltage-supervisor chips are now ubiquitous
even on low-end PCs, thanks to PCI.  So you can count on SIGPWR
actually getting raised, which wasn't true under older buses such as
ISA or EISA.

2) The number of processor cycles modern systems have in the critical
interval is an order of magnitude higher than it used to be, and so is
the volume of data they can write.  So, in the computer's own terms,
the critical interval is much longer than it used to be (and getting
longer).

3) At the back end, journaling filesystems make it feasible to get
the buffers onto disk in a known-good state during the critical interval.
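Point 2 is plain arithmetic: the work available in the critical interval is rate times interval, for both CPU cycles and bytes written. The sketch below makes that concrete; the struct and function names are hypothetical, and the figures in the usage note (a 20 ms window, a 300 MHz / 10 MB/s machine versus a 3 GHz / 100 MB/s one) are made up for illustration and do not come from the post.

```c
#include <stdint.h>

/* How much work fits in the critical interval.  All inputs are
   hypothetical figures chosen for illustration. */
struct window_budget {
    uint64_t cycles;   /* processor cycles available in the window */
    uint64_t bytes;    /* bytes the storage path can write in the window */
};

struct window_budget budget_for(uint64_t cpu_hz, uint64_t write_bytes_per_s,
                                uint64_t window_us)
{
    struct window_budget b;
    /* Multiply before dividing so integer division doesn't truncate early;
       uint64_t has ample headroom for any realistic hold-up window. */
    b.cycles = cpu_hz * window_us / 1000000u;
    b.bytes  = write_bytes_per_s * window_us / 1000000u;
    return b;
}
```

With those illustrative numbers, the faster machine gets 60,000,000 cycles and 2,000,000 bytes in the same 20 ms, ten times the older machine's budget on both counts.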

Result: you can repeatedly pull the power plug out of the socket on a modern
Linux or Solaris system without risking your data.  UPSes are still useful,
of course, but they no longer do a cleaner or more reliable job of
shutdown than the OS itself.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>


