Atlas proposal

Wed Aug 18 00:24:05 UTC 2010

On Wed, 18 Aug 2010, Samuel Thibault wrote:
> Don Armstrong wrote on Tue 17 Aug 2010 15:13:15 -0700:
> > All of these are things that can be detected at run time and
> > appropriate libraries dlopened or codepaths diverged, etc.
> 
> Errr, then you'll need a myriad of libraries/codepaths for all the
> combinations of L1/L2/L3 cache sizes, number of processors, speed,
> etc. etc.

You've got them already; you either select them during build-time, or
you select them during run-time.
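
A rough sketch of what the run-time half could look like, in Python for
brevity. Everything named here is an assumption for illustration: the
variant table, the library paths, and the idea of keying on L2 size
alone are hypothetical, not something Atlas ships. The sysfs file is
Linux-specific, and index2 is usually, but not always, the L2 cache.

```python
# Hypothetical table: the L2 size (in KiB) each prebuilt variant was
# tuned for, mapped to an illustrative (made-up) library path.
VARIANTS = {
    256: "/usr/lib/atlas-l2-256k/libatlas.so",
    1024: "/usr/lib/atlas-l2-1m/libatlas.so",
    4096: "/usr/lib/atlas-l2-4m/libatlas.so",
}


def parse_cache_size(text):
    """Convert a sysfs-style size such as '256K' or '4M' to KiB."""
    text = text.strip().upper()
    units = {"K": 1, "M": 1024}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text) // 1024  # no suffix: treat as a plain byte count


def detect_l2_kib(path="/sys/devices/system/cpu/cpu0/cache/index2/size"):
    """Return the detected L2 size in KiB, or None if it is unavailable."""
    try:
        with open(path) as f:
            return parse_cache_size(f.read())
    except (OSError, ValueError):
        return None


def pick_variant(l2_kib, variants=VARIANTS):
    """Pick the largest variant whose assumed L2 size fits the machine.

    Falls back to the smallest variant when detection failed or the
    machine's L2 is smaller than every variant assumes, since a build
    tuned for too large a cache is the case that hurts most.
    """
    if l2_kib is None:
        return variants[min(variants)]
    fitting = [size for size in variants if size <= l2_kib]
    return variants[max(fitting)] if fitting else variants[min(variants)]
```

The result of pick_variant(detect_l2_kib()) could then be handed to
dlopen (or ctypes.CDLL from Python). A real selection would have to
look at more than L2 size, which is exactly the combinatorial problem
being discussed.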

> > This answers the wrong half of the question; there's no way to
> > know at build time what precisely the machine is going to be
> > doing.
> 
> In HPC, yes: the machine will just be running atlas.

Unless it's a particularly boring problem, there will be a set of
processes which are using atlas; it's a library, after all, and there
are myriads of different problems which use it to varying degrees.

For example, if I'm using atlas from R, I may be running one R process
per core via Rmpi, each of which calls atlas for only part of the
calculation, so each should probably be using only one core. Or, I may
only be running one R process, and atlas should be using all of the
available cores instead.
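
The arithmetic behind that example, as a minimal sketch in Python. The
function name and the even-split policy are my assumptions; how the
resulting number would actually be handed to the library is left out,
since that depends on the BLAS implementation in use.

```python
import os


def threads_per_process(n_processes, n_cores=None):
    """Split the available cores evenly among cooperating processes.

    Each process gets at least one thread, even when there are more
    processes than cores.
    """
    if n_cores is None:
        n_cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, n_cores // max(1, n_processes))
```

With eight cores, eight R processes would get one thread each, while a
single R process would get all eight. Neither number is knowable when
the library is built.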

> > It's wrong even in HPC unless you tweak the settings of atlas
> > compilation for your particular problem set as well as your
> > hardware and software architecture.
> 
> Err, what example of tuning? The hardware architecture is known: the
> atlas build system is running on it.

Sure, but atlas can't know at build time what the state of the machine
will be when it's running. Optimization isn't just about having one
subset of the problem make maximal use of all of the hardware on a
machine; it's about maximizing the throughput of the problem as a
whole, which may mean that atlas shouldn't use all of the cache, or
shouldn't use as many cores as exist on a particular machine, etc.

> > But all of that is fine; we can't possibly hope to optimize to get the
> > last iota of performance out of a system. We should attempt to provide
> > a reasonable set of optimized binaries (whether that means one or ten
> > is up to the package maintainer),
> 
> The problem is that currently the Atlas build system doesn't have
> any way to do generic optimization rather than aggressive L1/L2/L3
> cache-size-related optimizations, which will actually make
> performance much worse when running on a machine with a smaller L2,
> for instance.

Oh, I've no doubt that there are serious design issues that need to be
addressed by Atlas upstream, and that we may have to live with a
suboptimal solution. We should just make sure that the packages that
we distribute work as well as we can make them, and provide good
documentation (or an -auto package?) so that people who need/want an
optimized build can make one.

Don Armstrong

-- 
Nothing is as inevitable as a mistake whose time has come.
 -- Tussman's Law

http://www.donarmstrong.com              http://rzlab.ucr.edu