[Piuparts-devel] _throttle_if_overloaded stability

Andreas Beckmann anbe at debian.org
Mon Feb 25 11:27:06 UTC 2013


On 2013-02-25 04:32, Dave Steele wrote:
> On Sun, Feb 24, 2013 at 9:56 PM, Andreas Beckmann <anbe at debian.org> wrote:
>> On 2013-02-25 02:45, Dave Steele wrote:
>>> This has the potential to oscillate in the presence of multiple
>>
>> possible, but I see no solution without having the slaves communicate
>> with each other (or have a single but multithreaded slave)
>> and maybe I have seen such cases
>> but usually for a longer high-load period one slave seems to go on a
>> longer pause

which is actually a thing I'd prefer: pause a constant number of slaves
for a longer time instead of having all hop on and off

> It should be possible to have a fairly stable setup without having the
> slaves communicate.

what do you consider "stable" on a system with a nondeterministic
foreground load?

>>> slaves. I'd recommend reducing the secs range to under the loadavg
>>> time constant (say to 30-60 seconds, 15 to 45 would be better), and
>>> skipping the delay accumulation.
>>
>> wouldn't that make the situation worse? as many more slaves will hit the
>> short time slot where load is low enough for resume
>>
> 
> So if the load is low because of all of the now-paused slaves, they
> will start to kick back in earlier due to the higher checking rate,
> before the avg slowly sinks even lower. That is a good thing.

but once load has dropped \epsilon below resume_threshold, *all* slaves will
restart within one checking interval, possibly before the load average has
adjusted in between, so load jumps to resume_threshold - \epsilon + k for
k resumed slaves
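
to put numbers on that, a minimal sketch of the arithmetic. The function
name is made up for illustration, and it assumes each resumed slave adds
roughly 1.0 to the load, which is a simplification and not something
measured on a piuparts host:

```python
def load_after_resume(resume_threshold, epsilon, k, load_per_slave=1.0):
    """Estimate the load right after k paused slaves resume together.

    Assumption (illustrative only): every resumed slave contributes
    load_per_slave to the load, and the load average has not yet
    caught up when the slaves check it.
    """
    return resume_threshold - epsilon + k * load_per_slave


# Example: resume threshold 4.0, load just 0.1 below it, 5 slaves waiting.
print(load_after_resume(4.0, 0.1, 5))
```

with those example values the estimate comes out at about 8.9, i.e. well
above the threshold that allowed the slaves to resume in the first place.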

anyway, I was not trying to optimize idle time utilization by the slaves
but to keep the system sane if there is something else that should get
priority over the slaves

> All other things being equal, increasing the sampling rate will add
> stability, up to the Nyquist rate (about 30 seconds here).
> 
> Also, the -1 in the load_resume calculation is adding to any
> instability there is (i.e. it takes a thundering herd of slaves to
> bring the average from the lower load_resume back up to load_max).

so far this has been working quite nicely for me ... requiring much less
manual control on the number of slaves
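
for reference, a rough sketch of the kind of throttle loop under
discussion. This is not the actual piuparts code: the function signature,
the 30-90 second range and the injectable get_load/sleep/rng parameters
are assumptions made so the example is self-contained. Only the "-1"
resume margin, the random secs range and the delay accumulation are taken
from the thread:

```python
import os
import random
import time


def throttle_if_overloaded(load_max,
                           get_load=lambda: os.getloadavg()[0],
                           sleep=time.sleep,
                           rng=random):
    """Pause while the system is overloaded; return total seconds paused.

    Hypothetical sketch: pause once the 1-minute load average exceeds
    load_max, and resume only after it has dropped below load_max - 1.
    """
    if get_load() <= load_max:
        return 0

    # Resume threshold sits 1.0 below the pause threshold (the "-1").
    load_resume = max(load_max - 1.0, 0.0)

    delay = 0
    secs = rng.randrange(30, 90)  # assumed range, in seconds
    while True:
        sleep(secs)
        delay += secs
        if get_load() < load_resume:
            return delay
        # Delay accumulation: each further round waits longer, so a
        # slave that stays paused checks the load less and less often.
        secs += rng.randrange(30, 90)
```

because the wait grows on every round, a slave that has already been
paused for a while tends to stay paused, while recently paused slaves
check more often. That would be consistent with one slave going on a
longer pause during a high-load period, though the sketch does nothing to
coordinate slaves with each other.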

Hmm, that might actually be an interesting online scheduling problem for
my colleagues :-)


Andreas


