[Piuparts-devel] idle sleep times

Mon Jun 18 02:51:56 UTC 2012

On Sun, Jun 17, 2012 at 4:56 PM, Andreas Beckmann <debian at abeckmann.de> wrote:
>
> I think we need to distinguish "idle" timeouts and "error" timeouts ...
>

Consider that the discriminator on timeouts may be whether the slave
has work to upload or not. 'idle' and 'error' are both cheap. As a
test, I reduced the slave idle time to a very low level, 6 seconds. It
worked surprisingly well. Most slaves get 'master idle' and 'busy'
responses when under light load (during 'busy', one slave is keeping
the master busy most of the time). The downside is that the last slave
with work can have trouble getting access, and a master process is
spinning most of the time until the queue is clear.

Slaves with logs ready should retry more often.
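
As a rough illustration of that rule, here is a minimal sketch in
Python. The function name and both default delays are assumptions
made up for the example (the 60 minute idle figure is borrowed from
the list quoted further down); none of this is existing piuparts
code.

```python
def retry_delay(has_logs, idle_delay=3600, upload_delay=60):
    """Return the number of seconds a slave sleeps before retrying.

    has_logs     -- True if finished logs are waiting to be uploaded
    idle_delay   -- hypothetical sleep when there is nothing to upload
    upload_delay -- hypothetical, much shorter sleep when logs are ready
    """
    return upload_delay if has_logs else idle_delay
```

A slave holding logs would then come back after retry_delay(True)
seconds, while an empty-handed slave waits retry_delay(False).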

> ... "there is currently nothing to do
> and I will recompute the package state only in $seconds, so come back
> later (unless you have something important to say like finished logs)"
>

Package state caching would be a good thing. The problem of the
spinning master goes away, and it would be no big deal for the slaves
to query much more often. The version I submitted last year used
another directory to store empty logs representing packages available
for test.
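
To make the caching idea concrete, here is a small time-based sketch
in Python. It is not the directory-of-empty-logs approach mentioned
above, and the class name, the compute callback and max_age are all
hypothetical; it only shows why frequent slave queries become cheap
once the package state is recomputed at most once per interval.

```python
import time


class PackageStateCache:
    """Cache an expensive package state computation for max_age seconds."""

    def __init__(self, compute, max_age, clock=time.monotonic):
        self._compute = compute    # hypothetical expensive recomputation
        self._max_age = max_age    # seconds before the state is stale
        self._clock = clock        # injectable clock, handy for testing
        self._value = None
        self._stamp = None         # time of the last recomputation

    def get(self):
        """Return the cached state, recomputing only when it is stale."""
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._max_age:
            self._value = self._compute()
            self._stamp = now
        return self._value
```

With something like this on the master side, a slave polling every
few seconds only triggers a recomputation once per max_age window.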

>
> 12 hours disabled section - needs manual config update
> 15 minutes on error - hopefully transient, retry frequently
> 1-3 minutes busy - that will be cleared soon
> 60 minutes idle - don't recompute too frequently
>

Given package caching and master idle support, I would say that these
numbers are higher than they need to be.

A related topic - this discussion is essentially about optimizing test
resumption. There is a corresponding issue with slowdowns on test
sections that are nearly complete. When the number of packages left to
test is less than max-reserved * (num_slaves - 1), not all slaves
participate in clearing the queue. It eventually gets to the point
where only a single slave is working the section, even if there are
enough packages left to distribute the load. In testing
retest-manager, this slowdown effect is more significant than the
startup delays. Perhaps the master should someday adjust max-reserved
based on num_slaves and the packages remaining. In the meantime,
mitigation consists of reducing max-reserved (which package caching
would facilitate).
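
The arithmetic behind that slowdown can be sketched in Python. This
is a simplified model, assuming each slave greedily reserves up to
max-reserved packages; the function names are invented for the
example and the adjustment rule is only one possible policy, not
something the master does today.

```python
import math


def busy_slaves(packages_left, max_reserved, num_slaves):
    """Upper bound on slaves that can hold reservations at once."""
    if packages_left <= 0:
        return 0
    return min(num_slaves, math.ceil(packages_left / max_reserved))


def adjusted_max_reserved(packages_left, max_reserved, num_slaves):
    """Shrink max_reserved so the remaining packages spread over all slaves.

    Hypothetical policy: never exceed the configured max_reserved and
    never drop below 1. Assumes num_slaves >= 1.
    """
    fair_share = math.ceil(packages_left / num_slaves)
    return max(1, min(max_reserved, fair_share))
```

For example, with 60 packages left, max-reserved at 50 and 4 slaves,
busy_slaves(60, 50, 4) gives 2, so half the slaves sit idle. Lowering
the reservation to adjusted_max_reserved(60, 50, 4), which is 15,
lets all 4 slaves take a share.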

This matters because retest pruning is largely confined to a single
priority/section. Retesting should be confined the same way.