[Piuparts-devel] Thoughts on masterd

Dave Steele dsteele at gmail.com
Sun Nov 27 21:25:50 UTC 2011


>On Saturday, 26 November 2011, Andreas Beckmann wrote:
>> one master-directory on one master-host, but several concurrent
>> piuparts-master processes may be running on this driven by different
>> slaves ... and all writing to the same log file ==> BUG/RACE!
>
>that, and I really like the master-slave communication to stay simple too.
>instead of several masters master should rather become a daemon, me thinks.

OK, so we have a couple of problems with piuparts-master:
- though there are no obvious signs of problems here, concurrent master
  processes give some of us the willies
- it takes significant time, bandwidth and memory before master can respond
  to the first reserve() call from the slave

The second point is the driver for me. Between them, Andreas's patch and
mine greatly multiply the time that must pass before 'unimportant'
sections are run (and before important sections are returned to).

I've been thinking about how to fix this situation. The goals, as I
see them, are:

- minimize overall changes to the software
- greatly improve the performance of piuparts-master, as seen from
  piuparts-slave
- make it possible for the master process to run exclusively, with no
  concurrent instances


So here's the idea:

First, acknowledge that piuparts-master's job is to respond to
requests from the slave, quickly and with useful data. Move the
responsibility for updating repository status out of the service.

Next, modify piuparts-master to work exclusively with the filesystem,
under the section directory. The API functions "reserve", "unreserve",
"pass", "fail", "untestable", and "status" are all defined solely by
operations on the underlying filesystem. To support this, a new
directory, say "waiting", would be needed to hold the zero-length
"waiting-to-be-tested" package logs.
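A minimal Python sketch of such a filesystem-only master follows. The state directory names, the `<package>_<version>.log` file naming and the `FilesystemMaster` class are assumptions for illustration, not the actual piuparts-master code; `pass_` has a trailing underscore only because `pass` is a Python keyword. Reservation is a plain `os.rename()`, which is atomic within one POSIX filesystem, so a reserve or unreserve never leaves a log in two directories at once.

```python
import os

# Hypothetical layout: one subdirectory per state, one log file per
# package version, named <package>_<version>.log.
STATES = ("waiting", "reserved", "pass", "fail", "untestable")


class FilesystemMaster:
    """Master API backed only by files under a section directory."""

    def __init__(self, section_dir):
        self.section_dir = section_dir
        for state in STATES:
            os.makedirs(os.path.join(section_dir, state), exist_ok=True)

    def _path(self, state, name):
        return os.path.join(self.section_dir, state, name)

    @staticmethod
    def _name(package, version):
        return "%s_%s.log" % (package, version)

    def reserve(self, max_reserved=1):
        """Move up to max_reserved empty logs from waiting/ to reserved/."""
        result = []
        for name in sorted(os.listdir(os.path.join(self.section_dir, "waiting"))):
            if len(result) >= max_reserved:
                break
            if not name.endswith(".log"):
                continue
            os.rename(self._path("waiting", name), self._path("reserved", name))
            # Debian package names and versions contain no "_".
            package, _, version = name[:-len(".log")].partition("_")
            result.append((package, version))
        return result

    def unreserve(self, package, version):
        """Hand a reservation back: reserved/ -> waiting/."""
        name = self._name(package, version)
        os.rename(self._path("reserved", name), self._path("waiting", name))

    def _finish(self, state, package, version, log):
        """Write the slave's log into state/, then drop the reservation."""
        name = self._name(package, version)
        with open(self._path(state, name), "w") as f:
            f.write(log)
        os.remove(self._path("reserved", name))

    def pass_(self, package, version, log):
        self._finish("pass", package, version, log)

    def fail(self, package, version, log):
        self._finish("fail", package, version, log)

    def untestable(self, package, version, log):
        self._finish("untestable", package, version, log)
```

With `max_reserved=1`, each `reserve()` costs one directory listing and one rename, which is what makes a low max-reserved value affordable.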

I was going to say that package metadata would be cached in the
filesystem for piuparts-master's use, but the slave doesn't really
need this data, and so neither does master. Dropping this information
would limit the states available to the "status" command, and to my state
priority logic, to those determinable from the package log subdirectories.
I think that is a worthwhile tradeoff.
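With the metadata dropped, "status" reduces to directory listings. The sketch below assumes the same hypothetical `<state>/<package>_<version>.log` layout; `section_status` and `package_state` are illustrative names. Any state that needs package metadata cannot be derived this way, which is the tradeoff described above.

```python
import os

# Hypothetical layout: <section>/<state>/<package>_<version>.log
STATES = ("waiting", "reserved", "pass", "fail", "untestable")


def section_status(section_dir):
    """Count logs per state, using only the subdirectory listings."""
    counts = {}
    for state in STATES:
        try:
            names = os.listdir(os.path.join(section_dir, state))
        except FileNotFoundError:
            names = []  # a missing state directory simply means zero logs
        counts[state] = sum(1 for name in names if name.endswith(".log"))
    return counts


def package_state(section_dir, package, version):
    """Return the state directory holding this version's log, or None."""
    name = "%s_%s.log" % (package, version)
    for state in STATES:
        if os.path.exists(os.path.join(section_dir, state, name)):
            return state
    return None
```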

At this point you have a master API available to the slave which is
wicked fast, compared to now. As a side effect, it is much less likely
that two master processes of this type would run in parallel, even if
nothing further were done to prevent it. Because the master call becomes
so cheap, it becomes viable to turn the max-reserved value way down,
letting the slave return to 'important' sections sooner.

At the same time, this new piuparts-master is considerably simpler
than the current code.

Finally, you need a separate process to update the section directories
to reflect the current state of the remote repositories. This would
load a representation of the repository into RAM, then add and delete
log files in the directory tree as needed to match (only adding
zero-length logs, only to the directories that hold zero-length logs, and
respecting entries in 'reserved'). This guy could run as a cron job.
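A sketch of that cron-able sync step, assuming the repository has already been loaded into a `{package: version}` mapping (parsing the Packages file is left out). `sync_section` and the directory layout are hypothetical. It only creates zero-length logs in `waiting/` and never touches `reserved/`; as a simplification it deletes only stale `waiting/` entries and leaves cleanup of outdated pass/fail logs aside. In practice it would run while holding the section lock.

```python
import os

# Hypothetical layout: <section>/<state>/<package>_<version>.log
STATES = ("waiting", "reserved", "pass", "fail", "untestable")


def existing_logs(section_dir):
    """Map every log file name in the section to the state it is in."""
    found = {}
    for state in STATES:
        d = os.path.join(section_dir, state)
        if not os.path.isdir(d):
            continue
        for name in os.listdir(d):
            if name.endswith(".log"):
                found[name] = state
    return found


def sync_section(section_dir, repository):
    """Make waiting/ match `repository`, a {package: version} mapping."""
    waiting = os.path.join(section_dir, "waiting")
    os.makedirs(waiting, exist_ok=True)
    wanted = {"%s_%s.log" % (p, v) for p, v in repository.items()}
    found = existing_logs(section_dir)
    added, removed = [], []
    # New package versions get an empty log, unless a log already exists
    # in any state directory (including reserved/).
    for name in sorted(wanted - set(found)):
        open(os.path.join(waiting, name), "w").close()
        added.append(name)
    # Untested versions that vanished from the repository are dropped.
    for name, state in sorted(found.items()):
        if state == "waiting" and name not in wanted:
            os.remove(os.path.join(waiting, name))
            removed.append(name)
    return added, removed
```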

For cleanliness, a section lock could be used for any operation that
may add or delete a log file. Both piuparts-master and the cron process
would use and respect the lock. Use pid-checking and a timeout so that
a lock left behind by a dead or hung process cannot block the section
indefinitely.
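One way to build such a lock, sketched under assumptions: the lock is a file (path chosen by the caller) created with `O_CREAT | O_EXCL` and holding the owner's pid; it counts as stale once that pid is gone or the file is older than a timeout. `SectionLock` and its parameters are hypothetical. Two processes breaking the same stale lock at the same moment can still race; `fcntl.flock()` avoids that, since the kernel drops the lock when its holder exits.

```python
import os
import time


class SectionLock:
    """Pid-file lock; stale if the holder is dead or the file is too old."""

    def __init__(self, path, timeout=3600):
        self.path = path
        self.timeout = timeout  # seconds before a holder is presumed hung

    def _is_stale(self):
        try:
            with open(self.path) as f:
                content = f.read().strip()
            age = time.time() - os.path.getmtime(self.path)
        except FileNotFoundError:
            return False  # already gone; the caller just retries
        if age > self.timeout:
            return True
        if not content:
            return False  # holder created the file but has not written its pid yet
        try:
            pid = int(content)
        except ValueError:
            return True  # garbage content
        if pid <= 0:
            return True
        try:
            os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
        except ProcessLookupError:
            return True  # holder has exited
        except PermissionError:
            return False  # process exists but belongs to another user
        return False

    def acquire(self, wait=10.0, poll=0.1):
        """Try for up to `wait` seconds; return True once the lock is held."""
        deadline = time.time() + wait
        while True:
            try:
                fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            except FileExistsError:
                if self._is_stale():
                    try:
                        os.remove(self.path)
                    except FileNotFoundError:
                        pass
                    continue
                if time.time() >= deadline:
                    return False
                time.sleep(poll)
                continue
            with os.fdopen(fd, "w") as f:
                f.write(str(os.getpid()))
            return True

    def release(self):
        try:
            os.remove(self.path)
        except FileNotFoundError:
            pass
```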

The trade-off of this scheme is a small increase in the typical
latency before the slave can see a new submission. This matters mostly to
the guy sitting at the console at piuparts.d.o, who is trying to
quickly see the results of a debugging retest. For this guy, a tool
could be developed to submit the package/log to 'waiting', and
possibly kick off a slave to run it.
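Such a retest tool could be as small as the sketch below, again assuming the hypothetical `<state>/<package>_<version>.log` layout; `resubmit` is an illustrative name. It refuses to touch a version a slave currently has reserved, and simply deletes the previous result; a real tool might archive the old log instead.

```python
import os

DONE_STATES = ("pass", "fail", "untestable")


def resubmit(section_dir, package, version):
    """Queue package/version for retesting via an empty log in waiting/."""
    name = "%s_%s.log" % (package, version)
    if os.path.exists(os.path.join(section_dir, "reserved", name)):
        return False  # a slave is testing it right now; leave it alone
    for state in DONE_STATES:
        path = os.path.join(section_dir, state, name)
        if os.path.exists(path):
            os.remove(path)  # drop the previous result
    waiting = os.path.join(section_dir, "waiting")
    os.makedirs(waiting, exist_ok=True)
    open(os.path.join(waiting, name), "w").close()
    return True
```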

At this point, you have:
- an unmodified slave
- a master which is actually simpler
- an architecture with uncompromising support for section and state
  testing priority

Based on this, I humbly propose the following short-term roadmap:
- Test and roll out both Andreas's and my priority patches, as testing
  dictates. Both are needed, and work tolerably well under the current
  architecture.
- Develop the masterd capability as a CLI, cron-able utility (yes, it
  needs a better name)
- Add the section lock to both masterd and piuparts-master
- Rework piuparts-master to use the filesystem exclusively
- Test with mondo slaves and max-reserved=1, then roll out

Thoughts? (and thanks for making it this far)

