[Aptitude-devel] Resolver status.
Daniel Burrows
dburrows at debian.org
Sun May 10 17:22:51 UTC 2009
On Mon, Apr 20, 2009 at 08:09:32AM -0700, Daniel Burrows <dburrows at debian.org> was heard to say:
> The first one is a very standard optimization in constraint
> satisfaction. Currently, at every step, aptitude examines each
> dependency that it still has to solve, computes the versions that will
> solve that dependency, and examines each one to see if it's a legal
> solution and not known to be a dead-end. This is expensive, and it
> gets more expensive the larger the search gets (larger searches
> generally have more broken dependencies at any given step).
>
> The solution to this is to precompute and cache the list of possible
> solutions to each dependency. In fact, you go a step farther:
> precompute and cache the list of *possible versions of each package*.
> This requires generating more search nodes, since you have to account
> for the fact that a dependency could be solved by package A or package
> B. But the benefit is that if two dependencies require non-overlapping
> versions of a package, this allows you to detect it much earlier and
> throw those inconsistent branches of the search out.
I've been looking into this. After considering the practical
consequences, I don't think that generating extra search nodes for each
package a dependency could install is a good idea: the occasional
trimming of the search tree isn't worth all those extra nodes. (I can't
say this for sure without implementing it, of course -- if this were an
AI research project, I might do that to see whether it worked.) I do have a
plan for catching this sort of situation early in some restricted but
common cases, though. Basically: once a dependency is reduced to being
solved by a single package, we can check its consistency forward against
other dependencies that are also reduced to that package. This is a
common situation and I expect it to be a flat-out win as far as early
detection of unsolvable dependencies goes.
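The forward check described above can be sketched roughly as follows. This is a minimal illustration in Python, not aptitude's actual C++ code, and all names are invented: each dependency maps to its remaining (package, version) solvers, and once a dependency's solvers collapse to a single package, its allowed versions are intersected against other dependencies already pinned to that package.

```python
# Hypothetical sketch (not aptitude's real code) of the early
# consistency check for dependencies reduced to a single package.

def check_single_package_deps(deps):
    """deps maps a dependency name to a set of (package, version) solvers.

    Returns the pairs of dependencies that are mutually unsatisfiable
    because they demand non-overlapping versions of the same package."""
    pinned = {}  # package -> list of (dep, allowed version set)
    conflicts = set()
    for dep, solvers in deps.items():
        packages = {pkg for pkg, _ in solvers}
        if len(packages) != 1:
            continue  # only the single-package case is handled here
        (pkg,) = packages
        versions = {ver for _, ver in solvers}
        # Compare against every dependency already pinned to this package.
        for other_dep, other_versions in pinned.get(pkg, []):
            if not versions & other_versions:
                conflicts.add(frozenset((dep, other_dep)))
        pinned.setdefault(pkg, []).append((dep, versions))
    return conflicts
```

A dependency solvable only by libfoo 1 or 2 and another solvable only by libfoo 3 would be flagged immediately, without expanding either branch.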
I also noticed my side note about triggers. This is something of a
tie-in to some ideas I had last month about using a more general
constraint language to express the rules the resolver uses internally to
guide its search. This would give us some more internal separation
between different bits of logic and an easy way to hook in new
constraints and promotions, at the expense of making things more complex
and maybe making optimization harder. Now that I have a plan of attack
on this problem, though, I don't think this approach would actually work
out; the constraint language would have to be quite sophisticated to
represent all the book-keeping that's needed, and it's probably more
work than it's worth.
One important thing to remember is that aptitude actually does a
decent job of avoiding exponential blowup right now. The most complex
cases I could find while trying to stress-test it recently generated
something on the order of several thousand search steps -- not bad
considering that there were hundreds of broken dependencies.
The problem that I'm trying to solve right now is that for a mere few
thousand search steps, aptitude can take up to several minutes! That's
not acceptable, and I think it can be fixed with some book-keeping to
not do the same work over and over again.
Here's the implementation plan that I have so far. These are just
some notes I typed up -- I have a few changes at the end under
"outstanding questions" that I haven't merged into the rest of the text
yet. (The section title should really be "questions that were
outstanding when I got here, but that I have now answered to my
satisfaction".)
----- cut here -----
MemoizingSolvers
The new representation of steps needs to support the following actions:
* Finding the dependency with the fewest solvers.
* When a version is chosen, dropping any dependencies that it solves
from the set of outstanding dependencies.
* When a version is chosen, inserting any dependencies that it breaks
into the set of outstanding dependencies, but not resetting solver
lists if the dependency was already broken.
* When a version is chosen, striking versions that it structurally rules
out from dependency solver lists.
* Noting the tier of each solver of a dependency, and updating this
incrementally as new promotions are posted. If all the solvers reach
a higher tier than the current search tier, the step should be
deferred to that tier. If a version triggers a promotion to the
conflict tier, it should be struck from solver lists in that solution.
To do this, we should:
* For each step, store unresolved dependencies in a map where the key is
the dependency and the attached value is a pair containing a set of
the solvers (each solver paired with a reason for its tier) and a
set/list of the structurally forcing reasons for the dependency (i.e.,
justifications for why solvers were dropped).
* For each step, store a set containing a pair (num_solvers, dep) for
each unresolved dependency, sorted by the number of solvers and then
the dependency.
* For each step, store a map that takes each version to a *list*
(immutable as usual) of the dependencies that it solves. This list
can contain dependencies that are already solved; I think it's cheaper
to leave them there than to pull them out immediately.
* For each step, store a set of forbidden versions.
* When a version is chosen,
1. Find the list of solved dependencies and drop each one from the map
of unresolved dependencies, and from the set sorted by the number
of solvers.
2. Drop the version from the reverse index of versions to solved
dependencies.
3. For any structurally forbidden versions (other versions of the
package, solvers of the dep if it was a from-source choice), locate
those versions in the reverse index, strike them from the
corresponding solver lists (if the dependencies still exist), add
forcing reasons to the corresponding force list, and drop them from
the reverse index.
NOTE: when striking a version from a solver list, we have to
remove the dependency from the by-number set and add it
back.
4. Add solvers of the dep, if it was a from-source choice, to the set
of forbidden versions.
5. Find newly unsatisfied dependencies (reverse deps of the old
version, forward deps of the new version). For each one that's
not in the unresolved map, add it to the unresolved map with its
solvers paired with their intrinsic tiers. Structurally forbidden
solvers are not added to the solvers list; instead, they are used
to create the initial list of forcing reasons. Also, insert the
dependency into the by-num-solvers list and insert it into the
reverse index for each of its solvers.
6. Find incipient promotions for the new solution: that is, promotions
that are one install away from being contained in the solution. (an
easy tweak to the current find-promotion algorithm)
7. For each version in the reverse index, if that version completes an
incipient promotion, then apply that promotion to each dependency
in its solvers list. Drop missing dependencies as they are
encountered (optimization). Applying a promotion will either
increase the tier of a solver, or strike a solver; in either case
the change is applied to every dependency in the reverse index
entry for that version.
NOTE: steps 6 and 7 can probably be efficiently combined into a
single operation on a promotion_set
(for_each_incipient_promotion_containing).
NOTE: when striking a version from a solver list, we have to
remove the dependency from the by-number set and add it
back.
NOTE: this is perhaps a bit inefficient; would it be better to do
this only for new versions, and to have a separate path for
handling promotions inserted since the step was created?
* Solutions that have an empty solver list for any dependency are
dead-ends and don't need to be enqueued. Solutions with any singleton
solver lists can be forced immediately. Otherwise, the dependency
with the fewest solvers should be taken as the next branch point.
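The stores and update rules above can be rendered roughly as follows. This is an illustrative Python sketch with invented names: the real structures would be persistent imm::set-style C++ containers, and the real code also tracks tiers, promotions, and forcing reasons, all elided here. It shows the four per-step stores, the strike bookkeeping from the NOTEs (re-keying the by-number entry), and the dead-end/forced/branch rule.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    # dep -> set of solver versions (the plan pairs each solver with a
    # tier reason and keeps a forcing-reason list; elided in this sketch)
    unresolved: dict = field(default_factory=dict)
    # {(num_solvers, dep)}: min() yields the dep with the fewest solvers
    by_num_solvers: set = field(default_factory=set)
    # version -> list of deps it (possibly) solves
    reverse_index: dict = field(default_factory=dict)
    # versions that may never be installed in this step
    forbidden: set = field(default_factory=set)

    def add_dep(self, dep, solvers):
        self.unresolved[dep] = set(solvers)
        self.by_num_solvers.add((len(solvers), dep))
        for v in solvers:
            self.reverse_index.setdefault(v, []).append(dep)

    def strike(self, version):
        """Strike a version from every solver list containing it,
        re-keying the by-num-solvers entry as the NOTEs require."""
        for dep in self.reverse_index.pop(version, []):
            if dep not in self.unresolved:
                continue  # already solved; dropped lazily
            solvers = self.unresolved[dep]
            self.by_num_solvers.discard((len(solvers), dep))
            solvers.discard(version)
            self.by_num_solvers.add((len(solvers), dep))

    def resolve(self, dep):
        """Drop a dependency that a chosen version solves."""
        solvers = self.unresolved.pop(dep)
        self.by_num_solvers.discard((len(solvers), dep))

    def next_action(self):
        """Dead end, forced move, or branch on the fewest-solver dep."""
        if not self.by_num_solvers:
            return ("done", None)
        count, dep = min(self.by_num_solvers)
        if count == 0:
            return ("dead-end", dep)
        if count == 1:
            return ("force", dep)
        return ("branch", dep)
```

Striking a version that was some dependency's last solver leaves a (0, dep) entry, so the dead-end is detected the moment the next branch point is selected.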
Outstanding questions:
* Should the handling of new promotions be more incremental? We could
have a global reverse index from versions to a set of the step numbers
containing each version (in their installed list or in a solver), and
that could be used to update the minimal set of steps when a new
promotion is inserted. (note that this could cause a step to become a
dead-end while it's still enqueued -- I'm not sure if it's easy to
yank those dead-ends out of the search queue, but at least we could
generate promotions immediately) (note also that this index should be
updated whenever a version is struck from a solution!)
If this is done, then in steps 6 and 7 above, we can search for
incipient promotions containing the new version plus one version from
the reverse index. As with find_promotion_containing, it might be
better to just pull up the new version's entry in the promotion index
and scan it linearly.
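The global reverse index floated above might look something like this (again an invented-name sketch, not a real interface): map each version to the set of steps that mention it, so that a newly inserted promotion only has to visit the steps it could actually affect.

```python
from collections import defaultdict

# version -> set of step numbers whose installed list or solver lists
# contain that version (must be kept current as versions are struck!)
version_to_steps = defaultdict(set)

def record(step_number, versions):
    """Register every version a step installs or keeps as a solver."""
    for v in versions:
        version_to_steps[v].add(step_number)

def steps_touched_by(promotion_versions):
    """Steps containing at least one version of a new promotion."""
    touched = set()
    for v in promotion_versions:
        touched |= version_to_steps[v]
    return touched
```

Only the returned steps need their solver tiers re-examined when a promotion arrives, rather than rescanning every enqueued step.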
* The by-num-solvers set is a hack. Is there a nice way to store a
priority queue that allows us to quickly remove arbitrary entries and
to quickly increase the priority of an entry? (i.e., decrease the
number of solvers)
The answer is "yes, but not here". Ideally we would like a purely
functional structure (like imm::set and friends) to avoid excessive
copying at branch points. Purely functional heaps don't support the
decreaseKey operation, for the obvious reason that decreaseKey requires
a pointer to the node, which is a nonsensical concept for a functional
heap. Removing and reinserting an entry in a set is probably about as
good as we can get.
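The remove-and-reinsert pattern that answer settles on looks like this (a toy sketch; a plain Python set stands in for a balanced-tree structure such as imm::set, which would give O(log n) operations and cheap persistent copies):

```python
# Emulating decreaseKey on an ordered set of (num_solvers, dep) pairs:
# drop the stale entry, then insert the re-keyed one.

def decrease_solver_count(queue, dep, old_count, new_count):
    """Re-key a dependency after one of its solvers was struck."""
    queue.discard((old_count, dep))
    queue.add((new_count, dep))

queue = {(3, "depA"), (2, "depB")}
decrease_solver_count(queue, "depA", 3, 1)
# min(queue) now selects depA as the next branch point
```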
* Is it possible to quickly detect that two dependencies have
  non-overlapping solver domains for a single package? (I doubt it's
  feasible for more than one package.)
We could store a map from packages that are *known* to exist in the
output (because some dependency's solvers have been reduced to only
that package), to the set of allowed versions (with a reference to an
arbitrary dependency that will be blamed if a conflict happens). When
a version is struck from a dependency, if the dependency now has only
one package, we intersect its versions with the versions in the map
(or add them if the map had no versions); if the dependency already
had only one package, we drop that version from the list of valid
versions of the package. And of course, any versions that are dropped
from the list of valid versions of the package should be declared
invalid and struck from any solvers list that contains them.
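That narrowing step could be sketched as follows (invented names; the real version would also propagate the strikes through the reverse index described earlier): a map from packages known to be in the output to their still-allowed versions, intersected as dependencies collapse to one package.

```python
# Sketch of the proposed known-packages map. Returns the versions newly
# ruled out, which the caller should strike from every solver list
# containing them; raises when two deps demand disjoint version sets.

def narrow_package(known, package, versions, dep):
    """known: package -> (allowed version set, blamed dep)."""
    if package not in known:
        known[package] = (set(versions), dep)
        return set()
    allowed, blamed = known[package]
    narrowed = allowed & versions
    if not narrowed:
        raise ValueError(f"{dep} conflicts with {blamed} over {package}")
    known[package] = (narrowed, blamed)
    return allowed - narrowed
```

For example, if one dependency pins libfoo to versions {1, 2, 3} and a second pins it to {2, 3, 4}, version 1 is returned for striking; a third dependency demanding only version 9 would then trigger the conflict, blamed on the first dependency recorded.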
----- cut here -----
Daniel