[Aptitude-devel] Design notes on producing better (or at least more predictable) resolver results.

Mon Mar 2 05:28:44 UTC 2009

  OK, the promotions set is more or less done (except that I have some
new ideas for enhancing it that I might put into practice).  I've also
written a choice_set, which generalizes the "conflictor" stuff in the
current resolver.  Even without tiers, I expect to get a noticable speed
boost as soon as I turn this on:

  (a) The data structures and algorithms have been optimized at the
      "low level" relative to the current resolver (e.g., when finding
      promotions we only reset hit counts on members of the promotion
      set's index that we actually visited; IIRC that routine is called
      millions of times in a moderately large search, so this actually
      could make a difference).

  (b) Although I haven't gone over logs enough to be sure of this, I
      believe that just collecting conflict-like information about
      deferrals will help safe-upgrade out quite a bit.

  (c) Incorporating broken deps into the conflict framework opens the
      possibility of using them to accumulate frameworks.  For instance:
      we could treat "leave this dependency broken" similarly to
      "install a different version of the source of this dependency",
      forbidding the resolver from installing a version that would have
      solved the dependency.  That should get a nice cut-down in the
      branching factor of searches that involve soft deps.

  I was going to finish the new resolver code this weekend, but life was
a little too busy to allow that.  I think I've worked out the last few
tricky questions about the mechanics and logical grounding of generating
promotions, though, so the path ahead should be clear.

  On that note: here's how it works.  I'll start by describing the
current state of affairs:

  For each dependency that the resolver looks at, we can define a set of
"reasons" that the dependency came to the resolver's attention.  If the
source of the dependency was not initially installed, it must have been
installed in a previous resolver step.  If any of the potential targets
of the dependency were installed, each of them must have been installed
at a non-resolving version (otherwise the dependency would have been
resolved already).  (i.e., if the package depends on "libgtk2.0 >=
2.6.0", installing libgtk2.0 version "2.4.0" would do the trick)  In
pseudo-math, we need:

  (DepSourceInstalled /\ InstalledTarget1Removed /\ ...)

  Beyond that, there may be other decisions that were made in the
current solution that knock out other potential resolutions to the
dependency, or those resolutions might themselves contain conflicts.
These are accumulated as part of the "reason" for whatever happens to
the solution.  Finally, we check whether every resolution was knocked
out by one or another reason.  If so, we declare this solution to be
a dead-end and store that "reason" as a new conflict.

  In the new promotion code, things will work the same way, except that
instead of just checking whether each resolution is allowed or
forbidden, we check what its tier is.  Of course: if it's forbidden by a
previous choice, we still skip it and add the choice to the list of
reasons.  But if it isn't forbidden by the "rules of the game"
(monotonicity and not reversing broken-dep decisions), we instead
compute its *tier*.  The tier of each successor is computed both from
the contents of the solution and by examining existing promotions (in
the latter case, we remember the "reasons" for the highest promotion),
and the lowest tier of any successor solution will be used as the tier
of the promotion (since any successor generated in this "state" will
have that tier).  Of course, a promotion should only be generated if the
promotion actually increases the tier of the solution!

  There are also a few simple optimizations I'd like to work in:

  The current conflict code actually stores this as the reason for a
conflict:

  (DepSourceInstalled /\ (Target1Pkg := ThisSolnTarget1Ver) /\ ...)

  i.e., it uses the *specific version* of each package.  That makes the
conflict less general than it could be.  When it comes to "making the
dependency broken", *any* version of a solver other than one that
fulfills the dependency will break that dependency.  Each solver that's
initially installed must have one non-solving version installed.  So we
can immediately expand the coverage of each conflict / promotion by
adding a new choice type, "install a version of P that doesn't satisfy
D" where P is a package and D is a dependency.

  A second opportunity to generate more information has to do with
minimizing promotions.  Aside from the minimum set of choices needed for
a dependency to be "broken", we could potentially have different tiers
for different combinations of solvers.  For instance, maybe installing A
and B together puts us at tier 50, but installing just A puts us at tier
30.  If the current solution has a tier lower than 30, it might be
useful to store the promotion involving A as well: smaller promotions
are always more useful, because they can "hit" more solutions.  This can
be done by storing the tier and associated "reasons" of each resolution
to the dependency.  We can sort the tiers that were "blocked out" by
previous decisions by tier, and try to make smaller and smaller
promotions by throwing out actions from the top down, making sure not to
throw out actions that caused the dependency to be broken, and making
sure to note when an action that blocked out a higher tier also blocked
out a lower tier.

  (you could easily go overboard with this; e.g., generating all
the minimal promotions from the given solution.  But generating *larger*
promotions, even if they have a higher score, seems unlikely to be
useful (we can worry about that if we ever actually encounter those
situations in the search), whereas a smaller promotion might allow us to
glean information that we wouldn't otherwise have ever noticed in the
course of searching)

  Those two are fairly specific.  I also have a few speculative ideas
about somehow propagating tier / conflict information "backwards" in the
search graph.  Basically, it might be that all the children of a search
node ended up being dead-ends (or promoted to tier X, etc).  Right now,
we only ever detect this information if they were all immediately
detected to be dead-ends.  An alternative strategy would be to somehow
link the children to the parent; when all of a search node's children
are "finalized", we can use the tiering information of the children to
generate promotions in the parent (perhaps finalizing it in the
process).  This would work pretty much the same way that I described
above, but it would take place "after the fact".  Also, with tiering,
information might be propagated backwards through the search graph more
than once (i.e., "this is known to be at least tier 2,000; ok, now it's
known to be tier 3,000; ok, turns out it's a conflict").

  If aptitude used a depth-first-search algorithm, this would be
trivial.  Since it doesn't, it will take some footwork and book-keeping
to remember the parent of each search node and somehow tabulate the
ultimate destiny of all the children of a search node.  It's also not
entirely clear how effective this will be; it requires that we
ultimately examine all the children of each search node, after all.
OTOH, remembering parent-child relationships would make it possible to
visualize the search as a tree (which is not so useful for users, but
might yield insights about how to optimize it).

  Daniel