[Aptitude-devel] Resolver status.
Daniel Burrows
dburrows at debian.org
Sun May 10 17:22:51 UTC 2009
On Mon, Apr 20, 2009 at 08:09:32AM -0700, Daniel Burrows <dburrows at debian.org> was heard to say:
> The first one is a very standard optimization in constraint
> satisfaction. Currently, at every step, aptitude examines each
> dependency that it still has to solve, computes the versions that will
> solve that dependency, and examines each one to see if it's a legal
> solution and not known to be a dead-end. This is expensive, and it
> gets more expensive the larger the search gets (larger searches
> generally have more broken dependencies at any given step).
>
> The solution to this is to precompute and cache the list of possible
> solutions to each dependency. In fact, you go a step farther:
> precompute and cache the list of *possible versions of each package*.
> This requires generating more search nodes, since you have to account
> for the fact that a dependency could be solved by package A or package
> B. But the benefit is that if two dependencies require non-overlapping
> versions of a package, this allows you to detect it much earlier and
> throw those inconsistent branches of the search out.
I've been looking into this. After considering the practical
consequences, I don't think that generating extra search nodes for each
package a dependency could install is a good idea: the occasional
trimming of the search tree isn't worth all those extra nodes. (I can't
say this for sure without implementing it, of course -- if this were an
AI research project, I might do that to see whether it worked.) I do have a
plan for catching this sort of situation early in some restricted but
common cases, though. Basically: once a dependency is reduced to being
solved by a single package, we can check its consistency forward against
other dependencies that are also reduced to that package. This is a
common situation and I expect it to be a flat-out win as far as early
detection of unsolvable dependencies goes.
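The forward check described above can be sketched roughly as follows. This is a minimal illustration in Python, not aptitude's actual C++ code, and all names are invented: each dependency maps to its remaining (package, version) solvers, and once a dependency's solvers collapse to a single package, its allowed versions are intersected against other dependencies already pinned to that package.

```python
# Hypothetical sketch (not aptitude's real code) of the early
# consistency check for dependencies reduced to a single package.

def check_single_package_deps(deps):
    """deps maps a dependency name to a set of (package, version) solvers.

    Returns the pairs of dependencies that are mutually unsatisfiable
    because they demand non-overlapping versions of the same package."""
    pinned = {}  # package -> list of (dep, allowed version set)
    conflicts = set()
    for dep, solvers in deps.items():
        packages = {pkg for pkg, _ in solvers}
        if len(packages) != 1:
            continue  # only the single-package case is handled here
        (pkg,) = packages
        versions = {ver for _, ver in solvers}
        # Compare against every dependency already pinned to this package.
        for other_dep, other_versions in pinned.get(pkg, []):
            if not versions & other_versions:
                conflicts.add(frozenset((dep, other_dep)))
        pinned.setdefault(pkg, []).append((dep, versions))
    return conflicts
```

A dependency solvable only by libfoo 1 or 2 and another solvable only by libfoo 3 would be flagged immediately, without expanding either branch.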
I also noticed my side note about triggers. This is something of a
tie-in to some ideas I had last month about using a more general
constraint language to express the rules the resolver uses internally to
guide its search. This would give us some more internal separation
between different bits of logic and an easy way to hook in new
constraints and promotions, at the expense of making things more complex
and maybe making optimization harder. Now that I have a plan of attack
on this problem, though, I don't think this approach would actually work
out; the constraint language would have to be quite sophisticated to
represent all the book-keeping that's needed, and it's probably more
work than it's worth.
One important thing to remember is that aptitude actually does a
decent job of avoiding exponential blowup right now. The most complex
cases I could find while trying to stress-test it recently generated
something on the order of several thousand search steps -- not bad
considering that there were hundreds of broken dependencies.
The problem that I'm trying to solve right now is that for a mere few
thousand search steps, aptitude can take up to several minutes! That's
not acceptable, and I think it can be fixed with some book-keeping to
not do the same work over and over again.
Here's the implementation plan that I have so far. These are just
some notes I typed up -- I have a few changes at the end under
"outstanding questions" that I haven't merged into the rest of the text
yet. (The section title should really be "questions that were
outstanding when I got here, but that I have now answered to my
satisfaction".)
----- cut here -----
MemoizingSolvers
The new representation of steps needs to support the following actions:
* Finding the dependency with the fewest solvers.
* When a version is chosen, dropping any dependencies that it solves
from the set of outstanding dependencies.
* When a version is chosen, inserting any dependencies that it breaks
into the set of outstanding dependencies, but not resetting solver
lists if the dependency was already broken.
* When a version is chosen, striking versions that it structurally rules
out from dependency solver lists.
* Noting the tier of each solver of a dependency, and updating this
incrementally as new promotions are posted. If all the solvers reach
a higher tier than the current search tier, the step should be
deferred to that tier. If a version triggers a promotion to the
conflict tier, it should be struck from solver lists in that solution.
To do this, we should:
* For each step, store unresolved dependencies in a map where the key is
the dependency and the attached value is a pair containing a set of
the solvers (each solver paired with a reason for its tier) and a
set/list of the structurally forcing reasons for the dependency (i.e.,
justifications for why solvers were dropped).
* For each step, store a set containing a pair (num_solvers, dep) for
each unresolved dependency, sorted by the number of solvers and then
the dependency.
* For each step, store a map that takes each version to a *list*
(immutable as usual) of the dependencies that it solves. This list
can contain dependencies that are already solved; I think it's cheaper
to leave them there than to pull them out immediately.
* For each step, store a set of forbidden versions.
* When a version is chosen,
1. Find the list of solved dependencies and drop each one from the map
of unresolved dependencies, and from the set sorted by the number
of solvers.
2. Drop the version from the reverse index of versions to solved
dependencies.
3. For any structurally forbidden versions (other versions of the
package, solvers of the dep if it was a from-source choice), locate
those versions in the reverse index, strike them from the
corresponding solver lists (if the dependencies still exist), add
forcing reasons to the corresponding force list, and drop them from
the reverse index.
NOTE: when striking a version from a solver list, we have to
remove the dependency from the by-number set and add it
back.
4. Add solvers of the dep, if it was a from-source choice, to the set
of forbidden versions.
5. Find newly unsatisfied dependencies (reverse deps of the old
version, forward deps of the new version). For each one that's
not in the unresolved map, add it to the unresolved map with its
solvers paired with their intrinsic tiers. Structurally forbidden
solvers are not added to the solvers list; instead, they are used
to create the initial list of forcing reasons. Also, insert the
dependency into the by-num-solvers list and insert it into the
reverse index for each of its solvers.
6. Find incipient promotions for the new solution: that is, promotions
that are one install away from being contained in the solution. (an
easy tweak to the current find-promotion algorithm)
7. For each version in the reverse index, if that version completes an
incipient promotion, then apply that promotion to each dependency
in its solvers list. Drop missing dependencies as they are
encountered (optimization). Applying a promotion will either
increase the tier of a solver, or strike a solver; in either case
the change is applied to every dependency in the reverse index
entry for that version.
NOTE: steps 6 and 7 can probably be efficiently combined into a
single operation on a promotion_set
(for_each_incipient_promotion_containing).
NOTE: when striking a version from a solver list, we have to
remove the dependency from the by-number set and add it
back.
NOTE: this is perhaps a bit inefficient; would it be better to do
this only for new versions, and to have a separate path for
handling promotions inserted since the step was created?
* Solutions that have an empty solver list for any dependency are
dead-ends and don't need to be enqueued. Solutions with any singleton
solver lists can be forced immediately. Otherwise, the dependency
with the fewest solvers should be taken as the next branch point.
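The stores and update rules above can be rendered roughly as follows. This is an illustrative Python sketch with invented names: the real structures would be persistent imm::set-style C++ containers, and the real code also tracks tiers, promotions, and forcing reasons, all elided here. It shows the four per-step stores, the strike bookkeeping from the NOTEs (re-keying the by-number entry), and the dead-end/forced/branch rule.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    # dep -> set of solver versions (the plan pairs each solver with a
    # tier reason and keeps a forcing-reason list; elided in this sketch)
    unresolved: dict = field(default_factory=dict)
    # {(num_solvers, dep)}: min() yields the dep with the fewest solvers
    by_num_solvers: set = field(default_factory=set)
    # version -> list of deps it (possibly) solves
    reverse_index: dict = field(default_factory=dict)
    # versions that may never be installed in this step
    forbidden: set = field(default_factory=set)

    def add_dep(self, dep, solvers):
        self.unresolved[dep] = set(solvers)
        self.by_num_solvers.add((len(solvers), dep))
        for v in solvers:
            self.reverse_index.setdefault(v, []).append(dep)

    def strike(self, version):
        """Strike a version from every solver list containing it,
        re-keying the by-num-solvers entry as the NOTEs require."""
        for dep in self.reverse_index.pop(version, []):
            if dep not in self.unresolved:
                continue  # already solved; dropped lazily
            solvers = self.unresolved[dep]
            self.by_num_solvers.discard((len(solvers), dep))
            solvers.discard(version)
            self.by_num_solvers.add((len(solvers), dep))

    def resolve(self, dep):
        """Drop a dependency that a chosen version solves."""
        solvers = self.unresolved.pop(dep)
        self.by_num_solvers.discard((len(solvers), dep))

    def next_action(self):
        """Dead end, forced move, or branch on the fewest-solver dep."""
        if not self.by_num_solvers:
            return ("done", None)
        count, dep = min(self.by_num_solvers)
        if count == 0:
            return ("dead-end", dep)
        if count == 1:
            return ("force", dep)
        return ("branch", dep)
```

Striking a version that was some dependency's last solver leaves a (0, dep) entry, so the dead-end is detected the moment the next branch point is selected.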
Outstanding questions:
* Should the handling of new promotions be more incremental? We could
have a global reverse index from versions to a set of the step numbers
containing each version (in their installed list or in a solver), and
that could be used to update the minimal set of steps when a new
promotion is inserted. (note that this could cause a step to become a
dead-end while it's still enqueued -- I'm not sure if it's easy to
yank those dead-ends out of the search queue, but at least we could
generate promotions immediately) (note also that this index should be
updated whenever a version is struck from a solution!)
If this is done, then in steps 6 and 7 above, we can search for
incipient promotions containing the new version plus one version from
the reverse index. As with find_promotion_containing, it might be
better to just pull up the new version's entry in the promotion index
and scan it linearly.
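The global reverse index floated above might look something like this (again an invented-name sketch, not a real interface): map each version to the set of steps that mention it, so that a newly inserted promotion only has to visit the steps it could actually affect.

```python
from collections import defaultdict

# version -> set of step numbers whose installed list or solver lists
# contain that version (must be kept current as versions are struck!)
version_to_steps = defaultdict(set)

def record(step_number, versions):
    """Register every version a step installs or keeps as a solver."""
    for v in versions:
        version_to_steps[v].add(step_number)

def steps_touched_by(promotion_versions):
    """Steps containing at least one version of a new promotion."""
    touched = set()
    for v in promotion_versions:
        touched |= version_to_steps[v]
    return touched
```

Only the returned steps need their solver tiers re-examined when a promotion arrives, rather than rescanning every enqueued step.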
* The by-num-solvers set is a hack. Is there a nice way to store a
priority queue that allows us to quickly remove arbitrary entries and
to quickly increase the priority of an entry? (i.e., decrease the
number of solvers)
The answer is "yes, but not here". Ideally we would like a purely
functional structure (like imm::set and friends) to avoid excessive
copying at branch points. Purely functional heaps don't support the
decreaseKey operation, for the obvious reason that decreaseKey requires
a pointer to the node, which is a nonsensical concept for a functional
heap. Removing and reinserting an entry in a set is probably about as
good as we can get.
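The remove-and-reinsert pattern that answer settles on looks like this (a toy sketch; a plain Python set stands in for a balanced-tree structure such as imm::set, which would give O(log n) operations and cheap persistent copies):

```python
# Emulating decreaseKey on an ordered set of (num_solvers, dep) pairs:
# drop the stale entry, then insert the re-keyed one.

def decrease_solver_count(queue, dep, old_count, new_count):
    """Re-key a dependency after one of its solvers was struck."""
    queue.discard((old_count, dep))
    queue.add((new_count, dep))

queue = {(3, "depA"), (2, "depB")}
decrease_solver_count(queue, "depA", 3, 1)
# min(queue) now selects depA as the next branch point
```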
* Is it possible to quickly detect that two dependencies have
  non-overlapping solver domains for a single package? (I doubt it's
  feasible for more than one package.)
We could store a map from packages that are *known* to exist in the
output (because some dependency's solvers have been reduced to only
that package), to the set of allowed versions (with a reference to an
arbitrary dependency that will be blamed if a conflict happens). When
a version is struck from a dependency, if the dependency now has only
one package, we intersect its versions with the versions in the map
(or add them if the map had no versions); if the dependency already
had only one package, we drop that version from the list of valid
versions of the package. And of course, any versions that are dropped
from the list of valid versions of the package should be declared
invalid and struck from any solvers list that contains them.
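That narrowing step could be sketched as follows (invented names; the real version would also propagate the strikes through the reverse index described earlier): a map from packages known to be in the output to their still-allowed versions, intersected as dependencies collapse to one package.

```python
# Sketch of the proposed known-packages map. Returns the versions newly
# ruled out, which the caller should strike from every solver list
# containing them; raises when two deps demand disjoint version sets.

def narrow_package(known, package, versions, dep):
    """known: package -> (allowed version set, blamed dep)."""
    if package not in known:
        known[package] = (set(versions), dep)
        return set()
    allowed, blamed = known[package]
    narrowed = allowed & versions
    if not narrowed:
        raise ValueError(f"{dep} conflicts with {blamed} over {package}")
    known[package] = (narrowed, blamed)
    return allowed - narrowed
```

For example, if one dependency pins libfoo to versions {1, 2, 3} and a second pins it to {2, 3, 4}, version 1 is returned for striking; a third dependency demanding only version 9 would then trigger the conflict, blamed on the first dependency recorded.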
----- cut here -----
Daniel