[Aptitude-devel] r3217 - in branches/aptitude-0.3/aptitude: . src/generic/problemresolver
Daniel Burrows
dburrows@costa.debian.org
Sun, 01 May 2005 01:32:13 +0000
Author: dburrows
Date: Sun May 1 01:32:10 2005
New Revision: 3217
Modified:
branches/aptitude-0.3/aptitude/ChangeLog
branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
Log:
Describe more of the algorithm; need to flesh this out and make it more comprehensible.
Modified: branches/aptitude-0.3/aptitude/ChangeLog
==============================================================================
--- branches/aptitude-0.3/aptitude/ChangeLog (original)
+++ branches/aptitude-0.3/aptitude/ChangeLog Sun May 1 01:32:10 2005
@@ -1,5 +1,9 @@
2005-04-30 Daniel Burrows <dburrows@debian.org>
+ * src/generic/problemresolver/model.tex:
+
+ Further description of the algorithm.
+
* src/generic/aptcache.cc:
Make packages installed/removed by the resolver get marked as
Modified: branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
==============================================================================
--- branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex (original)
+++ branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex Sun May 1 01:32:10 2005
@@ -86,13 +86,14 @@
might produce an error indicating that the user should find, download,
and install a new version of the \pkg{SDL} graphics library. A new
version of the \pkg{kmail} mail client might require the user to
-upgrade his or her entire operating system, generally indicating this
-fact by a slew of cascading error messages.
+upgrade his or her entire operating system, indicating this fact by a
+slew of cascading error messages.
As a result of this so-called ``dependency hell'', new and more
automated tools, such as \pkg{apt} and \pkg{up2date}, were developed.
These tools maintain a database of software installed on the user's
system, along with software available from any number of remote sites.
+
To install or remove a piece of software, the user issues a request
to, for instance, ``install wesnoth'' or ``upgrade kmail''. The
installation tool will proceed to find a set of package installations
@@ -302,28 +303,39 @@
An \emph{installation} represents a choice about which package
versions to install; it is a function that maps each package to a
-version of that package. An installation $I$ is \emph{consistent} if
-it satisfies each dependency in the world: that is, for all $v ->
-\{v_1',\dots\}$ in $\D$, either $I(\pkgof{v}) \neq v$ or
-$I(\pkgof{v_i'})=v_i'$ for some $i$.
-
-If $I$ is an installation, then $I;p \mapsto v$ is a new installation
-which installs the version $v$ of the package $p$ and leaves all other
-packages unchanged:
+version of that package.
-\begin{equation}
- (I;p \mapsto v)(p') = \begin{cases}
- I(p'), & p' \neq p \\
- v, & p' = p
- \end{cases}
-\end{equation}
+\begin{definition}
+ An installation $I$ installs a version $v$, written $I \installs v$,
+ if $I(\pkgof{v})=v$.
+\end{definition}
+
+\begin{definition}
+ An installation $I$ is \emph{consistent} if it satisfies each
+ dependency in the world: that is, for all $v -> \{v_1',\dots\}$ in
+ $\D$, either $I \not \installs v$ or $I \installs v_i'$ for some
+ $i$.
+\end{definition}
+
+\begin{definition}
+ If $I$ is an installation, then $I;p \mapsto v$ is a new
+ installation which installs the version $v$ of the package $p$ and
+ leaves all other packages unchanged:
+
+ \begin{equation}
+ (I;p \mapsto v)(p') = \begin{cases}
+ I(p'), & p' \neq p \\
+ v, & p' = p
+ \end{cases}
+ \end{equation}
+\end{definition}
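The three definitions above can be sketched in a few lines of Python, modeling an installation as a dict from package name to a (package, version) pair. All names and structures here are illustrative, not the actual aptitude implementation.

```python
# Minimal sketch of the definitions above: "installs", consistency,
# and the update I;p -> v. An installation is a dict from package
# name to a (package, version) pair; a dependency is a pair
# (source_version, set_of_target_versions). Illustrative only.

def installs(I, v):
    """I installs v iff I maps v's package to v."""
    pkg, _ = v
    return I.get(pkg) == v

def consistent(I, deps):
    """I is consistent iff for every dependency v -> {v1', ...},
    either I does not install v or I installs some target vi'."""
    return all(
        not installs(I, src) or any(installs(I, t) for t in targets)
        for src, targets in deps
    )

def step(I, p, v):
    """The installation I;p -> v: install version v of package p,
    leaving all other packages unchanged."""
    I2 = dict(I)
    I2[p] = v
    return I2
```

Note that `step` returns a fresh dict rather than mutating `I`, matching the paper's treatment of $I;p \mapsto v$ as a new installation.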
\section{The Dependency Resolution Problem}
\subsection{Problem Definition}
-Let $I$ be an inconsistent installation. We would like to find a
-consistent installation that is ``similar to'' $I$. This is the
+Let $I_0$ be an inconsistent installation. We would like to find a
+consistent installation that is ``similar to'' $I_0$. This is the
dependency resolution problem. In a real package system, it
corresponds to a situation in which the user's requests have resulted
in an invalid state (with unsatisfied dependencies or conflicts); the
@@ -345,6 +357,7 @@
solution is inside the user's head.
Thus, the best we can do is to define some criteria for ``goodness''
+ (to prioritize solutions that are more likely to interest the user)
and allow the user to see alternatives to an undesired solution.
\end{note}
@@ -487,11 +500,14 @@
dependencies, and then moving to a different version (possibly after
resolving some dependencies of the intermediate version). The problem
resolver of \pkg{apt}, for instance, sometimes confuses users by
-exhibiting this behavior. To fix this, I propose a simple rule: a
-solution should never modify a package twice. Formally, if the
-original installation was $I_0$, then for any $I$, the installation
-$I'=I;p \mapsto v$ is only a successor of $I$ if $v \neq
-I_0(\pkgof{v})$ and $I(\pkgof{v})=I_0(\pkgof{v})$.
+exhibiting this behavior. To fix this, I enforce a simple rule in
+generating solutions: a solution should never modify a package twice.
+
+\begin{definition}
+ If the original installation was $I_0$, then for any $I$, the
+ installation $I'=I;p \mapsto v$ is a successor of $I$ if $v \neq
+ I_0(\pkgof{v})$ and $I(\pkgof{v})=I_0(\pkgof{v})$.
+\end{definition}
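The successor condition in this definition amounts to a two-clause test, sketched below with versions modeled as plain values keyed by package name; purely illustrative.

```python
# Sketch of the "never modify a package twice" successor rule:
# I;p -> v is admitted as a successor of I only if v differs from
# the original version of p and p is still at its original version
# in I. Illustrative names only.

def legal_successor(I0, I, p, v):
    """True iff I;p -> v is a successor of I under the rule above."""
    return v != I0[p] and I[p] == I0[p]
```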
One might wonder whether this approach risks overlooking solutions:
for instance, maybe it is really necessary to ``go in circles'' in
@@ -563,6 +579,7 @@
installation $I_0$, there exists a consistent installation $I_c'$
such that $I_c'$ is a hybrid of $I_0$ and $I_c$, and $I_0
\nsolmany{I_0} I_c'$.
+ \label{thm:succ-sufficient}
\end{theorem}
\begin{proof}
@@ -591,21 +608,181 @@
\subsubsection{Scoring}
-TODO: content
+The second key ingredient of a best-first search is a scheme for
+ordering search nodes, typically by assigning a numerical value to
+each prospective solution. In doing so, we must balance two
+priorities: the desire to find a solution quickly, and the desire to
+find a \emph{good} solution.
+
+The most obvious way to guide the search towards a solution is to
+``reward'' avenues of inquiry that decrease the number of unsatisfied
+dependencies. This is not, of course, guaranteed to produce a
+solution quickly; however, in practice, it seems to be a sufficient
+hint for the algorithm to reach a goal node in under 100 steps.
+Finding ``good'' solutions is somewhat more difficult, not least
+because ``good'' is an ill-defined property. The
+experimental implementation of this algorithm in \pkg{aptitude} uses
+the following criteria to assign scores to nodes:
\begin{itemize}
-\item Score/penalty per broken dependency
-\item Score/penalty per ``step''
-\item Score/penalty for each package version
-\item Arbitrary cutoff to avoid exponential blowup
+\item Most importantly, each version of each package is assigned a
+ separate score. The score of a potential solution is based in part
+ on the \emph{difference} between the total score of the package
+ versions it installs and the score of the versions installed by
+ $I_0$.\footnote{In principle, a solution's score is simply the total
+ score of the package versions it installs; $I_0$ is taken to be a
+ baseline for this component of the score because it does not
+ affect the ordering of solutions, and it is much more convenient
+ to calculate.}
+
+ These weights define the priorities of the search. In the current
+ aptitude implementation, as of this writing, the scores of the
+ versions of the package $p$ are assigned as follows\footnote{Very minor
+ tweaking is also performed to adjust the score of packages
+ according to their Debian-assigned ``priority'', but this affects
+ the score by no more than 5 points.}:
+
+ \begin{itemize}
+ \item If $I_0(p)$ was explicitly selected by the user, it is
+ assigned $40$ points.
+ \item If $I_0(p)$ was automatically selected, it is assigned $0$
+ points. This has the effect of biasing the algorithm towards
+ modifying packages whose current states were not explicitly
+ requested by the user.
+ \item If $I_0$ does not remove $p$, its removal is assigned $-300$
+ points. If $p$ is an ``essential'' package, such as the C library
+ or the \pkg{dpkg} package manager, its removal is assigned an
+ additional $-100000$ points.
+ \item If $I_0(p)$ is not the current version of the
+ package\footnote{The ``current version'' refers to the version
+ installed on the system; $I_0$ is simply a representation of the
+ package installation and removal commands issued by the user.},
+ then $p$'s current version -- which corresponds to cancelling a
+ user request or automatic decision -- is assigned $-50$ points.
+ \item Installing $p$ (i.e., the ``default'' version of $p$ if $p$ is
+ not currently installed) is assigned $-20$ points.
+ \item Upgrading $p$ (i.e., the ``default'' version of $p$ if $p$ is
+ currently installed at a different version\footnote{This is a
+ slight misnomer, as it includes downgrades to the default
+ version.}) is assigned $-15$ points.
+ \item Installing a version of $p$ other than the ``default'' version
+ (as usual, if $I_0$ does not install this version of $p$) is
+ assigned $-40$ points.
+ \end{itemize}
+
+\item A penalty is applied to each search node based on its distance
+ from the root of the search. This works to favor ``simpler''
+ solutions and penalize more complex ones.
+
+\item Nodes that resolve all dependencies are given an additional
+ bonus -- usually a relatively minor one. Goal nodes are moved
+ through the priority queue in the normal manner, rather than
+ immediately being returned, in order to ensure that ``better''
+ solutions are produced first -- and to make sure that solutions that
+ are particularly ``bad'' are not produced unless it is absolutely
+ necessary.
\end{itemize}
+Thus, letting $B(I)$ be the set of dependencies that are ``broken''
+(not satisfied) by $I$ and letting $h(v)$ be the score of the version
+$v$, the total score of an installation is
+
+\begin{equation}
+ h(I) = \alpha_B|B(I)| + \alpha_L \idist{I}{I_0} + \alpha_G \delta(0,|B(I)|) + \sum_{p \in \P} h(I(p))
+\end{equation}
+
+where $\alpha_B$, $\alpha_L$, and $\alpha_G$ are weighting factors and
+$\delta$ is the Kronecker delta function (i.e., $\delta(i,j)$ is $1$
+if $i=j$ and $0$ otherwise). In the current implementation,
+$\alpha_B=-100$, $\alpha_L=-10$, and $\alpha_G=50$.
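A direct transcription of this scoring equation, using the weights quoted above; `broken` and `version_score` stand in for the real broken-dependency set $B(I)$ and the per-version score table, and all names are illustrative.

```python
# Sketch of the scoring function h(I) with alpha_B = -100,
# alpha_L = -10, alpha_G = 50, as in the text. `broken(I)` returns
# the dependencies I fails to satisfy; `version_score(v)` is the
# per-version score. Illustrative stand-ins only.

ALPHA_B, ALPHA_L, ALPHA_G = -100, -10, 50

def score(I, I0, broken, version_score):
    b = len(broken(I))                            # |B(I)|
    dist = sum(1 for p in I if I[p] != I0[p])     # distance from I0
    goal = ALPHA_G if b == 0 else 0               # Kronecker delta term
    return (ALPHA_B * b + ALPHA_L * dist + goal
            + sum(version_score(I[p]) for p in I))
```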
+
+\section{Avoiding Exponential Blowup}
+
+The algorithm laid out above is sufficient to solve many of the
+dependency problems that are encountered in practice. However, some
+problems still cause the search to take an unacceptable amount of time
+to reach a solution. The problems observed fall into two main
+categories:
+
+\begin{enumerate}
+\item Too many reverse dependencies.
+
+ In order to calculate the score of a successor of an installation
+ (and of course to analyze that solution later on) it is necessary to
+ generate the set of dependencies which are not satisfied by that
+ successor. However, there are some one hundred thousand
+ dependencies in the Debian archive; so that this check completes
+ in a reasonable amount of time, the current implementation uses
+ the obvious optimization of only testing those dependencies that
+ were previously broken, that impinge on the package version being
+ removed, or that impinge on the package version being
+ installed.\footnote{Recall that a successor to $I$ will install
+ version $v$ of $p$, removing $I(p)$ in the process.}
+
+ Unfortunately, some packages have very many reverse dependencies.
+ For instance, if $I$ removes the system C library, over a thousand
+ dependencies will be unsatisfied -- and simply generating the
+ successors of this node will require time \emph{quadratic} in the
+ number of reverse dependencies of \pkg{libc}; i.e., over a million
+ dependencies will be tested. This can impose a significant
+ performance penalty on the process of successor generation.
+
+\item Removing the bottom of a dependency chain.
+
+ When an important library such as GTK is removed, it is necessary to
+ propagate the removal ``up the dependency tree''. However, the
+ search technique outlined above will search exponentially many
+ installations before coming to this solution. Aside from the goal
+ node of ``keep the library on the system'', the first step of the
+ search will enqueue one node for each package depending on GTK; each
+ node will remove its corresponding package. As these nodes are
+ processed, pairs of packages will be removed from the system; then
+ triples, and so on, until the full power set of all packages
+ depending (directly or indirectly) on GTK is generated. Worse, at
+ each step, solutions that suggest installing GTK (and removing many
+ packages) will be generated.
+\end{enumerate}
+
+The problem of ``too many reverse dependencies'' is relatively easily
+solved. Instead of generating successors for every dependency, it is
+sufficient to generate successors for a single dependency (this is
+evident from the proof of Theorem~\ref{thm:succ-sufficient}; a
+stronger condition will be demonstrated explicitly below). In theory,
+this practice could lead to somewhat less optimal ordering of
+generated solutions, but the decrease in the problem's branching
+factor is well worth it.
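The restriction to a single dependency can be sketched as follows: rather than branching on every broken dependency, pick one and enqueue one successor per target version that is still legal under the only-modify-once rule. (Shifting the dependency's source version is handled analogously and omitted here for brevity; names and structures are illustrative.)

```python
# Sketch of successor generation over a single broken dependency.
# `broken_deps` is a list of (source, targets) pairs; versions are
# modeled as plain values keyed by package name. Illustrative only.

def successors(I0, I, broken_deps):
    """Yield (package, version) steps for one broken dependency."""
    if not broken_deps:
        return
    src, targets = broken_deps[0]   # a single dependency suffices
    for pkg, ver in targets:
        # only-modify-once rule: pkg untouched so far, ver a real change
        if I[pkg] == I0[pkg] and ver != I0[pkg]:
            yield pkg, ver
```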
+
+The second problem is somewhat trickier, but it can be resolved by
+restructuring the search space. To each solution node $I$, attach a
+set $I_F$ of \emph{forbidden} versions; the successors of $I$ are
+restricted to those which do not install any version in $I_F$. For
+all successors $I'$ of $I$, $I'_F \subseteq I_F$; furthermore, if a
+successor $I'$ of $I$ is generated by removing the source version of a
+dependency, then all of the targets of that dependency are members of
+$I'_F$. This has the effect of forcing the algorithm to ``stick
+with'' a decision to forgo installing the targets of a dependency in
+favor of shifting the source.
+
+In combination with the tracking of forbidden versions, the algorithm
+is also modified to detect \emph{essential conflicts}: that is,
+dependencies which cannot be resolved due to the constraints placed on
+the generation of successors. If $d$ is a dependency not satisfied by
+$I$, and if every package version which could resolve a dependency is
+illegal -- either because of the ``only modify it once'' rule or
+because it is a member of $I_F$ -- then $d$ is deemed to be an
+essential conflict and $I$ is discarded without further processing.
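The essential-conflict test reduces to checking that no candidate version remains legal; a minimal sketch, with simplified stand-ins for the real data structures:

```python
# Sketch of the essential-conflict test: a broken dependency is an
# essential conflict for I if every target version that could
# satisfy it is illegal, either under the only-modify-once rule or
# because it belongs to the forbidden set I_F. Illustrative only.

def essential_conflict(I0, I, forbidden, targets):
    """True iff no target (pkg, ver) can legally be installed."""
    def legal(pkg, ver):
        return ((pkg, ver) not in forbidden   # not in I_F
                and I[pkg] == I0[pkg]         # only-modify-once rule
                and ver != I0[pkg])           # must actually change pkg
    return not any(legal(pkg, ver) for pkg, ver in targets)
```

When this returns true for some broken dependency of $I$, the node $I$ can be discarded without generating any successors.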
+
+Of course, it is important to verify that cutting off wide swathes of
+the search space in this manner does not impede our ability to
+generate solutions:
+
+\begin{theorem}
+ TODO.
+\end{theorem}
+
\section{Implementation}
A prototype implementation exists in the experimental branch of
-\pkg{aptitude}, but it is not currently used as the main problem
-resolver of the program and as a result has not been widely tested.
-Work is ongoing to connect this prototype to the main user interface
-of \pkg{aptitude}.
+\pkg{aptitude}.
\end{document}