[Aptitude-devel] Designing a new feature: storing package histories and more
Daniel Burrows
dburrows at debian.org
Fri Dec 19 07:02:43 UTC 2008
So, I've decided that it's time to knock off a pile of related and
long-standing feature requests (with the flimsy excuse that the GTK+
GUI will make it easier to do, but mostly because they should make
user support a lot easier. ;-) )
I want aptitude to store and log the history of actions the user has
performed on each package. Having this data available serves several
goals at once:
(1) It gives users an easy way to find out how *and why* a package
got on their system (the "why" will be stored in the log too).
(2) It provides a way to roll back an entire pile of changes to
package states at once (within the limits of the apt system,
of course). In particular, this should be a much nicer way to
back out something like a "build-dep" request. This is not
cross-session undo, mainly because transparent cross-session
undo isn't possible due to the nature of package management,
but it should solve many of the same problems.
(3) It provides a mechanism for system administrators to audit the
history of a package and attach (timestamped) notes to it.
(4) It allows us to preserve undo information across a su-to-root
operation.
(5) It should allow us to easily implement "redo", something
aptitude has always lacked.
Collecting this data is actually fairly easy; the thing that always
seemed hard when I looked at it before was figuring out where to store
it and how to efficiently retrieve it. I don't want to have to parse
lengthy text files on startup, but I also don't want to rely fully on
a binary database that is hard for users to inspect, opaque, etc.
My current thought is that we can use a two-pronged approach: store the
history *both* in a rotating series of text logs in /var/log, *and* in
a binary database (probably maintained using the excellent sqlite
library) in /var/cache/aptitude. The text logs will be machine-parseable,
and if the database is missing, aptitude will recreate it by scanning
the log files.
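
A minimal sketch of the rebuild path, to make the idea concrete. Everything
specific here is an assumption rather than a decided design: the
tab-separated log line (timestamp, package, action, reason), the `history`
table and its columns, and the function names are all hypothetical, and
Python's sqlite3 module merely stands in for the sqlite C library.

```python
import sqlite3

# Hypothetical log format: one tab-separated record per line.
SAMPLE_LOG = """\
2008-12-19T07:02:43Z\tlibfoo1\tinstall\tdependency of foo
2008-12-19T07:02:43Z\tfoo\tinstall\tuser request
2008-12-19T07:10:00Z\tfoo\tremove\tuser request
"""


def parse_log(text):
    """Yield (timestamp, package, action, reason) tuples from log text."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip malformed lines rather than abort the rebuild
        yield tuple(fields)


def rebuild_cache(conn, log_texts):
    """Recreate the history table from the given logs (oldest first)."""
    conn.execute("DROP TABLE IF EXISTS history")
    conn.execute(
        "CREATE TABLE history ("
        " id INTEGER PRIMARY KEY,"
        " timestamp TEXT NOT NULL,"
        " package TEXT NOT NULL,"
        " action TEXT NOT NULL,"
        " reason TEXT NOT NULL)"
    )
    # Index by package so per-package lookups avoid a full table scan.
    conn.execute("CREATE INDEX history_by_package ON history (package)")
    for text in log_texts:
        conn.executemany(
            "INSERT INTO history (timestamp, package, action, reason)"
            " VALUES (?, ?, ?, ?)",
            parse_log(text),
        )
    conn.commit()


def package_history(conn, package):
    """Return the recorded (timestamp, action, reason) rows for one package."""
    return conn.execute(
        "SELECT timestamp, action, reason FROM history"
        " WHERE package = ? ORDER BY id",
        (package,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
rebuild_cache(conn, [SAMPLE_LOG])
for row in package_history(conn, "foo"):
    print(row)
```

Because the table is dropped and recreated on each rebuild, running the
rebuild twice over the same logs does not duplicate entries; the text logs
stay the authoritative copy and the database is only a disposable index.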
Some interesting questions:
(1) Should we log everything, or only actions that permanently change
the system? My current thought is that we should log all the
package-state changes the user performs; sometimes I might want
to see which package I started to install before I changed my mind.
(2) How should this interact with "undo"? Should "undo" destroy
history entries or create new ones? I'm undecided on this, but
I'm leaning towards creating new history entries for the same
reason that I gave in (1). We might want to mark the "undo"
history entries specially, though, so that we can properly
restore the undo stack after invoking a subprocess.
(3) Should we use the current "undo" implementation at all for
packages? I want to throw it all away and purely base undo/redo
on the history stack. Having two parallel systems that do
almost the same thing, but not quite, seems like it's asking for
trouble. The obvious implementation of undo / redo requires us
to scan the history list to find the next action to undo or
redo; I think this should be fast enough, but we can easily
augment the list with some pointers to restore constant-time
behavior if it becomes a problem.
(4) Should the history cache replace pkgstates? I think probably
not: we still need to know what the "last seen" state of each
package was, and that doesn't necessarily correspond to a log
entry. In fact, this feature of pkgstates will become more
critical than ever, since we need it to detect changes to the
dpkg state of a package by other frontends.
(5) Should the history log include the details of resolver
interactions? Should there be a different log along the same
lines to do this? Currently I'm thinking that the answers are
"no" and "maybe". This seems like too much detail for the log
(I don't think the individual steps in the process of resolving
dependencies are very interesting to look back at), but it might
be useful to store this information per-session so the user can
examine what they just did. In fact, something like this is
already stored in the resolver itself; we'd just have to expose
it to the user. On the other hand, I'm having trouble coming up
with a really compelling advantage to doing this (other than
the cool factor).
(6) Should we write the history to the log file and cache
immediately, or defer writing it until we save the other state?
I kind of like the idea of writing history out immediately, but
there will be some technical hurdles to deal with (e.g.,
remembering whether we've actually written out the new package
states, so if the program is killed we don't think that someone
else reset the states -- or maybe that's a reasonable enough
failure mode that it's not worth fixing).
(7) What do we do about old history? For log files, logrotate is
a reasonable approach (we could even support loading in all the
log files that meet a certain pattern). But what about the
cache? On the one hand, we shouldn't arbitrarily throw away user
data; on the other hand, users probably don't want their disk
to fill up. :-) I'm not sure what the policy should be, but
aptitude should provide some mechanism for cleaning the cache
out in the form of a "clean-history-cache" command that drops
old entries from the history cache (with a cutoff date). The
Debian package can set the default policy with a file in
/etc/apt/apt.conf.d; probably a reasonable place to start is
"throw away log entries that the default logrotate configuration
would delete".
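
The "clean-history-cache" idea from question (7) could look roughly like
the following sketch. It assumes a hypothetical `history` table whose
`timestamp` column holds UTC timestamps in a fixed ISO 8601 format, so that
plain string comparison orders them chronologically; the function name and
schema are illustrative only.

```python
import sqlite3


def clean_history_cache(conn, cutoff):
    """Delete entries strictly older than the cutoff; return how many were dropped."""
    cur = conn.execute("DELETE FROM history WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Build a tiny throwaway cache to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (timestamp TEXT, package TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        ("2008-01-05T10:00:00Z", "foo", "install"),
        ("2008-06-01T12:00:00Z", "bar", "install"),
        ("2008-12-19T07:02:43Z", "foo", "remove"),
    ],
)

dropped = clean_history_cache(conn, "2008-06-01T00:00:00Z")
remaining = [
    row[0] for row in conn.execute("SELECT package FROM history ORDER BY timestamp")
]
print(dropped, remaining)
```

Only the cache is touched here; the text logs would still be expired by
logrotate, which is why matching the two cutoffs seems like a sensible
default.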
My plan for implementation is as follows:
(1) Go on vacation for the Christmas holiday.
(2) Afterwards, define a data model for the history entries. I
committed an initial cut of this in
src/generic/apt/history/history_entry.h, but it's quite
incomplete and unlikely to be the last word. It's tempting
to do this slowly with the excuse that "it's design work and
I want to do it right", but I suspect that it's better to
just put *something* together and see where it's lacking
once it's in use.
(3) Write code to convert the history entries to and from text
(should be easy enough to do once the data model is set).
(4) Write unit tests for the code in (3).
(5) Write code to compute the history list for the current session
and store it in a member of aptitude's cache wrapper. This will
largely be a copy-and-paste of the "undo" code.
(6) Write code to implement undo/redo on top of the history list.
(7) Replace the current undo code for packages with the new undo /
redo code.
(8) Fix a bunch of the bugs I wrote in steps 2-7.
(9) Write a GUI viewer for the history list, and code to show the
history of a particular package.
(10) Write code to add the live history list to an sqlite cache
database, and code to retrieve history from the database (for the
viewers of step (9)). The viewers should show data from the
current session and data from previous sessions, probably with
some indicator of where the current session starts.
(11) Write code to generate log files in /var/log with history
information.
(12) Write code that, if the cache database is missing, reads those
log files to rebuild it.
(13) Implement any goodies that haven't gotten in yet (rollback,
passing history over su-to-root, etc).
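
Step (6) can be sketched as follows, under the assumption that undo and redo
append specially marked entries instead of deleting anything, and that the
undo/redo stacks are recomputed by scanning the list. The `Entry` and
`History` classes, the field names, and the default "keep" state are all
hypothetical and are not taken from history_entry.h.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    kind: str                      # "action", "undo" or "redo"
    package: str
    old_state: str
    new_state: str
    target: Optional[int] = None   # index of the action an undo/redo refers to


class History:
    def __init__(self):
        self.entries = []   # append-only history list
        self.states = {}    # package -> currently selected state

    def _stacks(self):
        """Scan the list to rebuild the undo and redo stacks."""
        undo_stack, redo_stack = [], []
        for index, entry in enumerate(self.entries):
            if entry.kind == "action":
                undo_stack.append(index)
                redo_stack.clear()  # a fresh action invalidates pending redos
            elif entry.kind == "undo":
                undo_stack.pop()
                redo_stack.append(entry.target)
            else:  # "redo"
                redo_stack.pop()
                undo_stack.append(entry.target)
        return undo_stack, redo_stack

    def record(self, package, new_state):
        old_state = self.states.get(package, "keep")
        self.entries.append(Entry("action", package, old_state, new_state))
        self.states[package] = new_state

    def undo(self):
        undo_stack, _ = self._stacks()
        if not undo_stack:
            return False
        target = undo_stack[-1]
        action = self.entries[target]
        # Log the reversal as a new, specially marked entry.
        self.entries.append(
            Entry("undo", action.package, action.new_state, action.old_state, target)
        )
        self.states[action.package] = action.old_state
        return True

    def redo(self):
        _, redo_stack = self._stacks()
        if not redo_stack:
            return False
        target = redo_stack[-1]
        action = self.entries[target]
        self.entries.append(
            Entry("redo", action.package, action.old_state, action.new_state, target)
        )
        self.states[action.package] = action.new_state
        return True


h = History()
h.record("foo", "install")
h.record("bar", "remove")
h.undo()   # reverts the "bar" change
h.undo()   # reverts the "foo" change
h.redo()   # reapplies the "foo" change
print(h.states, len(h.entries))
```

Since the stacks are derived purely from the list, they can be restored
after a subprocess or a restart by rescanning; caching the two stacks
alongside the list would be the "pointers" optimization if the scan ever
turns out to be too slow.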
I suspect that the file format of the logs and the cache will change
frequently while I'm getting all this ready; afterwards, it might be
worth thinking about how to future-proof everything.
Daniel