[Aptitude-devel] Designing a new feature: storing package histories and more

Fri Dec 19 07:02:43 UTC 2008

  So, I've decided that it's time to knock off a pile of related and
long-standing feature requests.  (My flimsy excuse is that the GTK+
GUI will make them easier to do, but mostly it's because they should
make user support a lot easier. ;-) )

  I want aptitude to store and log the history of actions the user has
performed on each package.  Having this data available serves several
goals at once:

   (1) It gives users an easy way to find out how *and why* a package
       got on their system (the "why" will be stored in the log too).
   (2) It provides a way to roll back an entire pile of changes to
       package states at once (within the limits of the apt system,
       of course).  In particular, this should be a much nicer way to
       back out something like a "build-dep" request.  (This is not
       cross-session undo, mainly because transparent cross-session
       undo isn't possible due to the nature of package management,
       but it should solve many of the same problems.)
   (3) It provides a mechanism for system administrators to audit the
       history of a package and attach (timestamped) notes to it.
   (4) It allows us to preserve undo information across a su-to-root
       operation.
   (5) It should allow us to easily implement "redo", something
       aptitude has always lacked.

  Collecting this data is actually fairly easy; the thing that always
seemed hard when I looked at it before was figuring out where to store
it and how to efficiently retrieve it.  I don't want to have to parse
lengthy text files on startup, but I also don't want to rely fully on
a binary database that is hard for users to inspect, opaque, etc.
My current thought is that we can use a two-pronged approach: store the
history *both* in a rotating series of text logs in /var/log, *and* in
a binary database (probably maintained using the excellent sqlite
library) in /var/cache/aptitude.  The text logs will be machine-parseable,
and if the database is missing, aptitude will recreate it by scanning
the log files.
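
  A minimal sketch of the "rebuild the cache from the logs" idea,
written in Python rather than aptitude's C++ purely for brevity.  The
tab-separated line format, the table layout, and every name below are
assumptions made for illustration; none of them is a committed format.

```python
import sqlite3

# Hypothetical log line format (an assumption, not a decided format):
#   timestamp <TAB> package <TAB> action <TAB> reason
SAMPLE_LOG = [
    "2008-12-19T07:02:43Z\tfoo\tinstall\trequested by user",
    "2008-12-19T07:03:10Z\tlibfoo1\tinstall\tdependency of foo",
]

def open_cache(path):
    """Open (or create) the history cache; returns (connection, was_empty)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        " ts TEXT NOT NULL, package TEXT NOT NULL,"
        " action TEXT NOT NULL, reason TEXT NOT NULL)"
    )
    # Index so that "history of one package" does not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS history_pkg ON history (package, ts)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM history").fetchone()
    return conn, count == 0

def rebuild_from_logs(conn, lines):
    """Re-populate the cache from text log lines; returns rows inserted."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed lines rather than aborting the rebuild
        rows.append(tuple(fields))
    with conn:  # one transaction for the whole rebuild
        conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?)", rows)
    return len(rows)

def package_history(conn, package):
    """Return (timestamp, action, reason) rows for one package, oldest first."""
    return conn.execute(
        "SELECT ts, action, reason FROM history WHERE package = ? ORDER BY ts",
        (package,),
    ).fetchall()

# An in-memory database stands in for the file under /var/cache/aptitude.
conn, was_empty = open_cache(":memory:")
if was_empty:
    rebuild_from_logs(conn, SAMPLE_LOG)
```

  The point of the design is that the text logs remain the readable
record, while the database is disposable: deleting it costs only the
time to scan the logs again.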

  Some interesting questions:

   (1) Should we log everything, or only actions that permanently change
       the system?  My current thought is that we should log all the
       package-state changes the user performs; sometimes I might want
       to see which package I started to install before I changed my mind.
   (2) How should this interact with "undo"?  Should "undo" destroy
       history entries or create new ones?  I'm undecided on this, but
       I'm leaning towards creating new history entries for the same
       reason that I gave in (1).  We might want to mark the "undo"
       history entries specially, though, so that we can properly
       restore the undo stack after invoking a subprocess.
   (3) Should we use the current "undo" implementation at all for
       packages?  I want to throw it all away and purely base undo/redo
       on the history stack.  Having two parallel systems that do
       almost the same thing, but not quite, seems like it's asking for
       trouble.  The obvious implementation of undo / redo requires us
       to scan the history list to find the next action; I think this
       should be fast enough, but we can easily augment the list with
       some pointers to restore constant-time behavior if it becomes a
       problem.
   (4) Should the history cache replace pkgstates?  I think probably
       not: we still need to know what the "last seen" state of each
       package was, and that doesn't necessarily correspond to a log
       entry.  In fact, this feature of pkgstates will become more
       critical than ever, since we need it to detect changes to the
       dpkg state of a package by other frontends.
   (5) Should the history log include the details of resolver
       interactions?  Should there be a different log along the same
       lines to do this?  Currently I'm thinking that the answers are
       "no" and "maybe".  This seems like too much detail for the log
       (I don't think the individual steps in the process of resolving
       dependencies are very interesting to look back at), but it might
       be useful to store this information per-session so the user can
       examine what they just did.  In fact, something like this is
       already stored in the resolver itself; we'd just have to expose
       it to the user.  On the other hand, I'm having trouble coming up
       with a really compelling advantage to doing this (other than
       the cool factor).
   (6) Should we write the history to the log file and cache
       immediately, or defer writing it until we save the other state?
       I kind of like the idea of writing history out immediately, but
       there will be some technical hurdles to deal with (e.g.,
       remembering whether we've actually written out the new package
       states, so if the program is killed we don't think that someone
       else reset the states -- or maybe that's a reasonable enough
       failure mode that it's not worth fixing).
   (7) What do we do about old history?  For log files, logrotate is
       a reasonable approach (we could even support loading in all the
       log files that meet a certain pattern).  But what about the
       cache?  On the one hand, we shouldn't arbitrarily throw away user
       data; on the other hand, users probably don't want their disk
       to fill up. :-)  I'm not sure what the policy should be, but
       aptitude should provide some mechanism for cleaning the cache
       out in the form of a "clean-history-cache" command that drops
       old entries from the history cache (with a cutoff date).  The
       Debian package can set the default policy with a file in
       /etc/apt/apt.conf.d; probably a reasonable place to start is
       "throw away log entries that the default logrotate configuration
       would delete".
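
  To make questions (2) and (3) a bit more concrete, here is a rough
sketch of undo/redo built purely on an append-only history list, where
"undo" and "redo" create specially marked entries instead of destroying
old ones.  It is Python for brevity, and the entry fields, the marker
names, and the rule that a fresh action discards pending redos are all
assumptions for illustration, not a decided design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    kind: str       # "action", "undo", or "redo" (hypothetical markers)
    package: str
    old_state: str
    new_state: str

def scan(history):
    """Replay the markers to recover (undo stack, redo stack).

    Both stacks hold indices of the original "action" entries.  This is
    the linear scan; it is also how the stacks could be restored after
    a subprocess returns.
    """
    undoable, redoable = [], []
    for i, entry in enumerate(history):
        if entry.kind == "action":
            undoable.append(i)
            redoable.clear()  # a fresh action invalidates pending redos
        elif entry.kind == "undo":
            redoable.append(undoable.pop())
        elif entry.kind == "redo":
            undoable.append(redoable.pop())
    return undoable, redoable

def undo(history):
    """Append an entry reversing the latest live action; None if nothing to undo."""
    undoable, _ = scan(history)
    if not undoable:
        return None
    target = history[undoable[-1]]
    entry = Entry("undo", target.package, target.new_state, target.old_state)
    history.append(entry)
    return entry

def redo(history):
    """Append an entry re-applying the latest undone action; None if none."""
    _, redoable = scan(history)
    if not redoable:
        return None
    target = history[redoable[-1]]
    entry = Entry("redo", target.package, target.old_state, target.new_state)
    history.append(entry)
    return entry

history = [
    Entry("action", "foo", "not-installed", "install"),
    Entry("action", "bar", "not-installed", "install"),
]
undo(history)   # appends an "undo" entry for bar
undo(history)   # appends an "undo" entry for foo
redo(history)   # appends a "redo" entry for foo
```

  Nothing is ever removed from the list, so the abandoned install of
"bar" stays visible in the history.  If the scan ever becomes too slow,
the two stacks returned by scan() are exactly the extra pointers that
could be cached alongside the list.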

  My plan for implementation is as follows:

   (1) Go on vacation for the Christmas holiday.
   (2) Afterwards, define a data model for the history entries.  I
       committed an initial cut of this in
       src/generic/apt/history/history_entry.h, but it's quite
       incomplete and unlikely to be the last word.  It's tempting
       to do this slowly with the excuse that "it's design work and
       I want to do it right", but I suspect that it's better to
       just put *something* together and see where it's lacking
       once it's in use.
   (3) Write code to convert the history entries to and from text
       (should be easy enough to do once the data model is set).
   (4) Write unit tests for the code in (3).
   (5) Write code to compute the history list for the current session
       and store it in a member of aptitude's cache wrapper.  This will
       largely be a copy-and-paste of the "undo" code.
   (6) Write code to implement undo/redo on top of the history list.
   (7) Replace the current undo code for packages with the new undo /
       redo code.
   (8) Fix a bunch of the bugs I wrote in steps 2-7.
   (9) Write a GUI viewer for the history list, and code to show the
       history of a particular package.
  (10) Write code to add the live history list to an sqlite cache
       database, and code to retrieve history from the database (in the
       viewers of step (9) ).  The viewers should show data from the
       current session and data from previous sessions, probably with
       some indicator of where the current session starts.
  (11) Write code to generate log files in /var/log with history
       information.
  (12) Write code that, if the cache database is missing, reads those
       log files to rebuild it.
  (13) Implement any goodies that haven't gotten in yet (rollback,
       passing history over su-to-root, etc).
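
  As an illustration of steps (2) through (4), here is a minimal sketch
of a history entry that round-trips through one line of text.  It is
Python for brevity and is not derived from history_entry.h; the field
names and the percent-encoded, space-separated format are assumptions
chosen only to show the kind of round-trip test that step (4) calls
for.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote

@dataclass(frozen=True)
class HistoryEntry:
    # Hypothetical fields; the real data model is still undecided.
    timestamp: str
    package: str
    action: str
    note: str = ""

FIELDS = ("timestamp", "package", "action", "note")

def to_text(entry):
    """Serialize to a single line: percent-encoded fields joined by spaces."""
    # Percent-encoding guarantees no field contains a space or a newline,
    # so free-form notes cannot break the one-entry-per-line layout.
    return " ".join(quote(getattr(entry, name), safe=":") for name in FIELDS)

def from_text(line):
    """Parse one line produced by to_text(); raises ValueError if malformed."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return HistoryEntry(*(unquote(part) for part in parts))

entry = HistoryEntry(
    "2008-12-19T07:02:43Z",
    "libfoo1",
    "install",
    "pulled in by foo; 100% needed\nsecond line of the note",
)
line = to_text(entry)
restored = from_text(line)
```

  The useful property to test is that from_text(to_text(e)) == e for
any entry, including ones with empty or multi-line notes; once that
holds, the format itself can keep changing without breaking the rest
of the code.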

  I suspect that the file format of the logs and the cache will change
frequently while I'm getting all this ready; afterwards, it might be
worth thinking about how to future-proof everything.

  Daniel