[Aptitude-devel] Bug#1069183: Bug#1069183: aptitude: already running package installs/upgrade get interrupted because of lost dpkg lock

David Kalnischkies david at kalnischkies.de
Wed Apr 17 20:55:21 BST 2024


(disclaimer: I have never used aptitude)

On Wed, Apr 17, 2024 at 03:05:47PM +0200, Christoph Anton Mitterer wrote:
> May very well be an issue in APT or rather dpkg, still, since I always see it
> from aptitude, I report it here. Please re-assign accordingly.
> 
> 
> I'm seeing this since quite some releases and also every now and then in unstable
> (though probably far less there, as I only run my workstation on unstable, but
> all servers on stable).
> 
> When I concurrently upgrade my servers (~60) via aptitude, out of that number
> around 10 see the already running install/upgrade process suddenly interrupted
> with a message like:

libapt tends to produce a confusing order of messages, as it doesn't
print its own errors itself; they tend to be printed in bulk late in
the process instead. So let me reorder and explain the parts I can
identify:

> Unpacking util-linux-extra (2.38.1-5+deb12u1) over (2.38.1-5+b1) ...
> Setting up util-linux-extra (2.38.1-5+deb12u1) ...
> dpkg: error: dpkg frontend lock was locked by another process with pid 1064194
> Note: removing the lock file is always wrong, can damage the locked area
> and the entire system. See <https://wiki.debian.org/Teams/Dpkg/FAQ#db-lock>.
> E: Sub-process /usr/bin/dpkg returned an error code (2)

I am assuming here that unpacking and setting up util-linux-extra
worked fine and that this dpkg run ended. The dpkg run after that, which
would probably have installed other things, failed because the lock was
held by something else…

> dpkg: error: dpkg frontend lock was locked by another process with pid 1064194
> Note: removing the lock file is always wrong, can damage the locked area
> and the entire system. See <https://wiki.debian.org/Teams/Dpkg/FAQ#db-lock>.
> E: Sub-process dpkg --set-selections returned an error code (2)
> E: Couldn't revert dpkg selection for approved remove/purge after an error was encountered!

This is libapt trying to clean up after the first dpkg error, which
fails given that (re)setting dpkg selections needs the lock, too.

> Scanning processes...
> Scanning processor microcode...
> Scanning linux images...
> Running kernel seems to be up-to-date.
> The processor microcode seems to be up-to-date.
> No services need to be restarted.
> No containers need to be restarted.
> No user sessions are running outdated binaries.
> No VM guests are running outdated hypervisor (qemu) binaries on this host.

This output (which I trimmed slightly) is from needrestart. It uses
an apt hook (dpkg::post-invoke) that is run after libapt is done
with all its dpkg calls (regardless of whether the action was a success
or not). The frontend lock is still held while those hooks run, but they
can interface with dpkg if they want to: the environment of the scripts
called is prepared accordingly. libdvd-pkg, for example, installs
a package it has just built in the same hook without special care (I
think it is the only package in the archive doing this). That said,
I think needrestart is read-only.

(Now aptitude takes over from libapt again and prints the errors libapt
 encountered/produced)

> Processing triggers for man-db (2.11.2-2) ...
> Processing triggers for libc-bin (2.36-9+deb12u4) ...
> Press Return to continue, 'q' followed by Return to quit.

I think aptitude runs 'dpkg --configure -a' automatically if libapt
ended in an error. Interestingly this just runs triggers. libapt calls
dpkg with --no-triggers every time except the last, to avoid running
them needlessly, which supports my theory that it wanted to make other
dpkg calls, but that (--unpack) call failed.


> Unfortunately it doesn't tell the name of pid 1064194 and the offending process
> is typically always already gone by then.

(Maybe report that as a feature request for dpkg to show some info
 about the pid instead of just the number, but that might be hard to
 implement.)
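
As a rough illustration of what such a lookup could involve on Linux,
the sketch below reads /proc/<pid>/cmdline. This is Python, not dpkg
code; describe_pid is a hypothetical helper made up for this example.
It returns None once the process has exited, which is exactly the case
that makes the feature hard.

```python
import os


def describe_pid(pid):
    """Return the command line of pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except OSError:
        # Process already gone, no /proc available, or no permission.
        return None
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    return " ".join(args) if args else None


# Look up our own process as a demonstration.
print(describe_pid(os.getpid()))
```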


> Could be check_apt from Icinga or could be /usr/share/prometheus-node-exporter-collectors/apt_info.py
> from prometheus-node-exporter-collectors .

I don't know it, but a casual look suggests this is read-only and as
such wouldn't need any locks? I would at least hope so based on the
name "info"…


> But in any case, shouldn't aptitude/apt/dpkg just permanently hold the lock
> once the process has started until it finishes?

That is how it is supposed to be, but I think aptitude was never changed
to make full use of the frontend lock. Probably unrelated to this issue,
but a quick grep on aptitude shows me:
| $ git grep -A 2 -- '->ReleaseLock' src/generic/apt/aptcache.cc
| :1006:      apt_cache_file->ReleaseLock();
| -1007-      bool dpkg_selections_saved = dpkg_selections.save_selections();
| -1008-      if (! apt_cache_file->GainLock())
which is the old pattern of releasing the lock and calling dpkg in the
hope that nothing grabs it in the meantime. That was the practice
before dpkg gained the frontend lock. (These are aptitude's own methods
that wrap _system->Lock() from libapt, which acquires both the frontend
and the dpkg lock, and also releases both if told to.)

The solution here should be to hold onto the frontend lock for the
entire run and do the lock&unlock dance for compatibility with the dpkg
lock only. _system->LockInner() is part of that and grep has no hits
for it in aptitude.

So, my suspicion is that aptitude doesn't use the frontend lock and is
hence prone to other front ends grabbing the dpkg (and frontend) lock
the moment it releases the dpkg lock for dpkg. Hence the two failures;
the run of needrestart then takes long enough for the other front end
to finish, so that the last dpkg call aptitude makes succeeds again.
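
To make the suspected race concrete, here is a toy model in Python. It
is not aptitude or libapt code: the LockFile class, the owner names and
all helper functions are made up for illustration, and the locks are
plain in-memory objects rather than real files. The only behaviour
taken from dpkg is that it acquires the frontend lock itself unless its
caller signals (via DPKG_FRONTEND_LOCKED) that it already holds it.

```python
class LockFile:
    """Toy stand-in for a lock file: remembers who currently holds it."""

    def __init__(self, name):
        self.name = name
        self.owner = None

    def acquire(self, who):
        # Fails if somebody else holds the lock.
        if self.owner is not None and self.owner != who:
            return False
        self.owner = who
        return True

    def release(self, who):
        if self.owner == who:
            self.owner = None


def other_front_end(frontend, db):
    """A competing front end: frontend lock first, then the dpkg lock."""
    if not frontend.acquire("other"):
        return False
    if not db.acquire("other"):
        frontend.release("other")
        return False
    return True


def dpkg(frontend, db, frontend_locked_by_caller):
    """One dpkg call; the flag models DPKG_FRONTEND_LOCKED being set."""
    if not frontend_locked_by_caller and not frontend.acquire("dpkg"):
        return "frontend lock held by " + frontend.owner
    if not db.acquire("dpkg"):
        if not frontend_locked_by_caller:
            frontend.release("dpkg")
        return "dpkg lock held by " + db.owner
    db.release("dpkg")
    if not frontend_locked_by_caller:
        frontend.release("dpkg")
    return "ok"


def old_pattern():
    frontend, db = LockFile("lock-frontend"), LockFile("lock")
    frontend.acquire("aptitude")
    db.acquire("aptitude")
    # Release both locks and hope nothing grabs them before dpkg starts.
    db.release("aptitude")
    frontend.release("aptitude")
    intruded = other_front_end(frontend, db)   # the race window
    return intruded, dpkg(frontend, db, frontend_locked_by_caller=False)


def frontend_lock_pattern():
    frontend, db = LockFile("lock-frontend"), LockFile("lock")
    frontend.acquire("aptitude")
    db.acquire("aptitude")
    # Give up only the inner dpkg lock; keep the frontend lock throughout.
    db.release("aptitude")
    intruded = other_front_end(frontend, db)   # same window, now closed
    result = dpkg(frontend, db, frontend_locked_by_caller=True)
    db.acquire("aptitude")                     # take the dpkg lock back
    return intruded, result


print(old_pattern())            # (True, 'frontend lock held by other')
print(frontend_lock_pattern())  # (False, 'ok')
```

In the first variant the competing front end slips in between the
release and the dpkg call, so dpkg fails on the frontend lock, much like
the error quoted above. In the second variant the competitor is turned
away at the frontend lock and never reaches the dpkg lock.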


Someone who knows aptitude better, or at least has more than a passing
interest in aptitude, should check the code to prove the suspicions
made here (or disprove them, of course).


Best regards

David Kalnischkies