Bug#1036920: another problem class from /usr-merge [Re: Bug#1036920: systemd: please ship a placeholder in /usr/lib/modules-load.d/]
Helmut Grohne
helmut at subdivi.de
Tue May 30 10:53:00 BST 2023
Context for d-devel:
Andreas Beckmann noticed that systemd ships an empty directory
/usr/lib/modules-load.d. When removing a package that ships a file in
/lib/modules-load.d (such as multipath-tools), dpkg may in some
circumstances delete the empty directory owned by systemd.
On Mon, May 29, 2023 at 07:24:09PM +0100, Luca Boccassi wrote:
> Given what was discussed:
I think the conclusion is drawn too quickly here.
> - bookworm is in hard freeze
> - there is no functional impact
In effect, this bug report is an instance of a bug class. I am in the
process of quantifying its effects, but I do not have useful numbers at
this time. As an initial gauge, I think about 2000 binary packages ship
empty directories. That does not imply that they are affected; treat
the figure as a grossly imprecise upper bound.
> - unmerged-usr paths are no longer supported
Here you argue that this bug would affect only unmerged systems, but it
is actually the reverse. Unmerged systems are unaffected by this bug
class. The deletion that Andreas describes can only happen due to the
aliasing introduced by merging. This bug class only affects merged
systems.
In my earlier reply, I also asked Andreas for a practical impact on
systemd users and suggested lowering the severity of this instance.
However, there is more to consider. This poses a problem to piuparts and
thus testing migration. Making piuparts happy is a use case of its own.
When a mitigation for non-essential adduser broke piuparts (again, I'm
sorry about that), the release team decided that piuparts is an
important piece of the release process and therefore the change was to
be reverted. As a result, apt now depends on adduser in bookworm again.
To be clear, I fully support the decision that has been made here and
thank the release team for dealing with resulting issues (e.g. delayed
migration of other packages). Since the problem we are discussing here
is quite similar, I argue that this problem class also should be
considered release critical in general, because it may impact testing
migration. That being said, IANARM and I therefore leave that judgement
to others.
> - as soon as trixie opens for business we might just canonicalize
> everything (assuming all the ducks will be in a row)
You make this look like a simple way forward. For now, I am unconvinced
that canonicalizing paths is the cure to this problem. To dpkg, a
canonicalization looks like removing a file and adding a different file.
Thus the deletion effect that Andreas reported may kick in while
performing that canonicalization. It probably is not that simple though.
As far as I understand it, dpkg first adds the new files and then
removes the old ones, so it sees that the directory it tries to delete
is not empty (and we've seen it issue warnings about that case). To me, this
means that we (or rather I) don't understand the problem well enough to
judge it. It might be harmless, but it might be real. We shouldn't be
scared, but "it probably works" may not be the best approach either.
And then Andreas got me thinking. Before delving into that, I'd like to
again express thanks to Andreas. When we see a bug from Andreas, can we
please start with thanking him? Even if the bug ultimately is due to a
limitation in piuparts (as has happened in the adduser case), his work
(and that of other piuparts people such as Nicolas) still adds a lot of
value to Debian. The occasional report that looks harmless initially
tends to point at real problems more often than not. When he writes a
mail, it is full of detail for looking at the issue. I ask us all to
better appreciate that work. Let me do that now: Thank you Andreas,
Holger and Nicolas!
So let's stack-pop to where he got me thinking. A directory is a
resource that can be shared between packages. Andreas demonstrated that
removing one package may remove such a shared resource while another
package still needs it and references it via an aliased path. In
effect, we break dpkg's reference counting of shared resources.
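To make the reference counting argument concrete, here is a toy model.
It is only a sketch and not how dpkg is implemented: the ToySystem
class, the resolve() helper and the file name example.conf are all made
up for illustration. The one assumption that matters is that ownership
of a directory is tracked by the path recorded in the package, while
the removal acts on the physical location.

```python
import posixpath

# Assumption: on a merged system /lib is a symlink to /usr/lib.
MERGED_ALIASES = {"/lib": "/usr/lib"}


def resolve(path, aliases):
    """Map a path as recorded in a package to its physical location."""
    for link, target in aliases.items():
        if path == link or path.startswith(link + "/"):
            return target + path[len(link):]
    return path


class ToySystem:
    """Hypothetical stand-in for a package database plus a filesystem."""

    def __init__(self, aliases):
        self.aliases = aliases
        self.db = {}        # package -> recorded (unresolved) paths
        self.dirs = set()   # physical directories
        self.files = set()  # physical files

    def install(self, pkg, dirs=(), files=()):
        self.db[pkg] = {"dirs": set(dirs), "files": set(files)}
        for d in dirs:
            self.dirs.add(resolve(d, self.aliases))
        for f in files:
            self.files.add(resolve(f, self.aliases))

    def remove(self, pkg):
        entry = self.db.pop(pkg)
        for f in entry["files"]:
            self.files.discard(resolve(f, self.aliases))
        for d in entry["dirs"]:
            # The reference count is keyed on the recorded name, so an
            # owner that records the aliased name is not seen here.
            if any(d in other["dirs"] for other in self.db.values()):
                continue
            phys = resolve(d, self.aliases)
            # Only empty directories are removed.
            if not any(posixpath.dirname(f) == phys for f in self.files):
                self.dirs.discard(phys)


def systemd_dir_survives(aliases):
    """Install both packages, remove one, report on systemd's directory."""
    system = ToySystem(aliases)
    system.install("systemd", dirs=["/usr/lib/modules-load.d"])
    system.install(
        "multipath-tools",
        dirs=["/lib/modules-load.d"],
        files=["/lib/modules-load.d/example.conf"],  # hypothetical name
    )
    system.remove("multipath-tools")
    return "/usr/lib/modules-load.d" in system.dirs


if __name__ == "__main__":
    print("unmerged:", systemd_dir_survives({}))
    print("merged:", systemd_dir_survives(MERGED_ALIASES))
```

In this model the directory survives on the unmerged layout and is lost
on the merged one, which matches the direction of the effect described
above. It says nothing about how often this happens in the archive.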
Are there other kinds of resources in dpkg that can be shared like
directories? Thinking... Yes, regular files. How can files be shared?
Via Multi-Arch: same. Can that happen for real? Yes. I've attached an
artificial reproducer. Does it happen in the archive? I really cannot
tell yet. In effect, this is yet another bug class derived from Andreas'
directory-loss bug class. This new file-loss bug class is distinct from
the file-loss bug class that resulted in the moratorium.
Etienne Mollier pointed out that "dpkg --verify" helps with diagnosing
whether unexpected file deletion has happened on a particular system. It
also reports other diagnostics and it does not consider any
--path-exclude options that have been configured via
/etc/dpkg/dpkg.cfg, so take the output with a grain of salt. Also,
reinstalling the affected packages generally resolves the problem on a
particular system.
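For those who want to filter that output, a small sketch follows. It
assumes the rpm-style output format of "dpkg --verify", in which lines
for paths that no longer exist start with the word "missing" and end
with the absolute pathname. Please check that assumption against your
dpkg version before relying on it. The function names are made up.

```python
import subprocess


def missing_paths(verify_output):
    """Extract paths reported as missing from `dpkg --verify` output.

    Assumption: such lines start with "missing" and the pathname is
    absolute, so it begins at the first "/" on the line.
    """
    paths = []
    for line in verify_output.splitlines():
        if not line.startswith("missing"):
            continue  # other diagnostics, e.g. changed checksums
        start = line.find("/")
        if start != -1:
            paths.append(line[start:])
    return paths


def run_verify():
    """Run dpkg --verify on the local system and list missing paths."""
    result = subprocess.run(
        ["dpkg", "--verify"],
        capture_output=True,
        text=True,
        check=False,  # do not raise if dpkg exits non-zero
    )
    return missing_paths(result.stdout)


if __name__ == "__main__":
    for path in run_verify():
        print(path)
```

Paths that were deliberately excluded via --path-exclude will show up
here as well, so the list still needs a manual look.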
I wish I could give you numbers. I don't have them. I cannot gauge the
impact of these problems at this time. The problems I described may look
scary. Please don't panic. :) I'm positive that we will have a clearer
picture soon. I am sending this now, because we don't hide problems -
even if my (our?) understanding of the problem is incomplete.
Still, this definitely poses two (generic) problems that I have not been
aware of earlier and that we need to consider carefully. I remember
Simon Richter asking us to have more test cases and what we see here in
my opinion confirms Simon's view.
So while I promised to write a proposal for a complete plan $soon to
some of you, please have a bit more patience with me due to this news.
Helmut