[Pkg-mailman-hackers] Bug#1082167: mailman3 fails to (re)start: flawed lock cleanup

Michael Paoli Michael.Paoli at berkeley.edu
Thu Sep 19 04:29:24 BST 2024


Package: mailman3
Version: 3.3.8-2~deb12u2
Severity: important

Dear Maintainer,

Justification for Severity: important:
o significantly impacts operations (mailman3 fails to (re)start)
o the workaround (remove the stale lock links) is not readily apparent
  to many/most: the failure is an exception that indicates neither
  where the locks are nor that the locks prevented the (re)start
o can be very cleanly/simply/minimally patched in existing stable
o converges with the upstream fix in 3.3.10, see:
  upstream commit 168e6f55
  https://gitlab.com/mailman/mailman/-/issues/1174

I'd also urge that the fix (patch included) be applied in the next
feasible stable point release update.

Likewise for other Debian distributions (e.g. sid/unstable, testing,
LTS oldstable, (ELTS oldoldstable)): the same or a quite similar patch
should be applied unless/until they catch up with upstream >=3.3.10
(which includes the fix).

If mailman3 isn't stopped cleanly (e.g. application or system crash),
lock links are left behind.  The code apparently intends to clean these
up (with start --force), but in quite common circumstances it instead
fails with Python raising an exception, and the diagnostics/logs don't
give most users enough information to determine that the workaround is
to remove the stale lock links (or even where they are, or that they
triggered the issue).
The code fails to remove the stale lock links in the case where the
PID recorded in the lock exists, but is no longer Mailman's: it now
belongs to a process running under a different ID than the one running
mailman3, so os.kill(pid, 0) raises PermissionError rather than
ProcessLookupError.  This is rather probable due to PID number
reuse/collisions, e.g.: system is booted, mailman3 (or the host) isn't
stopped cleanly, lock links remain, system is restarted, and there's a
high probability of PID collision (same PID number, but no longer
Mailman's), which triggers this bug.  Even in other scenarios there's a
quite non-trivial probability of PID reuse triggering the bug.
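
To illustrate the failure mode described above, here is a minimal
sketch of how the signal-0 probe distinguishes the three cases.  The
function name classify_pid and its string labels are hypothetical and
are not part of Mailman; only os.kill and the two exception classes are
real.

```python
import os


def classify_pid(pid: int) -> str:
    """Classify a PID with the same signal-0 probe master.py uses.

    Hypothetical helper for illustration only; not Mailman code.
    """
    try:
        # Signal 0 delivers nothing; the kernel only checks that the
        # process exists and that we are permitted to signal it.
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no process with this PID, so a lock naming it is stale.
        return "absent"
    except PermissionError:
        # EPERM: the PID exists but belongs to another user, so it is
        # not the process that created the lock; the lock is stale too.
        return "not-ours"
    # The probe succeeded: a process we may signal holds this PID.
    return "signalable"
```

Run as an unprivileged user on Linux, classify_pid(1) would typically
give "not-ours", which is the case the unpatched code does not handle.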

PATCH recommended for file:
/usr/lib/python3/dist-packages/mailman/bin/master.py
$ diff -u .master.from_mailman3:3.3.8-2~deb12u2.py master.py
--- .master.from_mailman3:3.3.8-2~deb12u2.py    2023-01-04 07:23:41.000000000 +0000
+++ master.py   2024-09-19 01:21:35.542316427 +0000
@@ -103,8 +103,8 @@
     try:
         os.kill(pid, 0)
         return WatcherState.conflict, lock
-    except ProcessLookupError:
-        # No matching process id.
+    except (ProcessLookupError, PermissionError):
+        # No matching process id or pid not Mailman's.
         return WatcherState.stale_lock, lock
 
 
$ 
I've tested thoroughly that the patch fixes the issue for
mailman3 3.3.8-2~deb12u2
In fact it probably applies to all mailman 3.x <3.3.10 that haven't
been so patched.  The above patch also corresponds to upstream's fix in
3.3.10 commit 168e6f55:
https://gitlab.com/mailman/mailman/-/commit/168e6f5557c630a733dfc3b51dac47cb6c9c94a9

Much more information is also available at:
https://gitlab.com/mailman/mailman/-/issues/1174
including further analysis, diagnostics, and testing, as well as how to
conveniently reproduce the issue for testing.

-- System Information:
Debian Release: 12.7
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-25-amd64 (SMP w/1 CPU thread; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=ANSI_X3.4-1968) (ignored: LC_ALL set to C), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages mailman3 depends on:
ii  cron [cron-daemon]           3.0pl1-162
ii  dbconfig-sqlite3             2.0.24
ii  debconf [debconf-2.0]        1.5.82
ii  init-system-helpers          1.65.2
ii  logrotate                    3.21.0-1
ii  python3                      3.11.2-1+b1
ii  python3-aiosmtpd             1.4.3-1.1+deb12u1
ii  python3-alembic              1.8.1-2
ii  python3-authheaders          0.15.2-1
ii  python3-authres              1.2.0-3
ii  python3-click                8.1.3-2
ii  python3-dateutil             2.8.2-2
ii  python3-dnspython            2.3.0-1
ii  python3-falcon               3.1.1-1+b1
ii  python3-flufl.bounce         4.0-3
ii  python3-flufl.i18n           3.0.1-3
ii  python3-flufl.lock           5.0.1-4
ii  python3-gunicorn             20.1.0-6
ii  python3-importlib-resources  5.1.2-2
ii  python3-lazr.config          2.2.3-3
ii  python3-passlib              1.7.4-3
ii  python3-psycopg2             2.9.5-1+b1
ii  python3-public               2.3-4
ii  python3-requests             2.28.1+dfsg-1
ii  python3-sqlalchemy           1.4.46+ds1-1
ii  python3-zope.component       5.1.0-1
ii  python3-zope.configuration   4.4.1-1
ii  python3-zope.event           4.4-3
ii  python3-zope.interface       5.5.2-1+b1
ii  ucf                          3.0043+nmu1

Versions of packages mailman3 recommends:
ii  exim4-daemon-heavy [mail-transport-agent]  4.96-15+deb12u5

Versions of packages mailman3 suggests:
pn  anacron                                <none>
ii  lynx [www-browser]                     2.9.0dev.12-1
ii  mailman3-doc                           3.3.8-2~deb12u2
ii  mariadb-server [virtual-mysql-server]  1:10.11.6-0+deb12u1

-- debconf information excluded


