Bug#765577: (no subject)

Faidon Liambotis paravoid at debian.org
Wed Mar 18 17:15:59 GMT 2015


severity 765577 serious
thanks

On Wed, Feb 25, 2015 at 03:24:08PM +0000, Filippo Giunchedi wrote:
> FWIW we're running into the same bug with jessie installer, passing
> 'debug' at boot apparently is enough to not trigger the race with good
> success rate.

Filippo and I both work for the Wikimedia Foundation, where this is
affecting us on dozens of systems.

I tried to debug this extensively and had a chat with Marco d'Itri on
IRC. Both Marco and I consider this an RC bug, so I'm elevating this to
serious. Unfortunately, Marco told me that he won't be able to tackle
this and suggested replying to this bug report so that the other udev
maintainers can help out.

The result of my own investigation is (not speaking for Marco):

It's clear that there's some race condition happening here both because
there are reports of it happening sporadically (not in my case, though)
and because setting d-i to debug mode fixes it.

Therefore, the operating theory is that multiple "add" events are
triggered for the same interface. This race is supposed to be handled, as:

a) write_net_rules takes a lock before writing anything -- it's also
evident this happens, as the duplicate entries have ethNs that are
numerically ascending and not the same for the same card.

b) 75-persistent-net-generator.rules is supposed to be idempotent, as it
bails out early (3rd line) for interfaces that already have a NAME set.
For the ones that don't, it also sets NAME right after the
write_net_rules invocation.
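
One way to check a machine for the symptom described in (a) is to look for
MAC addresses that appear in more than one generated rule. The following is
a minimal sketch, not part of udev: find_duplicate_macs is a hypothetical
helper, the sample rules are made up, and the rule format is assumed to
follow the usual 70-persistent-net.rules layout with an ATTR{address} match.

```shell
#!/bin/sh
# Hypothetical helper: print each MAC address that has more than one rule
# in the rules file given as $1.
find_duplicate_macs() {
	sed -n 's/.*ATTR{address}=="\([^"]*\)".*/\1/p' "$1" | sort | uniq -d
}

# Made-up sample resembling the duplicate entries described above: the
# same card shows up twice, with ascending ethN names.
sample=$(mktemp)
cat > "$sample" <<'EOF'
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:55", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:66", KERNEL=="eth*", NAME="eth1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:55", KERNEL=="eth*", NAME="eth2"
EOF

find_duplicate_macs "$sample"
```

On an affected system the same function could be pointed at the real rules
file instead of the sample.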

However this still leaves room for a race: write_net_rules is *not*
idempotent and hence if 75-persistent-net-generator.rules gets called
twice in very quick succession, before write_net_rules gets a chance to
finish and name the interface, then an interface will be named twice,
with a different name (and hence, eth0 will be renamed to e.g. eth2).
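
The effect of a non-idempotent writer can be illustrated with a small
stand-in. This is only a sketch: append_rule is a hypothetical
simplification and not the real write_net_rules, and the match string is
made up. It shows that serializing two calls (which is all the lock does)
still leaves two entries for the same card.

```shell
#!/bin/sh
# Illustration only: a stand-in for the append step, NOT the real script.
RULES_FILE=$(mktemp)

# Hypothetical helper: append a rule for the match string in $1, picking
# the next free ethN from the number of rules already written.
append_rule() {
	n=$(grep -c 'NAME="eth' "$RULES_FILE")
	printf '%s, NAME="eth%s"\n' "$1" "$n" >> "$RULES_FILE"
}

match='SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:11:22:33:44:55"'

# Two "add" events for the same card, handled one after the other, as
# they would be when each call waits for the lock.
append_rule "$match"
append_rule "$match"

# The file now holds two rules for one MAC address, with ascending names.
cat "$RULES_FILE"
```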

It's still unknown to me why this is a regression.

I've tried the following, under /lib/debian-installer/start-udev:
1) Adding a "udevadm settle || true" right after the "udevadm trigger".
2) Adding a "sleep 15" before "udevadm trigger"
3) Adding a "sleep 15" (or 3) *after* "udevadm trigger".

Surprisingly, of these three, only (3) worked around the bug.

Another less arbitrary/racy workaround I suggested was a grep near the
top of write_net_rules' write_rule() function. Since write_rule()
operates under a lock, this would completely eliminate any kind of race
here. I pitched this to Marco but he wasn't thrilled with the idea -- he
said he'd prefer finding the root cause. I've made the change and tested
it anyway, though, and it successfully alleviates this issue:

diff --git a/debian/extra/write_net_rules b/debian/extra/write_net_rules
index 4379792..fbd1230 100644
--- a/debian/extra/write_net_rules
+++ b/debian/extra/write_net_rules
@@ -60,6 +60,9 @@ write_rule() {
 	local name="$2"
 	local comment="$3"
 
+	# workaround potential races, #765577
+	if grep -q -F "$match" "$RULES_FILE"; then return; fi
+
 	{
 	if [ "$PRINT_HEADER" ]; then
 		PRINT_HEADER=
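
The guard in the patch can also be tried in isolation. The sketch below is
an assumption-laden stand-in: write_rule_guarded is a hypothetical
function, not the real write_rule(), and the match strings are made up. It
only demonstrates that a fixed-string grep before the append turns a
repeated call into a no-op.

```shell
#!/bin/sh
# Illustration only: a simplified stand-in for write_rule() with the guard.
RULES_FILE=$(mktemp)

write_rule_guarded() {
	# -F treats the match as a literal string; it contains characters
	# such as '*', '?' and '{' that should not be read as a pattern.
	if grep -q -F "$1" "$RULES_FILE"; then
		return 0
	fi
	n=$(grep -c 'NAME="eth' "$RULES_FILE")
	printf '%s, NAME="eth%s"\n' "$1" "$n" >> "$RULES_FILE"
}

first='SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:55"'
second='SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:11:22:33:44:66"'

write_rule_guarded "$first"
write_rule_guarded "$first"    # duplicate event: skipped by the guard
write_rule_guarded "$second"   # different card: still gets a rule

cat "$RULES_FILE"
```

Note that grep -F matches substrings, so this kind of guard relies on no
match string being contained in another rule's line.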

Thanks,
Faidon


