Bug#342619: Possible exim retry bug (Re: master mail problems -- help needed)

Jeroen van Wolffelaar jeroen at wolffelaar.nl
Fri Dec 9 00:12:56 UTC 2005


Package: exim4-daemon-heavy
Version: 4.50-8
Severity: serious

On Thu, Dec 08, 2005 at 10:33:54PM +0100, Florian Weimer wrote:
> * Lionel Elie Mamane:
> 
> > On Thu, Dec 08, 2005 at 09:30:52PM +0100, Wouter Verhelst wrote:
> >
> >> The fact that my primary MX is only available through IPv6, and that
> >> this is the case for other people who're having problems too might
> >> then be a better chance at being the problem.
> >
> > My primary MX is IPv6-only, too. I don't have detected a problem yet :)
> 
> Do you receive lots of mail from master.debian.org, and would you
> notice the bounces?  Mail from Debian mailing lists come directly from
> murphy.debian.org, which does not seem to have the problem.
> 
> You also have one IPv4-only MX, which might be enough to prevent the
> Exim bug[1] from occurring.
> 
> [1] I'm not sure if it's a Exim's fault, it's only a hunch.

I'm quite sure it's an exim bug, but haven't quite nailed it yet. The
bug has been witnessed positively both on master.d.o and on
one mailserver I maintain. Interestingly, it doesn't seem to be IPv6
related (or maybe there are two bugs).

The situation on my mailserver was that the primary MX had been
unavailable for a long time and was way past its retry cutoff time, but
the secondary MX worked fine. However, for some reason, all the mail
queued for the domain in question suddenly got bounced on the grounds
that the destination had been unreachable for an extended time, past
the retry time. Obviously, that's bogus, as the secondary MX wasn't
past cutoff yet.
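
To make the expectation concrete, here is a minimal sketch of the rule I
would expect to apply. It is not taken from the Exim source; the Host
class, its past_cutoff field and should_bounce() are hypothetical names
used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    # Assumption: True once the host has been failing for longer than
    # the configured retry cutoff.
    past_cutoff: bool


def should_bounce(hosts):
    """Bounce only if every MX host for the domain is past its cutoff."""
    return bool(hosts) and all(h.past_cutoff for h in hosts)


# The situation described above: primary long dead, secondary healthy.
hosts = [
    Host("shrek.vanschaik.tk", past_cutoff=True),
    Host("mailrelay.direct-adsl.nl", past_cutoff=False),
]
print(should_bounce(hosts))
```

Under this rule a single reachable MX is enough to keep the mail from
bouncing, which is not what the logs show happening.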

I've meant to look into the code for this, but haven't gotten around to
it yet. If someone wants to do so, please do -- I strongly suspect that
Exim in Sarge has a serious bug in there somewhere; it's showing up with
these IPv6 and IPv4 multihomed MXes too, after all.

I think this is a serious bug, as it can cause mail to get lost
(bouncing a mail for no good reason at all in some very common
situations like the IPv6 vs IPv4 multihomed MXes).

Log snippets:

# Primary (long time unreachable) MX is shrek.vanschaik.tk, secondary
# reachable MX is mailrelay.direct-adsl.nl

Last successful delivery:

2005-11-30 17:49:53 1EhV6R-0000uq-Qg shrek.vanschaik.tk [81.207.193.3]:
Connection timed out
2005-11-30 17:50:02 1EhV6R-0000uq-Qg => bas at vanschaik.tk
<sjeik at A-Eskwadraat.nl> R=dnslookup_relay_to_domains T=remote_smtp
H=mailrelay.direct-adsl.nl [195.121.6.56] C="250 2.0.0 jAUGo2wm025502
Message accepted for delivery" QT=3m19s

First failure:

2005-11-30 18:35:41 1EhVnA-0002GK-L1 shrek.vanschaik.tk [81.207.193.3]:
No route to host
2005-11-30 18:35:41 1EhVnA-0002GK-L1 == bas at vanschaik.tk
<sjeik at A-Eskwadraat.nl> R=dnslookup_relay_to_domains T=remote_smtp defer
(113): No route to host
2005-11-30 18:35:41 1EhVnA-0002GK-L1 ** bas at vanschaik.tk
<sjeik at A-Eskwadraat.nl>: retry timeout exceeded

Second failure:

2005-11-30 18:36:43 1EhVrp-0001pB-Jw ** bas at vanschaik.tk
<sjeik at A-Eskwadraat.nl> R=dnslookup_relay_to_domains T=remote_smtp:
retry time not reached for any host after a long failure period

Obviously, the secondary MX was okay, so the "retry timeout exceeded"
bounce and especially the second failure should not have happened.
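
For anyone who wants to check their own mainlog for the same pattern,
the following is a rough sketch that flags bounces ("**") for an address
that had a successful delivery ("=>") shortly before. The one-hour
window, the function name and the sample lines are my own assumptions;
the sample is modelled on the entries above with a placeholder address,
and real log lines are assumed to be unwrapped (one entry per line).

```python
import re
from datetime import datetime, timedelta

# Timestamp, message id, delivery flag, and the first token after the flag.
LINE = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) (\S+) (=>|==|\*\*) (\S+)"
)


def suspicious_bounces(lines, window=timedelta(hours=1)):
    """Return (msgid, address, gap) for bounces soon after a delivery."""
    last_ok = {}  # address -> time of the last successful delivery
    hits = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msgid, flag = m.group(2), m.group(3)
        addr = m.group(4).rstrip(":")
        if flag == "=>":
            last_ok[addr] = stamp
        elif flag == "**" and addr in last_ok:
            gap = stamp - last_ok[addr]
            if gap <= window:
                hits.append((msgid, addr, gap))
    return hits


# Hypothetical sample lines, not copied from a real log.
sample = [
    "2005-11-30 17:50:02 1EhV6R-0000uq-Qg => user@example.org T=remote_smtp",
    "2005-11-30 18:35:41 1EhVnA-0002GK-L1 == user@example.org T=remote_smtp defer",
    "2005-11-30 18:35:41 1EhVnA-0002GK-L1 ** user@example.org: retry timeout exceeded",
]
hits = suspicious_bounces(sample)
for msgid, addr, gap in hits:
    print(msgid, addr, gap)
```

A hit does not prove the bug by itself, it only narrows down which
messages are worth looking at by hand.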

--Jeroen

-- 
Jeroen van Wolffelaar
Jeroen at wolffelaar.nl (also for Jabber & MSN; ICQ: 33944357)
http://Jeroen.A-Eskwadraat.nl



