Bug#948834: glib2.0: FTBFS: gio/tests/gsocketclient-slow.c: Error resolving “localhost”: Name or service not known

Simon McVittie smcv at debian.org
Wed Feb 26 11:33:29 GMT 2020


On Sun, 09 Feb 2020 at 19:19:24 +0000, Simon McVittie wrote:
> On Sun, 09 Feb 2020 at 16:45:05 +0100, Mattia Rizzolo wrote:
> > I see glib2.0 is also failing in the r-b infra:
> > https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/glib2.0.html
>
> We could probably work around this in glib2.0 with a Build-Depends on
> libnss-myhostname | netbase, or the other way round.

I tried this, and no, that doesn't work; the situation is more subtle
than I thought, and not the fault of pbuilder's /etc/hosts.

localhost *does* resolve in the container. However, it only resolves
with certain options, and those options don't match all the options GIO
is going to use.

Specifically, GResolver is normally implemented by GThreadedResolver,
which uses getaddrinfo with socktype SOCK_STREAM, protocol IPPROTO_TCP,
flags AI_ADDRCONFIG, and a varying family: either AF_UNSPEC, AF_INET or
AF_INET6 depending on options. The "Happy Eyeballs" algorithm exercised
in this test carries out separate AF_INET and AF_INET6 name resolution,
so that it can make HTTP connections via IPv4 and IPv6 in parallel,
and take whichever works first.
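
For illustration, a lookup with those hints can be sketched in C roughly
as follows. This is not GLib's actual code, only a minimal example of
the same getaddrinfo parameters; the helper name resolve_like_gio is
made up for this example.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of GLib): resolve NAME using the hints
 * described above. FAMILY is AF_UNSPEC, AF_INET or AF_INET6.
 * Returns 0 on success, or a getaddrinfo() EAI_* error code. */
int
resolve_like_gio (const char *name, int family)
{
  struct addrinfo hints;
  struct addrinfo *res = NULL;
  int ret;

  memset (&hints, 0, sizeof hints);
  hints.ai_family = family;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_protocol = IPPROTO_TCP;
  hints.ai_flags = AI_ADDRCONFIG;

  ret = getaddrinfo (name, NULL, &hints, &res);

  /* The result list is only valid (and only needs freeing) on success. */
  if (ret == 0)
    freeaddrinfo (res);

  return ret;
}
```

Calling resolve_like_gio ("localhost", AF_INET) and
resolve_like_gio ("localhost", AF_INET6) separately corresponds to the
two per-family lookups that the Happy Eyeballs code path performs.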

Unfortunately, AI_ADDRCONFIG is documented like this (my emphasis):

     If  hints.ai_flags includes the AI_ADDRCONFIG flag, then IPv4 addresses
     are returned in the list pointed to by res only if the local system has
     at  least  one IPv4 address configured, and IPv6 addresses are returned
     only if the local system has at least one IPv6 address configured. **The
     loopback  address is not considered for this case as valid as a
     configured address.**

and pbuilder's network namespace only has loopback addresses. So we
would expect resolving "localhost" to always fail in that namespace with
AI_ADDRCONFIG, which I would have expected to affect more packages than
just GLib - but that doesn't happen, due to #854301.
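
One way to see what a namespace looks like from AI_ADDRCONFIG's point of
view is to enumerate the configured addresses and ignore loopback. The
following sketch approximates the documented rule using getifaddrs();
the helper name have_non_loopback is made up here, and glibc's real
check is implemented differently and may differ in details (for
example, in how it treats other special addresses).

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, only an approximation of the documented
 * AI_ADDRCONFIG rule: report whether any interface has an IPv4 address
 * other than 127.0.0.1, and whether any has an IPv6 address other
 * than ::1. Returns 0 on success, -1 if getifaddrs() fails. */
int
have_non_loopback (bool *have_ipv4, bool *have_ipv6)
{
  struct ifaddrs *list = NULL;
  struct ifaddrs *ifa;

  *have_ipv4 = false;
  *have_ipv6 = false;

  if (getifaddrs (&list) != 0)
    return -1;

  for (ifa = list; ifa != NULL; ifa = ifa->ifa_next)
    {
      /* Interfaces without an address have ifa_addr == NULL. */
      if (ifa->ifa_addr == NULL)
        continue;

      if (ifa->ifa_addr->sa_family == AF_INET)
        {
          const struct sockaddr_in *sin =
            (const struct sockaddr_in *) ifa->ifa_addr;

          if (sin->sin_addr.s_addr != htonl (INADDR_LOOPBACK))
            *have_ipv4 = true;
        }
      else if (ifa->ifa_addr->sa_family == AF_INET6)
        {
          const struct sockaddr_in6 *sin6 =
            (const struct sockaddr_in6 *) ifa->ifa_addr;

          if (!IN6_IS_ADDR_LOOPBACK (&sin6->sin6_addr))
            *have_ipv6 = true;
        }
    }

  freeifaddrs (list);
  return 0;
}
```

In a namespace that only has loopback addresses, I would expect both
flags to come back false.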

To debug this I hacked the attached program into a package built in
pbuilder (GLib is inconveniently large, so I added the program to procenv
instead). You can get similar (but not identical!) results without pbuilder
by compiling the program, installing bwrap and using:

    bwrap --unshare-net --dev-bind / / ./getaddrinfo

By experiment, what actually happens is:

no hints (which in glibc means AF_UNSPEC and AI_ADDRCONFIG|AI_V4MAPPED):
    success, return 127.0.0.1 (only, I don't get ::1 for some reason)
AF_INET:
    if AI_ADDRCONFIG: fails with -2 "Name or service not known"
    else: success, return 127.0.0.1
AF_INET6:
    if AI_ADDRCONFIG: fails with -2 "Name or service not known"
    else (pbuilder): fails with -3 "Temporary failure in name resolution"
    else (bwrap): success, return ::1
AF_UNSPEC:
    success, return 127.0.0.1 (even if AI_ADDRCONFIG is set)
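
A table like the one above can be produced with something along these
lines. This is a minimal sketch and not the attached getaddrinfo.c; the
function name try_combinations is made up, and the "no hints" (NULL
hints) case is left out for brevity.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: try each (family, flags) combination for NAME
 * and print either the error or the numeric addresses to OUT.
 * Returns the number of combinations tried. */
int
try_combinations (const char *name, FILE *out)
{
  static const struct { int family; const char *label; } families[] = {
    { AF_UNSPEC, "AF_UNSPEC" },
    { AF_INET, "AF_INET" },
    { AF_INET6, "AF_INET6" },
  };
  static const struct { int flags; const char *label; } flag_sets[] = {
    { 0, "no flags" },
    { AI_ADDRCONFIG, "AI_ADDRCONFIG" },
  };
  int tried = 0;

  for (size_t i = 0; i < sizeof families / sizeof families[0]; i++)
    for (size_t j = 0; j < sizeof flag_sets / sizeof flag_sets[0]; j++)
      {
        struct addrinfo hints;
        struct addrinfo *res = NULL;
        int ret;

        memset (&hints, 0, sizeof hints);
        hints.ai_family = families[i].family;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_protocol = IPPROTO_TCP;
        hints.ai_flags = flag_sets[j].flags;

        ret = getaddrinfo (name, NULL, &hints, &res);
        tried++;

        if (ret != 0)
          {
            fprintf (out, "%s, %s: error %d (%s)\n", families[i].label,
                     flag_sets[j].label, ret, gai_strerror (ret));
            continue;
          }

        /* Print every address returned, in numeric form. */
        for (const struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next)
          {
            char host[NI_MAXHOST];

            if (getnameinfo (ai->ai_addr, ai->ai_addrlen, host, sizeof host,
                             NULL, 0, NI_NUMERICHOST) == 0)
              fprintf (out, "%s, %s: %s\n", families[i].label,
                       flag_sets[j].label, host);
          }

        freeaddrinfo (res);
      }

  return tried;
}
```

Calling try_combinations ("localhost", stdout) from main() and running
the result under the bwrap command above should show which combinations
fail in a loopback-only namespace.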

Things I don't understand here:

- Why does (AF_UNSPEC, AI_ADDRCONFIG) succeed? Its documentation suggests
  that it would fail the same way as AF_INET and AF_INET6.
  (This has been reported as a bug before, in #854301.)
- Why does (AF_INET6, not AI_ADDRCONFIG) fail in pbuilder? /etc/hosts lists
  both 127.0.0.1 and ::1 as addresses of localhost, so I would expect
  that to work.

The good news is that GLib 2.63.x should fix this, because GLib 2.63.x
implements
<https://tools.ietf.org/html/draft-ietf-dnsop-let-localhost-be-localhost-02>
and hard-codes "localhost" to resolve to 127.0.0.1 and/or ::1 (depending
on the requested address family).

However, I think it's likely to be a recurring problem that unit tests
for network software try to connect to "localhost", use AI_ADDRCONFIG
because it is usually the right thing to do for Internet names, and find
that they cannot resolve that name - particularly if glibc changes its
behaviour to match its documentation (fixing #854301).

Possible solutions:

- In pbuilder's network namespace, assign a useless non-127.0.0.1
  address (perhaps 127.0.0.2) so that AI_ADDRCONFIG thinks we have
  basic IPv4 connectivity and will resolve localhost to 127.0.0.1
- Implement "let localhost be localhost" in either glibc, or everything
  that does name resolution, or both
  (e.g.
  <https://gitlab.gnome.org/GNOME/glib/-/merge_requests/616> in GIO,
  also implemented in Firefox and Chromium)
- Implement a special case that disables AI_ADDRCONFIG when looking up
  localhost in either glibc, or everything that does name resolution,
  or both
  (Mozilla did this, and Firefox still does:
  <https://hg.mozilla.org/releases/mozilla-1.9.2/rev/c5d74bcd7421>
  <https://sources.debian.org/src/firefox-esr/68.5.0esr-1/nsprpub/pr/src/misc/prnetdb.c/?hl=2037#L2037>)
- Make tests that require resolving localhost skip themselves if it
  doesn't resolve. I think this is potentially undesirable because if
  sbuild starts to do the same no-network trick as pbuilder, it would
  effectively reduce our test coverage from every architecture down to
  the 2 architectures where we have autopkgtest (amd64 and arm64).
- Don't test anything involving name resolution (even of localhost) at
  build-time, only in autopkgtest. I think this is undesirable because,
  again, it would reduce our test coverage from every architecture down
  to 2 architectures (amd64 and arm64).
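
As a rough sketch of the "disable AI_ADDRCONFIG for localhost" option,
a resolver could wrap getaddrinfo like this. It is not the NSPR or GLib
code; both function names are made up, and treating names ending in
".localhost" the same as "localhost" is my assumption, borrowed from
the let-localhost-be-localhost draft.

```c
#include <netdb.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: true if NAME is "localhost" or ends in
 * ".localhost", compared case-insensitively. One trailing dot is
 * ignored. */
bool
is_localhost_name (const char *name)
{
  static const char suffix[] = "localhost";
  size_t suffix_len = sizeof suffix - 1;
  size_t len = strlen (name);

  if (len > 0 && name[len - 1] == '.')
    len--;

  if (len < suffix_len)
    return false;

  if (strncasecmp (name + len - suffix_len, suffix, suffix_len) != 0)
    return false;

  /* Either the whole name matched, or the match follows a label
   * separator (so "notlocalhost" is rejected). */
  return len == suffix_len || name[len - suffix_len - 1] == '.';
}

/* Hypothetical wrapper: same interface as getaddrinfo(), but
 * AI_ADDRCONFIG is cleared when looking up a localhost name.
 * The caller's hints are never modified. */
int
getaddrinfo_localhost_ok (const char *name, const char *service,
                          const struct addrinfo *hints,
                          struct addrinfo **res)
{
  struct addrinfo local_hints;

  if (name != NULL && hints != NULL
      && (hints->ai_flags & AI_ADDRCONFIG) != 0
      && is_localhost_name (name))
    {
      local_hints = *hints;
      local_hints.ai_flags &= ~AI_ADDRCONFIG;
      hints = &local_hints;
    }

  return getaddrinfo (name, service, hints, res);
}
```

Doing this in each resolver only helps the callers of that resolver,
which is why it would have to be repeated in everything that does name
resolution unless glibc does it.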

See also:
- https://sourceware.org/bugzilla/show_bug.cgi?id=12377
- https://github.com/zeromq/libzmq/issues/42
- https://fedoraproject.org/wiki/QA/Networking/NameResolution/ADDRCONFIG

    smcv
-------------- next part --------------
A non-text attachment was scrubbed...
Name: getaddrinfo.c
Type: text/x-csrc
Size: 4379 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-gnome-maintainers/attachments/20200226/dc88834a/attachment.c>

