Bug#1030223: gobject-introspection: make cross-compilation possible

Thu Jan 11 12:08:53 GMT 2024

Control: retitle -1 gobject-introspection: make cross-compilation possible

Moving discussion of the finer points of gobject-introspection
cross-compilation from release team bug #1059929 to g-i bug #1030223
since the release team probably don't want this on their list, and
retitling the g-i bug to be more general.

On Thu, 04 Jan 2024 at 09:54:52 +0100, on #1059929, Helmut Grohne wrote:
> On Wed, Jan 03, 2024 at 07:22:26PM +0000, Simon McVittie wrote:
> > Or do I need to [...]
> > replace the gobject-introspection-bin | qemu-user | qemu-user-static
> > dependency by python3 | qemu-user | qemu-user-static or similar?
> 
> I am not sure that you are the one who should express a qemu dependency.

Part of how g-ir-scanner works is that it generates and compiles a
"dumper" for the host architecture, links it to the library we are
introspecting (let's say libflatpak), runs it, and parses its output.
This is the "introspection" part of the gobject-introspection name.

The ${GNU_TYPE}-g-ir-scanner wrapper script (which happens to be written
in Python, the same as the upstream g-ir-scanner) explicitly tells
g-ir-scanner to run the "dumper" binary under qemu-user if it detects
that the architecture of the Python interpreter is not one that can run
host-architecture binaries.

It is not particularly straightforward for the package that is currently
being built to set this up, particularly if we want to do that without
changing its upstream source code (which I think we do, because changing
upstream source for this would scale very poorly). The one thing that
we can straightforwardly do across multiple build systems (Autotools
and Meson tested, CMake probably also OK) is to substitute a different
executable to be used instead of g-ir-scanner, and the executable I'm
substituting in this case is ${GNU_TYPE}-g-ir-scanner.

I'm using the Python architecture as an approximation of the build
architecture, on the basis that, if we have already successfully started
a Python script, then we already know we can run binaries of the same
architecture as the Python interpreter :-)
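
As a rough sketch of that approximation (this is not the actual content
of the wrapper script; the function names are hypothetical, and reading
MULTIARCH from sysconfig is an assumption about how the interpreter's
architecture could be discovered):

```python
import re
import sysconfig


def normalise(gnu_tuple):
    """Fold i486/i586/i686 into i386, so that Debian's i386 multiarch
    tuple (i386-linux-gnu) compares equal to its GNU type
    (i686-linux-gnu)."""
    return re.sub(r"^i[3-6]86-", "i386-", gnu_tuple)


def python_tuple():
    """Tuple the running interpreter was built for, or None if unknown.

    Assumption: MULTIARCH is set, as it is on Debian's Python builds
    (for example "x86_64-linux-gnu"). Other builds may leave it empty.
    """
    return sysconfig.get_config_var("MULTIARCH") or None


def is_cross(host_gnu_type, build_tuple=None):
    """True if host binaries differ from the interpreter's architecture.

    build_tuple defaults to the interpreter's own tuple: if this script
    is running at all, binaries of that architecture can be executed.
    """
    if build_tuple is None:
        build_tuple = python_tuple()
    if build_tuple is None:
        # Unknown interpreter architecture: be pessimistic.
        return True
    return normalise(host_gnu_type) != normalise(build_tuple)
```

For example, is_cross("aarch64-linux-gnu", "x86_64-linux-gnu") is true,
while is_cross("i686-linux-gnu", "i386-linux-gnu") is false.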

The dumper binary is really rather simple: it loads libraries, it
initializes the GObject type system, and it does some very simple file
I/O with the fopen()/fwrite() family. It doesn't need to do any elaborate
computation, so performance is not a concern; and it doesn't need to call
any complicated syscalls, unless the library we're introspecting makes
those syscalls during class initialization (which would be weird, normally
that would happen during instance initialization at the earliest).

> When we reason about dependencies, we care about how they behave
> assuming that you can run them. Whether you can run an executable from a
> package or not is something that is not expressed in our package
> relationships. It's also rather difficult. Consider a few corner cases:
> 
>  * Some amd64 can run i386.

To the best of my knowledge, all amd64 can run i386? I suppose 32-bit
compat syscalls could conceivably have been disabled at kernel level,
although I don't know why you'd do that and then compile i386 software.

At the moment, i686-linux-gnu-g-ir-scanner running on x86_64-linux-gnu
Python optimistically always runs the dumper binary natively, without qemu
- but it would not be a problem to change that so that it pessimistically
always uses qemu, if you are concerned about corner-cases. As I said,
performance isn't important here.

>  * Most arm64, but not all, can run armhf.

At the moment, arm-linux-gnueabihf-g-ir-scanner pessimistically assumes
that nothing can run armhf, except for armhf itself. If this means we run
qemu a bit more often than we need to, that's fine: it's unlikely to be a
performance or functionality bottleneck.

>  * You may operate in a chroot with some external qemu-binfmt and thus
>    execute any arch.

At the moment, the ${GNU_TYPE}-g-ir-scanner scripts pessimistically assume
that there is no binfmt set up, and will always run qemu-user if it seems
that it might be necessary. Again, if this means we run qemu a bit more
often than we need to, I'm fine with that.
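
To summarize the current policy for those three cases, here is a minimal
sketch (hypothetical function names, not the real wrapper code; the
emulator name is supplied by the caller rather than derived here):

```python
def needs_emulator(python_cpu, host_cpu):
    """Decide whether the dumper should be run under qemu-user."""
    if python_cpu == host_cpu:
        return False
    # Optimistic case: x86_64 is assumed to be able to run i686.
    if (python_cpu, host_cpu) == ("x86_64", "i686"):
        return False
    # Pessimistic for everything else, including arm on aarch64.
    # Any binfmt_misc setup is deliberately ignored.
    return True


def dumper_argv(python_cpu, host_cpu, dumper, emulator):
    """Command line used to run the dumper binary."""
    if needs_emulator(python_cpu, host_cpu):
        return [emulator, dumper]
    return [dumper]
```

Making the i686-on-x86_64 case pessimistic as well would just mean
deleting the second branch.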

>  * You cannot run hurd-i386 on amd64 even in the presence of qemu-user.

That's a good point, I'll tighten up the dependency so that
gobject-introspection:hurd-i386 (or more generally, non-Linux) requires a
gobject-introspection-bin (and therefore Python) from the matching OS.

In practice non-Linux architectures don't have qemu-user, so the practical
result is that you can build natively on any architecture, or you can
cross-compile for Linux on any other Linux of the same endianness
(endianness must match because of limitations in the tools).
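
Expressed as a sketch, with a small illustrative table of architectures
(the table and the function name are hypothetical, and only a handful of
architectures are listed):

```python
# Debian architecture -> (kernel, endianness). Illustrative subset only.
ARCHES = {
    "amd64": ("linux", "little"),
    "arm64": ("linux", "little"),
    "s390x": ("linux", "big"),
    "hurd-i386": ("hurd", "little"),
}


def can_build(build_arch, host_arch):
    """True if GIR data for host_arch can be produced on build_arch."""
    if build_arch == host_arch:
        # Native builds work on any architecture.
        return True
    build_kernel, build_endian = ARCHES[build_arch]
    host_kernel, host_endian = ARCHES[host_arch]
    # Cross builds need qemu-user, which in practice means Linux on both
    # sides, and the tools need matching endianness.
    return (
        build_kernel == "linux"
        and host_kernel == "linux"
        and build_endian == host_endian
    )
```

So amd64 to arm64 works, while amd64 to s390x and amd64 to hurd-i386 do
not, and hurd-i386 can still build natively.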

> When we considered whether cross building should imply disabling tests,
> we went for "no, but yes by default". When you cross build a package for
> i386 on amd64, sbuild and pbuilder will automatically add nocheck to
> DEB_BUILD_OPTIONS and DEB_BUILD_PROFILES. However, you can opt out of
> this behaviour to really run tests despite performing a cross build. I
> think we need a similar mechanism for qemu integration.

The problem with tests is that they test real functionality, so they are
hard mode for qemu-user - they must exercise arbitrary syscalls, they're
often timing-sensitive, and so on. The dumper program used by g-ir-scanner
is much, much simpler - it's not that far beyond a "hello world", so I
would expect any vaguely functional qemu-user architecture to be sufficient.

Also, running tests is not functionally required, but
gobject-introspection is: the GIR XML and typelib are part of the build
products for a library or program. We can turn them off if we have to,
and some libraries and library users will still have useful functionality
without them (like flatpak, which is primarily C code), but some packages
are entirely useless without gobject-introspection data (like gnome-shell,
which is primarily JavaScript).

For packages like flatpak, we have three choices: we can not cross-build
at all (status quo during Debian 12), we can cross-build but exclude GIR
stuff (nogir build profile), or we can cross-build fully. For packages
like gnome-shell, cross-building but excluding GIR is pointless, so our
only options are to not cross-build it at all, or cross-build it fully.

The only ways I can see to cross-build GIR data are to use some sort of
pregenerated/cached information in the source package (which is what I
suggested a while ago, but you didn't like that), or to use an interpreter
like qemu-user to run the dumper.

> When we talked about this, I was having in mind (but probably didn't
> express this explicitly) that such qemu dependencies would happen in
> Build-Depends only.

I would tend to think that qemu dependencies in Build-Depends are
appropriate if and only if it's the source package that is making the
choice to invoke qemu. When it's an implementation detail of the
(cross-capable wrapper around) g-ir-scanner, it seems more appropriate
to put the dependency in the same place as the implementation.

>  * Your satisfiability problem with britney2 probably goes away.

If I'm reading britney2 git history correctly, the satisfiability problem
is now fixed: as of the latest pseudo-excuses update,
gobject-introspection (1.78.1-6 to 1.78.1-9) is only waiting for
autopkgtests to run.

If that doesn't work, then the fallback plan is to swap g-i-bin to
M-A: foreign, and drop the :any from the dependency on its virtual package
name. I was only really using M-A: allowed as an extra safety-catch
against dependent packages depending directly on g-i-bin (which its
Description already asks you not to do) and getting an unexpected
architecture combination, so I think using M-A: foreign would be
functionally OK.

Or if you think that "fallback plan" would actually be better anyway,
I could do that, even though it isn't strictly needed?

>  * We can annotate such qemu dependencies with a build profile e.g.
>    <cross !nocrossemu>. By default, such dependencies would only become
>    active for cross builds, but the second profile enables you to skip
>    them when you know that no emulator is required.

But then how does ${GNU_TYPE}-g-ir-scanner (which is generated from
debian/cross-g-ir-tool.in in the gobject-introspection/experimental
source) know whether it needs to invoke the upstream g-ir-scanner with
or without passing --use-binary-wrapper=qemu-${DEB_HOST_ARCH}?

> Other than this, let me note that M-A:allowed always seemed a little
> annoying to me, because it makes an implementation detail visible to
> consumers. Whenever you think you need M-A:allowed, you may instead
> introduce a layer of indirection. In principle, you could add a real
> binary packages: gobject-introspection-any-endian with Arch:any
> M-A:foreign Depends:gobject-introspection-bin and architecture-dependent
> provides. Then you can just depend on
> gobject-introspection-little-endian without thinking about whether you
> have to add :any.

I'd prefer not to have another trip through NEW and another binary package
name in the Packages file just to avoid M-A:allowed.

Again, if you think it would be better for g-i-bin to be M-A:foreign,
I can do that, and that would avoid M-A:allowed without going through NEW.

> Let me also note that the way you have gobject-introspection (the binary
> package) now fills a similar role to pkgconf/pkg-config and qt5-qmake as
> well as binutils-for-host and hopefully soon also gcc-for-host. That
> pattern seems to work out rather well.

Yes, the impression I got from our discussion at the minidebconf is that
the pattern used by pkgconf is considered to be a good one, so I was
intentionally using a similar pattern.

    smcv