Bug#1030223: gobject-introspection: make cross-compilation possible

Thu Jan 11 13:15:50 GMT 2024

Hi Simon,

On Thu, Jan 11, 2024 at 12:08:53PM +0000, Simon McVittie wrote:
> On Thu, 04 Jan 2024 at 09:54:52 +0100, on #1059929, Helmut Grohne wrote:
> > On Wed, Jan 03, 2024 at 07:22:26PM +0000, Simon McVittie wrote:
> > > Or do I need to [...]
> > > replace the gobject-introspection-bin | qemu-user | qemu-user-static
> > > dependency by python3 | qemu-user | qemu-user-static or similar?
> > 
> > I am not sure that you are the one who should express a qemu dependency.
> 
> Part of how g-ir-compiler works is that it generates and compiles a
> "dumper" for the host architecture, links it to the library we are
> introspecting (let's say libflatpak), runs it, and parses its output.
> This is the "introspection" part of the gobject-introspection name.
> 
> The ${GNU_TYPE}-g-ir-compiler wrapper script (which happens to be written
> in Python, the same as the upstream g-ir-compiler) explicitly tells
> g-ir-compiler to run the "dumper" binary under qemu-user if it detects
> that the Python architecture is not one that can run the host architecture.

Do I understand correctly that cross building to i386 on amd64 would
cause this wrapper to run the i386 binary in qemu?

> It is not particularly straightforward for the package that is currently
> being built to set this up, particularly if we want to do that without
> changing its upstream source code (which I think we do, because changing
> upstream source for this would scale very poorly). The one thing that
> we can straightforwardly do across multiple build systems (Autotools
> and Meson tested, CMake probably also OK) is to substitute a different
> executable to be used instead of g-ir-compiler, and the executable I'm
> substituting in this case is ${GNU_TYPE}-g-ir-compiler.

I agree with the approach taken, but I think g-ir-compiler could be more
clever. Rather than assume that the host architecture is not runnable
when it differs from the build architecture, could it detect that? A
simple way would be invoking arch-test ${DEB_HOST_ARCH}, but it can as
well compile and run a trivial program (as autoconf does all the time). If
that happens to not run, it can still prepend qemu. That's not the part
I'm objecting to.
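
A minimal sketch of the "compile and run a trivial program" variant, in
Python since the wrapper is Python. The function name is made up, and it
assumes a trivial host-architecture program has already been built with
the cross compiler; running arch-test would serve equally well as the
probe.

```python
import subprocess


def host_is_runnable(probe_argv):
    """Return True if the probe command executes here and exits 0.

    probe_argv is an argv list, e.g. the path of a trivial program built
    for the host architecture (hypothetical, not part of the wrapper).
    """
    try:
        result = subprocess.run(
            probe_argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,  # arbitrary limit so a stuck probe cannot hang the build
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers "Exec format error", a missing file and
        # missing execute permission: treat all of them as not runnable.
        return False
    return result.returncode == 0
```

If this returns False, the wrapper can still prepend qemu as it does now.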

I object to qemu being a hard dependency. I think there are roughly
three ways to make this work and I'd prefer to leave more of this
flexibility to builders:
 a. The host architecture is directly runnable on the CPU.
    Examples: native builds, amd64 -> i386, and often arm64 -> armhf
 b. The build system has qemu-user-static installed outside the build
    chroot.
 c. The chroot contains qemu-user and this needs to be run explicitly.

Making this work deviates from your current setup in two ways:
 * Work in the absence of a qemu binary when the host arch is runnable.
 * Make the qemu dependency optional somehow.

> I'm using the Python architecture as an approximation of the build
> architecture, on the basis that, if we have already successfully started
> a Python script, then we already know we can run binaries of the same
> architecture as the Python interpreter :-)

I agree this is a guess that likely does not misdetect a non-runnable
host architecture as runnable. It still produces misdetections of the
other kind: a runnable host architecture being treated as non-runnable.

> The dumper binary is really rather simple: it loads libraries, it
> initializes the GObject type system, and it does some very simple file
> I/O with the fopen()/fwrite() family. It doesn't need to do any elaborate
> computation, so performance is not a concern; and it doesn't need to call
> any complicated syscalls, unless the library we're introspecting makes
> those syscalls during class initialization (which would be weird, normally
> that would happen during instance initialization at the earliest).

I agree that performance is not a concern. Emulation bugs and
satisfiability are.

> At the moment, i686-linux-gnu-g-ir-compiler running on x86_64-linux-gnu
> Python optimistically always runs the dumper binary natively, without qemu
> - but it would not be a problem to change that so that it pessimistically
> always uses qemu, if you are concerned about corner-cases. As I said,
> performance isn't important here.

Is it actually hard to try both ways so you can do away with such corner
cases? Given that you try running first, it would work the same way for
native and cross.
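
Roughly what I mean by trying both ways, as a hedged sketch rather than a
patch. The function names are invented, the probe is assumed to be a
trivial host-architecture program, and qemu would be something like the
qemu-${DEB_HOST_ARCH} name your wrapper already passes to
--use-binary-wrapper.

```python
import shutil
import subprocess


def _runs(argv):
    """Return True if argv can be executed here and exits 0."""
    try:
        return subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        ).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def binary_wrapper_args(probe, qemu):
    """Pick the extra option needed to run host-architecture binaries.

    Try direct execution first (native, amd64 -> i386, binfmt set up),
    then the named qemu binary, and fail if neither works.
    """
    if _runs([probe]):
        return []  # directly runnable, no wrapper needed
    qemu_path = shutil.which(qemu)
    if qemu_path is not None and _runs([qemu_path, probe]):
        return ["--use-binary-wrapper=" + qemu_path]
    raise RuntimeError(
        "cannot run host binaries directly and %s does not work" % qemu
    )
```

Since the direct attempt comes first, native and cross builds go through
the same code path, and a missing qemu only matters when it is needed.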

> At the moment, the ${GNU_TYPE}-g-ir-compiler scripts pessimistically assume
> that there is no binfmt set up, and will always run qemu-user if it seems
> that it might be necessary. Again, if this means we run qemu a bit more
> often than we need to, I'm fine with that.

How about trying instead of assuming? I seem to repeat myself.

> In practice non-Linux architectures don't have qemu-user, so the practical
> result is that you can build natively on any architecture, or you can
> cross-compile for Linux on any other Linux of the same endianness
> (endianness must match because of tools limitations).

Expected.

> The problem with tests is that they test real functionality, so they are
> hard mode for qemu-user - they must exercise arbitrary syscalls, they're
> often timing-sensitive, and so on. The dumper program used by g-ir-compiler
> is much, much simpler - it's not that far beyond a "hello world", so I
> would expect any vaguely functional qemu-user architecture to be sufficient.

Right, but you cannot depend on that anyway. qemu-user is marked
M-A:foreign, but it really doesn't supply emulation for all Debian
architectures, so it is a gamble anyway. I agree that we should default
to using qemu, but there also should be a way for a builder to say "just
run it directly, I know it's going to work".
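
One possible spelling of that knob, purely as an illustration: reuse the
nocrossemu build profile from my earlier mail. That profile does not
exist today and the function name is made up; only DEB_BUILD_PROFILES
being a space-separated list is established practice.

```python
import os


def emulation_opted_out(environ=None):
    """Return True if the builder asked to skip emulation.

    Looks for the proposed (not yet existing) "nocrossemu" profile in
    DEB_BUILD_PROFILES, which holds a space-separated list of profiles.
    """
    if environ is None:
        environ = os.environ
    profiles = environ.get("DEB_BUILD_PROFILES", "").split()
    return "nocrossemu" in profiles
```

With this, the wrapper would run the dumper directly whenever the
function returns True and keep its qemu default otherwise.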

> The only ways I can see to cross-build GIR data are to use some sort of
> pregenerated/cached information in the source package (which is what I
> suggested a while ago, but you didn't like that), or to use an interpreter
> like qemu-user to run the dumper.

I'm not asking for this kind of complexity. I think it is perfectly fine
to fail if neither direct invocation nor qemu (e.g. since it wasn't
installed) works.

> I would tend to think that qemu dependencies in Build-Depends are
> appropriate if and only if it's the source package that is making the
> choice to invoke qemu. When it's an implementation detail of the
> (cross-capable wrapper around) g-ir-compiler, it seems more appropriate
> to put the dependency in the same place as the implementation.

The argument is reasonable. Your way of looking at it also lowers
maintenance cost as we don't have to modify tons of B-D. I am wondering
about a middle-ground of having a package can-run-arch being M-A:same
and having a maintainer script that validates the property. Then you
could Depends: qemu-user | can-run-arch  (expressing the preference for
qemu-user) while any builder could still --add-depends=can-run-arch to
opt out of qemu.

> Or if you think that "fallback plan" would actually be better anyway,
> I could do that, even though it isn't strictly needed?

I'm happy with either choice. In other cases we went for M-A:foreign. I
wonder though whether we should make such "do not depend" rules
explicitly checkable somehow. There is more of that in the archive.

> >  * We can annotate such qemu dependencies with a build profile e.g.
> >    <cross !nocrossemu>. By default, such dependencies would only become
> >    active for cross builds, but the second profile enables you to skip
> >    them when you know that no emulator is required.
> 
> But then how does ${GNU_TYPE}-g-ir-compiler (which is generated from
> debian/cross-g-ir-tool.in in the gobject-introspection/experimental
> source) know whether it needs to invoke the upstream g-ir-compiler with
> or without passing --use-binary-wrapper=qemu-${DEB_HOST_ARCH}?

I'm hoping that it can simply detect that without having to know.

> > Other than this, let me note that M-A:allowed always seemed a little
> > annoying to me, because it makes an implementation detail visible to
> > consumers. Whenever you think you need M-A:allowed, you may instead
> > introduce a layer of indirection. In principle, you could add a real
> > binary packages: gobject-introspection-any-endian with Arch:any
> > M-A:foreign Depends:gobject-introspection-bin and architecture-dependent
> > provides. Then you can just depend on
> > gobject-introspection-little-endian without thinking about whether you
> > have to add :any.
> 
> I'd prefer not to have another trip through NEW and another binary package
> name in the Packages file just to avoid M-A:allowed.

You also indicated that gobject-introspection-bin should not be
depended upon, so my argument fully becomes moot: Since no one should
depend on this, no implementation detail becomes visible to reasonable
consumers.

> Again, if you think it would be better for g-i-bin to be M-A:foreign,
> I can do that, and that would avoid M-A:allowed without going through NEW.

As long as it is not meant to show up in depends of other packages, it
does not matter.

So I think we mostly agree already. From my pov, the remaining questions
are:
 * Can g-ir-scanner detect whether it needs qemu rather than assume?
 * Can we provide a reasonable opt-out of qemu on the dependency level?

Helmut