[Reproducible-builds] Bug#808207: diffoscope: Filter objdump --disassemble output before diffing it

Mike Hommey mh+reportbug at glandium.org
Thu Dec 17 10:18:36 UTC 2015


Source: diffoscope
Version: 43
Severity: wishlist


When comparing large ELF binaries, a flood of minor differences can end up
drowning out the more important ones.

Specifically, objdump --disassemble displays symbols+offsets for addresses
it derives from IP-relative addressing, like the following:

   9d2be2:     48 8d 05 42 65 24 02    lea    0x2246542(%rip),%rax        # 2c1912b <_fini@@xul45a1+0x1d803>

In the particular case I'm looking at, though, some function ends up shifting
the rest of the .text section, so that the _fini symbol (and many others,
actually) moves.

So I end up with a *lot* of differences like:

<   9d2be2:     48 8d 05 42 65 24 02    lea    0x2246542(%rip),%rax        # 2c1912b <_fini@@xul45a1+0x1d803>
---
>   9d2be2:     48 8d 05 42 65 24 02    lea    0x2246542(%rip),%rax        # 2c1912b <_fini@@xul45a1+0x1d7e3>
(note: this is a diff I got manually, because it's easier to visualize than a
copy/paste of the HTML output I got from diffoscope)

The code is the same and the address is the same, but the pseudo-symbol doesn't
match. That mismatch doesn't matter, because the address actually points to
some place in .rodata, and .rodata hasn't moved; only _fini and some earlier
symbols have.

In another case, the symbol between angle brackets is an actual symbol (on
non-stripped binaries) but the symbol name is different because GCC decided
to use a different suffix[1]. For example:

<   9d2f35:     48 8d 05 d1 5b 33 02    lea    0x2335bd1(%rip),%rax        # 2d08b0d <__FUNCTION__.10544+0x29d>
---
>   9d2f35:     48 8d 05 d1 5b 33 02    lea    0x2335bd1(%rip),%rax        # 2d08b0d <__FUNCTION__.10547+0x29d>

The difference might seem interesting to note, but in fact it's not, because it
will already appear in the `readelf --all` diff:

<  17956: 0000000002d08870    21 OBJECT  LOCAL  DEFAULT   16 __FUNCTION__.10544
---
>  17956: 0000000002d08870    21 OBJECT  LOCAL  DEFAULT   16 __FUNCTION__.10547

Anyway, those symbols between angle brackets just add noise that would be
better left out. I'm not sure, though, that objdump has an option to stop it
from displaying those symbols (a quick glance at the binutils source suggests
it doesn't). I can only suggest sending the output of objdump through sed :-/

Something like (awful):

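@tool_required('objdump')
@tool_required('sed')
def cmdline(self):
    # Pass the path as "$1" instead of interpolating it into the shell string,
    # so that paths containing quotes or "$" can't break the command.
    return ['sh', '-c', 'objdump --disassemble --full-contents "$1" | sed "s/<.*>//"', 'sh', self.path]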
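
The same filtering could also be done on the Python side instead of through
sed. What follows is only a minimal sketch, not diffoscope code: the names
strip_symbol_annotations and _SYMBOL_ANNOTATION are hypothetical, and it
assumes the annotations never contain a '>' themselves (which should hold for
non-demangled output).

```python
import re

# Matches a "<symbol+0xoffset>" annotation together with the whitespace
# in front of it. Assumption: the annotation itself contains no '>'.
_SYMBOL_ANNOTATION = re.compile(r"\s*<[^>]*>")


def strip_symbol_annotations(line):
    """Return an objdump output line with its <...> annotations removed."""
    return _SYMBOL_ANNOTATION.sub("", line).rstrip()


# The two lines from the first diff above, differing only in the offset
# from _fini.
old = ("  9d2be2:     48 8d 05 42 65 24 02    lea    0x2246542(%rip),%rax"
       "        # 2c1912b <_fini@@xul45a1+0x1d803>")
new = ("  9d2be2:     48 8d 05 42 65 24 02    lea    0x2246542(%rip),%rax"
       "        # 2c1912b <_fini@@xul45a1+0x1d7e3>")

# Once the annotation is stripped, the lines no longer differ.
print(strip_symbol_annotations(old) == strip_symbol_annotations(new))
```

Like the sed expression, this also strips the name from label lines such as
"<func>:", so whether that is acceptable would need to be decided separately.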


Mike



[1] Example of how this can happen:

    $ cat > test.c <<EOF
    enum A {
      FOO, 
    #ifdef WITH_BAR
      BAR, 
    #endif
    };
    void foo() {
      static int a = 0;
    } 
    EOF
    $ gcc -o - -S test.c | grep local
           .local  a.1834
    $ gcc -DWITH_BAR -o - -S test.c | grep local
           .local  a.1835


-- System Information:
Debian Release: stretch/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.2.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=ja_JP.UTF-8, LC_CTYPE=ja_JP.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)


