[Reproducible-builds] Bug#808207: diffoscope: Filter objdump --disassemble output before diffing it
Mike Hommey
mh+reportbug at glandium.org
Thu Dec 17 10:18:36 UTC 2015
Source: diffoscope
Version: 43
Severity: wishlist
When comparing large ELF binaries, some minor differences can end up hurting
the visibility of more important differences.
Specifically, objdump --disassemble displays symbols+offsets for addresses
it derives from IP-relative addressing, like the following:
9d2be2: 48 8d 05 42 65 24 02 lea 0x2246542(%rip),%rax # 2c1912b <_fini@@xul45a1+0x1d803>
In the particular case I'm looking at, though, some function ends up shifting
the rest of the .text section, so the _fini symbol (and many others,
actually) moves.
So I end up with a *lot* of differences like:
< 9d2be2: 48 8d 05 42 65 24 02 lea 0x2246542(%rip),%rax # 2c1912b <_fini@@xul45a1+0x1d803>
---
> 9d2be2: 48 8d 05 42 65 24 02 lea 0x2246542(%rip),%rax # 2c1912b <_fini@@xul45a1+0x1d7e3>
(note: this is a diff I got manually, because it's easier to visualize than a
copy/paste of the HTML output I got from diffoscope)
The code is the same and the address is the same, but the pseudo-symbol doesn't
match. That mismatch doesn't matter, because the address actually points to some
place in .rodata, and .rodata hasn't moved; only _fini and some earlier symbols
have.
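As a quick sanity check of that claim, the arithmetic can be redone in Python
using only the numbers from the two lines above. The 7-byte instruction length
is read off the encoding shown (48 8d 05 42 65 24 02); the variable names are
just for illustration.

```python
# Values copied from the objdump lines quoted above.
insn_addr = 0x9d2be2   # address of the lea instruction
insn_len = 7           # bytes in "48 8d 05 42 65 24 02"
disp = 0x2246542       # %rip-relative displacement

# %rip points at the next instruction, so the target is identical in both builds.
target = insn_addr + insn_len + disp
assert target == 0x2c1912b

# objdump prints the target as <symbol+offset>, so the symbol's own address
# is the target minus the printed offset.
fini_first = target - 0x1d803    # _fini in the first binary
fini_second = target - 0x1d7e3   # _fini in the second binary

print(hex(fini_first), hex(fini_second), hex(fini_second - fini_first))
```

In other words, the referenced address is unchanged; only _fini itself sits
0x20 bytes further along in the second binary.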
In another case, the symbol between angle brackets is an actual symbol (on
non-stripped binaries) but the symbol name is different because GCC decided
to use a different suffix[1]. For example:
< 9d2f35: 48 8d 05 d1 5b 33 02 lea 0x2335bd1(%rip),%rax # 2d08b0d <__FUNCTION__.10544+0x29d>
---
> 9d2f35: 48 8d 05 d1 5b 33 02 lea 0x2335bd1(%rip),%rax # 2d08b0d <__FUNCTION__.10547+0x29d>
The difference might seem interesting to note, but in fact it's not, because it
will already appear in the `readelf --all` diff:
< 17956: 0000000002d08870 21 OBJECT LOCAL DEFAULT 16 __FUNCTION__.10544
---
> 17956: 0000000002d08870 21 OBJECT LOCAL DEFAULT 16 __FUNCTION__.10547
Anyway, those symbols between angle brackets just add noise that would
be better left out. I'm not sure, though, that objdump has an option
to stop it from displaying those symbols (a quick glance at the
binutils source suggests it doesn't). I can only suggest sending the output
of objdump through sed :-/
Something like (awful):
    @tool_required('objdump')
    @tool_required('sed')
    def cmdline(self):
        return ['sh', '-c', 'objdump --disassemble --full-contents "%s" | sed "s/<.*>//"' % self.path]
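A possibly less awful variant would be to do the filtering in Python rather
than shelling out to sed. The following is only a sketch: the function name
strip_symbol_annotation is hypothetical and not part of diffoscope. It is
deliberately narrower than the sed expression above, which removes everything
between the first "<" and the last ">" on any line (function labels included),
whereas this only drops the bracketed part of a trailing "# address <symbol+offset>"
comment.

```python
import re

# Trailing "# <hex address> <symbol+offset>" comment that objdump appends to
# instructions using %rip-relative addressing. The address is kept; only the
# bracketed pseudo-symbol is dropped.
_ANNOTATION = re.compile(r'(#\s+[0-9a-f]+)\s+<.*>[ \t]*$')


def strip_symbol_annotation(line):
    """Return the line with the trailing <symbol+offset> removed."""
    return _ANNOTATION.sub(r'\1', line)


# The two lines from the first example in this report.
first = ('9d2be2: 48 8d 05 42 65 24 02 lea 0x2246542(%rip),%rax '
         '# 2c1912b <_fini@@xul45a1+0x1d803>')
second = ('9d2be2: 48 8d 05 42 65 24 02 lea 0x2246542(%rip),%rax '
          '# 2c1912b <_fini@@xul45a1+0x1d7e3>')

# After filtering, the two lines compare equal and still carry the address.
assert strip_symbol_annotation(first) == strip_symbol_annotation(second)
print(strip_symbol_annotation(first))
```

Lines without such a comment pass through unchanged, so the function could be
applied to every line of the objdump output before diffing.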
Mike
1. Example of how this can happen:
$ cat > test.c <<EOF
enum A {
  FOO,
#ifdef WITH_BAR
  BAR,
#endif
};
void foo() {
  static int a = 0;
}
EOF
$ gcc -o - -S test.c | grep local
        .local a.1834
$ gcc -DWITH_BAR -o - -S test.c | grep local
        .local a.1835
-- System Information:
Debian Release: stretch/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.2.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=ja_JP.UTF-8, LC_CTYPE=ja_JP.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)