Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2

Ximin Luo infinity0 at debian.org
Tue May 30 14:21:00 UTC 2017


Ximin Luo:
> Ximin Luo:
>> Package: diffoscope
>> Version: 78
>> Severity: normal
>>
>> Dear Maintainer,
>>
>> diff(1) first reads the contents of one file then the next one:
>>
>> https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552
>>
>> This means that if the "files" are actually FIFOs connected to the output of a
>> process, as they are in many cases in diffoscope, the second process has to wait
>> for diff(1) to fully read the output of the first process, before it itself can
>> run. This prevents both processes from running in parallel.
>>
>> An appropriate fix would be to store the output of at least one of the commands
>> into a temporary file, and have diff(1) read from this instead. This has to be
>> done carefully however, to make sure that diff(1) doesn't accidentally read it
>> before the process is finished.
>>
>> [..]
> 
> It seems readelf specifically has weird performance behaviours when running in parallel.
> 
> [..]

I couldn't reproduce the above results on Holger's profitbricks machine, and bunk@ couldn't reproduce them either. That is, running the commands in parallel *did* produce roughly a 2x speed-up there.

Also on my local machine I got:

    $ ls -laSr /usr/bin/{hot,hokey,darcs}
    -rwxr-xr-x 1 root root 20555008 Oct 28  2016 /usr/bin/hot*
    -rwxr-xr-x 1 root root 29637664 Oct 28  2016 /usr/bin/hokey*
    -rwxr-xr-x 1 root root 37144392 Oct 28  2016 /usr/bin/darcs*

    $ f() { taskset --cpu-list $1 objdump -S /usr/bin/hot >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

    real	0m12.445s
    user	0m12.408s
    sys	0m0.024s

    real	0m7.653s
    user	0m15.224s
    sys	0m0.040s

    $ f() { taskset --cpu-list $1 objdump -S /usr/bin/hokey >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

    real	0m24.998s
    user	0m24.896s
    sys	0m0.064s

    real	0m21.197s
    user	0m42.224s
    sys	0m0.076s

    $ f() { taskset --cpu-list $1 objdump -S /usr/bin/darcs >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

    real	0m38.652s
    user	0m38.532s
    sys	0m0.064s

    real	0m34.323s
    user	1m8.168s
    sys	0m0.104s

i.e. the speed-up from parallelism decreases as the size of the input increases, but I couldn't reproduce this on the profitbricks machine either.
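
To put numbers on the timings above: dividing the sequential "real" time by the parallel "real" time gives the speed-up for each binary. The helper name `speedup` below is just made up for illustration. Note that the "user" time also grows in the parallel runs (e.g. 24.896s to 42.224s for hokey), which suggests each objdump process itself runs slower when the other one is active, rather than merely waiting.

```shell
# speedup SEQ PAR: ratio of sequential to parallel wall-clock time,
# both given in seconds. LC_ALL=C keeps "." as the decimal separator.
speedup() {
    LC_ALL=C awk -v s="$1" -v p="$2" 'BEGIN { printf "%.2f\n", s / p }'
}

speedup 12.445 7.653    # hot:   1.63
speedup 24.998 21.197   # hokey: 1.18
speedup 38.652 34.323   # darcs: 1.13
```

A perfect two-way split would give 2.00, which is roughly what the profitbricks machine showed.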

Due to the lack of debugging symbols for binutils (#863728) it's hard for me to investigate this further, so I'll pause this for now.

It's probably worth un-reverting e28b540b0b289ce9fda70095160382799d7602a6, perhaps guarded by a CLI flag. However, diffoscope's heavy use of Python-based filtering of external commands' output makes this less significant, unless we also optimise how that filtering is done. In the meantime I'm also using "--exclude-command '^readelf.*\s--debug-dump=info'" to avoid the longest part of ELF processing.
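
For reference, the temporary-file idea from the original report can be sketched in plain shell roughly as follows. This is only an illustration of the general approach, not what the commit above does and not diffoscope code: `pdiff` is a made-up name, and for simplicity it buffers *both* outputs rather than just one. diff(1) is only started after both commands have finished, so it can never read a half-written file.

```shell
# Hypothetical helper: run two shell commands concurrently, buffer each
# output in a temporary file, then compare the files with diff(1).
pdiff() {
    out1=$(mktemp) || return 2
    out2=$(mktemp) || { rm -f "$out1"; return 2; }
    sh -c "$1" >"$out1" &     # first command runs in the background
    pid=$!
    sh -c "$2" >"$out2"       # second command runs in the foreground
    wait "$pid"               # past this point both outputs are complete
    diff -u "$out1" "$out2"
    rc=$?                     # keep diff's exit status across the cleanup
    rm -f "$out1" "$out2"
    return "$rc"
}
```

Usage would be something like `pdiff 'objdump -S a.out' 'objdump -S b.out'`; the exit status is that of diff(1), i.e. 0 if the outputs match and 1 if they differ.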

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


