Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2
Ximin Luo
infinity0 at debian.org
Tue May 30 14:21:00 UTC 2017
Ximin Luo:
> Ximin Luo:
>> Package: diffoscope
>> Version: 78
>> Severity: normal
>>
>> Dear Maintainer,
>>
>> diff(1) first reads the contents of one file then the next one:
>>
>> https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552
>>
>> This means that if the "files" are actually FIFOs connected to the output of a
>> process, as they are in many cases in diffoscope, the second process has to wait
>> for diff(1) to fully read the output of the first process, before it itself can
>> run. This prevents both processes from running in parallel.
>>
>> An appropriate fix would be to store the output of at least one of the commands
>> into a temporary file, and have diff(1) read from this instead. This has to be
>> done carefully however, to make sure that diff(1) doesn't accidentally read it
>> before the process is finished.
>>
>> [..]
>
> It seems readelf specifically has weird performance behaviours when running in parallel.
>
> [..]
I couldn't reproduce the above results on Holger's profitbricks machine, and bunk@ couldn't reproduce them either. That is, running the commands in parallel *did* produce roughly a 2x speedup.
Also on my local machine I got:
$ ls -laSr /usr/bin/{hot,hokey,darcs}
-rwxr-xr-x 1 root root 20555008 Oct 28 2016 /usr/bin/hot*
-rwxr-xr-x 1 root root 29637664 Oct 28 2016 /usr/bin/hokey*
-rwxr-xr-x 1 root root 37144392 Oct 28 2016 /usr/bin/darcs*
$ f() { taskset --cpu-list $1 objdump -S /usr/bin/hot >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )
real 0m12.445s
user 0m12.408s
sys 0m0.024s

real 0m7.653s
user 0m15.224s
sys 0m0.040s
$ f() { taskset --cpu-list $1 objdump -S /usr/bin/hokey >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )
real 0m24.998s
user 0m24.896s
sys 0m0.064s

real 0m21.197s
user 0m42.224s
sys 0m0.076s
$ f() { taskset --cpu-list $1 objdump -S /usr/bin/darcs >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )
real 0m38.652s
user 0m38.532s
sys 0m0.064s

real 0m34.323s
user 1m8.168s
sys 0m0.104s
That is, the speedup from parallelism decreases as the size of the input increases, but I couldn't reproduce this on the profitbricks machine either.
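To make the trend explicit, the ratio of sequential to parallel wall-clock time can be worked out from the "real" figures above. The snippet below is only an illustrative calculation; the dictionary layout and the `speedup` helper are names chosen for this sketch and do not come from diffoscope.

```python
# Wall-clock ("real") times in seconds, copied from the runs above:
# (sequential, parallel) for each binary passed to objdump -S.
timings = {
    "hot": (12.445, 7.653),
    "hokey": (24.998, 21.197),
    "darcs": (38.652, 34.323),
}


def speedup(sequential, parallel):
    """Return how many times faster the parallel run finished."""
    return sequential / parallel


for name, (seq, par) in timings.items():
    print(f"{name}: {speedup(seq, par):.2f}x")
```

This gives roughly 1.63x for hot, 1.18x for hokey and 1.13x for darcs, against an ideal of 2x for two independent processes on separate CPUs.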
Due to the lack of debugging symbols for binutils (#863728) it's hard for me to investigate this further, so I'll pause this for now.
It's probably worth un-reverting e28b540b0b289ce9fda70095160382799d7602a6, perhaps guarded by a CLI flag. However, diffoscope's heavy use of Python-based filtering of external commands' output makes this less significant, unless we also optimise how this filtering is done. In the meantime I'm also using "--exclude-command '^readelf.*\s--debug-dump=info'" to avoid the longest part of ELF processing.
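For reference, the fix proposed in the original report (writing command output to a temporary file so that diff(1) only reads it once the process has finished) could look roughly like the sketch below. This is not diffoscope's code and not necessarily what the commit above does. The function names `spool_in_parallel` and `diff_commands` are hypothetical, and for simplicity the sketch spools both commands to temporary files rather than only one of them.

```python
import os
import subprocess
import tempfile


def spool_in_parallel(cmd_a, cmd_b):
    """Run both commands at the same time, each writing to its own temp file.

    The paths are returned only after both commands have exited, so a later
    reader never sees a partially written file.
    """
    out_a = tempfile.NamedTemporaryFile(delete=False)
    out_b = tempfile.NamedTemporaryFile(delete=False)
    with out_a, out_b:
        # Both processes are started before either one is waited for.
        proc_a = subprocess.Popen(cmd_a, stdout=out_a)
        proc_b = subprocess.Popen(cmd_b, stdout=out_b)
        proc_a.wait()
        proc_b.wait()
    return out_a.name, out_b.name


def diff_commands(cmd_a, cmd_b):
    """Return the unified diff of the two commands' outputs (hypothetical helper)."""
    path_a, path_b = spool_in_parallel(cmd_a, cmd_b)
    try:
        result = subprocess.run(
            ["diff", "-u", path_a, path_b],
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        return result.stdout
    finally:
        # The temp files were created with delete=False, so remove them here.
        os.unlink(path_a)
        os.unlink(path_b)
```

The trade-off is disk space and extra I/O for large outputs, in exchange for neither command blocking on diff(1) reading the other one first.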
X
--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git