Bug#863636: diffoscope: usage of FIFOs causes pair-comparisons to not run in parallel, wasting performance by about 1/2

Ximin Luo infinity0 at debian.org
Tue May 30 11:40:00 UTC 2017


Ximin Luo:
> Package: diffoscope
> Version: 78
> Severity: normal
> 
> Dear Maintainer,
> 
> diff(1) first reads the contents of one file then the next one:
> 
> https://sources.debian.net/src/diffutils/1:3.5-3/src/io.c/#L552
> 
> This means that if the "files" are actually FIFOs connected to the output of a
> process, as they are in many cases in diffoscope, the second process has to wait
> for diff(1) to fully read the output of the first process, before it itself can
> run. This prevents both processes from running in parallel.
> 
> An appropriate fix would be to store the output of at least one of the commands
> into a temporary file, and have diff(1) read from this instead. This has to be
> done carefully however, to make sure that diff(1) doesn't accidentally read it
> before the process is finished.
> 
> [..]

It seems readelf specifically has weird performance behaviours when running in parallel. My initial attempt at fixing this works in the general case, but not for specific cases that are important to diffoscope and this scenario.

https://anonscm.debian.org/cgit/reproducible/diffoscope.git/commit/?h=experimental&id=e28b540b0b289ce9fda70095160382799d7602a6

Taking diffoscope completely out of the equation, we see the following behaviours:

    $ f() { sleep 3; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

    real	0m6.010s
    user	0m0.000s
    sys	0m0.000s

    real	0m3.004s
    user	0m0.000s
    sys	0m0.000s

    $ f() { sha256sum /run/shm/elf.$1 >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

    real	0m0.451s
    user	0m0.432s
    sys	0m0.012s

    real	0m0.221s
    user	0m0.428s
    sys	0m0.004s

(The above two cases confirm that the shell snippets are correct. /run/shm/elf.{1,2} is about 50MB.)

    $ f() { taskset $1 readelf --wide --debug-dump=rawline /run/shm/elf.$1 >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

    real	0m17.128s
    user	0m17.096s
    sys	0m0.008s

    real	0m19.126s
    user	0m37.256s
    sys	0m0.016s

    $ f() { taskset $1 readelf --wide --debug-dump=rawline /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

    real	0m3.330s
    user	0m3.324s
    sys	0m0.000s

    real	0m1.674s
    user	0m3.324s
    sys	0m0.000s

    $ f() { taskset $1 readelf --wide --debug-dump=info /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug >/dev/null; }; time ( f 1; f 2; ); time ( f 1 & x=$!; f 2; wait $x; )

    real	0m18.437s
    user	0m18.384s
    sys	0m0.012s

    real	0m29.039s
    user	0m56.196s
    sys	0m0.044s

/usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug is about 4MB and from libc6-dbg=2.24-10.

So it looks like readelf has issues when running for a long time in parallel. (--debug-dump=rawline with a 4MB file sped up when running in parallel.)

However other tools don't seem to have this problem:

    $ time ( { echo 1; echo 2; } | xargs -n1 -I '{}' taskset '{}' readelf --wide --debug-dump=info /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug >/dev/null )

    real	0m18.730s
    user	0m18.644s
    sys	0m0.032s

    $ time ( { echo 1; echo 2; } | parallel --will-cite taskset '{}' readelf --wide --debug-dump=info /usr/lib/debug/.build-id/22/46ba050897f1d98034a7ca4b7ec06b594a373d.debug >/dev/null )

    real	0m33.076s
    user	1m0.844s
    sys	0m0.460s

    $ time ( { echo 1; echo 2; } | xargs -n1 -I '{}' taskset '{}' sha256sum /run/shm/test.iso >/dev/null )

    real	0m33.302s
    user	0m32.332s
    sys	0m0.932s

    $ time ( { echo 1; echo 2; } | parallel --will-cite taskset '{}' sha256sum /run/shm/test.iso >/dev/null )

    real	0m16.843s
    user	0m32.224s
    sys	0m1.272s

Here, /run/shm/test.iso is about 4GB in size, and it does speed up when being processed by sha256sum in parallel.

Running strace on the "readelf" calls shows that most syscalls are done at the start, then for most of the rest of the execution it's just "write()" syscalls and indeed the "sys" times above are negligible. So it seems unlikely that the processes are blocking on anything.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git



More information about the Reproducible-builds mailing list