[Qa-jenkins-dev] Bug#865124: Bug#865124: jenkins.debian.org: reproducible builds: save diffoscope json output

Mon Jul 10 11:36:00 UTC 2017

Ximin Luo:
> Holger Levsen:
>> Hi Ximin,
>>
>> On Mon, Jun 19, 2017 at 04:08:32PM +0200, Ximin Luo wrote:
>>> Since 83 diffoscope supports a --json format output that can be read back into
>>> another instance of diffoscope later. This allows people to experiment with
>>> generating different format outputs, using different settings such as higher
>>> limits, or even their local custom presenters that they haven't released yet,
>>> testing them with "real" output that is hard to make up manually.
>>
>> I fear this might take a little while, we're currently a bit limited
>> on diskspace:
>>
>>          * h01ger wonders if we can compress all those diffoscope .txt and .html files…
>> < mapreri> the .txt, definitely
>> < mapreri> .html… I fear apache needs teaching for those
>> <  h01ger> 72G     dbd
>> <  h01ger> 58G     dbdtxt
>> <  h01ger> out of 190G used in total…
>> <  h01ger> adding buster was quite expensive :)
>> < mapreri> and infinity0 asked for json too now /o\
>> < mapreri> actually, I'm kind of surprised we're not compressing the txt already
>> <  h01ger> definitly first should compress the above
>>
>> …and that partition is "only" 220GB in size…
>>
> 
> Compressing text and json without worrying about apache would definitely be fine for me. I don't think anyone is reading those, or would want to read those, in their web browser. The primary purpose of the json is to be downloaded and read back into diffoscope anyways.
> 

xz compression would save even more disk space. For diffoscope we probably want to pass in a bigger dictionary size, since some diffs are big and the default xz dictionary is too small to compress those well.

With the GCC-6 diff I recently did, I observed:

xz --lzma2=dict=1536MiB -9 # compresses the best, but very very slow
lrzip -L9                  # compresses slightly worse than the previous, but very quick
xz --lzma2=dict=1536MiB -2 # similar to lrzip

The nice thing about lrzip is that it auto-adjusts the dictionary size based on available RAM, but of course users have to install a non-standard tool to decompress it. 1536MiB is the maximum dictionary size for xz.
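
For illustration, a minimal sketch of how a single diffoscope output file could be compressed with a larger xz dictionary and then checked by decompressing it again. The function name compress_diff, the sample file, and the default values (64MiB, preset 2) are hypothetical and are not settings used on jenkins.debian.org. The preset is given inside --lzma2 here because, as far as I understand the xz man page, a separate -N preset option placed after a custom filter chain replaces that chain.

```shell
#!/bin/sh
# Hypothetical helper: compress one file with xz using a custom LZMA2 dictionary.
#   $1 = input file
#   $2 = dictionary size (illustrative default: 64MiB; xz allows up to 1536MiB)
#   $3 = preset level the LZMA2 options start from (illustrative default: 2)
compress_diff() {
    in=$1
    dict=${2:-64MiB}
    preset=${3:-2}
    # preset= must come first, because it resets the other LZMA2 options.
    # -k keeps the uncompressed input next to the new .xz file.
    xz -k --lzma2="preset=${preset},dict=${dict}" -- "$in"
}

# Round trip on a small made-up sample (not real diffoscope output).
tmp=$(mktemp -d)
printf '{"sample": "placeholder"}\n' > "$tmp/sample.json"
compress_diff "$tmp/sample.json" 8MiB 2
xz -dc "$tmp/sample.json.xz" | cmp - "$tmp/sample.json" && echo "round-trip ok"
rm -rf "$tmp"
```

Note that a large dictionary also raises the memory needed to decompress, so anyone downloading a file compressed with dict=1536MiB should expect to need roughly that much RAM to read it back.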

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git