[Reproducible-builds] generating reproducible ISOs with xorriso

Thomas Schmitt scdbackup at gmx.net
Fri Jun 5 14:57:38 UTC 2015


Hi,

About the --sort-weight-list approach which is possible with
already released xorriso versions:

> (find . -type f -print0 | xargs -0 md5sum | sort | cut -f2- -d/ ; find .
> -mindepth 1 \! -type f | sort | cut -f2- -d/ ) | awk '{ N=N+1; print N " "
> $0 }'

I misunderstood the role of md5sum here. Actually it seems
superfluous. Why not just sort the paths? That would be enough to
give awk a reproducible input sequence.

Ok. The risk of a random collision is avoided and 2 billion
files is not a severe limitation. (But the hardlinks ...)

xorriso will not understand the "\n" which md5sum substitutes
for newline characters in filenames. So processing such
filenames will not be reliably reproducible and will throw errors like:
  xorriso : FAILURE : Cannot find path 'a\nb' in loaded ISO image
One would have to set before -as mkisofs:
  -abort_on fatal
in order to avoid a premature end of the program run.
The attribution of weights would stop in any case.

There is no need to attribute weight to directories.
It applies only to the content source objects of regular files.
("Regular file" in the ISO, not necessarily on hard disk).

So how about this:

   if test $(find . -name '*'$'\n''*' | wc -w) -gt 0
   then
     echo "FOUND FILENAMES WITH NEWLINES UNDERNEATH $(pwd)" >&2
     exit 1
   fi

   find . -type f -print | \
     sort | cut -f2- -d/ | awk '{ N=N+1; print N " " $0 }'
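
The same idea as a POSIX shell function, only as a sketch. The name
weight_list and the LC_ALL=C setting are assumptions which are not part
of the proposal above: without a fixed locale, sort may put the same
paths into a different sequence on a different machine.

```shell
# Hypothetical helper: print "N path" lines for all regular files
# underneath the directory $1 (default: current directory).
weight_list() (
  cd "${1:-.}" || exit 1
  nl='
'
  # Refuse trees which contain newline characters in file names.
  if [ -n "$(find . -name "*${nl}*" -print)" ]
  then
    echo "FOUND FILENAMES WITH NEWLINES UNDERNEATH $(pwd)" >&2
    exit 1
  fi
  # Assumption: LC_ALL=C makes the sort order independent of the locale.
  find . -type f -print | LC_ALL=C sort | cut -f2- -d/ |
    awk '{ N=N+1; print N " " $0 }'
)
```

The output of, e.g., weight_list /path/to/tree >weights.txt could then
serve as the file for --sort-weight-list.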


----------------------------------------------------------------
About improved reproducibility by default in future xorriso:

Extent location of regular files:

The question was:
If I sort the hardlink-merged IsoFileSrc according to
an ISO 9660 directory tree traversal, will the sequence be
reproducible for trees with identical file names and
attributes?

I now verified that the directories get sorted according
to their ISO 9660 names. The process of name collision
resolution (mangling) is complicated but depends only on
the user defined input names and their sequence. Name sorting
happens before mangling and afterwards.
(libisofs/ecma119_tree.c functions ecma119_tree_create(),
 sort_tree(), mangle_tree(), qsort(3) in mangle_single_dir())
So there should be no permutations of identical name lists
possible.

Extent location of directories:

Looks already reproducible.
They get stored after volume descriptors but not before block 32.
(The extent address of the root directory can be read as little
 endian 32 bit number from byte 32926 to 32929 of the ISO, counted
 from 0. The root directory record itself begins at byte 32924.
 ECMA-119 8.4.18 and 9.1.3)
The production of extents traverses the sorted ISO tree.
(libisofs/ecma119.c function write_dirs())
The size of a directory extent depends on name lengths and
attributes of the files inside the directory.
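
To compare the root directory address of two images, the field can be
read with od. This is only a sketch. The function name is made up, and
the offset 32926 rests on these assumptions: the Primary Volume
Descriptor sits in block 16 with 2048 byte blocks (byte 32768), the
root directory record begins at BP 157 of that descriptor, and the
Location of Extent begins at BP 3 of the record.

```shell
# Hypothetical helper: print the block address of the root directory
# extent of the ISO image file $1.
iso_root_dir_lba() {
  # Read four single bytes, so that the result does not depend on the
  # byte order of the machine which runs od.
  set -- $(od -A n -t u1 -j 32926 -N 4 "$1")
  [ $# -eq 4 ] || return 1
  # Compose the little endian 32 bit number.
  echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}
```

Two images which were made from identical input should yield the same
number with iso_root_dir_lba image1.iso and iso_root_dir_lba image2.iso.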

Then there are the Path Tables (nobody reads them):

Looks already reproducible.
The sequence of entries is determined by an array pathlist[]
which gets filled by traversal of the sorted ISO tree.
(libisofs/ecma119.c function write_path_tables())

----------------------------------------------------------------

So I will go for the reproducible array of IsoFileSrc in
libisofs/filesrc.c function filesrc_writer_pre_compute().
The red-black tree shall merge hardlinks but not define
the sequence of data file extents.
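
The principle can be illustrated in shell. This is not the libisofs
code, just an analogy under assumptions: the function name is made up,
hardlinks are recognized by inode number only (device numbers are
ignored), and filenames contain no newline characters. The sequence
comes from the sorted path list. The inode numbers serve only to drop
the second and further names of the same content.

```shell
# Hypothetical helper: list one path per file content underneath $1,
# in sorted path sequence, with hardlink siblings merged.
unique_sources() (
  cd "${1:-.}" || exit 1
  find . -type f -print | LC_ALL=C sort |
  while IFS= read -r p
  do
    # ls -i prints the inode number in front of the name.
    ino=$(ls -di -- "$p" | awk '{ print $1; exit }')
    printf '%s %s\n' "$ino" "$p"
  done |
  # Keep only the first path of each inode and strip "inode ./".
  awk '!seen[$1]++ { sub(/^[0-9]+ \.\//, ""); print }'
)
```

With files a and b being hardlinks of each other and a separate file c,
the function prints a and c, regardless of the values of the inode
numbers.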

This may take a few days. I will send a note to the lists
when the GNU xorriso-1.4.1 development tarball is worth a
test.


Have a nice day :)

Thomas
 


