[Reproducible-builds] generating reproducible ISOs with xorriso

Thomas Schmitt scdbackup at gmx.net
Fri Jun 5 08:24:08 UTC 2015


Hi,

> > red-black tree
> it makes sense why it is this way, and it also
> makes sense why this does not look good for reproducibility. :/

It is slightly overdesigned. :))
A sorted and deduplicated array would do the same.


> > Brute force would be a giant weight list
> well, -sort-weight-list is an option designed to do exactly that;

I never thought of tens of thousands of files in there.
The implementation is not optimized for good speed.

I am currently thinking towards keeping the red-black tree
for hardlink detection but to not take the array from
that tree. Instead the array would be produced by traversing
the ISO 9660 filesystem tree and skipping the files of which
the IsoFileSrc object was already put into the array by a
previous node in the tree.
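
A minimal sketch of that plan in Python, not libisofs code. The classes
Node and FileSrc are hypothetical stand-ins for the ISO tree nodes and
for IsoFileSrc, and the `taken` flag stands for the one-bit marker.
Handling the files of a directory before descending into its
subdirectories is my assumption about the traversal order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileSrc:
    """Hypothetical stand-in for IsoFileSrc: one per distinct data source."""
    name: str
    taken: bool = False  # the one-bit marker: already put into the array


@dataclass
class Node:
    """Hypothetical tree node: a file has src set, a directory has children."""
    name: str
    src: Optional[FileSrc] = None
    children: List["Node"] = field(default_factory=list)


def collect_sources(directory: Node,
                    result: Optional[List[FileSrc]] = None) -> List[FileSrc]:
    """Traverse the tree and list each FileSrc once, in tree order."""
    if result is None:
        result = []
    # First the files of this directory, so their extents become neighbors.
    for child in directory.children:
        if child.src is not None and not child.src.taken:
            child.src.taken = True
            result.append(child.src)
    # Then descend. Hardlinks met later find the flag set and are skipped.
    for child in directory.children:
        if child.src is None:
            collect_sources(child, result)
    return result
```

With two hardlinked nodes which share one FileSrc, the array gets that
object only at the position of the node which is visited first.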

I must still verify whether libisofs properly fulfills the
sorting demands of ECMA-119 9.3 "Order of Directory Records".
If so, then we would get to the situation which I initially
assumed to be true.

This plan would cost just one bit per IsoFileSrc object but in
worst case inflate the object size by 4 or 8 bytes due to
alignment constraints.
(IsoFileSrc is a typedef of struct Iso_File_Src.
 See libisofs/filesrc.h)

A general advantage would be that the extents of the files
of a directory are stored as neighbors. So copy operations
from the mounted ISO medium could need fewer head movements.

Another advantage would be that this can be performed
unconditionally. No need to introduce new options.


>  * it says each line in the file is a weight and an iso_rr_path.
> is the weight always interpreted as decimal [...] ?

Currently it's a sscanf(..., "%d", ...), so the weight is read as a
signed decimal integer.
(In xorriso-1.4.0/xorriso/opts_d_h.c, line 1168.)

> (the latter would be easier to convert the output of an md5sum file

Weight is only a signed 32 bit value. By the birthday bound you
have to expect the first random collision around 65000 files
(roughly the square root of 2^32).
Further, the idea of a content based sequence of extents
suffers from the systematic risk that two files may well
have the same content. Especially if they are small.
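
To put a number on the collision risk, here is a small Python sketch of
the usual birthday approximation. It assumes that the checksum-derived
weights behave like uniformly random 32 bit values. The function name
is made up for this example.

```python
import math


def collision_probability(n_files: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n_files random weights
    out of 2**bits possible values are equal (birthday approximation)."""
    space = 2 ** bits
    return 1.0 - math.exp(-n_files * (n_files - 1) / (2.0 * space))


if __name__ == "__main__":
    for n in (10000, 65000, 100000):
        print(n, round(collision_probability(n), 3))
```

Under this assumption, the risk of at least one pair of equal weights
is already about 39 percent at 65000 files.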

But I realize you propose to checksum the paths, not the
content. (I'm not sure yet whether this would work
properly for hardlinks.)


> is the iso_rr_path the raw ISO name ?

No, it is normally the Rock Ridge name in the ISO. Usually
the original file name on the hard disc with a directory
path which depends on the xorriso command that inserted the
file into the emerging ISO.
To be exact: It is the name which you give the file inside
the ISO, or which xorriso loaded from an imported ISO.
(If the loaded ISO has neither Rock Ridge nor Joliet names,
 then iso_rr_name is indeed the ISO 9660 name.)
E.g. if you do in -as mkisofs emulation:
  -graft-points /a/b/c=$HOME/my_file
then the iso_rr_path is /a/b/c.


>  * it says "if iso_rr_path leads to a directory then all regular files
>   underneath will get the weight number" -- what if the regular files
>   themselves are specified?
>  * if a file has a weight specified multiple times, which specified
>    weight "wins" -- first or last?

The outcome depends on the sequence of paths in the sort file.
The last line will prevail.
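
A toy model of this rule in Python, not the xorriso implementation.
Assumptions of the sketch: weight and path are separated by whitespace,
files without a matching line keep weight 0, and the function name
apply_sort_weights is made up.

```python
from typing import Dict, Iterable, List


def apply_sort_weights(lines: Iterable[str],
                       files: List[str]) -> Dict[str, int]:
    """Apply the sort file lines in sequence; a later line overrides
    an earlier one for every file which it covers."""
    weights = {path: 0 for path in files}  # default 0 is an assumption
    for line in lines:
        line = line.strip()
        if not line:
            continue
        weight_text, path = line.split(None, 1)
        weight = int(weight_text, 10)  # decimal, like sscanf "%d"
        prefix = path.rstrip("/") + "/"
        for name in files:
            # The line covers the path itself or anything underneath it.
            if name == path or name.startswith(prefix):
                weights[name] = weight
    return weights
```

For example, the lines "10 /a" followed by "20 /a/x" leave /a/x with
weight 20 and all other files under /a with weight 10. In the opposite
sequence, the directory line would override the file line.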


------------------------------------------------------------------
About the remaining deviation in your test:

>  00009840  00 00 00 00 00 00 00 00  00 00 00 00 00 54 46 1a  |.............TF.|
>  00009850  01 0e 73 06 04 16 39 1e  00 73 06 04 16 39 1e 00  |..s...9..s...9..|
> -00009860  73 06 04 17 1d 04 00 43  45 1c 01 14 00 00 00 00  |s......CE.......|
> +00009860  73 06 04 17 1d 0b 00 43  45 1c 01 14 00 00 00 00  |s......CE.......|

That would probably be a Rock Ridge TF entry, which records
the POSIX timestamps of a file.
The differing byte would then be the seconds part of the ctime.
  2015 Jun 04 23:29:04
versus
  2015 Jun 04 23:29:11
(0x73 = 115 , plus 1900 yields 2015)
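
The decoding can be checked with a few lines of Python. This sketch
assumes that the seven bytes follow the short timestamp format of
ECMA-119 9.1.5: years since 1900, month, day, hour, minute, second,
and a signed offset from GMT in steps of 15 minutes. The function name
is made up for this example.

```python
from datetime import datetime, timedelta, timezone


def decode_7byte_timestamp(raw: bytes) -> datetime:
    """Decode a 7-byte ECMA-119 9.1.5 style timestamp."""
    if len(raw) != 7:
        raise ValueError("expected exactly 7 bytes")
    year, month, day, hour, minute, second = raw[:6]
    # Last byte: signed offset from GMT in 15 minute intervals.
    quarters = int.from_bytes(raw[6:7], "big", signed=True)
    tz = timezone(timedelta(minutes=15 * quarters))
    return datetime(1900 + year, month, day, hour, minute, second,
                    tzinfo=tz)


if __name__ == "__main__":
    # The two variants of the third timestamp at offset 0x9860.
    print(decode_7byte_timestamp(bytes.fromhex("730604171d0400")))
    print(decode_7byte_timestamp(bytes.fromhex("730604171d0b00")))
```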


It seems that my proposal was incomplete:
  -alter_date_r b $timestamp / --
sets mtime and atime to $timestamp but ctime to the current
time.

So we rather need to set ctime explicitly after it was
implicitly set to current time:
  args="$args -alter_date_r b $timestamp / --"
  args="$args -alter_date_r c $timestamp / --"

xorriso's native commands take effect strictly sequentially,
like commands in a loop-free shell script. So the second
-alter_date_r overrides the side effects of the first one.
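
A toy model in Python of why the sequence matters. It only mimics the
behavior described above for the types "b" and "c" and is not derived
from the xorriso code. The function alter_date and the dictionary of
timestamps are made up for this illustration.

```python
from typing import Dict


def alter_date(stamps: Dict[str, int], kind: str,
               timestamp: int, now: int) -> Dict[str, int]:
    """Mimic the described effect of one -alter_date_r command."""
    if kind == "b":
        # Sets mtime and atime, but as side effect ctime becomes "now".
        stamps["mtime"] = timestamp
        stamps["atime"] = timestamp
        stamps["ctime"] = now
    elif kind == "c":
        stamps["ctime"] = timestamp
    else:
        raise ValueError("this sketch only models types b and c")
    return stamps


if __name__ == "__main__":
    stamps: Dict[str, int] = {}
    alter_date(stamps, "b", 1433460544, now=1433460551)
    alter_date(stamps, "c", 1433460544, now=1433460551)
    print(stamps)  # all three timestamps are now equal
```

Swapping the two calls would leave ctime at the time of the run, which
is the deviation seen in the test.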


New mail:
> [impressive investigation of Rock Ridge, ECMA-119 and our svn]
> http://libburnia-project.org/changeset/5190

Yes. You will need that for setting ctime.
Released with version 1.3.4.

Debian has packages of version 1.3.2, but debian-cd already
uses GNU xorriso 1.3.6.pl01 for ISO production.
Regrettably George Danchev, the DD who maintains libburnia,
is distracted by real life and looks for a successor.


Any DD or DM here, who has the time to make the move ?
Afaik, Debian has packages for libburn, libisofs, libisoburn,
cdrskin, and xorriso.


Have a nice day :)

Thomas



