Bug#838188: ocaml: temporary preprocessed file paths make the ocaml compiler produce unreproducible output

Johannes Schauer josch at debian.org
Sun Sep 18 08:51:54 UTC 2016


Source: ocaml
Version: 4.02.3-7.1
Severity: wishlist
Tags: patch
User: reproducible-builds at lists.alioth.debian.org
Usertags: toolchain randomness

Hi,

Currently, the OCaml compiler embeds the paths of temporary files
created by a preprocessor in the debug information of its output. This
makes several source packages in Debian unreproducible. To see the
effect, consider this diffoscope output for src:botch:

│   ├── data.tar.xz
│   │   ├── data.tar
│   │   │   ├── ./usr/lib/debug/.build-id/03/28382a2670552f3318cc61bdebc13bbeef8f2f.debug
│   │   │   │   ├── readelf --wide --symbols {}
│   │   │   │   │ @@ -56,15 +56,15 @@
│   │   │   │   │      52: 0000000000830838     0 NOTYPE  LOCAL  DEFAULT   25 caml_startup__9
│   │   │   │   │      53: 0000000000830868     0 NOTYPE  LOCAL  DEFAULT   25 caml_startup__10
│   │   │   │   │      54: 0000000000830898     0 NOTYPE  LOCAL  DEFAULT   25 caml_startup__11
│   │   │   │   │      55: 00000000008308c8     0 NOTYPE  LOCAL  DEFAULT   25 caml_startup__12
│   │   │   │   │      56: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS std_exit.ml
│   │   │   │   │      57: 00000000005c4430     0 NOTYPE  LOCAL  DEFAULT   15 caml_negf_mask
│   │   │   │   │      58: 00000000005c4440     0 NOTYPE  LOCAL  DEFAULT   15 caml_absf_mask
│   │   │   │   │ -    59: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /tmp/ocamlpp29daa7
│   │   │   │   │ +    59: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /tmp/ocamlpp4dfb7e
│   │   │   │   │      60: 00000000005c4450     0 NOTYPE  LOCAL  DEFAULT   15 caml_negf_mask
│   │   │   │   │      61: 00000000005c4460     0 NOTYPE  LOCAL  DEFAULT   15 caml_absf_mask
│   │   │   │   │      62: 0000000000836558     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__30
│   │   │   │   │      63: 0000000000836570     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__31
│   │   │   │   │      64: 0000000000836588     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__32
│   │   │   │   │      65: 00000000008365c0     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__2
│   │   │   │   │      66: 0000000000836668     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__5
│   │   │   │   │ @@ -87,15 +87,15 @@
│   │   │   │   │      83: 00000000008367e0     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__23
│   │   │   │   │      84: 00000000008367f8     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__24
│   │   │   │   │      85: 0000000000836810     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__25
│   │   │   │   │      86: 0000000000836820     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__26
│   │   │   │   │      87: 0000000000836868     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__27
│   │   │   │   │      88: 0000000000836880     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__28
│   │   │   │   │      89: 00000000008368c8     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__29
│   │   │   │   │ -    90: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /tmp/ocamlpp21639f
│   │   │   │   │ +    90: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /tmp/ocamlppfd0623
│   │   │   │   │      91: 00000000005c4470     0 NOTYPE  LOCAL  DEFAULT   15 caml_negf_mask
│   │   │   │   │      92: 00000000005c4480     0 NOTYPE  LOCAL  DEFAULT   15 caml_absf_mask
│   │   │   │   │      93: 0000000000836fc0     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__43
│   │   │   │   │      94: 0000000000836fd8     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__44
│   │   │   │   │      95: 0000000000836ff0     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__45
│   │   │   │   │      96: 0000000000837008     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__46
│   │   │   │   │      97: 0000000000837028     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__9

I see two ways to fix this problem.

 - instead of choosing a random temporary file name for the preprocessor
   output, choose a stable file name

 - do not include the path to the temporary file created by the
   preprocessor in the debug information

I prefer the latter option: the file is only temporary, so recording
its path is useless anyway. Unfortunately, I was unable to figure out a
good way to implement it.

So instead, I implemented a solution that derives the name of the
temporary file from the MD5 sum of the input file path and the
preprocessor command. The idea is that running the same preprocessor on
the same file path should produce the same output, so reusing the same
file name should not pose any problem. I chose a hash over the bare
string values because file paths contain characters like the slash,
which must not appear in file names, and because a hash gives a
temporary file name of fixed length regardless of the length of the
input path.
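
The naming scheme described above can be tried in isolation. The
following is only a minimal sketch: stable_tmp_name and its labelled
arguments are hypothetical names chosen for illustration, and the
example paths are made up. It uses the same Digest and Filename calls
as the patch.

```ocaml
(* Hypothetical helper that mirrors the naming scheme of the patch.
   It is not part of the OCaml compiler. *)
let stable_tmp_name ~temp_dir ~sourcefile ~pp =
  (* MD5 of the concatenated inputs, rendered as 32 hex characters *)
  let hash = Digest.to_hex (Digest.string (sourcefile ^ pp)) in
  Filename.concat temp_dir ("ocamlpp" ^ hash)

let () =
  (* Made-up inputs, for illustration only *)
  let a = stable_tmp_name ~temp_dir:"/tmp" ~sourcefile:"src/foo.ml" ~pp:"cppo" in
  let b = stable_tmp_name ~temp_dir:"/tmp" ~sourcefile:"src/foo.ml" ~pp:"cppo" in
  (* The same inputs always give the same name *)
  assert (a = b);
  (* The hash part has a fixed length of 32 hex digits *)
  assert (String.length (Filename.basename a) = String.length "ocamlpp" + 32);
  print_endline a
```

Note that plain concatenation does not separate the two inputs, so
distinct (sourcefile, pp) pairs could in principle yield the same
string. Whether that matters in practice is left open here.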

Here is the patch:

--- a/driver/pparse.ml
+++ b/driver/pparse.ml
@@ -19,9 +19,17 @@ type error =
 exception Error of error
 
 (* Optionally preprocess a source file *)
+external open_desc: string -> open_flag list -> int -> int = "caml_sys_open"
+external close_desc: int -> unit = "caml_sys_close"
 
 let call_external_preprocessor sourcefile pp =
-      let tmpfile = Filename.temp_file "ocamlpp" "" in
+      (* do not use Filename.temp_file as the resulting temporary file name will be
+       * recorded in the debug output of the resulting binary and thus make the
+       * output random and unreproducible *)
+      let temp_dir = Filename.get_temp_dir_name () in
+      let hash = Digest.to_hex (Digest.string (sourcefile^pp)) in
+      let tmpfile = Filename.concat temp_dir ("ocamlpp"^hash) in
+      close_desc(open_desc tmpfile [Open_wronly; Open_creat; Open_excl] 0o600);
       let comm = Printf.sprintf "%s %s > %s"
                                 pp (Filename.quote sourcefile) tmpfile
       in
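
The exclusive-create step of the patch can be illustrated outside the
compiler as follows. This is a sketch under assumptions: it swaps the
open_desc/close_desc externals for the standard open_out_gen and
close_out functions, and create_exclusive is a hypothetical name.

```ocaml
(* Hypothetical stand-in for the open_desc/close_desc pair in the patch,
   using only standard library functions. *)
let create_exclusive path =
  (* Open_excl makes the open fail if the path already exists *)
  let oc = open_out_gen [Open_wronly; Open_creat; Open_excl] 0o600 path in
  close_out oc

let () =
  (* Obtain a currently unused path for the demonstration *)
  let path = Filename.temp_file "ocamlpp_demo" "" in
  Sys.remove path;
  (* The first creation succeeds *)
  create_exclusive path;
  (* A second creation of the same path raises Sys_error *)
  let second_attempt_failed =
    try create_exclusive path; false with Sys_error _ -> true
  in
  Sys.remove path;
  assert second_attempt_failed
```

With a stable name, this means the create would fail if a file of that
name were left behind, for example by an interrupted build or by two
concurrent builds of the same source file. Whether that can happen
depends on how the compiler cleans up its temporary files.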

Applying this patch and rebuilding src:ocaml leads to src:botch becoming
reproducible.

I do not know whether the patch is suitable for inclusion in the
upstream project, but I trust that you will forward the issue accordingly.

Thanks!

cheers, josch


