Bug#838188: ocaml: temporary preprocessed file paths make the ocaml compiler produce unreproducible output
Johannes Schauer
josch at debian.org
Sun Sep 18 08:51:54 UTC 2016
Source: ocaml
Version: 4.02.3-7.1
Severity: wishlist
Tags: patch
User: reproducible-builds at lists.alioth.debian.org
Usertags: toolchain randomness
Hi,
currently, ocaml embeds the file paths of temporary files that a
preprocessor created into the debug output. This makes several source
packages in Debian unreproducible. To see the effect, look for example
at this diffoscope output of src:botch:
│ ├── data.tar.xz
│ │ ├── data.tar
│ │ │ ├── ./usr/lib/debug/.build-id/03/28382a2670552f3318cc61bdebc13bbeef8f2f.debug
│ │ │ │ ├── readelf --wide --symbols {}
│ │ │ │ │ @@ -56,15 +56,15 @@
│ │ │ │ │ 52: 0000000000830838 0 NOTYPE LOCAL DEFAULT 25 caml_startup__9
│ │ │ │ │ 53: 0000000000830868 0 NOTYPE LOCAL DEFAULT 25 caml_startup__10
│ │ │ │ │ 54: 0000000000830898 0 NOTYPE LOCAL DEFAULT 25 caml_startup__11
│ │ │ │ │ 55: 00000000008308c8 0 NOTYPE LOCAL DEFAULT 25 caml_startup__12
│ │ │ │ │ 56: 0000000000000000 0 FILE LOCAL DEFAULT ABS std_exit.ml
│ │ │ │ │ 57: 00000000005c4430 0 NOTYPE LOCAL DEFAULT 15 caml_negf_mask
│ │ │ │ │ 58: 00000000005c4440 0 NOTYPE LOCAL DEFAULT 15 caml_absf_mask
│ │ │ │ │ - 59: 0000000000000000 0 FILE LOCAL DEFAULT ABS /tmp/ocamlpp29daa7
│ │ │ │ │ + 59: 0000000000000000 0 FILE LOCAL DEFAULT ABS /tmp/ocamlpp4dfb7e
│ │ │ │ │ 60: 00000000005c4450 0 NOTYPE LOCAL DEFAULT 15 caml_negf_mask
│ │ │ │ │ 61: 00000000005c4460 0 NOTYPE LOCAL DEFAULT 15 caml_absf_mask
│ │ │ │ │ 62: 0000000000836558 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__30
│ │ │ │ │ 63: 0000000000836570 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__31
│ │ │ │ │ 64: 0000000000836588 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__32
│ │ │ │ │ 65: 00000000008365c0 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__2
│ │ │ │ │ 66: 0000000000836668 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__5
│ │ │ │ │ @@ -87,15 +87,15 @@
│ │ │ │ │ 83: 00000000008367e0 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__23
│ │ │ │ │ 84: 00000000008367f8 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__24
│ │ │ │ │ 85: 0000000000836810 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__25
│ │ │ │ │ 86: 0000000000836820 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__26
│ │ │ │ │ 87: 0000000000836868 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__27
│ │ │ │ │ 88: 0000000000836880 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__28
│ │ │ │ │ 89: 00000000008368c8 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__29
│ │ │ │ │ - 90: 0000000000000000 0 FILE LOCAL DEFAULT ABS /tmp/ocamlpp21639f
│ │ │ │ │ + 90: 0000000000000000 0 FILE LOCAL DEFAULT ABS /tmp/ocamlppfd0623
│ │ │ │ │ 91: 00000000005c4470 0 NOTYPE LOCAL DEFAULT 15 caml_negf_mask
│ │ │ │ │ 92: 00000000005c4480 0 NOTYPE LOCAL DEFAULT 15 caml_absf_mask
│ │ │ │ │ 93: 0000000000836fc0 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__43
│ │ │ │ │ 94: 0000000000836fd8 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__44
│ │ │ │ │ 95: 0000000000836ff0 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__45
│ │ │ │ │ 96: 0000000000837008 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__46
│ │ │ │ │ 97: 0000000000837028 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__9
I see two ways to fix this problem:
 - instead of choosing a random temporary file name for the preprocessor
   output, choose a stable file name
 - do not include the path to the temporary file created by the
   preprocessor in the debug information
I prefer the latter option: the file is only temporary, so recording its
path in the debug information serves no purpose. Unfortunately, I was
unable to figure out a good way to implement this solution.
So instead, I implemented a solution that derives the path of the
temporary file from the MD5 sum of the preprocessor name and the input
file path. The idea is that running the same preprocessor on the same
file path should produce the same output, so choosing the same file name
should not pose any problem. I chose to calculate a hash instead of
using the bare string values because the file paths contain characters
like the slash, which must not appear in file names, and because a hash
gives the temporary file name a fixed length no matter how long the
input path is.
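The naming scheme can also be tried in isolation. The following is only
a minimal sketch: the helper name stable_tmp_name is hypothetical and
not part of the patch, and the source paths and the preprocessor command
"camlp4o" are made-up example values.

```ocaml
(* Hypothetical helper, not part of the patch: derive a stable temporary
   file name from the source file path and the preprocessor command. *)
let stable_tmp_name sourcefile pp =
  let temp_dir = Filename.get_temp_dir_name () in
  (* Digest.string computes the MD5 digest of its argument;
     Digest.to_hex renders it as 32 hexadecimal characters. *)
  let hash = Digest.to_hex (Digest.string (sourcefile ^ pp)) in
  Filename.concat temp_dir ("ocamlpp" ^ hash)

let () =
  (* Example inputs are assumptions chosen for illustration only. *)
  let a = stable_tmp_name "src/foo.ml" "camlp4o" in
  let b = stable_tmp_name "src/foo.ml" "camlp4o" in
  let c = stable_tmp_name "src/bar.ml" "camlp4o" in
  Printf.printf "%s\n%s\n%s\n" a b c;
  (* Same inputs give the same name, different inputs a different one. *)
  assert (a = b);
  assert (a <> c)
```

Calling the helper twice with the same arguments yields the same path,
and the base name always consists of "ocamlpp" followed by 32
hexadecimal digits, regardless of how long the input path is.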
Here is the patch:
--- a/driver/pparse.ml
+++ b/driver/pparse.ml
@@ -19,9 +19,17 @@ type error =
exception Error of error
(* Optionally preprocess a source file *)
+external open_desc: string -> open_flag list -> int -> int = "caml_sys_open"
+external close_desc: int -> unit = "caml_sys_close"
let call_external_preprocessor sourcefile pp =
- let tmpfile = Filename.temp_file "ocamlpp" "" in
+ (* do not use Filename.temp_file as the resulting temporary file name will be
+ * recorded in the debug output of the resulting binary and thus make the
+ * output random and unreproducible *)
+ let temp_dir = Filename.get_temp_dir_name () in
+ let hash = Digest.to_hex (Digest.string (sourcefile^pp)) in
+ let tmpfile = Filename.concat temp_dir ("ocamlpp"^hash) in
+ close_desc(open_desc tmpfile [Open_wronly; Open_creat; Open_excl] 0o600);
let comm = Printf.sprintf "%s %s > %s"
pp (Filename.quote sourcefile) tmpfile
in
Applying this patch and rebuilding src:ocaml leads to src:botch becoming
reproducible.
I do not know whether the patch is suitable for inclusion in the
upstream project, but I trust that you will forward the issue accordingly.
Thanks!
cheers, josch