Bug#1035704: proj: reproducible-builds: timezone-dependent timestamps in .gsb/.gtx files

Vagrant Cascadian vagrant at reproducible-builds.org
Tue May 9 07:07:08 BST 2023


On 2023-05-09, Sebastiaan Couwenberg wrote:
> On 5/9/23 05:47, Sebastiaan Couwenberg wrote:
>> On 5/8/23 22:43, Vagrant Cascadian wrote:
>>> On 2023-05-08, Sebastiaan Couwenberg wrote:
>>>> On 5/8/23 02:07, Vagrant Cascadian wrote:
>>>>> The attached patch fixes this by not touching these files during the
>>>>> build process.
>>>>
>>>>   From shar(1):
>>>>
>>>> "
>>>>     -m, --no-timestamp
>>>>            do not restore modification times.
>>>>
>>>>            Avoid generating 'touch' commands to restore the file
>>>>            modification dates when unpacking files from the archive.
>>>>
>>>>            When file modification times are not preserved, project build
>>>>            programs like "make" will see built files older than the files
>>>>            they get built from.  This is why, when this option is not
>>>>            used, a special effort is made to restore timestamps.
>>>> "
>>>>
>>>> That should be used when generating the archives instead of your patch
>>>> to not have the mtimes touched when unpacking the archives.
>>>
>>> Is it actually a problem to allow dpkg to normalize the timestamps on
>>> these files rather than forcefully setting them to ... a value from a
>>> shar archive? It is perhaps a naive question; I really do not know.
>> 
>> Where does dpkg normalize the timestamps?

I thought it did as part of dpkg-deb, from the dpkg-deb.1 manpage:

       SOURCE_DATE_EPOCH
           If set, it will be used as the timestamp (as seconds since
           the epoch) in the deb(5)'s ar(5) container and used to clamp
           the mtime in the tar(5) file entries.

The other adjacent files appear to use a timestamp consistent with the
last debian/changelog entry:

  -rw-r--r--···0·root·········(0)·root·········(0)·····1097·2022-12-01·08:50:03.000000·./usr/share/proj/CH

Which is what the dpkg-buildpackage.1 manpage says is used to set
SOURCE_DATE_EPOCH...

       SOURCE_DATE_EPOCH
           This variable is set to the Unix timestamp since the epoch of
           the latest entry in debian/changelog, if it is not already
           defined.
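
A minimal sketch of what "clamp" means in the dpkg-deb.1 excerpt above,
assuming it behaves as an upper bound: an mtime newer than
SOURCE_DATE_EPOCH is lowered to it, and an older mtime is left alone.
The function name and the 2023 build time are hypothetical. The other
two values are taken from the listings in this mail and assumed to be
UTC.

```python
from datetime import datetime, timezone


def clamp_mtime(mtime: int, source_date_epoch: int) -> int:
    """Return the mtime recorded after clamping: never newer than the epoch."""
    return min(mtime, source_date_epoch)


def utc_epoch(*fields: int) -> int:
    """Convert a UTC calendar time (year, month, day, ...) to Unix seconds."""
    return int(datetime(*fields, tzinfo=timezone.utc).timestamp())


# Timestamp of the neighbouring file ./usr/share/proj/CH in the listing.
source_date_epoch = utc_epoch(2022, 12, 1, 8, 50, 3)

# Hypothetical mtime of a file written during a build in May 2023.
built_now = utc_epoch(2023, 5, 9, 6, 0, 0)

# One of the two mtimes reported for BETA2007.gsb.
restored_2018 = utc_epoch(2018, 2, 22, 7, 28, 23)

clamped_new = clamp_mtime(built_now, source_date_epoch)
clamped_old = clamp_mtime(restored_2018, source_date_epoch)

print(clamped_new == source_date_epoch)  # True: newer mtime is lowered
print(clamped_old == restored_2018)      # True: older mtime passes through
```

If that reading is right, a 2018 mtime restored from the shar archive
would not be normalized by dpkg-deb, whatever timezone produced it.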


>> shar sets the timestamps when the archive is unpacked, before the
>> package build starts.
>> 
>> Some of the files in the diffoscope-results are only installed in 
>> proj-data and not used otherwise during the build.
>> 
>>   * BETA2007.gsb is used in test/gie/DHDN_ETRS89.gie
>> 
>>   * CHENYX06.gsb/CHENYX06_etrs.gsb/CHENYX06a.gsb are only installed
>> 
>>   * egm96_15.gtx is used in test/gie/deformation.gie,
>>     test/gie/more_builtins.gie, test/gie/4D-API_cs2cs-style.gie, and
>>     test/cli/testdatumfile
>> 
>>   * ntf_r93.gsb is used in test/gie/more_builtins.gie,
>>     test/gie/4D-API_cs2cs-style.gie, and test/cli/testdatumfile
>> 
>>   * nzgd2kgrid0005.gsb is used in unit tests

This seems like a strong lead, but I would expect the test suite to run
(and thus modify the timestamps) before dpkg-deb sets the timestamps on
the built packages... so I am still a bit perplexed, but probably just
misunderstanding exactly what happens when and where. :)


>>>> But the diffoscope-results only show a few of the files from the shar
>>>> archives with different mtimes, and all of them get touched when
>>>> unpacking the archive just before the configure target is executed.
>>>
>>> Well, I too was perplexed why other files were not affected, but it is
>>> consistently those .gsb and .gtx files, and with the submitted patch
>>> the package has built reproducibly each of the dozens of times I've
>>> built it...
>>>
>>>
>>>> shar actually makes the timestamps reproducible by always using the
>>>> mtime recorded in the archive instead of the time the files were
>>>> unpacked.
>>>>
>>>> Something else is likely changing the mtime after the files are
>>>> unpacked. Some of these grids are used by the testsuite for example.
>>>
>>> I will try to look into it further, but without really being familiar
>>> with the proj codebase (or even what proj itself does)... any additional
>>> very specific suggestions where to look next would definitely be
>>> appreciated!  :)
>> 
>> CMake's configure_file() is used to copy the .gsb & .gtx files from 
>> CMAKE_CURRENT_SOURCE_DIR to CMAKE_CURRENT_BINARY_DIR, that might be 
>> touching the mtimes too. See: data/CMakeLists.txt

Thanks, that is definitely worth taking a look at...
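
One way to narrow down which step changes the mtimes is to record them
before and after each stage (unpack, configure, test) and compare. A
rough sketch; the helper names are hypothetical and nothing here is
specific to CMake or proj:

```python
from pathlib import Path


def snapshot_mtimes(root, suffixes=(".gsb", ".gtx")):
    """Map each matching file under root to its mtime in whole seconds."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            result[str(path.relative_to(root))] = int(path.stat().st_mtime)
    return result


def changed_files(before, after):
    """Return names present in both snapshots whose mtime differs."""
    return sorted(
        name for name in before
        if name in after and before[name] != after[name]
    )
```

Taking a snapshot of the source tree after unpacking, another after the
configure step, and another after the test suite would show whether the
copy or the tests alter the grid files.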


> Seeing how the mtimes are off by two hours, this could be the difference 
> between UTC and CEST.

For clarity, it is almost definitely timezone related, but the zones are
actually UTC-14 and UTC+12 (i.e. off by 26 hours), which are the values
used for the TZ variable on tests.reproducible-builds.org:

  ···83696·2018-02-22·07:28:23.000000·./usr/share/proj/BETA2007.gsb
  vs.
  ···83696·2018-02-21·05:28:23.000000·./usr/share/proj/BETA2007.gsb

Note the difference of date...
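
The arithmetic can be checked with a short sketch. It assumes that some
step interprets a timezone-less wall-clock time in the build's local
zone, and that the listings above are shown in UTC. The wall-clock value
below is not from any archive; it was worked backwards from the two
listings, taking the offsets as stated above. Which step does the
interpreting is still the open question.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical wall-clock time with no zone attached, derived from the
# two observed timestamps rather than read from a shar archive.
wall_clock = datetime(2018, 2, 21, 17, 28, 23)

# The two offsets named above, modelled as fixed-offset zones.
zone_a = timezone(timedelta(hours=-14))
zone_b = timezone(timedelta(hours=12))

# Interpret the same wall-clock time in each zone, then view it in UTC.
seen_a = wall_clock.replace(tzinfo=zone_a).astimezone(timezone.utc)
seen_b = wall_clock.replace(tzinfo=zone_b).astimezone(timezone.utc)

print(seen_a.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-02-22 07:28:23
print(seen_b.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-02-21 05:28:23
print(seen_a - seen_b)                       # 1 day, 2:00:00
```

A single local time read in the two zones reproduces both observed
values, 26 hours apart, which is consistent with a timezone-dependent
step but does not identify it.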


> The latter was in effect when the archives were 
> created:
>
>   $ grep "Made on" debian/datumgrids*.shar
>   debian/datumgrids-ch.shar:# Made on 2018-02-26 22:27 CET by <bas at anubis>.
>   debian/datumgrids.shar:# Made on 2018-09-15 20:13 CEST by <bas at anubis>.
>
> But why does it only affect the binary GSB & GTX files, and not also the 
> binary ntv1_can.dat file or text files like README.DATUMGRID and the 
> init files (the ones without extensions)?

That is the question, eh? :)

Will try to poke at it more tomorrow...


live well,
  vagrant