Bug#1052371: game-data-packager: a crash while rebuilding quake2-full-data deb package

Simon McVittie smcv at debian.org
Fri May 2 18:44:35 BST 2025


On Wed, 07 Feb 2024 at 12:42:01 +0100, Sébastien Noel wrote:
>A fix could be as simple as
>
>--- a/game_data_packager/build.py
>+++ b/game_data_packager/build.py
>@@ -903,7 +903,7 @@ class PackagingTask:
>         if provider is None:
>             try_to_unpack: Collection[str] = self.game.files
>             should_provide = set()
>-            distinctive_dirs = False
>+            distinctive_dirs = True
>         else:
>             try_to_unpack = set(f.name for f in
>provider.provides_files)
>             should_provide = set(try_to_unpack)
>
>
>but i'm not familiar enough with that part of the code to have a full
>view of the implications of that change...

I don't think that's the correct fix, but I've been able to identify a 
different change that I find easier to justify, so I pushed that 
instead.

>Context:
>- the file provided on the CLI is "unknown/untrusted" by g-d-p and so
>there isn't any "provider" for the stream and for that case,
>'distinctive_dirs' was set to False (what's the reasoning for that ?)

The origin of this seems to be commit ef4dc6c7 "zipfile: match on 
basename for unknown & quircky archives" [sic]. Originally, it was only 
for zip files, where the equivalent of this bug couldn't actually cause 
a crash, because zip files are seekable (although it might have led to 
some inefficiency, unpacking the same file more than once). Later, I 
refactored the same function into consider_stream(), in commit f7915ca5 
"Unify code to stream members from a TarFile or ZipFile". After that 
point, it *can* cause a crash when unpacking a non-seekable archive, 
like a tar file or (as in this case) the tar file embedded in a .deb.

What distinctive_dirs means is: if it's true (which is the default), we 
will assume that a file named "foo/bar/abc.bin" can only be found in a 
directory named "foo/bar", and if we see an "abc.bin" somewhere else, 
it's a different file. This is usually what we want: we know the layout 
that the directory tree will have, and there's no point in unpacking a 
file named something like directx/setup.exe in the hope that it might 
secretly be mygame/setup.exe.

However, if the user of g-d-p has passed an unknown archive file on the 
command-line, after ef4dc6c7 we will consider any "abc.bin" in any 
directory to be potentially the one we are looking for.

I don't know why this was done: the commit message doesn't say, and the 
change isn't part of a merge request. But I can guess: probably 
Alexandre had (or was thinking about) an unofficial archive containing 
files in a layout that was not the one we expect, for example:

     quake2-pak0.zip
      \- pak0.pak    (same content we expect to find in baseq2/pak0.pak)

and in that scenario, we will not successfully match pak0.pak to 
anything unless we are willing to disregard the path.

>- quake2 have multiple (different) gamefiles that have the same
>basename()

Not just that, but they are the same size, for example:

     31329     14d330fe2a9af05a6254b2c2bda9f384 baseq2/players/crakhor/a_grenades.md2
     ...
     31329     8b26b6a4863b7e2c30b4dcd7867a6d10 baseq2/players/cyborg/a_grenades.md2

This means that g-d-p cannot tell which one it's looking at by just 
looking at the metadata available in the tar file: to disambiguate, it 
needs to know either the path (which we have told it to ignore), or the 
content (which requires actually unpacking).

When I run

     /path/to/run-gdp-uninstalled \
     -d. --no-search --no-compress --debug \
     quake2 input/quake2-full-data_85_all.deb

with this temporary change

--- a/game_data_packager/build.py
+++ b/game_data_packager/build.py
@@ -945,6 +945,12 @@ class PackagingTask:
                      # proceed to next entry
                      continue

+                logger.debug(
+                    'Unpacking %s from %s because it might be %s',
+                    entry.name,
+                    name,
+                    wanted.name,
+                )
                  should_provide.discard(filename)

                  if filename in self.found:

it becomes obvious what the problem is:

DEBUG:game_data_packager.build:Unpacking ./usr/share/games/quake2/baseq2/players/crakhor/a_grenades.md2 from ref/quake2-full-data_85_all.deb because it might be baseq2/players/crakhor/a_grenades.md2
extracting ./usr/share/games/quake2/baseq2/players/crakhor/a_grenades.md2 from ref/quake2-full-data_85_all.deb
DEBUG:game_data_packager.build:found baseq2/players/crakhor/a_grenades.md2 at /tmp/gdptmp.sun12x3u/tmp/baseq2/players/crakhor/a_grenades.md2
DEBUG:game_data_packager.build:... matches baseq2/players/crakhor/a_grenades.md2
DEBUG:game_data_packager.build:Unpacking ./usr/share/games/quake2/baseq2/players/crakhor/a_grenades.md2 from ref/quake2-full-data_85_all.deb because it might be baseq2/players/cyborg/a_grenades.md2

... we are unpacking the same member of the archive more than once, 
which at best is inefficient, and at worst fails because the archive 
isn't seekable.

Instead, we should unpack it once, then loop through all the files that 
it might possibly be, and check whether it does in fact match any of 
them. That's what I've pushed.

While fixing this, I noticed that if we have more than one file with 
identical content (quake2 often does this for weapon.md2 and 
shotgun.md2), we would inefficiently unpack all of them, even though the 
first one we unpacked can be used as the data source for all of the 
duplicate copies. Now we only unpack the first, and skip the others.

     smcv



More information about the Pkg-games-devel mailing list