Bug#779937: GDP shouldn't process all yaml files all the time

Simon McVittie smcv at debian.org
Fri Mar 13 00:09:12 UTC 2015


On 09/03/15 19:25, Alexandre Detiste wrote:
[smcv wrote:]
>> Perhaps if we profiled what takes all the time, we'd be able to avoid it
>> - e.g. by "compiling" the parts of the YAML files that are actually
>> needed for the argument parser into a pickle file or something, and
>> loading the rest lazily.
>
> Do you mean compiling it during the package build or by postinst
> script like the .pyc files? I guess this pickle format is
> arch-independent.

So it turns out the default YAML parser in python3-yaml uses a
pure-Python implementation. Oddly enough, that's really slow. Switching
from Loader to CLoader (which uses libyaml) speeds it up by a factor of
about 20 in a simple microbenchmark (load quake2.yaml repeatedly).
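A minimal sketch of that kind of microbenchmark follows. It is not the script actually used: the SAMPLE document is a made-up stand-in for quake2.yaml, the names FastLoader, load_with and time_loader are invented for illustration, and the timings will differ from machine to machine. CLoader only exists when PyYAML was built against libyaml, so the sketch falls back to Loader if it is absent.

```python
import timeit

import yaml

# CLoader is only present when PyYAML was built against libyaml,
# so fall back to the pure-Python Loader if it is missing.
FastLoader = getattr(yaml, "CLoader", yaml.Loader)

# Hypothetical stand-in for a real game description such as quake2.yaml.
SAMPLE = """\
longname: Example Game
aliases: [example, eg]
packages:
  example-game-data:
    install:
      - pak0.pak
      - pak1.pak
"""


def load_with(loader_cls, text=SAMPLE):
    """Parse trusted YAML text with the given loader class."""
    return yaml.load(text, Loader=loader_cls)


def time_loader(loader_cls, number=200):
    """Return the seconds taken to parse SAMPLE `number` times."""
    return timeit.timeit(lambda: load_with(loader_cls), number=number)


slow_result = load_with(yaml.Loader)
fast_result = load_with(FastLoader)
slow_seconds = time_loader(yaml.Loader)
fast_seconds = time_loader(FastLoader)
print(f"Loader: {slow_seconds:.4f}s, FastLoader: {fast_seconds:.4f}s")
```

Both loaders should produce the same Python objects; only the parsing time differs.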

Next idea: YAML is pretty complicated, JSON is much simpler while still
human-editable, maybe we can use that? That turns out to be simple to
do, and is another factor of 20 speedup. I'll push the results shortly.

I'd like to keep using YAML for the source files, because it's a lot
more pleasant to write (standard JSON doesn't even tolerate a trailing
comma after the last item of a list or the last pair in a map, which
is really really irritating) - but we can easily convert it to JSON as a
preprocessing step during 'make', which is what I've done.
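The conversion step could look roughly like the sketch below. This is an illustration under assumptions, not the code that was pushed: yaml_to_json and EXAMPLE_YAML are hypothetical names, and a real build rule would read each YAML source file and write a JSON file next to it.

```python
import json

import yaml


def yaml_to_json(yaml_text):
    """Convert one YAML document to a JSON string."""
    # safe_load only builds plain Python types. Note that it can still
    # yield values JSON cannot represent (e.g. unquoted dates), in which
    # case json.dumps raises TypeError.
    data = yaml.safe_load(yaml_text)
    # sort_keys keeps the output stable from one build to the next.
    return json.dumps(data, sort_keys=True)


# Hypothetical game description, for illustration only.
EXAMPLE_YAML = """\
longname: Example Game
aliases:
  - example
  - eg
"""

example_json = yaml_to_json(EXAMPLE_YAML)
print(example_json)
```

At run time the program then only needs json.loads on the generated file, and the YAML parser is needed only at build time.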

Another fun fact is that (according to other people's benchmarks, I
didn't try it myself) pickled data is both slower and larger than JSON.
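For anyone who wants to check that claim on their own data, a rough comparison could be sketched as follows. The DATA dictionary is invented, and the outcome depends heavily on the Python version, the pickle protocol and the shape of the data, so no particular result is claimed here.

```python
import json
import pickle
import timeit

# Hypothetical metadata, loosely shaped like a game description.
DATA = {
    "longname": "Example Game",
    "aliases": ["example", "eg"],
    "files": {f"pak{i}.pak": {"size": i * 1024} for i in range(100)},
}

json_blob = json.dumps(DATA).encode("utf-8")
pickle_blob = pickle.dumps(DATA)

# Time 1000 loads of each serialized form.
json_seconds = timeit.timeit(lambda: json.loads(json_blob), number=1000)
pickle_seconds = timeit.timeit(lambda: pickle.loads(pickle_blob), number=1000)

print(f"JSON:   {len(json_blob)} bytes, {json_seconds:.4f}s")
print(f"pickle: {len(pickle_blob)} bytes, {pickle_seconds:.4f}s")
```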

>> parts of the YAML files that are actually needed
>
> That's only the package name, aliases, longname & demo_for tags.

I think a combined factor of about 400 (20 from CLoader, times 20 from
JSON) should be enough to keep your RPi happy until we add quite a lot
more games :-)

If we need more, I suspect kicking out the md5sums etc. into separate
files (probably in plain md5sums format, or whatever), and loading them
"lazily", would get us another significant speedup. File lists could
maybe be kicked out too; I expect the rest of the metadata for games,
packages and CD tracks is small enough to not matter either way.
Combining all the remaining metadata into one big JSON file (perhaps a
map from game name to data) might also be good.
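The lazy-loading idea could be sketched like this, assuming one small JSON file per game plus a separate file in plain md5sums format. The GameData class, the file names and the digest are all hypothetical; nothing like this exists in the code yet. The sketch uses functools.cached_property (Python 3.8 or later) so that the checksum file is only read the first time it is needed.

```python
import json
import os
import tempfile
from functools import cached_property


class GameData:
    """Small metadata loaded eagerly, checksums loaded on first use."""

    def __init__(self, json_path, md5sums_path):
        with open(json_path, encoding="utf-8") as f:
            self.metadata = json.load(f)
        self._md5sums_path = md5sums_path

    @cached_property
    def md5sums(self):
        """Map from file name to MD5 hex digest, parsed on first access."""
        sums = {}
        with open(self._md5sums_path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                # md5sums format: "<hex digest>  <file name>"
                digest, name = line.split(None, 1)
                sums[name] = digest
        return sums


with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "example.json")
    sums_path = os.path.join(tmp, "example.md5sums")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"longname": "Example Game"}, f)
    with open(sums_path, "w", encoding="utf-8") as f:
        # Placeholder digest (the MD5 of empty input), not a real game file.
        f.write("d41d8cd98f00b204e9800998ecf8427e  pak0.pak\n")

    game = GameData(json_path, sums_path)
    loaded_before = "md5sums" in vars(game)  # nothing parsed yet
    pak0_digest = game.md5sums["pak0.pak"]   # triggers the parse
    loaded_after = "md5sums" in vars(game)   # now cached on the instance
```

Code paths that only need names and aliases, such as building the argument parser, would never touch the md5sums attribute and so would never pay for parsing it.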

    S


