[Piuparts-devel] get_files_owned_by_packages
Niels Thykier
niels at thykier.net
Wed May 15 06:34:00 BST 2019
Herbert Fortes:
> Hi,
>
> I did a refactor to get_files_owned_by_packages[0]. I did
> 5 versions.
>
> [0] - https://salsa.debian.org/debian/piuparts/blob/develop/piuparts.py#L1661
>
> The best version for the programmer is the one with
> pathlib and dict.setdefault:
>
> vdir = Path("var/lib/dpkg/info")
> vdict = {}
>
> for basename in vdir.glob("*.list"):
>     for line in basename.read_text().split("\n"):
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This smells a lot like it will read the whole file into memory, then
split it into lines (i.e. have the whole file in memory twice for a
short while) and then loop over it.
I suspect the old one iterated over the lines one-by-one via buffered
reads. This /might/ explain the performance difference you see.
I suspect that something like:
"""
with open(path) as fd:
    for line in fd:
        ...
"""
Will be considerably faster if /var/.../info has a non-trivial size
(though I am unsure if the "basename" variable from your example can be
passed to open)
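
A minimal sketch of the streaming variant, keeping pathlib and only
changing how each file is read. The function name and the info_dir
parameter are made up for illustration and are not the piuparts code. On
Python 3.6 and later a Path can be passed to open() directly, and
Path.open() does the same thing. I have not timed this.

```python
from collections import defaultdict
from pathlib import Path


def files_owned_by_packages(info_dir):
    """Map each path listed in a *.list file to the names of the files listing it.

    Hypothetical helper for illustration only.
    """
    owners = defaultdict(list)
    for list_file in Path(info_dir).glob("*.list"):
        # Path.open() returns an ordinary file object, so iterating over it
        # yields one line at a time through buffered reads instead of
        # loading the whole file and splitting it.
        with list_file.open() as fd:
            for line in fd:
                path = line.strip()
                if path:  # skip blank lines rather than deleting '' afterwards
                    owners[path].append(list_file.stem)
    return dict(owners)
```

Skipping empty lines inside the loop also removes the need for the
"del vdict['']" step, which would raise KeyError if no file happened to
contain an empty line.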
>         vdict.setdefault(line.strip(), []).append(basename.stem)
>
> del vdict['']
> return vdict
>
> But it costs a lot.
>
> orig: 30.34737383400352 seconds (100 loops).
> path_obj: 73.14698908800347 seconds (100 loops).
>
> To a pkg maintainer it is not a big problem, though.
>
> Would that be worth it?
>
>
>
> Regards,
> Herbert
>
> _[...]
Thanks,
~Niels