[Piuparts-devel] get_files_owned_by_packages

Niels Thykier niels at thykier.net
Wed May 15 06:34:00 BST 2019


Herbert Fortes:
> Hi,
> 
> I did a refactor to get_files_owned_by_packages[0]. I did
> 5 versions.
> 
> [0] - https://salsa.debian.org/debian/piuparts/blob/develop/piuparts.py#L1661
> 
> The best version for the programmer is the one with
> pathlib and dict.setdefault:
> 
> vdir = Path("var/lib/dpkg/info")
> vdict = {}
> 
> for basename in vdir.glob("*.list"):
>     for line in basename.read_text().split("\n"):
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This smells a lot like it will read the whole file into memory, then
split it into lines (i.e. have the whole file in memory twice for a
short while) and then loop over it.

I suspect the old one iterated over the lines one-by-one via buffered
reads.  This /might/ explain the performance difference you see.


I suspect that something like:

"""
  with open(path) as fd:
    for line in fd:
      ...
"""

will be considerably faster if /var/.../info has a non-trivial size
(though I am unsure whether the "basename" variable from your example
can be passed to open).
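
For what it is worth, here is a minimal sketch of that idea that keeps
pathlib.  It is not tested against piuparts itself, and the function
name and the "info_dir" parameter are made up for illustration.  Since
Python 3.6 the built-in open() accepts Path objects, and Path.open()
does the same thing, so the glob result can be opened directly:

```python
from collections import defaultdict
from pathlib import Path


def files_owned_by_packages(info_dir):
    """Map each path found in a *.list file to the packages listing it.

    Hypothetical helper for illustration; not the piuparts function.
    """
    owners = defaultdict(list)
    for list_file in Path(info_dir).glob("*.list"):
        # Path.open() returns an ordinary file object.  Iterating over it
        # yields one line at a time via buffered reads, so the whole file
        # is never held in memory at once.
        with list_file.open() as fd:
            for line in fd:
                path = line.strip()
                if path:  # skip blank lines instead of deleting '' later
                    owners[path].append(list_file.stem)
    return dict(owners)
```

Skipping blank lines inside the loop also removes the need for the
"del vdict['']" step afterwards.  Whether this closes the gap to the
original timing would have to be measured.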


>         vdict.setdefault(line.strip(), []).append(basename.stem)
>         
> del vdict['']
> return vdict
> 
> But it costs a lot.
> 
> orig:     30.34737383400352 seconds (100 loops).
> path_obj: 73.14698908800347 seconds (100 loops).
> 
> To a pkg maintainer it is not a big problem, though.
> 
> Would that be worth it?
> 
> 
> 
> Regards,
> Herbert
> 
> _[...]

Thanks,
~Niels



