Bug#913274: Incorrectly parsing whitespace in Sources.iter_paragraphs

Marcus Furlong furlongm at gmail.com
Wed Nov 14 06:31:42 GMT 2018


> > Passing the contents does the correct thing in all other cases, so not
> > sure why it would be having an issue with this?
>
> Ahah!
>
> TagFile only accepts filehandles, not static data:
>
> https://salsa.debian.org/apt-team/python-apt/blob/master/python/tag.cc#L750
>
> In deb822.py there is a function _is_real_file() and that is used so that
> python-apt's TagFile is only invoked on filehandles and not on text data,
> diverting to the in-built parser when TagFile cannot be used.

So in my case, the in-built parser is being used and it is stricter
than python-apt's parser?

> BTW if you are read()ing so that you can deal with the compressed Packages.gz,
> TagFile can handle on-the-fly decompression.
>
> In [1]: from debian.deb822 import Packages
>
> In [2]: with open('Packages.gz') as fh:
>    ...:     for p in Packages.iter_paragraphs(fh):
>    ...:         if 'version' not in p:
>    ...:             print(p)
>
> (wild guess as to why you might be doing this!)

One reason I'm doing it is for decompression, but a second reason is
to provide feedback to the user via a progress bar. To do that, the
Packages file is downloaded and decompressed, the packages in it are
counted with a regex, and only then is it parsed.

I have tried to do this in a generic way, as the code also handles yum
and yast repos. These mostly follow the same logic: download the
package list, decompress it, count occurrences, parse.

   https://github.com/furlongm/patchman/blob/master/patchman/repos/utils.py#L296-L334
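A minimal sketch of that decompress / count / parse flow, for
illustration only. It is not the patchman code linked above: the names
count_packages and parse_with_progress are hypothetical, and the small
dict-based parser is a stand-in for the real deb822 parsing step.

```python
import gzip
import re


def count_packages(text):
    """Count stanzas by counting lines that begin with 'Package: '."""
    return len(re.findall(r'^Package: ', text, flags=re.MULTILINE))


def parse_with_progress(compressed, report=None):
    """Decompress, count, then parse; call report(done, total) per stanza.

    Hypothetical helper: the parsing here is a stand-in that splits on
    empty lines only and ignores continuation lines.
    """
    text = gzip.decompress(compressed).decode('utf-8')
    total = count_packages(text)  # known up front, so a progress bar can be sized
    stanzas = [s for s in text.split('\n\n') if s.strip()]
    packages = []
    for done, stanza in enumerate(stanzas, start=1):
        fields = dict(
            line.split(': ', 1)
            for line in stanza.splitlines()
            if ': ' in line and not line[0].isspace()
        )
        packages.append(fields)
        if report is not None:
            report(done, total)
    return packages
```

The count has to happen before parsing so the total is known when the
first progress update is sent, which is why the whole file is read into
memory rather than handed over as a filehandle.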

Looking at the git blame, most of that code has been working fine for
6+ years. This is the first time I've ever come across a repo with
whitespace in that section.

> I've been thinking that in cases where iter_paragraphs was called with
> use_apt_pkg=True and then apt_pkg is not used contrary to what was requested,
> iter_paragraphs should generate a warning. That risks becoming noisy in a way
> that is not desirable, but also perhaps gets us away from this ambiguous
> behaviour where the use_apt_pkg setting has been ignored.
>
> I wonder what the likelihood is that introducing a warning would break someone
> else's code? (It would break an autopkgtest, for instance, by writing to
> stderr)
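For illustration, the warning proposed above might look roughly like the
sketch below. This is not python-debian's code: has_real_fd and
pick_parser are hypothetical names, and the fileno() check is only an
assumption about what a test like _is_real_file() needs to establish.

```python
import io
import warnings


def has_real_fd(source):
    """Return True if source exposes a usable file descriptor (assumed check)."""
    try:
        source.fileno()
        return True
    except (AttributeError, io.UnsupportedOperation):
        # Plain strings have no fileno(); in-memory streams raise
        # io.UnsupportedOperation when it is called.
        return False


def pick_parser(source, use_apt_pkg=True):
    """Hypothetical helper: say which parser would be used, warning on fallback."""
    if use_apt_pkg and not has_real_fd(source):
        warnings.warn(
            "use_apt_pkg=True was requested but the input is not a real "
            "file; falling back to the built-in parser",
            RuntimeWarning,
            stacklevel=2,
        )
        return 'builtin'
    return 'apt_pkg' if use_apt_pkg else 'builtin'
```

By default warnings.warn() writes to stderr, which is the autopkgtest
concern raised above; callers who know they pass text could silence it
with warnings.simplefilter() or by passing use_apt_pkg=False.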

If other tools/libraries are more tolerant, including python-apt,
would it make sense for python-debian to be more tolerant when using
the in-built parser? In that case, the two parser implementations
would be more consistent.
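To make the question concrete, here is a small sketch of what "more
tolerant" could mean, assuming the difference is whether a line holding
only spaces or tabs counts as a paragraph separator. split_paragraphs is
a hypothetical function, not the python-debian implementation.

```python
import re


def split_paragraphs(text, tolerant=True):
    """Split control-file text into paragraphs.

    tolerant=True:  empty lines and whitespace-only lines both separate
                    paragraphs.
    tolerant=False: only truly empty lines separate paragraphs.
    """
    separator = r'\n(?:[ \t]*\n)+' if tolerant else r'\n\n+'
    return [p for p in re.split(separator, text) if p.strip()]


# The separator line below contains a single space rather than being empty.
sample = "Package: a\nVersion: 1\n \nPackage: b\nVersion: 2\n"
print(len(split_paragraphs(sample, tolerant=True)))
print(len(split_paragraphs(sample, tolerant=False)))
```

With the strict setting, the two stanzas in the sample are not split
apart, which is the kind of mismatch between parsers I'm describing.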

--
Marcus Furlong


