Bug#913274: Incorrectly parsing whitespace in Sources.iter_paragraphs
furlongm at gmail.com
Wed Nov 14 06:31:42 GMT 2018
> > Passing the contents does the correct thing in all other cases, so not
> > sure why it would be having an issue with this?
> TagFile only accepts filehandles, not static data:
> In deb822.py there is a function _is_real_file() and that is used so that
> python-apt's TagFile is only invoked on filehandles and not on text data,
> diverting to the in-built parser when TagFile cannot be used.
So in my case, the in-built parser is being used and it is stricter
than python-apt's parser?
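To illustrate the dispatch described in the quoted text, here is a minimal sketch of how a file-handle check could decide between the two parsers. It is not the actual deb822.py code: looks_like_real_file() and choose_parser() are hypothetical names, and the assumption is that "real file" means "has a usable fileno()".

```python
import io
import tempfile


def looks_like_real_file(f):
    """Return True if f is backed by an OS-level file descriptor.

    Hypothetical stand-in for the check described above; the real
    _is_real_file() in deb822.py may differ in detail.
    """
    try:
        f.fileno()
    except (AttributeError, io.UnsupportedOperation):
        # str/bytes have no fileno(); StringIO has one but it raises.
        return False
    return True


def choose_parser(sequence):
    """Report which parser this hypothetical dispatch would pick."""
    return "apt_pkg" if looks_like_real_file(sequence) else "built-in"


# A real file handle qualifies for the apt_pkg path.
with tempfile.TemporaryFile() as fh:
    on_handle = choose_parser(fh)

# Text that has already been read() falls through to the built-in parser.
on_text = choose_parser("Package: foo\nVersion: 1.0\n")
on_stringio = choose_parser(io.StringIO("Package: foo\n"))
```

Under that assumption, passing the result of read() means the built-in parser is always the one in use, whatever was requested.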
> BTW if you are read()ing so that you can deal with the compressed Packages.gz,
> TagFile can handle on-the-fly decompression.
> In : from debian.deb822 import Packages
> In : with open('Packages.gz') as fh:
> ...:     for p in Packages.iter_paragraphs(fh):
> ...:         if 'version' not in p:
> ...:             print(p)
> (wild guess as to why you might be doing this!)
One reason I'm doing it is for decompression, but a second reason is
to provide feedback to the user via a progress bar. To do that, the
Packages file is downloaded and decompressed, the packages are counted
via regex, and then the list is parsed.
I have tried to do this in a generic way, as the code also handles yum
and yast repos. These mostly follow the same logic: download package
list, decompress package list, count occurrences, parse.
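The decompress, count, parse flow can be sketched roughly as follows. This is a simplified illustration using only the standard library, not the actual code in question: the function names, the "^Package:" regex, and the on_progress callback are assumptions, and the download step is replaced by in-memory sample data.

```python
import gzip
import re


def count_packages(text):
    """Count paragraphs by counting 'Package:' field lines."""
    return len(re.findall(r"^Package:", text, flags=re.MULTILINE))


def parse_paragraph(block):
    """Parse one paragraph into a dict, folding continuation lines."""
    fields = {}
    key = None
    for line in block.splitlines():
        if line[:1] in (" ", "\t") and key is not None:
            # Continuation of the previous field's value.
            fields[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def parse_with_progress(compressed, on_progress):
    """Decompress, count, then parse, reporting (done, total) as it goes."""
    text = gzip.decompress(compressed).decode("utf-8")
    total = count_packages(text)
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    packages = []
    for done, block in enumerate(blocks, start=1):
        packages.append(parse_paragraph(block))
        on_progress(done, total)
    return packages


# Stand-in for a downloaded Packages.gz.
SAMPLE = "Package: foo\nVersion: 1.0\n\nPackage: bar\nVersion: 2.0\n"
compressed = gzip.compress(SAMPLE.encode("utf-8"))

progress = []
packages = parse_with_progress(
    compressed, lambda done, total: progress.append((done, total))
)
```

Counting up front is what makes the total available to the progress bar before any parsing starts, which is why the contents are read into memory rather than passed as a file handle.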
Looking at the git blame, most of that code has been working fine for
6+ years. This is the first time I've ever come across a repo with
whitespace in that section.
> I've been thinking that in cases where iter_paragraphs was called with
> use_apt_pkg=True and then apt_pkg is not used contrary to what was requested,
> iter_paragraphs should generate a warning. That risks becoming noisy in a way
> that is not desirable, but also perhaps gets us away from this ambiguous
> behaviour where the use_apt_pkg setting has been ignored.
> I wonder what the likelihood is that introducing a warning would break someone
> else's code? (It would break an autopkgtest, for instance, by writing to
If other tools/libraries are more tolerant, including python-apt,
would it make sense for python-debian to be more tolerant when using
the in-built parser? In that case, the two parser implementations
would be more consistent.
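As a rough illustration of what "more tolerant" could mean, the sketch below contrasts a strict paragraph splitter with a tolerant one. It assumes the whitespace in question is a line containing only spaces between two paragraphs; split_paragraphs() is a hypothetical helper, not python-debian's parser and not a statement of how python-apt is implemented.

```python
def split_paragraphs(text, tolerant):
    """Split control-file text into paragraphs (lists of lines).

    strict:   only a completely empty line separates paragraphs
    tolerant: a whitespace-only line separates them as well
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        is_separator = (line.strip() == "") if tolerant else (line == "")
        if is_separator:
            if current:
                paragraphs.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs


# The separator between the two paragraphs is a single space, not an
# empty line.
SAMPLE = "Package: foo\nVersion: 1.0\n \nPackage: bar\nVersion: 2.0\n"

strict = split_paragraphs(SAMPLE, tolerant=False)
lenient = split_paragraphs(SAMPLE, tolerant=True)
```

With this input the strict splitter sees one merged paragraph while the tolerant one sees two, which is the kind of divergence between parsers being asked about.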