Bug#913274: Incorrectly parsing whitespace in Sources.iter_paragraphs

Stuart Prescott stuart at debian.org
Mon Dec 31 08:44:17 GMT 2018


Hi Marcus,

> So in my case, the in-built parser is being used and it is stricter
> than python-apt's parser?

That is correct.

> > BTW if you are read()ing so that you can deal with the compressed
> > Packages.gz, TagFile can handle on-the-fly decompression.
> > 
> > In [1]: from debian.deb822 import Packages
> > 
> > In [2]: with open('Packages.gz') as fh:
> >    ...:     for p in Packages.iter_paragraphs(fh):
> >    ...:         if 'version' not in p:
> >    ...:             print(p)
> > 
> > (wild guess as to why you might be doing this!)
> 
> One reason I'm doing it is for decompression, but a second reason is
> to provide feedback to the user via a progress bar. To do that, the
> Packages is downloaded, decompressed, packages counted via regex, then
> parsed.
> 
> I have tried to do this in a generic way, as the code also handles yum
> and yast repos. These mostly follow the same logic; download package
> list, decompress package list, count occurrences, parse.

I guess you could peek at Packages.gz first to count the paragraphs and then
pass an open file handle for the still-compressed Packages.gz to
iter_paragraphs. That way the parser gets a real filehandle, so you also get
the significant speed benefit of the apt_pkg parser over the internal one.
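
A rough sketch of that two-pass approach, in case it helps. The names
count_stanzas and parse_with_progress and the report callback are made up for
illustration. It assumes every stanza has exactly one "Package:" field at the
start of a line, and that python-debian is importable when the second function
is called:

```python
import gzip


def count_stanzas(path):
    """Count stanzas in a gzip-compressed Packages file.

    Assumption: each stanza contains exactly one line starting with "Package:".
    """
    count = 0
    with gzip.open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b"Package:"):
                count += 1
    return count


def parse_with_progress(path, report):
    """Yield each stanza of path, calling report(done, total) along the way."""
    # Imported lazily so count_stanzas() works without python-debian installed.
    from debian.deb822 import Packages

    total = count_stanzas(path)
    # Hand over the real, still-compressed file so apt_pkg can be used
    # and can do the decompression itself.
    with open(path, "rb") as fh:
        for done, para in enumerate(Packages.iter_paragraphs(fh), 1):
            report(done, total)
            yield para
```

The cost is that the file is decompressed twice, once for counting and once
for parsing.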

> > I've been thinking that in cases where iter_paragraphs was called with
> > use_apt_pkg=True and then apt_pkg is not used contrary to what was
> > requested, iter_paragraphs should generate a warning. That risks becoming
> > noisy in a way that is not desirable, but also perhaps gets us away from
> > this ambiguous behaviour where the use_apt_pkg setting has been ignored.
> > 
> > I wonder what the likelihood is that introducing a warning would break
> > someone else's code? (It would break an autopkgtest, for instance, by
> > writing to stderr)

To this end, I've just prepared:

https://salsa.debian.org/python-debian-team/python-debian/merge_requests/8

It should, in your case, at least make it visible that you have requested 
apt_pkg but that request was not/could not be honoured.
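
To illustrate the idea (this is not the code in the merge request, just a
minimal sketch with a hypothetical choose_parser helper), the fallback could
warn through the standard warnings module. Callers then decide whether to turn
the warning into an error or to silence it:

```python
import warnings


def choose_parser(use_apt_pkg, have_apt_pkg, is_real_file):
    """Return "apt_pkg" or "internal"; warn if apt_pkg was requested but unusable.

    Hypothetical helper for illustration only, not python-debian's API.
    """
    if use_apt_pkg and have_apt_pkg and is_real_file:
        return "apt_pkg"
    if use_apt_pkg:
        if not have_apt_pkg:
            reason = "python-apt is not available"
        else:
            reason = "the input is not a real file"
        warnings.warn(
            "apt_pkg was requested but %s; using the internal parser" % reason,
            UserWarning,
            stacklevel=2,
        )
    return "internal"


# A caller that prefers a hard failure can promote the warning to an error...
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        choose_parser(True, have_apt_pkg=False, is_real_file=True)
    except UserWarning as exc:
        print("refused:", exc)

# ...and a caller that knowingly accepts the fallback can silence it.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    parser = choose_parser(True, have_apt_pkg=True, is_real_file=False)
```

By default such a warning is written to stderr, which is the autopkgtest
concern mentioned above.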

> If other tools/libraries are more tolerant, including python-apt,
> would it make sense for python-debian to be more tolerant when using
> the in-built parser? In that case, the two parser implementations
> would be more consistent.

The problem is that iter_paragraphs is used in situations where that construct 
(a whitespace-only line) should be a paragraph separator, such as in 
debian/control.

https://bugs.debian.org/715558   (and many duplicates)

Perhaps the internal parser needs a 'strict'ness parameter that controls this 
behaviour. I'll look at that next.
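
Roughly what I have in mind, as a toy splitter rather than the actual deb822
code. The function name and the whitespace_separates flag are placeholders,
and which setting would count as "strict" is still open:

```python
def split_paragraphs(lines, whitespace_separates=True):
    """Yield paragraphs (lists of lines) from an iterable of text lines.

    An empty line always ends a paragraph. A whitespace-only line ends one
    only if whitespace_separates is true; otherwise it stays in the paragraph.
    """
    current = []
    for raw in lines:
        line = raw.rstrip("\n")
        is_separator = line == "" or (
            whitespace_separates and line.strip() == ""
        )
        if is_separator:
            # Runs of consecutive separators never produce empty paragraphs.
            if current:
                yield current
                current = []
        else:
            current.append(line)
    if current:
        yield current


sample = ["Package: a\n", " \n", "Package: b\n", "\n", "Package: c\n"]
as_separator = list(split_paragraphs(sample, whitespace_separates=True))
as_content = list(split_paragraphs(sample, whitespace_separates=False))
```

With the flag set, the sample splits into three paragraphs; with it cleared,
the " " line is kept and "a" and "b" end up in the same paragraph.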

cheers
Stuart

-- 
Stuart Prescott    http://www.nanonanonano.net/   stuart at nanonanonano.net
Debian Developer   http://www.debian.org/         stuart at debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7



More information about the pkg-python-debian-maint mailing list