python-debian, wrap-and-sort, paragraph separators and comments

Stuart Prescott stuart at debian.org
Wed Jul 30 14:50:58 UTC 2014


Dear python-debian maintainers,

There have been two on-going sore points in deb822 handling:

* maintainers accidentally have whitespace in the blank line separating 
paragraphs in their d/control files. Policy discourages this but permits it and 
dpkg handles it ok so they don't notice until wrap-and-sort deletes entire 
packages from d/control on them.

* maintainers are permitted to use comments (# at the start of a line) within 
d/control but if they do so in the middle of a multi-line field, the field is 
truncated at the comment and wrap-and-sort loses the rest of it.

(I'm highlighting wrap-and-sort here, but other parsers of the deb822 format 
like autopkgtest, piuparts and various maintainer tools like 
debian/autodeps.py will also be affected.)
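
To make the two failure modes concrete, here is a minimal sketch of a tolerant 
paragraph splitter. It is not python-debian's actual code: the helper name 
split_paragraphs and the sample control text are made up for illustration. It 
treats whitespace-only lines as paragraph separators and skips comment lines 
without ending the multi-line field they sit in.

```python
import re

# Hypothetical helper for illustration; not the python-debian implementation.
# An empty line or a line of only spaces/tabs separates paragraphs.
_SEPARATOR = re.compile(r"^\s*$")


def split_paragraphs(lines):
    """Yield one list of lines per paragraph."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Comment line: drop it, but keep the surrounding field intact.
            continue
        if _SEPARATOR.match(line):
            # Blank or whitespace-only line: close the current paragraph.
            if current:
                yield current
                current = []
            continue
        current.append(line)
    if current:
        yield current


# Sample input with a comment inside Build-Depends and a separator line
# that contains three spaces instead of being truly empty.
CONTROL = (
    "Source: foo\n"
    "Build-Depends: debhelper,\n"
    "# needed for the test suite\n"
    " python-nose\n"
    "   \n"
    "Package: foo\n"
    "Architecture: all\n"
)

paragraphs = list(split_paragraphs(CONTROL.splitlines()))
```

With this input the splitter yields two paragraphs, and the Build-Depends 
continuation line after the comment is kept. A strict parser would instead see 
one paragraph, or a field cut short at the comment.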

We have two different deb822 parsers exposed within python-debian: apt's 
TagFile, which has been used whenever possible (on real files only, not on 
other iterables), and an "internal" parser that just uses some regular 
expressions in Python. Both sets of bugs come from apt's TagFile being a fast 
but strict parser that doesn't support these extensions because it was neither 
required nor desirable for it to do so. For most uses of deb822's 
iter_paragraphs such as wrap-and-sort, the speed improvement of using TagFile 
isn't needed and the strictness that is imposed by TagFile is unhelpful.

My proposal is then:

* ensure the internal parser supports whitespace and comments as required

* switch the default parser in iter_paragraphs to the internal parser 
(changing the kwarg to be use_apt_pkg=False) and clearly document the 
situations where an application might safely choose to use TagFile 
(use_apt_pkg=True).

* provide convenience classmethods in the Packages and Sources classes 
that do default to using TagFile, since these files should be syntactically 
strict and performance is important here.
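
The shape of the last two points can be sketched roughly as below. This is an 
assumption about how the defaults might be arranged, not the attached patch: 
the classes Deb822Sketch and PackagesSketch are stand-ins, and the method only 
reports which parser would be chosen rather than parsing anything.

```python
class Deb822Sketch:
    """Hypothetical stand-in for deb822.Deb822."""

    @classmethod
    def iter_paragraphs(cls, sequence, use_apt_pkg=False):
        # Proposed default: the tolerant internal parser.
        parser = "apt_pkg.TagFile" if use_apt_pkg else "internal"
        # A real implementation would parse `sequence` here; this sketch
        # only reports the class and the parser that would be used.
        return (cls.__name__, parser)


class PackagesSketch(Deb822Sketch):
    """Hypothetical stand-in for deb822.Packages."""

    @classmethod
    def iter_paragraphs(cls, sequence, use_apt_pkg=True):
        # Packages/Sources files should be strict and are large, so the
        # convenience classmethod defaults to the fast parser.
        return super().iter_paragraphs(sequence, use_apt_pkg=use_apt_pkg)


default_choice = Deb822Sketch.iter_paragraphs([])
packages_choice = PackagesSketch.iter_paragraphs([])
override_choice = PackagesSketch.iter_paragraphs([], use_apt_pkg=False)
```

Callers can still pass use_apt_pkg explicitly in either direction; only the 
default differs between the base class and the subclasses.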

The potential downside of this is that code that uses Deb822.iter_paragraphs 
on big files and has not already explicitly chosen use_apt_pkg=True will suffer 
a performance hit (although it will still work). The only example of this in 
the archive that I can see is in apt-xapian-index:

	deb822.Deb822.iter_paragraphs(open("/var/lib/debtags/vocabulary", "r")):

(I'll cheerfully send a patch for this)

Code that is already using Packages.iter_paragraphs or Sources.iter_paragraphs 
will be unaffected (this works because iter_paragraphs is an inherited 
classmethod but I can't see any examples of it in the archive). 

On the upside, the two bugs (plus their many dupes) python-debian has 
inherited from wrap-and-sort will be fixed.

The two patches are attached or, if you'd rather see the context, 

  git clone ssh://git.nanonanonano.net/srv/git/python-debian.git -b paragraphs

(if I update the patches based on comments, I can't promise not to rewrite 
history on that branch!)


I'd very much like some feedback on these proposed changes rather than just 
ending in a consensual silence ;)

cheers
Stuart


-- 
Stuart Prescott    http://www.nanonanonano.net/   stuart at nanonanonano.net
Debian Developer   http://www.debian.org/         stuart at debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Allow-whitespace-only-lines-to-separate-paragraphs.patch
Type: text/x-patch
Size: 2569 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-python-debian-maint/attachments/20140731/c7deb842/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Switch-default-to-internal-parser-for-deb822.patch
Type: text/x-patch
Size: 10422 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-python-debian-maint/attachments/20140731/c7deb842/attachment-0001.bin>

