Bug#750247: python-debian: deb822 wrong result when space in newline after paragraph
Stuart Prescott
stuart at debian.org
Tue Jun 3 14:58:21 UTC 2014
Hi apt and devscripts maintainers,
Tools parsing deb822-style documents currently disagree about where paragraphs
should be split within the document. The result is that wrap-and-sort from
devscripts currently eats control files if there are separator lines between
paragraphs that contain whitespace, while the rest of our build system accepts
whitespace-only lines as separators.
wrap-and-sort uses python-debian's deb822 module to work out what's in
debian/control so bugs about wrap-and-sort's behaviour here have ended up with
python-debian. I've sought to address this problem and attached is a patch for
python-debian that fixes it by explicitly including whitespace-only lines in
python-debian's test for blank lines between paragraphs.
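
To make the idea concrete, here is a minimal sketch of such a separator test. It is not the code from the attached patch; the names _SEPARATOR and is_separator are hypothetical and exist only for illustration.

```python
import re

# Hypothetical stand-in for the separator test; NOT the code from the patch.
# A separator is an empty line or a line made up only of spaces and tabs.
_SEPARATOR = re.compile(r"^[ \t]*$")


def is_separator(line):
    """Return True if the line should end the current paragraph."""
    # Drop the line terminator first so that "\n" and "" are treated alike.
    return _SEPARATOR.match(line.rstrip("\r\n")) is not None
```

A continuation line such as " more text" still begins with a space but contains other characters, so it is not mistaken for a separator.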
The problem with this patch is that python-debian's deb822 parser isn't
necessarily the one used by devscripts: python-debian will use python-apt's
apt_pkg.TagFile if it is available, and apt_pkg.TagFile has the same problem
with whitespace-only lines. If I were to apply this patch to python-debian,
the behaviour would differ depending on whether python-apt is installed and
whether the text being interpreted is a file or a sequence/string:
 * if python-apt is not installed or the text is not a file or the call to
   iter_paragraphs includes "use_apt_pkg=False", then the code path in
   deb822.py is used and the split is done correctly. Using the same test
   case as in #655988 where the correct answer should be '3':

   $ python -c 'import deb822; print len([p for p in
       deb822.Deb822.iter_paragraphs(open("control-wspace"), use_apt_pkg=False)])'
   3

   $ python -c 'import deb822; print len([p for p in
       deb822.Deb822.iter_paragraphs(open("control-wspace").readlines())])'
   3

 * if python-apt is installed and the text is in a file and "use_apt_pkg=False"
   is not given, then libapt will *not* split the paragraphs as desired.

   $ python -c 'import deb822; print len([p for p in
       deb822.Deb822.iter_paragraphs(open("control-wspace"))])'
   2

   :(
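
The 3-versus-2 disagreement can be reproduced without either library. The following is only a rough model of the two splitting rules, not the actual deb822.py or libapt code; count_paragraphs and the SAMPLE text are made up for illustration and are not the control-wspace file from #655988.

```python
def count_paragraphs(lines, lax):
    """Count paragraphs; with lax=True, whitespace-only lines also separate."""
    count = 0
    in_paragraph = False
    for line in lines:
        text = line.rstrip("\n")
        if lax:
            blank = text.strip(" \t") == ""
        else:
            blank = text == ""
        if blank:
            in_paragraph = False
        elif not in_paragraph:
            # First non-separator line after a separator starts a paragraph.
            in_paragraph = True
            count += 1
    return count


# Hypothetical control file: the second separator holds a space and a tab.
SAMPLE = (
    "Source: foo\n"
    "Maintainer: Example <example@example.org>\n"
    "\n"
    "Package: foo\n"
    "Architecture: any\n"
    " \t\n"
    "Package: foo-doc\n"
    "Architecture: all\n"
)

lines = SAMPLE.splitlines(True)
print(count_paragraphs(lines, lax=False))  # 2
print(count_paragraphs(lines, lax=True))   # 3
```

With the strict rule the last two paragraphs are merged into one, which is how a tool that rewrites the file can end up mangling it.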
So... we could change python-debian with the attached patch and *also* change
devscripts/control.py to pass "use_apt_pkg=False" and that would fix
wrap-and-sort.
However, making those two changes alone would mean that the use of
apt_pkg.TagFile is no longer just a performance boost for iter_paragraphs, as
advertised in its documentation; it would also change the results. I don't
think that's a particularly good idea. It would create a horrible interface
and potentially make debugging very difficult for other users of deb822.py.
python-apt maintainers: do you think it's reasonable to change apt_pkg.TagFile
(presumably by changing libapt-pkg) to split paragraphs not only on blank
lines but also on whitespace-only lines? For reference, policy §5.1 permits
such control files with pretty rubbery language:
    The paragraphs are separated by empty lines. Parsers may accept lines
    consisting solely of spaces and tabs as paragraph separators, but control
    files should use empty lines.
I tend to err on the side of the parser being lax and the generator being
strict, which makes me think that both deb822.iter_paragraphs and
apt_pkg.TagFile should split on these whitespace-only lines.
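
The "strict generator" half of that principle could look something like the sketch below: when writing a control file back out, any whitespace-only line is emitted as a truly empty line. This is an assumption about how a tool might normalise its output, not something wrap-and-sort or python-debian is known to do; normalise_separators is a hypothetical name.

```python
def normalise_separators(text):
    """Rewrite whitespace-only lines as empty lines, leaving the rest alone."""
    out = []
    for line in text.splitlines():
        # A line of only spaces and tabs becomes a proper empty separator.
        out.append("" if line.strip(" \t") == "" else line)
    return "\n".join(out) + "\n"


print(repr(normalise_separators("Source: foo\n \t\nPackage: foo\n")))
```

Continuation lines are untouched because they contain more than spaces and tabs.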
cheers
Stuart
--
Stuart Prescott http://www.nanonanonano.net/ stuart at nanonanonano.net
Debian Developer http://www.debian.org/ stuart at debian.org
GPG fingerprint 90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Allow-whitespace-only-lines-to-separate-paragraphs.patch
Type: text/x-patch
Size: 2565 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-python-debian-maint/attachments/20140604/22067267/attachment.bin>