Bug#604093: python-debian: iter_paragraphs: be more robust against RFC822 comments in unicoded files -- swallows the trailer
Stuart Prescott
stuart at debian.org
Mon Jul 28 14:51:14 UTC 2014
Hi Yaroslav,
A while ago, you reported a bug against python-debian's handling of comments
in deb822.
In that bug, the following input was used to illustrate the problem:
> $> cat confuse.txt
> Goodone: value0
>
> ; xxx might be unicode: Ярик
>
> Entry: value
[...]
> or may be comments shouldn't be detached from paragraphs according to
> rfc822? (in any case the divergence between unicode/plain handling is
> sub-optimal)
The only place I can find where comments are defined for deb822 files is
policy §5.1. There they are defined only for debian/control files, and a
comment is started by a # at the start of the line.
The difference you observed when changing the encoding actually comes from
which parser python-debian used. When passed a real filehandle,
iter_paragraphs currently tries to use apt's TagFile parser. When the input is
parsed with an encoding, iter_paragraphs does not receive a real filehandle and
so the internal parser is used instead. The internal parser is also used if
use_apt_pkg=False is added to the call to iter_paragraphs or if the file is
first slurped in with readlines(). The difference here is therefore not really
unicode/plain handling but whether a filehandle was passed or not.
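
To make the dispatch described above concrete, here is a minimal sketch in
plain Python. It does not use python-debian or apt_pkg at all: the names
has_real_fileno and choose_parser are hypothetical, and the fileno() test is
only an assumption about what "a real filehandle" means. An in-memory stream
stands in for input that did not arrive as a real filehandle.

```python
import io
import os
import tempfile


def has_real_fileno(source):
    """Hypothetical check: is `source` backed by an OS-level file descriptor?"""
    try:
        source.fileno()
    except (AttributeError, OSError):
        # Lists have no fileno(); io.StringIO raises io.UnsupportedOperation,
        # which is a subclass of OSError.
        return False
    return True


def choose_parser(source, use_apt_pkg=True):
    """Hypothetical dispatch: name the parser that would handle `source`."""
    if use_apt_pkg and has_real_fileno(source):
        return "apt TagFile"
    return "internal"


# Write a small sample to a temporary file so a real filehandle is available.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("Goodone: value0\n\nEntry: value\n")

with open(path, encoding="utf-8") as handle:
    from_handle = choose_parser(handle)                         # real filehandle
    from_handle_opt_out = choose_parser(handle, use_apt_pkg=False)
    from_lines = choose_parser(handle.readlines())              # list of lines
from_memory = choose_parser(io.StringIO("Goodone: value0\n"))   # no descriptor
os.remove(path)

print(from_handle, from_handle_opt_out, from_lines, from_memory)
```

Only the first call selects the apt-backed parser in this sketch; opting out,
passing the result of readlines(), or passing an in-memory stream all fall
through to the internal one.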
I'm coming to the conclusion that use_apt_pkg=False should be the default for
a variety of reasons -- the remaining question is then whether the detached
and syntactically invalid comment should cause the parser to bail out or
somehow continue past it. Currently, the parser sees this as a null paragraph,
which (incorrectly?) triggers its end-of-iteration condition.
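
The null-paragraph behaviour can be illustrated with a simplified model. This
is not python-debian's actual code: read_paragraph, the FIELD pattern and both
iterators are hypothetical stand-ins, assuming only that a block with no
"Key: value" lines parses to an empty paragraph. The second iterator shows one
possible way of continuing past such a block; bailing out with an error would
be the other option.

```python
import re

# Simplified field pattern (an assumption, not the library's own regex).
FIELD = re.compile(r"^(?P<key>[^\s:]+):\s*(?P<value>.*)$")

CONFUSE = [
    "Goodone: value0\n",
    "\n",
    "; xxx might be unicode: Ярик\n",
    "\n",
    "Entry: value\n",
]


def read_paragraph(lines):
    """Read one blank-line-delimited block; return None at end of input."""
    fields = {}
    seen_content = False
    for line in lines:
        if not line.strip():
            if seen_content:
                break       # a blank line closes the current block
            continue        # skip blank lines before the block starts
        seen_content = True
        match = FIELD.match(line.rstrip("\n"))
        if match:           # lines that are not fields are silently ignored
            fields[match.group("key")] = match.group("value")
    return fields if seen_content else None


def iter_stop_on_empty(lines):
    """Model of the current behaviour: an empty paragraph ends iteration."""
    it = iter(lines)
    paragraph = read_paragraph(it)
    while paragraph:
        yield paragraph
        paragraph = read_paragraph(it)


def iter_skip_empty(lines):
    """One alternative: skip field-less blocks, stop only at end of input."""
    it = iter(lines)
    while True:
        paragraph = read_paragraph(it)
        if paragraph is None:
            return
        if paragraph:
            yield paragraph


stopped = list(iter_stop_on_empty(CONFUSE))
skipped = list(iter_skip_empty(CONFUSE))
print(len(stopped), len(skipped))
```

In this model the first iterator yields only the Goodone paragraph and never
reaches Entry, which matches the "swallows the trailer" symptom, while the
second yields both.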
cheers
Stuart
--
Stuart Prescott http://www.nanonanonano.net/ stuart at nanonanonano.net
Debian Developer http://www.debian.org/ stuart at debian.org
GPG fingerprint 90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7