Bug#604093: python-debian: iter_paragraphs: be more robust against RFC822 comments in unicoded files -- swallows the trailer

Stuart Prescott stuart at debian.org
Mon Jul 28 14:51:14 UTC 2014


Hi Yaroslav,

A while ago, you reported a bug against python-debian's handling of comments 
in deb822.

In that bug, the following input was used to illustrate the problem:

> $> cat confuse.txt 
> Goodone: value0
> 
>  ; xxx might be unicode: Ярик
> 
> Entry: value

[...]

> or may be comments shouldn't be detached from paragraphs according to
> rfc822? (in any case the divergence between unicode/plain handling is
> sub-optimal)

The only definition of comments in deb822 files that I can find is in Policy
§5.1, and there they are defined only for debian/control files: a comment is a
line that starts with a # in the first column.
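To make the distinction concrete, here is a rough sketch that classifies lines
the way Policy §5.1 describes them. The helper classify_line is hypothetical
(not python-debian code), treats whitespace-only lines as blank as a lenient
parser might, and does not validate field names. Under this reading the
" ; xxx" line in confuse.txt is not a comment at all but a continuation line
with no field before it in its paragraph.

```python
# Hypothetical helper: classify raw deb822 lines roughly as Policy §5.1
# describes them for debian/control. Not python-debian's actual code.
def classify_line(line: str, allow_comments: bool = True) -> str:
    """Return 'blank', 'comment', 'continuation' or 'field' for one line."""
    if line.strip() == "":
        return "blank"          # separates paragraphs
    if allow_comments and line.startswith("#"):
        return "comment"        # only '#' in column 0 starts a comment
    if line[0] in " \t":
        return "continuation"   # belongs to the previous field's value
    return "field"              # e.g. "Entry: value" (no syntax check here)


sample = [
    "Goodone: value0",
    "",
    " ; xxx might be unicode: Ярик",
    "",
    "Entry: value",
]
for raw in sample:
    print(f"{classify_line(raw):12} {raw!r}")
```

Note that ";" has no special meaning here; only a leading "#" would make the
line a comment, and only in debian/control.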

The difference you observed when changing the encoding actually comes from a
change in which parser python-debian uses internally: when passed a real
filehandle, iter_paragraphs currently tries to use apt's TagFile parser. When
the input is parsed with an encoding, iter_paragraphs is not given a real
filehandle, so the internal parser is used instead. The internal parser is
also used if use_apt_pkg=False is added to the call to iter_paragraphs, or if
the file is first slurped in with readlines(). So the difference here is not
really unicode vs. plain handling but whether or not a filehandle was passed.
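A minimal sketch of the selection rule described above, assuming that the
"real filehandle" test amounts to checking for a working fileno(). The names
is_real_file and choose_parser are hypothetical, and the use_apt_pkg=True
default mirrors the current behaviour described in this mail, not necessarily
any particular release.

```python
import io
import tempfile


def is_real_file(sequence) -> bool:
    """True if `sequence` is backed by an OS-level file descriptor (assumed test)."""
    try:
        sequence.fileno()
        return True
    except (AttributeError, io.UnsupportedOperation):
        # lists have no fileno(); StringIO raises UnsupportedOperation
        return False


def choose_parser(sequence, use_apt_pkg: bool = True) -> str:
    """Name the parser that would handle `sequence` under the rule above."""
    if use_apt_pkg and is_real_file(sequence):
        return "apt_pkg.TagFile"
    return "internal"


with tempfile.TemporaryFile("w+") as fh:
    print(choose_parser(fh))                     # apt_pkg.TagFile
    print(choose_parser(fh, use_apt_pkg=False))  # internal
    print(choose_parser(fh.readlines()))         # internal (a list of lines)
print(choose_parser(io.StringIO("Entry: value\n")))  # internal
```

The point of the sketch is that the encoding never enters the decision; only
the type of object handed to iter_paragraphs and the use_apt_pkg flag do.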

I'm coming to the conclusion that use_apt_pkg=False should be the default for
a variety of reasons. The remaining question is then whether the detached and
syntactically invalid comment should cause the parser to bail out or to
continue past it somehow. Currently, the parser sees it as a null (empty)
paragraph, which (incorrectly?) triggers its end-of-iteration condition, so
everything after it is silently dropped.
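The two options can be illustrated with a toy model. The functions below are
hypothetical and much simpler than python-debian's real parser: a chunk whose
only line is an orphaned continuation line parses to an empty paragraph, and
the iterator either stops there (the current behaviour described above) or
skips it.

```python
from typing import Dict, Iterable, Iterator, List


def split_paragraphs(lines: Iterable[str]) -> List[List[str]]:
    """Group raw lines into blank-line-separated chunks (kept even if invalid)."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def parse_chunk(chunk: List[str]) -> Dict[str, str]:
    """Toy field parser: continuation lines with no preceding field are dropped."""
    fields: Dict[str, str] = {}
    last = None
    for line in chunk:
        if line[0] in " \t":
            if last is not None:
                fields[last] += "\n" + line
            # else: orphaned continuation line (the detached comment) is ignored
        elif ":" in line:
            last, _, value = line.partition(":")
            fields[last] = value.strip()
    return fields


def toy_iter_paragraphs(lines: Iterable[str], skip_empty: bool) -> Iterator[Dict[str, str]]:
    for chunk in split_paragraphs(lines):
        para = parse_chunk(chunk)
        if not para:
            if skip_empty:
                continue  # carry on past the invalid chunk
            return        # treat it as end of input: the trailer is lost
        yield para


sample = ["Goodone: value0", "", " ; xxx might be unicode: Ярик", "", "Entry: value"]
print([dict(p) for p in toy_iter_paragraphs(sample, skip_empty=False)])
# [{'Goodone': 'value0'}]
print([dict(p) for p in toy_iter_paragraphs(sample, skip_empty=True)])
# [{'Goodone': 'value0'}, {'Entry': 'value'}]
```

Skipping keeps the Entry paragraph; the trade-off is that genuinely malformed
input is passed over silently rather than reported.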

cheers
Stuart


-- 
Stuart Prescott    http://www.nanonanonano.net/   stuart at nanonanonano.net
Debian Developer   http://www.debian.org/         stuart at debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7



More information about the pkg-python-debian-maint mailing list