Bug#750247: python-debian: deb822 wrong result when space in newline after paragraph

David Kalnischkies david at kalnischkies.de
Sat Jun 7 17:41:28 UTC 2014


On Wed, Jun 04, 2014 at 12:58:21AM +1000, Stuart Prescott wrote:
> python-apt maintainers: do you think it's reasonable to change apt_pkg.TagFile
> (presumably by changing libapt-pkg) to split paragraphs not only on blank
> lines but also on whitespace-only lines? For reference, policy §5.1 permits
> such control files with pretty rubbery language:
>
>   The paragraphs are separated by empty lines. Parsers may accept lines
>   consisting solely of spaces and tabs as paragraph separators, but control
>   files should use empty lines.

(not python-, but apt "proper" ;) )

My reading is actually quite different: First sentence. Period. Parsers
may be more relaxed, but do not expect it: control files should use
empty lines.

(that is: "should" not in the sense of "as Mylord pleases", but in the
sense of "if you don't have a damn good reason to do otherwise, follow
my lead", as in "Non-conformance with … should … will generally be
considered a bug" in §1.1)


> I tend to err on the side of the parser being lax and the generator being
> strict, which makes me think that both deb822.iter_paragraphs and
> apt_pkg.TagFile should split on these whitespace-only lines.

Being lax usually costs performance. By default, the TagFile parser in
libapt deals only with machine-generated files, so it tends to be rather
strict in order not to waste time accounting for things which never
happen in practice. Looking at 'apt-cache stats' on my machine (with
arguably many sources) shows that more than half a million sections are
being parsed, so "just ignoring some spaces" could have very visible
effects for me.
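
To make the two separator rules concrete, here is a minimal sketch in
plain Python. It is not the libapt or python-debian implementation;
split_paragraphs and its relaxed flag are hypothetical names used only
for illustration, and it says nothing about the real performance cost.

```python
def split_paragraphs(text, relaxed=False):
    """Split deb822-style text into paragraphs (lists of lines).

    relaxed=False: only truly empty lines separate paragraphs.
    relaxed=True:  lines consisting solely of spaces and tabs
                   are accepted as separators as well.
    """
    paragraphs, current = [], []
    for line in text.split("\n"):
        if relaxed:
            is_separator = line.strip(" \t") == ""
        else:
            is_separator = line == ""
        if is_separator:
            # Close the running paragraph, ignoring repeated separators.
            if current:
                paragraphs.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs


# Note the single space on the "empty" line between the two stanzas.
sample = "Package: foo\nVersion: 1\n \nPackage: bar\nVersion: 2\n"
strict = split_paragraphs(sample)
lax = split_paragraphs(sample, relaxed=True)
```

With the strict rule the whitespace-only line stays part of the
paragraph, so both stanzas end up in one section; with the relaxed rule
they are split in two.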

There is an exception to this: the preferences file(s), which have
a deb822 format as well. pkgTagSection is therefore subclassed here
to allow commented lines (that is, lines starting with #) before and
after each section (comments in between just work "by accident", now
that I look at the code – again, as I wrote that ~4 years ago as one of
my first patches…).


I tell you this because comments are allowed in a control file by
policy, so using pristine pkgTagFile here will have interesting effects
(like multiline fields being split by comments). A non-empty
should-be-empty line might be the smallest of your problems…
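
As a rough illustration of the comment problem, here is a toy field
parser in plain Python. It is an assumption-laden sketch and does not
reproduce what pkgTagFile actually does with such input; parse_fields
and strip_comment_lines are hypothetical helpers, not part of apt or
python-debian.

```python
def strip_comment_lines(text):
    """Drop every line that starts with '#'."""
    return "\n".join(
        line for line in text.split("\n") if not line.startswith("#")
    )


def parse_fields(text):
    """Toy parser for one paragraph; it knows nothing about comments.

    A line starting with a space or tab continues the previous field,
    anything else is treated as "Name: value".
    """
    fields = {}
    last = None
    for line in text.split("\n"):
        if not line:
            continue
        if line[0] in " \t" and last is not None:
            fields[last] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            last = key.strip()
            fields[last] = value.strip()
    return fields


control = (
    "Source: foo\n"
    "Build-Depends: debhelper,\n"
    "# needed for tests\n"
    " python3\n"
)
naive = parse_fields(control)
cleaned = parse_fields(strip_comment_lines(control))
```

In the naive result the comment line is taken for a field of its own and
swallows the continuation line, so Build-Depends loses python3. Removing
the comment lines first keeps the multiline field intact.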


We will have to work on that for the preferences file anyway, and
reading control files ourselves isn't unheard of either, but I assume
that isn't going to be provided by stock pkgTagFile. I could imagine
a pkgTagSection which can be told how relaxed it should be (or which at
least allows more plugin code than at the moment), but that code doesn't
exist yet and I have no idea what the Python layer on top of that looks
like…


Best regards

David Kalnischkies