Bug#585393: Please be more robust against bogus data in a deb822 file

John Wright jsw at debian.org
Wed Aug 4 08:12:33 UTC 2010


Hi Michael,

On Thu, Jun 10, 2010 at 10:54:39AM +0200, Michael Vogt wrote:
> Package: python-debian
> Version: 0.1.16
> Severity: normal
> 
> It appears that the deb822.Deb822.iter_paragraphs method gets confused
> if there are bogus entries (like a single line) in the file. Below is
> a test that shows the behavior. Depending on the policy the expected
> value is either 2 or 3 (2 if we want to discard invalid stanzas).

What is your use case for this?  I'm having a hard time seeing a good
way to handle bogus data consistently.  What should the parser yield
when it encounters a bogus stanza?

> It appears that the problem is "while len(x) != 0" in deb822.py, that
> will make the parser stop on the first bogus line. Attached is a
> possible patch for this that makes the EOF handling explicit. 

It should be noted that this behavior is specific to the native parser
(which is used when you specify use_apt_pkg=False or you don't have
python-apt installed).  When iter_paragraphs uses apt_pkg, it returns a
bogus Deb822 object for the bogus line.  Because of apt_pkg's TagParser
implementation, it may appear to have a key corresponding to the bogus
line, but actually trying to get the value for that key will raise
KeyError.  This is not good behavior - it breaks the map interface - but
unless we check for validity of the data (which would defeat the purpose
of using apt_pkg), I don't know how to make it better.
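
To make the map-interface problem concrete, here is a minimal sketch.
InconsistentMapping is a hypothetical stand-in, not the real apt_pkg-backed
Deb822 object: it only mimics the symptom described above (a key is
advertised, but looking it up raises KeyError).  safe_items is likewise a
hypothetical caller-side guard, not something python-debian provides.

```python
class InconsistentMapping:
    """Hypothetical stand-in: advertises keys that cannot be looked up."""

    def __init__(self, advertised):
        self._advertised = list(advertised)

    def keys(self):
        return list(self._advertised)

    def __contains__(self, key):
        return key in self._advertised

    def __getitem__(self, key):
        # Every lookup fails, even for keys reported by keys().
        raise KeyError(key)


def safe_items(mapping):
    """Return (key, value) pairs, skipping keys whose lookup fails."""
    items = []
    for key in mapping.keys():
        try:
            items.append((key, mapping[key]))
        except KeyError:
            continue
    return items


bogus = InconsistentMapping(["this line is bogus"])
print("this line is bogus" in bogus)  # the key appears to exist
print(safe_items(bogus))              # but no value can be retrieved
```

A guard like this only hides the symptom on the caller's side; it does not
make the object a well-behaved mapping.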

Interestingly, with your patch, the native parser returns an empty
Deb822 object (essentially {}).  This probably is the best behavior we
can ask for - although I think it should at least raise a warning, and
the behavior should be documented.  And...it would be really nice if we
could make the apt_pkg one do the same thing.
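
As a rough illustration of the two behaviors (stopping at the first bogus
stanza versus yielding an empty mapping with a warning), here is a
self-contained sketch.  It is not the deb822.py code and not the proposed
patch: parse_stanza, iter_stanzas and the stop_on_empty flag are made up for
this example, and multi-line (continuation) fields are deliberately not
handled.

```python
import warnings


def parse_stanza(lines):
    """Parse 'Key: value' lines into a dict; lines without a colon are ignored."""
    fields = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def iter_stanzas(text, stop_on_empty=False):
    """Yield one dict per blank-line-separated stanza."""
    for block in text.split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # extra blank lines between stanzas
        fields = parse_stanza(lines)
        if not fields:
            if stop_on_empty:
                # An empty result is mistaken for end of input, so
                # everything after the bogus stanza is lost.
                return
            warnings.warn("stanza has no valid fields: %r" % lines[0])
        yield fields


SAMPLE = (
    "Package: foo\nVersion: 1\n"
    "\n"
    "this line is bogus\n"
    "\n"
    "Package: bar\nVersion: 2\n"
)

print(len(list(iter_stanzas(SAMPLE, stop_on_empty=True))))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stanzas = list(iter_stanzas(SAMPLE))
print(len(stanzas), len(caught))
```

With stop_on_empty=True only the first stanza comes back; with the default,
all three are yielded, the middle one as an empty dict accompanied by a
warning.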

Any ideas?

-- 
John Wright <jsw at debian.org>




