Bug#585393: Please be more robust against bogus data in a deb822 file
John Wright
jsw at debian.org
Wed Aug 4 08:12:33 UTC 2010
Hi Michael,
On Thu, Jun 10, 2010 at 10:54:39AM +0200, Michael Vogt wrote:
> Package: python-debian
> Version: 0.1.16
> Severity: normal
>
> It appears that the deb822.Deb822.iter_paragraphs method gets confused
> if there are bogus entries (like a single line) in the file. Below is
> a test that shows the behavior. Depending on the policy, the expected
> value is either 2 or 3 (2 if we want to discard invalid stanzas).
What is your use case for this? I'm having a hard time seeing a good
way to handle bogus data consistently. What should the parser yield
when it encounters a bogus stanza?
> It appears that the problem is "while len(x) != 0" in deb822.py, that
> will make the parser stop on the first bogus line. Attached is a
> possible patch for this that makes the EOF handling explicit.
It should be noted that this behavior is specific to the native parser
(which is used when you specify use_apt_pkg=False or you don't have
python-apt installed). When iter_paragraphs uses apt_pkg, it returns a
bogus Deb822 object for the bogus line. Because of apt_pkg's TagParser
implementation, it may appear to have a key corresponding to the bogus
line, but actually trying to get the value for that key will raise
KeyError. This is not good behavior - it breaks the map interface - but
unless we check for validity of the data (which would defeat the purpose
of using apt_pkg), I don't know how to make it better.
Interestingly, with your patch, the native parser returns an empty
Deb822 object (essentially {}). This probably is the best behavior we
can ask for - although I think it should at least raise a warning, and
the behavior should be documented. And...it would be really nice if we
could make the apt_pkg one do the same thing.
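To make that concrete, here is a rough sketch of the behavior I have in mind. It is not the code in deb822.py: iter_stanzas is a made-up name, the parsing is deliberately simplified, and it yields plain dicts rather than Deb822 objects. The only point is that a bogus stanza produces a warning plus an empty mapping, and parsing then continues with the next stanza.

```python
import warnings


def iter_stanzas(lines):
    """Yield one dict per blank-line-separated stanza (hypothetical helper).

    A stanza containing a line that is neither "Key: value" nor a
    continuation line triggers a warning and is yielded as {}.
    """
    stanza, bogus, seen, last_key = {}, False, False, None
    for raw in list(lines) + [""]:  # the trailing "" flushes the last stanza
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current stanza, if one was started.
            if seen:
                if bogus:
                    warnings.warn("bogus stanza replaced by empty mapping")
                    yield {}
                else:
                    yield stanza
            stanza, bogus, seen, last_key = {}, False, False, None
            continue
        seen = True
        if line[0] in " \t" and last_key is not None:
            # Continuation line: append to the previous field's value.
            stanza[last_key] += "\n" + line
        elif ":" in line and not line[0].isspace():
            key, _, value = line.partition(":")
            last_key = key
            stanza[key] = value.strip()
        else:
            bogus = True


data = "Package: foo\nVersion: 1\n\nbogus line\n\nPackage: bar\n".splitlines()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = list(iter_stanzas(data))
```

With this input, result has three entries, the middle one empty, and exactly one warning is recorded. A caller that wants the "discard invalid stanzas" policy can simply skip the empty mappings.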
Any ideas?
--
John Wright <jsw at debian.org>