[DRE-maint] Bug#534721: closed by Stefano Zacchiroli <zack at debian.org> (Re: Bug#534721: libhpricot-ruby1.8: Hpricot's XML parser fails to parse simple, valid XML)

T Chan something-bz at sodium.serveirc.com
Wed Feb 24 03:33:29 UTC 2010


The bug in Message #25 is still present in 0.8.1-1, although it is less severe (the change is INT2NUM -> INT2FIX, which eliminates false negatives but not false positives).

Compare the colliding-hash case:

$ ruby -r hpricot -e "print Hpricot.XML('<afuaf></zhgaa>')"
<afuaf></afuaf>
$ ruby -r hpricot -e "print Hpricot.XML('<aaaayi></zkbaaa>')"
<aaaayi></aaaayi>

With the non-colliding case:

$ ruby -r hpricot -e "print Hpricot.XML('<afuaf></a>')"
<afuaf></a></afuaf>
$ ruby -r hpricot -e "print Hpricot.XML('<aaaayi></a>')"
<aaaayi></a></aaaayi>

(Tested on x86, but it should also be reproducible on amd64. Additionally, the INT2NUM -> INT2FIX change introduces further collisions on x86, since it drops the high bit of the hash.)
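To illustrate the high-bit point, here is a rough model in plain Ruby of what INT2FIX does on a 32-bit build: the value is shifted left by one and tagged with the Fixnum flag bit, so bit 31 of the input falls off the end. The helper name int2fix_32 and the sample hash values are made up for the illustration; this is a sketch of the arithmetic, not code taken from Hpricot or from ruby.h.

```ruby
# Model of INT2FIX on a 32-bit VALUE (an assumption for illustration):
# shift left by one, set the Fixnum flag bit, truncate to 32 bits.
WORD_MASK = 0xFFFF_FFFF

def int2fix_32(i)
  ((i << 1) | 1) & WORD_MASK
end

# Two hypothetical hash values that differ only in the high bit.
h1 = 0x1234_5678
h2 = h1 | 0x8000_0000

# Both map to the same tagged value, so they compare as equal.
puts int2fix_32(h1) == int2fix_32(h2)
```

Any two tag names whose hashes differ only in bit 31 would therefore be treated as the same tag under this model.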

It's unclear *why* the string-hashing even exists, since it's unlikely to offer a significant performance benefit (and rb_str_hash(tag) appears to be recalculated in several places anyway).

Additionally, it still accepts invalid XML and outputs even worse XML. The correct thing to do is to return an error. "Garbage in, garbage out" applies to well-formed programs; it's not an excuse for a compiler to not implement errors.
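As a sketch of the behaviour being asked for, the following compares tag names directly (no hashing) and raises on a mismatch instead of emitting repaired output. The names check_tags and MismatchedTagError are hypothetical, and the regex only handles bare open, close, and self-closing tags without attributes; it is nowhere near a full XML parser and is not how Hpricot is implemented.

```ruby
# Hypothetical error type for the illustration.
class MismatchedTagError < StandardError; end

# Walk the tags in order, keeping a stack of open element names.
# A close tag must match the most recently opened name exactly.
def check_tags(xml)
  stack = []
  xml.scan(%r{<(/?)([A-Za-z_][\w.\-]*)\s*(/?)>}) do |closing, name, self_closing|
    if closing == "/"
      expected = stack.pop
      unless expected == name
        raise MismatchedTagError, "</#{name}> does not match <#{expected}>"
      end
    elsif self_closing != "/"
      stack.push(name)
    end
  end
  raise MismatchedTagError, "unclosed <#{stack.last}>" unless stack.empty?
  true
end
```

With this, check_tags('<afuaf></afuaf>') returns true, while check_tags('<afuaf></zhgaa>') and check_tags('<afuaf></a>') both raise MismatchedTagError, regardless of what the two names hash to.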

-TC





