[DRE-maint] Bug#534721: Patch

something-bz at sodium.serveirc.com something-bz at sodium.serveirc.com
Fri Jul 10 15:28:12 UTC 2009


So it turns out the main bug does not affect amd64 (or more generally, roughly systems where (LONG_MAX>>2) >= INT_MAX).

There are at least two bugs in the tag comparison:
- Assumes number equality is the same as VALUE equality.
- Assumes hash equality is the same as string equality.

The first causes the bug: Small numbers are stored as (VALUE)((n<<1)|1); numbers not representable as such are stored as a bignum. If you're fortunate enough to be using amd64, the size of a VALUE (unsigned long) is sufficiently larger than the size of a hash (int) that you can always represent it as a "fixed" number and you never see this bug.

The second bug means sometimes <tag></othertag> will appear to be valid if tag and othertag have the same hash and both hashes are "fixable". Searching for hash collisions is not that difficult - the first example should work on amd64, the second works on x86:

  $ ruby -r hpricot -e "print Hpricot.XML('<afuaf></zhgaa>')"
  <afuaf></zhgaa></afuaf>

  $ ruby -r hpricot -e "print Hpricot.XML('<aaaayi></zkbaaa>')"
  <aaaayi></aaaayi>


The patch below never bothers with the hash comparison because you have to do the string comparison anyway. If you want to re-add the hash comparison as a possible optimization (for e.g. the HTML case), the better way is to replace INT2NUM with INT2FIX instead of doing a bignum comparison.  (hpricot_scan.rl is also patched, though you can just apply the same patch file to both.)

The other two bugs are still there:
- The "BogusETag" prints as </tag>, so it still produces potentially invalid XML. It should probably print something more sensible (<tag hpricot="bogus whatever"/>?).
- Invalid XML is still accepted in XML mode, while the correct thing to do is fail.


There's another oddity in hpricot_scan.rl/hpricot_scan.c:
- The definition of H_ELE(N) for a cBogusETag does H_ELE_SET(ele, H_ELE_ATTR, rb_str_new(raw, rawlen));. I'm not sure what this is for (ele.attr should be a dict, for a start).


Anyone feel like submitting this upstream?

--- hpricot_scan.c.0	2009-07-10 13:48:13.000000000 +0100
+++ hpricot_scan.c.3	2009-07-10 14:19:16.000000000 +0100
@@ -326,11 +326,11 @@
     // (see also: the search above for fixups)
     //
     name = INT2NUM(rb_str_hash(tag));
     while (e != S->doc)
     {
-      if (H_ELE_GET(e, H_ELE_HASH) == name)
+      if (rb_str_cmp(H_ELE_GET(e, H_ELE_TAG),tag) == 0)
       {
         match = e;
         break;
       }
 
--- hpricot_scan.rl.0	2009-07-10 14:43:44.000000000 +0100
+++ hpricot_scan.rl.3	2009-07-10 14:44:07.000000000 +0100
@@ -359,11 +359,11 @@
     // (see also: the search above for fixups)
     //
     name = INT2NUM(rb_str_hash(tag));
     while (e != S->doc)
     {
-      if (H_ELE_GET(e, H_ELE_HASH) == name)
+      if (rb_str_cmp(H_ELE_GET(e, H_ELE_TAG),tag) == 0)
       {
         match = e;
         break;
       }
 






More information about the Pkg-ruby-extras-maintainers mailing list