[Python-modules-team] Bug#663189: buffer overflow in python-pyfribidi

Fri Mar 9 11:49:11 UTC 2012

Jakub Wilk <jwilk at debian.org> writes:

>>The reason is the following (see
>>https://github.com/pediapress/pyfribidi/issues/2):
>>
>> fribidi_utf8_to_unicode consumes at most 3 bytes for a single
>> unicode character, i.e. it does not handle unicode character above
>> 0xffff.
>
> As far as I can see this is not true. In Debian, we allocate 4 bytes
> per characters. (An upstream version, which the Debian package is
> based on, is completely broken in this respect: it allocates a buffer
> of static size. See bug #570068)

Upstream is pretty much dead in this case. I've published our version on
PyPI. However, I didn't ask or inform the original authors about that.

>
>> For a 4 byte utf-8 sequence it will generate 2 unicode characters,
>> which overflows the logical buffer.
>
> I'm confused. What is "it" in your sentence? Why 2 Unicode characters?

"it" refers to the 4 byte utf-8 sequence.

Here's the inner loop of "fribidi_utf8_to_unicode" from
fribidi-char-sets-utf8.c:

,----
|   length = 0;
|   while ((FriBidiStrIndex) (s - t) < len)
|     {
|       register unsigned char ch = *s;
|       if (ch <= 0x7f)		/* one byte */
| 	{
| 	  *us++ = *s++;
| 	}
|       else if (ch <= 0xdf)	/* 2 byte */
| 	{
| 	  *us++ = ((*s & 0x1f) << 6) + (*(s + 1) & 0x3f);
| 	  s += 2;
| 	}
|       else			/* 3 byte */
| 	{
| 	  *us++ =
| 	    ((int) (*s & 0x0f) << 12) +
| 	    ((*(s + 1) & 0x3f) << 6) + (*(s + 2) & 0x3f);
| 	  s += 3;
| 	}
|       length++;
|     }
`----

Assume you have a 4-byte UTF-8 sequence. One loop step consumes at most
3 bytes of that sequence (there's no "4 byte" case), leaving 1 byte
for further processing. On the next step, this leftover byte (a
continuation byte, so it lands in the "2 byte" branch) generates
another Unicode character. pyfribidi uses the length of the Python
unicode string as the buffer size, which is less than the number of
characters fribidi_utf8_to_unicode generates. And there you have your
buffer overflow.
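
The miscount can be reproduced outside of FriBidi with a small
stand-alone sketch. It copies the branch structure of the loop quoted
above onto plain C types; the function names utf8_to_unicode_3byte and
four_byte_demo, the use of uint32_t for the output and int for the
length are assumptions made for illustration, not FriBidi's own
declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the quoted loop. Name and types are hypothetical; only the
 * branch structure is taken from the quoted code. */
int utf8_to_unicode_3byte(const unsigned char *s, int len, uint32_t *us)
{
    const unsigned char *t = s;   /* start of input, as in the quoted loop */
    int length = 0;

    assert(s != NULL && us != NULL);
    while ((int) (s - t) < len) {
        unsigned char ch = *s;
        if (ch <= 0x7f) {         /* one byte */
            *us++ = *s++;
        } else if (ch <= 0xdf) {  /* 2 byte; a stray continuation byte also lands here */
            *us++ = ((*s & 0x1f) << 6) + (*(s + 1) & 0x3f);
            s += 2;
        } else {                  /* 3 byte; a 4-byte lead such as 0xf0 also lands here */
            *us++ = ((int) (*s & 0x0f) << 12) +
                    ((*(s + 1) & 0x3f) << 6) + (*(s + 2) & 0x3f);
            s += 3;
        }
        length++;
    }
    return length;
}

/* U+1F600 encoded as UTF-8 is the 4 bytes F0 9F 98 80. The trailing 0x00
 * keeps the read of s[1] on the second pass inside the array. */
int four_byte_demo(void)
{
    static const unsigned char seq[] = { 0xf0, 0x9f, 0x98, 0x80, 0x00 };
    uint32_t out[4] = { 0 };      /* deliberately larger than one slot */
    return utf8_to_unicode_3byte(seq, 4, out);
}
```

With this input four_byte_demo() returns 2: the first pass takes the
"3 byte" branch and the second pass treats the leftover 0x80 as the
start of a 2-byte sequence. A buffer with only one slot for this
character would be written past its end.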

To confirm the issue, you can add an assert that checks that
fribidi_utf8_to_unicode's return value (the length of the string) equals
unicode_length.
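
A rough sketch of such a check, written so that it can be tried without
pyfribidi: the expected length is recomputed from the UTF-8 bytes by
counting every byte that is not a continuation byte, which is only an
assumption standing in for unicode_length. The helper names
utf8_code_points and converted_length_ok are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in well-formed UTF-8: every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new code point. */
int utf8_code_points(const unsigned char *s, int len)
{
    int count = 0;

    assert(s != NULL);
    for (int i = 0; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            count++;
    }
    return count;
}

/* Non-aborting form of the suggested assert: returns 1 when the length
 * reported by the converter equals the expected number of code points,
 * 0 otherwise. */
int converted_length_ok(int converted_length, const unsigned char *s, int len)
{
    return converted_length == utf8_code_points(s, len);
}
```

For the bytes F0 9F 98 80 the expected count is 1, so a converter that
reports 2 characters for them, as described above, fails the check.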

>
> Anyway I tried to double the buffer size (8 bytes per characters of
> original string) but this didn't fix the crash. So likely the problem
> lies somewhere else.

I'm pretty sure my analysis is correct, and I'm not quite sure what
you did here.

-- 
Cheers
Ralf