Bug#786718: libmpg123: incorrect check/decoding for utf-16 surrogates in id3 parser

Thomas Orgis thomas-forum at orgis.org
Mon May 25 10:08:36 UTC 2015


Hi Yuriy!

On Sun, 24 May 2015 23:08:12 +0300,
"Yuriy M. Kaminskiy" <yumkam at gmail.com> wrote:

> utf-16 decoder in id3 parser improperly identifies surrogate pairs, 
> resulting in improper identification of characters in 0xf800-0xfffe 
> range as leading surrogate and decoding failure.
> 
> E.g. attempt to decode title "「x」~y~" results in:
> [id3.c:1065] error: Invalid UTF16 surrogate pair at 0 (0xff62).
> and empty parsed title.

Could you please send me (mpg123 upstream maintainer) a small example
file (or a piece of one) to add as a regression test for this? As ID3 tag
writers also have a history of messing up encoding, I'd like to use the
original and not a fake I made myself ;-)

Regarding the patch: Oh, yes, I see that stupid me did not get the
proper idea about bit masks in this case back in 2006/2007.

Let's recap to be on the safe side:

high surrogate range: 0xD800 to 0xDBFF
 low surrogate range: 0xDC00 to 0xDFFF

Do we agree on that or is my knowledge of UTF-16 outdated?

I sense that the mask 0xf800 doesn't cover the first range properly,
either: it tests only the top five bits, and those are identical for
both surrogate ranges. We need to detect bit sequences between

0b1101100000000000
0b1101101111111111

We don't want to catch

0b110111xxxxxxxxxx

in there. So a proper mask should be

0b1111110000000000

which is 0xfc00 in hex. Verifying the low surrogate range:

0b1101110000000000
0b1101111111111111

The mask

0b1111110000000000

seems appropriate here, too. How convenient. This smells of intelligent
design, doesn't it? ;-) So 0xfc00 should be used both for low and high
surrogates to properly tell them apart with the additional bit.
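
To make the reasoning above concrete, here is a minimal sketch in C of
checks built on the 0xfc00 mask. The function names are hypothetical and
not taken from id3.c; the attached patch remains the authoritative
change. The combining step is the standard UTF-16 formula, included only
to show what the two checks feed into.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names for illustration; not mpg123's actual code. */

/* High (leading) surrogate: 0xD800..0xDBFF, top six bits 110110. */
int is_high_surrogate(uint16_t c)
{
	return (c & 0xfc00) == 0xd800;
}

/* Low (trailing) surrogate: 0xDC00..0xDFFF, top six bits 110111. */
int is_low_surrogate(uint16_t c)
{
	return (c & 0xfc00) == 0xdc00;
}

/* Combine an already validated pair into a code point (0x10000..0x10FFFF). */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
	return 0x10000 + (((uint32_t)(high & 0x3ff) << 10) | (uint32_t)(low & 0x3ff));
}

/* Self-check: range boundaries plus the 0xff62 value from the report. */
void check_surrogate_masks(void)
{
	assert(is_high_surrogate(0xd800) && is_high_surrogate(0xdbff));
	assert(is_low_surrogate(0xdc00) && is_low_surrogate(0xdfff));
	assert(!is_high_surrogate(0xdc00) && !is_low_surrogate(0xdbff));
	assert(!is_high_surrogate(0xff62) && !is_low_surrogate(0xff62));
}
```

With these checks, 0xff62 (the first character of the reported title) is
classified as an ordinary code unit rather than as a surrogate.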

I'm attaching a revised patch that should enter mpg123 trunk shortly.

Feel free to yell and show the error in my current reasoning …


Alrighty then,

Thomas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpg123-utf16-surrogate.patch
Type: text/x-patch
Size: 1031 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-multimedia-maintainers/attachments/20150525/945027f4/attachment.bin>

