Bug#786718: libmpg123: incorrect check/decoding for utf-16 surrogates in id3 parser
Thomas Orgis
thomas-forum at orgis.org
Mon May 25 10:08:36 UTC 2015
Hi Yuriy!
On Sun, 24 May 2015 23:08:12 +0300,
"Yuriy M. Kaminskiy" <yumkam at gmail.com> wrote:
> utf-16 decoder in id3 parser improperly identifies surrogate pairs,
> resulting in improper identification of characters in 0xf800-0xfffe
> range as leading surrogate and decoding failure.
>
> E.g. attempt to decode title "「x」~y~" results in:
> [id3.c:1065] error: Invalid UTF16 surrogate pair at 0 (0xff62).
> and empty parsed title.
Could you please send me (mpg123 upstream maintainer) a small (piece
of an) example file to add as a regression test for this? As ID3 tag
writers also have a history of messing up encoding, I'd like to use the
original and not a fake I made myself ;-)
Regarding the patch: Oh, yes, I see stupid me not getting the proper
idea about bit masks back in 2006/2007 in this case.
Let's recap to be on the safe side:
high surrogate range: 0xD800 to 0xDBFF
low surrogate range: 0xDC00 to 0xDFFF
Do we agree on that or is my knowledge of UTF-16 outdated?
I sense that the mask 0xf800 doesn't cover the first range properly,
either. We need to detect bit sequences between
0b1101100000000000
and
0b1101101111111111
We don't want to catch
0b110111xxxxxxxxxx
in there. So a proper mask should be
0b1111110000000000
which is 0xfc00 in hex, too. Verifying the low surrogate range:
0b1101110000000000
0b1101111111111111
The mask
0b1111110000000000
seems appropriate here, too. How convenient. This smells of intelligent
design, doesn't it? ;-) So 0xfc00 should be used both for low and high
surrogates to properly tell them apart with the additional bit.
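To spell the reasoning above out in code, here is a minimal sketch. It
is not the attached patch, and the function names are made up for
illustration (they do not come from id3.c). It assumes the 16-bit code
units have already been read in the right byte order. The pair decoding
at the end is the standard UTF-16 formula.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only (not from id3.c). */

/* High (leading) surrogate: 0xD800..0xDBFF, top six bits are 110110. */
static bool is_high_surrogate(uint16_t unit)
{
	return (unit & 0xfc00) == 0xd800;
}

/* Low (trailing) surrogate: 0xDC00..0xDFFF, top six bits are 110111. */
static bool is_low_surrogate(uint16_t unit)
{
	return (unit & 0xfc00) == 0xdc00;
}

/* Combine a valid high/low pair into a code point in 0x10000..0x10FFFF.
 * Each surrogate carries 10 payload bits in its lower part. */
static uint32_t decode_surrogate_pair(uint16_t high, uint16_t low)
{
	return 0x10000u + ((uint32_t)(high & 0x3ff) << 10) + (uint32_t)(low & 0x3ff);
}
```

With this check, 0xff62 from the reported title is neither a high nor a
low surrogate, so it would be passed through as an ordinary BMP
character.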
I'm attaching a revised patch that should enter mpg123 trunk shortly.
Feel free to yell and show the error in my current reasoning …
Alrighty then,
Thomas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpg123-utf16-surrogate.patch
Type: text/x-patch
Size: 1031 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-multimedia-maintainers/attachments/20150525/945027f4/attachment.bin>