SDL_ttf - UTF8_to_UNICODE (bug report)

Hi,

I was recently reading up on Unicode and UTF-8
encoding while playing with the SDL_ttf lib,
and I noticed a small slip in the function
UTF8_to_UNICODE() (taken from SDL_ttf.c in CVS).

My reference for UTF-8 is RFC 3629:
http://www.ietf.org/rfc/rfc3629.txt

Char. number range  |        UTF-8 octet sequence
   (hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
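Read as code, the table says the high bits of the lead octet give the sequence length and the remaining x bits are payload. That is where the lead-byte masks 0x7F, 0x1F, 0x0F and 0x07 come from (continuation octets always use 0x3F). Here's a minimal sketch of that reading; utf8_sequence_length() and utf8_lead_mask() are made-up names for illustration, not SDL_ttf functions.

```c
#include <assert.h>

/* Hypothetical helper: length of the sequence started by a lead byte,
 * following the RFC 3629 table. Returns 0 for bytes that cannot start
 * a sequence (continuation bytes 10xxxxxx, or 11111xxx). It does NOT
 * reject the other leads RFC 3629 forbids (0xC0, 0xC1, 0xF5..0xF7). */
static int utf8_sequence_length(unsigned char lead)
{
	if (lead < 0x80) return 1;  /* 0xxxxxxx */
	if (lead < 0xC0) return 0;  /* 10xxxxxx: continuation byte */
	if (lead < 0xE0) return 2;  /* 110xxxxx */
	if (lead < 0xF0) return 3;  /* 1110xxxx */
	if (lead < 0xF8) return 4;  /* 11110xxx */
	return 0;                   /* 11111xxx: never valid */
}

/* Hypothetical helper: mask that keeps only the payload ("x") bits of a
 * lead byte for a sequence of the given length (1..4). */
static unsigned char utf8_lead_mask(int len)
{
	static const unsigned char masks[] = { 0x7F, 0x1F, 0x0F, 0x07 };
	assert(len >= 1 && len <= 4);
	return masks[len - 1];
}
```

For a 3-byte sequence this gives 0x0F, and for a 2-byte sequence 0x1F, which are exactly the two masks discussed in the notes below.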

Here's the code with my comments:

static Uint16 *UTF8_to_UNICODE(Uint16 *unicode, const char *utf8, int len)
{
	int i, j;
	Uint16 ch;

	for ( i=0, j=0; i < len; ++i, ++j ) {
		ch = ((const unsigned char *)utf8)[i];
		if ( ch >= 0xF0 ) {
			ch  =  (Uint16)(utf8[i]&0x07) << 18;
			ch |=  (Uint16)(utf8[++i]&0x3F) << 12;
			ch |=  (Uint16)(utf8[++i]&0x3F) << 6;
			ch |=  (Uint16)(utf8[++i]&0x3F);
		} else
		if ( ch >= 0xE0 ) {

			// ****** NOTE1 ******
			// we only use 4 bits of the first octet (1110xxxx),
			// so the mask should be 0x0F instead of 0x3F.
			// with 0x3F the 6th bit (0x20) is kept, and here
			// it is always 1, so after the << 12 it adds
			// 0x20000 (131,072) to the value. that is outside
			// the 0000 0800 - 0000 FFFF (2,048 to 65,535)
			// range this branch is meant to produce; it only
			// works because the assignment to the 16-bit ch
			// truncates that bit away.

			// it should be
			// ch = (Uint16)(utf8[i] & 0x0F) << 12;
			// instead of
			ch  =  (Uint16)(utf8[i]&0x3F) << 12;

			// **** END NOTE1 ****

			ch |=  (Uint16)(utf8[++i]&0x3F) << 6;
			ch |=  (Uint16)(utf8[++i]&0x3F);
		} else
		if ( ch >= 0xC0 ) {

			// ****** NOTE2 ******
			// here it should be 0x1F instead of 0x3F,
			// because we use 5 bits (110xxxxx), not 6.
			// since the 6th bit (0x20) of the lead byte is 0
			// anyway, it works properly, but only as long as
			// the string contains no encoding errors.
			ch  =  (Uint16)(utf8[i]&0x3F) << 6;
			// **** END NOTE2 ****

			ch |=  (Uint16)(utf8[++i]&0x3F);
		}
		unicode[j] = ch;
	}
	unicode[j] = 0;

	return unicode;
}
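For comparison, here's a rough sketch of decoding with the corrected masks. utf8_decode_one() and decode3_with_mask() are hypothetical helpers written for this illustration, not SDL_ttf API. The sketch assumes well-formed input (malformed bytes are only caught by assert()) and returns 32-bit code points so that 4-byte sequences fit, which differs from the Uint16 output of UTF8_to_UNICODE().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one well-formed UTF-8 sequence at s[0],
 * store the code point in *cp and return the number of bytes consumed.
 * Lead-byte masks: 0x1F (2 bytes), 0x0F (3 bytes), 0x07 (4 bytes). */
static size_t utf8_decode_one(const unsigned char *s, uint32_t *cp)
{
	size_t len, k;
	uint32_t c = s[0];

	if (c < 0x80) { *cp = c; return 1; }
	else if (c >= 0xF0) { c &= 0x07; len = 4; }
	else if (c >= 0xE0) { c &= 0x0F; len = 3; }   /* NOTE1: 0x0F, not 0x3F */
	else { assert(c >= 0xC0); c &= 0x1F; len = 2; } /* NOTE2: 0x1F, not 0x3F */

	for (k = 1; k < len; ++k) {
		assert((s[k] & 0xC0) == 0x80);  /* continuation must be 10xxxxxx */
		c = (c << 6) | (s[k] & 0x3F);
	}
	*cp = c;
	return len;
}

/* Hypothetical helper: the 3-byte computation with a chosen lead-byte
 * mask, kept in 32 bits so nothing is truncated. */
static uint32_t decode3_with_mask(const unsigned char *s, unsigned lead_mask)
{
	return ((uint32_t)(s[0] & lead_mask) << 12)
	     | ((uint32_t)(s[1] & 0x3F) << 6)
	     |  (uint32_t)(s[2] & 0x3F);
}
```

For the euro sign (E2 82 AC, U+20AC), decode3_with_mask() gives 0x20AC with mask 0x0F but 0x220AC with mask 0x3F; narrowing 0x220AC to 16 bits yields 0x20AC again, which is why the wrong mask does not show up when ch is a Uint16.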

Thanks! Your patch has been added to CVS.

See ya!
-Sam Lantinga, Software Engineer, Blizzard Entertainment