Some old new things about IME and XIM support

Christophe_Cavalaria · March 23, 2007, 7:28pm

Ryan C. Gordon wrote:

If you don’t mind my asking,
will the TTF_RenderUNICODE_* interface of SDL_ttf be changed?
It currently takes a Uint16 pointer to receive a wide string,
and I think I might have to follow its interface.

The problem is that “Unicode” on Windows, the WCHAR type (which TCHAR maps
to in Unicode builds), is 16-bit, as are all OS interfaces. My understanding
is that in Windows XP, this stopped being UCS-2 and started being UTF-16…
I would be amazed if most applications handled this well, though.

Yuck!!! UTF-16, the crappiest Unicode encoding available: all the problems
of UTF-8 and UCS-4 at the same time. Btw, SDL_ttf uses UCS-2, not UTF-16, as
its input type for the UNICODE functions, so there might be an issue there.
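
To make the UCS-2 versus UTF-16 difference concrete, here is a minimal sketch in C of decoding UTF-16 surrogate pairs. The function names utf16_next and utf16_count are hypothetical and are not part of SDL or SDL_ttf; the point is only that code which treats every 16-bit unit as one character (the UCS-2 view) miscounts anything outside the Basic Multilingual Plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from a UTF-16 buffer of
 * `len` 16-bit units. Stores the code point in *cp and returns the
 * number of units consumed (1 or 2), or 0 if the buffer is empty.
 * Unpaired surrogates are reported as U+FFFD. */
static size_t utf16_next(const uint16_t *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len == 0) {
        return 0;
    }
    uint16_t hi = s[0];
    if (hi >= 0xD800 && hi <= 0xDBFF) {            /* high surrogate */
        if (len >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
            *cp = 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                              | (uint32_t)(s[1] - 0xDC00));
            return 2;
        }
        *cp = 0xFFFD;                              /* missing low half */
        return 1;
    }
    if (hi >= 0xDC00 && hi <= 0xDFFF) {            /* stray low surrogate */
        *cp = 0xFFFD;
        return 1;
    }
    *cp = hi;                                      /* plain BMP character */
    return 1;
}

/* Hypothetical helper: count code points, not 16-bit units. */
static size_t utf16_count(const uint16_t *s, size_t len)
{
    size_t i = 0, n = 0;
    uint32_t cp;
    while (i < len) {
        i += utf16_next(s + i, len - i, &cp);
        n++;
    }
    return n;
}
```

A buffer holding "A" followed by U+1D11E (stored as the pair D834 DD1E) is three 16-bit units but only two characters; a UCS-2 consumer would see three.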

Eventually this should change in SDL and support libraries too, but
mostly that will be from people contributing patches.

Then maybe it would be best to add one more set of functions to SDL_ttf
that handle UCS-4 correctly. I’ve had a look at the code and it
is rather trivial to do. It’s even easier than the current functions, since
you can get rid of a few type casts in the process (libttf uses 32-bit
character strings no matter which system it is on).

I had a recent small project where I completely ditched Unicode handling
because there was no way for me to make SDL_ttf take my std::wstring on
Linux. Unicode wasn’t too important but would have been a neat bonus for
me. But running into problems for the second time while doing string
manipulation with SDL_ttf was too much for what it was worth.

And UTF-8 was not a solution at all in that case.
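
For reference, the mismatch described above is that wchar_t is 32 bits wide on Linux while the TTF_RenderUNICODE_* functions take 16-bit units. A minimal sketch of a narrowing conversion in C follows; wide_to_ucs2 is a hypothetical helper, not an SDL_ttf function, and it assumes a 32-bit wchar_t holding one code point per element. Anything UCS-2 cannot represent is replaced with U+FFFD rather than silently truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: copy a NUL-terminated wide string into a UCS-2
 * buffer of `cap` 16-bit units (including the terminator).
 * Assumes wchar_t is 32 bits and holds one code point per element,
 * as on Linux. Returns the number of units written, not counting the
 * terminating 0. Input that does not fit is truncated. */
static size_t wide_to_ucs2(const wchar_t *src, uint16_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL && cap > 0);
    size_t n = 0;
    while (src[n] != L'\0' && n + 1 < cap) {
        uint32_t cp = (uint32_t)src[n];
        if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            dst[n] = 0xFFFD;        /* not representable in UCS-2 */
        } else {
            dst[n] = (uint16_t)cp;
        }
        n++;
    }
    dst[n] = 0;
    return n;
}
```

The resulting buffer has the shape the existing Uint16-based functions expect, at the cost of losing every character outside the Basic Multilingual Plane, which is exactly what a UCS-4 entry point would avoid.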

Christophe_Cavalaria · March 23, 2007, 7:34pm

Ryan C. Gordon wrote:

Did you mean I should let SDL-IM provide only UTF-8 output?
If you did, I also think it should.

In my humble opinion, UTF-8 must be the way to go,
but compared with UCS-2 or UCS-4,
handling a UTF-8 string is not as direct as handling UCS-2.
So I would prefer to support UTF-8 and UCS-2 side by side, like SDL_ttf.

I love UTF-8 myself, since it lets us ignorant Americans not think about
Unicode very much. Plus it Mostly Works with legacy APIs and
doesn’t have byte-ordering issues.

But UCS-4 seems to be the way to go if you don’t have huge string
requirements, since it’s much simpler to process…something like SDL
would probably be tossing around a few characters at a time, coming from
the keyboard, so byte-ordering is less of an issue, too.

For information, the reason UTF-8 sucks for string processing is simple to
see. All you have to do is try to write the code that handles line breaks
correctly with UTF-8 strings as input.

Btw, it would be a good helper function to provide in SDL_ttf. It is a very
common requirement.
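
As a rough illustration of what even the simplest version of such a helper involves, here is a minimal sketch in C. utf8_wrap_point is a hypothetical name, not an SDL_ttf function. It assumes valid UTF-8 input and counts code points rather than pixel widths, so it only shows the byte-level bookkeeping: never cut inside a multi-byte sequence, and prefer to break at the last space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the byte length of the first line of `s`
 * when wrapping at `max_chars` code points. The line ends at a newline,
 * at the last space seen before the limit, or, if there was no space,
 * at the code point boundary where the limit is reached. The caller is
 * expected to skip the space or newline before starting the next line. */
static size_t utf8_wrap_point(const char *s, size_t max_chars)
{
    assert(s != NULL && max_chars > 0);
    size_t i = 0, chars = 0, last_space = 0;
    int have_space = 0;

    while (s[i] != '\0') {
        unsigned char b = (unsigned char)s[i];
        if (b == '\n') {
            return i;
        }
        if ((b & 0xC0) != 0x80) {       /* not a continuation byte */
            if (chars == max_chars) {   /* this code point would not fit */
                return have_space ? last_space : i;
            }
            if (b == ' ') {
                last_space = i;
                have_space = 1;
            }
            chars++;
        }
        i++;
    }
    return i;                           /* whole string fits */
}
```

With the UTF-8 bytes for "héllo wörld" and a limit of 8 characters, the break lands on the space at byte offset 6, even though that space is only the sixth character. A real implementation would also have to measure glyph widths and deal with scripts that do not use spaces at all.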

Daniel_K_O · March 23, 2007, 10:50pm

Christophe Cavalaria wrote:

Btw, it would be a good helper function to provide in SDL_ttf. It is a very
common requirement.

Let’s not forget that proper Unicode handling is hard. For example,
Pango (and MS’ Uniscribe too, I believe) does MUCH more than just glyph
rendering. If you ignore the other issues (like proper layout of
complex scripts), you end up supporting only some languages. SDL_ttf is
more like a quick hack for TTF rendering, not a complete solution.

Daniel K. O.

Christophe_Cavalaria · March 23, 2007, 11:18pm

Daniel K. O. wrote:

Christophe Cavalaria wrote:

Btw, it would be a good helper function to provide in SDL_ttf. It is a
very common requirement.

Let’s not forget that proper Unicode handling is hard. For example,
Pango (and MS’ Uniscribe too, I believe) does MUCH more than just glyph
rendering. If you ignore the other issues (like proper layout of
complex scripts), you end up supporting only some languages. SDL_ttf is
more like a quick hack for TTF rendering, not a complete solution.

Well, you have to admit that for 99.999% of the projects using SDL, you’ll
never get to the point where you translate your application to one of those
languages.

Torsten_Giebl · March 23, 2007, 11:23pm

Hello !

Well, you have to admit that for 99.999% of the projects using SDL, you’ll
never get to the point where you translate your application to one of
those languages.

But why? Even a simple tool can be easier to use
when the user can select his native language.

Also, many applications are not aimed at hardcore
users, for example Tux Paint.

CU

Christophe_Cavalaria · March 23, 2007, 11:30pm

Torsten Giebl wrote:

Hello !

Well, you have to admit that for 99.999% of the projects using SDL,
you’ll never get to the point where you translate your application to one
of those languages.

But why? Even a simple tool can be easier to use
when the user can select his native language.

Also, many applications are not aimed at hardcore
users, for example Tux Paint.

I didn’t say it wouldn’t be useful or worth it to translate it. I said it
wouldn’t happen. And not really for good reasons, mind you. More along the
lines of: not enough manpower for a small userbase that can work just as
well with the English version, nobody available with the skills to do it,
etc.

courage · March 24, 2007, 4:13am

courage <dr.courage gmail.com> writes:

Honestly, the present SDL-IM patch is not flexible.
Maybe improving SDL-IM to support some kind of IM module that can plug in
any other IM library, like IIIMF or SCIM (although they both support XIM),
would be more flexible, but it could add some API to SDL.

Or not.

I may have confused some concepts.
Earlier I said that the IM application is the client side and the SDL
application is the server side, because from the end user’s point of view,
IM apps “SEND” strings to the SDL app, and SDL apps “RECEIVE” them.

But from the implementation’s point of view, the SDL app has to “GET” the
information from the IM server.
For win32,
IME App -> Imm32.dll -> event queue <- SDL App/IME Client

For XIM, the same concept is:
XIM App -> IM Server -> X <- XIM Client App

(This may not be exactly correct, but I think it is close to reality.)
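
The “event queue <- SDL App” part of these flows can be sketched as a small queue that the platform side pushes committed text into and the application polls. This is a minimal sketch in C under stated assumptions: im_queue, im_event, im_queue_push and im_queue_poll are hypothetical names, not part of SDL or of the SDL-IM patch, and the committed text is assumed to arrive as UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IM_QUEUE_CAP 8      /* hypothetical fixed capacity */
#define IM_TEXT_MAX  32     /* hypothetical max bytes per commit, incl. NUL */

/* One committed string from the input method, as UTF-8. */
typedef struct {
    char text[IM_TEXT_MAX];
} im_event;

/* Ring buffer shared by the platform back end and the application. */
typedef struct {
    im_event items[IM_QUEUE_CAP];
    size_t head;
    size_t count;
} im_queue;

/* Producer side: called when the IM server commits a string.
 * Returns 1 on success, 0 if the queue is full or the text is too long. */
static int im_queue_push(im_queue *q, const char *utf8)
{
    assert(q != NULL && utf8 != NULL);
    if (q->count == IM_QUEUE_CAP || strlen(utf8) >= IM_TEXT_MAX) {
        return 0;
    }
    size_t tail = (q->head + q->count) % IM_QUEUE_CAP;
    strcpy(q->items[tail].text, utf8);
    q->count++;
    return 1;
}

/* Consumer side: the application "GETs" the next committed string.
 * Returns 1 and fills *out if an event was pending, 0 otherwise. */
static int im_queue_poll(im_queue *q, im_event *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return 0;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % IM_QUEUE_CAP;
    q->count--;
    return 1;
}
```

From the user’s side the IM still appears to “SEND” text, but in code the application is the one pulling it out of the queue in its event loop.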

If you accept this concept, I bet you would say,
“If we could avoid the event-driven part, we could avoid patching SDL.”

Sure. On Linux, the design of IIIMF or SCIM looks like this:
IM module(IM App) -> socket -> IM server <- socket <- IM client(SDL App)

To support XIM, the flow changes a little:
IM module(IM App) -> socket -> IM server -> socket -> X <- XIM client(SDL App)

(Again, this may not be exactly correct, but I think it is close.)

But there is a big problem.
A lot of people use the system’s built-in IM application, especially on
Microsoft systems.
That’s why we have to patch SDL to support an old and not very good standard.

And thanks to everyone who is interested in discussing this topic.
I am very encouraged.

Daniel_K_O · March 24, 2007, 3:09pm

Christophe Cavalaria wrote:

Well, you have to admit that for 99.999% of the projects using SDL, you’ll
never get to the point where you translate your application to one of those
languages.

With that kind of elitist attitude you probably won’t bother to support
anything other than English.

Maybe I won’t translate my application to those languages, but I still
want to support them; that is, allow the users to read and write in their
own languages. Maybe they want to name their character in their
language; maybe they want to chat with other players in their language;
maybe someone will host online games, translating it all by hand in some
.ini/.xml file; maybe the user will browse and load local files named
in their language.

Daniel K. O.

Bill_Kendrick · March 24, 2007, 10:55pm

How’s SDL_pango coming along? I’ve had it in the back of my head to
switch Tux Paint from SDL_ttf to SDL_pango. Finding the time, though…?

-bill!

On Fri, Mar 23, 2007 at 07:50:46PM -0300, Daniel K. O. wrote:

Let’s not forget that proper Unicode handling is hard. For example,
Pango (and MS’ Uniscribe too, I believe) does MUCH more than just glyph
rendering. If you ignore the other issues (like proper layout of
complex scripts), you end up supporting only some languages. SDL_ttf is
more like a quick hack for TTF rendering, not a complete solution.

icculus · March 24, 2007, 11:28pm

With that kind of elitist attitude you probably won’t bother to support
anything other than English.

He makes a fair point, though: most programmers DON’T think about
Unicode at all, especially in America. Existing programs can be very
hard to retrofit with multi-byte character support at all, and most of
those probably wouldn’t properly handle right-to-left languages, or
case-insensitive comparison, etc.

Joel Spolsky wrote a fantastic article that everyone should read;
understanding Unicode and i18n can be really overwhelming when you
start, and he gives you the basics very well:

http://www.joelonsoftware.com/articles/Unicode.html

And it’s also clear after reading the article that there are some smart
changes that should be made to things like SDL_ttf.

–ryan.

Daniel_K_O · March 25, 2007, 2:21am

Ryan C. Gordon wrote:

And it’s also clear after reading the article that there are some smart
changes that should be made to things like SDL_ttf.

My point was, that seems beyond the scope of SDL_ttf, which is just a
convenient wrapper for FreeType.

What’s the point in creating half-solutions? This will not make SDL_ttf
any more useful for i18n. I would prefer to remove all the Unicode
functionality from SDL_ttf, to make it clear that it isn’t enough for i18n.

As for a link, I find Owen Taylor’s (old) paper on Pango much more
enlightening; just read the introduction:
http://people.redhat.com/otaylor/ols2001/pango.ps.gz

Daniel K. O.

courage · March 25, 2007, 2:21am

Ryan C. Gordon <icculus icculus.org> writes:

We’re already using XIM in the latest SDL, but that’s just to fill in
the “unicode” field for keypresses. I suspect it’s probably not nearly
as robust as it should be…I’d be surprised if it worked well with,
say, Arabic, but I can’t even start to speak authoritatively about
either XIM or Arabic to know.

Although I haven’t hacked on the latest SDL source,
I did notice that there is some XIM code in it.

However, it doesn’t seem to provide any new documented API or structure.
I also think it just supports different keyboard layouts for keystrokes.
Maybe this support could help some European languages,
but for CJK (Chinese, Japanese, Korean) it would not be enough.

For CJK, people usually don’t change their keyboard layout and
just use the American keyboard layout;
CJK words are composed according to some set of rules
in a small IM window before being output.

I have no idea about Arabic. Is its script phonetic or ideographic?
If it is phonetic, we could assume
that it could be supported the same way as Japanese.

Windows and Mac input methods are probably totally lacking in SDL.

Better keyboard input is something that should go into 1.3. Sam
mentioned he had some ideas that World of Warcraft inspired, but the
revamping of the input subsystem is still on the TODO list. Having someone
that understands how IM works would be very helpful.

World of Warcraft is so famous and popular around the world
that it cannot possibly lack IM support,
because chatting with other players in their native language is
a necessary feature for any MMORPG.
(But honestly, I don’t have any experience of playing WoW.)

Scott_Harper · March 25, 2007, 2:37am

World of Warcraft is so famous and popular around the world
that it cannot possibly lack IM support,
because chatting with other players in their native language is
a necessary feature for any MMORPG.
(But honestly, I don’t have any experience of playing WoW.)

WoW in America does NOT support IME usage. I can only assume that it
would in China, else the users would be forced to use romanized
Chinese, which seems like a very bad idea to me. Though I cannot
say for sure.

– Scott