How IME works for CJK text input

Hi all,

In this email, I’d like to give a brief overview of how input methods work
for CJK (Chinese, Japanese, Korean) languages.

On most modern operating systems, IMEs work in much the same way for
CJK languages. Normally several of them are installed on each system,
running as separate processes or loaded as dynamic libraries.
Traditionally, there is a graphical way to select which IME you want
to use: on Mac OS X, it is at the top right of the menu bar; on
Windows, it is at the bottom right of the taskbar. Users may prefer
keyboard shortcuts to switch between input methods: on Mac OS X that’s
Cmd + Space; on Windows (and for most CJK input methods on X11), it’s
Ctrl + Space.

After an input method is chosen, key presses from the keyboard are
redirected to that input method. It consumes these keys and generates
some candidate text, meaning text that is not yet fixed. For example,
if I type “an” while a Chinese Pinyin IME is on, the candidates will
be “安”, “按”, “案”, etc. Normally the IME brings up a new floating
window on top of the current application, near the current cursor
location. This window lists the candidates one by one, and the user
presses a number key to choose the correct guess. A text input event
is then sent to the application with the chosen text as a parameter,
and the application that receives the event displays the text at the
correct position.
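
To make the flow above concrete, here is a minimal sketch in Python. It
is not any real IME framework's API: the ToyIME class, the on_commit
callback (standing in for the text input event) and the tiny candidate
table are all made up for illustration.

```python
# Toy model of a candidate-window IME. All names are hypothetical.
CANDIDATES = {
    "an": ["安", "按", "案"],   # assumed sample entries, not a real dictionary
    "ni": ["你", "尼", "泥"],
}


class ToyIME:
    def __init__(self, on_commit):
        self.buffer = ""            # keys typed so far, not yet committed
        self.on_commit = on_commit  # stands in for the "text input event"

    def candidates(self):
        """Candidates that the floating window would list, in order."""
        return CANDIDATES.get(self.buffer, [])

    def handle_key(self, key):
        """Return True if the IME consumed the key."""
        cands = self.candidates()
        if len(key) == 1 and key in "123456789" and cands:
            index = int(key) - 1
            if index < len(cands):
                self.on_commit(cands[index])  # deliver the chosen text
                self.buffer = ""
            return True
        if key.isascii() and key.isalpha():
            self.buffer += key.lower()
            return True
        return False  # let the application handle keys we do not understand


received = []                     # what the application ends up with
ime = ToyIME(received.append)
for key in "an2":                 # type "a", "n", then pick candidate 2
    ime.handle_key(key)
print(received)
```

A real IME would also handle backspace, paging through long candidate
lists, and ranking by frequency; the sketch leaves all of that out.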

However, that’s not the whole story. Modern IME frameworks support
another style of text input called “on the spot”, in which candidate
text is shown inside the application, inline with the text already
entered. This makes input smoother for users: they can keep their
focus on one window instead of constantly looking back and forth
between the application window and the candidate window. To support
“on the spot” input, applications must handle a few more events, such
as drawing and clearing the candidate text.
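
As a rough illustration of those extra events, here is a Python sketch
of the application side. The TextField class and its three handler
names are hypothetical; real frameworks each have their own event names
and payloads, and this version assumes the cursor is always at the end
of the text.

```python
class TextField:
    """Hypothetical application widget that supports on-the-spot input."""

    def __init__(self):
        self.committed = ""  # text that is already fixed
        self.preedit = ""    # candidate text still owned by the IME

    def on_preedit_changed(self, text):
        # The IME asks us to draw (or redraw) the candidate text inline.
        self.preedit = text

    def on_preedit_cleared(self):
        # The IME asks us to erase the candidate text, e.g. on cancel.
        self.preedit = ""

    def on_commit(self, text):
        # The user picked a candidate: it becomes ordinary text.
        self.committed += text
        self.preedit = ""

    def render(self):
        # Brackets mark the candidate text here; a real UI would
        # typically underline or highlight it instead.
        if self.preedit:
            return f"{self.committed}[{self.preedit}]"
        return self.committed


field = TextField()
field.on_commit("你")
field.on_preedit_changed("an")
print(field.render())  # prints 你[an]
field.on_commit("安")
print(field.render())  # prints 你安
```

The point of the split is that the application only stores and draws
the candidate text; deciding what it contains stays with the IME.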

- Jiang

This is more or less how Tux Paint’s own internal Input Method works.

If you’re in the Japanese locale and tap [Right Alt] while using
Tux Paint’s text tool (which is VERY basic - it just assembles a string,
asks SDL_Pango to create a buffer, and blits it; no arrow key support
or selection or cut-n-paste), you’ll switch between Hiragana, Katakana
and English.

Type, e.g., [T][S][U] and first you’ll see “t”, then “ts”, and then
those characters will go away and be replaced with a single ‘tsu’
character.
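
In case it helps to see that behaviour spelled out, here is a rough
Python sketch of the idea. This is NOT Tux Paint's actual code (which
is C); the table is a tiny assumed sample and type_keys is a made-up
helper that returns what would be on screen after each key press.

```python
# Tiny sample of a romaji-to-hiragana table; a real one is much larger.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
}


def type_keys(keys):
    """Return the displayed string after each key press."""
    text = ""     # characters already converted
    pending = ""  # Latin letters typed but not yet matched
    states = []
    for key in keys:
        pending += key
        if pending in ROMAJI_TO_HIRAGANA:
            # The pending letters go away and one kana replaces them.
            text += ROMAJI_TO_HIRAGANA[pending]
            pending = ""
        states.append(text + pending)
    return states


print(type_keys("tsu"))  # prints ['t', 'ts', 'つ']
```

The sketch never gives up on letters that cannot match anything, so a
real version would need a rule for flushing or discarding them.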

I admit I’m currently having difficulty getting any of our IMs to respond,
except Traditional Chinese, but it SHOULD be working. :^) And we so far
support Traditional Chinese, Japanese, Thai and Korean.

Many other locales get enough from SDL’s Unicode support that they were
supported ‘out of the box,’ with no Input Method layer.

… Sigh. Now to figure out what broke.

-bill!

On Thu, May 28, 2009 at 02:57:07PM +0800, Jjgod Jiang wrote:

However, that’s not the whole story, modern IME frameworks support
a new way of text input called “on the spot”, which means candidate
text will be shown within the application along with text inputed,