UTF-16 clipboard

PiotrGrochowski · November 8, 2022, 3:29pm

“Looks like SDL_SetClipboardText and SDL_GetClipboardText on Android uses the JNI functions NewStringUTF and GetStringUTFChars to convert to and from Java strings.” And the specification for these functions mentions modified UTF-8, not actual UTF-8. And modified UTF-8 basically means storing null characters as a two byte sequence (C0 80) and storing non-BMP characters in two surrogates. Indeed, I can confirm that when I have non-BMP characters in the clipboard and I use SDL_GetClipboardText on Android, they are represented in two surrogates rather than their actual UTF-8 encoding, and that is invalid in UTF-8. D83E DF00 D83E DF01 D83E DF02 D83E DF03 (U+1FB00 U+1FB01 U+1FB02 U+1FB03) becomes ED A0 BE ED BC 80 ED A0 BE ED BC 81 ED A0 BE ED BC 82 ED A0 BE ED BC 83 (invalid UTF-8). So not only does SDL not support system 8-bit encoding and UTF-16 APIs, but the UTF-8 that it was intended to support isn’t even correct in all platforms.