UTF-16 clipboard

SDL_iconv() can be used to conveniently move from UTF-8 to whatever encoding you like. I’m not unsympathetic to the concern about unbounded data lengths, but it’s also probably reasonable to assume most clipboards’ contents are measured in bytes, not megabytes.

SDL doesn’t do it like that for graphics. Instead it supports many different graphics formats so that you can use whatever is most efficient on your platform.

That said, I wouldn’t have thought the clipboard was so performance critical that it would matter if you had to convert to and from UTF-8, but maybe if you have huge strings it matters, I don’t know…

I think what SDL does is probably fine for most applications. If you want to optimize the clipboard handling for Windows then perhaps it’s not too bad if you have to use Windows specific functions outside of SDL, is it?

Indeed, SDL video and SDL audio support multiple formats, so it doesn’t make sense for SDL text not to support multiple formats as well. I can output 8-bit audio and I can output 16-bit audio as well.
“The whole purpose of an abstraction layer like SDL is to hide the differences between platforms by providing a uniform interface, so the same application will run, unmodified, on all those platforms. That necessarily implies that the same text encoding will be used, so there has got to be a conversion somewhere.” However, when the user can choose the encoding, the abstraction goes further, since the library can then take input and produce output in the user’s own format.

Are you really asking the SDL developers to (essentially) write multiple versions of every function that takes a string?

SDL_ttf does something like that. For every function that takes a string you can choose between three different versions.

TTF_RenderText_Solid (Latin1)
TTF_RenderUTF8_Solid (UTF-8)
TTF_RenderUNICODE_Solid (UCS-2)

But looking at the implementation it seems like the “Text” and “UNICODE” versions convert to UTF-8 internally so this is more about convenience and not about performance.
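To give an idea of how cheap that kind of internal conversion is, here is a minimal sketch of a Latin-1 to UTF-8 converter. The name `latin1_to_utf8` is made up for illustration and this is not the actual SDL_ttf code; it only shows the technique, assuming the caller provides a large enough buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an SDL_ttf function): converts a NUL-terminated
 * Latin-1 string to UTF-8. `out` must have room for 2 * strlen(in) + 1 bytes,
 * since every Latin-1 byte becomes at most two UTF-8 bytes.
 * Returns the number of bytes written, not counting the terminating NUL. */
size_t latin1_to_utf8(const char *in, char *out)
{
    const unsigned char *src = (const unsigned char *)in;
    unsigned char *dst = (unsigned char *)out;

    assert(in != NULL && out != NULL);

    while (*src) {
        if (*src < 0x80) {
            /* ASCII is identical in Latin-1 and UTF-8. */
            *dst++ = *src;
        } else {
            /* U+0080..U+00FF always needs exactly two bytes in UTF-8. */
            *dst++ = (unsigned char)(0xC0 | (*src >> 6));
            *dst++ = (unsigned char)(0x80 | (*src & 0x3F));
        }
        src++;
    }
    *dst = '\0';
    return (size_t)(dst - (unsigned char *)out);
}
```

It is a single pass over the string, so for typical text lengths the cost is hard to even measure.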

Note that it doesn’t support full UTF-16 or “system 8-bit encoding” as suggested by Piotr.

All UTF-8 string functions that interface with non-UTF-8 platforms must have some sort of conversion built into them. Either there is separate conversion code in every single text function, or it is shared in a common function. The way to handle multiple encodings would be to make the conversion code branch depending on the current global text encoding. (UTF-8 could be the default to preserve backwards compatibility, but system 8-bit encoding and UTF-16 would be available as well.)
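Roughly, the branch could look like the sketch below. Everything here is hypothetical: SDL has no global encoding setting, and `TextEncoding`, `set_text_encoding` and `text_to_utf8` are names invented for this example. Only the UTF-8 and UTF-16 branches are shown, because the system 8-bit encoding depends on the platform. The sketch assumes the UTF-16 input is in native byte order and 0-terminated, and that `out` has room for three bytes per input code unit plus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical setting; UTF-8 stays the default for backwards compatibility. */
typedef enum { ENC_UTF8, ENC_UTF16 } TextEncoding;

static TextEncoding g_text_encoding = ENC_UTF8;

void set_text_encoding(TextEncoding enc) { g_text_encoding = enc; }

/* Encodes 0-terminated UTF-16 as UTF-8; unpaired surrogates become U+FFFD. */
static size_t utf16_to_utf8(const uint16_t *in, char *out)
{
    unsigned char *dst = (unsigned char *)out;

    while (*in) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF && *in >= 0xDC00 && *in <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000UL + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* lone surrogate */
        }
        if (cp < 0x80) {
            *dst++ = (unsigned char)cp;
        } else if (cp < 0x800) {
            *dst++ = (unsigned char)(0xC0 | (cp >> 6));
            *dst++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *dst++ = (unsigned char)(0xE0 | (cp >> 12));
            *dst++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *dst++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            *dst++ = (unsigned char)(0xF0 | (cp >> 18));
            *dst++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *dst++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *dst++ = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    *dst = '\0';
    return (size_t)(dst - (unsigned char *)out);
}

/* The shared entry point a string-taking function would call first.
 * Returns the UTF-8 length in bytes, not counting the terminating NUL. */
size_t text_to_utf8(const void *text, char *out)
{
    assert(text != NULL && out != NULL);

    if (g_text_encoding == ENC_UTF16) {
        return utf16_to_utf8((const uint16_t *)text, out);
    }
    /* ENC_UTF8: the text is already in the internal format, just copy it. */
    strcpy(out, (const char *)text);
    return strlen(out);
}
```

On a platform whose native API takes UTF-16, the same branch could skip the conversion entirely and hand the caller's buffer straight through, which is the case where a selectable encoding would actually save work.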

“Every other platform supported by SDL 2.0, as far as I know, uses UTF-8 (Linux, MacOS, Android and iOS at least).” Doesn’t Android have a UTF-16 clipboard?

“Text
A CharSequence. ”
“A CharSequence is a readable sequence of char values. This interface provides uniform, read-only access to many different kinds of char sequences. A char value represents a character in the Basic Multilingual Plane (BMP) or a surrogate. Refer to Unicode Character Representation for details.”

It looks that way based on the interface although I don’t know how it’s stored internally.
It’s probably because Android is based on Java which uses UTF-16 a lot.

The Android clipboard also seems to support some additional content types, URIs and intents, not just plain text.

Looks like SDL_SetClipboardText and SDL_GetClipboardText on Android use the JNI functions NewStringUTF and GetStringUTFChars to convert to and from Java strings.

“Looks like SDL_SetClipboardText and SDL_GetClipboardText on Android use the JNI functions NewStringUTF and GetStringUTFChars to convert to and from Java strings.” And the specification for these functions mentions modified UTF-8, not actual UTF-8. Modified UTF-8 basically means storing null characters as a two-byte sequence (C0 80) and storing each non-BMP character as two surrogates, each encoded separately as three bytes. Indeed, I can confirm that when I have non-BMP characters in the clipboard and I use SDL_GetClipboardText on Android, they are represented as two surrogates rather than in their actual UTF-8 encoding, and that is invalid in UTF-8. The UTF-16 sequence D83E DF00 D83E DF01 D83E DF02 D83E DF03 (U+1FB00 U+1FB01 U+1FB02 U+1FB03) becomes ED A0 BE ED BC 80 ED A0 BE ED BC 81 ED A0 BE ED BC 82 ED A0 BE ED BC 83 (invalid UTF-8). So not only does SDL not support system 8-bit encoding and UTF-16 APIs, but the UTF-8 that it was intended to support isn’t even correct on all platforms.
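For comparison, the correct UTF-8 encoding of U+1FB00 is F0 9F AC 80. Until this is fixed in SDL, an application could repair the returned text itself. The following is a minimal sketch of such a workaround, not SDL code; `fix_surrogate_pairs` is a hypothetical helper name. It only rewrites surrogate pairs that were encoded as two three-byte sequences and copies everything else through unchanged, so the C0 80 form of the null character is not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rewrites each surrogate pair that was encoded as two
 * 3-byte sequences (ED A0..AF xx, ED B0..BF xx) into the proper 4-byte UTF-8
 * form. The output is never longer than the input, so `out` needs `len` bytes.
 * Returns the number of bytes written. */
size_t fix_surrogate_pairs(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);

    while (i < len) {
        if (i + 5 < len &&
            in[i] == 0xED && (in[i + 1] & 0xF0) == 0xA0 && (in[i + 2] & 0xC0) == 0x80 &&
            in[i + 3] == 0xED && (in[i + 4] & 0xF0) == 0xB0 && (in[i + 5] & 0xC0) == 0x80) {
            /* Decode the two 3-byte sequences back into UTF-16 surrogates. */
            uint32_t hi = 0xD000UL | ((uint32_t)(in[i + 1] & 0x3F) << 6) | (in[i + 2] & 0x3F);
            uint32_t lo = 0xD000UL | ((uint32_t)(in[i + 4] & 0x3F) << 6) | (in[i + 5] & 0x3F);
            /* Combine the pair into one code point above U+FFFF. */
            uint32_t cp = 0x10000UL + ((hi - 0xD800) << 10) + (lo - 0xDC00);

            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            i += 6;
        } else {
            /* Anything else is copied through untouched. */
            out[o++] = in[i++];
        }
    }
    return o;
}
```

Text without non-BMP characters passes through byte for byte, so running this on every clipboard read should be harmless on platforms that already return valid UTF-8.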

I was wondering about that when I read the code. What you describe is obviously a bug. I think you should report it.

I am a gitphobe so I’m not going to submit any GitHub issues.

I understand. I also don’t use GitHub. I’m going to flag your post and hope someone else takes care of it.

Does this clipboard convert? I thought it was only copying bytes …

If it were just copying bytes, you would have problems when copy-pasting between programs that use different text encodings. The SDL clipboard functions use UTF-8. That means that on platforms where the underlying clipboard API uses some other encoding, there has to be some conversion happening.

That’s the precise opposite of how a cross-platform abstraction layer works. Yes, it hides the differences between platforms by providing a uniform interface to the developer, but we’re not talking about a developer interface here. We’re talking about the OS side of the interface, and the entire purpose of the abstraction layer is to translate between the developer interface and the OS’s preferred, native way of doing things.

This means that if the OS is using UTF-16, you use UTF-16, even if the developer interface is UTF-8. If you’re not doing that consistently across the various supported platforms, your claim to have a cross-platform abstraction layer goes right out the window.

If that’s the case I completely misunderstood what the OP was asking for, sorry. I thought he was wanting SDL2 to provide UTF-16 as an alternative encoding at the developer interface, specifically at the SDL_GetClipboardText() interface. I’ve deleted my comment.

SDL audio supports multiple formats (8-bit, 16-bit, 32-bit, float) so does that mean it cannot be considered a cross-platform abstraction layer?

What I mean is that I process UTF-16 internally in my code, and I would like the SDL string functions to be able to handle UTF-16 as well. I would specify the UTF-16 encoding beforehand, like how I select the audio format before starting audio, and then pass a pointer to a UTF-16 string to the functions. Conversions to and from UTF-8 would occur only when the direct external interface relies on UTF-8.

Oh. Looks like I was the one who misunderstood it then. Yeah, there’s some sense to that, but it’s a more difficult position to defend. And honestly, that’s not something I would use SDL for in the first place.

A lot of what’s in SDL is extremely duplicative of standard, expected functionality in virtually all standard libraries and existing application-development frameworks. If you want to copy text to the clipboard and you don’t like SDL’s way of doing so, do it your language’s way instead. If your language is cross-platform enough to care about using SDL for cross-platform development, its clipboard implementation is going to be too, so there’s really no downside to it.

What are you talking about? I use C++98 (compiling with Digital Mars), and C++ has no standard clipboard interface.

If you only care about Windows anyway and can’t live with the overhead of converting between UTF-8 and UTF-16, just use the WinAPI directly.

Of course I can do a Win32 clipboard, though that is not cross-platform. The thing is that the conversions should only be involved when the actual external interface uses a different format than the app’s internal format.