Weird A displays when I want to remove a Polish letter - text input

Hi! When I type ą, ę, or ó and then try to remove it with a single press of Backspace, an A with a hat on it is rendered instead. Likewise, ł and ś turn into an l with a dot, and only when I press Backspace again is that letter deleted.

[Screenshot: after backspacing ą, ę, or ó]

After backspacing ś or ł: I cannot upload one more image.

The text input code is simple:

std::string buffer;
//...
switch (event.type) {
case SDL_KEYDOWN:
    if (event.key.keysym.sym == SDLK_BACKSPACE && buffer.size() > 0) {
        buffer.pop_back();
    } else if (event.key.keysym.sym == SDLK_c && (SDL_GetModState() & KMOD_CTRL)) {
        SDL_SetClipboardText(buffer.c_str());
    } else if (event.key.keysym.sym == SDLK_v && (SDL_GetModState() & KMOD_CTRL)) {
        buffer = SDL_GetClipboardText();
    }
    break;
case SDL_TEXTINPUT:
    // Skip the 'c' / 'v' text generated by Ctrl+C / Ctrl+V
    if (!((event.text.text[0] == 'c' || event.text.text[0] == 'C' || event.text.text[0] == 'v' || event.text.text[0] == 'V') && (SDL_GetModState() & KMOD_CTRL))) {
        buffer += event.text.text;
    }
    break;
}

//...
drawstring(buffer, ...);

How can I fix it?
Can I somehow easily strip the special letters so that they are not accidentally rendered? (It would be better if those letters were available, but if this bug is hard to fix, I would rather remove them.)

I think your problem is that you are handling a UTF-8 string as if it were an ASCII string.

If you encode your data with UTF-8, some letters use more than one byte. Your special characters (ś, ą, …) are among them. So removing a single byte from the end of the string can cut such a letter in half. I'd recommend reading up on encodings, especially UTF-8. As a starting point, good old Wikipedia will do: https://en.wikipedia.org/wiki/UTF-8
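To make the byte counts visible, here is a small sketch. `hex_bytes` is a hypothetical helper written for this post, not an SDL function; it just prints the raw bytes of a `std::string`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of SDL): returns the bytes of s as
// space-separated uppercase hex, e.g. "C4 85".
std::string hex_bytes(const std::string& s) {
    std::string out;
    char tmp[4];
    for (unsigned char c : s) {
        std::snprintf(tmp, sizeof tmp, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) {
            out += ' ';
        }
        out += tmp;
    }
    return out;
}
```

For example, ą is stored in UTF-8 as the two bytes C4 85, so `hex_bytes("\xC4\x85")` gives "C4 85" and the string's `size()` is 2. After a single `pop_back()` only the byte C4 is left, which is no longer a complete character, and the renderer has to guess what to draw for it.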

As a quick solution, I think what you can do when handling the removal is check whether the current last byte looks like this: “0b10xxxxxx”, where x can be anything. If it does not, just remove that last byte. If it does, it is a continuation byte in the middle of a multi-byte character, so you need to keep removing bytes until you have also removed the one that looks like this: “0b11xxxxxx”. That byte is the first byte of the character.
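The rule above could be sketched roughly like this. `pop_back_utf8` is a made-up name, not something SDL provides, and the sketch assumes the buffer contains valid UTF-8.

```cpp
#include <string>

// Hypothetical helper (not part of SDL): removes the last UTF-8 encoded
// character from s. Assumes s holds valid UTF-8.
void pop_back_utf8(std::string& s) {
    // Drop trailing continuation bytes (bit pattern 10xxxxxx).
    while (!s.empty() &&
           (static_cast<unsigned char>(s.back()) & 0xC0) == 0x80) {
        s.pop_back();
    }
    // Drop the byte that is left: either a plain ASCII byte (0xxxxxxx)
    // or the first byte of the multi-byte character (11xxxxxx).
    if (!s.empty()) {
        s.pop_back();
    }
}
```

In the backspace branch you would then call `pop_back_utf8(buffer);` instead of `buffer.pop_back();`. Calling it on an empty string does nothing, so the `buffer.size() > 0` check is no longer strictly needed.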

I'm not sure what functionality SDL provides for handling UTF-8, to be honest.

Good luck with your project!

This is how you handle backspace for a UTF-8 string in C++:

if (event.key.keysym.sym == SDLK_BACKSPACE) {
    std::size_t textlen = SDL_strlen(buffer.c_str());
    do {
        if (textlen == 0) {
            break;
        }
        if ((buffer[textlen - 1] & 0x80) == 0x00) {
            /* One byte */
            buffer.erase(textlen - 1);
            break;
        }
        if ((buffer[textlen - 1] & 0xC0) == 0x80) {
            /* Byte from the multibyte sequence */
            buffer.erase(textlen - 1);
            textlen--;
            if (textlen == 0) {break;} // invalid character
        }
        if ((buffer[textlen - 1] & 0xC0) == 0xC0) {
            /* First byte of multibyte sequence */
            buffer.erase(textlen - 1);
            break;
        }
    } while(true);
}

You can find the original C version here:
https://hg.libsdl.org/SDL/file/783d1cff9b20/test/testime.c#l285
