For a TTF_Atlas is kerning good enough? Do you need ligatures?

Levo · August 1, 2024, 2:13pm

I had a post where I volunteered to write atlas code for SDL_ttf since I’m already doing it for SDL2 in my own project (although in C++). I wrote some code and, to my surprise, people wanted a render-string function instead of a function you can copy-paste from the example. A maintainer said kerning is very important to him and I saw no easy way to implement it. I also tried out HarfBuzz, and good text rendering with it seems very involved: I’d have to track which font each Unicode character belongs to and call hb_shape with that specific font file. That means I need to deal with loading many font files, disambiguating overlap, tracking whether a font gets unloaded (if that’s desired), etc.

To simplify things, I want to know what you want out of a TTF atlas lib. Dealing with HarfBuzz is not something I want to do right now, for the reasons in the paragraph above. For kerning I may have to write code that reads the font file directly in binary, grabs the kerning table and builds a hashtable from it, because freetype doesn’t directly offer it to me. Is font files → atlas buffer → texture, a render function, example code to render yourself, and kerning all you need? If I’m missing something else, let me know the use case. I do have a function to reserve space in the atlas buffer so you can have custom images in the same texture.
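As a rough sketch of the kerning hashtable idea, assuming the glyph pairs and adjustments have already been extracted somehow: `KerningTable` and `measure_width` are hypothetical names, not part of SDL_ttf or FreeType, and the units are whatever the atlas uses. FreeType's `FT_Get_Kerning` can also return pair kerning for faces that carry a legacy `kern` table, so it may be worth checking before parsing the file by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical kerning cache: maps a (left glyph, right glyph) pair to a
// horizontal adjustment. How the pairs are read from the font is left open.
class KerningTable {
public:
    void add(std::uint32_t left, std::uint32_t right, int adjust) {
        pairs_[key(left, right)] = adjust;
    }

    // Returns 0 when the pair has no kerning entry.
    int lookup(std::uint32_t left, std::uint32_t right) const {
        auto it = pairs_.find(key(left, right));
        return it == pairs_.end() ? 0 : it->second;
    }

private:
    // Pack both glyph indices into one 64-bit hash key.
    static std::uint64_t key(std::uint32_t left, std::uint32_t right) {
        return (static_cast<std::uint64_t>(left) << 32) | right;
    }

    std::unordered_map<std::uint64_t, int> pairs_;
};

// Sums the advances of a glyph run, applying kerning between neighbours.
// `glyphs` and `advances` are parallel arrays (an assumption of this sketch).
int measure_width(const std::vector<std::uint32_t>& glyphs,
                  const std::vector<int>& advances,
                  const KerningTable& kerning) {
    assert(glyphs.size() == advances.size());
    int width = 0;
    for (std::size_t i = 0; i < glyphs.size(); ++i) {
        if (i > 0) width += kerning.lookup(glyphs[i - 1], glyphs[i]);
        width += advances[i];
    }
    return width;
}
```

A render-string function would use the same `lookup` call to shift the pen position between glyphs before copying each glyph rectangle out of the atlas.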

I’m not exactly sure of an ‘easy’ way to implement ligatures. I know HB will deal with it because it looks at the substitution table. Or maybe I can deal with basic HB use if most people only need to load one font file

rtrussell · August 2, 2024, 8:47am

Fair enough, you should only offer to do what you are happy with. But for my application Harfbuzz is essential, since I support complex scripts like Arabic which aren’t practical to do any other way.

Harfbuzz isn’t difficult to use in practice (without a font atlas), it’s only necessary to call TTF_SetFontScriptName() and TTF_SetFontDirection() before calling the standard TTF_RenderUTF8_Blended() etc. function(s).

It is a pain to have to do the script and direction detection myself, when the equivalent routines in Windows do that automatically, so if you were to consider implementing it, that would be a valuable extension.

Levo · August 2, 2024, 4:19pm

Does the user edit Arabic? How do you handle Arabic and English on the same line? Do you mix any languages and font files?

I may eventually need to do this, but more likely at the end of the year, if people like my project. Do you know if hb_buffer_guess_segment_properties works fine? I ask because I don’t know which Unicode characters belong to which language, or which languages are written RTL.

rtrussell · August 2, 2024, 6:03pm

I support the user editing Arabic etc., yes, but currently not using the IME which I haven’t got to grips with in SDL. So I am assuming the necessary characters can be entered conventionally using the keyboard (which will be the case with Arabic, but not necessarily with CJK languages for example).

This involves some ‘interesting’ code when dealing with moving the caret through the text, for example, because it has to change direction dynamically. It does seem to work pretty well however.

I have a table relating the Unicode code point to the script and direction, so for example code points 0x0590 - 0x05FF are Hebrew (right-to-left) and 0x0600 - 0x06FF are Arabic (also right-to-left). This allows me to detect at what points in the line the script and/or direction changes, and I render the sections separately.
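A minimal sketch of that kind of table-driven splitting, not the actual BBC BASIC code: the table holds only the two ranges mentioned above, everything else falls back to Latin left-to-right, and a space simply joins the preceding run. All names (`ScriptRange`, `split_runs`, `run_text`) are hypothetical, and real bidirectional handling of neutral characters is more involved than this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Direction { LeftToRight, RightToLeft };

struct ScriptRange {
    char32_t first, last;   // inclusive code point range
    const char* script;     // four-letter script tag, e.g. "Hebr"
    Direction direction;
};

// Illustrative subset only; a real table needs many more ranges.
constexpr ScriptRange kRanges[] = {
    {0x0590, 0x05FF, "Hebr", Direction::RightToLeft},
    {0x0600, 0x06FF, "Arab", Direction::RightToLeft},
};
// Assumption of this sketch: anything not listed is Latin, left-to-right.
constexpr ScriptRange kFallback = {0x0000, 0x10FFFF, "Latn", Direction::LeftToRight};

struct Run {
    std::size_t start;          // index of the first code point
    std::size_t length;         // number of code points
    const ScriptRange* range;   // table entry shared by the whole run
};

const ScriptRange& classify(char32_t cp) {
    for (const ScriptRange& r : kRanges)
        if (cp >= r.first && cp <= r.last) return r;
    return kFallback;
}

// Splits a line wherever the table entry changes.
std::vector<Run> split_runs(const std::u32string& text) {
    std::vector<Run> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const ScriptRange& r = classify(text[i]);
        const bool neutral = (text[i] == U' ');  // spaces stay with the previous run
        if (!runs.empty() && (neutral || runs.back().range == &r)) {
            ++runs.back().length;
        } else {
            runs.push_back({i, 1, &r});
        }
    }
    return runs;
}

// Extracts the code points belonging to one run.
std::u32string run_text(const std::u32string& text, const Run& run) {
    assert(run.start + run.length <= text.size());
    return text.substr(run.start, run.length);
}
```

Each run could then be converted to UTF-8 and rendered on its own, after passing `range->script` to TTF_SetFontScriptName() and setting the matching direction with TTF_SetFontDirection().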

In general I am trying to reproduce something close to the behavior of the Windows ScriptStringOut() function, which makes everything easy. It handles things like caret positioning and highlighting automatically; I do wish SDL2_ttf had incorporated something similar!

No, currently I am assuming all the glyphs needed will be in the same font file, which implies using a Unicode font with good coverage. Generally I will use something like DejaVuSans which covers the most common languages, but for the more obscure languages the choice of font is limited.

Again, ScriptStringOut() makes it easy because font substitution when a glyph is missing is automatic, not so with SDL2_ttf. The inclusion of Harfbuzz has improved things a lot, but it still feels as though it is seriously behind-the-curve as far as support for rendering and editing bi-directional complex scripts is concerned.

Levo · August 5, 2024, 12:48am

I’m really bad at paying attention to online stuff

Does the public know what you’re working on? I can’t think of what program would let users write Arabic text. I’m assuming you’re not doing a notepad app.

“table relating the Unicode code point to the script and direction”

Is this an official table? I should look into this as well. I wonder if HarfBuzz has this, so that I don’t need to worry about it.

What the heck does ScriptStringOut do? All its parameters are inputs and its return value is an HRESULT, which is supposed to be S_OK… what the heck is that function?

rtrussell · August 5, 2024, 8:33am

It’s the code editor in BBC BASIC so, yes, quite a lot like Notepad! The user can type in lines of BASIC code like this (you can try editing these lines in your reply, the Discourse editor behaves just like mine does):

arabic$ = "هنا مثال يمكنك من الكتابة من اليمين" : REM Right-to-left
hebrew$ = "זוהי הדגמה של כתיבת טקסט בעברית מימין לשמאל" : REM Right-to-left

The table is derived from ‘official sources’, although I can’t remember now where I found them. The various language tags used by TTF_SetFontScriptName() are listed here.

ScriptStringOut() renders (and optionally highlights part of) a Unicode string, doing all the work of language (script) detection, contextual character substitution, bi-directional printing etc. for you.

Although it and its companion Uniscribe functions (in usp10.dll) look complicated, they’re not when used in the most straightforward way.

Levo · August 6, 2024, 3:04am

I’m thinking out loud here

Let’s say I feel up to implementing everything, and part of my API is for you to specify the font face along with the string you’re rendering. Would you be fine with that? I’d probably keep an array covering all the codepoints in the font, but indexed by codepoint / 256 so I don’t need an array of 1M elements.
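To make the “/256” idea concrete, here is a hedged sketch of one way such an array could look: one byte per 256-codepoint page, recording which face claimed that page first. `PageCoverage`, `kNoFace` and the first-face-wins rule are assumptions for illustration; a page-level table cannot tell whether a face really has every glyph in the page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unicode has 0x110000 code points; one entry per 256 of them.
constexpr std::size_t kPageCount = 0x110000 / 256;  // 4352 entries
constexpr std::uint8_t kNoFace = 0xFF;              // marks an unclaimed page

// Hypothetical page table: maps each 256-codepoint page to a face index.
class PageCoverage {
public:
    PageCoverage() { pages_.fill(kNoFace); }

    // Records that `face` has a glyph for `cp`. If another face already
    // claimed the page, the earlier face keeps it (first registered wins).
    void mark(char32_t cp, std::uint8_t face) {
        assert(cp <= 0x10FFFF);
        assert(face != kNoFace);
        std::uint8_t& slot = pages_[cp >> 8];
        if (slot == kNoFace) slot = face;
    }

    // Returns the face index for the page containing `cp`, or kNoFace.
    std::uint8_t face_for(char32_t cp) const {
        assert(cp <= 0x10FFFF);
        return pages_[cp >> 8];
    }

private:
    std::array<std::uint8_t, kPageCount> pages_;
};
```

The table costs about 4 KB per instance instead of one entry per code point, at the price of treating every code point in a page as belonging to the same face.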

You’d have to figure out which text in the buffer belongs to which face, then I’d call the HarfBuzz guess function (for the language) with that text+face combo. Would this be fine, or is that a lot of work on your end? I could have some kind of font+face manager, but since I haven’t written that code I don’t know how many lines it’ll end up taking or how useful it would be, because fonts may overlap.

There could be a performance penalty if I scan the text every time you call the render function, which is why I prefer that to be on the caller side. In my font rendering code, my text has an attribute table associated with it (one that has nothing to do with fonts). For my use case it’d make sense to put the face there rather than scan the text every time to figure out which face to call HarfBuzz with. I’m just thinking out loud; when I write more code my thoughts will change.

rtrussell · August 7, 2024, 9:53am

I don’t think that would be practical, because I would be passing a Unicode string which eventually gets rendered using context-dependent glyphs, and I don’t know what those glyphs will be!

It’s only at the lower level (e.g. within Harfbuzz) that the glyphs are determined, so it’s only then that it can be discovered whether or not they are available in the current font.

Font substitution, if supported at all, would need to take place at a lower level, and if it’s not currently supported in Harfbuzz it’s probably not reasonable to expect you to support it either.

What would be really useful is automatic script and direction detection, so that one could render a bi-directional string in one call, rather than having to split it into segments - each with a different script and/or direction - as I do now.