Text output - getting more speed in SDL

The below contains the fastest text-output function I currently know how
to write for SDL. It is faster than any SDL-based code I’ve been able to find
on the Web: not only does it run rings around “TTF_Render_Text()”, it also
leaves pre-rendering to a bitmap in the dust.

But that don’t mean a thing. The application I’m using SDL for requires
a highly efficient text-output algorithm, which this one isn’t. If you think
you know something about code optimization using SDL, I’d be grateful if you
had a look.

http://runegold.org/sangband/files/text_speed.zip

Any useful results will be emailed to Sam Lantinga.

Leon Marrick wrote:

[snip]

It looks rather horrifying. Saying “run rings around
TTF_Render_Text()” is over-simplifying a fair bit, since your
implementation cannot do grayscale glyphs. If someone wants a
general text-output system that can easily be extended to use TTF
with grayscale, your implementation may need to be rewritten in
the long term.

Why isn’t a glyph cache used? Is there some sort of memory
restriction? Do you really need the performance? The maximum
performance you need is what is enough to render a screenful of
glyphs at monitor refresh rate; anything more would not really
improve anything, I think.

Any useful results will be emailed to Sam Lantinga.

I don’t see the point of this; such implementation details are in the
application domain, not in the libsdl domain, so why is there a
need for the above sentence?

Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

It looks rather horrifying.

I wouldn’t have gone that far.

But I’m curious if keeping an array of memory offsets where a glyph’s
pixels exist would actually beat SDL’s blitters…the method we’re
discussing writes to less memory than a default surface blit, but it
doesn’t take advantage of the possibility of blitting between hardware
surfaces, or that pixel formats, alpha blending, and conversions can be
handled without cluttering your application and possibly with
MMX/Altivec/SSE behind the scenes.
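For readers who have not opened the zip, the offset-list approach under discussion can be sketched roughly like this. This is a hypothetical reconstruction, not Leon’s actual code; `GlyphOffsets`, `build_offsets`, and `draw_glyph` are invented names, and the glyph format is an assumed 8x8 one-bit-per-pixel bitmap:

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

#define GLYPH_W 8
#define GLYPH_H 8

/* For a monochrome glyph, precompute the offsets of its set pixels once,
 * then "blit" by writing the foreground colour at only those offsets. */
typedef struct {
    size_t offsets[GLYPH_W * GLYPH_H]; /* y * pitch_in_pixels + x */
    size_t count;
} GlyphOffsets;

/* bitmap: GLYPH_H bytes, one bit per pixel, MSB = leftmost column. */
static void build_offsets(const uint8_t bitmap[GLYPH_H],
                          size_t dst_pitch, GlyphOffsets *g)
{
    g->count = 0;
    for (int y = 0; y < GLYPH_H; y++)
        for (int x = 0; x < GLYPH_W; x++)
            if (bitmap[y] & (0x80u >> x))
                g->offsets[g->count++] = (size_t)y * dst_pitch + (size_t)x;
}

/* Write fg at each precomputed offset; untouched pixels keep their value,
 * which is why this writes less memory than a full rectangular blit. */
static void draw_glyph(uint32_t *dst, size_t x, size_t y, size_t dst_pitch,
                       const GlyphOffsets *g, uint32_t fg)
{
    uint32_t *base = dst + y * dst_pitch + x;
    for (size_t i = 0; i < g->count; i++)
        base[g->offsets[i]] = fg;
}
```

The trade-off Ryan describes is visible here: the inner loop touches only the set pixels, but it is plain CPU writes to a software buffer, with none of SDL’s format conversion or SIMD help.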

Colorkey blitting with RLE encoding might achieve close to the same
speed inside SDL for about the same memory usage, too.

(completely untested) I’d have probably kept a table of SDL_Surfaces, one
glyph per surface, hashed by their codepoint and created/cached as
needed, and just blitted from them as I needed to draw glyphs to a
surface. If I wanted to gamble, I could try for things like colorkey
blits and hardware surfaces.
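In outline, the cache described above might look something like this. It is a completely untested sketch: the placeholder `Glyph` struct stands in for the SDL_Surface you would actually keep, the stand-in "render" step for a TTF_RenderGlyph call, and `cache_get` and `GlyphCache` are invented names:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define CACHE_SLOTS 256

typedef struct {
    uint32_t codepoint;
    int rendered;     /* would be an SDL_Surface* in real code */
} Glyph;

typedef struct {
    Glyph slot[CACHE_SLOTS];
    int used[CACHE_SLOTS];
    int renders;      /* how many times we rendered from scratch */
} GlyphCache;

/* Look up a glyph by codepoint; render and cache it on first use. */
static Glyph *cache_get(GlyphCache *c, uint32_t cp)
{
    size_t i = cp % CACHE_SLOTS;          /* trivial hash; fine for a demo */
    while (c->used[i] && c->slot[i].codepoint != cp)
        i = (i + 1) % CACHE_SLOTS;        /* linear probing */
    if (!c->used[i]) {                    /* miss: render once, keep it */
        c->used[i] = 1;
        c->slot[i].codepoint = cp;
        c->slot[i].rendered = 1;          /* stand-in for the render step */
        c->renders++;
    }
    return &c->slot[i];
}
```

With something like this, the expensive rendering happens once per distinct codepoint, and every subsequent draw is just a blit from the cached surface.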

…but I’d probably just build OpenGL display lists from the vertex data
that Freetype can provide and get hardware-accelerated manipulation of
the glyphs for “blitting” or a million other things, which is going to
smoke any possible approach we take here in terms of performance,
features, and application ease-of-use.

It might be nice to add a function to SDL_ttf to get a vertex buffer
back instead of an SDL_Surface…

–ryan.

Ryan C. Gordon wrote:

[snip]
It might be nice to add a function to SDL_ttf to get a vertex buffer
back instead of an SDL_Surface…

That would be really nice; it would let you do a lot of nice special 3D
text effects that currently require the use of other libraries like
ftgl.

Back to the original question: I have posted a very fast, very simple
OpenGL glyph-rendering class to this list several times in the past. You
can search the archive to find it. I also have a very complex and
rather heavyweight, but pretty fast, text rendering system for SDL,
with or without OpenGL, on my web site at gameprogrammer.com.

As Ryan and others have said, proper use of caching, either of
prerendered glyphs or of display lists, yields very fast text rendering.

	Bob Pendleton



SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org



Wow, Bob! Great work.
That’s the best way to render fonts.

2007/4/15, Bob Pendleton:

On Sun, 2007-04-15 at 05:20 -0400, Ryan C. Gordon wrote:

[snip]

Ryan C. Gordon wrote:

It looks rather horrifying.

I wouldn’t have gone that far.
[snip snip]

My apologies to Leon, that came out a bit harsh; it was very poor
phrasing on my part. From the code, it appeared to me that he had
requirements that most other SDL apps wouldn’t have; plus, given
the bit of “sangband” in the URL, I kept wondering whether that kind of
performance, or effort in optimization, is really needed. It would
also be harder to maintain or upgrade.

Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

Hi all,

Ryan C. Gordon wrote:

[snip]
Colorkey blitting with RLE encoding might achieve close to the same
speed inside SDL for about the same memory usage, too.
[snip]

Here’s some data; I hope the following will be helpful
in some way for people still targeting 2D modes. Timings were generated
as follows: the desktop is a Sempron 3000+ (power-saving mode) with a
basic Nvidia card (WinXP), while the laptop is an old Celeron 766MHz with a
2D Trident (WinME). SDL 1.2.11 from libsdl’s MinGW developer’s
library, text_speed.c compiled using MinGW (gcc 3.4.2) with -O2.
Display modes are 32-bit, at least 1024x768.

Timings are for a single run, copied verbatim from stdout.txt
files. Timings are quite stable from run to run. For info, in the
first item, the screen is written 1000 times (actually about 1372
out of 1920 positions are written to).

No changes (desktop)

Time needed to display text pages: 15280.
- spent in text output: 984
- spent in screen refreshes: 14293
- spent in miscellaneous: 3

Time needed to display text lines: 20600.
- spent in text output: 931
- spent in screen refreshes: 19648
- spent in miscellaneous: 21

Time needed to display individual characters: 9858.
- spent in text output: 1037
- spent in screen refreshes: 8761
- spent in miscellaneous: 60

The bulk of the work is spent on SDL_UpdateRects. So one screen
update (the first item) is about 15.3ms (15280/1000).

No changes (laptop)

Time needed to display text pages: 41370.
- spent in text output: 14326
- spent in screen refreshes: 26158
- spent in miscellaneous: 886

Time needed to display text lines: 57253.
- spent in text output: 15748
- spent in screen refreshes: 40265
- spent in miscellaneous: 1240

Time needed to display individual characters: 43425.
- spent in text output: 13043
- spent in screen refreshes: 23522
- spent in miscellaneous: 6860

An old Celeron is much slower crunching the pixel array (41.4ms).
Sluggish, but turn-based games should survive.

No changes (desktop, directx)

Time needed to display text pages: 3967.
- spent in text output: 931
- spent in screen refreshes: 3028
- spent in miscellaneous: 8

Time needed to display text lines: 5366.
- spent in text output: 984
- spent in screen refreshes: 4362
- spent in miscellaneous: 20

Time needed to display individual characters: 40438.
- spent in text output: 898
- spent in screen refreshes: 39478
- spent in miscellaneous: 62

One screen update is done in about 4ms. However, for directx, the
time for individual characters is much slower. Is this due to some
kind of windib optimization?

Single UpdateRects of 800x480 (desktop)

Time needed to display text pages: 21451.
- spent in text output: 957
- spent in screen refreshes: 20490
- spent in miscellaneous: 4

This refreshes the entire screen area of 80x24 as implemented in
the code. Since more positions are drawn (1920 versus 1372), it is
slower, but a simple calculation confirms that the scaling of rect
update time is linear. So, to reduce redraw with windib, a
textmode buffer can minimize the number of changes required, if
that’s absolutely necessary.
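The textmode-buffer idea above can be sketched as follows. This is a hypothetical illustration, not code from text_speed.c: `diff_cells` is an invented name, the 80x24 grid matches the test’s dimensions, and each changed cell would become one small rect passed to SDL_UpdateRects (or be merged into runs first):

```c
#define TERM_W 80
#define TERM_H 24
#include <assert.h>

/* Compare the previously drawn character grid against the current one;
 * record the grid position of each changed cell and return the count.
 * Only those cells need to be redrawn and passed to SDL_UpdateRects. */
static int diff_cells(const char prev[TERM_H][TERM_W],
                      const char cur[TERM_H][TERM_W],
                      int xs[], int ys[])
{
    int n = 0;
    for (int y = 0; y < TERM_H; y++)
        for (int x = 0; x < TERM_W; x++)
            if (prev[y][x] != cur[y][x]) {
                xs[n] = x;
                ys[n] = y;
                n++;
            }
    return n;
}
```

Since the measured rect-update time scales linearly with area, updating only the handful of cells a turn-based game changes per move should cut the refresh cost accordingly.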

Single UpdateRects of 800x480 (desktop)
Array of display SDL_Surface

Time needed to display text pages: 21962.
- spent in text output: 2037
- spent in screen refreshes: 19919
- spent in miscellaneous: 6

This draws the first test with surfaces generated by
SDL_DisplayFormat, using a single SDL_UpdateRects call. The text-output
section takes double the time of Leon’s method, but the overall
performance impact is small (22.0ms versus 21.5ms). However, if
there is a need to set many background and foreground colours,
some kind of caching will be desirable.

Single UpdateRects of 800x480 (desktop)
Array of 8-bit SDL_Surface

Time needed to display text pages: 22418.
- spent in text output: 2479
- spent in screen refreshes: 19935
- spent in miscellaneous: 4

This draws the first test using 8-bit surfaces generated by SDL_TTF,
with a single SDL_UpdateRects call. This is slightly slower than
drawing SDL_DisplayFormat surfaces. Scaling from the 15.3ms in the
first results, the same redraw with this method would probably take
about 17ms. For this format, any fg/bg colour choice can be made,
with the blit operation doing all the hard work. Plus, it is only
slightly slower than using opaque display-format surfaces.

Single UpdateRects of 800x480 (desktop)
Array of 8-bit SDL_Surface + SDL_SRCCOLORKEY|SDL_RLEACCEL

Time needed to display text pages: 22256.
- spent in text output: 2344
- spent in screen refreshes: 19907
- spent in miscellaneous: 5

Using SDL_SRCCOLORKEY|SDL_RLEACCEL with 8-bit surfaces is slightly
faster than without those flags. There is a single clear to
background call for each Term_text_sdl call; so if each position
can have its own fg/bg colour, this will be slower.

Single UpdateRects of 800x480 (desktop)
Array of display SDL_Surface + SDL_SRCCOLORKEY|SDL_RLEACCEL

Time needed to display text pages: 21481.
- spent in text output: 1456
- spent in screen refreshes: 20017
- spent in miscellaneous: 8

Using SDL_SRCCOLORKEY|SDL_RLEACCEL with SDL_DisplayFormat surfaces
is much faster than without those flags. There is a single clear
to background call for each Term_text_sdl call; so if each
position can have its own fg/bg colour, this will be slower.
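As an illustration of why SDL_RLEACCEL helps in these colorkey tests: with the surface run-length encoded, the blitter can skip transparent runs wholesale instead of testing every pixel against the key. A simplified single-row sketch of that behaviour (not SDL’s actual blitter; `blit_row_colorkey` is an invented name):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Copy one row of pixels, skipping runs that match the colour key.
 * An RLE-encoded surface stores the run lengths up front, so the real
 * blitter jumps over transparent runs without the per-pixel compares
 * shown here; this loop just makes the run structure visible. */
static void blit_row_colorkey(uint32_t *dst, const uint32_t *src,
                              size_t w, uint32_t key)
{
    size_t x = 0;
    while (x < w) {
        while (x < w && src[x] == key)     /* skip a transparent run */
            x++;
        size_t start = x;
        while (x < w && src[x] != key)     /* find the opaque run */
            x++;
        for (size_t i = start; i < x; i++) /* copy only opaque pixels */
            dst[i] = src[i];
    }
}
```

Glyph surfaces are mostly colour key with a few short opaque runs, which is close to the best case for this encoding; that fits the measured speedups above.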

It looks like there’s not a whole lot to be gained from all the
effort of optimizing this thingy for a 2D video mode, especially if the
game is turn-based (just assuming) and windib is used.
Perhaps other graphics cards might give significantly different
results.


Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia