SDL blitting/refresh speed challenges

Some people are just never satisfied. Not content with getting cross-platform
bitmap font handling by grace of SDL and SDL_ttf, I’m trying to figure out how
to make it FAST.

I’ve spent days on the problem, nothing seems to be working, and trying ideas
from publicly available SDL projects has gotten me nowhere. There are only
two lines of code that are bogging down my text-output module, both pure SDL
calls. SDL Gurus, can any of you help me figure out the problem?

Setup:

  • Windows XP, with pre-compiled SDL (SDL 1.2.11, SDL_ttf 2.0.8) libraries
    and .dlls for Windows, using Visual C++ 6.0. Code is ANSI C.
  • Roughly 2 GHz machine with 1 GB RAM, Radeon 8500 display adapter with 64 MB
    memory, fully updated drivers. I know the problem isn’t with Windows or my
    machine, because the old version of the code below, using the Win32 API, is
    about seven times as fast.

Objective:

  • Print lots of text, using bitmapped monospaced fonts, very quickly.

Problem:

  • Everything works … except the speed.

Comments:

  • I’ve profiled this code extensively. See the "THIS LINE OF CODE IS SLOW"
    comments near the bottom for the exact places needing optimization.

//
// Make an application window.
//
// Comments: I seem to be having trouble getting hardware surfaces and SDL
// doesn’t find any video memory (possibly my video card isn’t recognized).
// But so simple is what I try to do below that this really should not
// matter as much as it does.
//

flags = SDL_FULLSCREEN | SDL_HWSURFACE | SDL_DOUBLEBUF | SDL_ANYFORMAT;

// Create an application surface using the current bit depth
Application_Surface = SDL_SetVideoMode(VideoInfo->current_w,
VideoInfo->current_h, 0, flags);

// Handle failure
if (!Application_Surface) quit(format("Failed to create %dx%d window at %d bpp!",
	VideoInfo->current_w, VideoInfo->current_h,
	VideoInfo->vfmt->BitsPerPixel));

// Render the glyphs

//
// Because “TTF_RenderText_Solid” is relatively slow, we cannot simply use it
// in “display_text()”. Instead, we pre-allocate a working glyph surface,
// filled with all possible characters of this font. This surface should be
// opaque by default.
//
// Comment: Yes, I started out trying to render text on the fly and quickly
// discovered that doing that for pages on end would bring my poor ol’ CPU to
// its knees!
//
// “Font” must contain a monospaced font. It is optional, but highly
// recommended, that all characters between 0 and 255, inclusive, be defined
// (they can be blank). Characters above 255 are ignored. XXX XXX
//
static SDL_Surface *render_glyphs(TTF_Font *font, int w, int h)
{
SDL_Surface *glyphs;
SDL_Surface *temp_glyph;
SDL_Rect dst_rect, src_rect;
int i;

// Build colors for text and background
SDL_Color bkgrnd_clr = { 255, 255, 255, 0 };
SDL_Color text_clr  = { 0, 0, 0, 0 };
u32b bkgrndColor;
u32b textColor;


// Create an empty surface for the glyphs.  Prefer to use video memory
glyphs = SDL_CreateRGBSurface(SDL_HWSURFACE, w * 256, h, 
	8, 0, 0, 0, 0);

// Comments:  I tried using individual glyph surfaces but got no
// efficiency gains.

// Comments:  I could try using the same bit depth as the application
// surface for (hopefully) faster blitting but, when I tested this code with a
// screen depth of 8 bits (256 colors), things actually went slower. So I don’t
// think this is my problem…

// White is the first palette entry; black the second
SDL_SetPalette(glyphs, SDL_LOGPAL, &bkgrnd_clr, 0, 1);
SDL_SetPalette(glyphs, SDL_LOGPAL, &text_clr, 1, 1);


// Create colors for "background" and "black"
bkgrndColor = SDL_MapRGB(glyphs->format, 255, 255, 255);
textColor = SDL_MapRGB(glyphs->format, 0, 0, 0);

// Fill in the glyph surface with the background color
SDL_FillRect(glyphs, NULL, bkgrndColor);

// Request run-length acceleration (this actually makes no measurable
// difference on my machine)
SDL_SetColorKey(glyphs, SDL_RLEACCEL, 0);

// Select our first glyph cell
dst_rect.x = 0;
dst_rect.y = 0;
dst_rect.w = w;
dst_rect.h = h;

src_rect.x = 0;
src_rect.y = 0;
src_rect.w = w;
src_rect.h = h;

// Run along the glyph surface, filling it in with characters
for (i = 0; i < 256; i++, dst_rect.x += w)
{
	// Render this glyph in black (TTF_RenderGlyph_Solid() takes an
	// SDL_Color, not a mapped pixel value)
	temp_glyph = TTF_RenderGlyph_Solid(font, i, text_clr);

	// Glyph not found -- skip it
	if (!temp_glyph) continue;

	// Blit the glyph to our surface. SDL_BlitSurface() overwrites
	// dst_rect.w/h with the clipped size, so the loop advances by the
	// fixed cell width "w" rather than by dst_rect.w.
	SDL_BlitSurface(temp_glyph, &src_rect, glyphs, &dst_rect);

	// Free this temporary glyph surface before rendering the next one
	SDL_FreeSurface(temp_glyph);
}

// Return our surface
return (glyphs);

}

//
// Display a single line of text on screen. NEEDS EFFICIENCY WORK.
//
// (the code to allow centering, sizing, etc. has been removed for simplicity,
// as has the error-checking code)
//
static void display_text(various parameters)
{
// This is our display window. It contains sizing, font, and other
// information
DISPLAY_WINDOW *window;

SDL_Rect dst_rect;
SDL_Rect src_rect;

int i;

// This is our pre-rendered glyph surface
// glyphs

// This is our application screen.  It needs to show lots of text
// Application_Surface

// Adjust the text color by modifying its palette entry
// (this line of code uses very little CPU time)
SDL_SetPalette(glyphs, SDL_LOGPAL, input_color, 1, 1);

// Remove color key, if any
if (glyphs->flags & (SDL_SRCCOLORKEY))
{
	// Clear any color key (a flag value of 0 removes it)
	SDL_SetColorKey(glyphs, 0, 0);
}


// Source rectangle (size of a single glyph)
src_rect.x = 0;
src_rect.y = 0;
src_rect.w = window->font_wid;
src_rect.h = window->font_hgt;

// Destination rectangle (size of a text cell)
dst_rect.x = window->window_left + window->border_left + 
	(col * window->cell_wid);
dst_rect.y = window->window_top  + window->border_top  + 
	(row * window->cell_hgt);
dst_rect.w = window->cell_wid;
dst_rect.h = window->cell_hgt;


// Advance along the string
for (i = 0; i < n; i++, dst_rect.x += dst_rect.w)
{
	// Jump to the glyph
	src_rect.x = ch * src_rect.w;

	// Blit the character to screen
	// THIS LINE OF CODE IS SLOW
	(void)SDL_BlitSurface(glyphs, &src_rect, Application_Surface, 
		&dst_rect);

	// Comments:  I've tried RLE acceleration and testing both 

hardware and software surfaces, to no avail
}

// Update the full text area
// THIS LINE OF CODE IS SLOW
SDL_UpdateRect(Application_Surface,
	window->window_left + window->border_left +
	(col * window->cell_wid),
	window->window_top  + window->border_top  +
	(row * window->cell_hgt),
	window->cell_wid * n,
	window->cell_hgt);

// Comments:  I tried to optimize the above by creating a queue of
// update rects and calling “SDL_UpdateRects” only when the queue is full (or
// the text has finished printing), but amazingly, things got even slower. At
// that point, I realized I had no clue how to optimize SDL and decided to ask
// for help.
}

A.L.Moore wrote:

Some people are just never satisfied. Not content with getting cross-platform
bitmap font handling by grace of SDL and SDL_ttf, I’m trying to figure out how
to make it FAST.

A few comments about your code:

  1. Try NOT to use HWSURFACEs; they are slower for anything but simple
    (opaque) blits. Colorkey blits and alpha blits require the blit code to
    read the destination surface, and if that’s a hardware surface, it’s
    very slow.

  2. Use SDL_DisplayFormat() for the glyph surface; a blit from 8 to 24 bit
    requires at least a color lookup table. If you need to change the colors
    of your fonts and you use a limited set of colors, maybe you’ll have more
    luck using N glyph surfaces, where N is the number of colors you use :)

  3. RLEACCEL is useful only for SWSURFACEs, so go to point 1 :)

--
Ing. Gabriele Greco, DARTS Engineering
Tel: +39-0105761240 Fax: +39-0105760224
s-mail: Via G.T. Invrea 14 - 16129 GENOVA (ITALY)
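Point 2 says a blit from an 8-bit palettized surface to a 24-bit screen needs at least a color lookup table. Below is a minimal plain-C model of that per-pixel work. It is not SDL's actual blitter. It uses 32-bit destination pixels for simplicity and assumes the table already holds mapped colors, such as the values SDL_MapRGB() would return. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of an 8-bit -> 32-bit conversion blit: every source
 * pixel is a palette index that must be looked up in a 256-entry table of
 * destination pixel values before it can be written. */
static void convert_row_8_to_32(const uint8_t *src, uint32_t *dst,
                                size_t n, const uint32_t lut[256])
{
    size_t i;
    assert(src != NULL && dst != NULL && lut != NULL);
    for (i = 0; i < n; i++)
        dst[i] = lut[src[i]];   /* one table lookup per pixel */
}
```

Converting the glyph surface once with SDL_DisplayFormat() moves this lookup out of the per-character blit. After that, each blit is a same-format copy.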

//
// Make an application window.
//
// Comments: I seem to be having trouble getting hardware surfaces and SDL
// doesn’t find any video memory (possibly my video card isn’t recognized).
// But so simple is what I try to do below that this really should not
// matter as much as it does.
//

I believe SDL currently defaults to GDI and not DirectX because of poor
DirectX support in new drivers; search the mail archives for the exact
details. And GDI has no hardware surfaces…

flags = SDL_FULLSCREEN | SDL_HWSURFACE | SDL_DOUBLEBUF | SDL_ANYFORMAT;

Simple performance guideline: Only use hardware surfaces if blitting can
be described as a memcpy that does only row-by-row alignment and nothing
else.
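To make the guideline concrete, here is a sketch of what an opaque blit between two surfaces of the same pixel format reduces to: one memcpy per row, with only the row start adjusted by each surface's pitch. The PlainSurface type and blit_opaque_rows() are hypothetical stand-ins for SDL_Surface's pixels/pitch fields. Clipping is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal surface: pixel buffer plus pitch (bytes per row,
 * which may be larger than width * bytes_per_pixel). */
typedef struct {
    uint8_t *pixels;
    size_t pitch;
} PlainSurface;

/* Opaque same-format blit: each row is a single memcpy; the only
 * "alignment" work is computing where each row starts via the pitch. */
static void blit_opaque_rows(const PlainSurface *src, size_t sx, size_t sy,
                             PlainSurface *dst, size_t dx, size_t dy,
                             size_t w, size_t h, size_t bytes_per_pixel)
{
    size_t row;
    size_t row_bytes = w * bytes_per_pixel;
    assert(src != NULL && dst != NULL && bytes_per_pixel > 0);
    for (row = 0; row < h; row++) {
        const uint8_t *s = src->pixels + (sy + row) * src->pitch
                           + sx * bytes_per_pixel;
        uint8_t *d = dst->pixels + (dy + row) * dst->pitch
                     + dx * bytes_per_pixel;
        memcpy(d, s, row_bytes);
    }
}
```

Colorkey, alpha, or format-converting blits cannot be written this way. They must inspect or transform individual pixels.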

// Create an empty surface for the glyphs. Prefer to use video memory
glyphs = SDL_CreateRGBSurface(SDL_HWSURFACE, w * 256, h,
8, 0, 0, 0, 0);

Use a software surface for this, and change its size to w by h*256 (all
glyphs stacked in a single column). See below for motivation.

// Request run-length acceleration (this actually makes no measurable
// difference on my machine)
SDL_SetColorKey(glyphs, SDL_RLEACCEL, 0);

Never use RLEACCEL on hardware surfaces, as this is CPU-only blitting. And
only use it if the surface has large blocks of similar colors.
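One rough way to judge the "large blocks of similar colors" condition is to count pixel runs per row. Run-length encoding stores about one entry per run, so it helps only when the run count is much smaller than the row width. This is an illustration only. SDL's internal RLE format encodes transparent/opaque spans, not this exact structure, and count_runs() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts runs of identical 8-bit pixels in one row. Fewer runs relative
 * to the width means run-length encoding has more to gain. */
static size_t count_runs(const uint8_t *row, size_t width)
{
    size_t i, runs;
    assert(row != NULL || width == 0);
    if (width == 0)
        return 0;
    runs = 1;
    for (i = 1; i < width; i++)
        if (row[i] != row[i - 1])
            runs++;
    return runs;
}
```

A blank row is a single run. A row crossing thin glyph strokes breaks into many short runs, which is the case where RLE buys little.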

// Run along the glyph surface, filling it in with characters
for (i = 0; i < 256; i++, dst_rect.x += dst_rect.w)
{

Change this to tile on increasing y coordinates instead.
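With the glyphs stacked vertically, the source rectangle of a character changes only in y. Here is a small sketch, assuming 8-bit glyph cells w pixels wide and h pixels tall. GlyphRect is a hypothetical stand-in for SDL_Rect so the snippet compiles without SDL.

```c
#include <assert.h>

/* Hypothetical stand-in for SDL_Rect. */
typedef struct { int x, y, w, h; } GlyphRect;

/* Source rectangle of glyph 'ch' in a strip w pixels wide and h * 256
 * pixels tall: glyphs are stacked top to bottom, so x is always 0. */
static GlyphRect glyph_src_rect_vertical(unsigned char ch, int w, int h)
{
    GlyphRect r;
    assert(w > 0 && h > 0);
    r.x = 0;
    r.y = (int)ch * h;
    r.w = w;
    r.h = h;
    return r;
}
```

In render_glyphs() this would pair with creating the strip as SDL_CreateRGBSurface(SDL_SWSURFACE, w, h * 256, 8, 0, 0, 0, 0) and advancing dst_rect.y by h instead of dst_rect.x by w.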

//
// Display a single line of text on screen. NEEDS EFFICIENCY WORK.
//
// (the code to allow centering, sizing, etc. has been removed for simplicity,
// as has the error-checking code)
//
static void display_text(various parameters)
{
// This is our display window. It contains sizing, font, and other
// information
DISPLAY_WINDOW *window;

SDL_Rect dst_rect;
SDL_Rect src_rect;

int i;

// This is our pre-rendered glyph surface
// glyphs

// This is our application screen. It needs to show lots of text
// Application_Surface

// Adjust the text color by modifying its palette entry
// (this line of code uses very little CPU time)
SDL_SetPalette(glyphs, SDL_LOGPAL, input_color, 1, 1);

// Advance along the string
for (i = 0; i < n; i++, dst_rect.x += dst_rect.w)
{
// Jump to the glyph
src_rect.x = ch * src_rect.w;

Change this to use y coordinates instead…

  // Blit the character to screen
  // THIS LINE OF CODE IS SLOW
  (void)SDL_BlitSurface(glyphs, &src_rect, Application_Surface,
  	&dst_rect);

… and this function will be somewhat faster, since you removed around
window->font_hgt cache misses for each call. And each time you have a
cache miss, your CPU is only running at 100-400 MHz…
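The cache-miss argument can be checked with a little arithmetic. The sketch below counts the distinct cache lines touched when reading one w x h glyph from each layout. It assumes 8-bit pixels and a 64-byte cache line; the real line size is hardware dependent. cache_lines_touched() is a hypothetical helper, not an SDL function.

```c
#include <assert.h>
#include <stddef.h>

/* Counts distinct cache lines touched when reading a w x h block of
 * 8-bit pixels starting at byte 'offset' of a surface whose rows are
 * 'pitch' bytes apart. line_size is an assumed cache-line size. */
static size_t cache_lines_touched(size_t offset, size_t pitch,
                                  size_t w, size_t h, size_t line_size)
{
    size_t row, line, count = 0;
    size_t prev = (size_t)-1;     /* no line counted yet */
    assert(w > 0 && line_size > 0 && pitch >= w);
    for (row = 0; row < h; row++) {
        size_t start = offset + row * pitch;
        size_t first = start / line_size;
        size_t last = (start + w - 1) / line_size;
        for (line = first; line <= last; line++) {
            /* Addresses only increase (pitch >= w), so a line can repeat
             * only if it is the one counted most recently. */
            if (line != prev) {
                count++;
                prev = line;
            }
        }
    }
    return count;
}
```

Take glyph 'A' (65) in 8x16 cells. In the horizontal strip (pitch 256 * 8 bytes), every glyph row sits on its own line, so 16 lines are touched. In the vertical strip (pitch 8 bytes), the glyph's 128 bytes are contiguous and touch only 2 lines.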

Also note that this blit needs to do on-the-fly conversions, and these can
only be performed by the CPU. If you only need a few different colors,
you can try using SDL_DisplayFormat() for each color, as this will give a
huge boost.
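The "one converted surface per color" idea amounts to a small cache keyed by color, where the conversion runs only the first time a color is requested. This is a plain-C sketch under stated assumptions. The glyph strip is an 8-bit mask (0 = background, nonzero = ink). Converted pixels are plain 32-bit buffers standing in for what per-color SDL_DisplayFormat() surfaces would hold. GlyphCache, glyph_cache_get(), and GLYPH_CACHE_SLOTS are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GLYPH_CACHE_SLOTS 8   /* hypothetical: max number of text colors */

/* One glyph strip already converted to 32-bit pixels for one color. */
typedef struct {
    int used;
    uint32_t color;
    uint32_t *pixels;
} ColoredStrip;

typedef struct {
    const uint8_t *mask;      /* 8-bit strip: 0 = background, else ink */
    size_t npixels;
    uint32_t background;
    ColoredStrip slot[GLYPH_CACHE_SLOTS];
    int conversions;          /* how many times a strip was converted */
} GlyphCache;

/* Returns the converted strip for 'color', converting only on first use.
 * Returns NULL if every slot is taken or malloc fails. */
static const uint32_t *glyph_cache_get(GlyphCache *c, uint32_t color)
{
    int i;
    size_t p;
    assert(c != NULL && c->mask != NULL);
    /* Cache hit: this color was converted before, reuse it. */
    for (i = 0; i < GLYPH_CACHE_SLOTS; i++)
        if (c->slot[i].used && c->slot[i].color == color)
            return c->slot[i].pixels;
    /* Cache miss: convert the mask once into the first free slot. */
    for (i = 0; i < GLYPH_CACHE_SLOTS; i++) {
        if (!c->slot[i].used) {
            uint32_t *px = malloc(c->npixels * sizeof *px);
            if (px == NULL)
                return NULL;
            for (p = 0; p < c->npixels; p++)
                px[p] = c->mask[p] ? color : c->background;
            c->slot[i].used = 1;
            c->slot[i].color = color;
            c->slot[i].pixels = px;
            c->conversions++;
            return px;
        }
    }
    return NULL; /* all slots in use */
}

/* Releases every converted strip. */
static void glyph_cache_free(GlyphCache *c)
{
    int i;
    for (i = 0; i < GLYPH_CACHE_SLOTS; i++) {
        free(c->slot[i].pixels);
        c->slot[i].pixels = NULL;
        c->slot[i].used = 0;
    }
}
```

With a handful of colors, the conversion cost is paid once per color instead of once per character drawn.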

Try downloading the SDL source and poking around in it. My gut feeling is
that the necessary transformations are the reason blitting is slow, and
there may be a benefit in writing a custom blitter for some cases.
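One possible special case is sketched below: drawing a glyph straight from an 8-bit mask into a 32-bit destination in a given color. Background pixels are skipped, as with a colorkey, and no palette lookup happens per pixel. This is a hypothetical function, not part of SDL. On a real surface it would work on `pixels`/`pitch` of a 32-bit software surface, locked with SDL_LockSurface() where SDL_MUSTLOCK() requires it. It does no clipping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Draws one glyph from an 8-bit mask (0 = background, nonzero = ink)
 * into a 32-bit destination in 'color'. Pitches are in bytes, as in
 * SDL_Surface. The caller must keep the glyph inside the destination
 * and dst/dst_pitch must be 4-byte aligned. */
static void blit_glyph_mask_32(const uint8_t *mask, size_t mask_pitch,
                               size_t w, size_t h,
                               uint8_t *dst, size_t dst_pitch,
                               size_t dx, size_t dy, uint32_t color)
{
    size_t x, y;
    assert(mask != NULL && dst != NULL);
    for (y = 0; y < h; y++) {
        const uint8_t *m = mask + y * mask_pitch;
        uint32_t *d = (uint32_t *)(dst + (dy + y) * dst_pitch) + dx;
        for (x = 0; x < w; x++)
            if (m[x])             /* skip background pixels */
                d[x] = color;
    }
}
```

Because the text color is a parameter, this approach also removes the need to change the glyph surface's palette before each string.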

// Update the full text area
// THIS LINE OF CODE IS SLOW
SDL_UpdateRect(Application_Surface,
window->window_left + window->border_left +
(col * window->cell_wid),
window->window_top + window->border_top +
(row * window->cell_hgt),
window->cell_wid * n,
window->cell_hgt);

// Comments: I tried to optimize the above by creating a queue of
// update rects and calling “SDL_UpdateRects” only when the queue is full (or
// the text has finished printing), but amazingly, things got even slower. At
// that point, I realized I had no clue how to optimize SDL and decided to ask
// for help.

This should only be called when you are done drawing all text. And
since you are using SDL_DOUBLEBUF, you really should be using SDL_Flip()
instead.

On Thu, 22 Mar 2007, A.L.Moore wrote: