Better performance needed - am I doing this right?

I need to improve the drawing performance in my BASIC emulator which uses
SDL for its output (on a Linux system).

Because I don’t know what the foreground and background colours will be
until I actually come to draw the character I’m not keeping a set of memory
mapped character images and just blitting them. Instead, (partly for
historical reasons) I have a set of 8 byte pixel masks for each of the 8x8
characters.

My (extraneous logic stripped out) code looks like this:

if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER) < 0) {
    fprintf(stderr, "Unable to init SDL: %s\n", SDL_GetError());
    return FALSE;
}

screen0 = SDL_SetVideoMode(SCREEN_WIDTH, SCREEN_HEIGHT, 32, 0);
if (!screen0) {
    fprintf(stderr, "Failed to open screen: %s\n", SDL_GetError());
    return FALSE;
}

fontbuf = SDL_CreateRGBSurface(SDL_SWSURFACE, 8, 8, 32,
                               0xff000000, 0x00ff0000, 0x0000ff00, 0x000000ff);
sdl_fontbuf = SDL_ConvertSurface(fontbuf, screen0->format, 0);
/* copy surface to get same format as main window */
SDL_FreeSurface(fontbuf);

The idea being that I will draw the character in fontbuf and then blit it
into screen0. The char drawing code looks like:

void sdlchar(char ch) {
    int32 y, line;
    place_rect.x = xtext * 8;
    place_rect.y = ytext * 8;
    SDL_FillRect(sdl_fontbuf, NULL, tb_colour);
    for (y = 0; y < 8; y++) {
        line = sysfont[ch - ' '][y];
        if (line != 0) {
            if (line & 0x80) *((Uint32*)sdl_fontbuf->pixels + 0 + y*8) = tf_colour;
            if (line & 0x40) *((Uint32*)sdl_fontbuf->pixels + 1 + y*8) = tf_colour;
            if (line & 0x20) *((Uint32*)sdl_fontbuf->pixels + 2 + y*8) = tf_colour;
            if (line & 0x10) *((Uint32*)sdl_fontbuf->pixels + 3 + y*8) = tf_colour;
            if (line & 0x08) *((Uint32*)sdl_fontbuf->pixels + 4 + y*8) = tf_colour;
            if (line & 0x04) *((Uint32*)sdl_fontbuf->pixels + 5 + y*8) = tf_colour;
            if (line & 0x02) *((Uint32*)sdl_fontbuf->pixels + 6 + y*8) = tf_colour;
            if (line & 0x01) *((Uint32*)sdl_fontbuf->pixels + 7 + y*8) = tf_colour;
        }
    }
    SDL_BlitSurface(sdl_fontbuf, &font_rect, screen0, &place_rect);
    if (echo) SDL_Flip(screen0);
}

The “echo” flag is there so that I only do the buffer flip when I need to
actually see the output (so it’s clear when I’m outputting multiple chars
until the last one).

Comments and suggestions please.

Colin–
Colin Tuckley | @Colin_Tuckley | PGP/GnuPG Key Id
+44(0)1903 236872 | +44(0)7799 143369 | 0x1B3045CE

Banging your head against the wall uses 120 calories an hour.

On 6/6/07, Colin Tuckley wrote:

fontbuf = SDL_CreateRGBSurface(SDL_SWSURFACE, 8, 8, 32,
0xff000000, 0x00ff0000, 0x0000ff00, 0x000000ff);

SDL_HWSURFACE is normally faster than SDL_SWSURFACE.

SDL_BlitSurface(sdl_fontbuf, &font_rect, screen0, &place_rect);
if (echo) SDL_Flip(screen0);

SDL_Flip works nicely with double buffering active. If you don’t have double
buffering active, try the SDL_UpdateRects way.



SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


Rúben Lúcio Reis

Game Developer
Linux user #433535

Rúben Lúcio wrote:

SDL_HWSURFACE is normally faster than SDL_SWSURFACE.

Unfortunately that isn’t an option; I’m in an XWindows environment on Linux
and can’t use HWSURFACE.

SDL_Flip works nicely with double buffering active. If you don’t have double
buffering active, try the SDL_UpdateRects way.

The docs say that calling SDL_Flip is equivalent to calling
SDL_UpdateRect(screen, 0, 0, 0, 0) for non-HW surfaces.

regards,

Colin–
Colin Tuckley | @Colin_Tuckley | PGP/GnuPG Key Id
+44(0)1903 236872 | +44(0)7799 143369 | 0x1B3045CE

It is well known that Discworld trolls loose intelligence as they warm up.
Does this mean that a particularly hot headed troll would be a lava lout?

On 6/6/07, Colin Tuckley wrote:

The docs say that calling SDL_Flip is equivalent to calling
SDL_UpdateRect(screen, 0, 0, 0, 0) for non-HW surfaces.

Don’t update the whole screen every time; update only what you change. That is the
way I’m doing it in my current game, and it speeds up the video as well.

Rúben Lúcio Reis

On Wed, 06 Jun 2007 19:18:53 +0200, Colin Tuckley wrote:

SDL_HWSURFACE is normally faster than SDL_SWSURFACE.

Unfortunately that isn’t an option, I’m in an XWindows environment on Linux
and can’t use HWSURFACE.

It wasn’t a good idea anyhow, seeing how you modify the surface for each character you output.

SDL_Flip works nicely with double buffering active. If you don’t have double
buffering active, try the SDL_UpdateRects way.

The docs say that calling SDL_Flip is equiv to calling
SDL_UpdateRect(screen, 0, 0, 0, 0) for non HW surfaces.

Maybe you should use a dirty-rectangle approach instead of flipping (call SDL_UpdateRects on only the regions that changed), but it depends on how much of the screen actually changes (I get the impression that it isn’t much). It would help to get some timings and figure out where the time is actually spent. I’d wager it’s the flips.

If we ignore screen updates for a minute, I’d probably try a glyph strip (vertical, for cache reasons) on a paletted SWSURFACE, and then use SDL_SetColors() to get the color right before doing a normal surface-to-surface blit. I can’t swear it’d be faster though, because we are talking about very small blits here, and you’d incur a conversion hit too.


“There is no fitness function for ‘fun’” – John Hancock
Eddy L O Jansson | http://gazonk.org/~eloj

Rúben Lúcio wrote:

Don’t update the whole screen every time; update only what you change. That is
the way I’m doing it in my current game, and it speeds up the video as well.

I’ve tried that too, I’ve replaced:

if (echo) SDL_Flip(screen0);

with

if (echo) SDL_UpdateRect(screen0, xtext * 8, ytext * 8, 8, 8);

so that only the 8x8 cell I’ve changed is updated. It doesn’t make any
difference.

My test program is a small BASIC program that counts to 100 printing each
number in the same place on the screen while keeping track of clock ticks.
So with cursor movement and drawing I’m drawing about 10 chars worth of
pixels for each count. The results are pretty consistent.

Colin–

Eddy L O Jansson wrote:

Wasn’t a good idea anyhow, seeing how you modify the surface for each character you output.

I don’t always modify the surface for each character, when printing a
number for example it only gets updated after all the digits have been drawn.

Maybe you should use a dirty-rectangle approach

As I said to Rúben, I’ve tried that.

You suggested timing things. How? I’m beginning to suspect that it’s all the
pixel manipulation that’s taking the time.

If we ignore screen updates for a minute, I’d probably try the glyph strip (vertical, for cache reasons) using a paletted SWSURFACE and then use SDL_SetColors() to get the color right before using a normal surface-to-surface blit. Can’t swear it’d be faster though, because we are talking about very small blits here, and you’d incur a conversion hit too.

Do you have an example somewhere of doing that which I could look at?

regards,

Colin–

On Wed, 06 Jun 2007 19:49:02 +0200, Colin Tuckley wrote:

Maybe you should use a dirty-rectangle approach

As I said to Rúben, I’ve tried that.

Even so, I’d be shocked if this wasn’t an update issue. Do you have just the one SDL_Flip in your code? Whatever the case, its placement is odd and probably bad design, unless it is merely temporary for debugging. It’s not clear to me what you actually measured when you say it didn’t make a difference. (Sorry to be a doubter, but this “SDL is slow” thing has come up a “few” times here in the past.)

You suggested timing things. How? I’m beginning to suspect that it’s all the
pixel manipulation that’s taking the time.

You can use Uint32 SDL_GetTicks(void) to track the relative time spent doing different tasks. Collecting and comparing the ticks spent doing pixel manipulation versus the ticks spent in SDL_Flip would be my first idea.

Do you have an example somewhere of doing that which I could look at?

Not really, but it’s fairly straightforward: prepare a surface with room for all your glyphs, get them on there, and then use just an SDL_Rect to slide to the correct y-position and blit to your destination surface.

One thing: if you are going to fiddle with ->pixels, don’t forget to add if( SDL_MUSTLOCK(surface) ) SDL_LockSurface(surface) and the corresponding SDL_UnlockSurface() when you’re done.
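
For what it is worth, the glyph strip idea could be sketched roughly like this in plain C, without SDL, so that the memory layout is easy to see. The names build_strip, blit_glyph, GLYPH_W and GLYPH_H are made up for illustration. A two-entry palette array stands in for what SDL_SetColors() would set on a real paletted surface, and the pointer offset stands in for sliding an SDL_Rect down the strip. Whether this beats the per-pixel version would still need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { GLYPH_W = 8, GLYPH_H = 8 };

/* Build a vertical strip of 8-bit palette indices (0 = background,
 * 1 = foreground) from 8-byte row masks, MSB = leftmost pixel.
 * This is done once at start-up, not per character drawn. */
void build_strip(uint8_t *strip, const uint8_t (*font)[GLYPH_H], size_t nglyphs)
{
    assert(strip != NULL && font != NULL);
    for (size_t g = 0; g < nglyphs; g++)
        for (int y = 0; y < GLYPH_H; y++)
            for (int x = 0; x < GLYPH_W; x++)
                strip[(g * GLYPH_H + y) * GLYPH_W + x] =
                    (uint8_t)((font[g][y] >> (7 - x)) & 1u);
}

/* Copy glyph g into a 32-bit destination, translating each index through
 * a 2-entry palette. Changing palette[] changes the colours without
 * touching the strip. pitch_px is the destination row length in pixels. */
void blit_glyph(uint32_t *dst, size_t pitch_px, const uint8_t *strip,
                size_t g, const uint32_t palette[2])
{
    /* "Slide" to the glyph: it starts at row g * GLYPH_H of the strip. */
    const uint8_t *src = strip + g * GLYPH_H * GLYPH_W;
    for (int y = 0; y < GLYPH_H; y++)
        for (int x = 0; x < GLYPH_W; x++)
            dst[(size_t)y * pitch_px + x] = palette[src[y * GLYPH_W + x]];
}
```

With SDL itself, the strip would live in an 8-bit surface and the copy would be an ordinary SDL_BlitSurface() with a source rectangle at y = glyph index * 8; the loop above only shows what that blit has to do per pixel.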

Hello,

I also use a bitmapped font. I use the putpixel example from the SDL
documentation to draw directly to the software surface.
(SDL Guide / 2. Graphics and Video)

/* draws the raw char - with no interpretation */
static void
avatar_drawchar (wint_t ch)
{
  int lx, ly;
  size_t font_offset;

  font_offset = get_font_offset (ch);

  SDL_LockSurface (screen);
  for (ly = 0; ly < FONTHEIGHT; ly++)
    for (lx = 0; lx < FONTWIDTH; lx++)
      {
        if (font[font_offset + ly] & (1 << (7 - lx)))
          putpixel (screen, cursor.x + lx, cursor.y + ly, textcolor);
      }
  SDL_UnlockSurface (screen);
}

As you can see, I don’t draw the background pixels. But you only have to
add an “else” branch to do that.
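
A minimal sketch of that foreground-plus-background variant, assuming the 8x8 layout from the original post (8 row masks per glyph, most significant bit on the left) rather than my own font format. It writes into a plain 32-bit pixel buffer so it builds without SDL. The function name draw_glyph and the pitch_px parameter are hypothetical; with a 32-bit SDL surface pitch_px would be the surface pitch divided by 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one 8x8 glyph into a 32-bit pixel buffer, writing the foreground
 * colour for set bits and the background colour for clear bits in a single
 * pass, so no separate background fill is needed beforehand. */
void draw_glyph(uint32_t *dst, size_t pitch_px,
                const uint8_t rows[8], uint32_t fg, uint32_t bg)
{
    assert(dst != NULL && rows != NULL && pitch_px >= 8);
    for (int y = 0; y < 8; y++) {
        uint32_t *p = dst + (size_t)y * pitch_px;  /* start of row y */
        uint8_t bits = rows[y];
        for (int x = 0; x < 8; x++)
            p[x] = (bits & (0x80u >> x)) ? fg : bg;
    }
}
```

Because every pixel of the cell is written, the result could go straight into the screen surface (locked if SDL_MUSTLOCK says so) instead of through an intermediate 8x8 surface and a blit.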

Maybe you could hold the screen lock across a larger run of text.
But in my program I really want the text to be shown letter by letter,
with an SDL_Delay in between. ;-)
AKFoerster

On Wednesday, 06 Jun 2007, Andreas K. Foerster wrote:

if (font[font_offset + ly] & (1 << (7 - lx)))

Oops, the 7 here is the FONTWIDTH.
Yes, it is really 7, not 8.


AKFoerster

Eddy L O Jansson wrote:

Even so, I’d be shocked if this wasn’t an update issue. Do you have just the one SDL_Flip in your code?

I’ve got several SDL_Flip calls, one in each logical screen update section.
I’ve done a bit more investigation and found that the time hog was the
cursor undraw/draw routine. I’ve changed the SDL_Flip in that to an
SDL_UpdateRect of the correct size and that has helped a lot.

It was only doing one SDL_Flip for drawing all the digits of the number, but
it was doing 4 more for the cursor moves.

You can use Uint32 SDL_GetTicks(void) to track the relative time spent doing different tasks.

I just tried this, but it’s too coarse: I get the same tick number right through
drawing a character, blitting and updating.
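
One possible way around the coarse tick, sketched under the assumption of a POSIX system such as Linux: time many repetitions of the same operation and divide, using the monotonic clock from clock_gettime(). The helper names now_ns, time_avg_ns and fill_cell are invented for this sketch, and fill_cell is only a stand-in workload for the real character drawing.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock time in nanoseconds from the POSIX monotonic clock. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run fn(arg) `reps` times and return the average cost per call in ns.
 * Repeating the call lets even a coarse clock resolve operations that
 * individually finish well inside one tick. */
uint64_t time_avg_ns(void (*fn)(void *), void *arg, unsigned reps)
{
    assert(fn != NULL && reps > 0);
    uint64_t start = now_ns();
    for (unsigned i = 0; i < reps; i++)
        fn(arg);
    return (now_ns() - start) / reps;
}

/* Stand-in workload: fill one 8x8 cell of 32-bit pixels. */
void fill_cell(void *arg)
{
    volatile uint32_t *px = arg;
    for (int i = 0; i < 64; i++)
        px[i] = 0xFFFFFFu;
}
```

The same repeat-and-divide trick also works with SDL_GetTicks() if the loop runs long enough to span many milliseconds; timing the drawing and the screen update in separate loops would show which of the two dominates.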

One thing, if you are going to fiddle with ->pixels,
don’t forget to add if( SDL_MUSTLOCK(surface) ) SDL_LockSurface(surface)
and the corresponding SDL_UnlockSurface() when you’re done.

Is this really necessary if I’m not using Hardware surfaces?

regards,

Colin–

Eddy L O Jansson wrote:

Even so, I’d be shocked if this wasn’t an update issue. Do you have just the one SDL_Flip in your code?

I’ve got several SDL_Flip calls, one in each logical screen update section.
I’ve done a bit more investigation and found that the time hog was the
cursor undraw/draw routine. I’ve changed the SDL_Flip in that to an
SDL_UpdateRect of the correct size and that has helped a lot.

Usually what people do is do all the drawing for a single frame and then
when they’re all done and want to flush the changes so they are visible,
then they call SDL_Flip(). You definitely don’t want to do it when you
are still busy drawing things like the cursor.
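
A rough sketch of how that batching might look when only part of the screen changes: each drawing routine records the rectangle it touched, and a single flush at the end pushes the combined area. This is plain C with a hypothetical Rect type standing in for SDL_Rect; the names DirtyRegion, mark_dirty and take_dirty are assumptions made for this example, not SDL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SDL_Rect so the sketch builds without SDL. */
typedef struct { int x, y, w, h; } Rect;

/* Bounding box of everything drawn since the last flush. */
typedef struct { Rect box; bool dirty; } DirtyRegion;

/* Grow the pending region so that it also covers r. */
void mark_dirty(DirtyRegion *d, Rect r)
{
    assert(d != NULL && r.w > 0 && r.h > 0);
    if (!d->dirty) {
        d->box = r;
        d->dirty = true;
        return;
    }
    int x0 = d->box.x < r.x ? d->box.x : r.x;
    int y0 = d->box.y < r.y ? d->box.y : r.y;
    int x1 = d->box.x + d->box.w > r.x + r.w ? d->box.x + d->box.w : r.x + r.w;
    int y1 = d->box.y + d->box.h > r.y + r.h ? d->box.y + d->box.h : r.y + r.h;
    d->box = (Rect){x0, y0, x1 - x0, y1 - y0};
}

/* Hand back the accumulated rectangle and reset the region. Returns false
 * when nothing was drawn, so the caller can skip the screen update. */
bool take_dirty(DirtyRegion *d, Rect *out)
{
    assert(d != NULL && out != NULL);
    if (!d->dirty)
        return false;
    *out = d->box;
    d->dirty = false;
    return true;
}
```

In an SDL 1.2 program the character and cursor routines would call mark_dirty() instead of flipping, and one place per logical update would call take_dirty() and pass the result to SDL_UpdateRect(). A single bounding box is the simplest choice; a list of rectangles passed to SDL_UpdateRects() would avoid pushing untouched pixels when the changes are far apart.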

See ya!
-Sam Lantinga, Lead Software Engineer, Blizzard Entertainment

Sam Lantinga wrote:

Usually what people do is do all the drawing for a single frame and then
when they’re all done and want to flush the changes so they are visible,
then they call SDL_Flip(). You definitely don’t want to do it when you
are still busy drawing things like the cursor.

I realise this, however this isn’t a game, it’s a BASIC interpreter. There
are times when the user is doing screen editing for example that he needs to
see the cursor movement for every keypress. It makes the logic
complicated, I try to work out how interactive the current situation is and
adjust the update/flush accordingly.

regards,

Colin–

Usually, even if you do a full screen redraw every frame
(with “reasonable” resolutions), you get frame rates at least twice
as high as the fastest keyboard repeat rates normally available.
There should be only two cases where you really need to update some
8x8 pixels or so at a time:

1. You want to use as little CPU time as possible.

2. You need *insane* frame rates (1000+ fps) to
   get ultra smooth, tearing free animation without
   retrace sync.

The first one is always a good idea, of course - although usually, you
also need to get the job done, so “First make it work, then make it
work fast.”

The second one is really only an option for games with mostly still
screens and a few small moving objects, and it’s just a last resort
if you can’t do it properly - that is, using a retrace synchronized
double (or triple) buffered display. (The higher the frame rate, the
smaller the difference between frames, and thus, the less tearing and
refresh/frame rate interference.)

//David Olofson - Programmer, Composer, Open Source Advocate

.------- http://olofson.net - Games, SDL examples -------.
| http://zeespace.net - 2.5D rendering engine |
| http://audiality.org - Music/audio engine |
| http://eel.olofson.net - Real time scripting |
'-- http://www.reologica.se - Rheology instrumentation --'

David Olofson wrote:

There should be only two cases where you really need to update some
8x8 pixels or so at a time:

1. You want to use as little CPU time as possible.

2. You need insane frame rates (1000+ fps) to
   get ultra smooth, tearing free animation without
   retrace sync.

Well, that might be interesting if, for example, only a few things change
at a time. Think about some vi-like editor: not many chars change during
editing. And from what I recall from my BASIC days, it was so slow that
not many things were able to move at once (granted, it was BASIC on a
4 MHz CPU).

A nice side effect is that this makes the application totally suitable
for rendering remotely over the X11 protocol.

Stephane