The seemingly incredible cost of SDL_UpdateRect

I am having difficulty getting my SDL application to run at a decent
speed. My application is an emulator, and it draws 50 frames per second.
I draw straight to the surface returned by SDL_SetVideoMode, then use
UpdateRect. This is a windowed application under Windows 2000. When
calling SetVideoMode, I use the flag SDL_ANYFORMAT and specify a bpp of
0, in order to use the native pixel format of my display.

Per frame, my program operation is to do emulation of the system for the
time duration of the frame, then to lock my surface, draw any changes to
it, unlock the surface and UpdateRect. This happens at most 50 times a
second - a combination of GetTickCount and Delay making sure of that. My
display is running at 75Hz.

My card supports hardware acceleration, including hardware and software
surface blitting.

I find that if I don’t use UpdateRect, then my image data never appears,
seeming to hint strongly that SDL is not so Direct as its name might
like to pretend.

Compared to an earlier version of my program which ran natively under
DirectX, I found my SDL version to be much slower. So I profiled it, and
incredibly UpdateRect is taking 78% of the processing time. 78%! For a
function that by rights should do nothing.

I’ve checked the pixelformat of the SDL_Surface returned by SetVideoMode
and it seems to match that of my display. I don’t know my byte ordering
offhand, but SDL certainly comes up with a 24bpp pixel format without
any clues from me.

So, what’s going on? Why is UpdateRect so expensive, and is there any
way I can cut its cost? If not then I guess I cannot justify the use of SDL.

-Thomas

Compared to an earlier version of my program which ran natively under
DirectX, I found my SDL version to be much slower. So I profiled it, and
incredibly UpdateRect is taking 78% of the processing time. 78%! For a
function that by rights should do nothing.

Have you been able to step into the function and see what’s going on?
If you want, you can send me your code (and your DirectX code) and I’ll
take a look too.

You might pass SDL_HWSURFACE to SDL_SetVideoMode() as well, and see if
that improves things, but you’ll have to lock the video surface before
you write to it.
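
Something roughly like this (untested, off the top of my head - substitute
whatever width, height and bpp you're actually using):

SDL_Surface *screen = SDL_SetVideoMode(640, 480, 0,   /* 640x480 is just an example mode */
                                       SDL_ANYFORMAT | SDL_HWSURFACE);

if(SDL_MUSTLOCK(screen))
    SDL_LockSurface(screen);

/* write your pixels via screen->pixels and screen->pitch here */

if(SDL_MUSTLOCK(screen))
    SDL_UnlockSurface(screen);

SDL_UpdateRect(screen, 0, 0, 0, 0);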

See ya!
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Perhaps you should post some code, so that the SDL uber-gurus (which I am not)
can give you a hand. If you’re using SDL_UpdateRect a lot, try using
SDL_UpdateRects instead. Or try double-buffering (in which case neither
SDL_UpdateRect nor SDL_UpdateRects would be needed, just a call to SDL_Flip).
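
For example (untested sketch - the rectangles here are made up, use your real
dirty regions):

SDL_Rect dirty[2];

/* example coordinates only */
dirty[0].x = 0;   dirty[0].y = 0;   dirty[0].w = 320; dirty[0].h = 256;
dirty[1].x = 320; dirty[1].y = 256; dirty[1].w = 320; dirty[1].h = 256;

/* one call pushes both rectangles, instead of two SDL_UpdateRect calls */
SDL_UpdateRects(screen, 2, dirty);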

You said your card supports hardware acceleration, but is it actually getting
used? Some cards can only do it when in full screen mode. You might try this
C code:
if((screen->flags & SDL_HWSURFACE) == 0)
    fprintf(stderr, "Warning: can't get hardware surface\n");

You’ll need to start your program from the console to see that message
(Start->Programs->Accessories->Command Prompt). Using software surfaces is
much slower than hardware surfaces.

Also, why do you assume that SDL_UpdateRect should do nothing?

-Sean Ridenour

Sean Ridenour:

If you’re using SDL_UpdateRect a lot, try using
SDL_UpdateRects instead. Or try double-buffering (in which case neither
SDL_UpdateRect nor SDL_UpdateRects would be needed, just a call to
SDL_Flip).

I’m using SDL_UpdateRect exactly once per frame, after I finish writing to
the frame buffer. The loop is, in pseudocode:

FrameBuffer = SDL_SetVideoMode
while(looping)
{
    PerformEmulation();

    Lock(FrameBuffer);
    draw all altered pixels;
    Unlock(FrameBuffer);
    UpdateRect(FrameBuffer, 0, 0, 0, 0);

    NewTime = GetTicks;
    Difference = (Uint32)(NewTime - OldTime);
    if(Difference < 20 && Difference > 10)
        Delay(Difference);
    OldTime += 20;
}

The GetTick/Delay logic is slightly more complicated than that with respect
to allowing for being hopelessly behind, but that is the gist. To put things
entirely in perspective, this is a Celeron 533 with integrated graphics
based on the Intel i742 chipset (which has accelerated 2D/3D under both
Windows and X). If I disable UpdateRect then I get no display, but the “draw
all altered pixels” code starts to occupy about 40% of my time, and other
segments of my code scale accordingly.

You said your card supports hardware acceleration, but is it actually
getting used?

I based my assertion on the contents of the SDL VideoInfo struct, but will
try the test you suggested.
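
For what it’s worth, the check is roughly this (reconstructed from memory, so
treat it as a sketch rather than my exact code):

const SDL_VideoInfo *Info = SDL_GetVideoInfo();

/* these fields are what I took to mean "hardware acceleration is available" */
fprintf(stderr, "hw_available=%d blit_hw=%d blit_sw=%d video_mem=%dKB\n",
        Info->hw_available, Info->blit_hw, Info->blit_sw,
        (int)Info->video_mem);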

Also, why do you assume that SDL_UpdateRect should do nothing?

I’m locking the front surface - that returned by SDL_SetVideoMode - then
drawing to it in its native format and unlocking it. The native format of
the surface is picked by SDL to match the video hardware, so the bytes I
write are identical to the bytes that end up in the framebuffer. Therefore,
SDL has to do no processing on my data between me writing it and it
appearing on screen.

At the very most I’d expect SDL to be keeping a secondary HW surface and
blitting that on UpdateRect, allowing for clipping rectangles due to any
obscuring windows I have in the foreground or whatever. However I would be
highly surprised if that blit took 78% of my processing time.

Sean Ridenour:

Perhaps you should post some code, so that the SDL uber-gurus (which I am
not) can give you a hand.

Sam Lantinga:

If you want, you can send me your code (and your DirectX code) and I’ll
take a look too.

I’m moving today (I was hoping that I was making a common or obvious mistake
and so there might just be a solution waiting when I moved… hoping being
the operative word). I will slice my code up for posting once settled at my
new address - shouldn’t be too long.

Thanks for the tips so far!

-Thomas

I’m moving today (I was hoping that I was making a common or obvious mistake
and so there might just be a solution waiting when I moved… hoping being
the operative word). I will slice my code up for posting once settled at my
new address - shouldn’t be too long.

I’ve found some time now.

And, thanks to Sean Ridenour, I found out what is going on. For whatever
reason, SDL is unable to create a hardware surface for my ‘FrameBuffer’
(w.r.t. the code below), so the 78% must be the cost of trying to throw
about 47MB (in the 24bpp mode my card defaults to) across the bus every
second.
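
(To spell that figure out: assuming UpdateRect really does copy the whole
640x512 window each time, at 3 bytes per pixel and 50 updates a second that
is 640 x 512 x 3 x 50 = 49,152,000 bytes - about 47MB - every second.)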

Specifically, I find that the flags member of my front surface does not have
the SDL_HWSURFACE flag set, even though SetVideoMode succeeds when given the
SDL_HWSURFACE flag - something I hadn’t thought to check.

The significant parts of my SDL code follow below. As I say, on my Celeron
533, UpdateRect is given as taking 78% of processing time. Trying the same
thing here on a Celeron 1.4 with pretty much the same graphics card gives
pretty much the same results. DXDiag claims that the graphics card on the
1.4GHz system has 32MB “Approx. Total Memory”, and it is running in 1024x768
24-bit. I therefore have no idea why I cannot get a video buffer. The card in
my 533MHz machine is in the same mode but apparently only has 4MB “Approx.
Total Memory”.

I have, in the short term, removed my DirectX code entirely from the project,
having had nothing but excellent reports on the Linux-related SDL port of my
project. I will endeavour to dearchive it and post it in the near future.

Code:

CDisplay::CDisplay()
{
    DispFlags = SDL_FULLSCREEN | SDL_ANYFORMAT;
    ToggleFullScreen();
}

void CDisplay::ToggleFullScreen()
{
    do
    {
        DispFlags ^= SDL_FULLSCREEN;

        if(DispFlags&SDL_FULLSCREEN)
        {
            FrameBuffer = SDL_SetVideoMode(800, 600, 8,
                              DispFlags | SDL_HWPALETTE | SDL_HWSURFACE);
            if(!FrameBuffer) FrameBuffer = SDL_SetVideoMode(800, 600, 8,
                              DispFlags | SDL_HWPALETTE);
            if(!FrameBuffer) FrameBuffer = SDL_SetVideoMode(800, 600, 8, DispFlags);
            SDL_ShowCursor(SDL_DISABLE);
        }
        else
        {
            /* NB: this first DispFlags | SDL_HWSURFACE succeeds on my hardware */
            FrameBuffer = SDL_SetVideoMode(640, 512, 0, DispFlags | SDL_HWSURFACE);
            if(!FrameBuffer) FrameBuffer = SDL_SetVideoMode(640, 512, 0, DispFlags);
            SDL_ShowCursor(SDL_ENABLE);

            /* This does happen! */
            if(!(FrameBuffer->flags & SDL_HWSURFACE))
                fprintf(stderr, "Warning: can't get hardware surface\n");
        }
    }
    while(!FrameBuffer);

    if(FrameBuffer->format->BytesPerPixel == 1)
    {
        SDL_Color Pal[16];

        int c = 8;
        while(c--)
        {
            Pal[c^7].r = (c&4) ? 0xff : 0;
            Pal[c^7].g = (c&2) ? 0xff : 0;
            Pal[c^7].b = (c&1) ? 0xff : 0;
        }

        SDL_SetColors(FrameBuffer, Pal, 0, 16);
    }
}

void CDisplay::DrawDisplay()
{
    if(!SDL_LockSurface(FrameBuffer))
    {
        /* draw some pixels here */
        SDL_UnlockSurface(FrameBuffer);
        SDL_UpdateRect(FrameBuffer, XOffset, YOffset, XOffset+640, YOffset+512);
    }
}

void Go()
{
    SDL_Init(SDL_INIT_EVERYTHING);
    CDisplay *Disp = new CDisplay();

    Uint32 FrameStart = SDL_GetTicks();
    bool Quit = false;
    while(!Quit)
    {
        /* (pared down) event management */
        SDL_Event ev;
        while(SDL_PollEvent(&ev))
        {
            switch(ev.type)
            {
                case SDL_QUIT : Quit = true; break;
            }
        }

        /* there is actually processing between these in the real program, but
           the point is that two frames are expected in 40ms */
        Disp->DrawDisplay();
        Disp->DrawDisplay();

        /* timing */
        Uint32 Difference = SDL_GetTicks();
        Difference -= FrameStart;

        Difference -= 40;
        FrameStart += 40;

        if(Difference < 40)
            SDL_Delay(Difference);
        else if(Difference > 460)
            FrameStart = SDL_GetTicks();
    }
}

-Thomas

And, thanks to Sean Ridenour, I found out what is going on. For whatever
reason, SDL is unable to create a hardware surface for my ‘FrameBuffer’
(w.r.t. the code below), so the 78% must be the cost of trying to throw
about 47MB (in the 24bpp mode my card defaults to) across the bus every
second.

Try turning the #if 0 at line 1397 in SDL_dx5video.c to #if 1 and see if
that works for you.

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Try double buffering. According to the SDL docs, it only works with hardware
surfaces, but if a hardware surface isn’t available then it will just
SDL_UpdateRect() the whole screen for you. The lines marked with <--------
are the ones you need to change.

There’s a nice article on O’Reilly about SDL hardware surfaces. The URL is
http://linux.oreillynet.com/pub/a/linux/2003/08/07/sdl_anim.html

To use double buffering, do something like this:

CDisplay::CDisplay()
{
    DispFlags = SDL_FULLSCREEN | SDL_ANYFORMAT | SDL_DOUBLEBUF; <--------
    ToggleFullScreen();
}

void CDisplay::ToggleFullScreen()
{
    do
    {
        DispFlags ^= SDL_FULLSCREEN;

        if(DispFlags&SDL_FULLSCREEN)
        {
            FrameBuffer = SDL_SetVideoMode(800, 600, 8,
                              DispFlags | SDL_HWPALETTE | SDL_HWSURFACE);
            if(!FrameBuffer) FrameBuffer = SDL_SetVideoMode(800, 600, 8,
                              DispFlags | SDL_HWPALETTE);
            if(!FrameBuffer) FrameBuffer = SDL_SetVideoMode(800, 600, 8, DispFlags);
            SDL_ShowCursor(SDL_DISABLE);
        }
        else
        {
            /* NB: this first DispFlags | SDL_HWSURFACE succeeds on my hardware */
            FrameBuffer = SDL_SetVideoMode(640, 512, 0, DispFlags | SDL_HWSURFACE);
            if(!FrameBuffer) FrameBuffer = SDL_SetVideoMode(640, 512, 0, DispFlags);
            SDL_ShowCursor(SDL_ENABLE);

            /* This does happen! */
            if(!(FrameBuffer->flags & SDL_HWSURFACE))
                fprintf(stderr, "Warning: can't get hardware surface\n");
        }
    }
    while(!FrameBuffer);

    if(FrameBuffer->format->BytesPerPixel == 1)
    {
        SDL_Color Pal[16];

        int c = 8;
        while(c--)
        {
            Pal[c^7].r = (c&4) ? 0xff : 0;
            Pal[c^7].g = (c&2) ? 0xff : 0;
            Pal[c^7].b = (c&1) ? 0xff : 0;
        }

        SDL_SetColors(FrameBuffer, Pal, 0, 16);
    }
}

void CDisplay::DrawDisplay()
{
    if(!SDL_LockSurface(FrameBuffer))
    {
        /* draw some pixels here */
        SDL_UnlockSurface(FrameBuffer);
        SDL_Flip(FrameBuffer); <--------
    }
}

void Go()
{
    SDL_Init(SDL_INIT_EVERYTHING);
    CDisplay *Disp = new CDisplay();

    Uint32 FrameStart = SDL_GetTicks();
    bool Quit = false;
    while(!Quit)
    {
        /* (pared down) event management */
        SDL_Event ev;
        while(SDL_PollEvent(&ev))
        {
            switch(ev.type)
            {
                case SDL_QUIT : Quit = true; break;
            }
        }

        /* there is actually processing between these in the real program, but
           the point is that two frames are expected in 40ms */
        Disp->DrawDisplay();
        Disp->DrawDisplay();

        /* timing */
        Uint32 Difference = SDL_GetTicks();
        Difference -= FrameStart;

        Difference -= 40;
        FrameStart += 40;

        if(Difference < 40)
            SDL_Delay(Difference);
        else if(Difference > 460)
            FrameStart = SDL_GetTicks();
    }
}

-Sean Ridenour