SDL sound under Mac OS X and Linux

Hi,
I am porting an emulator to OS X using SDL and I have two problems that
I am having some trouble working out:

  1. I write unsigned 8-bit samples to my sound card (padded to 16-bit if
    that’s all that is available). Using OSS to output the samples sounds
    fine, but using SDL under Linux or OS X sounds clipped when I am
    outputting maximum volume (maximum value=128 ± 124). Is this a known
    problem with SDL sound output, or should I be reducing the max amplitude
    of my samples?

  2. I generate 1/50th of a second of audio at a time and write it to a
    ring buffer; the SDL callback then reads the data and writes it to the
    sound card. Under Linux this works well, CPU usage is low and there are
    practically no sound glitches. The same code on Mac OS X mostly works,
    but if there is any movement in other windows the sound from my app
    glitches. If I minimise my app or go fullscreen the glitches disappear.
    Is anyone else seeing this and are there any suggestions about fixing
    the problem?

Thanks,
Fred

Hi,
I am porting an emulator to OS X using SDL and I have two problems
that I am having some problem working out:

  1. I write unsigned 8 bit samples to my sound card (padded to 16bit if
    that’s all that is available). Using OSS to output the samples sounds
    fine, but using SDL under Linux or OS X sounds clipped when I am
    outputting maximum volume (maximum value=128 ± 124). Is this a known
    problem with SDL sound output, should I be reducing the max amplitude
    of my samples?

I’ve also noticed this in the past. I’m not sure what the cause is. It
might be a bug in mixing of 8 bit samples, or conversion of 8 bit to 16
bit samples. Does initializing SDL with 16 bit samples and converting
your samples to 16 bit yourself fix the problem?

This is pure speculation, but I think it’s possible that after SDL
hands off the samples they are processed again by the sound daemon (or
entity that controls the system sound volume) and at that point there
might be some distortion. In this case, using a sound format that
mirrors the hardware/OS setting might solve the problem.
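A minimal sketch of that experiment, assuming the emulator produces unsigned
8-bit samples; fill_sound_buffer() and the 44100 Hz rate are placeholders for
the emulator’s own code and settings:

#include "SDL.h"

extern Uint8 *fill_sound_buffer(int nsamples);   /* placeholder: emulator ring buffer */

static void audio_callback(void *userdata, Uint8 *stream, int len)
{
    Sint16 *out = (Sint16 *)stream;
    int nsamples = len / 2;                      /* two bytes per 16-bit sample */
    Uint8 *src = fill_sound_buffer(nsamples);
    int i;

    for (i = 0; i < nsamples; i++) {
        /* U8 samples are centred on 128; shift to signed and widen to 16 bits */
        out[i] = (Sint16)((src[i] - 128) << 8);
    }
}

int open_audio_16bit(void)
{
    SDL_AudioSpec desired;

    desired.freq = 44100;                        /* placeholder sample rate */
    desired.format = AUDIO_S16SYS;               /* native-endian signed 16-bit */
    desired.channels = 1;
    desired.samples = 1024;                      /* callback buffer size in frames */
    desired.callback = audio_callback;
    desired.userdata = NULL;

    return SDL_OpenAudio(&desired, NULL);
}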

  2. I generate 1/50th of a second of audio at a time and write it to a
    ring buffer, the SDL callback then reads the data and writes it to the sound
    sound card. Under Linux this works well, CPU usage is low and there
    are practically no sound glitches. The same code on Mac OS X mostly
    works, but if there is any movement in other windows the sound from my
    app glitches. If I minimise my app or go fullscreen the glitches
    disappear. Is anyone else seeing this and are there any suggestions
    about fixing the problem?

I’ve heard of this before. It might be because when you drag the
application window (I presume dragging other application windows
doesn’t have this problem), the main thread is blocked temporarily (I
don’t really know why it would be, though). If you are using
SDL_LockAudio() and copying to the ring buffer in the main thread, you
might have breakups if your audio buffers are too small. See if using a
larger audio buffer decreases this effect.
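In SDL 1.2 terms the suggestion amounts to two things: ask for a larger
"samples" value in the SDL_AudioSpec (for example 2048 instead of 512), and
make sure ring-buffer writes from the main thread are bracketed with
SDL_LockAudio(). A rough sketch, with ring_buffer_write() standing in for the
emulator’s own code:

#include "SDL.h"

extern void ring_buffer_write(const Uint8 *data, int nbytes);   /* placeholder */

/* called once per emulated frame from the main thread */
void queue_frame_of_audio(const Uint8 *samples, int nbytes)
{
    SDL_LockAudio();                 /* keeps the audio callback out */
    ring_buffer_write(samples, nbytes);
    SDL_UnlockAudio();
}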

Presuming I’m correct in my diagnosis, I don’t think simply using a
separate thread to copy data to the ring buffer will be sufficient to
eliminate the problem. Whatever system you choose, it may be difficult
to keep hiccups in the main thread from interrupting your sound.

On Thursday, November 21, 2002, at 04:06 PM, Fredrick Meunier wrote:

Darrell Walisser wrote:

Hi,
I am porting an emulator to OS X using SDL and I have two problems
that I am having some problem working out:

  1. I write unsigned 8 bit samples to my sound card (padded to 16bit if
    that’s all that is available). Using OSS to output the samples sounds
    fine, but using SDL under Linux or OS X sounds clipped when I am
    outputting maximum volume (maximum value=128 ± 124). Is this a known
    problem with SDL sound output, should I be reducing the max amplitude
    of my samples?

I’ve also noticed this in the past. I’m not sure what the cause is. It
might be a bug in mixing of 8 bit samples,
I don’t think so, this code is simple.

or conversion of 8 bit to 16
If you want to know if some sound conversion is going on, edit
SDL12/src/audio/SDL_audiocvt.c and #define DEBUG_CONVERT at the beginning,
then rebuild SDL. You’ll get conversion messages on stdout.
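If rebuilding SDL is inconvenient, one alternative (a sketch, not from the
original mail) is to pass an "obtained" spec to SDL_OpenAudio() and compare it
with what was requested; any mismatch is what SDL would otherwise have to
convert:

#include <stdio.h>
#include "SDL.h"

void report_audio_mismatch(const SDL_AudioSpec *desired,
                           const SDL_AudioSpec *obtained)
{
    if (desired->format   != obtained->format ||
        desired->freq     != obtained->freq   ||
        desired->channels != obtained->channels) {
        printf("hardware format differs: fmt 0x%x vs 0x%x, "
               "%d Hz vs %d Hz, %d vs %d channels\n",
               desired->format, obtained->format,
               desired->freq, obtained->freq,
               desired->channels, obtained->channels);
    }
}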

Stephane

On Thursday, November 21, 2002, at 04:06 PM, Fredrick Meunier wrote:

Hi Darrell,

Darrell Walisser wrote:

  1. I write unsigned 8 bit samples to my sound card (padded to 16bit if
    that’s all that is available). Using OSS to output the samples sounds
    fine, but using SDL under Linux or OS X sounds clipped when I am
    outputting maximum volume (maximum value=128 ± 124). Is this a known
    problem with SDL sound output, should I be reducing the max amplitude
    of my samples?

I’ve also noticed this in the past. I’m not sure what the cause is. It
might be a bug in mixing of 8 bit samples, or conversion of 8 bit to 16
bit samples. Does initializing SDL with 16 bit samples and converting
your samples to 16 bit yourself fix the problem?

No, converting the samples to 16 bit myself still leaves the (loudest)
samples sounding distorted under SDL on Linux and Mac OS X, while they
sound perfect on Linux using OSS.

This is pure speculation, but I think it’s possible that after SDL hands
off the samples they are processed again by the sound daemon (or entity
that controls the system sound volume) and at that point there might be
some distortion. In this case, using a sound format that mirrors the
hardware/OS setting might solve the problem.

I’ll try a few different combinations and see if it helps.

  2. I generate 1/50th of a second of audio at a time and write it to a
    ring buffer, the SDL callback then reads the data and writes it to the
    sound card. Under Linux this works well, CPU usage is low and there
    are practically no sound glitches. The same code on Mac OS X mostly
    works, but if there is any movement in other windows the sound from my
    app glitches. If I minimise my app or go fullscreen the glitches
    disappear. Is anyone else seeing this and are there any suggestions
    about fixing the problem?

I’ve heard of this before. It might be because when you drag the
application window (I presume dragging other application windows doesn’t
have this problem), the main thread is blocked temporarily (I don’t
really know why it would be, though). If you are using SDL_LockAudio()
and copying to the ring buffer in the main thread, you might have
breakups if your audio buffers are too small. See if using a larger
audio buffer decreases this effect.

Dragging any window or having the dock pop up will cause glitching.
I think the main thread is not so much blocked from running as starved
of CPU as the demands of the window system or other programs increase.
I guess there are two fixes:

  1. Make my program less CPU-hungry
  2. Make the emulator run with real-time priority? Is there a real-time
    scheduling priority available on Mac OS X?

I get much higher CPU load than I would get on Linux (Linux 6% CPU, Mac
OS X 25-50% CPU as reported by top).

I think that the remaining glitching is mostly caused by my graphics
routines not being as fast as they could be.

I create my video mode attempting to match the user’s display like this:

vidinfo = SDL_GetVideoInfo();
gc = SDL_SetVideoMode( width, height, vidinfo->vfmt->BitsPerPixel,
SDL_HWSURFACE|SDL_ANYFORMAT|SDL_RESIZABLE );

I have an offscreen surface that I create with:

image = SDL_CreateRGBSurface( SDL_SWSURFACE, width, height,
gc->format->BitsPerPixel, gc->format->Rmask, gc->format->Gmask,
gc->format->Bmask, gc->format->Amask );

I draw to this surface on a per-pixel basis every frame. I blit the relevant
changed areas from this image to the screen with:

SDL_BlitSurface( image, &updated_rects[num_rects], gc,
&updated_rects[num_rects] );

(as I am emulating the sweep of the video beam over the screen of the
emulated hardware there are changes in the image that should not show up
on screen until the next frame). I keep track of the areas dirtied and
then do a single call to SDL_UpdateRects every frame. This seems plenty
fast on Linux, but is there a better way to do it on Mac OS X?

Thanks,
Fred

On Thursday, November 21, 2002, at 04:06 PM, Fredrick Meunier wrote:


I create my video mode attempting to match the users display like this:

vidinfo = SDL_GetVideoInfo();
gc = SDL_SetVideoMode( width, height, vidinfo->vfmt->BitsPerPixel,
SDL_HWSURFACE|SDL_ANYFORMAT|SDL_RESIZABLE );

I have an offscreen surface that I create with:

image = SDL_CreateRGBSurface( SDL_SWSURFACE, width, height,
gc->format->BitsPerPixel, gc->format->Rmask, gc->format->Gmask,
gc->format->Bmask, gc->format->Amask );

That I draw to on a pixel basis every frame. I blit the relevant
changed areas from this image to the screen with:

SDL_BlitSurface( image, &updated_rects[num_rects], gc,
&updated_rects[num_rects] );

(as I am emulating the sweep of the video beam over the screen of the
emulated hardware there are changes in the image that should not show
up on screen until the next frame). I keep track of the areas dirtied
and then do a single call to SDL_UpdateRects every frame. This seems
plenty fast on Linux, but is there a better way to do it on Mac OS X?

There is a better way. Don’t use SDL_HWSURFACE, and don’t create an
additional software buffer of the same size as the screen. Mac OS X
doesn’t support a hardware screen surface in windowed mode, and windows
in Mac OS X are already double-buffered (note: this doesn’t mean you
should use SDL_Flip(), that’s why SDL_DOUBLEBUF isn’t set on the
surface).
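Assuming the rest of the setup stays the same, the suggested change is just
the surface flag:

vidinfo = SDL_GetVideoInfo();
gc = SDL_SetVideoMode( width, height, vidinfo->vfmt->BitsPerPixel,
SDL_SWSURFACE|SDL_ANYFORMAT|SDL_RESIZABLE );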

In fullscreen mode Mac OS X uses a hardware, single-buffered surface.
But you would be better off to use a software surface, as it will
likely be faster.

On Saturday, November 23, 2002, at 06:03 AM, Fredrick Meunier wrote:


There is a better way. Don’t use SDL_HWSURFACE, and don’t create an
additional software buffer of the same size as the screen. Mac OS X
doesn’t support a hardware screen surface in windowed mode, and windows
in Mac OS X are already double-buffered (note: this doesn’t mean you
should use SDL_Flip(), that’s why SDL_DOUBLEBUF isn’t set on the surface).

First of all thanks for this suggestion, moving to a SDL_SWSURFACE for my
screen halved my average CPU requirements.

The additional software buffer I have is not used to attempt any
double-buffering. It is necessary as a scratch pad for screen updates
from the emulator; only some of the updates to this surface go to the
actual display every frame. Are you suggesting that it would be better
to not use an SDL_Surface for this purpose?

In fullscreen mode Mac OS X uses a hardware, single-buffered surface.
But you would be better off to use a software surface, as it will likely
be faster.

I’ll stick with the software surface when I finally add a fullscreen
mode. In testing I have found that fullscreen is so much faster than
windowed mode that performance isn’t a problem even with unoptimised code.

Thanks,
Fred

On Saturday, November 23, 2002, at 06:03 AM, Fredrick Meunier wrote:

First of all thanks for this suggestion, moving to a SDL_SWSURFACE for my
screen halved my average CPU requirements.

The additional software buffer I have is not used to attempt any
double-buffering. It is necessary as a scratch pad for screen updates
from the emulator; only some of the updates to this surface go to the
actual display every frame. Are you suggesting that it would be better
to not use an SDL_Surface for this purpose?

Actually, I would recommend using the display surface as your scratch
surface, and then use your code which calculates the updates to figure
out what rects you need to pass to SDL_UpdateRects(). That should buy
you increased performance and be safe since you’re using an SDL software
surface.
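Sketched out (with the same updated_rects/num_rects bookkeeping as before,
and the actual pixel writes left as a comment), the suggestion looks
something like this:

#include "SDL.h"

extern SDL_Rect updated_rects[];
extern int num_rects;

void draw_frame(SDL_Surface *screen)
{
    if (SDL_MUSTLOCK(screen))
        SDL_LockSurface(screen);

    /* ... the emulator writes its changed pixels directly into
       screen->pixels here, recording each change in updated_rects ... */

    if (SDL_MUSTLOCK(screen))
        SDL_UnlockSurface(screen);

    SDL_UpdateRects(screen, num_rects, updated_rects);
    num_rects = 0;
}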

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment


Quoting Sam Lantinga:

First of all thanks for this suggestion, moving to a SDL_SWSURFACE for my
screen halved my average CPU requirements.

The additional software buffer I have is not used to attempt any
double-buffering. It is necessary as a scratch pad for screen updates
from the emulator; only some of the updates to this surface go to the
actual display every frame. Are you suggesting that it would be better
to not use an SDL_Surface for this purpose?

Actually, I would recommend using the display surface as your scratch
surface, and then use your code which calculates the updates to figure
out what rects you need to pass to SDL_UpdateRects(). That should buy
you increased performance and be safe since you’re using an SDL software
surface.

I gave this a try and my frame rate dropped by 10% :)

I guess this is a consequence of either Mac OS X windowed mode
peculiarities or my access pattern to the display surface (1-4 pixel
writes/surface lock if required, maybe 6000-18000 pixels changing on
the average frame).

If you are running Quartz Extreme, the window buffer is locked while
being transferred to the video card. Since the buffer is locked, you
can’t copy to it during this DMA transfer. The first time you lock
after SDL_UpdateRects() you’ll have to wait for the DMA transfer to
complete, unless the DMA transfer is already completed at that time.
That’s probably what’s happening to you - you’ll probably also notice
that your CPU utilization is lower as a result of spending more time
waiting on the transfer.

To get the best performance, you have to get the most overlap of your
code with the asynchronous DMA transfer. An additional software buffer
is an easy way to increase overlap. To increase it even more (if CPU
usage is still less than 100%), add more buffers/buffering or try to
delay blits to the display surface for as long as possible (but don’t
use SDL_Delay(), duh!). Maybe this would mean running the emulator loop
a couple of times before updating the screen.
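A very rough sketch of that idea, with FRAMES_PER_UPDATE and
emulate_one_frame() made up for illustration:

#include "SDL.h"

#define FRAMES_PER_UPDATE 2

extern void emulate_one_frame(SDL_Surface *image);  /* draws into the SW buffer */

void run_batch(SDL_Surface *image, SDL_Surface *screen)
{
    int i;

    for (i = 0; i < FRAMES_PER_UPDATE; i++)
        emulate_one_frame(image);

    /* only now touch the window surface; the previous DMA transfer has had
       time to complete while we were emulating */
    SDL_BlitSurface(image, NULL, screen, NULL);
    SDL_UpdateRect(screen, 0, 0, 0, 0);     /* 0,0,0,0 = whole screen */
}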

So, you’re probably better off using an additional software buffer,
since that gives you the most overlap with the DMA transfer. You can
also achieve the effects of an additional software buffer by using an
8-bit or 16-bit surface. In this case, SDL’s shadow surface serves as
the additional buffer. This method has the advantage of being more
memory efficient (since the buffer is smaller, your blits will be
faster).

On Tuesday, November 26, 2002, at 06:44 AM, fredm at spamcop.net wrote:


I gave this a try and my frame rate dropped by 10% :)

I guess this is a consequence of either Mac OS X windowed mode peculiarities or
my access pattern to the display surface (1-4 pixel writes/surface lock if
required, maybe 6000-18000 pixels changing on the average frame).

Probably not relevant anymore, but IIRC some of the older PowerPC Macs
had an odd/even cache so that an access pattern that went odd odd odd or
even even even was VERY slow while a “normal” access pattern of odd even
odd even was blazingly fast. This showed up during the porting of DOOM
to the Mac because it drew some things in vertical stripes so it was
hitting even even even then odd odd odd. It was fixed by adding 1 byte
to the pitch of the software back buffers.
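A hedged sketch of that workaround for an 8-bit back buffer, using
SDL_CreateRGBSurfaceFrom() so the pitch can be chosen explicitly (the
function name and the choice of 8 bpp are illustrative):

#include <stdlib.h>
#include "SDL.h"

SDL_Surface *create_padded_backbuffer(int width, int height)
{
    int pitch = width + 1;   /* one extra byte per row breaks the odd/even pattern */
    Uint8 *pixels = (Uint8 *)malloc((size_t)pitch * height);

    if (pixels == NULL)
        return NULL;
    /* 8-bit surface: masks are 0; a palette would still need to be set */
    return SDL_CreateRGBSurfaceFrom(pixels, width, height, 8, pitch,
                                    0, 0, 0, 0);
}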

Bob Pendleton

Actually, I would recommend using the display surface as your scratch
surface, and then use your code which calculates the updates to figure
out what rects you need to pass to SDL_UpdateRects(). That should buy
you increased performance and be safe since you’re using an SDL software
surface.
I gave this a try and my frame rate dropped by 10% :)

That’s expected, and it’s also a bad idea.

I guess this is a consequence of either Mac OS X windowed mode peculiarities or
my access pattern to the display surface (1-4 pixel writes/surface lock if
required, maybe 6000-18000 pixels changing on the average frame).

Actually, it’s because writes over PCI/AGP is slooooooooow compared to
main system memory.

-->Neil

Neil Bradley, Synthcom Systems, Inc. (ICQ #29402898)
"In the land of the blind, the one eyed man is not king - he's a prisoner."

No no, writes to vram are fast, it’s reads that are slow.

On 26-Nov-2002, Neil Bradley wrote:

Actually, it’s because writes over PCI/AGP is slooooooooow compared to
main system memory.


Patrick “Diablo-D3” McFarland || unknown at panax.com
"Computer games don’t affect kids; I mean if Pac-Man affected us as kids, we’d
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." --Kristian Wilson, Nintendo, Inc, 1989

Uh… no.

Remember that RAM on the video card still requires the standard PCI
bus arbitration (yes, this is true for AGP as well), which is a minimum of
5 clocks to get on and off the PCI bus - even for a single 32 bit
transaction.

Writes to main system memory do not have this problem, and are thus much,
much faster. Don’t believe me? Write code that writes to a video card
surface and main system memory. Guess which one is faster. ;)

-->Neil

On 26-Nov-2002, Neil Bradley wrote:

Actually, it’s because writes over PCI/AGP is slooooooooow compared to
main system memory.
No no, writes to vram are fast, its reads that are slow.



Who said anything about system memory being faster than video memory?
I’m just saying writing to video memory isn’t slow.

On 26-Nov-2002, Neil Bradley wrote:

Remember that RAM on the video card still require the standard PCI
bus arbitration (yes, this is true for AGP as well) which is a minimum of
5 clocks to get on and off the PCI bus - even for a single 32 bit
transaction.

Writes to main system memory do not have this problem, and thusly much
much faster. Don’t believe me? Write code that writes to a video card
surface and main system memory. Guess which one is faster. ;)



Remember that RAM on the video card still require the standard PCI
bus arbitration (yes, this is true for AGP as well) which is a minimum of
5 clocks to get on and off the PCI bus - even for a single 32 bit
transaction.

Writes to main system memory do not have this problem, and thusly much
much faster. Don’t believe me? Write code that writes to a video card
surface and main system memory. Guess which one is faster. ;)

Who said anything about system memory being faster than video memory?
Im just saying writing to video memory isnt slow.

Considering that the context of the original message was an in-memory
surface vs. a video memory surface, by comparison video memory is MUCH
slower to write to than main system memory.

-->Neil

Considering that the context of the original message was an in-memory
surface vs. a video memory surface, by comparison video memory is MUCH
slower to write to that main system memory.

–>Neil

Then try to bullshit why hwsurfaces are faster than swsurfaces? hmm?

Considering that the context of the original message was an in-memory
surface vs. a video memory surface, by comparison video memory is MUCH
slower to write to that main system memory.
Then try to bullshit why hwsurfaces are faster than swsurfaces? hmm?

The original poster had a performance problem. He was blitting from a
software surface to a hardware surface. Someone suggested moving his
software surface to the hardware surface. This yielded a 10% performance
drop.

So how exactly is it that hardware surfaces are faster than software
surfaces in this case? This just looks like you’re trying to find a way to
make me look bad/wrong to make yourself look better.

-->Neil

Who said anything about system memory being faster than video memory?

Well, Neil did, in the message you originally responded to and disputed!

(“Actually, it’s because writes over PCI/AGP is slooooooooow compared
to main system memory”)

Sure, hardware surfaces will be faster as long as you’re using them
exclusively, as they are optimised for blits to other hardware surfaces
on the gfx card. When you have to start mixing them with software
surfaces or pixel-editing, you will often find that it may just be
better to use software surfaces exclusively instead.


Kylotan
http://pages.eidosnet.co.uk/kylotan

Considering that the context of the original message was an in-memory
surface vs. a video memory surface, by comparison video memory is MUCH
slower to write to that main system memory.

–>Neil

Then try to bullshit why hwsurfaces are faster than swsurfaces? hmm?

Good grief. This is very simple:

Writing data to video memory is slower than writing data to system memory.
Copying data within video memory is much faster than writing data to video
memory.

There are four approaches to take:

  1. Composite your scene in system memory, and upload the final image to
    video memory.
  2. Composite your scene by writing pixels directly to video memory.
  3. Upload your images to video memory and composite your scene using the
    video hardware to manipulate the images you uploaded.
  4. A combination of the above.

SDL lets you do all four of those, but guess which one is the slowest?
That’s right, number 2, writing individual pixels over the bus. Guess
which option most people try to make their programs go faster? That’s
right, option number 2, because video memory is faster than system
memory, right? Right, but only if you don’t have to read or write to it. :)

So, why is option 1 faster than option 2? Because block copies of data
to video memory are considerably faster than individual pixel accesses.
Not only can you take full advantage of the width of the bus data path,
but most hardware can queue up DMA block transfers that execute in parallel
with the main CPU.

Option 3 is the fastest, and the one that the next version of the SDL API
is going to be optimized for. Why is this fastest? Because you can get
all of your data on the video card before you need it, and compositing
your scene is a simple matter of sending a few commands to the video
hardware, waiting (or not) for the hardware to finish, then sending a
command to display the finished product. This is the codepath that 3D
hardware is designed to be most efficient at, and by having a rich
command set, able to make some truly amazing visual effects.

So, unless you know exactly what you’re doing and know exactly what
hardware platforms and driver configurations you are targeting, the
most efficient way to use SDL is to set a video mode at whatever the
optimal available video depth is, with no hardware surfaces.
This will set up SDL to do all blitting in system memory to a single
back buffer and then copy the contents of this buffer with no conversion
to the screen when you call SDL_UpdateRects(). This means that you
need to be able to handle any of 8 bpp, 15 bpp, 16 bpp, and 32 bpp.
The easiest way to handle this is just to call SDL_DisplayFormat() on
your artwork to get it to the current display format so blits don’t
need to do any conversion. Note that if the display hardware is at
8 bpp, you may want to dither to a specific palette yourself since
SDL’s color conversion routines are designed for speed, and do not
do any dithering.
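Put into code, the recommended setup looks roughly like this (the window size
and the artwork handling are placeholders):

#include "SDL.h"

SDL_Surface *setup_display(int width, int height)
{
    /* depth 0 plus SDL_ANYFORMAT: take whatever depth the display is using */
    return SDL_SetVideoMode(width, height, 0, SDL_SWSURFACE | SDL_ANYFORMAT);
}

SDL_Surface *prepare_artwork(SDL_Surface *loaded)
{
    /* convert once so per-frame blits do no pixel conversion */
    SDL_Surface *converted = SDL_DisplayFormat(loaded);

    if (converted != NULL) {
        SDL_FreeSurface(loaded);
        return converted;
    }
    return loaded;
}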

One of the problems with this approach is that you have no control over
when the scene displays, with respect to the refresh rate. I’ll let you
in on a dirty little secret. There isn’t anything you can do about this
unless you’re running in fullscreen mode. However, if you know what you
are doing, and are running in fullscreen mode, you can request that SDL
give you a page flipped display surface in video memory, by passing the
(SDL_FULLSCREEN|SDL_DOUBLEBUF) flags to SDL when you set the video mode.
If you successfully set these flags, you get two video memory buffers
that are alternately displayed when you call SDL_Flip(). Where possible
this flip is synchronized with the vertical blank, to avoid tearing.
Now as soon as you do this, you’re in video memory land and need to
create as many surfaces in video memory as possible. Conveniently,
SDL_DisplayFormat() will put your surfaces in video memory if the display
surface is also in video memory. This will make blits between the hardware
surface and the screen very fast. HOWEVER since no 2D blitters support
alpha blending in hardware, this means that alpha blending will be really
really slow: read a pixel from the source surface, read a pixel from the
destination surface, perform the blend in system memory, write the pixel
back out to video memory. Reads from video memory are even slower than
writes, so you’ll get terrible performance if you do this.
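The fullscreen page-flipping path described above, sketched out (width,
height and depth are placeholders; note the check that the flag was actually
honoured):

#include "SDL.h"

SDL_Surface *open_flipped_fullscreen(int width, int height, int bpp)
{
    SDL_Surface *screen = SDL_SetVideoMode(width, height, bpp,
                                           SDL_FULLSCREEN | SDL_DOUBLEBUF);

    if (screen != NULL && !(screen->flags & SDL_DOUBLEBUF)) {
        /* no real page flipping; fall back to the software back buffer path */
    }
    return screen;
}

/* each frame, after drawing into the returned surface:
   SDL_Flip(screen);   -- flips, synced to the vertical blank where possible */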

So, to sum up: Stick to software surfaces unless you really know what
you’re doing; they’re supported on every platform and they’re fairly fast.
If you really know what you’re doing, you can get page-flipped video memory
on some hardware/driver combinations, but you’ll need to be able to fall
back to a software back buffer in the cases where you can’t get directly
to video memory. If you’re not changing the entire screen every frame,
and you’re using a software display surface, try using SDL_UpdateRects()
to only update the portions of the screen which have changed.
Use SDL_DisplayFormat() and SDL_DisplayFormatAlpha() whenever possible.
If you’re doing alpha blending, always use a 3D API or software memory
for 2D work. If you are using software memory and have alpha channels or
colorkeys in your images, use SDL_RLEACCEL - it speeds up blits immensely
by encoding the operations needed to get your image on the screen without
having to do expensive pixel-by-pixel checks at blit time.
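For example, colorkey plus RLE acceleration on a piece of software artwork
(the magenta key colour is just an example):

#include "SDL.h"

SDL_Surface *prepare_sprite(SDL_Surface *loaded)
{
    Uint32 key = SDL_MapRGB(loaded->format, 255, 0, 255);

    SDL_SetColorKey(loaded, SDL_SRCCOLORKEY | SDL_RLEACCEL, key);
    return SDL_DisplayFormat(loaded);   /* or SDL_DisplayFormatAlpha() for alpha art */
}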

Finally, if you know you’re only going to run on 3D hardware, and want to
do lots of fancy visual effects, consider using OpenGL instead of 2D blits.
SDL does provide an API for setting up an OpenGL context and swapping the
video buffers, and the input handling doesn’t change at all. You can even
convert SDL surfaces to textures and display them using OpenGL commands.
Example code for this is provided in the testgl.c file in the SDL source
archive.
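The SDL side of that setup is small (testgl.c has the full version; the
window size here is arbitrary):

#include "SDL.h"

int init_gl_window(int width, int height)
{
    if (SDL_Init(SDL_INIT_VIDEO) < 0)
        return -1;
    SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
    if (SDL_SetVideoMode(width, height, 0, SDL_OPENGL) == NULL)
        return -1;
    return 0;
}

/* each frame: draw with OpenGL calls, then call SDL_GL_SwapBuffers() */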

Whew, I should write this up and stick it on the website - it’s really a FAQ.

Questions are welcome, and I’ll be able to answer them next week.

-Sam Lantinga, Software Engineer, Blizzard Entertainment

I’m not worthy! I’m not worthy!

-shawn

On Wed, 2002-11-27 at 07:31, Sam Lantinga wrote:

Whew, I should write this up and stick it on the website - it’s really a FAQ.

Questions are welcome, and I’ll be able to answer them next week.

-Sam Lantinga, Software Engineer, Blizzard Entertainment