Performance issues in MacOSX, profiler blames AddTimer?

Hi there,

I’m porting an SDL app from Linux to MacOSX, and I just ran it through
a profiler, which gave some very interesting results.

The most relevant result is this: 73.8% of CPU time is being spent in a
function called SDL_AddTimer, when it is called by SDL_LowerBlit, which is
called from the one SDL_Flip my app does per frame. And I’m afraid I
don’t understand how this could possibly be :-S

A function (SDL_LowerBlit) used to copy pixels from one buffer to
another is spending most of its time creating timers??

I’m concerned about this because this app is going to be running on
laptops, where battery life is very dependent on CPU usage. If I
could reduce that 73.8%, even by a small amount, it would surely
mean longer battery life.

Does anybody have any ideas on how to make SDL_LowerBlit more
efficient in terms of CPU usage?

thanks in advance

bye

The most relevant result is this: 73.8% of CPU time is being spent in a
function called SDL_AddTimer, when it is called by SDL_LowerBlit, which is
called from the one SDL_Flip my app does per frame. And I’m afraid I
don’t understand how this could possibly be :-S

Shark (and gdb, etc) sometimes show the wrong symbol when using an
optimized binary.

Does anybody have any ideas on how to make SDL_LowerBlit more
efficient in terms of CPU usage?

Blitting will always be the most CPU-heavy part of an app. The best you can do
is do less of it. :)

Update only parts of the screen when you know the rest doesn’t need
updating. Deliberately reduce your framerate if CPU time is the most
important thing.
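
For illustration, a minimal SDL 1.2-style sketch of both ideas; the dirty-rect
list, the helper names, and the 30 fps cap are made up for this example, and
frame_start would be SDL_GetTicks() taken at the top of the main loop:

#include "SDL.h"

#define MAX_DIRTY 64

static SDL_Rect dirty[MAX_DIRTY];
static int num_dirty = 0;

/* Remember a rectangle that actually changed this frame. */
static void mark_dirty(Sint16 x, Sint16 y, Uint16 w, Uint16 h)
{
    if (num_dirty < MAX_DIRTY) {
        SDL_Rect r;
        r.x = x; r.y = y; r.w = w; r.h = h;
        dirty[num_dirty++] = r;
    }
}

/* Push only the changed rectangles and sleep away the rest of the frame. */
static void present_frame(SDL_Surface *screen, Uint32 frame_start)
{
    const Uint32 frame_ms = 1000 / 30;   /* target roughly 30 fps */
    Uint32 elapsed;

    if (num_dirty > 0) {
        SDL_UpdateRects(screen, num_dirty, dirty);
        num_dirty = 0;
    }

    elapsed = SDL_GetTicks() - frame_start;
    if (elapsed < frame_ms) {
        SDL_Delay(frame_ms - elapsed);
    }
}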

Get your surfaces into the screen format. Then blitting is mostly just a
collection of memcpy() calls.
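
As a rough sketch of that conversion step (the loader function and the BMP
loading are just placeholders):

#include "SDL.h"

/* Load an image and convert it once to the screen's pixel format, so later
   blits are plain memory copies instead of per-pixel conversions. */
SDL_Surface *load_optimized(const char *path)
{
    SDL_Surface *raw = SDL_LoadBMP(path);
    SDL_Surface *ready;

    if (raw == NULL) {
        return NULL;
    }
    ready = SDL_DisplayFormat(raw);   /* copy in the display's format */
    SDL_FreeSurface(raw);
    return ready;                     /* NULL if the conversion failed */
}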

--ryan.

I must admit I don’t know much about profilers or how they work in
general, so the following is just a thought.

If the profiler measures time instead of actual CPU instructions, this may
be related to vsync or something like that, so that SDL_Flip is simply
waiting for the vertical sync.

Matthias

Hi again,

On 8/25/06, Ryan C. Gordon wrote:

The most relevant result is this: 73.8% of CPU time is being spent in a
function called SDL_AddTimer, when it is called by SDL_LowerBlit, which is
called from the one SDL_Flip my app does per frame. And I’m afraid I
don’t understand how this could possibly be :-S

Shark (and gdb, etc) sometimes show the wrong symbol when using an
optimized binary.

I just recompiled it without any optimizations, and Shark still blames
AddTimer for all that cpu time.

Update only parts of the screen when you know the rest doesn’t need
updating.

Yes but, unfortunately, the whole screen must be updated 90% of the
time. I’ll work on it nonetheless.

Get your surfaces into the screen format. Then blitting is mostly just a
collection of memcpy() calls.

Hummm… right now it just defaults to 16 bits regardless of the
system color depth, so when in windowed mode, if the system is running
in 32 bits of color, there must be some costly calculations there.
I’ll take a look at that.

Thank you very much

bye

Hi again,

On 8/25/06, Matthias Weigand wrote:

I must admit I don’t know much about profilers or how they work in
general, so the following is just a thought.

If the profiler measures time instead of actual CPU instructions, this may
be related to vsync or something like that, so that SDL_Flip is simply
waiting for the vertical sync.

Hummm yes, that is the most likely explanation I’ve found so far. I’ll
try to disable vsync, and see what happens.

Humm… I just remembered that in this app I had to remove a call to
SDL_Delay and substitute it with a usleep call, because SDL_Delay used
all the available cpu power while waiting. I wonder if something
similar happens inside AddTimer. Is it just wasting cpu time in a
"while(!vsyncing());" loop? Couldn’t it be made to sleep until the
time for syncing comes? I guess there could be some platform
dependency issues, but compared with the ones already faced by the
rest of the library, those should be a piece of cake…
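
To make the difference concrete, a sketch (the frame_ready flag is
hypothetical; it stands in for whatever condition the backend actually
waits on):

#include "SDL.h"

extern volatile int frame_ready;   /* hypothetical "retrace has arrived" flag */

/* Busy-waiting: keeps one core at 100% until the flag flips. */
static void wait_busy(void)
{
    while (!frame_ready) {
        /* spin */
    }
}

/* Sleeping: yields the CPU between polls, trading a little timing
   precision for a mostly idle processor. */
static void wait_sleeping(void)
{
    while (!frame_ready) {
        SDL_Delay(1);
    }
}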

Just my 2 cents

By the way, thank you very much for that idea.

Bye

The Right Thing ™ to do is to let SDL select the color depth (i.e. set
bitsperpixel to 0 in SDL_SetVideoMode()), then convert all surfaces to
that depth with SDL_DisplayFormat(). Anything else will cause nasty
overhead if the environment does not match perfectly. You should also
check if you’re using hardware surfaces correctly, if any.

On 8/25/06, Juanval wrote:

Hummm… right now it just defaults to 16 bits regardless of the
system color depth, so when in windowed mode, if the system is running
in 32 bits of color, there must be some costly calculations there.
I’ll take a look at that.

  • SR
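
For concreteness, a minimal sketch of the mode-setting half of that recipe
(the 640x480 size and the SDL_SWSURFACE flag are just placeholders); each
loaded surface would then go through SDL_DisplayFormat() as sketched earlier:

#include <stdio.h>
#include "SDL.h"

/* Pass 0 as bitsperpixel so SDL keeps whatever depth the desktop is
   already running at, avoiding a per-blit format conversion. */
int init_video(void)
{
    SDL_Surface *screen = SDL_SetVideoMode(640, 480, 0, SDL_SWSURFACE);
    if (screen == NULL) {
        fprintf(stderr, "SDL_SetVideoMode failed: %s\n", SDL_GetError());
        return -1;
    }
    printf("Running at %d bpp\n", screen->format->BitsPerPixel);
    return 0;
}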

Hi again,

On 8/26/06, Simon Roby <simon.roby at gmail.com> wrote:

On 8/25/06, Juanval wrote:

Hummm… right now it just defaults to 16 bits regardless of the
system color depth, so when in windowed mode, if the system is running
in 32 bits of color, there must be some costly calculations there.
I’ll take a look at that.

The Right Thing ™ to do is to let SDL select the color depth (i.e. set
bitsperpixel to 0 in SDL_SetVideoMode()), then convert all surfaces to
that depth with SDL_DisplayFormat(). Anything else will cause nasty
overhead if the environment does not match perfectly. You should also
check if you’re using hardware surfaces correctly, if any.

Wow!! CPU usage was cut in half just by doing that. Thank you very
very much for that one :D

bye

Hello Juanval,

Saturday, August 26, 2006, 12:43:10 AM, you wrote:

Humm… I just remembered that in this app I had to remove a call to
SDL_Delay and substitute it with a usleep call, because SDL_Delay used
all the available cpu power while waiting.

That sounds like something in SDL is broken. All SDL_Delay() should be
doing is calling usleep() or similar.

--
Best regards,
Peter

I just recompiled it without any optimizations, and Shark still blames
AddTimer for all that cpu time.

Is this PowerPC or x86? It’s possible it’s landing in a system library,
or it’s possible it’s actually some assembly code in SDL which doesn’t
contain correct debug information, so Shark is making a best guess.

In any case, it’s definitely not really spending time in SDL_AddTimer.

Hummm… right now it just defaults to 16 bits regardless of the
system color depth, so when in windowed mode, if the system is running
in 32 bits of color, there must be some costly calculations there.
I’ll take a look at that.

There is an Altivec code path for the conversion, but I’m not sure
where this lands on MacOSX/x86 right now. Either way, avoiding the
conversion entirely is the real win.

--ryan.

Hello Ryan,

On 8/26/06, Ryan C. Gordon wrote:

I just recompiled it without any optimizations, and Shark still blames
AddTimer for all that cpu time.

Is this PowerPC or x86? It’s possible it’s landing in a system library,
or it’s possible it’s actually some assembly code in SDL which doesn’t
contain correct debug information, so Shark is making a best guess.

In any case, it’s definitely not really spending time in SDL_AddTimer.

It’s an x86 processor.

As for the 2 things you point out here, they go far beyond my current
knowledge of both Shark and SDL internals, so I have no idea which one
it could possibly be :-S

Bye

Hi again,

On 8/26/06, Peter Mulholland wrote:

Hello Juanval,

Saturday, August 26, 2006, 12:43:10 AM, you wrote:

Humm… I just remembered that in this app I had to remove a call to
SDL_Delay and substitute it with a usleep call, because SDL_Delay used
all the available cpu power while waiting.

That sounds like something in SDL is broken. All SDL_Delay() should be
doing is calling usleep() or similar.

This is the SDL_Delay code for MacOS (in SDL 1.2.11)

void SDL_Delay(Uint32 ms)
{
    Uint32 stop, now;

    stop = SDL_GetTicks() + ms;
    do {
        #if TARGET_API_MAC_CARBON
            MPYield();
        #else
            SystemTask();
        #endif

        now = SDL_GetTicks();
    } while ( stop > now );
}

So, if it has been compiled with Carbon, it just yields the processor
to another process (until the OS gives it back to the app).

If it’s not compiled with Carbon, then it calls an API function called
SystemTask, of which the API reference says this: “Gives time to each
open desk accessory or driver to perform any periodic action”. That
seems very similar to yielding the processor…

So, shouldn’t this be changed to something like this?:

void SDL_Delay(Uint32 ms)
{
    if (ms > 0)
        usleep(ms * 1000);
}

I’m not claiming this to be correct, but I’m using something similar
in my app, and it seems to work flawlessly.
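
(One portability note on that sketch: POSIX only guarantees usleep() for
arguments below one million microseconds, so for delays of a second or more
a nanosleep()-based version is safer. A minimal sketch, not SDL’s actual
code:)

#include <errno.h>
#include <time.h>

/* Sleep for 'ms' milliseconds, retrying if a signal interrupts the sleep. */
void delay_ms(unsigned long ms)
{
    struct timespec req, rem;

    req.tv_sec  = ms / 1000;
    req.tv_nsec = (long)(ms % 1000) * 1000000L;

    /* nanosleep() reports the unslept remainder in 'rem' when interrupted,
       so we keep sleeping until the full delay has elapsed. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR) {
        req = rem;
    }
}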

Bye

        #if TARGET_API_MAC_CARBON
            MPYield();
        #else
            SystemTask();
        #endif

Yikes, it’s probably time to #ifdef that for Mac OS 9, and switch OS X
builds to use the POSIX routines.

--ryan.

Yikes, it’s probably time to #ifdef that for Mac OS 9, and switch OS X
builds to use the POSIX routines.

Now noted in Bugzilla:
http://bugzilla.libsdl.org/show_bug.cgi?id=309

--ryan.

        #if TARGET_API_MAC_CARBON
            MPYield();
        #else
            SystemTask();
        #endif

Yikes, it’s probably time to #ifdef that for Mac OS 9, and switch OS X
builds to use the POSIX routines.

Yep, definitely!

-Sam Lantinga, Senior Software Engineer, Blizzard Entertainment

There is an Altivec code path for the conversion, but I’m not sure
where this lands on MacOSX/x86 right now. Either way, avoiding the
conversion entirely is the real win.

Altivec isn’t used on MacOSX/x86, since it ends up in Rosetta emulation. :)

-Sam Lantinga, Senior Software Engineer, Blizzard Entertainment

Hello Ryan,

Saturday, August 26, 2006, 6:12:13 PM, you wrote:

Yikes, it’s probably time to #ifdef that for Mac OS 9, and switch OS X
builds to use the POSIX routines.

That’s exactly what I thought :)

For SDL 1.3, I guess Carbon is being eliminated totally?

--
Best regards,
Peter

Juanval wrote:

This is the SDL_Delay code for MacOS (in SDL 1.2.11)

void SDL_Delay(Uint32 ms)
{
    Uint32 stop, now;

    stop = SDL_GetTicks() + ms;
    do {
        #if TARGET_API_MAC_CARBON
            MPYield();
        #else
            SystemTask();
        #endif

        now = SDL_GetTicks();
    } while ( stop > now );
}

Are you sure about this? This is from src/timer/macos/SDL_systimer.c,
which is used on Mac OS 9. In my SDL 1.2 Xcode project (which hasn’t
been updated from SVN for a few weeks, but I don’t think this has
changed), src/timer/unix/SDL_systimer.c is used, with the following code
(of which I haven’t made heads or tails yet):

void SDL_Delay (Uint32 ms)
{
#if SDL_THREAD_PTH
    pth_time_t tv;
    tv.tv_sec  = ms/1000;
    tv.tv_usec = (ms%1000)*1000;
    pth_nap(tv);
#else
    int was_error;

#if HAVE_NANOSLEEP
    struct timespec elapsed, tv;
#else
    struct timeval tv;
    Uint32 then, now, elapsed;
#endif

    /* Set the timeout interval */
#if HAVE_NANOSLEEP
    elapsed.tv_sec = ms/1000;
    elapsed.tv_nsec = (ms%1000)*1000000;
#else
    then = SDL_GetTicks();
#endif
    do {
        errno = 0;

#if HAVE_NANOSLEEP
        tv.tv_sec = elapsed.tv_sec;
        tv.tv_nsec = elapsed.tv_nsec;
        was_error = nanosleep(&tv, &elapsed);
#else
        /* Calculate the time interval left (in case of interrupt) */
        now = SDL_GetTicks();
        elapsed = (now-then);
        then = now;
        if ( elapsed >= ms ) {
            break;
        }
        ms -= elapsed;
        tv.tv_sec = ms/1000;
        tv.tv_usec = (ms%1000)*1000;

        was_error = select(0, NULL, NULL, NULL, &tv);
#endif /* HAVE_NANOSLEEP */
    } while ( was_error && (errno == EINTR) );
#endif /* SDL_THREAD_PTH */
}

-Christian

Crap, I’ve just checked if it still took the whole cpu when using
SDL_Delay, and the answer is no. It doesn’t. It stays at roughly the
same cpu usage as my hacked up solution.

It might have had something to do with a previous version of SDL I had
running around here when I coded that, or it might have been just my
complete and utter stupidity.

So, this all boils down to this: I was wrong, the MacOSX implementation of
SDL_Delay works perfectly in SDL 1.2.11. This means the bug filed
yesterday can be marked as fixed (well, nonexistent, really).

Sorry about all the fuss :( I’m definitely a complete idiot.

Anyway, thank you very much for all the thought and effort you’ve put into this

See ya

There is an Altivec code path for the conversion, but I’m not sure
where this lands on MacOSX/x86 right now. Either way, avoiding the
conversion entirely is the real win.

Altivec isn’t used on MacOSX/x86, since it ends up in Rosetta emulation. :)

I meant "I’m not sure if we hooked up the hermes blitters on Mac OS X,"
but yeah. :)

--ryan.