OSX rendering issues and segfault on exit

Hi all,

I’ve been debugging a problem with SDL on OSX for quite a while,
without getting much further. Maybe somebody here can shed some light
on the issue.

The code in question runs fine on Linux and Windows. It also runs fine
if I set the SDL_VIDEODRIVER to X11 on OSX. So I would assume it is
not necessarily a problem in the code, but something that is related
to SDL on OSX specifically. That aside, it’s occurring on both my
iBook G3 running 10.4.11 and my MacBook Pro 3.1 running 10.5.6 under
SDL 1.2.12, 1.2.13 and recent SVN. Haven’t tried anything earlier.

Here are two screenshots showing how the screen is supposed to look
and what I actually get on the Mac:

http://adonthell.berlios.de/linux.png
http://adonthell.berlios.de/mac.png

On the Mac it’s also flickering like crazy and when I quit the program
it segfaults:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x020317d8 in SDL_FreeSurface (surface=0x1551960) at
./src/video/SDL_surface.c:915
#2 0x02033fac in SDL_VideoQuit () at ./src/video/SDL_video.c:1354
#3 0x02009300 in SDL_QuitSubSystem (flags=32) at ./src/SDL.c:202
#4 0x002d778c in gfx::cleanup () at …/…/…/adonthell/src/gfx/gfx.cc:172
#5 0x00665700 in adonthell::app::cleanup (this=0x7260) at
…/…/…/adonthell/src/main/adonthell.cc:249
#6 0x007d7b94 in -[SDLMain applicationDidFinishLaunching:] ()
#7 0x90b04e1c in _nsnote_callback ()
#8 0x902d8ec0 in __CFXNotificationPost ()
#9 0x902d0f20 in _CFXNotificationPostNotification ()
#10 0x90aef224 in -[NSNotificationCenter
postNotificationName:object:userInfo:] ()
#11 0x9ea40be8 in -[NSApplication _postDidFinishNotification] ()
#12 0x9ea40ad4 in -[NSApplication _sendFinishLaunchingNotification] ()
#13 0x9ea4061c in -[NSApplication(NSAppleEventHandling) _handleAEOpen:] ()
#14 0x9ea401c4 in -[NSApplication(NSAppleEventHandling)
_handleCoreEvent:withReplyEvent:] ()
#15 0x90b05e28 in -[NSAppleEventManager
dispatchRawAppleEvent:withRawReply:handlerRefCon:] ()
#16 0x90b05c88 in _NSAppleEventManagerGenericHandler ()
#17 0x964ae960 in aeDispatchAppleEvent ()
#18 0x964ae7fc in dispatchEventAndSendReply ()
#19 0x964ae654 in aeProcessAppleEvent ()
#20 0x9e5e32e0 in AEProcessAppleEvent ()
#21 0x9ea3e90c in _DPSNextEvent ()
#22 0x9ea3e3f8 in -[NSApplication
nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#23 0x9ea3a93c in -[NSApplication run] ()
#24 0x007d7ae4 in CustomApplicationMain ()
#25 0x007d7bbc in main_init ()
#26 0x006663ac in adonthell::app::init (this=0x7260) at
…/…/…/adonthell/src/main/adonthell.cc:237
#27 0x00666418 in main (argc=3, argv=Cannot access memory at address 0x0
Cannot access memory at address 0x0
0x1551960) at …/…/…/adonthell/src/main/main.cc:81

The problem is, despite the seemingly trivial output, the program
behind it is pretty big already and all attempts at providing a small,
self-contained test case to reproduce the issue failed. That’s why I
am hoping that somebody can possibly figure out the problem by looking
at the symptoms.

If somebody really wants to experience the issue firsthand, get the
code by following these instructions:
http://adonthell.berlios.de/doc/index.php/Development:QuickStart
Instead of worldtest (a much more complex program that runs fine on
the Macs) run guitest.

Any insight into the issue would be appreciated, as would hints on
what code to reproduce the issue might look like.

Regards,

Kai

On Mon, Feb 16, 2009 at 9:53 PM, Kai Sterker <@Kai_Sterker> wrote:

> The code in question runs fine on Linux and Windows. It also runs fine
> if I set the SDL_VIDEODRIVER to X11 on OSX. So I would assume it is
> not necessarily a problem in the code, but something that is related
> to SDL on OSX specifically. That aside, it’s occurring on both my
> iBook G3 running 10.4.11 and my MacBook Pro 3.1 running 10.5.6 under
> SDL 1.2.12, 1.2.13 and recent SVN. Haven’t tried anything earlier.

Forgot to mention what might be an important detail:

All blitting is done directly to the screen surface, which is 32bpp
and flags SDL_HWSURFACE | SDL_DOUBLEBUF. Now, if I create a separate
surface, do all the blitting on that and finally blit that to the
screen surface, things work well and I don’t get a crash on exit.

However, this wasn’t required in any previous code using the same
rendering pipeline, so I’d rather not resort to a workaround like
that.

Kai

On Tue, Feb 17, 2009 at 4:53 AM, Kai Sterker <kai.sterker at gmail.com> wrote:

> All blitting is done directly to the screen surface, which is 32bpp
> and flags SDL_HWSURFACE | SDL_DOUBLEBUF. Now, if I create a separate
> surface, do all the blitting on that and finally blit that to the
> screen surface, things work well and I don’t get a crash on exit.

Have you tried taking out SDL_DOUBLEBUF? I don’t think it would make
much of a difference for the crashing at the end (now that I think of
it, you’re not doing SDL_FreeSurface on the screen surface, right?),
but it might help fix your visual artefacts/flickering. Last I
checked, double buffering doesn’t actually do anything useful there
(Mac OS X uses compositing, so everything is automatically double
buffered by the display server), and emulating it might cause a few
issues.

http://pphaneuf.livejournal.com/

On Mon, Feb 16, 2009 at 3:53 PM, Kai Sterker <kai.sterker at gmail.com> wrote:

> The code in question runs fine on Linux and Windows. It also runs fine
> if I set the SDL_VIDEODRIVER to X11 on OSX. So I would assume it is
> not necessarily a problem in the code, but something that is related
> to SDL on OSX specifically. That aside, it’s occurring on both my
> iBook G3 running 10.4.11 and my MacBook Pro 3.1 running 10.5.6 under
> SDL 1.2.12, 1.2.13 and recent SVN. Haven’t tried anything earlier.

Oh, and also, even though you say it works fine with the X11 driver on
OS X, do you get any warnings running it under Valgrind on Linux?
Sometimes, the problem isn’t fatal in one circumstance (and so stays
hidden), but happens in both cases, and Valgrind will point you at the
right place…

http://pphaneuf.livejournal.com/

On Mon, Feb 16, 2009 at 9:53 PM, Kai Sterker <kai.sterker at gmail.com> wrote:

> The code in question runs fine on Linux and Windows. It also runs fine
> if I set the SDL_VIDEODRIVER to X11 on OSX. So I would assume it is
> not necessarily a problem in the code, but something that is related
> to SDL on OSX specifically. That aside, it’s occurring on both my
> iBook G3 running 10.4.11 and my MacBook Pro 3.1 running 10.5.6 under
> SDL 1.2.12, 1.2.13 and recent SVN. Haven’t tried anything earlier.
>
> Forgot to mention what might be an important detail:
>
> All blitting is done directly to the screen surface, which is 32bpp
> and flags SDL_HWSURFACE | SDL_DOUBLEBUF. Now, if I create a separate
> surface, do all the blitting on that and finally blit that to the
> screen surface, things work well and I don’t get a crash on exit.
>
> However, this wasn’t required in any previous code using the same
> rendering pipeline, so I’d rather not resort to a workaround like
> that.

Does this happen with the SDL 1.3 snapshot?
http://www.libsdl.org/tmp/SDL-1.3.zip

See ya,
-Sam Lantinga, Founder and President, Galaxy Gameworks LLC

On Tue, Feb 17, 2009 at 11:27 AM, Pierre Phaneuf wrote:

> Oh, and also, even though you say it works fine with the X11 driver on
> OS X, do you get any warnings running it under Valgrind on Linux?
> Sometimes, the problem isn’t fatal in one circumstance (and so stays
> hidden), but happens in both cases, and Valgrind will point you at the
> right place…

I will try that (and also SDL-1.3) over the next few days. For now I
just removed the SDL_DOUBLEBUF flag from SDL_SetVideoMode, as it was
the simplest test.

The behaviour is the same, however.

And no, we don’t try to free the screen surface ;-).

Anyway, thanks for the hints so far. I’ll report back once I’ve had a
chance to try out more.

Kai

On Tue, Feb 17, 2009 at 4:36 PM, Sam Lantinga wrote:

> Does this happen with the SDL 1.3 snapshot?
> http://www.libsdl.org/tmp/SDL-1.3.zip

The short version is: it does not work at all with SDL-1.3-4444.

The test program that runs fine with SDL-1.2 (worldtest) just displays
a black screen. The other one (guitest) with the display issues
crashes when trying to access the surface pixel data.

Some more information:
worldtest uses custom code to load the graphics from RGB(A) pngs and
finally calls:

SDL_Surface *tmp = SDL_CreateRGBSurfaceFrom(pixel_data, length, height,
    bytes_per_pixel * 8, length * bytes_per_pixel,
    red_mask, green_mask, blue_mask, alpha_mask);

Here’s a snapshot of one of those surfaces in gdb

(gdb) print *tmp
$2 = {
flags = 16842752,
format = 0x21eed50,
w = 40,
h = 87,
pitch = 10510692,
pixels = 0x188ae00,
userdata = 0x0,
locked = 0,
lock_data = 0x0,
clip_rect = {
x = 2621527,
y = 0,
w = 0,
h = 35581184
},
map = 0xf,
format_version = 1,
refcount = 0
}
(gdb) print *tmp->format
$3 = {
palette = 0x0,
BitsPerPixel = 32 ' ',
BytesPerPixel = 4 '\004',
Rloss = 0 '\0',
Gloss = 0 '\0',
Bloss = 0 '\0',
Aloss = 0 '\0',
Rshift = 16 '\020',
Gshift = 8 '\b',
Bshift = 0 '\0',
Ashift = 24 '\030',
Rmask = 16711680,
Gmask = 65280,
Bmask = 255,
Amask = 4278190080
}

Now, guitest OTOH generates a lot of the graphics on the fly. For that
it reads and writes directly from/to the pixel data of a previously
allocated surface, using code like this:

Uint8 *offset = ((Uint8 *) vis->pixels) + y * vis->pitch
    + x * vis->format->BytesPerPixel;
Uint32 col = *((Uint32 *)(offset));

(That’s just the case for 32bpp surfaces, but you should get the
idea). Again, here’s a snapshot of the surface whose pixel data is
accessed at the time of the crash:

(gdb) print *((surface_sdl*)s)->vis
$4 = {
flags = 0,
format = 0x154ab50,
w = 512,
h = 512,
pitch = 134217728,
pixels = 0x4cb9000,
userdata = 0x0,
locked = 0,
lock_data = 0x0,
clip_rect = {
x = 33554944,
y = 0,
w = 0,
h = 22326144
},
map = 0x2,
format_version = 1,
refcount = 1
}
(gdb) print *((surface_sdl*)s)->vis->format
$5 = {
palette = 0x0,
BitsPerPixel = 32 ' ',
BytesPerPixel = 4 '\004',
Rloss = 0 '\0',
Gloss = 0 '\0',
Bloss = 0 '\0',
Aloss = 8 '\b',
Rshift = 16 '\020',
Gshift = 8 '\b',
Bshift = 0 '\0',
Ashift = 0 '\0',
Rmask = 16711680,
Gmask = 65280,
Bmask = 255,
Amask = 0
}

Note that before accessing the data, we do:

if (SDL_MUSTLOCK(vis)) SDL_LockSurface (vis);

In both cases, what appears funny to me are the odd values for
clip_rect and pitch. If that is not just an artifact of optimization,
then it might explain both the blank screen and the crash while
accessing the pixel data.
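
To illustrate why a bogus pitch alone would be enough to crash the pixel access shown earlier, here is a minimal sketch in plain C with no SDL dependency. The `pixel_view` struct and `read_pixel32` function are hypothetical names invented for this example. The `size` field is an assumption too: SDL surfaces do not carry the buffer size, so the caller would have to supply it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the few surface fields used by the access. */
typedef struct {
    const uint8_t *pixels;   /* start of the pixel data */
    size_t size;             /* bytes available behind pixels (assumed known) */
    int w, h;                /* dimensions in pixels */
    int pitch;               /* bytes per row */
    int bytes_per_pixel;     /* 4 for the 32bpp case */
} pixel_view;

/* Reads the 32-bit pixel at (x, y) into *out.
 * Returns 0 on success, -1 if the access would leave the buffer. */
int read_pixel32(const pixel_view *v, int x, int y, uint32_t *out)
{
    if (v == NULL || out == NULL || v->pixels == NULL) return -1;
    if (v->bytes_per_pixel != 4) return -1;
    if (x < 0 || y < 0 || x >= v->w || y >= v->h) return -1;

    /* A row has to be able to hold at least w pixels. */
    if (v->pitch <= 0 ||
        (long long)v->pitch < (long long)v->w * v->bytes_per_pixel) return -1;

    /* Same arithmetic as the original access, done in size_t. */
    size_t offset = (size_t)y * (size_t)v->pitch
                  + (size_t)x * (size_t)v->bytes_per_pixel;
    if (offset > v->size || v->size - offset < sizeof(uint32_t)) return -1;

    memcpy(out, v->pixels + offset, sizeof *out);
    return 0;
}
```

With the pitch of 134217728 from the SDL 1.3 dump, row 1 would already start 128 MiB past the beginning of the buffer, so a check like this fails cleanly where the unchecked access would read unmapped memory.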

Out of curiosity, I went back to the other box that still has SDL 1.2
and did a check of the surfaces there. For guitest, I got

(gdb) print *vis
$2 = {
flags = 16777220,
format = 0x558da0,
w = 512,
h = 512,
pitch = 2048,
pixels = 0xb0ccd000,
offset = 0,
hwdata = 0x0,
clip_rect = {
x = 0,
y = 0,
w = 512,
h = 512
},
unused1 = 0,
locked = 1,
map = 0x5589e0,
format_version = 2,
refcount = 1
}
(gdb) print *vis->format
$3 = {
palette = 0x0,
BitsPerPixel = 32 ' ',
BytesPerPixel = 4 '\004',
Rloss = 0 '\0',
Gloss = 0 '\0',
Bloss = 0 '\0',
Aloss = 8 '\b',
Rshift = 16 '\020',
Gshift = 8 '\b',
Bshift = 0 '\0',
Ashift = 0 '\0',
Rmask = 16711680,
Gmask = 65280,
Bmask = 255,
Amask = 0,
colorkey = 0,
alpha = 255 '?'
}

And for worldtest it is:

(gdb) print *tmp
$1 = {
flags = 16777216,
format = 0x381bf0,
w = 40,
h = 25,
pitch = 120,
pixels = 0x894800,
offset = 0,
hwdata = 0x0,
clip_rect = {
x = 0,
y = 0,
w = 40,
h = 25
},
unused1 = 0,
locked = 0,
map = 0x383780,
format_version = 3,
refcount = 1
}
(gdb) print *tmp->format
$2 = {
palette = 0x0,
BitsPerPixel = 24 '\030',
BytesPerPixel = 3 '\003',
Rloss = 0 '\0',
Gloss = 0 '\0',
Bloss = 0 '\0',
Aloss = 8 '\b',
Rshift = 0 '\0',
Gshift = 8 '\b',
Bshift = 16 '\020',
Ashift = 0 '\0',
Rmask = 255,
Gmask = 65280,
Bmask = 16711680,
Amask = 0,
colorkey = 0,
alpha = 255 '?'
}

In both cases, the clip_rect and pitch look much more sensible to me.
It should be noted, though, that the SDL-1.3 tests were run on ppc and
the SDL-1.2 ones on x86.

Any ideas? Do I need to manually set pitch and clip_rect in SDL 1.3 in
contrast to 1.2?
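
To make "sensible" a bit more concrete, the following sketch checks whether the width, height, pitch and clip_rect of a surface are mutually consistent, using the numbers from the gdb dumps above. It is plain C without SDL. `rect_fields`, `surface_fields_plausible` and the `max_padding` parameter are hypothetical names for this example. The padding limit is a heuristic assumption: a sub-surface that shares its parent's pixels can legitimately have a much larger pitch.

```c
/* Hypothetical mirror of the four clip_rect members. */
typedef struct { long x, y, w, h; } rect_fields;

/* Returns 1 if the values are self-consistent, 0 otherwise.
 * max_padding is the largest per-row padding (in bytes) the caller
 * considers believable; this limit is an assumption, not an SDL rule. */
int surface_fields_plausible(long w, long h, long bytes_per_pixel,
                             long pitch, long max_padding,
                             rect_fields clip)
{
    if (w <= 0 || h <= 0) return 0;
    if (bytes_per_pixel < 1 || bytes_per_pixel > 4) return 0;

    /* A row must hold at least w pixels, plus at most some padding. */
    long row_bytes = w * bytes_per_pixel;
    if (pitch < row_bytes) return 0;
    if (pitch - row_bytes > max_padding) return 0;

    /* The clip rectangle has to lie inside the surface. */
    if (clip.x < 0 || clip.y < 0 || clip.w < 0 || clip.h < 0) return 0;
    if (clip.x > w || clip.w > w - clip.x) return 0;
    if (clip.y > h || clip.h > h - clip.y) return 0;

    return 1;
}
```

Fed with the SDL 1.2 values (512x512, 4 bytes per pixel, pitch 2048, clip 0/0/512/512) the check passes. With the SDL 1.3 values (pitch 134217728, clip x = 33554944) it fails on both the pitch and the clip rectangle.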

Kai

P.S.: We’ve also run the problematic code through Valgrind on Linux,
but that yielded nothing suspicious:

"I am showing four memory leaks. One in the audio code, one in the
event manager, and the other two in the X11 libraries. I’m not showing
any invalid memory accesses."

On Sun, Feb 22, 2009 at 12:03 PM, Kai Sterker <@Kai_Sterker> wrote:

> On Tue, Feb 17, 2009 at 4:36 PM, Sam Lantinga wrote:
>
> > Does this happen with the SDL 1.3 snapshot?
> > http://www.libsdl.org/tmp/SDL-1.3.zip
>
> The short version is: it does not work at all with SDL-1.3-4444.
>
> The test program that runs fine with SDL-1.2 (worldtest) just displays
> a black screen. The other one (guitest) with the display issues
> crashes when trying to access the surface pixel data.

Now I am confused. Moving on and trying to fix the deprecation warning
on OSX 10.5:

2009-02-22 13:12:16.009 guitest[5674:10b] Warning once: This
application, or a library it uses, is using NSQuickDrawView, which has
been deprecated. Apps should cease use of QuickDraw and move to
Quartz.

I applied to our initialization code what had been suggested earlier
on this list:

http://lists.libsdl.org/htdig.cgi/sdl-libsdl.org/2006-April/055453.html

< CPSProcessSerNum PSN;
< /* Tell the dock about us */
< if (!CPSGetCurrentProcess(&PSN))
< if (!CPSEnableForegroundOperation(&PSN,0x03,0x3C,0x2C,0x1103))
< if (!CPSSetFrontProcess(&PSN))
< [SDLApplication sharedApplication];

  /* Tell the dock about us */
  ProcessSerialNumber psn = { 0, kCurrentProcess };
  /* Dock visibility, no error check because of bundle launches. */
  TransformProcessType (&psn, kProcessTransformToForegroundApplication);
  /* make the application frontmost */
  SetFrontProcess (&psn);

(And yes, we’re using a slightly customized version of SDLMain.m, see here:
http://cvs.savannah.gnu.org/viewvc/adonthell/src/main/sdl/osx.m?root=adonthell&view=markup
)

And guess what? It now runs in SDL 1.3!

There are some issues with keyboard input (doesn’t react to the arrow
keys, but space and esc are fine) and worldtest shows a problem with
per-surface alpha, but guitest behaves as expected. Except, on exit it
says:

guitest(23733) malloc: *** error for object 0x557e658: Non-aligned
pointer being freed
guitest(23733) malloc: *** set a breakpoint in szone_error to debug

It doesn’t trigger the crash-report window any longer, however.

I then applied the same change to the box with SDL-1.2 (the one that
is actually running OSX 10.5), but it didn’t change anything …
neither did the deprecation warning disappear.

Hope you can make anything out of this. I’ll go and hunt down the
input issues in the meantime.

Kai

2009/2/22 Kai Sterker <kai.sterker at gmail.com>:

> guitest(23733) malloc: *** error for object 0x557e658: Non-aligned
> pointer being freed
> guitest(23733) malloc: *** set a breakpoint in szone_error to debug
>
> It doesn’t trigger the crash-report window any longer, however.

A crash on exit, especially in SDL_FreeSurface(), is usually caused by
manually freeing the video surface. Make sure you don’t call
SDL_FreeSurface(this->surf) [or whatever] in gfx::cleanup().

Clearly, if it’s happening in the unit tests, then it’s almost
certainly something else.
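
One way to rule that mistake out is to have the wrapper remember whether it owns a surface. In SDL 1.2 the surface returned by SDL_SetVideoMode() is released by SDL itself on shutdown and should not be freed by the caller. The sketch below models the idea in plain C with malloc/free standing in for the SDL calls. `surface_handle`, `surface_create`, `surface_borrow` and `surface_release` are hypothetical names and not part of SDL or of the adonthell code.

```c
#include <stdlib.h>

/* Hypothetical wrapper: records who is responsible for the memory. */
typedef struct {
    void *pixels;
    int   owned;   /* 1: allocated by us, 0: borrowed (e.g. the screen) */
} surface_handle;

/* Allocates a buffer that the handle owns. */
surface_handle surface_create(size_t bytes)
{
    surface_handle s = { malloc(bytes), 1 };
    if (s.pixels == NULL) s.owned = 0;
    return s;
}

/* Wraps memory that somebody else will release. */
surface_handle surface_borrow(void *pixels)
{
    surface_handle s = { pixels, 0 };
    return s;
}

/* Frees only owned memory. Returns 1 if something was freed, else 0.
 * Safe to call more than once on the same handle. */
int surface_release(surface_handle *s)
{
    int freed = 0;
    if (s == NULL || s->pixels == NULL) return 0;
    if (s->owned) {
        free(s->pixels);
        freed = 1;
    }
    s->pixels = NULL;
    s->owned = 0;
    return freed;
}
```

A cleanup routine built this way can release every handle it holds without needing to know which one is the screen.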

Eddy