SDL parachute usage or dodgy compiler optimisation

Andrew_Beardmore · October 1, 2001, 6:54pm

Hi all,

I’ve been writing an OpenGL application which uses SDL for the front end.
On my Mandrake 8.0 box at home it runs just fine, with no apparent probs.
But on my RedHat 6.2 machine at work it stops after 10-20 secs,
deploying the SDL parachute:
Fatal signal: Segmentation Fault (SDL Parachute Deployed)

My initial question was going to be: what’s the best way to go about debugging
once the parachute has been deployed?

I reckoned I would try and attach to the running process with gdb, or even
run it from within gdb to begin with.
So after recompiling with debugging enabled and optimisations turned off
(-g -O0) I ran it again, but this time it worked fine and wouldn’t segfault!

Exploring this further, it seems to segfault when the optimisation flag is
set to -O2 or higher (initially I had it set at -O3). But it runs OK with -O1
or -O0.

Is there a known problem with optimised code generated by RH6.2 (compiler
version egcs-2.91.66)? Or have I got a lot more head scratching to go
through before I nail this one?

cheers, Andy.

(p.s. while I’m here - at one stage I thought I’d update to the newest CVS
version of SDL to see if the problem went away. But I had no joy logging in
to the CVS server. The instructions to access the CVS version on libsdl.org
say
"# Hit when prompted for a password"
^-- ? what

I’ve tried typing return, space and even guest (as it used to be when hosted
at loki) for the password, but couldn’t log in. In the end I downloaded the
tar file version.)

Robin_Forster · October 1, 2001, 7:30pm

xxgdb will notify you of the offending statement. I use it all the time
and it never failed me. Unless the fault is in a non-debuggable
library. But you should be able to see which line caused the fault
regardless.

Robin.

–
Robin Forster, Systems Engineer,
http://www.rsforster.ottawa.on.ca/

icculus · October 1, 2001, 9:36pm

> xxgdb will notify you of the offending statement. I use it all the time
> and it never failed me. Unless the fault is in a non-debuggable
> library. But you should be able to see which line caused the fault
> regardless.

Upgrade your compiler, too. egcs is ANCIENT, and has problems with things
like inline ASM (especially at different optimization levels).

You should be using AT LEAST gcc 2.95.2 at this point, and probably later.

–ryan.

Jp_Calderone · October 1, 2001, 10:57pm

[snip]
> Is there a known problem with optimised code generated by RH6.2 (compiler
> version egcs-2.91.66)? Or have I got a lot more head scratching to go
> through before I nail this one?

Generally, crashes that come in when you add optimization flags (at
least to gcc) are due to uninitialized variables. So I’d check that
first.
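
To make that advice concrete, here is a minimal sketch of the pattern to look for. The `Particle` type and its fields are hypothetical and not taken from Andy's code. A struct with no constructor leaves its members indeterminate, and reading them is undefined behaviour that can appear to work at -O0 and fail at -O2. Initialising every member in a constructor avoids that.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle type, for illustration only.
struct Particle {
    double x;     // position
    double v;     // velocity
    double mass;

    // Without this constructor, `Particle p;` would leave x, v and mass
    // indeterminate, and reading them would be undefined behaviour.
    Particle() : x(0.0), v(0.0), mass(1.0) {}
};

// Sum of kinetic energies. Every term is well defined because every
// Particle was fully initialised when it was constructed.
double kinetic_energy(const std::vector<Particle>& ps) {
    double e = 0.0;  // the accumulator needs an initial value too
    for (std::size_t i = 0; i < ps.size(); ++i)
        e += 0.5 * ps[i].mass * ps[i].v * ps[i].v;
    return e;
}
```

Building with `-Wall` together with optimisation may also flag some uninitialised local variables, though it will not catch every case (struct members and heap memory in particular).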

> cheers, Andy.
>
> (p.s. while I’m here - at one stage I thought I’d update to the newst cvs
> version of sdl to see if the problem went away. But I had no joy logging in
> to the cvs server. the instructions to access the cvs version on libsdl.org
> say
> “# Hit when prompted for a password”
> ^-- ? what
>
> I’ve tried typing return, space and even guest (as it used to be when hosted
> at loki) for the password, but couldn’t login? In the end I downloaded the
> tar file version.)

CVSROOT=:pserver:guest@libsdl.org:/home/slouken/libsdl.org/cvs cvs login

CVSROOT=:pserver:guest@libsdl.org:/home/slouken/libsdl.org/cvs cvs co SDL12

should do it for ya

Jp Calderone

A_R_Mosteo_Chagoyen · October 1, 2001, 11:48pm

[snip]
>> Is there a known problem with optimised code generated by RH6.2 (compiler
>> version egcs-2.91.66)? Or have I got a lot more head scratching to go
>> through before I nail this one?
>
> Generally, crashes that come in when you add optimization flags (at
> least to gcc) are due to uninitialized variables. So I’d check that
> first.

Also, -O2 is more widely tested than -O0, so the culprit is more likely your
program than an odd optimization (optimization often exposes an obscure
bug in the code).

The long delay before the fault may point to a memory leak or a
dangling pointer reference.

Cheers,

Álex.

Andrew_Beardmore · October 2, 2001, 8:06pm

[snip]
>> Is there a known problem with optimised code generated by RH6.2 (compiler
>> version egcs-2.91.66)? Or have I got a lot more head scratching to go
>> through before I nail this one?
>
> Also, O2 is more tested than O0, so the culprit is more likely your
> program than an odd optimization. (Optimization exposes some obscure
> bug in code). The long delay before the fault may relate to a memory leak
> or dangling pointer reference.

Memory usage seems stable when monitored with top.

The code is actually a particle system simulation written in C++ (heavy
use of templates and inlining). By running it on RH6.2 (compiled with -g
-O2, remember it doesn’t segfault with -O1 or -O0) under gdb I found out
it choked in the ODE integrator and spewed out a NaN.
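
One way to catch this kind of failure closer to its source is to check for non-finite values at the end of each integration step. The sketch below is an assumption-laden illustration, not the actual integrator: the `State` type, the `euler_step` function and the explicit Euler scheme are all hypothetical, and `std::isfinite` needs a C++11 (or later) standard library.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical state of a 1-D particle; not the original simulation's type.
struct State {
    double x;  // position
    double v;  // velocity
};

// One explicit Euler step of x' = v, v' = accel.
// Throws as soon as the new state contains a NaN or an infinity, so the
// failure is reported at the step that produced it, not many steps later.
State euler_step(const State& s, double accel, double dt) {
    State next;
    next.x = s.x + dt * s.v;
    next.v = s.v + dt * accel;
    if (!std::isfinite(next.x) || !std::isfinite(next.v))
        throw std::runtime_error("non-finite state in euler_step");
    return next;
}
```

Run under gdb, `catch throw` stops at the point where the exception is raised, which gives a backtrace for the first bad step.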

However, I’ve since got my hands on a RH7.1 box and it runs fine, no NaNs,
no segfaults, nothing to debug (famous last words I’m sure!) with
optimisation set at -O2 or -O3.

So it works on Mandrake 8.0 and RH7.1, and fails on RH6.2 (with anything
higher than -O1).

I’m quite happy to leave it as “a compiler issue”.

Now I’ve just got to persuade our system administrator to upgrade my
PC to RH7.1 - I’ve been trying for 4 months so I don’t hold out much
hope…

Cheers,
Andy.

A_R_Mosteo_Chagoyen · October 2, 2001, 11:52pm

> The code is actually a particle system simulation written in c++ (heavy
> use of templates and inlining). By running it on RH6.2 (compiled with -g
> -O2, remember it doesn’t segfault with -O1 or -O0) under gdb I found out
> it choked in the ODE integrator and spewed out a NaN.

I had a problem some years ago when switching between a Linux box
(SuSE maybe) and an HP-UX box. The atan2 function was returning different
results on each for some (undefined?) values (x/0). The Linux box
returned 90 degrees and the HP box returned NaN. Maybe your problem is
related (or maybe, as you say, it is the compiler's fault. I must say that
always, always, when I have blamed the compiler for a weird bug, in the
end it was my fault).

Well, I have made a compiler crash (gcc indeed, when unrolling loops),
but I have never seen one generate incorrect code (but compilers have bugs,
so really I’ve been lucky, I suppose).
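
A small sketch of the kind of edge case described above, assuming IEEE 754 doubles and a reasonably modern C++ math library. It does not reproduce the HP-UX behaviour; it only shows how a zero denominator can turn into a NaN depending on how the angle is computed. The function names are made up for the example.

```cpp
#include <cmath>

// Angle of the vector (x, y) in degrees, computed with atan2 so that
// x == 0 never requires a division.
double angle_deg(double y, double x) {
    const double pi = 3.14159265358979323846;
    return std::atan2(y, x) * 180.0 / pi;
}

// Same angle via atan(y / x). When x == 0 and y == 0 the quotient is 0/0,
// which is NaN under IEEE 754 arithmetic, and the NaN propagates.
double angle_deg_naive(double y, double x) {
    const double pi = 3.14159265358979323846;
    return std::atan(y / x) * 180.0 / pi;
}
```

With these definitions `angle_deg(1.0, 0.0)` gives 90 degrees, while `angle_deg_naive(0.0, 0.0)` gives NaN. The C standard also says a domain error may occur for `atan2` when both arguments are zero, so that input is one where platforms have been allowed to differ.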

Cheers,

Álex.