3DNow! detection on a P4 system

Hi,

I have a Pentium 4 system but I noticed that SDL_Has3DNow() returns
true causing an illegal instruction crash in
BlitRGBtoRGBPixelAlphaMMX3DNOW() even though I am using the latest CVS
version in which this should be fixed (according to the changelog).

I investigated why this happens and it seems that it is due to eax being
defined in the “clobbered registers” list of the inline assembly part of
CPU_have3DNow() in SDL_cpuinfo.c. If eax is in the list some seemingly
random value gets written in the has_3DNow variable. It seems that the
compiler puts has_3DNow in eax (as defined on the input variables line).

Changing line 167
from
: “%eax”, “%ebx”, “%ecx”, "%edx"
to
: “%ebx”, “%ecx”, "%edx"
seems to fix the problem. (Do ebx, ecx and edx need to be defined there
either since they’re pushed and popped by the asm code anyway? I’m no
expert in x86 assembly though…)

I’m running Linux 2.6.1 and glibc 2.3.2. I’m compiling SDL using gcc
3.3.2 and binutils 2.14.

Regards,

-Vesa

(I accidentally posted a message about this a few hours ago before
subscribing to the list so if the list moderator sees that message you
can deny it :))–
? Vesa Halttunen - http://www.jormas.com/~vesuri/ ?

Duh, this had already been discussed on this list. Sorry, gotta read the
archives more carefully next time. I also noticed that I was using the
latest CVS tarball, not actually the current CVS version…

However, it seems that the defining the has_3DNow input parameter type
as =m (as currently in CVS) does not fix the problem. With =m has_3DNow
seems to get the value 1 even if the asm section contains only one nop
instruction! The only way I got it to work correctly was the solution I
described: keep the input parameter in register (=r) and remove eax from
the clobbered registers list.

Regards,

-VesaOn Mon, 2004-01-19 at 01:47, Vesa Halttunen wrote:

I have a Pentium 4 system but I noticed that SDL_Has3DNow() returns
true causing an illegal instruction crash in
BlitRGBtoRGBPixelAlphaMMX3DNOW() even though I am using the latest CVS
version in which this should be fixed (according to the changelog).


? Vesa Halttunen - http://www.jormas.com/~vesuri/ ?

Vesa Halttunen wrote:

Hi,

I have a Pentium 4 system but I noticed that SDL_Has3DNow() returns
true causing an illegal instruction crash in
BlitRGBtoRGBPixelAlphaMMX3DNOW() even though I am using the latest CVS
version in which this should be fixed (according to the changelog).

I investigated why this happens and it seems that it is due to eax being
defined in the “clobbered registers” list of the inline assembly part of
CPU_have3DNow() in SDL_cpuinfo.c. If eax is in the list some seemingly
random value gets written in the has_3DNow variable. It seems that the
compiler puts has_3DNow in eax (as defined on the input variables line).

Changing line 167
from
: “%eax”, “%ebx”, “%ecx”, "%edx"
to
: “%ebx”, “%ecx”, "%edx"
seems to fix the problem.

Ahem, that’s no proper fix : since eax gets modified int the asm code
anyway, this could introduce some other errors. If eax is modified, we
have to tell gcc (the reason it works is that gcc allocated eax to hold
has_3DNow, but this is not a good idea to rely on this behaviour in the
long run).

(Do ebx, ecx and edx need to be defined there
either since they’re pushed and popped by the asm code anyway?

No. But you should rather define them than push/pop them.

Stephane

Vesa Halttunen wrote:

However, it seems that the defining the has_3DNow input parameter type
as =m (as currently in CVS) does not fix the problem. With =m has_3DNow
seems to get the value 1 even if the asm section contains only one nop
instruction! The only way I got it to work correctly was the solution I
described: keep the input parameter in register (=r) and remove eax from
the clobbered registers list.

Are you sure ? I just tried on an athlon XP, a K6, a P3, a P4, and a
Celeron and all those cpus return correct cpu features using “=m”. I’m
using gcc 3.3.1, though.

Stephane

Changing line 167
from
: “%eax”, “%ebx”, “%ecx”, "%edx"
to
: “%ebx”, “%ecx”, "%edx"
seems to fix the problem.
Ahem, that’s no proper fix : since eax gets modified int the asm code
anyway, this could introduce some other errors. If eax is modified, we
have to tell gcc (the reason it works is that gcc allocated eax to hold
has_3DNow, but this is not a good idea to rely on this behaviour in the
long run).

That’s true. Could the inline assembly register constraints be used
to force the variable to always use eax so that this would not be a
problem?

(Do ebx, ecx and edx need to be defined there
either since they’re pushed and popped by the asm code anyway?
No. But you should rather define them than push/pop them.

Can the unnecessary push/pops can be removed then?

Regards,

-VesaOn Mon, 2004-01-19 at 18:25, Stephane Marchesin wrote:

        ? Vesa Halttunen - http://www.jormas.com/~vesuri/ ?

Vesa Halttunen wrote:

That’s true. Could the inline assembly register constraints be used
to force the variable to always use eax so that this would not be a
problem?

There is no way that I know of. I also don’t like that too much because
it might create problem with 64 bits architectures that have IA32
compatibility, but different registers available.

(Do ebx, ecx and edx need to be defined there
either since they’re pushed and popped by the asm code anyway?

No. But you should rather define them than push/pop them.

Can the unnecessary push/pops can be removed then?

Yes, except for ebx, which is used by gcc to access memory symbols and
still needs to be pushed/poped.
But I see you’re confused now :slight_smile:

So, if you checked that you have the issue, could you please send me
your SDL_cpuinfo.o file so that I take a look at it ?

Stephane

Yes, I’m sure. It seems that when =m is used GCC uses 0x8(%esp,1) as
the destination in
" movl $1,%0 # Yep, we have 3DNow! support!
\n"
but esp does not point to the expected location because ebx/ecx/edx have
been pushed to the stack. I’ll send the resulting SDL_cpuinfo.o.

Regards,

-VesaOn Mon, 2004-01-19 at 18:40, Stephane Marchesin wrote:

Are you sure ? I just tried on an athlon XP, a K6, a P3, a P4, and a
Celeron and all those cpus return correct cpu features using “=m”. I’m
using gcc 3.3.1, though.


? Vesa Halttunen - http://www.jormas.com/~vesuri/ ?

Maybe something has changed between GCC 3.3.1 and 3.3.2 that breaks
this… I got this working with =m and 3.3.2 with no tricks as follows:

  • do not initialize has_3DNow in C but write a 0 to output in asm before
    pushing ebx/ecx/edx (for some reason the output contains 1 by default
    no matter what)
  • pop edx/ecx/ebx back right after testl - pops don’t affect the test
    result so jz still works as expected (so if esp is used for storing
    the result the stack pointer points to the expected location)

-VesaOn Mon, 2004-01-19 at 18:40, Stephane Marchesin wrote:

Are you sure ? I just tried on an athlon XP, a K6, a P3, a P4, and a
Celeron and all those cpus return correct cpu features using “=m”. I’m
using gcc 3.3.1, though.


? Vesa Halttunen - http://www.jormas.com/~vesuri/ ?

Maybe something has changed between GCC 3.3.1 and 3.3.2 that breaks
this… I got this working with =m and 3.3.2 with no tricks as follows:

I can’t believe how much energy we’re all expending to initialize a
register properly. :slight_smile:

–ryan.

Maybe something has changed between GCC 3.3.1 and 3.3.2 that breaks
this… I got this working with =m and 3.3.2 with no tricks as follows:

I can’t believe how much energy we’re all expending to initialize a
register properly. :slight_smile:

You should have seen the hours on IRC spent trying to figure out why
we needed to add the register push and pops even though the clobber
list was defined properly. We ended up always pushing and popping
because we couldn’t figure out why gcc wasn’t saving and restoring
the registers properly when the function was inlined.

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Maybe something has changed between GCC 3.3.1 and 3.3.2 that breaks
this… I got this working with =m and 3.3.2 with no tricks as
follows:

I can’t believe how much energy we’re all expending to initialize a
register properly. :slight_smile:

You should have seen the hours on IRC spent trying to figure out why
we needed to add the register push and pops even though the clobber
list was defined properly. We ended up always pushing and popping
because we couldn’t figure out why gcc wasn’t saving and restoring
the registers properly when the function was inlined.

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Sounds like some obscure bug in gcc, then – did anyone take the time to
notify the gcc development people?

  • Silicon

Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.562 / Virus Database: 354 - Release Date: 1/16/2004

----- Original Message -----
From: slouken@devolution.com (Sam Lantinga)
To:
Sent: Wednesday, January 21, 2004 10:19 PM
Subject: Re: [SDL] 3DNow! detection on a P4 system

You should have seen the hours on IRC spent trying to figure out why
we needed to add the register push and pops even though the clobber
list was defined properly. We ended up always pushing and popping
because we couldn’t figure out why gcc wasn’t saving and restoring
the registers properly when the function was inlined.

Indeed, it is saving the registers, except you have to save ebx yourself if you change it.
Also, I don’t think you are allowed to push/pop registers inside an inline function (but don’t cite me on that) and that causes trouble with gcc 3.3.2 that uses a more thorough inlining.

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Sounds like some obscure bug in gcc, then – did anyone take the time to
notify the gcc development people?

I don’t think this is an issue with gcc.
Vesa Halttunen and I made the attached patch that works with gcc 3.3.2 too. The push/pops are removed and we use ide to save ebx instead.

Stephane
-------------- next part --------------
A non-text attachment was scrubbed…
Name: cpuinfo.patch
Type: application/octet-stream
Size: 2651 bytes
Desc: not available
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20040122/7a4e7b2a/attachment.obj

I don’t think this is an issue with gcc.
Vesa Halttunen and I made the attached patch that works with gcc 3.3.2 too. The push/pops are removed and we use ide to save ebx instead.

I’ve applied a modified version of your patch to CVS.
Can you update from CVS and see if it still works? :slight_smile:

Thanks!
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Sam Lantinga wrote:

I don’t think this is an issue with gcc.
Vesa Halttunen and I made the attached patch that works with gcc 3.3.2 too. The push/pops are removed and we use ide to save ebx instead.

I’ve applied a modified version of your patch to CVS.
Can you update from CVS and see if it still works? :slight_smile:

Well… I found another bug : 3dnow was detected on P2/P3 cpus. So I
took one more look at the code and found a mistake in
CPU_getCPUIDFeaturesExt :
The condition for having extended cpuinfo is that when we query extended
cpuinfo we get a result >= 0x80000001. So we must exit if eax < 0x80000001.
The attached patch does that.

Btw, you probably need the same fix for the VC++ version, but I can’t
test it so it’s not included.

Also, you should keep in mind that SSE/SSE2 insructions that use the SSE
registers (xmm0, xmm1…) also need OS support (to save those registers
during context switches for example).

Stephane

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed…
Name: cpufix2.patch
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20040124/224f7b58/attachment.txt

Well… I found another bug : 3dnow was detected on P2/P3 cpus. So I
took one more look at the code and found a mistake in
CPU_getCPUIDFeaturesExt :

Should be fixed, thanks!

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment