SDL_memcpy variants used in SDL_BlitCopy

Hello, all.
I am wondering whether SDL_memcpyMMX() and SDL_memcpySSE() are actually faster
than plain memcpy() on any Intel chips. My tests, copying a 1 MB buffer of
regular memory on Windows 2000 with a 1.7 GHz Intel Xeon, show that the MMX
version is 2-4% slower and the SSE version is ~45% slower than a regular
intrinsic/inline memcpy() using “rep movsd”.
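A minimal timing harness along the lines of the test described above might look like this. It is a sketch, not the actual benchmark used: `time_copy` and `byte_loop_copy` are hypothetical names, and because SDL's MMX/SSE variants are internal to SDL, a plain byte loop stands in as the alternative routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Common signature for every copy routine being compared. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for an alternative copy routine. SDL's MMX/SSE
   variants are internal to SDL, so a plain byte loop is used here. */
static void *byte_loop_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Runs fn(dst, src, n) `iters` times and returns the elapsed CPU time in
   seconds as reported by clock(). A single 1 MB copy takes very little
   time, so repeating it makes the figure less sensitive to timer
   resolution and noise. */
static double time_copy(copy_fn fn, void *dst, const void *src, size_t n,
                        int iters)
{
    clock_t start = clock();
    for (int i = 0; i < iters; i++)
        fn(dst, src, n);
    clock_t end = clock();

    /* Sanity check: the routine must actually have copied the buffer. */
    assert(memcmp(dst, src, n) == 0);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing `memcpy` and another routine with the same buffers and iteration count, then comparing the totals, gives a relative figure like the percentages quoted above. An optimizing compiler may turn the byte loop itself into a memcpy() call, so the stand-in is only illustrative.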

Has anyone run similar tests on other chips and/or platforms? What were the
results? I am really curious.

If these versions are faster on AMD chips, then they should only be enabled
for AMD chips, right?
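Enabling a routine only on AMD amounts to choosing the copy function once at startup based on the CPU vendor. A minimal sketch, assuming the vendor string has already been read with the CPUID instruction (leaf 0 reports "GenuineIntel" on Intel and "AuthenticAMD" on AMD); `select_copy` and `amd_tuned_copy` are hypothetical names, not SDL functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical placeholder for an AMD-only routine such as an MMX/SSE copy.
   It just forwards to memcpy() so that the sketch stays portable. */
static void *amd_tuned_copy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Chooses a copy routine from the 12-character vendor string reported by
   CPUID leaf 0 ("AuthenticAMD", "GenuineIntel", ...). Only AMD gets the
   tuned routine; every other vendor gets plain memcpy(). */
static copy_fn select_copy(const char *vendor)
{
    assert(vendor != NULL);
    if (strcmp(vendor, "AuthenticAMD") == 0)
        return amd_tuned_copy;
    return memcpy;
}
```

The caller would store the result once (`copy_fn copy = select_copy(vendor);`) and call `copy(dst, src, n)` in the blit path, so the vendor check is not repeated for every blit. Keying on vendor alone is coarse, since timings can differ between models from the same vendor.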

Regards,
Alex.

Last time I checked, “rep movsd” was significantly slower than the MMX/SSE
versions (and I was horrified to find that this is what glibc does internally
in its own memcpy()).

I tried it on an AMD chip, though (either an Athlon MP or an early
Opteron; I can't remember which).

–ryan.