Hi,
About MMX copying: AMD has a heavily optimized routine.
http://www.cs.virginia.edu/stream/FTP/Contrib/AMD/memcpy_amd.asm
There is also an Intel/AMD MMX routine, as used in the Linux sources.
http://grace-ist.org/horde/chora/co.php?r=1.1&f=xine-lib/src/xine-utils/memcpy.c&Horde=56b36958409aa348f1b989d45973dd9f
I suppose SDL should borrow ideas from http://grace-ist.org
and use separate routines for AMD and Intel CPUs.
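To illustrate what I mean by separate routines, here is a rough sketch of picking a copy function by CPU vendor at run time. It is not SDL code: the names copy_generic, copy_intel, copy_amd, detect_vendor, select_copy and fast_copy are all made up for this example, the three copy routines are placeholders that just call memcpy, and the detection assumes GCC or Clang on x86 (it falls back to the generic routine anywhere else).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* GCC/Clang helper for the CPUID instruction */
#define HAVE_X86_CPUID 1
#endif

typedef void *(*copy_fn)(void *dst, const void *src, size_t len);
typedef enum { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD } cpu_vendor;

/* Placeholders (hypothetical): vendor-tuned routines would go here. */
static void *copy_generic(void *dst, const void *src, size_t len)
{ return memcpy(dst, src, len); }
static void *copy_intel(void *dst, const void *src, size_t len)
{ return memcpy(dst, src, len); }
static void *copy_amd(void *dst, const void *src, size_t len)
{ return memcpy(dst, src, len); }

/* Read the 12-byte vendor string from CPUID leaf 0 (EBX, EDX, ECX order). */
static cpu_vendor detect_vendor(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    char id[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return VENDOR_UNKNOWN;
    memcpy(id + 0, &ebx, 4);
    memcpy(id + 4, &edx, 4);
    memcpy(id + 8, &ecx, 4);
    id[12] = '\0';
    if (strcmp(id, "GenuineIntel") == 0)
        return VENDOR_INTEL;
    if (strcmp(id, "AuthenticAMD") == 0)
        return VENDOR_AMD;
#endif
    return VENDOR_UNKNOWN;
}

/* Map a vendor to a routine; unknown vendors get the generic copy. */
static copy_fn select_copy(cpu_vendor v)
{
    switch (v) {
    case VENDOR_INTEL: return copy_intel;
    case VENDOR_AMD:   return copy_amd;
    default:           return copy_generic;
    }
}

/* Detect once, then reuse the choice (not thread-safe in this sketch). */
static void *fast_copy(void *dst, const void *src, size_t len)
{
    static copy_fn chosen = NULL;

    assert(dst != NULL && src != NULL);
    if (chosen == NULL)
        chosen = select_copy(detect_vendor());
    return chosen(dst, src, len);
}
```

A caller would simply use fast_copy(dst, src, len) in place of memcpy. Doing the detection once and keeping a function pointer keeps the CPUID cost out of the copy path.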
Here is a simple patch that unrolls the SDL_memcpyMMX loop:
static inline void SDL_memcpyMMX(char* to,char* from,int len)
{
    int i;
    /* Copy 64 bytes per iteration: load eight quadwords into mm0-mm7,
       then store them. No emms is issued here, so the caller has to
       run emms before any x87 floating-point code. */
    for(i=0; i<len/64; i++) {
        __asm__ __volatile__ (
        "movq (%0), %%mm0\n"
        "movq 8(%0), %%mm1\n"
        "movq 16(%0), %%mm2\n"
        "movq 24(%0), %%mm3\n"
        "movq 32(%0), %%mm4\n"
        "movq 40(%0), %%mm5\n"
        "movq 48(%0), %%mm6\n"
        "movq 56(%0), %%mm7\n"
        "movq %%mm0, (%1)\n"
        "movq %%mm1, 8(%1)\n"
        "movq %%mm2, 16(%1)\n"
        "movq %%mm3, 24(%1)\n"
        "movq %%mm4, 32(%1)\n"
        "movq %%mm5, 40(%1)\n"
        "movq %%mm6, 48(%1)\n"
        "movq %%mm7, 56(%1)\n"
        : : "r" (from), "r" (to) : "memory");
        from+=64;
        to+=64;
    }
    /* Copy the remaining 0..63 bytes with the plain routine. */
    if (len&63)
        SDL_memcpy(to, from, len&63);
}
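The block/tail arithmetic of the patch (len/64 full blocks, then len&63 leftover bytes) can be checked without MMX. Below is a small portable sketch for that purpose only. The names copy_blocks64 and check_copy are hypothetical, memcpy stands in for the eight movq load/store pairs, and the length is taken as size_t rather than int, so this says nothing about the speed of the real routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for the unrolled loop:
   same control flow, with memcpy replacing the MMX block copy. */
static void copy_blocks64(char *to, const char *from, size_t len)
{
    size_t blocks = len / 64;   /* number of full 64-byte blocks */
    size_t tail = len & 63;     /* leftover bytes, 0..63 */
    size_t i;

    assert(to != NULL && from != NULL);
    for (i = 0; i < blocks; i++) {
        memcpy(to, from, 64);
        from += 64;
        to += 64;
    }
    if (tail)
        memcpy(to, from, tail);
}

/* Copy len bytes (len <= 256) and return 1 if the destination matches
   the source and the guard byte just past the end is untouched. */
static int check_copy(size_t len)
{
    char src[256];
    char dst[257];
    size_t i;

    if (len > sizeof src)
        return 0;
    for (i = 0; i < sizeof src; i++)
        src[i] = (char)(i * 7 + 1);
    memset(dst, 0x55, sizeof dst);

    copy_blocks64(dst, src, len);

    return memcmp(dst, src, len) == 0 && dst[len] == 0x55;
}
```

Calling check_copy with lengths around the block boundary (0, 63, 64, 65) exercises the loop-only, tail-only and loop-plus-tail paths.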
Best regards,
Dmitry Yakimov, ISDEF member
ActiveKitten.com