Some patches for SDL_memset4 and SDL_memcpy4

Hi all,

A long time ago some code was posted here to improve the default
implementation of SDL_memset4 for ARM
(http://sdl.5483.n7.nabble.com/ARM-optimised-memset-td29728.html).
I originally ported that code to the ancient GAS shipped with Xcode, and then
lost track of it when converting it to inline assembly.
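
For context, a memset4-style routine fills a buffer with a repeated 32-bit value rather than a repeated byte. The following is only a minimal portable sketch of that idea, useful as a baseline when comparing against assembly versions; the name `fill32` is hypothetical and this is not SDL's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable baseline, not SDL's code: store `value` into
 * `count` consecutive 32-bit words starting at `dst`. */
static void fill32(uint32_t *dst, uint32_t value, size_t count)
{
    assert(dst != NULL || count == 0);

    /* Main loop: four stores per iteration to reduce loop overhead. */
    size_t blocks = count / 4;
    while (blocks--) {
        dst[0] = value;
        dst[1] = value;
        dst[2] = value;
        dst[3] = value;
        dst += 4;
    }

    /* Tail: the 0 to 3 words left over after the unrolled loop. */
    size_t tail = count % 4;
    while (tail--) {
        *dst++ = value;
    }
}
```

A call such as `fill32(pixels, 0xFF00FF00u, width * height)` would clear a 32-bit surface to one color, assuming `pixels` points to at least that many words.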

I found that code again a few days ago and tried inlining it once more, but
failed. So I looked for alternatives and found this nice code,
http://lists.uclibc.org/pipermail/uclibc/2003-September/027817.html, which
works well with SDL. I have a patch ready (attached), but I think I have to
contact the original author and ask if he’s OK with the zlib license before
inclusion. Just as a backup, I’ve included the original patch, which still
needs to be inlined. Could anyone take a look at it and see what goes wrong
when inlining?

Then I noticed that memcpy4 could also be optimized, and added that too, even
though SDL doesn’t use it internally. Maybe I put the assembly in the
wrong place; if so, tell me and I’ll move it.

What also puzzled me a bit is that SDL_memcpy has a check for using
__builtin_memcpy. So what is really preventing us from applying the
same check to the other functions? I’ve added a third patch that adds this
check for other functions too.
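
As an illustration of the kind of check being discussed, here is a minimal sketch that picks a compiler builtin at compile time and falls back to a plain loop otherwise. The names `DEMO_HAS_BUILTIN` and `demo_memset` are hypothetical and are not SDL identifiers; the sketch assumes a compiler that provides `__has_builtin` (Clang, and GCC from version 10), and simply takes the fallback path elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature test: where __has_builtin is missing, assume the
 * builtin is unavailable and use the fallback loop. */
#ifdef __has_builtin
#  define DEMO_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define DEMO_HAS_BUILTIN(x) 0
#endif

/* Byte-wise fill that prefers the compiler builtin when it exists. */
static void *demo_memset(void *dst, int c, size_t len)
{
    assert(dst != NULL || len == 0);
#if DEMO_HAS_BUILTIN(__builtin_memset)
    return __builtin_memset(dst, c, len);
#else
    unsigned char *p = (unsigned char *)dst;
    while (len--) {
        *p++ = (unsigned char)c;
    }
    return dst;
#endif
}
```

Note that `__builtin_memset` fills bytes, so it matches a byte-wise memset but not a 32-bit pattern fill unless all four bytes of the pattern are equal. The compiler is also free to lower the builtin to a call to the C library's memset.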

Can anyone test and review all these patches?
Also, besides alignment, what is preventing using the __builtin equivalent
functions for memset?

Cheers,
Vittorio
Attachments:

1-check_for_some___builtin___functions.patch (1217 bytes)
http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20130720/a1612e28/attachment.obj

2-ARM_optimized_SDL_memset4.patch (3000 bytes)
http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20130720/a1612e28/attachment-0001.obj

3-ARM_optimized_SDL_memcpy4.patch (2548 bytes)
http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20130720/a1612e28/attachment-0002.obj

sdl-framebuffer.patch (4077 bytes)
http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20130720/a1612e28/attachment-0003.obj

On Sat, Jul 20, 2013 at 12:42 PM, Vittorio Giovara <@Vittorio_Giovara> wrote:

> Hi all,
>
> Can anyone test and review all these patches?
> Also, besides alignment, what is preventing using the __builtin equivalent
> functions for memset?
>
> Cheers,
> Vittorio

I’ve attached the patches here for better tracking:
https://bugzilla.libsdl.org/show_bug.cgi?id=1989

It’d be nice if anyone could shed some light on those __builtin functions
too :)

Cheers,
Vittorio

Hi Vittorio,

> Can anyone test and review all these patches?
> Also, besides alignment, what is preventing using the __builtin equivalent
> functions for memset?

I sent through those original patches. I don’t really have a comment
on the ones you’ve supplied, however you might be interested in a
similar bit of work that’s being done in the musl C library. We’re
trying to get ARM optimised versions of mem* functions in there
(currently just memcpy, but hopefully memset will be next). The
discussion/code for this can be found at
http://www.openwall.com/lists/musl/2013/07/24/4.

Regards,
Andre

Hi Andre,

On Fri, Jul 26, 2013 at 12:21 AM, Andre Renaud wrote:

> I sent through those original patches. I don’t really have a comment
> on the ones you’ve supplied, however you might be interested in a
> similar bit of work that’s being done in the musl C library. We’re
> trying to get ARM optimised versions of mem* functions in there
> (currently just memcpy, but hopefully memset will be next). The
> discussion/code for this can be found at
> http://www.openwall.com/lists/musl/2013/07/24/4.

I remember your submission, and your current code looks very nice. Your
experience with asm beats mine by far, so would you consider
submitting a patch that brings your optimized memset to SDL2 as inline
assembly? I don’t mind dropping mine.

If I remember correctly you mentioned a fivefold speed increase, so I’m sure
many iOS/Android developers would look forward to your contribution.

Cheers,
Vittorio