SDL_RWread *much* slower on Windows in 1.2.10/11

Hi!

When upgrading my SDL library to 1.2.11 recently, I noticed an extreme
increase in the startup time of my SDL-based game “Rocks’n’Diamonds” on
the Windows platform.

The bottleneck was quickly found to be image loading: when loading PCX
files in my game, “IMG_Load()” is around 6-8 times slower than with the
previously used SDL version (1.2.8)!

After a little profiling, I found the cause to be a change in
“SDL_RWFromFile()”, which now uses “win32_file_read()” instead of
“stdio_read()” for file read access on WIN32 platforms. (I found out
that this change was introduced in SDL 1.2.10.)

As far as I can see, “win32_file_read()” effectively uses “ReadFile()” from
the Windows API, while “stdio_read()” uses standard ANSI-C “fread()”.

This problem was confirmed with the same relative speed difference on
two PCs, one with Windows 2000 and one with Windows XP.

After recompiling the SDL.dll using the ANSI-C file access functions
again (by changing the source code a little bit), everything was fast
again. But there must have been a good reason to change this, so this
probably is not the final solution.

Can any of the SDL developers comment on why this change was needed
or what might be the reason for the extreme difference in reading bytes
from a file stream?

Best regards,
Holger
@Holger_Schemel

The change was to remove dependencies on a specific C runtime from SDL.

If I had to guess, I’d say the reason it’s slower is that ReadFile()
reads exactly what you tell it to, whereas the stdio fread() is reading
ahead and caching data, so several calls to fread() to grab a couple of
bytes go to disk once and then memcpy() thereafter, whereas each
ReadFile() is going to disk.
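
To make the guess above concrete, here is a minimal sketch in portable C (not SDL's actual code). It reads a file one byte at a time through stdio, either with the default buffering or with buffering switched off via setvbuf(), which roughly mimics issuing one OS-level read per request. The helper name read_bytewise() is made up for illustration.

```c
#include <stdio.h>

/*
 * Read a file one byte at a time and return the number of bytes read,
 * or -1 if the file cannot be opened. When `unbuffered` is non-zero,
 * stdio buffering is disabled, so every fread() has to ask the OS for
 * data instead of copying it out of an in-memory buffer.
 */
static long read_bytewise(const char *path, int unbuffered)
{
    FILE *f = fopen(path, "rb");
    unsigned char byte;
    long total = 0;

    if (f == NULL)
        return -1;

    if (unbuffered)
        setvbuf(f, NULL, _IONBF, 0); /* must come before any other I/O on f */

    while (fread(&byte, 1, 1, f) == 1)
        total++;

    fclose(f);
    return total;
}
```

Timing both modes on the same file (for example with clock()) should show whether per-call overhead alone could plausibly explain a slowdown of this size on a given system.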

–ryan.

Hi!

Can any of the SDL developers comment on why this change was needed
or what might be the reason for the extreme difference in reading bytes
from a file stream?

The change was to remove dependencies on a specific C runtime from SDL.

Sounds reasonable (especially from a library developer’s point of view).

Does this therefore mean that there’s no technical reason not to use the
ANSI-C functions in a specific C-only project?

My main concern was that my “patched” SDL.dll could cause problems on
Windows platforms I don’t know of (like those mentioned in the 1.2.10
ChangeLog notes for Windows).

If I had to guess, I’d say the reason it’s slower is that ReadFile()
reads exactly what you tell it to, whereas the stdio fread() is reading
ahead and caching data, so several calls to fread() to grab a couple of
bytes go to disk once and then memcpy() thereafter, whereas each
ReadFile() is going to disk.

So ReadFile() always works unbuffered? Wow, that’s terrible indeed for
an image loading routine that reads the file byte by byte (as at least
the PCX image loader does).

Is there any way of running ReadFile() in buffered mode, like fread()?
(Wasn’t able to find anything helpful about this with Google, but then,
I’m not a native Windows programmer (cross-compiling from Linux).)

If not, it may be worth tuning SDL_image’s loading code to use buffered
I/O by itself (by prefetching more bytes internally), if needed (but
this wouldn’t be a trivial approach, of course :-/ ).
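
A rough sketch of what such prefetching could look like, in plain C and independent of SDL: a small wrapper that pulls data from an expensive underlying read in 4096-byte chunks and serves small requests from memory. All names here (BufReader, raw_read_fn, MemSource and the functions) are hypothetical, and the in-memory source merely stands in for a real file read.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical raw read callback: returns bytes read, 0 at end of data. */
typedef size_t (*raw_read_fn)(void *ctx, void *dst, size_t len);

typedef struct {
    raw_read_fn raw;          /* the expensive underlying read */
    void *ctx;                /* passed through to raw() */
    unsigned char buf[4096];  /* prefetch buffer */
    size_t pos;               /* next unread byte in buf */
    size_t avail;             /* number of valid bytes in buf */
    unsigned long raw_calls;  /* how often raw() was invoked */
} BufReader;

static void bufreader_init(BufReader *r, raw_read_fn raw, void *ctx)
{
    r->raw = raw;
    r->ctx = ctx;
    r->pos = 0;
    r->avail = 0;
    r->raw_calls = 0;
}

/* Copy up to len bytes to dst, refilling the buffer only when it is empty. */
static size_t bufreader_read(BufReader *r, void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t done = 0;

    while (done < len) {
        size_t n;
        if (r->pos == r->avail) {
            r->avail = r->raw(r->ctx, r->buf, sizeof r->buf);
            r->pos = 0;
            r->raw_calls++;
            if (r->avail == 0)
                break; /* end of data */
        }
        n = r->avail - r->pos;
        if (n > len - done)
            n = len - done;
        memcpy(out + done, r->buf + r->pos, n);
        r->pos += n;
        done += n;
    }
    return done;
}

/* Stand-in data source for demonstration: reads from a memory block. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t off;
} MemSource;

static size_t mem_raw_read(void *ctx, void *dst, size_t len)
{
    MemSource *m = ctx;
    size_t n = m->size - m->off;
    if (n > len)
        n = len;
    memcpy(dst, m->data + m->off, n);
    m->off += n;
    return n;
}
```

Reading 10000 bytes one at a time through bufreader_read() until it returns 0 results in only four underlying read calls (three that return data and one that reports the end), instead of 10000.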

Best regards,
Holger
@Holger_Schemel

Does this therefore mean that there’s no technical reason not to use the
ANSI-C functions in a specific C-only project?

I think it was more a concern for distributing an SDL.dll that Just
Works, whether you use Borland’s tools, Visual Studio, or Cygwin…and
whether you use a multithreaded runtime, debug runtime, etc.

Within your own project, if the stdio code works, it probably won’t
STOP working.

If not, it may be worth tuning SDL_image’s loading code to use buffered
I/O by itself (by prefetching more bytes internally), if needed (but
this wouldn’t be a trivial approach, of course :-/ ).

Probably be better to buffer inside SDL so it works as before and we
don’t have to clean up every project.

That being said…I don’t think it’s the buffering. I just checked the
MSDN docs and it says it SHOULD be buffering it (there’s a flag to
explicitly disable buffering in CreateFile() but we don’t use it). I
guess we’ll have to research this further.

I’ve added it to the bugtracker:
http://bugzilla.libsdl.org/show_bug.cgi?id=412

–ryan.

Within your own project, if the stdio code works, it probably won’t
STOP working.

Good to know! Just wasn’t sure about Windows targets like Vista I don’t
yet have access to for testing… (But then, dropping support for these
ANSI-C functions is indeed not that likely, even from Microsoft. :wink: )

If not, it may be worth tuning SDL_image’s loading code to use buffered
I/O by itself (by prefetching more bytes internally), if needed (but
this wouldn’t be a trivial approach, of course :-/ ).

Probably be better to buffer inside SDL so it works as before and we
don’t have to clean up every project.

That being said…I don’t think it’s the buffering. I just checked the

I also can’t really believe this, as several other system layers use
some kind of buffering (e.g. the hard disk’s internal cache etc.),
so a speed difference of that magnitude (a factor of 6-8, even after
repeated invocations, when everything is in the file system’s buffer
cache anyway) seems quite unlikely.

MSDN docs and it says it SHOULD be buffering it (there’s a flag to
explicitly disable buffering in CreateFile() but we don’t use it). I
guess we’ll have to research this further.

Yep, this indeed seems to be worth some further investigation. Funny
though that I seem to be the only one who experienced such a huge
speed difference. (Not sure if the impact on non-PCX image files is
the same though.)

I’ve added it to the bugtracker:
http://bugzilla.libsdl.org/show_bug.cgi?id=412

Thanks! (If this turns out not to be easily fixable, it might
be worth adding a configure option for the MinGW build target to
explicitly choose the stdio functions.)

Best regards,
Holger
@Holger_Schemel

Good to know! Just wasn’t sure about Windows targets like Vista I don’t
yet have access to for testing… (But then, dropping support for these
ANSI-C functions is indeed not that likely, even from Microsoft. :wink: )

The C runtime on Windows is shipped with your product (it’s not
something that exists system-wide like Linux), and it just wraps Win32
APIs, so it definitely won’t be going away. :slight_smile:

–ryan.

I could imagine a reason why it would be slower than the ANSI-C function,
and it has to do with buffering. If the Win32 API call ends up in the kernel
and needs kernel-mode privileges, you have to pay for a context switch each
time you call it. Thus, even if the latter does some buffering, it could be
worth it to add more buffering above that in the ANSI-C function itself.

Good to know! Just wasn’t sure about Windows targets like Vista I don’t
yet have access to for testing… (But then, dropping support for these
ANSI-C functions is indeed not that likely, even from Microsoft. :wink: )

The C runtime on Windows is shipped with your product (it’s not
something that exists system-wide like Linux), and it just wraps Win32
APIs, so it definitely won’t be going away. :slight_smile:

Oops – apparently I was thinking too Linux-centric! :-o

Thanks for the clarification! :-)
@Holger_Schemel