Efficient image format for SDL2?

gee · April 21, 2021, 9:17am

A bit of a technical question about image formats.

I’m developing a small 2D point’n’click game using C++ with SDL2. So far it’s going very well and it’s been a pleasure to use SDL, but there’s one little thing that’s been bothering me for a while: how long it takes to load the input images. Overall, for a big level, it can take up to 5 seconds on a recent computer (and up to 10 or 15 seconds on a small tablet, for example).

Of course, these timings are not terrible. But considering it’s “just” a modest 2D game, it seems like a lot (I would expect, as a player, a loading time less than 1 second per level). Of course, the game runs at 1920x1080 and the images are thus quite big (especially the animations), but still.

Now to the technical part: I’m using compressed PNG images. Since my images come from vector drawings (with large flat areas), running them through pngquant substantially reduces the size of the image files. I’ve been doing some profiling on the image-loading parts of the SDL functions, and here’s something interesting:

- uncompressed BMP files load very fast with SDL_LoadBMP(), but of course the files are also very big
- “standard” PNG files take on average 10 times longer to load with IMG_Load() than BMP, but the files are 10 to 20 times smaller
- lossy compressed PNG files (again, using pngquant) take “only” 2 to 5 times longer to load than BMP, with files 50 to 120 times smaller

So far I’ve continued using the last variant, which gives me quite small files with average loading times. Ideally, I’d like to get the loading times of SDL_LoadBMP(), but using BMP files really seems like overkill in terms of disk usage (my simple small 2D game would end up weighing several GB).

Another interesting part: I also did some profiling that separates the file reading itself (which requires disk access, which I know is very costly) from the construction of the SDL image. It looks like this:

     // Reading from the disk (custom function)
     std::vector<char> buffer = file_to_buffer(filename);
    
     // Constructing the image from RAM access
     SDL_RWops* input = SDL_RWFromMem(buffer.data(), buffer.size());
     SDL_Surface* image = SDL_LoadBMP_RW(input, 0); // or IMG_Load_RW()

Feeding some of my images to similar profiling code gives the following results:

- disk access for BMP is 4ms on average
- disk access for PNG is 0.05ms on average (makes sense considering the files are way smaller)
- construction of the image from BMP input is 3ms on average
- construction of the image from PNG input is 18ms on average

So the longer reading time of BMP is largely compensated by how fast constructing the image from BMP is (whereas PNG is fast to read but slow to decompress): about 7ms in total for BMP versus about 18ms for PNG. I guess constructing a surface from a BMP is basically “just” copying a big chunk of bytes directly into memory, so it makes sense.
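A minimal sketch of how the two phases could be timed separately, using only the standard library. The body of file_to_buffer() is an assumption (the original is a custom function that is not shown), and time_ms() is a hypothetical helper introduced here for illustration.

```cpp
#include <chrono>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical implementation of the custom file_to_buffer() helper:
// reads a whole file into memory. Returns an empty buffer if the file
// cannot be opened.
std::vector<char> file_to_buffer(const std::string& filename) {
    std::ifstream in(filename, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double time_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

One time_ms() call would wrap the file_to_buffer() line and a second one would wrap the SDL_LoadBMP_RW() or IMG_Load_RW() line, so the disk read and the decode are measured independently. Averaging over many images, as done above, smooths out disk cache effects.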

So the decompression of PNG images is the costly part in my code. From that, I have some questions:

- is it possible that the SDL’s PNG decompression algorithm is not optimal? I haven’t checked, but I imagine it’s using an external PNG library for that, so I doubt it. The solution in that case would be to use another PNG decoder
- are there other image formats that are both efficient in terms of compression and fast to decompress? I’ve read TGA is often used for texturing, but I’ve never used it so far. I’ve been also wondering if there were some formats specifically more efficient with my type of image (very flat vector drawings - I imagine using my source SVG would be slow as hell as they would require quantization to be stored in SDL structures)
- should I maybe use a different approach? I’ve been thinking that using a unique ZIP container for the assets of my game could maybe make things overall more efficient (provided ZIP decompression is faster than PNG decompression), using one disk access to load everything while allowing to also compress other assets (level descriptions, sounds, etc.)
- I’ve seen some interesting development about direct GPU texture interchange (see basis_universal for example), which I can imagine could make loading textures much faster. Do you think this could be integrated in an SDL-based game? (To be clear: this part is far too technical for me, my knowledge of GPU programming is very light, so I have no idea how it works)

Maybe the answer is “your loading times are fine, don’t overthink it”, but I was quite curious to see if people here had some insight to share on this issue.

SeanOConnor · April 21, 2021, 3:12pm

Do you need to load all of your images at the start? Why not load them as they’re needed? Set the texture variable to NULL at the start, and then when you come to draw textures check for NULL and only load the PNG then.

gee · April 22, 2021, 1:43pm

I’ve tried similar things, but with heavy files that can cause a small lag at the moment the image is actually loaded, which is not really good either.

One other idea I had was to load only what’s necessary at startup, and then load the rest silently in the background when the user is inactive (for example, as soon as no SDL event is generated in a main loop iteration).
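The idle-loading idea could be sketched roughly as follows. IdleLoader and its member names are hypothetical, and the sketch assumes each queued job decodes one image and stores the result somewhere the game can find it.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper: holds deferred load jobs and runs at most one
// per idle frame, so a single frame never pays for more than one image.
class IdleLoader {
public:
    // Queue a job; `job` would typically decode one image and store it.
    void defer(std::function<void()> job) {
        pending_.push_back(std::move(job));
    }

    // Call once per main loop iteration. `had_events` is true when the
    // event poll returned at least one event during this iteration.
    // Returns true if a job was run.
    bool on_frame(bool had_events) {
        if (had_events || pending_.empty()) return false;
        std::function<void()> job = std::move(pending_.front());
        pending_.pop_front();
        job();
        return true;
    }

    // Number of jobs still waiting to run.
    std::size_t remaining() const { return pending_.size(); }

private:
    std::deque<std::function<void()>> pending_;
};
```

Since the job still runs on the main thread in this sketch, one very large image would still delay that one frame; limiting each job to a single image only bounds the size of the hitch.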

AntTheAlchemist · April 22, 2021, 6:38pm

Give lodepng a try. It has an SDL example. Huge PNGs take a split second to load.

Daniel_Gibson · April 23, 2021, 1:41am

is it possible that the SDL’s PNG decompression algorithm is not optimal?

SDL by itself doesn’t even support PNG.
If you’re talking about SDL_image, it uses libpng, which in my experience is a bit faster than stb_image.h and a lot faster than lodepng (I haven’t measured recent versions in a while though, so no idea if lodepng has gotten significantly faster in recent years).

PNG is quite slow to decode either way, slower than JPEG (by about a factor of 5 last time I checked) and of course a lot slower than uncompressed formats like TGA or BMP.
(Note: Comparison to JPEG just for reference, I wouldn’t recommend using JPEG for textures because of its compression artifacts.)

Generally it’s a good idea to load needed resources when loading a level and not during gameplay - apart from the time it takes to decode image files, waiting for hard disks can cause additional delay.

Daniel_Gibson · April 23, 2021, 2:36am

Oh right, regarding “provided ZIP decompression is faster than PNG decompression”:

It’s not, ZIP and PNG use the same compression algorithm (“deflate” from zlib).
If you put compressed PNGs in ZIPs, you should store them uncompressed (zip compression level 0) - that way you can still put everything in zips and maybe compress your other game data with ZIP compression (if it’s in uncompressed formats).
You could use a custom ZIP-like pak-format with zstd or LZ4 compression and put BMP/TGA or uncompressed PNGs in them - both have significantly faster decompression speed than ZIP/zlib/deflate (LZ4 generally is super fast for decompressing; and zstd is 4-5 times as fast as zlib/deflate at least when using low compression levels that are comparable to zlib compression).

If you wanna stick to ZIP because creating your own archive format is too much work, you could take a look at zlib-ng which apparently decompresses about twice as fast as the original zlib.

Whatever archive format you use, it might be a good idea to use uncompressed PNGs (compression level 0) instead of BMP or TGA, because PNG applies (lossless) filters to the data to make it more compressible.
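To illustrate what such a filter does, here is a small sketch of one PNG filter type, “Sub” (filter type 1), applied to a single scanline. This only shows the principle and is not how libpng is implemented; the function names are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// PNG "Sub" filter for one scanline: each byte is replaced by its
// difference from the byte `bpp` positions earlier, modulo 256.
// `bpp` is the number of bytes per pixel (at least 1).
std::vector<std::uint8_t> sub_filter(const std::vector<std::uint8_t>& row,
                                     std::size_t bpp) {
    std::vector<std::uint8_t> out(row.size());
    for (std::size_t i = 0; i < row.size(); ++i) {
        const std::uint8_t left = (i >= bpp) ? row[i - bpp] : 0;
        out[i] = static_cast<std::uint8_t>(row[i] - left);
    }
    return out;
}

// Inverse of sub_filter(): rebuilds the original scanline exactly.
std::vector<std::uint8_t> sub_unfilter(const std::vector<std::uint8_t>& filtered,
                                       std::size_t bpp) {
    std::vector<std::uint8_t> out(filtered.size());
    for (std::size_t i = 0; i < filtered.size(); ++i) {
        const std::uint8_t left = (i >= bpp) ? out[i - bpp] : 0;
        out[i] = static_cast<std::uint8_t>(filtered[i] + left);
    }
    return out;
}
```

With one byte per pixel, a gradient row such as 10, 12, 14, 16, 18 becomes 10, 2, 2, 2, 2 after filtering. The repeated small values are much easier for a general-purpose compressor (zlib, zstd, LZ4) to shrink than the original ramp, and the filter is fully reversible.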

You can’t use basis universal with SDL_Renderer, so that won’t help.

As I wrote before (and you already mentioned yourself), whatever you end up using, you should load the images needed in a level when loading that level so you don’t get delays/stuttering/dropped frames during gameplay because the game needs to wait for textures to be read and uncompressed.

Devsman · April 30, 2021, 4:20pm

If you’re not already, one method that can help is to put all the images that you’re (usually) going to load together into a single file and draw only the relevant part of the loaded texture.

The way I do this is to define a Sprite type in my engine that contains a pointer to the texture (actually, it contains another type I call Spritesheet, which in turn contains a pointer to a pointer to the texture; this way I can make it inherit from an abstract class that represents all loadable resources, including audio, text, etc; but you don’t have to formalize all that if you don’t want to), along with a top coordinate, a left coordinate, a width and a height. Then I use those four fields to control the source SDL_Rect.

Doing it this way, there’s less access time on machines with HDDs, since it’s all one file.
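A rough sketch of that Sprite idea. Rect is a stand-in for SDL_Rect so the example compiles without SDL headers, and sheet_id plus the uniform-grid layout in frame_from_grid() are assumptions made for illustration, not part of the engine described above.

```cpp
#include <cassert>

// Stand-in for SDL_Rect (same four int fields).
struct Rect {
    int x, y, w, h;
};

// Hypothetical Sprite: which sheet it lives in plus the region to draw.
struct Sprite {
    int sheet_id;  // index of the loaded sheet texture (assumed scheme)
    int left, top, width, height;

    // The rectangle that would be used as the source rect when drawing.
    Rect source_rect() const { return Rect{left, top, width, height}; }
};

// Assumes the sheet is a uniform grid of equally sized frames,
// numbered left to right, then top to bottom, starting at 0.
Sprite frame_from_grid(int sheet_id, int frame, int columns,
                       int frame_w, int frame_h) {
    assert(columns > 0 && frame >= 0);
    return Sprite{sheet_id,
                  (frame % columns) * frame_w,
                  (frame / columns) * frame_h,
                  frame_w, frame_h};
}
```

With SDL, the four values from source_rect() would fill the SDL_Rect passed as the srcrect argument of SDL_RenderCopy(), so only that region of the sheet texture is drawn.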

grhayes · July 16, 2021, 12:03am

I built my game basically into a class with a state system.
I load my assets in the load state or when I need them.
I use an asset manager I built that keeps track of everything I’ve loaded and lets me reuse it as needed. It uses a hash table to determine whether something is already loaded and, if so, simply returns a pointer to the asset. You could use the file name as a key, for example.
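A minimal sketch of such a cache, keyed by file name. AssetCache is a hypothetical name, and the loader callback is an assumption: in an SDL game it would wrap whatever function actually decodes the file.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical asset cache: loads each file at most once and hands out
// shared pointers to the same object on every later request.
template <typename Asset>
class AssetCache {
public:
    using Loader = std::function<std::shared_ptr<Asset>(const std::string&)>;

    explicit AssetCache(Loader loader) : loader_(std::move(loader)) {}

    // Returns the cached asset, loading it on the first request.
    // A failed load (null pointer) is not cached, so it can be retried.
    std::shared_ptr<Asset> get(const std::string& filename) {
        auto it = cache_.find(filename);
        if (it != cache_.end()) return it->second;
        std::shared_ptr<Asset> asset = loader_(filename);
        if (asset) cache_.emplace(filename, asset);
        return asset;
    }

    // Number of assets currently held in the cache.
    std::size_t size() const { return cache_.size(); }

private:
    Loader loader_;
    std::unordered_map<std::string, std::shared_ptr<Asset>> cache_;
};
```

Because get() loads on first use, the same class also covers the “load it when it is first needed” approach suggested earlier in the thread; calling get() for every file during the load state turns it into up-front loading instead.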

You can also keep a vector for each asset holding the box positions where it is currently used on screen.

You don’t need to load them into one texture to get a huge performance boost.
Just draw all the textures of the same type you can in a row rather than switching back and forth from one texture to another.

gee · September 28, 2022, 3:25pm

After quite a few months, I’ve got an update on this topic. It turns out that using a fast-decompression format such as LZ4 was a very good idea (thanks a lot for that, @Daniel_Gibson). I’ve managed to substantially reduce the loading times using a custom image format based on LZ4.

I’ve made a blog post about this, which presents a detailed benchmark for loading times and memory usage with source code included (same license as SDL2).

ROSY · October 3, 2022, 9:39am

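Simple run-length compression procedures with BasicC:

/* Decodes the file `plik` into Pix. The file is a sequence of byte pairs:
   a value b followed by a count n of extra repeats, i.e. a run of n+1 bytes.
   Pix must be large enough to hold the decoded data. */
void READPAKFILE(char*plik,void*Pix){
  char*pix=(char*)Pix;
  int il,g=0;          /* g: next write position in pix */
  char b;
  Open(plik,rb)
  while(!Eof){
    b=Inp;             /* value of the run */
    il=Inp+g;          /* index of the last byte of the run */
    for(;g<=il;++g) pix[g]=b;
  }
  Close
}

/* Encodes the s bytes in Pix into the file `plik` as (value, count) pairs.
   A run is cut after 256 bytes so the count always fits in one byte. */
void WRITEPAKFILE(char*plik,void*Pix,size_t s){
  char*pix=(char*)Pix;
  char b;              /* value of the current run */
  int i,n=0;           /* n: extra repeats seen so far */
  Open(plik,wb)
  b=pix[0];
  for(i=1;i<s;++i){
    If(pix[i]==b AND n<255)
      n++;
    Else
      Out(b)
      Out(n)
      n=0;
    EndIf
    b=pix[i];
  }
  Out(b)               /* flush the last run */
  Out(n)
  Close
}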
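
For readers without the BasicC macros, the same run-length scheme can be sketched in standard C++. This version works on in-memory buffers instead of files, and the function names are made up for this example; the pair layout (value, then number of extra repeats) is the one used above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes `data` as (value, extra) byte pairs, where `extra` is the number
// of additional repeats of `value` (0..255), so one pair covers 1..256 bytes.
std::vector<std::uint8_t> rle_encode(const std::vector<std::uint8_t>& data) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < data.size()) {
        const std::uint8_t value = data[i];
        std::size_t run = 1;
        while (i + run < data.size() && data[i + run] == value && run < 256) {
            ++run;
        }
        out.push_back(value);
        out.push_back(static_cast<std::uint8_t>(run - 1));
        i += run;
    }
    return out;
}

// Decodes the pairs produced by rle_encode(). A trailing unpaired byte,
// if any, is ignored.
std::vector<std::uint8_t> rle_decode(const std::vector<std::uint8_t>& packed) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < packed.size(); i += 2) {
        const std::size_t count = static_cast<std::size_t>(packed[i + 1]) + 1;
        out.insert(out.end(), count, packed[i]);
    }
    return out;
}
```

For example, the bytes 7, 7, 7, 9 encode to 7, 2, 9, 0. Run-length coding does well on large flat areas but doubles the size of data where no two neighbouring bytes are equal.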

Daniel_Gibson · October 4, 2022, 11:39am

Nice, great work!