Memory usage varies depending on how "fast" I load images

Yes, I’m sorry if the title is confusing, but it’s the best way I can quickly explain.

Now for the whole explanation: I was monitoring the memory usage of my game to see where things needed to be improved, and I noticed a huge peak when loading data. I was expecting small peaks one after the other: for each image, I load an SDL_Surface, turn it into an SDL_Texture and free the surface, so for each image there is a short period during which the SDL_Surface uses a bit of memory. But I did not expect ONE big peak: it basically looks as if all the SDL_Surface objects were freed together at the very end.

Okay, so I figured I’d put a SDL_Delay(1000) between image loads, just to have a better chance of understanding what was happening. This is where the weird part happens… Not only did the peak not go that high, but after the full level was loaded, the memory usage (which remains constant) was surprisingly lower!

Basically, if I load all my images “normally” without waiting:

  • Peak when loading: 750 MB
  • Constant memory usage when playing: 450 MB

And if I put a 1s delay every time I load an image:

  • Peak when loading: 320 MB
  • Constant memory usage when playing: 150 MB

Now I’m very confident there is no memory leak (I’ve used the address sanitizer quite intensively and let the game randomly run for days without any memory leak), and I must also add here that I’m talking about resident memory (the virtual memory remains at the same level, 1.7 GB, in any case, which is reassuring in a way).
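For reference, the resident vs. virtual distinction can be checked from inside the process. This is a minimal sketch, assuming Linux and its /proc/self/statm file (whose first two fields are the total program size and the resident set size, both in pages); read_memory_mb is a hypothetical helper name, not something from my game.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: virtual and resident size of the current process,
   in MB. Linux only: /proc/self/statm starts with "size resident ...",
   both expressed in pages. Returns 0 on success, -1 on failure. */
int read_memory_mb (long* virtual_mb, long* resident_mb)
{
  FILE* file = fopen ("/proc/self/statm", "r");
  if (file == NULL)
    return -1;

  long size_pages = 0, resident_pages = 0;
  int nb_read = fscanf (file, "%ld %ld", &size_pages, &resident_pages);
  fclose (file);
  if (nb_read != 2)
    return -1;

  long page_size = sysconf (_SC_PAGE_SIZE);
  *virtual_mb = size_pages * page_size / (1024 * 1024);
  *resident_mb = resident_pages * page_size / (1024 * 1024);
  return 0;
}
```

Calling it once per loaded image and printing both values gives the two curves side by side.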

My best guess is that it’s the memory caching which behaves differently depending on how fast I load data. But honestly it’s still a bit of a mystery to me.

Another note: while I’m loading, I sometimes render a frame (to display a loading spinner). If I deactivate this, then I get the 150 MB memory usage, just as if I had added the 1s delay. So I also suspect that something happens with the renderer, but it does not really make any sense to me.

What I’d really like to understand is:

  • could it be something related to the way SDL handles memory, or is it related to a lower level part? (caching / operating system memory management for example)

  • is there a way to reduce memory usage without delaying the image load? (which I obviously don’t want to do as it makes loading a level take ages) I mean, it’s not a small gap: the delayed version uses 3 times less memory (150 MB instead of 450 MB), which could make a lot of difference on bigger levels played on small devices (mobile phones, etc.)

  • I still haven’t quite figured out why the memory “peaks” when loading, but that might be an unrelated problem. I’ll keep investigating
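One experiment that could help separate "SDL" from "lower level" is to check whether the C allocator is simply holding on to freed memory. The sketch below is an assumption-laden test, not an explanation of the numbers above: it assumes Linux with glibc (malloc_trim is glibc-specific), uses plain malloc blocks as stand-ins for the temporary surfaces, and resident_kb and trim_experiment are hypothetical names.

```c
#include <malloc.h>   /* malloc_trim: glibc-specific */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Resident set size in KB, read from /proc/self/statm (Linux only).
   Returns -1 on error. */
long resident_kb (void)
{
  FILE* file = fopen ("/proc/self/statm", "r");
  if (file == NULL)
    return -1;
  long size_pages = 0, resident_pages = 0;
  int nb_read = fscanf (file, "%ld %ld", &size_pages, &resident_pages);
  fclose (file);
  if (nb_read != 2)
    return -1;
  return resident_pages * (sysconf (_SC_PAGE_SIZE) / 1024);
}

/* Allocates, touches and frees `count` blocks of `size` bytes, then
   reports the resident size right after the frees and again after
   asking glibc to give unused heap pages back to the kernel. */
void trim_experiment (size_t count, size_t size,
                      long* after_free_kb, long* after_trim_kb)
{
  *after_free_kb = -1;
  *after_trim_kb = -1;

  void** blocks = malloc (count * sizeof (void*));
  if (blocks == NULL)
    return;

  size_t i;
  for (i = 0; i < count; i ++)
  {
    blocks[i] = malloc (size);
    if (blocks[i] != NULL)
      memset (blocks[i], 1, size); /* touch the pages so they become resident */
  }
  for (i = 0; i < count; i ++)
    free (blocks[i]);
  free (blocks);

  *after_free_kb = resident_kb ();
  malloc_trim (0);
  *after_trim_kb = resident_kb ();
}
```

In the real game, the equivalent test would be a single malloc_trim(0) call right after the level is loaded: if resident memory drops noticeably, the allocator was retaining freed pages; if it does not, the memory is held somewhere else (for example by the renderer or the graphics driver).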

Thanks a lot, and sorry if I’m not making myself clear.

Alright, so I managed to (sort of) reproduce this weird behavior with a self-contained SDL program. The full source code is given below. It reads the Linux-specific /proc/self/stat file to estimate memory usage, so it’s not compatible with other platforms, but that’s fine for the sake of the example.

What this code is doing:

  • first, fill an array of SDL_Texture* with rectangles, and display (or not) a “loading” screen
  • second, just display one green rectangle repeatedly for a certain time

While doing the first part (“loading”), you can choose to either:

  • just create the rectangles, do nothing else
  • display a (red) rectangle to indicate that loading is ongoing
  • display the rectangle AND wait for 1s between each rectangle creation

By tracking the resident memory usage, I can compare the three options and I get the following curves:

Weirdly, I find the exact opposite behavior of my original bug (= adding a delay makes the memory usage way higher after the loading period). It seems that not displaying anything while loading is better for memory usage afterwards.

Note that the curves do not “drop” exactly when loading is done, as loading stops at 200 iterations. So I’m really suspecting some cache operations going on there.
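The test program prints one line per iteration, either "Loading <iteration> <MB>" or "Running <iteration> <MB>". As a rough way to compare runs without plotting, here is a minimal sketch that condenses such a log into the highest value seen while loading and the last value seen while running. MemSummary and summarize_log are hypothetical names introduced for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one run: both fields stay at -1 if the
   corresponding phase never appears in the log. */
struct MemSummary
{
  long loading_peak_mb;  /* highest value on a "Loading" line */
  long running_last_mb;  /* value on the last "Running" line */
};

/* Reads "<phase> <iteration> <MB>" records until the first record
   that does not match this shape (or end of file). */
struct MemSummary summarize_log (FILE* log)
{
  struct MemSummary summary = { -1, -1 };
  char phase[16];
  int iteration;
  long mb;

  while (fscanf (log, "%15s %d %ld", phase, &iteration, &mb) == 3)
  {
    if (strcmp (phase, "Loading") == 0)
    {
      if (mb > summary.loading_peak_mb)
        summary.loading_peak_mb = mb;
    }
    else if (strcmp (phase, "Running") == 0)
      summary.running_last_mb = mb;
  }
  return summary;
}
```

Called as summarize_log(stdin) from a small main, it can be fed the output of the test program through a pipe, once per strategy.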

The full source code is below, you can compile it with:

$ gcc sdl_mem.c `pkg-config --cflags --libs glib-2.0 sdl2`

Run it with argument 0 (or no argument) to do nothing, with 1 to add the red "loading screen" and with 2 to add the delay.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <glib.h>

#include <SDL.h>

enum Strategy { NOTHING = 0, LOADING_IMAGE = 1, DELAY = 2 };
void init(SDL_Window** window, SDL_Renderer** renderer, GArray** images);
void finish(SDL_Window* window, SDL_Renderer* renderer, GArray* images);
void append_rectangle (SDL_Renderer* renderer, GArray* images, Uint8 r, Uint8 g, Uint8 b);
void render_start (SDL_Renderer* renderer);
void render_end (SDL_Renderer* renderer);
void display (SDL_Renderer* renderer, SDL_Texture* texture);
int exit_requested();
size_t mem_usage();

int main (int argc, char** argv)
{
  enum Strategy strategy = NOTHING;
  int nb_images = 200;

  if (argc > 1)
    strategy = atoi(argv[1]);
  if (argc > 2)
    nb_images = atoi(argv[2]);

  SDL_Window* window = NULL;
  SDL_Renderer* renderer = NULL;
  GArray* images = NULL;
  init (&window, &renderer, &images);

  /* Texture 0: red "loading" rectangle, texture 1: green "running" rectangle */
  append_rectangle (renderer, images, 255, 0, 0);
  append_rectangle (renderer, images, 0, 255, 0);

  Uint32 latest = SDL_GetTicks();

  /* First part: "loading" */
  int iteration = 0;
  while (images->len < (guint) nb_images)
  {
    if (exit_requested())
      break;
    printf("Loading %i %zu\n", iteration, mem_usage());
    iteration ++;
    append_rectangle (renderer, images, 0, 0, 255);

    if (strategy != NOTHING)
    {
      if (strategy == DELAY)
        SDL_Delay(1000);
      Uint32 ticks = SDL_GetTicks();
      if (ticks - latest > 15)
      {
        render_start (renderer);
        display(renderer, g_array_index (images, SDL_Texture*, 0));
        render_end (renderer);
        latest = ticks;
      }
    }
  }

  /* Second part: display the green rectangle at roughly 30ms per frame */
  while (iteration < 5 * nb_images)
  {
    iteration ++;
    printf("Running %i %zu\n", iteration, mem_usage());

    if (exit_requested())
      break;
    Uint32 ticks = SDL_GetTicks();
    if (ticks - latest < 30)
      SDL_Delay(30 - (ticks - latest));
    render_start (renderer);
    display(renderer, g_array_index (images, SDL_Texture*, 1));
    render_end (renderer);
    latest = ticks;
  }

  finish (window, renderer, images);
  return EXIT_SUCCESS;
}

/* Functions */

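void init (SDL_Window** window, SDL_Renderer** renderer, GArray** images)
{
  SDL_Init (SDL_INIT_VIDEO);
  /* The window title and position below are placeholders */
  *window = SDL_CreateWindow ("sdl_mem", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
                             800, 600, SDL_WINDOW_RESIZABLE);
  *renderer = SDL_CreateRenderer (*window, -1, 0);
  *images = g_array_new (FALSE, FALSE, sizeof(SDL_Texture*));
}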

void finish (SDL_Window* window, SDL_Renderer* renderer, GArray* images)
{
  guint i = 0;
  for (; i < images->len; i ++)
    SDL_DestroyTexture(g_array_index (images, SDL_Texture*, i));
  g_array_free(images, TRUE);
  SDL_DestroyRenderer (renderer);
  SDL_DestroyWindow (window);
  SDL_Quit ();
}

void render_start (SDL_Renderer* renderer)
{
  SDL_SetRenderDrawColor (renderer, 0, 0, 0, 255);
  SDL_RenderClear (renderer);
}

void display (SDL_Renderer* renderer, SDL_Texture* texture)
{
  /* Copy the whole 600x400 texture to the top-left corner of the window */
  SDL_Rect source;
  source.x = 0;
  source.y = 0;
  source.w = 600;
  source.h = 400;

  SDL_Rect target;
  target.x = 0;
  target.y = 0;
  target.w = 600;
  target.h = 400;

  SDL_RenderCopy(renderer, texture, &source, &target);
}

void render_end (SDL_Renderer* renderer)
{
  SDL_RenderPresent (renderer);
}

int exit_requested()
{
  SDL_Event ev;
  while (SDL_PollEvent(&ev))
    if (ev.type == SDL_QUIT || (ev.type == SDL_KEYDOWN && ev.key.keysym.sym == SDLK_ESCAPE))
      return TRUE;
  return FALSE;
}

void append_rectangle (SDL_Renderer* renderer, GArray* images, Uint8 r, Uint8 g, Uint8 b)
{
  /* Channel masks depend on the byte order of the machine */
  Uint32 rmask, gmask, bmask, amask;
#if SDL_BYTEORDER == SDL_BIG_ENDIAN
  rmask = 0xff000000;
  gmask = 0x00ff0000;
  bmask = 0x0000ff00;
  amask = 0x000000ff;
#else
  rmask = 0x000000ff;
  gmask = 0x0000ff00;
  bmask = 0x00ff0000;
  amask = 0xff000000;
#endif

  /* Temporary surface -> texture, then the surface is freed right away */
  SDL_Surface* surface = SDL_CreateRGBSurface (0, 600, 400, 32, rmask, gmask, bmask, amask);
  SDL_FillRect(surface, NULL, SDL_MapRGBA(surface->format, r, g, b, 255));
  SDL_Texture* texture = SDL_CreateTextureFromSurface (renderer, surface);
  g_array_append_val (images, texture);
  SDL_FreeSurface (surface);
}

size_t mem_usage()
{
  /* Returns the resident memory in MB: field 24 of /proc/self/stat is the
     resident set size in pages, the 23 fields before it are skipped */
  FILE* file = fopen("/proc/self/stat", "r");
  if (file == NULL)
    return 0;
  size_t charlen = 256;
  char pid[charlen], comm[charlen], state[charlen], ppid[charlen], pgrp[charlen], session[charlen], tty_nr[charlen],
      tpgid[charlen], flags[charlen], minflt[charlen], cminflt[charlen], majflt[charlen], cmajflt[charlen],
      utime[charlen], stime[charlen], cutime[charlen], cstime[charlen], priority[charlen], nice[charlen],
      num_threads[charlen], itrealvalue[charlen], starttime[charlen], vsize[charlen];
  long rss = 0;
  if (fscanf(file, "%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %li",
             pid, comm, state, ppid, pgrp, session, tty_nr, tpgid,
             flags, minflt, cminflt, majflt, cmajflt,
             utime, stime, cutime, cstime, priority, nice, num_threads,
             itrealvalue, starttime, vsize, &rss) != 24)
    rss = 0;
  fclose(file);
  return rss * sysconf(_SC_PAGE_SIZE) / (1024 * 1024);
}