Texture performance for animation

lzrdkng · June 12, 2018, 1:23am

I’ve developed a module that extracts FLIC files and converts them to a format usable by SDL. There are two implementations of the converter.

First implementation

typedef struct sdl_flic_animation {
	Uint16 w;
	Uint16 h;
	Uint32 delay;
	Uint16 depth;
	Uint16 count;
	SDL_Texture *frames[];
} SDL_FlicAnimation;

SDL_FlicAnimation*
SDL_ConvertAnimation(SDL_Renderer *render,
		     FLIC_Animation *animation)
{
	SDL_FlicAnimation *retval = NULL;
	SDL_Surface *surface = NULL;

	retval = malloc(sizeof(SDL_FlicAnimation) + 
                        sizeof(SDL_Texture*) * animation->framesno);

        // Copy meta data
        memcpy(retval, animation, sizeof(SDL_FlicAnimation));

	surface = SDL_CreateRGBSurfaceWithFormat(0,
						retval->w,
		                                retval->h,
						retval->depth,
			                        SDL_PIXELFORMAT_INDEX8);


	SDL_SetPaletteColors(surface->format->palette,
		             animation->colors,
		             0,
			     animation->colorno);

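	int size = retval->w * retval->h;

	/*
	 * Remember the buffer SDL allocated for the surface so it can be
	 * restored (and freed by SDL) once the frames are converted.
	 */
	void *surface_pixels = surface->pixels;

	for (size_t i = 0; i < retval->count; ++i) {
		// Point the surface at frame 'i'; this assumes tightly packed
		// frames, i.e. surface->pitch == retval->w.
		surface->pixels = animation->pixels + (i * size);

		retval->frames[i] = SDL_CreateTextureFromSurface(render,
                                                                 surface);
	}

	/*
	 * DO NOT FREE THE ANIMATION PIXELS: restore the original buffer so
	 * SDL_FreeSurface() releases only what SDL allocated.
	 */
	surface->pixels = surface_pixels;
	SDL_FreeSurface(surface);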

	return retval;
}

The second implementation:

It’s almost identical, but instead of having an array of SDL_Texture, the whole animation is in one texture:

typedef struct sdl_flic_animation {
	Uint16 w;
	Uint16 h;
	Uint32 delay;
	Uint16 depth;
	Uint16 count;
	SDL_Texture *frames;
} SDL_FlicAnimation;

SDL_FlicAnimation*
SDL_ConvertAnimation(SDL_Renderer *render,
		     FLIC_Animation *animation)
{
	SDL_FlicAnimation *retval = NULL;
	SDL_Surface *surface = NULL;

	retval = malloc(sizeof(SDL_FlicAnimation));

        // Copy meta data
        memcpy(retval, animation, sizeof(SDL_FlicAnimation) -
                                  sizeof(SDL_Texture*));

	surface = SDL_CreateRGBSurfaceWithFormatFrom(animation->pixels,
						retval->w,
		                                retval->h * retval->count,
						retval->depth,
                                                retval->w * sizeof(BYTE),
			                        SDL_PIXELFORMAT_INDEX8);


	SDL_SetPaletteColors(surface->format->palette,
		             animation->colors,
		             0,
			     animation->colorno);


	retval->frames = SDL_CreateTextureFromSurface(render, surface);

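	/*
	 * DO NOT FREE THE PIXELS: a surface created with
	 * SDL_CreateRGBSurfaceWithFormatFrom() is already flagged
	 * SDL_PREALLOC, so SDL_FreeSurface() leaves animation->pixels alone.
	 */
	SDL_FreeSurface(surface);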

	return retval;
}

Note: FLIC_Animation is an internal structure of the extractor.

Both implementations work, but the first one is a little bit faster, and I have no idea why. My question is: which implementation should I keep for performance? Will SDL be able to render the animation better with an array of SDL_Texture, or with a single SDL_Texture and an SDL_Rect to move the source? I would also be interested in why the first implementation is faster.
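For the single-texture variant, the "SDL_Rect to move the source" boils down to a small offset calculation. The following is only a sketch under the assumption that frames are stacked vertically, as the second converter's `h * count` surface height suggests. `FrameRect` and `frame_src_rect` are hypothetical names; `FrameRect` stands in for SDL_Rect so the snippet compiles without SDL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SDL_Rect so the sketch compiles without SDL. */
typedef struct frame_rect {
	int x, y, w, h;
} FrameRect;

/*
 * Source rectangle of frame `index` in a texture where `count` frames of
 * size w x h are stacked vertically (assumed layout, see lead-in).
 */
static FrameRect
frame_src_rect(uint16_t w, uint16_t h, uint16_t count, uint16_t index)
{
	assert(index < count);

	FrameRect r = { 0, (int)index * (int)h, (int)w, (int)h };
	return r;
}
```

With SDL, the same values would go into an SDL_Rect passed as the `srcrect` argument of SDL_RenderCopy. One thing worth checking with this layout: 51 frames of 240 rows give a texture 12240 pixels tall, which may exceed the `max_texture_height` that SDL_GetRendererInfo reports on some hardware.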

Smiles · June 12, 2018, 7:25am

I wouldn’t worry about the time it takes to create an SDL_Texture, as this is a one-off or doesn’t happen very often.

It should be faster to use one SDL_Texture and draw everything that uses that texture first. Swapping textures will result in more draw calls; you may not notice this through SDL, but it should be happening under the hood.

I would suggest a simple test:
Using one texture, draw something 10000+ times.
Alternating between 2 textures, draw something 10000+ times.
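To make the difference between the two test patterns concrete, here is a rough sketch that only counts how often the current texture changes across a sequence of draws. It models the bookkeeping, not the cost: whether a switch is measurable depends on the render backend and driver, so the real answer still comes from timing the actual draws. `count_texture_switches` and the integer texture ids are hypothetical, not part of SDL.

```c
#include <stddef.h>

/*
 * Count how often the "current texture" changes over a sequence of draws.
 * texture_ids[i] is a hypothetical id of the texture used by draw i.
 */
static size_t
count_texture_switches(const int *texture_ids, size_t n)
{
	size_t switches = 0;

	for (size_t i = 1; i < n; ++i) {
		if (texture_ids[i] != texture_ids[i - 1])
			++switches;
	}

	return switches;
}
```

For the first pattern (always the same texture) the count is zero no matter how many draws are issued; for the second (strictly alternating between two textures) it is one less than the number of draws.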

lzrdkng · June 12, 2018, 7:58pm

Using a high-resolution clock, clock_gettime with the clock ID set to CLOCK_PROCESS_CPUTIME_ID, here are the results for both implementations.

The file used is a FLC format with a 320 x 240 resolution and 51 frames. The window was set to a resolution of 640 x 320 and the renderer was set with SDL_RENDERER_ACCELERATED.

Both clocks were started after the data was extracted from the file and just before the converter ran, in separate processes. Each iteration consists of copying the current frame to the renderer and flushing it, then advancing to the next frame.
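A measurement of this kind can be sketched roughly as follows. This is not the code used for the numbers in this post; `time_iterations`, `timespec_diff_seconds` and `noop_step` are hypothetical helpers, and the step callback stands in for "copy the frame, present, advance".

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#endif
#include <time.h>

/* Difference end - start in seconds. */
static double
timespec_diff_seconds(struct timespec start, struct timespec end)
{
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Placeholder workload; a real test would render one frame here. */
static void
noop_step(void *ctx)
{
	(void)ctx;
}

/* CPU time consumed by this process while calling step() repeatedly. */
static double
time_iterations(void (*step)(void *ctx), void *ctx, unsigned long iterations)
{
	struct timespec start, end;

	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
	for (unsigned long i = 0; i < iterations; ++i)
		step(ctx);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

	return timespec_diff_seconds(start, end);
}
```

Note that CLOCK_PROCESS_CPUTIME_ID counts CPU time consumed by the process, so time spent blocked (for example waiting on vsync) or spent by the GPU is not included. Using CLOCK_MONOTONIC instead would measure elapsed wall-clock time.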

Implementation 1

Finished in 9.318597 seconds with 131072 iterations.

Implementation 2

Finished in 9.155299 seconds with 131072 iterations.

Overall, the second implementation is approximately 1.77 % faster, even though the converter is slower. I guess that SDL uses cache memory more effectively.
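To put the two totals in perspective, the arithmetic can be sketched as below. The helper names are hypothetical. With the figures above this works out to roughly 71.1 µs per iteration for implementation 1 and roughly 69.8 µs for implementation 2, a gap of a little over 1 µs per iteration, or between 1.7 % and 1.8 % depending on which total is taken as the baseline.

```c
/* Average time per iteration, in microseconds. */
static double
per_iteration_us(double total_seconds, unsigned long iterations)
{
	return total_seconds / (double)iterations * 1e6;
}

/* Time saved by the faster run, as a percentage of the slower run. */
static double
percent_faster(double slow_seconds, double fast_seconds)
{
	return (slow_seconds - fast_seconds) / slow_seconds * 100.0;
}
```

A difference this small from a single run of each implementation could also be within run-to-run noise, so repeating the measurement a few times would make the comparison more convincing.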

Blerg · June 12, 2018, 11:06pm

While I’m not even a beginner with graphics, many of the pages I’ve read about graphics performance state that a single large texture will be faster than many small textures, as fewer draw calls are made. That mainly applies to a single mesh, or to a case like your video, though.

Quote: col000r
Each material on each mesh causes one draw-call per frame, per light that affects it and per camera that sees it.

If I have 1 mesh with 20 separate textures, only ambient light, one camera, this will also result in 20 draw-calls. If I would combine the 20 textures into one atlas the scene would now only need a single draw-call.

If I have 20 meshes with one texture each, only ambient light, one camera, this will result in 20 draw-calls.
If I would combine all the textures into one big atlas, the scene would still require 20 draw-calls, even though all the objects would now use the same texture.

Since each draw-call takes a bit of time, this is probably why your large texture works faster under the hood. A Google search will turn up more information.

lzrdkng · June 13, 2018, 12:25am

I see, I’ll keep that in mind while developing.