SDL2 performance on Raspberry Pi

Hello,

I installed the SDL2 and SDL_image packages and wrote the test below, which just displays a PNG and takes some timings.
The results are really disappointing.

Running in X11.

First, the renderer info is:
renderer name : opengl
flags = SDL_RENDERER_ACCELERATED SDL_RENDERER_TARGETTEXTURE
num texture formats : 5
texture size max : 8192 x 8192
current driver : x11

RenderCopy 200x200 : 2 336 201 microseconds
RenderPresent : 92 759 microseconds
RenderCopy 800x600 : 1 254 microseconds
RenderPresent : 423 180 microseconds

I hope there is something wrong in my test, because with these results it’s unusable.

My code is:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/time.h>

#include <SDL2/SDL.h>
#include <SDL2/SDL_image.h>

SDL_Window* window = NULL;
SDL_Renderer *renderer;
SDL_Texture* texture;

SDL_Texture *WID_loadTexture( SDL_Renderer *renderer, char *name);

uint64_t getTimeStamp() {
	struct timeval tv;
	gettimeofday(&tv,NULL);
	/* cast before multiplying so the result doesn't overflow 32-bit time_t */
	return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

int main(int argc, char** argv)
{
	if (SDL_Init(SDL_INIT_VIDEO) != 0 )
	{
		fprintf(stderr,"Échec de l'initialisation de la SDL (%s)\n",SDL_GetError());
		return -1;
	}

	window = SDL_CreateWindow("test SDL2",SDL_WINDOWPOS_UNDEFINED,
		SDL_WINDOWPOS_UNDEFINED,
		800,
		600,
		SDL_WINDOW_SHOWN);

	if( window )
	{

		renderer = SDL_CreateRenderer(window,-1,SDL_RENDERER_ACCELERATED); 

		if ( renderer )
		{
			SDL_RendererInfo info;
			SDL_GetRendererInfo(renderer, &info);
			fprintf(stderr, "name : %s\n", info.name);
			fprintf(stderr, "flags : ");
			if (info.flags & SDL_RENDERER_SOFTWARE) fprintf(stderr, "SDL_RENDERER_SOFTWARE");
			if (info.flags & SDL_RENDERER_ACCELERATED) fprintf(stderr, "SDL_RENDERER_ACCELERATED ");
			if (info.flags & SDL_RENDERER_PRESENTVSYNC) fprintf(stderr, "SDL_RENDERER_PRESENTVSYNC ");
			if (info.flags & SDL_RENDERER_TARGETTEXTURE) fprintf(stderr, "SDL_RENDERER_TARGETTEXTURE ");
			fprintf(stderr, "\nnb textures max : %d\n", info.num_texture_formats);
			fprintf(stderr, "texture size max : %d x %d \n", info.max_texture_width, info.max_texture_height);
			fprintf(stderr, "current driver : %s\n", SDL_GetCurrentVideoDriver());

			if (SDL_SetRenderTarget(renderer, NULL) != 0) {
				fprintf(stderr, "SDL_SetRenderTarget error %s\n", SDL_GetError());
			}
			
			SDL_SetRenderDrawBlendMode(renderer, SDL_BLENDMODE_BLEND);
			SDL_SetRenderDrawColor(renderer,0,0,0,255);
			SDL_RenderClear(renderer);

			texture = WID_loadTexture(renderer, "800_600.png");

			SDL_Rect clip = {0,0,200,200};

			uint64_t t1, t2;

			t1 = getTimeStamp();
			SDL_RenderCopy(renderer, texture, NULL, &clip);
			t2 = getTimeStamp();
			printf("RenderCopy 200x100 : %ld\n", (t2-t1));

			t1 = getTimeStamp();
			SDL_RenderPresent(renderer);
			t2 = getTimeStamp();
			printf("RenderPresent : %ld\n", (t2 - t1));

			t1 = getTimeStamp();
			SDL_RenderCopy(renderer, texture, NULL, NULL);
			t2 = getTimeStamp();
			printf("RenderCopy 800x600 : %ld\n", (t2-t1));

			t1 = getTimeStamp();
			SDL_RenderPresent(renderer);
			t2 = getTimeStamp();
			printf("RenderPresent : %ld\n", (t2 - t1));

			bool loop = true;
			while ( loop )
			{
				SDL_Event event;
				SDL_WaitEvent( &event );
				if ( event.type == SDL_QUIT )
				{
					fprintf(stderr, "received SDL_QUIT\n");
					loop = false;
				}
			}

			if (texture != NULL) SDL_DestroyTexture(texture);

			SDL_DestroyRenderer(renderer);
		}
		else
		{
			fprintf(stderr,"Fail to create renderer (%s)\n",SDL_GetError());
		}
		SDL_DestroyWindow(window);
	}
	else
	{
		fprintf(stderr,"Fail to create window: %s\n",SDL_GetError());
	}

	SDL_Quit();

	return 0;
}


SDL_Texture *WID_loadTexture( SDL_Renderer *renderer, char *name)
{
	SDL_Texture *texture = NULL;

	if(name == NULL || renderer == NULL)
		return NULL;

	SDL_Surface* surface = IMG_Load( name );

	if (surface == NULL)
	{
		fprintf(stderr, "error loading : %s", name);
	}
	else
	{

		texture = SDL_CreateTextureFromSurface( renderer, surface );

		SDL_FreeSurface( surface );
	}

	return texture;
}

You probably got the Mesa software renderer. As stated over here, the render drivers (surprisingly) don’t ask the backend to only select accelerated contexts.

To get hardware acceleration on the Raspberry Pi you have two options:

  1. Use the old firmware-side driver usually located in /opt/vc/lib.
    This is an older driver by Broadcom that kind of works, but doesn’t integrate well with X11; SDL’s rpi video backend targets it. Because there’s no real integration with X11, you just get one fullscreen window. This works with or without X11.

  2. Use the new vc4 driver by Eric Anholt.
    Broadcom hired him to write an open source driver that uses open APIs (specifically: It’s a gallium driver). If you activate this driver (which deactivates the older driver), X11 itself and the x11 driver of SDL will get hardware acceleration. This only works from X11, of course. With SDL 2.0.5, there’s no support to run SDL applications from the console in this configuration. However, SDL 2.0.6 has a new KMSDRM driver which will make it possible.
    To activate this driver use the raspi-config program of the Raspbian distribution. Under “Advanced Options” -> “GL Driver” you can select the “GL (Full KMS)” option. It’s still somewhat experimental and the behavior might be slightly different compared to the older driver.
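
In case it’s useful, here’s a quick way to check which video backends your SDL2 build actually contains before going further. This is only a minimal sketch (not from the original post); if your build doesn’t contain the rpi backend it simply won’t appear in the list. You can also request a specific backend by setting the SDL_VIDEODRIVER environment variable (e.g. SDL_VIDEODRIVER=rpi) before starting your program.

#include <SDL2/SDL.h>
#include <stdio.h>

int main(void)
{
	/* List every video backend compiled into this SDL2 library, then the one
	   that would actually be used. On a Raspberry Pi the interesting ones are
	   the x11 backend, the rpi (firmware) backend and, from SDL 2.0.6, the
	   KMSDRM backend. */
	if (SDL_Init(SDL_INIT_VIDEO) != 0) {
		fprintf(stderr, "SDL_Init failed: %s\n", SDL_GetError());
		return 1;
	}

	int i, n = SDL_GetNumVideoDrivers();
	for (i = 0; i < n; i++)
		printf("available driver %d: %s\n", i, SDL_GetVideoDriver(i));

	printf("current driver: %s\n", SDL_GetCurrentVideoDriver());

	SDL_Quit();
	return 0;
}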

@ChliHug

Many thanks for this information. I’ll give it a try ASAP.

Al

Indeed, but I don’t understand why Stretch is so much slower than Jessie when using it (with the VC4 GL driver enabled there’s no noticeable difference). On Jessie ‘glxgears -info’ reports around 180 FPS whereas on Stretch it reports about 45 FPS, a quarter of the speed!

Another oddity is that on Jessie the gears are blue, red and green but on Stretch they’re white, yellow and cyan! I suppose this could be a deliberate change but it seems suspicious. FWIW Jessie is reporting 'Mesa 13.0.0' and Stretch 'Mesa 13.0.6'.

I’ll ask about this at the Raspberry Pi forum to see if anybody has an explanation.

Richard.

Oh, wow. Something’s wrong there. It certainly doesn’t look like that on Debian 9 (x86-64). Where could this have gone wrong? Throwing the word “miscompilation” around is probably not right, but I can’t believe Debian stable has something like that in it. Then again, it is the ARM arch, and other OpenGL stuff seems to display correctly.

I’ve started from scratch by booting from a virgin Stretch image and running glxgears -info without any settings or configuration changes whatsoever (not even setting up the networking or timezone). The result is the same: wrong colours and only 45 frames per second!

Either my RPi has a very strange hardware fault, which affects Stretch but not Jessie, or there’s something seriously wrong with Raspbian Stretch.

Richard.

Probably not hardware, just software. I get it too. My guess is a bug or miscompilation in the software renderer, or somewhere between the application and the software renderer. (glxgears uses the lighting feature of OpenGL, so maybe there.) Ugh, recompiling Mesa takes forever, though. :weary:

Ah, I thought it was just me; that’s a relief. Nobody at the Raspberry Pi forum has as yet acknowledged that it’s a problem with Stretch. I’m assuming that the incorrect colours and the low frame rate originate from the same cause, but we don’t know that yet.

My own (SDL) programs that use OpenGL lighting are rendering correctly however, albeit ridiculously slowly.

Richard.

It looks like there’s a problem with LLVM or maybe how Mesa uses it. You can work around this issue by using the softpipe gallium driver by setting the environment variable GALLIUM_DRIVER=softpipe. This is of course slower than using JIT code generation, but at least it’s correct. If softpipe is instructed to use LLVM, it also shows issues. The two software renderers might share some code there.
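
If it helps, the same thing can be done from a tiny C wrapper instead of typing the variable every time. This is just a sketch (the wrapper and its usage are illustrative, not an official tool), relying on Mesa reading GALLIUM_DRIVER when it sets up the GL driver, so the variable only has to be in the environment before the program starts.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Run the given program with Mesa's softpipe gallium driver forced, e.g.
   "./softpipe-run glxgears -info". Equivalent to prefixing the command with
   GALLIUM_DRIVER=softpipe in the shell. */
int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
		return 1;
	}
	setenv("GALLIUM_DRIVER", "softpipe", 1);
	execvp(argv[1], &argv[1]);
	perror("execvp");   /* only reached if the program could not be started */
	return 1;
}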

Hmm, well…
My first conclusion is that I should run my tests on Raspbian Jessie to have a chance of running my app on a working configuration.

Thanks all for your help. I’ll stay tuned on this thread and report my results, so you can help me validate them, at least in case of major problems.
Al.

Interesting. But it’s the speed that matters to me (the colours are correct in all my own programs), and setting that variable reduces it even more, from 45 fps to 27 fps! I want to get back to something more like the 180 fps that glxgears runs at in Raspbian Jessie, with the Mesa driver.

Edit: I see that llvmpipe is documented as requiring “An x86 or amd64 processor … Support for SSE2 is strongly encouraged”. Do you think it’s possible that it’s been built for Stretch without Neon support, by mistake?

Richard.

So I ran my test on Raspbian Jessie, with SDL2 installed via apt-get, and the results are not much better:

RenderCopy with src rect 200x200, dst rect 200x200 (from an 800x600 jpeg) : 2 424 008 microseconds
RenderPresent : 88 866 microseconds
RenderCopy 800x600 : 839 microseconds
RenderPresent : 137 983 microseconds

So a bit faster than on Stretch for the second RenderCopy, but not for the first one.

So I tried to enable Full KMS mode.
When the application starts, I get the following errors:
libGL error: MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information

Then results are here much better:
RenderCopy 200x200 : 21 594 microseconds
RenderPresent : 597 microseconds
RenderCopy 800x600 : 713 microseconds
RenderPresent : 455 microseconds

Anyway, I’m still surprised by the RenderCopy 200x200 time, which will cause problems in my app: every time I move textures around I’ll have to make clipped RenderCopy calls, which seem to be really slow.

Do you have any idea about that?

Al

That’s normal (by which I mean that I get the same warnings and they don’t seem to indicate anything serious).

I can’t help with that, I’m afraid. The VC4 GL driver isn’t ‘real’ OpenGL: some things are hardware accelerated and others are software emulated. If it’s not fast enough you may have to abandon X11, but that would mean rebuilding SDL2, which is where we came in!

Richard.

@rtrussell
OK, I’m going to continue my tests to see whether the application can work properly.

thanks a lot.
Al

I’m sorry, I didn’t look closely enough at your code example.

You only take one sample and that’s clearly not enough to form a reasonable conclusion about the renderer performance. When benchmarking, it’s very important to know what you are testing and what exactly uses up the time. It gets even more complicated if things get cached and queued, as is the case with many graphics drivers. All the draw operations might take a few microseconds and the graphics driver does the real work asynchronously. In the first draw operation, the driver may also set up the pipeline which takes some time. This only happens once and if you only time that, it will look really slow.

Here’s an extended version of your code. I’d still say that it doesn’t produce an accurate picture of renderer performance, but the time of the RenderCopy calls should now be in the expected range.

renderperf.c
#include <SDL.h>

static SDL_Window* window = NULL;
static SDL_Renderer *renderer;
static SDL_Texture* texture;

static SDL_Texture *WID_loadTexture( SDL_Renderer *renderer, char *name)
{
	SDL_Texture *texture = NULL;

	if(name == NULL || renderer == NULL)
		return NULL;

	SDL_Surface* surface = SDL_LoadBMP(name);

	if (surface == NULL)
	{
		SDL_Log("error loading : %s", name);
	}
	else
	{
		texture = SDL_CreateTextureFromSurface( renderer, surface );
		SDL_FreeSurface( surface );
	}

	return texture;
}

static Uint64 getTimeStamp()
{
	return SDL_GetPerformanceCounter();
}

static double getTimeStampDiff(Uint64 d1, Uint64 d2)
{
	return (double)(d2 - d1) / (double)SDL_GetPerformanceFrequency();
}

int main(int argc, char** argv)
{
	if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER | SDL_INIT_EVENTS) != 0 )
	{
		SDL_Log("Failed to initialize SDL (%s)",SDL_GetError());
		return -1;
	}

	window = SDL_CreateWindow("test SDL2",SDL_WINDOWPOS_UNDEFINED,
		SDL_WINDOWPOS_UNDEFINED,
		800,
		600,
		SDL_WINDOW_SHOWN);

	if( window )
	{

		renderer = SDL_CreateRenderer(window,-1,SDL_RENDERER_ACCELERATED);

		if ( renderer )
		{
			SDL_RendererInfo info;
			SDL_GetRendererInfo(renderer, &info);
			SDL_Log("name : %s", info.name);
			SDL_Log("flags :%s%s%s%s",
				(info.flags & SDL_RENDERER_SOFTWARE) ? " SDL_RENDERER_SOFTWARE" : "",
				(info.flags & SDL_RENDERER_ACCELERATED) ? " SDL_RENDERER_ACCELERATED" : "",
				(info.flags & SDL_RENDERER_PRESENTVSYNC) ? " SDL_RENDERER_PRESENTVSYNC" :"",
				(info.flags & SDL_RENDERER_TARGETTEXTURE) ? " SDL_RENDERER_TARGETTEXTURE" : "");
			SDL_Log("num textures formats : %d", info.num_texture_formats);
			SDL_Log("texture size max : %d x %d ", info.max_texture_width, info.max_texture_height);
			SDL_Log("current driver : %s", SDL_GetCurrentVideoDriver());

			if (SDL_SetRenderTarget(renderer, NULL) != 0) {
				SDL_Log("SDL_SetRenderTarget error %s", SDL_GetError());
			}

			SDL_SetRenderDrawBlendMode(renderer, SDL_BLENDMODE_BLEND);
			SDL_SetRenderDrawColor(renderer,0,0,0,255);
			SDL_RenderClear(renderer);

			texture = WID_loadTexture(renderer, "800_600.bmp");

			SDL_Rect clip = {0,0,200,200};

			#define NUM_SAMPLES 20
			Uint64 time_cl1[NUM_SAMPLES], time_rc1[NUM_SAMPLES], time_rp1[NUM_SAMPLES];
			Uint64 time_cl2[NUM_SAMPLES], time_rc2[NUM_SAMPLES], time_rp2[NUM_SAMPLES];
			int i;

			for (i = 0; i < NUM_SAMPLES; i += 2) {
				time_cl1[i] = getTimeStamp();
				SDL_RenderClear(renderer);
				time_cl1[i + 1] = getTimeStamp();

				time_rc1[i] = getTimeStamp();
				SDL_RenderCopy(renderer, texture, NULL, &clip);
				time_rc1[i + 1] = getTimeStamp();

				time_rp1[i] = getTimeStamp();
				SDL_RenderPresent(renderer);
				time_rp1[i + 1] = getTimeStamp();
			}

			SDL_Log("  RenderClear    RenderCopy 200x200     RenderPresent   (Seconds)");
			for (i = 0; i < NUM_SAMPLES; i += 2) {
				double cl = getTimeStampDiff(time_cl1[i], time_cl1[i + 1]);
				double rc = getTimeStampDiff(time_rc1[i], time_rc1[i + 1]);
				double rp = getTimeStampDiff(time_rp1[i], time_rp1[i + 1]);
				SDL_Log("    %.6f           %.6f                %.6f", cl, rc, rp);
			}

			for (i = 0; i < NUM_SAMPLES; i += 2) {
				time_cl2[i] = getTimeStamp();
				SDL_RenderClear(renderer);
				time_cl2[i + 1] = getTimeStamp();

				time_rc2[i] = getTimeStamp();
				SDL_RenderCopy(renderer, texture, NULL, NULL);
				time_rc2[i + 1] = getTimeStamp();

				time_rp2[i] = getTimeStamp();
				SDL_RenderPresent(renderer);
				time_rp2[i + 1] = getTimeStamp();
			}

			SDL_Log("  RenderClear    RenderCopy 200x200     RenderPresent   (Seconds)");
			for (i = 0; i < NUM_SAMPLES; i += 2) {
				double cl = getTimeStampDiff(time_cl2[i], time_cl2[i + 1]);
				double rc = getTimeStampDiff(time_rc2[i], time_rc2[i + 1]);
				double rp = getTimeStampDiff(time_rp2[i], time_rp2[i + 1]);
				SDL_Log("    %.6f           %.6f                %.6f", cl, rc, rp);
			}

			int loop = 1;
			while ( loop )
			{
				SDL_Event event;
				SDL_WaitEvent( &event );
				if ( event.type == SDL_QUIT )
				{
					SDL_Log("received SDL_QUIT");
					loop = 0;
				} else if (event.type == SDL_KEYUP && event.key.keysym.sym == SDLK_ESCAPE) {
					loop = 0;
				}
			}

			if (texture != NULL) SDL_DestroyTexture(texture);

			SDL_DestroyRenderer(renderer);
		}
		else
		{
			SDL_Log("Fail to create renderer (%s)",SDL_GetError());
		}
		SDL_DestroyWindow(window);
	}
	else
	{
		SDL_Log("Fail to create window: %s",SDL_GetError());
	}

	SDL_Quit();

	return 0;
}

That documentation seems outdated. I mean, LLVM handles the translation to native code. Mesa just passes LLVM IR. And it works… just not entirely.

It doesn’t look like Mesa 13.0.6 has any Neon code. Not even vc4 had it at that point.

I’m guessing there’s a bug in LLVM. I tried the tests from LLVM 4.0.1 and there were 10 unexpected failures. Looks like ARM doesn’t get much love.

I’m still scratching my head over what kind of bug could cause the colours to be wrong sometimes (in glxgears, but not in my own programs, even those that use lighting) and, more importantly, to reduce the speed by a factor of four compared with Jessie!

So far, nobody ‘in authority’ at the Raspberry Pi forum has acknowledged that this is even a problem worthy of their attention. :frowning:

Richard.

@ChliHug,
That’s interesting. Indeed, I saw yesterday that if I make two RenderCopy() calls of the same image, the first 100x100 and the second 200x200, the second was much faster.
Your answer gives me some information about what causes it, but does it mean that every time I make a RenderCopy of a texture, the first one will be slow and the following ones faster?
What happens if I make a few RenderCopy calls of texture1, then a few of texture2, then go back to a few of texture1?

Unfortunately, I have to switch to another subject right now but I hope I could go farther on this soon.

Anyway, thanks a lot for your help.
Al

No, I don’t think you should see such a long delay for every new texture. It’s just the very first draw operation of the context that probably sets up some stuff that will be used for its whole lifetime. There may be exceptions if some significant change happens, but I don’t know enough about graphics driver internals to give you specifics. It’s not something you should have to worry about with the SDL renderers.
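
If you want to keep that one-time cost out of your measurements, one simple option (just a sketch against the renderperf.c test above, not something SDL requires) is an untimed warm-up frame before the sampling loops:

			/* Untimed warm-up: one full clear/copy/present before the timed
			   loops, so the driver's first-use setup is not charged to the
			   first timed SDL_RenderCopy(). Insert this in renderperf.c just
			   before the first "for (i = 0; i < NUM_SAMPLES; i += 2)" loop. */
			SDL_RenderClear(renderer);
			SDL_RenderCopy(renderer, texture, NULL, NULL);
			SDL_RenderPresent(renderer);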

@ChliHug
OK, I’m going to run these kinds of tests. Maybe it differs depending on the video driver…
Al

Finally an ‘official’ response at https://github.com/RPi-Distro/repo/issues/79

Personally I don’t think it’s very satisfactory. Although it may well be true that “the change of behaviour comes from upstream Debian packages”, if it affects only the armhf build and not (x86) Debian stable, doesn’t that mean the Raspberry Pi people should take some responsibility?

Richard.