Inconsistent frame rates when using Textures

I have a project that I’ve converted from SDL 1.2 to 2.0. The project is a simple tiling engine. It loads a tileset as a single graphic, loads a map into a 2d array, and draws the screen based on that data.

In 1.2 I used surfaces. In 2.0 I’m trying to learn how to use textures. I’m clearly not doing it right.

The rendering loop basically looks like this:

Code:
SDL_Rect src, dest;
for (int y = 0; y < screentilesdown; y++) {
    for (int x = 0; x < screentilesacross; x++) {
        int tilenumber = map[y + yoffset][x + xoffset];

        /* Source rect: locate this tile inside the tileset texture
           (integer division/modulo pick the row and column). */
        src.x = (tilenumber % texturetilesacross) * tilewidth;
        src.y = (tilenumber / texturetilesacross) * tileheight;
        src.w = tilewidth;
        src.h = tileheight;

        /* Destination rect: where the tile lands on the render target. */
        dest.x = x * tilewidth;
        dest.y = y * tileheight;
        dest.w = src.w;
        dest.h = src.h;

        SDL_RenderCopy(target, tileset, &src, &dest);

        /* Optional (mostly transparent) grid overlay on top of the tile. */
        if (ShowGrid) SDL_RenderCopy(target, grid, NULL, &dest);
    }
}

static Uint32 framecount = 0;
Uint32 start = SDL_GetTicks();
SDL_RenderPresent(target);
fprintf(debugfile, "Frame %u Grid %s RenderPresent %u\n",
        framecount++, (ShowGrid ? "ON" : "OFF"), SDL_GetTicks() - start);

“tileset” is a texture created from a surface that was loaded from a BMP file. It’s a single image that contains all of the tiles for the scene. “grid” is a single 16x16 texture that is entirely transparent except for a black line on top and a black line on the left. “map” is a 2d int array that contains the tile number to draw for each location. “ShowGrid” is toggled at runtime by pressing G on the keyboard. When true it effectively overlays a grid on the scene so you can clearly see the tile boundaries.
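
In case it matters, the grid overlay is just an alpha-blended 16x16 texture. Here's a minimal sketch of one way such a texture can be built; the helper name and surface setup are illustrative, not my actual loading code:

Code:
/* Hypothetical helper: builds a w x h overlay texture that is fully
   transparent except for a black line along the top and left edges,
   with alpha blending enabled so it can be drawn over the tiles. */
static SDL_Texture *CreateGridTexture(SDL_Renderer *renderer, int w, int h)
{
#if SDL_BYTEORDER == SDL_BIG_ENDIAN
    Uint32 rmask = 0xff000000, gmask = 0x00ff0000, bmask = 0x0000ff00, amask = 0x000000ff;
#else
    Uint32 rmask = 0x000000ff, gmask = 0x0000ff00, bmask = 0x00ff0000, amask = 0xff000000;
#endif
    SDL_Surface *surf = SDL_CreateRGBSurface(0, w, h, 32, rmask, gmask, bmask, amask);
    if (!surf) return NULL;

    SDL_Rect top  = { 0, 0, w, 1 };
    SDL_Rect left = { 0, 0, 1, h };
    SDL_FillRect(surf, NULL,  SDL_MapRGBA(surf->format, 0, 0, 0, 0));    /* transparent fill */
    SDL_FillRect(surf, &top,  SDL_MapRGBA(surf->format, 0, 0, 0, 255));  /* black top line   */
    SDL_FillRect(surf, &left, SDL_MapRGBA(surf->format, 0, 0, 0, 255));  /* black left line  */

    SDL_Texture *tex = SDL_CreateTextureFromSurface(renderer, surf);
    SDL_FreeSurface(surf);
    if (tex) SDL_SetTextureBlendMode(tex, SDL_BLENDMODE_BLEND);
    return tex;
}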

The issue I’m trying to solve is why I get inconsistent frame rates. I’ve put performance counters into my code and the whole thing takes less than 1ms except for SDL_RenderPresent(). This is where the inconsistencies appear.

My test runs involve a tileset with 16x16 pixel tiles being drawn onto a 1920x1080 screen (the render target). This results in a scene 120 tiles across and 67 tiles down, for a total of 8040 tiles drawn per update. If the grid is on, each of those 8040 tiles gets another (mostly transparent) tile drawn over it. The grid is OFF to start.

My various tests show a huge frame rate inconsistency. The first dozen frames are all over the place, but then they smooth out. This smooth section could be good (15ms) or bad (130ms). However, after a while, the frame rate will change, either up or down, and stay there for a while before changing again. I’ve seen stretches of 300+ms.

In some of my tests, turning the grid ON will actually improve the frame rate dramatically. In other tests it reduces it.

Strangely enough, if I turn the grid ON during the test and then subsequently turn it back OFF, the frame times usually smooth out to around 15-17ms each and stay there for as long as I leave the program running. I have VSYNC turned on, so that’s where I would expect it to be.
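
For context, here is roughly how the window and renderer get created (window parameters simplified); the SDL_RENDERER_PRESENTVSYNC flag is the relevant part:

Code:
/* Roughly my setup, simplified. SDL_RENDERER_PRESENTVSYNC is what ties
   SDL_RenderPresent() to the display refresh. */
SDL_Window *window = SDL_CreateWindow("Tiling engine",
                                      SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
                                      1920, 1080, 0);
SDL_Renderer *target = SDL_CreateRenderer(window, -1,
                                          SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);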

And then, every once in a while, my test will exhibit none of these patterns. Turning the grid ON or OFF will change the frame rate, but not consistently; sometimes up, sometimes down.

All of these tests were run sequentially with no code changes whatsoever. Just shut the program down, examine the debug output, and then run it again.

I think it is clear to me that I don’t fully understand what is happening behind the scenes when I call SDL_RenderCopy() and SDL_RenderPresent(). Does anyone have any ideas on how I can figure out where my seemingly random bottlenecks are?

Your loop seems correct.

I’d change the fprintf() call from a per-frame call to a once-per-second call… maybe your bottleneck is the disk access?

Try benchmarking with something like this, for instance, which writes to standard output once per second:

static uint32_t framecount = 0, last = 0;
uint32_t now = SDL_GetTicks();
++framecount;
if ((now - last) >= 1000) {
    printf("%u fps\n", framecount);
    framecount = 0;
    last = now;
}

You are also omitting the hw/sw platform you are using. I seem to remember, for instance, that a certain class of Intel graphics chips used to have problems drawing a lot of quads with certain versions of the Intel Linux drivers.

You are pushing roughly 16,000 quads per frame with the grid on (about 8,000 with it off), so this could be an issue.
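
If you want to rule out a particular backend, you can also enumerate the render drivers SDL was built with and force a specific one via a hint before creating the renderer, something like this ("opengl" here is just an example):

int i, n = SDL_GetNumRenderDrivers();
SDL_RendererInfo info;
for (i = 0; i < n; i++) {
    if (SDL_GetRenderDriverInfo(i, &info) == 0)
        printf("render driver %d: %s\n", i, info.name);
}
/* Must be set before SDL_CreateRenderer(). */
SDL_SetHint(SDL_HINT_RENDER_DRIVER, "opengl");
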
Bye,
Gabry

I know fprintf() is a slow function (relatively speaking), but that doesn’t seem to have any effect. I’ve wrapped various pieces of the code in SDL_GetTicks() calls, and only the actual SDL_RenderPresent() call takes any measurable time at all. In my sample code I did put the second call to SDL_GetTicks() inside the fprintf() call rather than immediately after the graphics call, so I will pull it out just to be sure, but I’m pretty sure it’s a non-issue, particularly since I see similar results if I turn off the logging entirely.
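
Concretely, the change I’m going to make is just to capture the elapsed time into a variable before logging it, something like:

Code:
Uint32 start = SDL_GetTicks();
SDL_RenderPresent(target);
Uint32 elapsed = SDL_GetTicks() - start;                /* measure first... */
fprintf(debugfile, "Frame %u Grid %s RenderPresent %u\n",
        framecount++, (ShowGrid ? "ON" : "OFF"), elapsed);  /* ...then log */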

As for HW/SW, you’re absolutely right; I should have included that. My dev system is an ASUS Republic of Gamers laptop: Core i7 2.3GHz, 8GB RAM, nVidia Geforce GTX 660M for graphics. OS is Windows 7 Ultimate 64-bit, dev environment is Visual Studio 2010 Premium.

One thing I haven’t been able to determine is how the video RAM works. The nVidia control panel indicates that it has 2GB of dedicated GDDR5 video memory and 2GB shared system memory for 4GB total available graphics memory. I can’t tell if/when it starts borrowing system memory to use for the GPU, though. Could that be the random bottleneck?

You might want to double-check that you’re actually using the 660M and not the integrated/hybrid video device. A lot of laptops in that type of configuration have the Intel 4400 or whichever GPU is built into the chip, and the 660M is only used for games. If that’s the case for you, you can go into the Nvidia control panel and specify which device to use on a per-application basis (usually the integrated one is the default for unknown apps). I have an MSI that’s set up this way.

I’ve seen games in this type of setup show the wrong device name in both
directions, though (shows hybrid when using discrete, and shows discrete
when using hybrid), so don’t automatically trust the renderer string. My
MSI has an indicator light that shows which device is being used; that’s always been right in my case.
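
If it helps, you can at least log which render driver SDL itself thinks it’s using. Note the caveat above applies here too: this names the backend (e.g. direct3d, opengl, software), not the physical GPU, so it won’t settle the Optimus question by itself.

SDL_RendererInfo info;
if (SDL_GetRendererInfo(target, &info) == 0)
    printf("SDL render driver: %s (flags 0x%08X)\n", info.name, (unsigned)info.flags);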
