Do factors like RAM speed, video RAM speed, storage drive speed, and L3 cache size affect the speed of software renderers?


My software renderer runs at over 60 frames per second while filling every pixel of a 1920x1080 frame.

I know I could achieve a faster speed with a better processor, but I want to know whether other hardware products affect the speed, too.

I created an array of pixels with “malloc()”. During each frame, I clear the array by setting all of its items to zero with “memset()”, and then I fill the array with my images again.
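Here is a minimal sketch of that per-frame pattern. It assumes 32-bit pixels in a plain C “uint32_t” array. The names “alloc_frame” and “render_frame” are hypothetical, and so is the solid rectangle that stands in for “my images”.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Frame dimensions matching the 1920x1080 target. */
enum { FRAME_W = 1920, FRAME_H = 1080 };

/* Allocate one 32-bit-per-pixel frame buffer on the heap (caller frees it). */
static uint32_t *alloc_frame(void) {
    return malloc((size_t)FRAME_W * FRAME_H * sizeof(uint32_t));
}

/* Clear every pixel to zero, then draw a hypothetical solid rectangle.
   Assumes x0, y0 >= 0; only the right and bottom edges are clipped. */
static void render_frame(uint32_t *pixels, int x0, int y0, int w, int h,
                         uint32_t color) {
    memset(pixels, 0, (size_t)FRAME_W * FRAME_H * sizeof(uint32_t));
    for (int y = y0; y < y0 + h && y < FRAME_H; ++y)
        for (int x = x0; x < x0 + w && x < FRAME_W; ++x)
            pixels[(size_t)y * FRAME_W + x] = color;
}
```

The “memset()” call alone writes the full 8,294,400 bytes of a 1920x1080 frame each time, before any drawing happens.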

I use these functions too:

Would faster RAM affect the speed?

Does the memory bandwidth of the graphics card affect the speed? (Even though software renderers use the CPU, SDL still sends each frame to the GPU's memory.)

Would I notice a speed difference with a Solid-State Drive?

Does the size of the L3 cache matter? Normal quad-core processors have 6 MB for their L3 caches, so 1.5 MB per core. Better quad-core processors have 8 MB, plus 4 extra threads. If the four extra threads are disabled, those quad-core processors will split the 8 MB among 4 cores, so each core would have access to 2 MB of the L3 cache instead of 1.5 MB. Would that extra 0.5 MB affect the speed of software renderers?

Are those factors so unimportant that they won’t provide a perceptible speed improvement, and I should care about ONLY a CPU upgrade?

I don’t need more than 1080p60. I’m asking these questions just for curiosity.


The short answer is yes, of course they do. The long answer is that CPU and bus speed eclipse them, so their effect is usually negligible.

All of my games use my own 2D engine, and I blit to the screen the same way you do. The biggest bottlenecks are CPU speed and the bus speed to the graphics adapter. Memory speed plays a role, but memory is already so fast that the difference between, say, 3000 and 3200 MT/s is only a few percent, which is hard to notice. As with most things, it very much depends on what your software renderer is doing. If you are doing a lot of math, the CPU will matter most; if it's mostly simple copying, memory and bus speed will matter most. Cache size and speed only come into play if you are exceeding the cache. For example, if you're copying 1.1 MB chunks but have only 1 MB of cache, you could get huge performance gains by restructuring the work so each chunk fits within that 1 MB. An SSD will not matter unless you're streaming a lot of data from storage.
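The following is one way to act on the cache point, sketched under assumptions. A full 1920x1080 frame at 4 bytes per pixel (about 8.3 MB) is larger than the L3 shares discussed in the question. If you clear the whole frame and then draw over it, the frame streams through memory twice. If you do both passes band by band instead, with each band sized to a cache budget you choose, the drawing pass can hit rows that are still cached. The helper names and the checkerboard "image" are hypothetical. Whether this helps in practice depends on the workload, so measure it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rows of a `width`-pixel, 32-bit frame that fit in `cache_bytes` (at least 1). */
static int rows_per_band(int width, size_t cache_bytes) {
    size_t row_bytes = (size_t)width * sizeof(uint32_t);
    size_t rows = cache_bytes / row_bytes;
    return rows > 0 ? (int)rows : 1;
}

/* Clear and draw one horizontal band at a time, so the drawing pass touches
   rows the clearing pass has just brought into cache. */
static void render_banded(uint32_t *pixels, int width, int height,
                          size_t cache_bytes) {
    int band = rows_per_band(width, cache_bytes);
    for (int y0 = 0; y0 < height; y0 += band) {
        int rows = (y0 + band <= height) ? band : height - y0;
        /* Pass 1: clear this band only. */
        memset(pixels + (size_t)y0 * width, 0,
               (size_t)rows * width * sizeof(uint32_t));
        /* Pass 2: draw into the same band (hypothetical checkerboard). */
        for (int y = y0; y < y0 + rows; ++y)
            for (int x = 0; x < width; ++x)
                if ((x + y) % 2 == 0)
                    pixels[(size_t)y * width + x] = 0xFFFFFFFFu;
    }
}
```

For example, a 1.5 MB budget (1,572,864 bytes) gives 204 rows per band at 1920 pixels wide.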

A huge performance gain can come from the pixel format (RGBA/BGRA 8888 vs. 565). If you don't care much about color and your engine can easily be converted to 565, you will immediately gain nearly 2x performance by using 565 instead of 8888. This is because USUALLY the bottleneck is the texture transfer, and you will transfer half the data, so it will be twice as fast. Even if the video card has to convert it to 8888 internally (which it is optimized to do very quickly), I've never seen an instance where 565 was not nearly twice as fast as 8888.
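The "half the data" argument comes down to simple arithmetic, sketched below with hypothetical function names. At 1920x1080 and 60 fps, 8888 means 497,664,000 bytes per second of pixel data, versus 248,832,000 for 565. Whether halving that volume actually halves the frame time depends on whether the transfer really is the bottleneck.

```c
#include <stdint.h>

/* Bytes of pixel data in one frame for a given pixel size. */
static uint64_t frame_bytes(uint32_t w, uint32_t h, uint32_t bytes_per_pixel) {
    return (uint64_t)w * h * bytes_per_pixel;
}

/* Bytes of pixel data per second at a given frame rate. */
static uint64_t bytes_per_second(uint32_t w, uint32_t h,
                                 uint32_t bytes_per_pixel, uint32_t fps) {
    return frame_bytes(w, h, bytes_per_pixel) * fps;
}
```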

I've also seen instances where glDrawPixels is MUCH faster than SDL_UpdateTexture(), so that may be worth checking out.


I tried SDL_PIXELFORMAT_RGB565 instead of SDL_PIXELFORMAT_ARGB8888, like this:

SDL_Texture* texture = NULL;


Then I changed each pixel from “Uint32” to “Uint16”, and I stored each color returned by “SDL_MapRGB()” as a “Uint16” as well. (Unless I misinterpreted the page below, storing the result of “SDL_MapRGB()” in a “Uint16” instead of a “Uint32” should be fine):
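In SDL2, “SDL_MapRGB()” returns a Uint32. For a 16-bit format such as RGB565, the mapped value fits in the low 16 bits, so keeping it in a Uint16 loses nothing. The hand-rolled packing below is an illustration, not SDL's code. It shows the bit layout and explains why the colors get worse: the low 3 bits of red and blue and the low 2 bits of green are discarded.

```c
#include <stdint.h>

/* Pack 8-bit-per-channel RGB into RGB565 by dropping low bits:
   red in bits 15-11 (5 bits), green in bits 10-5 (6 bits),
   blue in bits 4-0 (5 bits). */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```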

With those changes, my images were still positioned and proportioned correctly, just with worse colors, but the program ran slower. Disappointed by the drop in speed, I reverted to “SDL_PIXELFORMAT_ARGB8888” and switched my pixels and colors back to “Uint32”.

I stopped trying that idea.

Later, I read this page:

It says this:

"This is a fairly slow function, intended for use with static textures that do not change often.

If the texture is intended to be updated often, it is preferred to create the texture as streaming and use the locking functions referenced below."

So instead of using “SDL_UpdateTexture()”, I tried the “SDL_LockTexture()” and “SDL_UnlockTexture()” functions. The program got about 7 percent faster, but when I closed it, the debugger reported a crash caused by calling “free()” on my pixels array. When I removed the “free(pixels)” line, the debugger didn't detect a problem. Why would the “SDL_LockTexture()” and “SDL_UnlockTexture()” functions break the program's ability to free my array's memory?
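The original code isn't shown, so this is an assumption. One common cause of exactly this crash is passing the address of the “malloc()” pointer itself as the third argument of “SDL_LockTexture()”. That argument is an output parameter. SDL2's “SDL_LockTexture(texture, rect, &ptr, &pitch)” overwrites “ptr” with a pointer to SDL's own buffer. As a result, the original heap block leaks, and a later “free()” runs on memory that SDL owns.

The sketch below keeps the two pointers separate and copies row by row, respecting “pitch”, which can be larger than width * 4. “copy_to_locked” is a hypothetical helper, and the SDL calls appear only in the comment.

```c
#include <stdint.h>
#include <string.h>

/* Copy a tightly packed 32-bit frame (`src`, width * 4 bytes per row) into a
   destination whose rows are `dst_pitch` bytes apart, such as the buffer
   returned by SDL_LockTexture().

   Typical SDL2 use (texture created with SDL_TEXTUREACCESS_STREAMING):
     void *locked;   // separate variable; SDL writes its own pointer here
     int pitch;
     if (SDL_LockTexture(texture, NULL, &locked, &pitch) == 0) {
         copy_to_locked(locked, pitch, pixels, 1920, 1080);
         SDL_UnlockTexture(texture);
     }
   `pixels` still points at the malloc() block, so free(pixels) stays valid. */
static void copy_to_locked(void *dst, int dst_pitch, const uint32_t *src,
                           int width, int height) {
    const size_t row_bytes = (size_t)width * sizeof(uint32_t);
    uint8_t *d = dst;
    for (int y = 0; y < height; ++y)
        memcpy(d + (size_t)y * dst_pitch, src + (size_t)y * width, row_bytes);
}
```

SDL's documentation describes locked texture pixels as write-only: they may not contain the previous contents. That is another reason to overwrite the whole locked area every frame, whether by copying as above or by drawing into it directly.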

Edit 1:
I decided to place my array of pixels on the stack instead of on the heap, so I won’t need to free the memory. Now I can enjoy the 7% speed improvement, and I will assume that “SDL_LockTexture()” is not designed to handle a pixels array that’s on the heap.