Hello there!
I’ve just created my first “bigger” SDL game in C (still barely 10,000
lines), and I found myself enjoying the optimization more than
actually writing the game.
When I profiled it, SDL_Flip turned out to be the undisputed slowest part.
At 1920x1080 resolution (and with no hardware acceleration, because of
some fancy features that need low-level pixel access, like pixel-level
collision detection), SDL_Flip takes more time every frame than everything
else in my program combined. And it’s not because I did something wrong
with SDL_DisplayFormat that would make SDL convert between surface types
every frame: even a simple program that initializes with 0 bpp (which
results in 32) and the SDL_ANYFORMAT|SDL_ASYNCBLIT flags does 1000
SDL_Flips in 4.5 seconds, which means I can reach at most about 220 fps if
I do nothing else. My program runs at about 130 fps with 60 sprites
(200x200) doing crazy stuff on a high-definition background. Yes, I know
130 fps is more than enough, but this game has PlayStation 1-like graphics
and is unplayably slow on an average laptop.
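A quick sanity check of the numbers above, assuming 4 bytes per pixel at 32 bpp and assuming a flip costs the same inside the game as in the empty test program:

```latex
t_{\text{flip}} \approx \frac{4.5\ \text{s}}{1000} = 4.5\ \text{ms}
\quad\Rightarrow\quad f_{\max} \approx \frac{1}{4.5\ \text{ms}} \approx 222\ \text{fps}

t_{\text{frame}} \approx \frac{1}{130\ \text{fps}} \approx 7.7\ \text{ms},
\qquad \frac{t_{\text{flip}}}{t_{\text{frame}}} \approx \frac{4.5}{7.7} \approx 0.58

1920 \times 1080 \times 4\ \text{B} = 8\,294\,400\ \text{B per flip}
\quad\Rightarrow\quad \frac{8\,294\,400\ \text{B}}{4.5\ \text{ms}} \approx 1.8\ \text{GB/s}
```

Under those assumptions the flip is a bit over half of each frame, which fits the observation that it costs more than everything else combined.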
So I decided to update only the parts of the screen that have changed.
The changes cover only about half of the screen, but there are usually
100-200 of them per frame. I also have to track where the characters were
in the previous frame, because they are no longer there, so those areas
need updating too. But with this many rects, SDL_Flip is about 20% faster
than calling SDL_UpdateRects without any pre-optimization. (Capturing the
rects is actually pretty cheap: if I just call SDL_UpdateRect on the single
bounding rect that covers all the changes, I get about a 10% performance
increase compared to not capturing them and calling SDL_Flip(), so the
problem is definitely inside SDL_UpdateRects.)
But I just can’t figure out how to optimize the roughly 300-400
rectangles. With some simple test programs I found that SDL_UpdateRects
slows down linearly with both the number of rects and the area covered.
Increasing the area by about 3500-4000 pixels costs the same time as
adding one more rect while keeping the covered area exactly the same.
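The measurement above can be written as a linear cost model. Here n is the number of rects, A their summed area, c the time per pixel and K the per-rect overhead expressed in pixels. Summing areas assumes SDL_UpdateRects copies each rect independently, so overlapping pixels are paid once per rect. Replacing rects a and b by their bounding box removes one rect but adds the extra pixels of the box:

```latex
T \approx c\,(nK + A), \qquad A = \sum_{i=1}^{n} w_i h_i, \qquad K \approx 3500\text{--}4000\ \text{px}

\Delta T_{a,b} \approx c\,\bigl(|\operatorname{bbox}(a,b)| - |a| - |b| - K\bigr) < 0
\;\iff\; |\operatorname{bbox}(a,b)| - |a| - |b| < K
```

For non-overlapping a and b this is exactly the “covered area grows by less than 3500 pixels” merge rule.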
This suggests a simple optimization: if merging two rects increases the
covered area by less than 3500 pixels, merge them; and if two rects
intersect in more than 3500 pixels, split one of them into 3 parts and
drop the part that lies in the intersection (as SDL_BlitPool does). The
problem is that every merge or split may create a new opportunity to
optimize between the rects I have already checked and the newly created
rect. Because of this, my algorithm will either be far slower than a
bubble sort and slow the program down even more than before, or, if I
don’t run it to completion (for example, I just go through all the
elements 5 times), it won’t give good enough results and will still be
slow. And I don’t have any spare time in the frame to run it in a thread:
the flip call happens right after the last blit is done, and my program
would still have to wait for the flip thread to finish before it can
start blitting the next frame.
Any ideas on how to use it properly?
Thanks in advance!
Yours faithfully
Tamás Csala