UpdateRects TURBO!

“Pete Shinners” wrote

i believe the flaw comes from the painful method of searching
through all the rectangles comparing to all the other rectangles.
this gives me exponential slowdown for each rectangle in the list.

well, it turns out the flaw was some sloppy loop handling on my
part. i’ve fixed the code and am amazed at the results.

all these tests are done with the aliens demo with the
SDL_Delay call taken out so the code runs as fast as possible.
(i also made the seed a constant so each run would be the same)

Lets just start by saying the code with my ‘new’ updaterects
function runs anywhere between 130% to 180% the speed of the
original updaterects. in the most complex scene the framerate
went from 57fps to 102fps. here’s my numbers…

aliens with SDL_UpdateRects runs at 296fps with one alien and
57fps with MAX_ALIENS. SDL_UpdateRects is about 73% of the total
runtime.

aliens with my UpdateRectsOpto runs at 384fps with one alien
and 102fps with MAX_ALIENS. UpdateRectsOpto is about 60% of the
total runtime.

i attained these results with the DirectX driver. initial dabbling
with the GDI driver seems to show similar results.

i’m releasing the code here so people can try it out, report
the results, maybe even tune it further? if all goes well i’d
like to see if this could get some form of this merged with SDL.

basically the code rearranges all the rects so that no pixel is
updated twice. it also prunes and carefully places new rectangles
so that the minimum number of rectangles is used.

to try it out, include the source file with your project and
add the declaration for the new function in your file.

extern void UpdateRectsOpto(SDL_Surface* screen, int rectcount, SDL_Rect* rectlist);

then, just replace
SDL_UpdateRects(screen, numrects, rectlist);
with
UpdateRectsOpto(screen, numrects, rectlist);

i’d like to hear what happens :]

begin 666 rectsopto.zip
M4$L#!!0(`&BS(2E,P*W2^00``+@:```+<F5C=’-O<‘1O+F/M66UO
MVS80_NP^0^W#BADQW&2#=B7V,'0=2L&!$N1ET]#42 at 2;5&124&BH[A#_OON MJ!=3$BT[BY/TPU#4M8YW#^^Y-\KLT0#^%%QQ-P+%4L7%#*8R1=BJ9@@>;2$
MN4P8R.F4>B\OP<#6,2^JUC"/)6""EP%GEQ$/MPRX,+%C[S(>,J@*N/YR,R MT$:?F6)P%7 A6)(".#$^_YH6SR.9S/I:[8K%"DY2-82?CH^/272TOT=_?BRP M89PJ/^*WH^"L(>2R(5-\SK3($+Y#I[[><Y^A\KL<&5=]-N6"P?D?UP[1ZH.3
M_SMZZ
]6+S^U5N$BJ^9H7A]\;FAN#16/UPT898KF,!0O/AP]=O-Y>\?4>^& M"W7R2__P!)>/\GA>!SR%1"X4J7H1<T6*B<'<13Q5F"\@0%?,(I9"*C%/F#!P M\:^0VE[>LR1RXYB2CF(W'8(;D6+"T4))N&,L)K,5$.59*(ADIA&TNA?Q.#7W M0LMTD4Q=C^DUC>_$;IK"7S?GY[3,9P)+JE_D]EYR'VYT15U215W$2CJ4I*L< M90"IES FAEA<2F^DO1A2=7TEBX$6$FN$_&=_KU<NP""3R1TMG**4K.E96Y<" M/H3PE"J@=S2 >)'CI[23!+E(H 3(ZSF1<DX$9HG,M.^]:GT"3N5/?XY](STG MY=^8G%;R/@Q6[N/WG_ODQ9S-O7CIE$##BLP0FO:#REI;5F1P\^.*A$#7,@:^ M+%*'K4V*0[C%[W>4#$Q\F?><K2M\P(+S21)PWV<B)X>CP.$:'#B,C>#!P0'' M6/<HV#T^K9S_FW\993"95)5+2KV>)W&4B 4CIS5HB* <:_[D%,(F<*AM-#+1 M\>7B-F)>P+R[?#2))0B6E7X:=:==KKL36MVI^T,&9=L7'/IP-H&:+$39^_?5 MCTQI.Z±?+=^GURC%@PM9D!6PY$$S8F at SU<M?SL/0::[XG=KX/5I9AA:6
MW,(RM+#DZUF&%I;<PC
TL.3K6(96EKJ>XHF22,>I>:S$AE3>^LK5=XT<# M\SCT:'#J5I$)_X:A)FE1?.0!<:\C0TUD)+WN:%U4[:\=2..(*V/'<K]5D*HF M^H*GT at 0,G-'#:8=RUE#.NI27#>5EEW) 0[$5C4,XH0\KRL'!:A0T@<,<L!DC MA'):H<3)TK<!+)LA7A^[@^A^/)=DD:^L96<P"B8>HDC$P?;:.7P]CBQ*
M\67-5JQ;$AVWW6P1O95X8’9R=52’[;2>#1&6>E[J]7/MNOT<5VTN=&Y@#D>
M-Q’KZO(G5,C&/L=]%??,N;
;;JP%OTOYR1.DE8M#N_V&#B>H9G*H^ELY[*I^
MF
)K=7CL)OFOJ;4]SFV#,;0,1MZL_I=NVM+WSK;EEA’%+2,J7)<D;AE1H25)
M&WI]Y>Z%!“QJ=T??2G&7T at ZFSXW=PKI<QK[:V1KS587F]-(.R9;7L9E+E
M([[#[UVT\-6[?QLIITIPE;6+#V9"):\T8O7\UX+#
-KP<O/J”?F]%=E&X^
MX-X^J4^O[M>EDS<[RJ9FP=YRX<U at WR[US8<Y F?!6\UR5=[KQWE+6[-F’<=
MGJV8VP[/UQG^753#M>^%S8/LE6=Y[O3_PWSW"=K=:/Y.<K3V\J))]#-^1=(
MCOZDC?&;7#^'S;597[ZK/O='YH7JG1!O;K =\KK>M/JH?ZXK#]F]<= L\HY
M3!&KVI_D6OPO4$L!A0% ````@:+,A*4S K=+Y! ``N!H```L````````` K0@+:!')E8W1S;W!T;RYC4$L%!@!``$.0```"(%````````
end

i’m releasing the code here so people can try it out, report
the results, maybe even tune it further? if all goes well i’d
like to see if this could get some form of this merged with SDL.

I don’t think SDL should include anything like this, since each game
has domain-specific knowledge about its geometry that can be used for
more efficient rectangle management. For instance, a game might know that
some of the rects don’t overlap any other rects, so they don’t have to
be checked for intersection.

But if you are confident that your code is generally useful, feel free
to release it so people can include it in their games.

basically the code rearranges all the rects so that no pixel is
updated twice. it also prunes and carefully places new rectangles
so that the minimum number of rectangles is used.

to try it out, include the source file with your project and
add the declaration for the new function in your file.

As it stands your code doesn’t do anything at all, but I suppose that
is just a typo somewhere. Otherwise the algorithm looks like O(N^2),
checking all pairs for intersection, so I don’t think it will scale
well.

By the way, you should not use SDL_UpdateRect() repeatedly. Use
SDL_UpdateRects() instead. (There is a very important difference on many
platforms, especially X11.)

Hello,

I have a quite big performance problem.

I’m building a software 3d isometric engine. I use a cahce not to having to
redraw the entire terrain each frame.
Draiong the full terrain in 640x480 on my K6-2 450 on win2k takes about 110
ms.
But calling SDL_UpdateRect(screen,0,0,0,0) takes 60 ms !!!
I have a surface in system memory, 32 bpp, 640x480. I haven’t been able to
try fullscreen hardware (it crashes, perhaps a bug from me or from ym
banshee beta driver); but this time seems me REALLY too big, because copying
from system memory to system memory using memcpy a 640x360 surface takes 15
ms. Isn’t there any way to make it faster ?
because I can’t find any way out, doing 3d software need system memory
framebuffer.
Please help me,

Thanks

Stephane> By the way, you should not use SDL_UpdateRect() repeatedly. Use

SDL_UpdateRects() instead. (There is a very important difference on many
platforms, especially X11.)

“Mattias Engdeg?rd” wrote in

i’m releasing the code here so people can try it out, report
the results, maybe even tune it further? if all goes well i’d
like to see if this could get some form of this merged with SDL.

I don’t think SDL should include anything like this, since each game
has domain-specific knowledge about its geometry that can be used for
more efficient rectangle management. For instance, a game might know that
some of the rects don’t overlap any other rects, so they don’t have to
be checked for intersection.

But if you are confident that your code is generally useful, feel free
to release it so people can include it in their games.

well, i’ll definitely release the code (either way). i’ve been thinking
a little about the ‘program knows how to optimize best’ situation. in
the big run i see there are basic rectangle updates which the programmer
knows won’t be intersecting. for them the developer can still call
SDL_UpdateRects and not have to worry about it. In the second case
there is the usual list of sprites that need updating. The developer
could implement some smart checking, but really i don’t see it gaining
too much over something generic like this.

As it stands your code doesn’t do anything at all, but I suppose that
is just a typo somewhere. Otherwise the algorithm looks like O(N^2),
checking all pairs for intersection, so I don’t think it will scale
well.

hmm, it’s working on this end™. i was definitely worried about
about the N^2 time, but in the aliens test it appears to be scaling
excellently. in fact, the ratio of speedup rises the larger the
number of sprites. this seems to show any time spent looping over
the rects is gaining a lot more than the cost of updating the pixels.

By the way, you should not use SDL_UpdateRect() repeatedly. Use
SDL_UpdateRects() instead. (There is a very important difference on
many platforms, especially X11.)

heh, this is exactly what i was hoping to find. i’ll roll one up that
does a cleaner job. in the meantime, my main question is if Aliens
presents a close enought to real-world usage of SDL. if so this
routine becomes very useful (a free 30% to 80% speedup is hard to
turn down)

anyways, i’m still pulling for the possible merging with SDL.
i don’t see how this is too far from the SDL_Blit and SDL_LowerBlit
combo. updating needless pixels seems far slower than the time needed
to clean the list of rectangles. of course, if you know your rects
are already clean you can call the version of the function that
won’t bother. (just like if you know your blit coordinates are all
kosher, you can call the routine that doesn’t bother cleaning them.)

i’ll be playing with other programs this weekend. i’ll see how they
do.

“Stephane Magnenat” wrote in message
news:001901c014ff$0800cca0$0a01a8c0 at K6450…

I have a quite big performance problem.

Yes.

I’m building a software 3d isometric engine. I use a cahce not to having
to
redraw the entire terrain each frame.
Draiong the full terrain in 640x480 on my K6-2 450 on win2k takes about
110
ms.

That’s awful.

But calling SDL_UpdateRect(screen,0,0,0,0) takes 60 ms !!!

That’s really awful.

I have a surface in system memory, 32 bpp, 640x480. I haven’t been able to
try fullscreen hardware (it crashes, perhaps a bug from me or from ym
banshee beta driver); but this time seems me REALLY too big, because
copying
from system memory to system memory using memcpy a 640x360 surface takes
15
ms. Isn’t there any way to make it faster ?

Even (especially?) that is awful. Methinks sys-mem to sys-mem should not
take that long.

because I can’t find any way out, doing 3d software need system memory
framebuffer.

Are you running any processes in the background? Are you getting accurate
timings? Are your surfaces the same depth as your display? Have you tried
decreasing bits-per-pixel to decrease bandwidth? Have you tried
SDL_HWSURFACE? Have you tried getting a minimal full-screen example program
to compile and run?

Have you downloaded Zynx (http://rainerdeyke.com/zynx)? Does it work? If
so, is it smooth?

It is not clear from your post if you are using OpenGL or not.–
Rainer Deyke (root at rainerdeyke.com)
Shareware computer games - http://rainerdeyke.com
"In ihren Reihen zu stehen heisst unter Feinden zu kaempfen" - Abigor

In the second case
there is the usual list of sprites that need updating. The developer
could implement some smart checking, but really i don’t see it gaining
too much over something generic like this.

It’s nothing that needs to be done inside SDL for integration, nor is it
platform-specific code in any way. It can just as well be done by the
user, and often better.

hmm, it’s working on this end™.

Let’s look at your code, shall we?


int workcount;

workcount = 0;

/* now we do the real work, breaking ovelapping rects and finding hidden */
for(i = 0; i < workcount; ++i) {


}

/* now we update the rects */
for(i = 0; i < workcount; ++i) {


}

I hope this is not the secret behind your good results.

i was definitely worried about
about the N^2 time, but in the aliens test it appears to be scaling
excellently. in fact, the ratio of speedup rises the larger the
number of sprites. this seems to show any time spent looping over
the rects is gaining a lot more than the cost of updating the pixels.

This is a good indication that something is wrong.

heh, this is exactly what i was hoping to find. i’ll roll one up that
does a cleaner job. in the meantime, my main question is if Aliens
presents a close enought to real-world usage of SDL. if so this
routine becomes very useful (a free 30% to 80% speedup is hard to
turn down)

No, the Aliens demo is not a real-world usage of SDL.

Hello,

thanks fdor the answer, I’ll try to clarify the ideas :

I’m building a software 3d isometric engine. I use a cahce not to having
to
redraw the entire terrain each frame.
Draiong the full terrain in 640x480 on my K6-2 450 on win2k takes about
110
ms.

That’s awful.

yes I know, but that 3d software, need a bit optimising

But calling SDL_UpdateRect(screen,0,0,0,0) takes 60 ms !!!

That’s really awful.

Yes, I have tried on a K6 225, and it takes only 30 ms, win2k and banshee
drivers sucks :frowning:

I have a surface in system memory, 32 bpp, 640x480. I haven’t been able
to

try fullscreen hardware (it crashes, perhaps a bug from me or from ym
banshee beta driver); but this time seems me REALLY too big, because
copying
from system memory to system memory using memcpy a 640x360 surface takes
15
ms. Isn’t there any way to make it faster ?

Even (especially?) that is awful. Methinks sys-mem to sys-mem should not
take that long.

Fro this, I use memcpy, just because I didn’t know how SDL was doing bad on
my system, but Uint32 aligned copy should be fast better.
I know, and I always use 32 bpp, all the surface at sames depth that screen
(I converted all them before using)

because I can’t find any way out, doing 3d software need system memory
framebuffer.

Are you running any processes in the background? Are you getting accurate
timings? Are your surfaces the same depth as your display? Have you
tried
decreasing bits-per-pixel to decrease bandwidth? Have you tried
SDL_HWSURFACE? Have you tried getting a minimal full-screen example
program
to compile and run?

I’m using software, and using HWSURFACE in fullscreen makes my program
crahsed, and I know to CTRL-ALT-DEL Msdev to return to normal state, and
then I can’t know how and why it has crashed. Anyway, I need to stay in 32
bpp all the time, excepted at final blit, because I do really a lot of alpha
blend and other stuff, and It’s good in 32 bpp.

Have you downloaded Zynx (http://rainerdeyke.com/zynx)? Does it work? If
so, is it smooth?

I’ll try it as soon as possible

It is not clear from your post if you are using OpenGL or not.

I’m not using OpenGL at all, it is 3d software

Thanks

Stephane

“Stephane Magnenat” wrote in message
news:002901c01588$479226f0$0a01a8c0 at K6450…

Hello,

thanks fdor the answer, I’ll try to clarify the ideas :

But calling SDL_UpdateRect(screen,0,0,0,0) takes 60 ms !!!

That’s really awful.

Yes, I have tried on a K6 225, and it takes only 30 ms, win2k and banshee
drivers sucks :frowning:

If you’re updating the entire screen each frame, you should be using be
using SDL_Flip, but that only works with SDL_FULLSCREEN | SDL_DOUBLEBUF |
SDL_HWSURFACE.

because I can’t find any way out, doing 3d software need system memory
framebuffer.

Are you running any processes in the background? Are you getting
accurate

timings? Are your surfaces the same depth as your display? Have you
tried
decreasing bits-per-pixel to decrease bandwidth? Have you tried
SDL_HWSURFACE? Have you tried getting a minimal full-screen example
program
to compile and run?

I’m using software, and using HWSURFACE in fullscreen makes my program
crahsed, and I know to CTRL-ALT-DEL Msdev to return to normal state, and
then I can’t know how and why it has crashed. Anyway, I need to stay in 32
bpp all the time, excepted at final blit, because I do really a lot of
alpha
blend and other stuff, and It’s good in 32 bpp.

I’ve had problems like that. They can be traced by embedding calls printf
in the critical code and reading the stdout.txt file afterwards. Remember
to fflush(stdout) after each printf.

It is not clear from your post if you are using OpenGL or not.

I’m not using OpenGL at all, it is 3d software

Understood. The reason I was asking is that it is possible for OpenGL to
use software rendering in some cases.–
Rainer Deyke (root at rainerdeyke.com)
Shareware computer games - http://rainerdeyke.com
"In ihren Reihen zu stehen heisst unter Feinden zu kaempfen" - Abigor

If you’re updating the entire screen each frame, you should be using be
using SDL_Flip, but that only works with SDL_FULLSCREEN | SDL_DOUBLEBUF |
SDL_HWSURFACE.

it always works, but has the same effect as a full-screen SDL_UpdateRect()
if page flipping isn’t available

If you’re updating the entire screen each frame, you should be using be
using SDL_Flip, but that only works with SDL_FULLSCREEN | SDL_DOUBLEBUF |
SDL_HWSURFACE.

it always works, but has the same effect as a full-screen SDL_UpdateRect()
if page flipping isn’t available

It’s not exactly the same if you’re not redrawing the whole screen every
frame. But in general, yes, this is true.

See ya,
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

it always works, but has the same effect as a full-screen SDL_UpdateRect()
if page flipping isn’t available

It’s not exactly the same if you’re not redrawing the whole screen every
frame. But in general, yes, this is true.

Sorry, I meant that when page flipping is not available, SDL_Flip()
has the same effect as SDL_UpdateRect(). It seems my talent is not for
human languages