Reading and writing Data to/from Surface

Hi,

I am trying to develop a top-down-view type of game. It is supposed to become
some sort of ego shooter, but in 2D, seen from the top. The idea is that the
player character does not rotate when one moves the mouse; the world around
it does. This means I have some graphics stored in a surface (the map), plus
the screen surface. Now I have to rotate and translate the image in the map
and then write it to the screen surface.

Well, it turns out that this is really slow, since I can't use blits. On a
1.2 GHz Duron with a GeForce2 MX graphics card I get about 90 fps for a
320x240 window, but I get down to 7 fps (seven) with 640x480.

I could use OpenGL and just use the map as a texture and let the hardware do
the rotation, but this is intended to be as lightweight as possible and
should work with no hardware GL support.

And seemingly the transformation is pretty fast, but the reading and
writing to and from the surfaces is the slowest part.

Funnily, I also don't get an HWSurface for any resolution. But that is a
different story.

Any hints or suggestions?

T.I.A.
Florian Schmidt
+++ GMX - Mail, Messaging & more http://www.gmx.net +++
NEU: Mit GMX ins Internet. Rund um die Uhr f?r 1 ct/ Min. surfen!

I'm not an expert, but I hope my comments help.

On Thursday 07 November 2002 10:57, mista.tapas at gmx.net wrote:

> I am trying to develop a top-down-view type of game. It is supposed to
> become some sort of ego shooter, but in 2D, seen from the top. The idea
> is that the player character does not rotate when one moves the mouse;
> the world around it does. This means I have some graphics stored in a
> surface (the map), plus the screen surface. Now I have to rotate and
> translate the image in the map and then write it to the screen surface.
>
> Well, it turns out that this is really slow, since I can't use blits.
> On a 1.2 GHz Duron with a GeForce2 MX graphics card I get about 90 fps
> for a 320x240 window, but I get down to 7 fps (seven) with 640x480.
>
> I could use OpenGL and just use the map as a texture and let the
> hardware do the rotation, but this is intended to be as lightweight as
> possible and should work with no hardware GL support.

That's true.

> And seemingly the transformation is pretty fast, but the reading and
> writing to and from the surfaces is the slowest part.

That's strange. I mean, copying a whole image from system memory to video
memory is slow, but not that slow. Are you using SDL_BlitSurface for this?

> Funnily, I also don't get an HWSurface for any resolution. But that is
> a different story.

If you are doing the rotation on the surface, I would keep it in system
memory (not video memory), because reading and writing video memory costs
much more, and the transformation probably requires lots of reading and
writing. So use an SWSURFACE as scratch (probably with a bigger area than
the screen) and then blit the screen area to the framebuffer.

> Any hints or suggestions?

Hope this helps. I may have misunderstood your current scheme, though. :)

jan

> I am trying to develop a top-down-view type of game. [...]
> Now I have to rotate and translate the image in the map and
> then write it to the screen surface.

You may want to check out SDL_rotozoom, or even better, SDL_gfx. See
www.libsdl.org for more info.

> Well, it turns out that this is really slow [...] I get about 90 fps
> for a 320x240 window, but I get down to 7 fps (seven) with 640x480.

Are you blitting the scene as a normal 2D screen first, and then
rotating the whole screen at once (as a rotozoomer would)? If that
is the case, then the main reason for the slowdown is probably that
you access the source surface (the one that is being rotated) in a
more-or-less random manner. This is almost impossible for the
compiler to optimize, and the CPU can't really cache your source
data. I recently wrote a screensaver that rotozooms the user's
display (not using SDL). It was just as slow as your program, and
the only reason for that was the randomness of the source surface
access. I did not bother to optimize my code, so I'm afraid I can't
offer any solutions.

Matthijs Hollemans
All Your Software
www.allyoursoftware.com

Hello Matthijs,

Thursday, November 07, 2002, 15:25:29, you wrote:

> I am trying to develop a top-down-view type of game. [...]
> Now I have to rotate and translate the image in the map and
> then write it to the screen surface.

MH> You may want to check out SDL_rotozoom, or even better, SDL_gfx. See
MH> www.libsdl.org for more info.

SDL_gfx supports only full-surface zooming, but what if I need to zoom a
region?

  1. I would need to create a new temporary surface and then zoom it,
    which is expensive for animation.
  2. What can I use instead of SDL_gfx?

Best regards,
Alexander mailto:Editor at echo.ru

> > I am trying to develop a top-down-view type of game. [...]
> > Now I have to rotate and translate the image in the map and
> > then write it to the screen surface.
>
> MH> You may want to check out SDL_rotozoom, or even better, SDL_gfx. See
> MH> www.libsdl.org for more info.
>
> SDL_gfx supports only full-surface zooming, but what if I need to zoom a
> region?
>
>   1. I would need to create a new temporary surface and then zoom it,
>     which is expensive for animation.
>   2. What can I use instead of SDL_gfx?

For rotation you will always need to work with two surfaces: the original
and the one the algorithm writes the rotated image into. Otherwise the
algorithm would overwrite original pixels that are still needed during the
rotation. So at least a scratch surface is needed (which could later be
blitted back to the original screen or directly to the framebuffer).

Hmm...

If the problem is doing rotation to a fixed number of angles (if, for
instance, you had a 5 degree resolution, you would only need 72 angles),
you may want to look into "Template Metaprogramming" in C++ (
http://osl.iu.edu/~tveldhui/papers/Template-Metaprograms/meta-art.html )

You could convert the rotation algorithm to a template metaprogram
taking the angle as an argument. The result would be 72 distinct
routines, one for rotating to each distinct angle, which the compiler
should be much better able to optimize. (Please note that you will
probably need at least template-metaprogrammed versions of sin() and
cos() for this to work effectively.)

I'm not confident from the original poster's email that this is indeed
his problem (he mentioned that all the time was being consumed in the
blits, not the rotation), but I hope that someone will find this useful.

-Loren
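For the fixed set of 72 angles mentioned above, a plain runtime lookup table is a simpler alternative to template metaprogramming. To be clear, this is a different technique from the one Loren describes: the compiler does not generate one routine per angle here; it only avoids calling sin() and cos() per frame. The names (`SinCos`, `makeAngleTable`, `angleToIndex`) and the 16.16 fixed-point format are assumptions made for this sketch, not anything from the thread.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// 72 steps corresponds to the 5-degree resolution mentioned in the post.
constexpr int kSteps = 72;
// 16.16 fixed point: the value 1.0 is stored as 65536 (an assumption of this sketch).
constexpr int kFixedOne = 1 << 16;

struct SinCos {
    std::int32_t sin;
    std::int32_t cos;
};

// Build the table once at startup; entry i holds the angle i * 5 degrees.
inline std::array<SinCos, kSteps> makeAngleTable() {
    const double kPi = 3.14159265358979323846;
    std::array<SinCos, kSteps> table{};
    for (int i = 0; i < kSteps; ++i) {
        const double radians = 2.0 * kPi * i / kSteps;
        table[i].sin = static_cast<std::int32_t>(std::lround(std::sin(radians) * kFixedOne));
        table[i].cos = static_cast<std::int32_t>(std::lround(std::cos(radians) * kFixedOne));
    }
    return table;
}

// Map an arbitrary angle in whole degrees to the nearest table slot.
inline int angleToIndex(int degrees) {
    const int stepDegrees = 360 / kSteps;  // 5
    int wrapped = degrees % 360;
    if (wrapped < 0) wrapped += 360;       // C++ '%' keeps the sign of the dividend
    const int index = ((wrapped + stepDegrees / 2) / stepDegrees) % kSteps;
    assert(index >= 0 && index < kSteps);
    return index;
}
```

A rotation loop would look up `table[angleToIndex(angle)]` once per frame and use the two integers as per-pixel steps. Whether this helps at all depends on where the time really goes, which the thread has not settled at this point.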

> For rotation you will always need to work with two surfaces: the original
> and the one the algorithm writes the rotated image into. Otherwise the
> algorithm would overwrite original pixels that are still needed during
> the rotation. So at least a scratch surface is needed (which could later
> be blitted back to the original screen or directly to the framebuffer).

I use two buffers. Sorry if that wasn't clear. I have the map image in a
surface, and the rotation algorithm reads from this buffer and writes into
the screen surface, which is then updated by SDL_Flip() (if that was the
name of the function that flips the double buffer).

Someone else mentioned that this is computationally heavy, since the pixels
in the map image get accessed on a pretty much random basis (at least it
seems so to the compiler). I can understand that point. But there must be
some computational tricks. I remember demos from the 486 and 386 days which
did some amazing full-screen rotating, zooming and blending at 320x200.

I'm pretty much puzzled how they achieved that.

Florian Schmidt

> I'm not an expert, but I hope my comments help.

Sure :)

>> And seemingly the transformation is pretty fast, but the reading and
>> writing to and from the surfaces is the slowest part.
>
> That's strange. I mean, copying a whole image from system memory to video
> memory is slow, but not that slow. Are you using SDL_BlitSurface for
> this?

No, my loop looks (in pseudocode) like this:

    for all (x,y) in screen surface
        Transform (x,y) to (x2,y2) (map coordinates)
        writeToScreenSurface(x, y, getMapPixel(x2, y2))
    end for

>> Funnily, I also don't get an HWSurface for any resolution. But that is
>> a different story.
>
> If you are doing the rotation on the surface, I would keep it in system
> memory (not video memory), because reading and writing video memory
> costs much more, and the transformation probably requires lots of
> reading and writing. So use an SWSURFACE as scratch (probably with a
> bigger area than the screen) and then blit the screen area to the
> framebuffer.

OK, so you are saying I should have another surface in system memory, where
I assemble the whole scene (rotated map, player character, weapons fire),
and then blit this to the screen surface?

I will try this as soon as I get home (may take a while though, the
university day has just started).
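The idea of assembling the whole frame in ordinary memory and then copying it out in one pass can be sketched without any SDL calls. The `Buffer` type and `copyFrame` function below are hypothetical stand-ins for this illustration only; in particular, the pitch here is counted in pixels, whereas SDL surfaces report their pitch in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a 16-bit surface. "pitch" is the distance between
// the starts of two consecutive rows, in pixels; it may exceed the width.
struct Buffer {
    int width;
    int height;
    int pitch;
    std::vector<std::uint16_t> pixels;

    Buffer(int w, int h, int rowPitch)
        : width(w), height(h), pitch(rowPitch),
          pixels(static_cast<std::size_t>(rowPitch) * h, 0) {
        assert(w > 0 && h > 0 && rowPitch >= w);
    }

    // Pointer to the first pixel of row y.
    std::uint16_t* row(int y) {
        return pixels.data() + static_cast<std::size_t>(y) * pitch;
    }
};

// Copy a finished scratch frame to the destination one row at a time,
// so that each buffer's own pitch is respected.
inline void copyFrame(Buffer& scratch, Buffer& screen) {
    assert(scratch.width == screen.width && scratch.height == screen.height);
    const std::size_t rowBytes =
        static_cast<std::size_t>(scratch.width) * sizeof(std::uint16_t);
    for (int y = 0; y < scratch.height; ++y) {
        std::memcpy(screen.row(y), scratch.row(y), rowBytes);
    }
}
```

All per-pixel work (rotated map, sprites) would write into `scratch`; the destination is touched only by the sequential row copies. With real SDL surfaces a blit would take the place of `copyFrame`.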


> Hi,
>
>> No, my loop looks (in pseudocode) like this:
>>
>>     for all (x,y) in screen surface
>>         Transform (x,y) to (x2,y2) (map coordinates)
>>         writeToScreenSurface(x, y, getMapPixel(x2, y2))
>>     end for
>
> Well, I hope you don't use the function calls in your real code, because
> then there is at least one other reason for the slowness ;)

Actually, I did so at first, but then I thought "hmm, maybe the function
call overhead makes it slow". But this was not the reason. I commented out
all surface access in the functions and got a whopping 300 fps (with, of
course, no picture rendered at all).

Then I removed the "//"s and back it went to crawling.

But maybe the compiler fooled me with some optimization. I'm not sure.

Florian Schmidt

Sorry, I misunderstood. I think your idea is as fast as it can go (apart
from some hardware implementation that I don't know of).

My guess is that the problem is in the "writeToScreenSurface" function.
One way to check that is to replace your algorithm with:

    for all (x,y) in screen surface
        writeToScreenSurface(x, y, random_value)
    end for

and check the speed you get with only that.

If that's not the problem, then add:

    for all (x,y) in screen surface
        writeToScreenSurface(x, y, getMapPixel( random_coordinates ) )
    end for

and so on, until you find what is slow. I am curious about the problem of
the (lack of) locality of the data: suppose memory running at 100 MHz and
random access invalidating the cache. A 640x480x2 image has 600 KB of data
to be read. A very rough estimate is that reading will cost you around
1/100 of a second per frame. I wonder what you'll get from the tests above
(if you get around to doing them).

Hey, please let us know what you find out; I'm very curious about the
result :)

jan

>     for all (x,y) in screen surface
>         writeToScreenSurface(x, y, random_value)
>     end for
>
> and check the speed you get with only that.

12 fps

> If that's not the problem, then add:
>
>     for all (x,y) in screen surface
>         writeToScreenSurface(x, y, getMapPixel( random_coordinates ) )
>     end for

7 fps

> and so on, until you find what is slow. I am curious about the problem
> of the (lack of) locality of the data: suppose memory running at
> 100 MHz and random access invalidating the cache. A 640x480x2 image has
> 600 KB of data to be read. A very rough estimate is that reading will
> cost you around 1/100 of a second per frame. I wonder what you'll get
> from the tests above (if you get around to doing them).
>
> Hey, please let us know what you find out; I'm very curious about the
> result :)

Hmm, I'm curious about what my results mean.

> Hmm, I'm curious about what my results mean.

Your function to write pixels to the screen takes (for the whole screen):
~ 1/12 s = 0.083 s

With random reads from memory added:
~ 1/7 s = 0.143 s

That's about 60 ms just for the random reads.

Not that this means anything :) but it's a good measure to have in mind
when estimating how long things take.

I did the following test here:

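#include <cstdlib>  // rand(), RAND_MAX

#define SIZE (640*480)
unsigned short data[SIZE];   // source image, read at random positions
unsigned short data2[SIZE];  // destination image, written sequentially

int main()
{
    unsigned long x = 0;
    // 100 passes, each writing every destination pixel once
    for ( int j = 0; j < 100; j ++ )
        for ( int i = 0; i < SIZE; i ++ )
        {
            // uniformly distributed index in [0, SIZE)
            int idx = (int) ( SIZE * (double) rand() / (RAND_MAX + 1.0) );
            x += data[ idx ];
            data2[i] = x & 0xFFFF;
        }
    return 0;
}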

Running this I got:

$ g++ -O2 -o test test.cc && time ./test

real 0m3.930s
user 0m3.870s
sys 0m0.000s

Divide this by 100 (because of the j loop) and you get about 39 ms, which
would be approximately the time to generate one scratch image (I'm working
on an Athlon 1.4 GHz). That is roughly 25 fps, not counting the blit to the
screen and the page flip (or the cost of rewriting data2[i] as a direct
screen write).

The results are compatible with your 7 fps, given the extra processing
involved and the fact that an average CPU is still a bit slower than the
machine I'm working with.

Some other interesting data: if I change the "x += data[ idx ]" above to
"x += data[ i ]", so that the reads are done with high locality of data, I
still get:

real 0m2.492s
user 0m2.470s
sys 0m0.000s

Better, but not that much better.

Sorry, not much help in the end. Please, if you find a better solution,
let us know; there must be a way.

jan

P.S.: Do you have to rotate every frame? Maybe make objects move on the
rotated image for some frames and let the background(?) rotate more
sparingly. Just an idea.
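One possible reason the sequential variant above is only modestly faster is that both variants still call rand() and do the floating-point scaling on every iteration, so that cost is included in both timings. This is an assumption, not something measured in the thread. A sketch that generates the indices beforehand, so that only the memory reads are timed, could look like this (all function names are made up for the example):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Sum source[order[i]] for every i. Only the memory reads are inside the
// loop; the index generation has already happened.
inline std::uint64_t sumInOrder(const std::vector<std::uint16_t>& source,
                                const std::vector<std::uint32_t>& order) {
    assert(order.size() <= source.size());
    std::uint64_t sum = 0;
    for (std::uint32_t idx : order) {
        sum += source[idx];
    }
    return sum;
}

// Indices 0..n-1 in ascending order (high locality).
inline std::vector<std::uint32_t> sequentialOrder(std::size_t n) {
    std::vector<std::uint32_t> order(n);
    std::iota(order.begin(), order.end(), 0u);
    return order;
}

// The same indices, shuffled with a fixed seed (low locality).
inline std::vector<std::uint32_t> shuffledOrder(std::size_t n) {
    std::vector<std::uint32_t> order = sequentialOrder(n);
    std::mt19937 rng(12345);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Time one pass in milliseconds and hand back the checksum so the
// compiler cannot drop the loop.
inline double timePassMs(const std::vector<std::uint16_t>& source,
                         const std::vector<std::uint32_t>& order,
                         std::uint64_t& sumOut) {
    const auto start = std::chrono::steady_clock::now();
    sumOut = sumInOrder(source, order);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `timePassMs` once with `sequentialOrder(640 * 480)` and once with `shuffledOrder(640 * 480)` gives two timings that differ only in access pattern. No numbers are claimed here: a 614400-byte image fits in the cache of many current CPUs, so the gap may be much smaller than on the machines discussed in this thread.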

Jan Pfeifer wrote:

> real 0m3.930s
> user 0m3.870s
> sys 0m0.000s

These are my results on an XP2100 with DDR333:

real 0m2.542s
user 0m2.540s
sys 0m0.000s

This example is nice because it shows how important the cache is in
modern PCs, and how fast the 3D hardware we use is.

About the fact that there were rotozoomers working on 486s: keep in mind
that in that case you used 320x200 resolution and 8-bit depth, which means
random access over 64000 bytes versus 614400 bytes here. And direct VGA
buffer access is not ten times faster than it was 6 or 7 years ago.

To make a high-res rotozoom game with a decent frame rate you definitely
need to use 3D acceleration, IMHO.

Bye,
Gabry

BTW, I also remember a rotozoom game on the A500 (7 MHz 68000). That one
used the great Amiga blitter in a clever way to do rotations with zero CPU
usage; obviously the rotations were not very precise :) Anyway, modern VGA
blitters are not that flexible when it comes to 2D :(


> BTW, I also remember a rotozoom game on the A500 (7 MHz 68000). That one
> used the great Amiga blitter in a clever way to do rotations with zero
> CPU usage; obviously the rotations were not very precise :) Anyway,
> modern VGA blitters are not that flexible when it comes to 2D :(

I don't know if this used the "shear" trick, but it is possible to
achieve a pretty accurate rotation effect by performing two shears
on the source image. This may be easier to optimize with respect to
the cache. You'll have to google for the exact algorithm, because I
don't have it handy.

Matthijs Hollemans
All Your Software
www.allyoursoftware.com

> I don't know if this used the "shear" trick, but it is possible to
> achieve a pretty accurate rotation effect by performing two shears
> on the source image. This may be easier to optimize with respect to
> the cache. You'll have to google for the exact algorithm, because I
> don't have it handy.

What is a shear? Sorry, English is not my native language.

I suppose an 8-bit pixel depth should be sufficient, but then I will have
to bother with color lookup tables, and I thought those times were over (I
remember programming VGA cards directly and the hassles with color lookup).
But I suppose it will make the rotation stuff more efficient.

Well, thanks, all you guys. I have some brain food now thanks to your
input. I will do some testing when I get home.

Florian Schmidt

A simple way to explain it: if you shear a "|" you get a "/", or a "\" if
you shear to the other side :) Note that the height stays the same; it is
just slid to the side.

You just slide each line (or each column, for a vertical shear) sideways
by an amount proportional to its position. Did you get it?
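To make the shear idea concrete, here is a small sketch on single points rather than whole images. The thread speaks of two shears; the decomposition sketched here is the commonly described three-shear one (horizontal, vertical, horizontal), so treating it as "the shear trick" is an assumption. `Point`, `shearX`, `shearY` and `rotateByShears` are names invented for this example.

```cpp
#include <cassert>
#include <cmath>

struct Point {
    double x;
    double y;
};

// Horizontal shear: x slides in proportion to y, y is unchanged.
// This is what turns a "|" into a "/" or a "\".
inline Point shearX(Point p, double k) {
    return {p.x + k * p.y, p.y};
}

// Vertical shear: y slides in proportion to x, x is unchanged.
inline Point shearY(Point p, double k) {
    return {p.x, p.y + k * p.x};
}

// Rotate about the origin by composing three shears:
//   shearX(-tan(angle/2)), then shearY(sin(angle)), then shearX(-tan(angle/2)).
// Multiplying the three shear matrices gives
//   [ cos  -sin ]
//   [ sin   cos ]
inline Point rotateByShears(Point p, double radians) {
    // tan(angle/2) is undefined at 180 degrees; handle such angles separately.
    assert(std::fabs(std::cos(radians / 2.0)) > 1e-9);
    const double a = -std::tan(radians / 2.0);
    const double b = std::sin(radians);
    return shearX(shearY(shearX(p, a), b), a);
}
```

Applied to an image, each shear only slides whole rows or whole columns, so every pass reads and writes memory in order. That may be what makes the approach friendlier to the cache, at the price of three passes and some rounding error per pass.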

> I just remembered something. Don't you end up with a
> lot of pixel-sized gaps in your final image after the
> rotation? I remember you have to apply some filter after
> you have blitted the rotation, because of the conversion
> from floating point to raster. Or maybe I didn't use the
> right algorithm, can't remember.

The gaps occur when you do the transformation in the wrong order. If
you loop through the source image and "forward" transform all the
pixels, then you will miss some of the pixels in the destination
image because they don't map 1-to-1. So instead you loop through the
destination image, and "backward" transform all the pixels. In other
words, for each destination pixel you determine the corresponding
source pixel. No gaps.

Matthijs Hollemans
All Your Software
www.allyoursoftware.com
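The backward transform described above can be sketched as follows. It also shows one trick that old software rotozoomers are commonly described as using: the source coordinate is kept in fixed point and advanced by a constant step per destination pixel, so the inner loop contains no multiplications for the transform and no function calls. Whether the demos mentioned earlier in the thread worked exactly like this is an assumption. `Image`, `toFixed` and `rotateInto` are hypothetical names, the pixels are 16-bit, and the sampling is nearest-neighbour.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Image {
    int width;
    int height;
    std::vector<std::uint16_t> pixels;  // row-major, width * height entries
    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// Convert to 16.16 fixed point (1.0 is stored as 65536).
inline std::int32_t toFixed(double v) {
    return static_cast<std::int32_t>(std::lround(v * 65536.0));
}

// For every destination pixel, find the matching source pixel (backward
// mapping) for a rotation about the image centres. Pixels that map outside
// the source get the colour "outside". The sign convention of the angle is
// an arbitrary choice of this sketch.
inline void rotateInto(const Image& src, Image& dst, double radians,
                       std::uint16_t outside = 0) {
    assert(src.width > 0 && src.height > 0 && dst.width > 0 && dst.height > 0);
    // Keeps every coordinate comfortably inside the 32-bit fixed-point range.
    assert(src.width <= 8192 && src.height <= 8192 &&
           dst.width <= 8192 && dst.height <= 8192);

    const double c = std::cos(radians);
    const double s = std::sin(radians);
    // Centre of destination pixel (0,0), relative to the destination centre.
    const double dx0 = 0.5 - dst.width / 2.0;
    const double dy0 = 0.5 - dst.height / 2.0;

    // Source coordinate for destination pixel (0,0).
    std::int32_t fxRow = toFixed(c * dx0 + s * dy0 + src.width / 2.0);
    std::int32_t fyRow = toFixed(-s * dx0 + c * dy0 + src.height / 2.0);
    // Constant source steps per destination column and per destination row.
    const std::int32_t fxPerCol = toFixed(c), fyPerCol = toFixed(-s);
    const std::int32_t fxPerRow = toFixed(s), fyPerRow = toFixed(c);

    for (int y = 0; y < dst.height; ++y) {
        std::int32_t fx = fxRow, fy = fyRow;
        std::uint16_t* out = &dst.pixels[static_cast<std::size_t>(y) * dst.width];
        for (int x = 0; x < dst.width; ++x) {
            // '>>' on a negative value rounds down on mainstream compilers
            // (guaranteed since C++20), so negative coordinates stay negative.
            const int sx = fx >> 16;
            const int sy = fy >> 16;
            const bool inside =
                sx >= 0 && sx < src.width && sy >= 0 && sy < src.height;
            out[x] = inside
                ? src.pixels[static_cast<std::size_t>(sy) * src.width + sx]
                : outside;
            fx += fxPerCol;
            fy += fyPerCol;
        }
        fxRow += fxPerRow;
        fyRow += fyPerRow;
    }
}
```

Because every destination pixel is written exactly once, no gaps can appear. The reads from the source still jump around in memory for most angles, so this removes arithmetic and call overhead but not the cache problem discussed earlier in the thread.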

> BTW, I also remember a rotozoom game on the A500 (7 MHz 68000). That one
> used the great Amiga blitter in a clever way to do rotations with zero
> CPU usage; obviously the rotations were not very precise :) Anyway,
> modern VGA blitters are not that flexible when it comes to 2D :(

Seek 'n Destroy? ;)