Reading and writing Data to/from Surface

Well thanks all you guys. I have some brainfood
now thanks to your input…

If this problem is indeed related to “locality issues” – i.e.
reading from the source image causes cache misses all the time – then
the solution is obviously to improve the locality of the data so it
fits into the cache as much as possible.

You may want to think about using the following approach: You said
you use a map, which probably means you have a bunch of tiles.
Instead of blitting all these tiles on the screen and then rotating
the whole screen, you may want to rotate each tile individually up
front, and then blit the rotated tiles to the screen. Why would this
be faster? Because the data for one tile easily fits into the cache,
whereas the whole screen does not. Of course, this makes your
rendering engine a little more complicated…

--
Matthijs Hollemans
All Your Software
www.allyoursoftware.com
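To make the locality argument concrete, here is a minimal sketch in C. It is not the exact tile scheme described above (pre-rotating map tiles); it swaps in a related idea: the destination is walked in small squares, so the source pixels read for one square stay close together. The function name `rotate_blocked`, the helper `fp_round`, the 16.16 fixed-point format and the block size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round a 16.16 fixed-point value to the nearest integer. */
static int fp_round(int64_t v)
{
    return (int)((v >= 0 ? v + 32768 : v - 32768) / 65536);
}

/* Hypothetical helper: nearest-neighbour rotation about (cx, cy).
 * cos_fp and sin_fp are 16.16 fixed-point. The destination is walked in
 * block x block squares so that the source pixels read for one square
 * stay close together. Pixels with no source are set to 0. */
static void rotate_blocked(const uint32_t *src, uint32_t *dst, int w, int h,
                           int cx, int cy, int32_t cos_fp, int32_t sin_fp,
                           int block)
{
    assert(src != dst && block > 0);
    for (int by = 0; by < h; by += block) {
        for (int bx = 0; bx < w; bx += block) {
            for (int y = by; y < by + block && y < h; y++) {
                for (int x = bx; x < bx + block && x < w; x++) {
                    int64_t dx = x - cx, dy = y - cy;
                    int sx = cx + fp_round(dx * cos_fp + dy * sin_fp);
                    int sy = cy + fp_round(dy * cos_fp - dx * sin_fp);
                    int inside = sx >= 0 && sx < w && sy >= 0 && sy < h;
                    dst[y * w + x] = inside ? src[sy * w + sx] : 0;
                }
            }
        }
    }
}
```

Whether this actually beats a plain scanline loop depends on the cache size and the image; it would have to be measured, as with the tile idea itself.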

From my test (see the code in the other email) there was only a 50% gain with
the locality issue. It’s good, but not that much. And I wonder whether in the
tiles algorithm you may end up having problems with locality on the writing
surface …

I think the gain with the shear trick is actually from hw acceleration. The
shear can be done with a series of blits, right? (one for each line and then
one for each column) Would this work fast?

On Friday 08 November 2002 11:05, Matthijs Hollemans wrote:

> [...] you may want to rotate each tile individually up
> front, and then blit the rotated tiles to the screen. [...]

I could be wrong, but I seem to remember that the shearing trick worked
best for angles between 45 and -45 degrees… So, you would be best
served to do your rotations in 90 degree increments first, then shear…
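A rough sketch of that split in C, assuming whole-degree angles and a 32-bit pixel buffer: `split_angle` and `rotate90_cw` are made-up names, and the residual angle it returns (in the range -45..44) is what would then go to the shear step.

```c
#include <assert.h>
#include <stdint.h>

/* Split an angle in degrees into a number of clockwise quarter turns (0..3)
 * and a residual angle in the range -45..44 degrees. */
static void split_angle(int deg, int *quarter_turns, int *residual)
{
    int d = ((deg % 360) + 360) % 360;   /* normalise to 0..359 */
    int q = (d + 45) / 90;               /* nearest multiple of 90 */
    *quarter_turns = q % 4;
    *residual = d - 90 * q;
}

/* Rotate a w x h image by 90 degrees clockwise (y axis pointing down).
 * dst must hold w * h pixels and ends up h pixels wide and w pixels high. */
static void rotate90_cw(const uint32_t *src, uint32_t *dst, int w, int h)
{
    assert(src != dst);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[x * h + (h - 1 - y)] = src[y * w + x];
}
```

For example, 100 degrees becomes one quarter turn plus 10 degrees, and 350 degrees becomes no quarter turn plus -10 degrees.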

-Loren

On Fri, 2002-11-08 at 04:26, Matthijs Hollemans wrote:

BTW I remember also a rotozoom game on the A500 (7 MHz
68000) that used the great Amiga blitter in a clever way
to do rotations with 0 CPU usage; obviously the rotations did
not have great precision :) Anyway, the blitters on modern VGA cards are
not that flexible when it comes to 2D :(

I don’t know if this used the “shear” trick, but it is possible to
achieve a pretty accurate rotation effect by performing two shears
on the source image. This may be easier to optimize with respect to
the cache. You’ll have to google for the exact algorithm, because I
don’t have it handy.
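For reference, a minimal sketch of shear-based rotation in C. It uses the well-known three-shear variant (shear in x, then y, then x) rather than the two-shear version mentioned above, because three shears give a rotation without scaling. For an angle t the usual factors are tan(t/2) for the two x shears and sin(t) for the y shear, with opposite signs; which one is negative depends on the axis convention. The function names, the 16.16 fixed-point factors and the 32-bit pixel format are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a 16.16 fixed-point value to the nearest integer. */
static int fp_round(int64_t v)
{
    return (int)((v >= 0 ? v + 32768 : v - 32768) / 65536);
}

/* Horizontal shear: row y moves by round(k * (y - h/2)) pixels.
 * Each row is one contiguous copy, i.e. one small blit per line.
 * Pixels not covered by any source pixel are left at 0. */
static void shear_x(const uint32_t *src, uint32_t *dst, int w, int h,
                    int32_t k_fp)
{
    assert(src != dst);
    memset(dst, 0, (size_t)w * (size_t)h * sizeof *dst);
    for (int y = 0; y < h; y++) {
        int s = fp_round((int64_t)k_fp * (y - h / 2));
        int n = w - (s < 0 ? -s : s);        /* pixels that stay visible */
        if (n > 0)
            memcpy(dst + y * w + (s > 0 ? s : 0),
                   src + y * w + (s < 0 ? -s : 0),
                   (size_t)n * sizeof *dst);
    }
}

/* Vertical shear: column x moves by round(k * (x - w/2)) pixels. */
static void shear_y(const uint32_t *src, uint32_t *dst, int w, int h,
                    int32_t k_fp)
{
    assert(src != dst);
    memset(dst, 0, (size_t)w * (size_t)h * sizeof *dst);
    for (int x = 0; x < w; x++) {
        int s = fp_round((int64_t)k_fp * (x - w / 2));
        for (int y = 0; y < h; y++) {
            int ny = y + s;
            if (ny >= 0 && ny < h)
                dst[ny * w + x] = src[y * w + x];
        }
    }
}

/* Shear in x, then y, then x; tmp is a scratch buffer of the same size. */
static void rotate_by_shears(const uint32_t *src, uint32_t *tmp, uint32_t *dst,
                             int w, int h, int32_t kx_fp, int32_t ky_fp)
{
    shear_x(src, dst, w, h, kx_fp);
    shear_y(dst, tmp, w, h, ky_fp);
    shear_x(tmp, dst, w, h, kx_fp);
}
```

The x shear touches memory row by row, which is friendly to the cache; the y shear walks down columns and is the part most likely to hurt.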



SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

OK, so somebody already mentioned a shear trick. If you’re interested in what
it looks like, you can fetch:

http://www.kbs.twi.tudelft.nl/People/Staff/J.C.Wojdel/roto_test.tgz

The program loads an image and pseudo-rotates it using combinations of
horizontal and vertical shears (skews). Just move your mouse around to see
it. On small rotations (only 2 skews) I get around 45 fps on a P3/933. For
rotations up to 90 degrees (5 skews) it drops down to 30 fps.
The program will work only in 32 bpp (see lines 109 and 118, where the
pointers should be cast depending on the pixel depth), because I was too lazy
to do it properly (you should switch on the depth at the beginning of the
function and repeat the loops with different casts; you don’t want to
switch inside the loop).
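The "switch on the depth once, then repeat the loop" idea could look roughly like the sketch below. This is not the code from roto_test.tgz; `shift_column` is a made-up helper that moves one column of pixels up or down (the vertical skew step), and it only handles 1, 2 and 4 bytes per pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Copy column x of src into dst, moved down by `shift` rows (up if negative).
 * pitch is the row length in bytes, bpp the bytes per pixel.
 * Returns 0 on success, -1 for a depth this sketch does not handle. */
static int shift_column(const void *src, void *dst, int pitch, int h,
                        int x, int shift, int bpp)
{
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    int y0 = shift > 0 ? shift : 0;        /* first destination row written */
    int y1 = shift < 0 ? h + shift : h;    /* one past the last row written */

    assert(src != dst);
    switch (bpp) {                         /* decided once, outside the loops */
    case 1:
        for (int y = y0; y < y1; y++)
            d[y * pitch + x] = s[(y - shift) * pitch + x];
        return 0;
    case 2:
        for (int y = y0; y < y1; y++)
            ((uint16_t *)(d + y * pitch))[x] =
                ((const uint16_t *)(s + (y - shift) * pitch))[x];
        return 0;
    case 4:
        for (int y = y0; y < y1; y++)
            ((uint32_t *)(d + y * pitch))[x] =
                ((const uint32_t *)(s + (y - shift) * pitch))[x];
        return 0;
    default:
        return -1;                         /* e.g. packed 24-bit pixels */
    }
}
```

With an SDL surface the `pitch` and bytes-per-pixel values would presumably come from the surface and its pixel format; here they are plain parameters so the sketch stays independent of SDL.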
Is it faster than yours?
Regards,
Jacek

On Fri, Nov 08, 2002 at 09:02:07AM +0100, mista.tapas at gmx.net wrote:

for rotation you will always need to work with 2 surfaces, the original and
the one where the algorithm is writing the rotated image … (otherwise,
during the rotation, the algorithm would overwrite original pixels that are
still needed) So at least a scratch surface is needed … (which could
later be blitted back to the original screen or directly to the framebuffer)

I use two buffers. Sorry if that wasn’t clear. I have the map image in a
surface, and the rotation algorithm reads from this buffer and writes into the
screen surface, which is then updated by SDL_Flip() (if that was the name of
the function that flips the double buffer)…

Someone else mentioned that this is computationally heavy, since the
pixels in the map image get accessed on a pretty much random basis (at least it
seems so to the compiler)… I can understand that point… But there must be
some computational tricks. I remember demos from the 486 and 386 days which
did some amazing full-screen rotating and zooming and blending at 320x200…

I’m pretty much puzzled how they achieved that…
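One trick commonly used for that kind of effect (nobody in this thread has confirmed which demos did what, so take it as a sketch) is to step through the source with fixed-point increments instead of computing a rotation per pixel. The inner loop is then two additions and one lookup. The sketch below assumes 8-bit pixels and a square source texture whose edge is a power of two, so wrapping is a simple mask; `rotozoom`, `tex_bits` and the 16.16 format are names and choices made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fill a w x h destination from a wrapping (1 << tex_bits)-square texture.
 * (u0, v0) is the texture position of the top-left destination pixel and
 * (du, dv) the step per destination pixel to the right, all in 16.16 fixed
 * point. For a rotation by angle t with zoom z the step would be roughly
 * (cos(t) / z, sin(t) / z) scaled by 65536. */
static void rotozoom(const uint8_t *tex, int tex_bits, uint8_t *dst,
                     int w, int h, uint32_t u0, uint32_t v0,
                     int32_t du, int32_t dv)
{
    const uint32_t mask = (1u << tex_bits) - 1u;

    assert(tex != dst && tex_bits > 0 && tex_bits <= 15);
    for (int y = 0; y < h; y++) {
        uint32_t u = u0, v = v0;
        for (int x = 0; x < w; x++) {
            dst[y * w + x] =
                tex[(((v >> 16) & mask) << tex_bits) | ((u >> 16) & mask)];
            u += (uint32_t)du;             /* two additions per pixel */
            v += (uint32_t)dv;
        }
        u0 -= (uint32_t)dv;                /* the row step is the pixel step */
        v0 += (uint32_t)du;                /* turned by 90 degrees */
    }
}
```

This still reads the source in a scattered pattern at steep angles, so the cache concerns raised in the thread apply; a small texture such as 256x256 at 8 bits is only 64 KB, which helps.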


+-------------------------------------+
|from: J.C.Wojdel                     |
|      J.C.Wojdel at cs.tudelft.nl    |
+-------------------------------------+

it’s a cool example, but it’s worth mentioning that it runs with 320x320
(we’ve been considering 640x480x16 bits in the discussion).

=)

jan

btw it would be nice to make it rotate continuously (different angles require
extra shears …) and get some average speed …