Sensible optimization [was re: tile based _junk_]

Darrell Johnson wrote:

Frank Ramsay wrote:

[snip]

I have to disagree with you about this. By going ‘wild optimizing’ a
tile based engine you increase the number of free CPU cycles for the
program to do other things, be that better FX or better enemy unit
AI.

The point was that this kind of optimization (reusing the unchanged
parts of the screen) is a compromise, cutting versatility in return for
a very small (in a scrolling engine) performance gain. You can’t use
the extra cycles for better FX, because that would invalidate more of
the screen and you’d lose them again.

That depends upon the kind of game you are writing. For instance, in a
strategy game like Warcraft or Civ, re-using unchanged parts of the
display makes sense, because I would render the game in layers and
matte the layers together on the back buffer.
Yes, the entire screen will have to be refreshed, but there is often
no need to re-render the entire screen when you can do linear memory
access to rebuild the unchanged portions.

[snip, my little rant about resource usage and various systems]

We’re talking about a simple scroller! Any Pentium system, and most
486ers, will do fine while regenerating the whole screen every frame.

Scrollers are not the only tile based games out there; most strategy games
are tile based as well. The exact game that (damn, can’t find the post) is
discussing may be a ‘simple scroller’, but that doesn’t mean the techniques
could not be applied to a StarCraft clone that requires 20 fps regardless of the
work being done; then you really need an optimized engine.
Now for an example. When I started writing (actually porting from DOS) the
isometric engine I’m working on, at a resolution of 320x200 it got around
12 fps on a P-150MMX w/32Meg. This did full screen refreshes for every frame.

[snip-some comments on computers]

Writing
sloppy code that needs a huge amount of resources simply because your
computer has those resources leads to code bloat. Witness Lotus Notes,
a 200 Meg e-mail program (yes, it does more than e-mail, but that’s all 80%
of people use it for).

I never said to write sloppy code. If you had bothered to read the

Fair enough, sloppy was a poor choice of words; perhaps ‘less efficient’ would
have been better.

whole text of my message instead of just the paragraph you quoted, you’d
see that I said to concentrate your efforts on fundamental and necessary
improvements, not complex and superfluous hacks.

Superfluous? That would depend upon your needs.
Hacks? Well, I didn’t see anything in the thread I would call a hack.
His design seemed rather well planned and thought out, not a quick fix
to get speed.

[snip, a whole bunch of really good advice]

-think versatility: don’t start off by optimizing around your intent to
allow only horizontal scrolling, or assuming that you won’t ever want
sprites to overlap, or you won’t want to cover the entire surface with
sprites or animate tiles; in general this means regenerating the entire
screen with every frame
[snip - the rest of the above mentioned advice]

I wanted to comment on this one item. Don’t try to make your code
so versatile that it takes 5 times (I just picked a number at random) as
long to draw as you need. Yes, plan your engine so that you can expand
it, but don’t make it so flexible that a simple task takes 5 times longer
than is needed. Keep your intent in mind, and when optimizing ask yourself:
does this engine really need to handle animated tiles? If you want animated
water, do you really need 2 tiles, or can you use palette rolling with one tile
and then not have to refresh the screen at all?
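(For the curious: palette rolling here means rotating a few hardware palette entries each tick, so the water appears to flow without a single pixel being redrawn. Below is a minimal sketch in plain C; the entry numbers and names are invented, and under SDL 1.2 you would push the result to the hardware with SDL_SetColors.)

```c
#include <string.h>

#define WATER_FIRST 224   /* hypothetical: first palette entry used by water */
#define WATER_COUNT 8     /* hypothetical: number of water entries */

/* One palette entry; SDL 1.2's SDL_Color plays this role. */
typedef struct { unsigned char r, g, b; } Color;

/* Rotate the water entries by one slot.  The water tile itself is never
   touched and the screen is never redrawn; only the palette changes. */
void roll_water_palette(Color pal[256])
{
    Color first = pal[WATER_FIRST];
    memmove(&pal[WATER_FIRST], &pal[WATER_FIRST + 1],
            (WATER_COUNT - 1) * sizeof(Color));
    pal[WATER_FIRST + WATER_COUNT - 1] = first;
    /* on real hardware, roughly: SDL_SetColors(screen, &pal[WATER_FIRST],
       WATER_FIRST, WATER_COUNT); */
}
```

Call it once per animation tick; the cost is moving a couple dozen bytes, versus re-blitting every water tile on screen.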

		-fjr-- 

Frank J. Ramsay - Software Engineer, Epsilon Data Management
fjr at epsilon.com
framsay at epsilon.com

Genetic Engineering: (noun) Proactive Evolution.

Frank Ramsay wrote:

Darrell Johnson wrote:

Honestly guys, I don’t think there’s any need to go wild optimizing a
basic tile-based engine. Today’s computers are awfully fast, and you

I have to disagree with you about this. By going ‘wild optimizing’ a
tile based engine you increase the number of free CPU cycles for the
program to do other things, be that better FX or better enemy unit
AI.

The point was that this kind of optimization (reusing the unchanged
parts of the screen) is a compromise, cutting versatility in return for
a very small (in a scrolling engine) performance gain. You can’t use
the extra cycles for better FX, because that would invalidate more of
the screen and you’d lose them again.

Also, you make the game less of a resource hog if it only needs 50%
of the CPU time to run at full speed. Sure, that eats up the cycles,
eliminating your enhancement, but that is WHY you did the enhancement:
to allow for those additions. Not to mention that by super-optimizing
the engine, you reduce the resources the game requires. If your game
needs a PII-450 w/256Meg of RAM to run, how many people do you think
are going to get a copy? Remember, not everyone has a high end computer;
there are a lot of people with P-166’s (I’d say most home computers
are a P-166 or below). If you can optimize the same game so it runs
on a P-150 w/32Meg you will have a much larger audience pool.

We’re talking about a simple scroller! Any Pentium system, and most
486ers, will do fine while regenerating the whole screen every frame.

While we’re on the subject, the sad truth is that if you’re selling
games, the kind of people that have PII-450s w/256meg are the kind of
people who have money and take their gaming seriously and actually pay
for games - a pretty rare thing among computer owners. However, this is
completely off-topic and I imagine a fair number on this list are making
free stuff anyway.

Writing
sloppy code that needs a huge amount of resources simply because your
computer has those resources leads to code bloat. Witness Lotus Notes,
a 200 Meg e-mail program (yes, it does more than e-mail, but that’s all 80%
of people use it for).

I never said to write sloppy code. If you had bothered to read the
whole text of my message instead of just the paragraph you quoted, you’d
see that I said to concentrate your efforts on fundamental and necessary
improvements, not complex and superfluous hacks.

Here are some tips on making a fundamentally sound 2d graphics engine:
-first choose efficient data structures and algorithms; this is the
front line of optimization, where you can make hundredfold improvements
-keep separate components separate whenever possible, so you can modify
or replace one part without changing others
-try to access memory in a linear fashion whenever possible, cache hits
are your best friends on today’s fast computers with slow memories
-think versatility: don’t start off by optimizing around your intent to
allow only horizontal scrolling, or assuming that you won’t ever want
sprites to overlap, or you won’t want to cover the entire surface with
sprites or animate tiles; in general this means regenerating the entire
screen with every frame
-write the whole thing in C (or C++) first, get it working so you
understand the problems; you might find out that it is fast enough
without any more fiddling, and if you run out of time or motivation
you’ll be glad to have something working instead of some really great
ideas on how to make it go fast
-use a profiler; whatever you think you know, you don’t know which part
of your code is taking all the cycles until you use a profiler
-before writing assembly code, write a C equivalent and examine the
compiler’s assembly output; don’t be too surprised if the compiler does
a better job than you would have, or if it misses out on an obvious
optimization
-keep assembly to a minimum; as a rule, you shouldn’t rewrite anything
in assembly that takes up (by itself!) less than 20% of CPU load; the
way things are going, you might be really glad to be able to recompile
to a new platform with minimal changes
-when you have done everything else, and it’s still too slow, then
consider adding optimizations that exploit the details of your
particular game (examples: one-way scrolling in Super Mario Brothers,
tile-by-tile stepping in many CRPGs, level floors and plumb walls in
Doom)
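As a small illustration of the linear-access tip above, here is how a tile copy might look in plain C (the sizes are invented for the sketch): walking the destination row by row keeps both source and destination accesses sequential, while a column-first loop would touch a new cache line on nearly every pixel.

```c
#include <string.h>

enum { TILE_W = 32, TILE_H = 32, SCREEN_W = 320 };

/* Copy one 8-bit tile into a back buffer, dst pointing at the tile's
   top-left pixel.  Row-by-row memcpy keeps source and destination
   accesses linear, which is what makes the cache happy. */
void draw_tile(unsigned char *dst, const unsigned char *tile)
{
    int y;
    for (y = 0; y < TILE_H; y++)
        memcpy(dst + y * SCREEN_W, tile + y * TILE_W, TILE_W);
}
```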

I hate slow, bloated software as much as the next guy, but there are
sensible optimizations and silly ones.

Cheers,

Darrell Johnson

We’re talking about a simple scroller! Any Pentium system, and most
486ers, will do fine while regenerating the whole screen every frame.

Ha ha ha ha. I’m glad to hear you think 10fps is good. My Pentium II 333 was
running my engine at 40fps regenerating the screen every time; now I optimised
it thanks to the influence of this mailing list (except for you) and it runs
over 150fps. You’re thinking of a 320x240x8bitcolor tile based engine.
Multiply everything by 2 and you have to optimize. Don’t try and conceal your
ignorance with drivel, it’s just patronizing.

-= aaron p. matthews

rival games wrote:

We’re talking about a simple scroller! Any Pentium system, and most
486ers, will do fine while regenerating the whole screen every frame.

Ha ha ha ha. I’m glad to hear you think 10fps is good. My Pentium II 333 was
running my engine at 40fps regenerating the screen every time; now I optimised
it thanks to the influence of this mailing list (except for you) and it runs
over 150fps. You’re thinking of a 320x240x8bitcolor tile based engine.
Multiply everything by 2 and you have to optimize. Don’t try and conceal your
ignorance with drivel, it’s just patronizing.

Come on, we may not agree with him, but please don’t be insulting. I know it
can be hard not to shoot from the hip when you see something you disagree with,
but we don’t want this to degenerate into name calling.

btw, multiply by 4, not 2 :)

		-fjr-- 

Frank J. Ramsay - Software Engineer, Epsilon Data Management
fjr at epsilon.com
framsay at epsilon.com

Genetic Engineering: (noun) Proactive Evolution.

Come on, we may not agree with him, but please don’t be insulting. I know it
can be hard not to shoot from the hip when you see something you disagree with,
but we don’t want this to degenerate into name calling.

btw, multiply by 4, not 2 :)

You’re right, I said what I did on impulse, which is what I’m used to from back in
the BBS days… Oh well, this won’t be the 1st message list I’ve gotten kicked off
of.

-= aaron p. matthews

rival games wrote:

We’re talking about a simple scroller! Any Pentium system, and most
486ers, will do fine while regenerating the whole screen every frame.

Ha ha ha ha. I’m glad to hear you think 10fps is good. My Pentium II 333 was
running my engine at 40fps regenerating the screen every time; now I optimised
it thanks to the influence of this mailing list (except for you) and it runs
over 150fps. You’re thinking of a 320x240x8bitcolor tile based engine.

Ahem, how were you regenerating the screen every time? In one
assembly-coded long linear write to the screen buffer (the kind of
sensibly optimized 2d engine I was talking about), or unoptimized C
copying one tile at a time with plenty of unnecessary (or inappropriate,
such as <32 bit or not on 32 bit boundaries) memory accesses? Copying
from the tiles in memory to the screen is not a lot faster than copying
one area of the screen to another (unless it is using a hardware speedup
in the video card, which you can’t count on).

If you haven’t made a near-optimal full-regeneration version, how can
you claim another strategy is better? The truth is that you are
comparing an unoptimized engine to an optimized one, nothing more. This
strategy of reusing most of the screen is well suited to taking
advantage of one of the few C techniques that is as fast as well coded
assembly: a large memcpy. So I’m guessing you use the efficient
memcpy, then fill in the luckily small holes (due to relatively slow
scrolling and small numbers of sprites) inefficiently with your
unoptimized (or poorly optimized) C routines. This is not so much a
fundamental improvement as an easy means of gaining acceptable
performance in a special case. These kinds of hacks are not suited to
high-performance games and are not good for your development as a
programmer.

Also, your Pentium II 330 will not usually be even twice as fast as a
Pentium 160 for this stuff (maybe 150%, if that). Their memory access
works at closer to the same rate, and you have to code carefully to not
have all the extra cycles eaten up by cache misses. It may seem a lot
faster in a memory-hogging windowing environment or a 3d game, but
that’s because the P2 330 undoubtedly has oodles more memory, a faster
hard drive, and a usable 3d card; none of which are factors here (Quake
without hardware 3d support would also run faster, but computation is
the limiting factor in that case, and Quake was very well written to
take advantage of higher clock rates).

To reiterate, I have never said that optimization is bad. I do,
however, believe that reusing the unchanged parts of the screen is a bad
optimization; it limits you to modifying only small parts of the screen
at a time, so you can’t have animated tiles or hundreds of sprites on
the screen.

As for the fellow who was talking about using this type of optimization
in a Starcraft-type game, think for a second: how much of the screen do
you think you could typically reuse in Starcraft? Maybe a lot, in the
beginning phase, but when you get to the exciting major battles where
you want a high frame rate, the screen is filled with sprites and you
can’t reuse any of it. An inconsistent frame rate is worse than a
plain low frame rate; there’s nothing worse than thinking your machine
is fast enough to run a game at a certain resolution, then having it all
fall apart just when it gets interesting. As for maintaining a reusable
background layer, you are talking about a whole extra full-screen blit
per frame! This is slower than an efficient full regeneration, if you
are using tiles (if you are using some sort of voxel engine or more
complex 3d system which isn’t well suited to hardware acceleration, you
just might find this kind of strategy worthwhile; but I was never
talking about that stuff).

Modern computers are fast. They aren’t so fast that you can forget
about efficiency, they are fast in ways that change the rules, and old
optimization strategies that were brilliant in their day are now useless
or even counter-productive.

I know some of you guys are mad at me; I’ve been pretty blunt in
essentially calling a lot of your work useless garbage, so I’m not
offended. But take a minute and really think about what I’m saying
before you dismiss it out of hand.

BTW, you don’t actually think that you’re really running at 150 fps, do
you? If nothing else, your monitor can’t keep up with that.

Cheers,

Darrell Johnson

Ahem, how were you regenerating the screen every time? In one
assembly-coded long linear write to the screen buffer (the kind of
sensibly optimized 2d engine I was talking about), or unoptimized C

In your first message, you say not to use assembly code because it’s not portable
(which was grossly patronizing, since there’s not one person on this list who
doesn’t know that). And now you’re saying I should use it?

If you haven’t made a near-optimal full-regeneration version, how can
you claim another strategy is better? The truth is that you are
comparing an unoptimized engine to an optimized one, nothing more. This

Like I said, my engine originally regenerated the whole screen. I couldn’t get it
to run at a reasonable framerate. I understand what you say about sacrificing speed for
more cool effects, but not everyone has a fast enough computer for that.

To reiterate, I have never said that optimization is bad. I do,
however, believe that reusing the unchanged parts of the screen is a bad
optimization; it limits you to modifying only small parts of the screen
at a time, so you can’t have animated tiles or hundreds of sprites on
the screen.

If you have an animating tile on the screen, you just keep track of it, and update
it when necessary. And even in StarCraft you rarely ever have 100 sprites on the
screen, and when you do, it’s slow! Even games like Tetris have an inconsistent
frame rate.

BTW, you don’t actually think that you’re really running at 150 fps, do
you? If nothing else, your monitor can’t keep up with that.

See? Very patronizing…

Listen, take a look at my code AND THEN tell me what’s wrong with it. I can tell
you one thing wrong: it redraws the screen every time you move out of a 640x480
screen. There’s a more efficient way to do that, and I know what that is, and I’ll
implement it later. I truly do wish that you were right, I want to be able to
redraw the whole screen every time. It makes things MUCH easier. But the only way
I’ve seen to attain this is to assume the video card has (good) 2D acceleration,
and I have no intention of that. I would like this to run reasonably fast on a
Pentium 133.

My code -= http://www.Nayzak.com/~jerryma/rival/emotion.cpp

-= aaron p. matthews

rival games wrote:

Ahem, how were you regenerating the screen every time? In one
assembly-coded long linear write to the screen buffer (the kind of
sensibly optimized 2d engine I was talking about), or unoptimized C

In your first message, you say not to use assembly code because it’s not portable
(which was grossly patronizing, since there’s not one person on this list who
doesn’t know that). And now you’re saying I should use it?

Don’t twist my words. I said to use assembly sparingly, where you need
it. In a tile-and-sprite game, using assembly to draw your tiles and
sprites is the most (if not the only) logical place.

If you haven’t made a near-optimal full-regeneration version, how can
you claim another strategy is better? The truth is that you are
comparing an unoptimized engine to an optimized one, nothing more. This

Like I said, my engine originally regenerated the whole screen. I couldn’t get it
to run at a reasonable framerate. I understand what you say about sacrificing speed for
more cool effects, but not everyone has a fast enough computer for that.

Yes, like I said, your original unoptimized engine was too slow. You
added an optimization and now it goes faster. This doesn’t imply
anything else; certainly nothing about the inherent superiority of dirty
rectangle methods.

To reiterate, I have never said that optimization is bad. I do,
however, believe that reusing the unchanged parts of the screen is a bad
optimization; it limits you to modifying only small parts of the screen
at a time, so you can’t have animated tiles or hundreds of sprites on
the screen.

If you have an animating tile on the screen, you just keep track of it, and update
it when necessary. And even in StarCraft you rarely ever have 100 sprites on the
screen, and when you do, it’s slow! Even games like Tetris have an inconsistent
frame rate.

If you have a single animating tile, you can treat it like a sprite. If
you want your tiles to be animated, with grass rustling in the wind, or
machinery in the background …

Starcraft slows down when there are lots of units, partly because of the
extra AI it must do, but it is logical that it would slow down somewhat
when there are a great many sprites onscreen. If it covered the entire
screen with sprites, it would have to draw twice as many pixels (or some
similar overhead); however, if it used and needed a dirty rectangle
strategy, it would have to draw dozens of times the normal number of
pixels, and it would go from >30 FPS to <5 FPS.

BTW, I’ve seen a lot of lousy implementations of Tetris. If one has an
inconsistent frame rate, it belongs firmly in that category.

BTW, you don’t actually think that you’re really running at 150 fps, do
you? If nothing else, your monitor can’t keep up with that.

See? Very patronizing…

I don’t care if I seem patronizing. If you’re redrawing the screen
multiple times between refreshes, that’s a flaw in your program, and a
flawed benchmark. Redraw at most once per refresh and report CPU idle
time if you want a real benchmark beyond “fast enough”.

Listen, take a look at my code AND THEN tell me what’s wrong with it. I can tell
you one thing wrong, it redraws the screen everytime you move out of a 640x480
screen, there’s a more efficient way to do that, and I know what that is, and I’ll
implement it later. I truly do wish that you were right, I want to be able to
redraw the whole screen every time. It makes things MUCH easier. But the only way
I’ve seen to attain this is to assume the video card has (good) 2D acceleration,
and I have no intention of that. I would like this to run reasonably fast on a
Pentium 133.

All right, I’ll tell you what’s wrong with it right now: you’re using
SDL_BlitSurface to draw your tiles; this is totally inappropriate
(workable for strategy games, perhaps, but not for action games). If
you think your old engine was bad, try switching it to much smaller
tiles. Locking and writing directly to the buffer would be a good
start; I’d assumed you’d have taken that most basic step already.
Regardless, this fits very nicely with my evaluation of having a
trivially easy slow method for painting your tiles, and a trivially easy
fast one for copying big blocks.

Like I also said, you’re not going to take too much of a performance hit
moving to a Pentium 133. I’d be surprised if even your old, slow engine
went slower than an entirely playable 20 FPS on most P133 systems; an
optimal full-redraw page-flipping system could possibly be limited by
the monitor’s refresh rate.

Cheers,

Darrell Johnson

Yes, like I said, your original unoptimized engine was too slow. You
added an optimization and now it goes faster. This doesn’t imply
anything else; certainly nothing about the inherent superiority of dirty
rectangle methods.

The way my system works, if everything on the screen needs to be updated, the slowest
it will go is as if I were redrawing the whole screen every time. Now how is that
inefficient?

If you have a single animating tile, you can treat it like a sprite. If
you want your tiles to be animated, with grass rustling in the wind, or
machinery in the background …

Then you keep track of them and update them when necessary. It’s pretty simple.
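The bookkeeping being argued about here, tracking which regions need a redraw, is essentially a dirty-rectangle list. A minimal sketch, with invented names and limits:

```c
enum { MAX_DIRTY = 64 };

typedef struct { int x, y, w, h; } Rect;

static Rect dirty[MAX_DIRTY];
static int  num_dirty = 0;

/* Mark a region (an animated tile, a sprite's old and new positions)
   as needing a redraw this frame.  On overflow we stop recording;
   the frame is then redrawn in full, which is the worst case anyway. */
void mark_dirty(int x, int y, int w, int h)
{
    if (num_dirty < MAX_DIRTY) {
        Rect r;
        r.x = x; r.y = y; r.w = w; r.h = h;
        dirty[num_dirty++] = r;
    }
}

/* Nonzero means: give up on the list and redraw the whole screen. */
int dirty_overflowed(void)
{
    return num_dirty >= MAX_DIRTY;
}
```

Each frame the engine redraws only the listed rectangles and then resets num_dirty to zero; the overflow check is what bounds the worst case at a plain full redraw.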

Starcraft slows down when there are lots of units, partly because of the
extra AI it must do, but it is logical that it would slow down somewhat
when there are a great many sprites onscreen. If it covered the entire
screen with sprites, it would have to draw twice as many pixels (or some
similar overhead); however, if it used and needed a dirty rectangle
strategy, it would have to draw dozens of times the normal number of
pixels, and it would go from >30 FPS to <5 FPS.

Not if they use a system like mine.

BTW, I’ve seen a lot of lousy implementations of Tetris. If one has an
inconsistent frame rate, it belongs firmly in that category.

I’m sure those implementations redraw the whole screen every time too.

I don’t care if I seem patronizing. If you’re redrawing the screen
multiple times between refreshes, that’s a flaw in your program, and a
flawed benchmark. Redraw at most once per refresh and report CPU idle
time if you want a real benchmark beyond “fast enough”.

If a program runs fast enough to update the “screen” more than 18.2 times a second,
then it’s “fast enough.” Even Quake doesn’t wait for retrace.

All right, I’ll tell you what’s wrong with it right now: you’re using
SDL_BlitSurface to draw your tiles; this is totally inappropriate
(workable for strategy games, perhaps, but not for action games). If
you think your old engine was bad, try switching it to much smaller
tiles. Locking and writing directly to the buffer would be a good
start; I’d assumed you’d have taken that most basic step already.
Regardless, this fits very nicely with my evaluation of having a
trivially easy slow method for painting your tiles, and a trivially easy
fast one for copying big blocks.

Not use SDL_BlitSurface? What portable way do you have in mind?

Like I also said, you’re not going to take too much of a performance hit
moving to a Pentium 133. I’d be surprised if even your old, slow engine
went slower than an entirely playable 20 FPS on most P133 systems; an
optimal full-redraw page-flipping system could possibly be limited by
the monitor’s refresh rate.

If you consider 20 fps an “entirely playable” framerate, you really do come from
programming databases. Even Arjan Brusee (made some Jazz Jackrabbit 2 game) says
redrawing the whole screen every frame is too slow. But I’m sure you’ve made plenty of
blockbuster platform games too.

-= aaron p. matthews

Darrell Johnson wrote:

[snip, a whole bunch of stuff about inline assembly and linear memory access]

Just a quick comment about using assembly: I’m working on an x86 CPU, and if I use
assembly then someone on a PPC CPU can’t use my code. The portability
is a huge part of the reason I’ve chosen SDL/C to code in.

As for the fellow who was talking about using this type of optimization
in a Starcraft-type game, think for a second: how much of the screen do
you think you could typically reuse in Starcraft? Maybe a lot, in the
beginning phase, but when you get to the exciting major battles where
you want a high frame rate, the screen is filled with sprites and you
can’t reuse any of it. An inconsistent frame rate is worse than a
plain low frame rate; there’s nothing worse than thinking your machine
is fast enough to run a game at a certain resolution, then having it all
fall apart just when it gets interesting.

I didn’t say that a game should run at the fastest possible frame rate; you
put the screen refresh on a timer so it happens at 18(ish) fps on any speed
computer and use the other CPU cycles to do the business of running the game.
And you plan for a slow computer.

As for maintaining a reusable
background layer, you are talking about a whole extra full-screen blit
per frame! This is slower than an efficient full regeneration,

Full-screen blit? (OK, I want to make sure we are on the same page here. I use the
term blit to refer to putting data on the actual video surface,
not into a buffer.)
You seem to assume that I’d copy the sprites directly to the screen (or to a HW buffer).
I don’t; it’s too many RAM->Video memory accesses. You stated (I believe) quite correctly that
video->video copying is faster than RAM->Video. Well, RAM->RAM is faster still. Build your
entire display in RAM and do a single loop of memcpy’s to put it into video memory.

btw, the groundlayer->buffer move is:

memcpy(backBuffer,groundBuffer,_bufferSize);

I don’t think you can write a full screen regeneration that is faster than that.
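Spelled out a little further, the per-frame scheme described here might look like the following sketch in plain C (the buffer names and the 320x200 size are illustrative, not the actual engine code):

```c
#include <string.h>

enum { FRAME_BYTES = 320 * 200 };   /* 8-bit pixels at 320x200 */

/* One frame of the layering scheme: restore the unchanged ground layer
   with a single linear memcpy, matte sprites and animated tiles on top
   in system RAM, then push the finished frame to video memory in one
   more linear pass. */
void compose_frame(unsigned char *backBuffer,
                   const unsigned char *groundBuffer,
                   unsigned char *videoMemory)
{
    memcpy(backBuffer, groundBuffer, FRAME_BYTES);
    /* ... draw sprites/animated tiles into backBuffer here ... */
    memcpy(videoMemory, backBuffer, FRAME_BYTES);
}
```

Everything except the final copy happens in system RAM, and both big copies are purely linear.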

if you
are using tiles (if you are using some sort of voxel engine or more
complex 3d system which isn’t well suited to hardware acceleration, you
just might find this kind of strategy worthwhile; but I was never
talking about that stuff).

Nope, isometric tiles is what I’m talking about; at least we’re on the same
page. I would have hated to think this debate came to nothing because we
were talking about different things.

		-fjr-- 

Frank J. Ramsay, Software Engineer - Epsilon Data Management
fjr at epsilon.com
framsay at epsilon.com

Genetic Engineering: (noun) Proactive Evolution.

rival games wrote:

If you consider 20 fps an “entirely playable” framerate, you really do come from
programming databases.

It certainly makes more sense saying 20 fps is a playable framerate than declaring an engine
to be able to run at anything >30 fps, without mentioning that >30 framerates are
unnoticeable by the human eye (like most game companies do these days…). In fact even 15fps
is good enough if it is a consistent framerate. Most videos are encoded at 15fps and are
smooth enough.

Vasek

It certainly makes more sense saying 20 fps is a playable framerate than
declaring an engine to be able to run at anything >30 fps, without mentioning
that >30 framerates are unnoticeable by the human eye (like most game
companies do these days…). In fact even 15fps is good enough if it is a
consistent framerate. Most videos are encoded at 15fps and are smooth enough.
Vasek

Well, movies run at 24 frames per second or so and you can’t even tell the
difference between them and real life, so I would say that 20 is entirely
playable. I guess I come from programming databases then, huh. Well, I guess
George Lucas, Steven Spielberg, James Cameron and the like all come from
database work too.

As much as I didn’t want to get drawn into this debate :) I have to say I
think this is wrong. Games running at the full refresh rate of the display
generally do look a lot smoother. The old
"but-the-human-eye-only-needs-12-frames-per-second" argument just isn’t
true.

Ideally, you want to have the display redraw rate decoupled from your game
logic. The game logic should try and run at constant speed on any machine,
but the display should increase its rate to go as fast as the processor
power will allow.
Of course you set a limit at the video hardware refresh rate - no point
drawing frames the user will never see.
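A sketch of that decoupling in plain C (the 20 Hz tick rate and the names are arbitrary; the timestamp would come from something like SDL_GetTicks): the logic catches up in fixed steps while the caller renders as often as the machine allows.

```c
enum { TICK_MS = 50 };   /* fixed logic rate: 20 updates per second */

/* How many fixed-rate logic ticks are due at time now_ms.  The caller
   runs its game-logic update that many times, then renders once; the
   render rate is thereby independent of the logic rate. */
int ticks_due(unsigned long now_ms, unsigned long *last_tick_ms)
{
    int n = 0;
    while (now_ms - *last_tick_ms >= TICK_MS) {
        *last_tick_ms += TICK_MS;
        n++;
    }
    return n;
}
```

On a slow machine several logic ticks run per drawn frame; on a fast one most passes return 0 and the time goes into extra rendered frames.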

We had the NVidia games-company-liaison guy for the UK visit us a while back,
and he recommended that you should be aiming at the high end, 100Hz (!), as
that is the sort of refresh rate the new generation of video cards (e.g. TNT3)
support. Personally, I think that 100Hz is somewhat overkill (although he
claimed he was running his desktop at that rate), but on the other hand I
think that limiting your games to 24 fps is selling yourself way short.

Sorry - nothing about SDL in this post :(

Ben.

--
Ben Campbell (Antipodean Straggler)
Programmer, CyberLife Technology Ltd
ben.campbell at cyberlife.co.uk

Frank Ramsay wrote:

Darrell Johnson wrote:

[snip, a whole bunch of stuff about inline assembly and linear memory access]

Just a quick comment about using assembly: I’m working on an x86 CPU, and if I use
assembly then someone on a PPC CPU can’t use my code. The portability
is a huge part of the reason I’ve chosen SDL/C to code in.

A few little snippets can make a big difference. Certainly you should
write a portable C version first, but if you optimize one platform at a
time with maybe a couple hundred lines of assembly, you can get big
performance boosts for a small price.

As for the fellow who was talking about using this type of optimization
in a Starcraft-type game, think for a second: how much of the screen do
you think you could typically reuse in Starcraft? Maybe a lot, in the
beginning phase, but when you get to the exciting major battles where
you want a high frame rate, the screen is filled with sprites and you
can’t reuse any of it. An inconsistent frame rate is worse than a
plain low frame rate; there’s nothing worse than thinking your machine
is fast enough to run a game at a certain resolution, then having it all
fall apart just when it gets interesting.

I didn’t say that a game should run at the fastest possible frame rate; you
put the screen refresh on a timer so it happens at 18(ish) fps on any speed
computer and use the other CPU cycles to do the business of running the game.
And you plan for a slow computer.

Please bear in mind that I wasn’t just responding to you. I believe
that only 3d games look better at higher framerates, where they generate
sort of a pseudo-motion-blur effect as your eye superimposes multiple
frames. If 24 FPS is fast enough for a movie, it’s fast enough for a
realtime strategy game (I think 18 FPS might be a bit slow). 2d games
use pre-drawn animation, which is never animated at 60 FPS, so generally
at high frame rates you’re just drawing the same screen multiple times.

Nonetheless, wouldn’t it be annoying to find that your computer couldn’t
keep up to that 18(ish) fps when the screen filled up, even though it
worked okay before that? Dirty rectangle strategies can give you kind
of a false sense of security, setting you up for a rude shock when you
see the performance you end up with from scenes where most of the screen
needs updating. It’s best to make your updating as fast as possible,
then consider these kinds of strategies. In some cases (especially
older computers), you will get a performance gain from dirty rectangles,
but the tile and sprite drawing are more fundamental operations, and
they should be optimized first, lest your higher-level optimizations
hide their crustitude.

As for maintaining a reusable
background layer, you are talking about a whole extra full-screen blit
per frame! This is slower than an efficient full regeneration,

full-screen blit? (OK, I want to make sure we are on the same page here; I use the
term blit to refer to putting data on the actual video surface,
not into a buffer.)
You seem to assume that I’d copy the sprites directly to the screen (or to a HW buffer).
I don’t; it’s too many RAM->Video memory accesses. You stated (I believe) quite correctly that
Video->Video copying is faster than RAM->Video. Well, RAM->RAM is faster still. Build your
entire display in RAM and do a single loop of memcpys to put it into video memory.
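The "single loop of memcpys" scheme might look like the following sketch: compose the whole frame in a system-RAM back buffer, then push it to the video surface one row at a time. The function and parameter names are illustrative; a real video surface often has a pitch wider than the visible width, which is why a per-row copy is used instead of one big memcpy.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Copy a fully composed system-RAM frame into a (possibly pitched)
   video buffer.  One memcpy per row, strictly sequential writes. */
void present_frame(uint8_t *video, int video_pitch,
                   const uint8_t *backbuf, int width, int height)
{
    int y;
    for (y = 0; y < height; y++)
        memcpy(video + y * video_pitch, backbuf + y * width, width);
}
```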

Blit is a phonetic spelling of BLT, short for BLock Transfer: a
memcpy, in C terms. I did not assume anything about whether you were
using system or video memory. BTW, I’m not entirely sure that RAM->RAM
is still significantly faster than RAM->Video, and there are many cases
in which Video->Video is not any faster than RAM->Video.

The problem with block transfers is that with modern systems you can
perform several cycles of computation between each memory access.

A memcpy (on 80x86) basically compiles to a “REP MOVSD” plus setup code,
which blits as fast as is possible. The problem with this is that it
wastes all those extra cycles you could use. While it is as fast as
possible, an assembly-coded copying loop is just as fast, with cycles to
spare (whether C loops can keep up is entirely up to the quality of the
optimizing compiler). This wasn’t true with older computers, on which
“REP MOVSx” worked faster.

You can use those extra cycles to intelligently switch between copying
sources and theoretically draw the screen as quickly as you can copy it
(though this will, as I said, cause some cache misses, the problem
shouldn’t be too severe, especially if you don’t have a great many
different tiles).

btw, the groundlayer->buffer move is:

memcpy(backBuffer,groundBuffer,_bufferSize);

Wouldn’t it more typically be two of these, adding up to the
groundbuffer size? (don’t get me wrong, there wouldn’t be any
significant performance difference, the setup overhead is light, I’m
just nitpicking) Or do you not let the groundbuffer get split?

I don’t think you can write a full screen regeneration that is faster than that.

Certainly not faster, but nearly as fast (the difference being cache
misses from switching from one tile to another). Where the full
regeneration could be faster is that you can combine the sprite drawing
step with it. I’m assuming the backbuffer is a system RAM
double-buffer, which is then blitted into the video ram during the
retrace. If you’re not writing a bit here and a bit there, you can
efficiently write to a true page-flipper and drop the double-buffer blit
(if it’s available, which it should usually be).

The really great thing about a linear memory access renderer is that in
many cases you’ll be fast enough to do your rendering directly onto the
front buffer during the retrace (heresy! another case where fast
computers change the rules); of course, you have to test to be sure,
unless you want flickering. After all, if there’s more than enough
bandwidth to do a back-buffer blit to the screen each retrace and CPU
cycles to spare, why not render on the fly?

I have to admit, though, that I was thinking of something a little
different. There is not always an extra blit added by your method, but
it still holds true in some cases.

if you
are using tiles (if you are using some sort of voxel engine or more
complex 3d system which isn’t well suited to hardware acceleration, you
just might find this kind of strategy worthwhile; but I was never
talking about that stuff).

Nope, isometric tiles are what I’m talking about; at least we’re on the same
page. I would have hated to think this debate came to nothing because we
were talking about different things.

A custom linear renderer is especially beneficial for non-square tiles.
Remember that writing one or two bytes is as slow as writing four
word-aligned bytes (on a 32 bit computer), so if you aren’t writing
pixels in aligned words, you’re slowing your writes down to a half or
quarter speed.
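The aligned-word point above can be made concrete: pack four 8-bit pixels into one 32-bit word and store words, instead of issuing four byte writes. This is a minimal sketch under stated assumptions: the width is a multiple of 4, the destination is word-aligned, and a little-endian packing order is used.

```c
#include <stdint.h>
#include <assert.h>

/* Write a row of 8-bit pixels as packed 32-bit stores.  On a 32-bit
   bus, each dst[i] store moves four pixels in the time a single
   unaligned byte write would take. */
void fill_row_packed(uint32_t *dst, const uint8_t *src, int width)
{
    int i;
    for (i = 0; i < width / 4; i++) {
        dst[i] = (uint32_t)src[4*i]
               | (uint32_t)src[4*i + 1] << 8
               | (uint32_t)src[4*i + 2] << 16
               | (uint32_t)src[4*i + 3] << 24;
    }
}
```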

I didn’t originally set out to prove the utter uselessness of background
caching and dirty rectangle methods, and I still don’t mean to.
However, I do want to make it clear that they do not /necessarily/ mean
a performance gain, and they severely limit the capabilities of the
engine (remember that with full-redraw, the background could just as
easily be made up entirely of animated tiles, and would take a minimal
performance hit from having huge numbers of sprites). Often it’s easier
to succumb to the temptation to write a clever and interesting
high-level optimization than it is to grind through the relative tedium
of cutting the fat away from basic operations. The very fastest engines
use both, but you can guess which one I would say comes first.

Linearizing your renderer is an unsexy process that involves computer
science fundamentals and hard concentration, not just slapping together
a few clever recipes. You have to sort your sprites efficiently and
maintain an ordered list that is accurate as you move through each
scanline, keep track of which tile you’re in and which sprites you’re
over, all with the utmost concern for efficiency to keep the overhead
down to a few extra cycles per pixel. It’s pretty trivial compared to
efficient 3d, of course, but much more than I see most doing. There are
endless thousands of library jockeys and cut-and-pasters who can make a
game run by gluing this bit to that, but very few true optimizers who
can get their algorithms and data structures right and code them
properly to squeeze the best possible performance out of their target
platforms. It’s a matter of practice; too many programmers are
impressed with their own ability to get things to work, and never
develop their ability to make really efficient code. They won’t develop
this ability with quick fixes and clever tricks.

It’s kind of depressing to see how many 2d games are as slow or slower
than 3d ones, even when the 2d engine is relatively low-res and very
simple. The reason is clear: most good graphics optimizers work in 3d,
and most of the professional 2d graphics programmers aren’t very good,
still working from the outdated notes the good ones made before they
moved to 3d.

Of course, full linear rendering locks you out of using such strategies
as dirty rectangles, and I wouldn’t really call it a totally fundamental
improvement, but optimizing your tile and sprite drawing even as
isolated functions should be tried before higher-level optimizations
(geez, I’m starting to sound like a broken record :-) ).

Cheers,

Darrell Johnson

P.S. I know, I said I’d drop the subject, but it looks like someone here
is interested in what I have to say and it would be rude to not reply.

Darrell Johnson wrote:
[snip]

btw, the groundlayer->buffer move is:

memcpy(backBuffer,groundBuffer,_bufferSize);

Wouldn’t it more typically be two of these, adding up to the
groundbuffer size? (don’t get me wrong, there wouldn’t be any
significant performance difference, the setup overhead is light, I’m
just nitpicking) Or do you not let the groundbuffer get split?

I don’t split the ground buffer, and I update it when needed. When it is
updated I set a flag telling the system to forget the dirty rectangles and
do a full window refresh (referring to the refresh buffer->physical display).
Yes, this is slow for animated tiles, but I would use palette rotation to
solve that.
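Palette rotation, for anyone unfamiliar with the trick, animates (say) water tiles by cycling a range of palette entries instead of redrawing any pixels. A minimal sketch, with palette entries reduced to a single byte for illustration:

```c
#include <assert.h>

/* Rotate `count` palette entries starting at index `first` by one
   position: each entry takes its predecessor's color, and the first
   entry wraps around to the last.  Every on-screen pixel using those
   indices changes color with zero redraw work. */
void rotate_palette(unsigned char *pal, int first, int count)
{
    int i;
    unsigned char last = pal[first + count - 1];
    for (i = count - 1; i > 0; i--)
        pal[first + i] = pal[first + i - 1];
    pal[first] = last;
}
```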

Certainly not faster, but nearly as fast (the difference being cache
misses from switching from one tile to another). Where the full
regeneration could be faster is that you can combine the sprite drawing
step with it. I’m assuming the backbuffer is a system RAM
double-buffer, which is then blitted into the video ram during the
retrace. If you’re not writing a bit here and a bit there, you can
efficiently write to a true page-flipper and drop the double-buffer blit
(if it’s available, which it should usually be).
Yes, the buffer is in system RAM and I write the buffer to the SDL_Surface
then call the updaterec(? sorry don’t have the docs in front of me) function
(which I believe waits for the next refresh to update the display)

The really great thing about a linear memory access renderer is that in
many cases you’ll be fast enough to do your rendering directly onto the
front buffer during the retrace (heresy! another case where fast
computers change the rules); of course, you have to test to be sure,
unless you want flickering. After all, if there’s more than enough
bandwidth to do a back-buffer blit to the screen each retrace and CPU
cycles to spare, why not render on the fly?
[snip]
A custom linear renderer is especially beneficial for non-square tiles.
Remember that writing one or two bytes is as slow as writing four
word-aligned bytes (on a 32 bit computer), so if you aren’t writing
pixels in aligned words, you’re slowing your writes down to a half or
quarter speed.

Ok, now we’re getting down to the important things: how to render your image
(to the buffer) fastest. I’ll admit I’m not familiar (at least not by the
name you use) with linear rendering, so I really can’t comment on its
relative merits. But I am curious how it would deal with almost every
scanline of every tile having transparency information.

		-fjr--

Frank J. Ramsay, Software Engineer - Epsilon Data Management
fjr at epsilon.com
framsay at epsilon.com

Genetic Engineering: (noun) Proactive Evolution.

Well, movies run at 24 frames per second or so and you can’t even tell the difference
between them and real life, so I would say that 20 is entirely playable. I guess I come from
programming databases then, huh? Well, I guess George Lucas, Steven Spielberg, James Cameron
and the like all come from database work too.

See if you can get Quake II (or any game) to run at 24 fps, then tell me how playable it is. I
think you’ll find aiming very difficult.

-= aaron p. matthews

Frank Ramsay wrote:

Darrell Johnson wrote:
[snip]

btw, the groundlayer->buffer move is:

memcpy(backBuffer,groundBuffer,_bufferSize);

Wouldn’t it more typically be two of these, adding up to the
groundbuffer size? (don’t get me wrong, there wouldn’t be any
significant performance difference, the setup overhead is light, I’m
just nitpicking) Or do you not let the groundbuffer get split?

I don’t split the ground buffer, and I update it when needed. When it is
updated I set a flag telling the system to forget the dirty rectangles and
do a full window refresh (referring to the refresh buffer->physical display).
Yes, this is slow for animated tiles, but I would use palette rotation to
solve that.

I was thinking that you would have a single buffer the same size as the
screen, and change the upper-left corner pointer as you scrolled, then
treat it like two partial buffers: one from the upper-left corner pointer
to the end, the other from the start to the upper-left pointer. This way
you could scroll forever without ever doing a full redraw; of course
you’d have to be careful around that edge…
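The wrap-around buffer described above, reduced to vertical scrolling for clarity, copies out as at most two memcpys that together add up to the buffer size. This is only a sketch of the idea; the names are made up, and a real 2d scroller would wrap horizontally as well.

```c
#include <string.h>
#include <stdint.h>
#include <assert.h>

/* The ground layer is a ring of `rows` rows; `top_row` is where the
   visible area currently starts.  Copy the visible frame out in
   (at most) two linear runs: from top_row to the end of the ring,
   then from the start of the ring up to top_row. */
void copy_wrapped(uint8_t *dst, const uint8_t *ring,
                  int rows, int width, int top_row)
{
    int first = rows - top_row;             /* rows before the wrap point */
    memcpy(dst, ring + top_row * width, first * width);
    memcpy(dst + first * width, ring, top_row * width);
}
```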

The really great thing about a linear memory access renderer is that in
many cases you’ll be fast enough to do your rendering directly onto the
front buffer during the retrace(heresy! another case where fast
computers change the rules); of course, you have to test to be sure,
unless you want flickering. After all, if there’s more than enough
bandwidth to do a back-buffer blit to the screen each retrace and CPU
cycles to spare, why not render on the fly?
[snip]
A custom linear renderer is especially beneficial for non-square tiles.
Remember that writing one or two bytes is as slow as writing four
word-aligned bytes (on a 32 bit computer), so if you aren’t writing
pixels in aligned words, you’re slowing your writes down to a half or
quarter speed.

Ok, now we’re getting down to the important things: how to render your image
(to the buffer) fastest. I’ll admit I’m not familiar (at least not by the
name you use) with linear rendering, so I really can’t comment on its
relative merits. But I am curious how it would deal with almost every
scanline of every tile having transparency information.

I made up the term linear rendering on the spot (after calling it
several other things) simply to describe drawing the entire buffer in a
sequential fashion. You start at buffer[0] and go right along, one word
(1,2, or 4 pixels, as the case may be) at a time through the whole
buffer. This takes a little preparation, so source pixels are at hand
when you reach their destination. Like I said, it takes a little
applied computer science to be organized, rather than just drawing
sprites in random order.

It’s dead simple for just drawing tiles. You draw along one row of the
first tile until you hit the end of it; move to the second tile and
continue. It would be worthwhile to just try this one out at first.
Even in C, if you don’t cross word boundaries I bet it’ll be nearly as
quick as a buffer blit. If you don’t have a lot of sprites, you might
just draw them on afterwards.
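The tiles-only case can be sketched in a few lines of C. This is one possible reading of the description above, with made-up names and a toy tile width: walk left to right along one scanline, copying one tile-row at a time, so the destination is written strictly sequentially.

```c
#include <string.h>
#include <stdint.h>
#include <assert.h>

#define TILE_W 4   /* illustrative tile width, in pixels */

/* Render one scanline of a row of tiles.  `map` holds the tile index
   at each column; `row_in_tile` selects which row of pixels within the
   tiles this scanline crosses.  The destination pointer only ever
   moves forward: that is the "linear" part. */
void draw_scanline(uint8_t *dst, const uint8_t *const tiles[],
                   const int *map, int ntiles, int row_in_tile)
{
    int t;
    for (t = 0; t < ntiles; t++) {
        const uint8_t *src = tiles[map[t]] + row_in_tile * TILE_W;
        memcpy(dst + t * TILE_W, src, TILE_W);
    }
}
```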

It gets a little more complicated when you include sprites. Basically,
you either treat sprites with transparencies as a collection of spans of
pixels or just use the pixel transparency test. It’s really not all
that bad, if you’re in a span you write the sprite pixels instead of
tile pixels (this is what I meant about keeping an ordered list of
sprites; you need to know when the next span starts, so you can exit the
loop to switch to drawing the next).

If you’re doing straight spans, you organize your data so you can
quickly calculate (say, with a memory read and an addition or two) the
distance from the end of one span to the start of the next and the
length of a span. You just set your inner loop iterator to the span
length, your pointer to the data in the span, and loop away. Bear in
mind that you have to pack everything into doublewords before writing (if
you want it to be efficient), so you have to do a little dance,
reminiscent of loop unrolling, around the edges of spans.
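A stripped-down sketch of the span representation, leaving out the word-packing dance for clarity: each sprite scanline is stored as (skip, length) runs of opaque pixels, so compositing over an already drawn background line never tests individual pixels. The struct and names are assumptions for illustration.

```c
#include <string.h>
#include <stdint.h>
#include <assert.h>

/* One opaque run on a sprite scanline: skip `skip` transparent pixels,
   then copy `len` pixels from `pix`. */
struct span { int skip; int len; const uint8_t *pix; };

/* Composite a scanline's spans over an already drawn background row.
   Transparent runs are jumped over, never read or tested. */
void draw_spans(uint8_t *dst, const struct span *spans, int nspans)
{
    int i;
    for (i = 0; i < nspans; i++) {
        dst += spans[i].skip;
        memcpy(dst, spans[i].pix, spans[i].len);
        dst += spans[i].len;
    }
}
```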

There are lots of different ways you can do it, and lots of details to
consider. It’s a good day’s coding to do an optimized assembly
version. You’d have to decide what’s best for your engine and make it
work.

Cheers,

Darrell Johnson

Vaclav Slavik wrote:

rival games wrote:

If you consider 20 fps an “entirely playable” framerate, you really do come from
programming databases.

It certainly makes more sense to say 20 fps is a playable framerate than
to declare an engine able to run at anything >30 fps without mentioning
that >30 framerates are unnoticeable by the human eye (like most game
companies do these days…). In fact even 15 fps is good enough if it is a
consistent framerate. Most videos are encoded at 15 fps and are smooth
enough.
Vasek

Well, movies run at 24 frames per second or so and you can’t even tell the difference
between them and real life, so I would say that 20 is entirely playable. I guess I come from
programming databases then, huh? Well, I guess George Lucas, Steven Spielberg, James Cameron
and the like all come from database work too.

I read somewhere that movies can run at a low framerate because a video
camera captures the blurring effect when an object moves fast compared to
the camera speed. In a video game on the computer, the movement is discrete
and so you need to crank up the framerate to compensate for the lack of
blurring.

----- Original Message -----

From: slavik2@czn.cz (Vaclav Slavik)
To:
Sent: Friday, August 20, 1999 12:51 AM
Subject: Re: [SDL] sensible optimization [was re: tile based junk]

[snip]

Movies can run at a low frame rate because the shutter on a physical movie
camera is open for a finite amount of time. This effectively amounts to
applying a box filter in time to your movie, causing motion blur. For
most computer animation, the “camera” has an effectively infinitesimal
shutter speed, and so the temporal aliasing effects are a lot more
pronounced and hence the animation becomes chunky. We can approximate the
motion blur effect by determining how each pixel changes with time in
between frames and convolving that function with a filter of some sort to
get a final pixel value. This process, or one of several approximations
to it (postfiltering and stochastic sampling, to name only a few), is
routinely applied when generating high-quality computer animations,
but I’ve never heard of it being done in real time, because it is clearly
an expensive process.

On Fri, 20 Aug 1999, Prasanth Kumar wrote:


| Rafael R. Sevilla @Rafael_R_Sevilla_94 |
| Instrumentation, Robotics, and Control Laboratory |

College of Engineering, University of the Philippines, Diliman

“Rafael R. Sevilla 94-22131” wrote:

I read somewhere that movies can run at a low framerate because a video
camera captures the blurring effect when an object moves fast compared to
the camera speed. In a video game on the computer, the movement is discrete
and so you need to crank up the framerate to compensate for the lack of
blurring.

[snip]

pronounced and hence the animation becomes chunky. We can approximate the
motion blur effect by determining how each pixel changes with time in
between frames and convolving that function with a filter of some sort to
get a final pixel value. This process, or any one of several
approximations to it (postfiltering and stochastic sampling to name only a
few) are routinely done when generating high-quality computer animations,
but I’ve never heard of it being done in real time, because it is clearly
an expensive process.

Ah, but soon we’ll have Playstation II capabilities (in PC’s as well as
consoles), where the 600FPS rendering speeds for complex scenes can be
used to generate motion blur brute force (and real-time water caustics,
etc).

In fact, some of the next generation of PC 3D boards coming out this
fall will have sub-pixel rendering capabilities that can be used for
this…

Which is bad news for the current crop of game developers (IMHO) - it’s
going to cost a lot of time and money to fill all that cinematic quality
with content that can take advantage of it. I can foresee game
development costs going from the $2-5 million now to $10-30 million in a
couple of years…

On Fri, 20 Aug 1999, Prasanth Kumar wrote:

Gary Scillian @Gary_Scillian
"There’s a seeker born every minute." - Firesign Theatre