Performance tricks

Hi,

where can I find some information about how to make my games more ‘speedy’ in
SDL?
I’m interested in video and input stuff.
(like, converting a surface from xx bpp to the screen surface’s bpp will
increase SDL_BlitSurface speed)

Another question: I see that SDL keyboard input is ‘message’ based, but
I think this is too slow in comparison with DirectInput (for example).
Any improvement in the future?

Thanks in advance.

Another question: I see that SDL keyboard input is ‘message’ based, but
I think this is too slow in comparison with DirectInput (for example).
Any improvement in the future?

It’s fast enough, unless you’re on some really REALLY low-end embedded
device, in which case you shouldn’t be coding to an abstraction layer.

Seriously. It’s good enough for first person shooters, it’s good enough
for anything. The real SDL bottlenecks are almost always video output.

–ryan.

user at domain.invalid wrote:

Hi,

where can I find some information about how to make my games more ‘speedy’ in
SDL?
I’m interested in video and input stuff.
(like, converting a surface from xx bpp to the screen surface’s bpp will
increase SDL_BlitSurface speed)

Another question: I see that SDL keyboard input is ‘message’ based, but
I think this is too slow in comparison with DirectInput (for example).
Any improvement in the future?

Thanks in advance.

you could always use SDL_GetKeyState and SDL_GetModState…

--
-==-
Jon Atkins
http://jcatki.2y.net/

you could always use SDL_GetKeyState and SDL_GetModState…

…which goes through the same mechanism, and having to iterate over each
keystate is going to be slower on top of that.
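
(For reference, polling looks roughly like this in SDL 1.2 - a minimal sketch;
SDL_PumpEvents() still has to run somewhere, e.g. via your normal event loop,
and the key/mod checks below are just placeholders.)

    #include "SDL.h"

    /* Poll the current keyboard state instead of (or alongside) reading
       key events from the queue. */
    void read_input(void)
    {
        Uint8 *keys;
        SDLMod mods;

        SDL_PumpEvents();              /* refresh SDL's internal key state */
        keys = SDL_GetKeyState(NULL);  /* array indexed by SDLKey values */
        mods = SDL_GetModState();

        if (keys[SDLK_LEFT]) {
            /* e.g. keep turning left for as long as the key is held */
        }
        if ((mods & KMOD_SHIFT) && keys[SDLK_SPACE]) {
            /* e.g. alternate fire */
        }
    }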

I come from an era that used "short" instead of "int" when we could get away with it to save stack space, and had to be extremely nervous about "function call overhead", and even I don't think the event subsystem is slow.

In an age of dual CPUs and gigabytes of RAM, are we really going to bust
our balls over this interface? Most big commercial game engines I’ve seen
probably spend four times as long in unnecessary copy constructors as they
do handling input events, and no one considers them inefficient.

–ryan.

I come from an era that used "short" instead of "int" when we could get away with it to save stack space, and had to be extremely nervous about "function call overhead", and even I don't think the event subsystem is slow.

hear hear!

In an age of dual CPUs and gigabytes of RAM, are we really going to bust
our balls over this interface? Most big commercial game engines I’ve seen
probably spend four times as long in unnecessary copy constructors as they
do handling input events, and no one considers them inefficient.

Indeed, we routinely do things that are slower than SDL’s event loop, things
we know are inefficient, because the alternative is extremely ugly code. I
came from 1MHz machines, so I’ve learned to avoid floats at all costs and to
use bytes for anything that does not absolutely need more…

My box has 512 megs, and even the average machine comes with at least 128
megs. My CPU is 800MHz and takes 22 cycles for a 64-bit sqrt instruction.
Even though the average person has a CPU about half that speed, one which
takes far longer to do anything significant with even a 32-bit float (I feel
sorry for you people dumb enough to buy Intel these days!), the fact is
that we no longer live in a world where every byte is precious - even in
the game industry.

Actually, I should say especially in the game industry. (ooh, my turn
on the soapbox for the newbies on the list!) The old hacks to squeeze
every last drop of performance out of the machines which made games
essentially unreadable black voodoo are going the way of the dodo fast
because people don’t want to work with that kind of crappy code!

A good example of what I mean can be found at Id Software, because they tend
to release their source code after a few years. Look at the code for
Wolf3D or Doom sometime. I have, and I can tell you it’s disgusting. On
the other hand, Quake 2 is fairly elegant code (for its time), and I happen
to know that Quake 3’s code could almost be called clean in many places,
especially the renderer. Doom 3 is expected to be better still. There’s
not one company today, in the gaming industry or any other for that matter,
where sloppy code leads to job security, so get that idea out of your head
right now.

My experience has led me to the following rules of game programming, which
often apply to many other types of programming as well:

  1. Do what works best and makes for the most legible code. Everyone else
    who ever has to read it will want to thank you. Failure to do this
    will result in them wanting to shoot you instead.

  2. Don’t try to out-optimize the optimizer! Back when Quake was written,
    Michael Abrash carefully hand-tuned several essential parts of the
    Quake renderer in i586 assembly. Parts of this were still used with
    OpenGL - but the first thing I did with Project Twilight was remove
    it for a speed gain! If your code’s not fast enough after you’ve
    written it, then you can try to tweak things to be more optimized.

  3. Don’t make spaghetti code. Spaghetti is no fun unless you can eat the
    consequences, after all. Divide your code by functional blocks and
    keep those blocks separate. We probably all understand that in a game
    global variables are a near necessity, but not every variable needs to
    be global, and even if they do, they don’t all need to be exported to
    other modules. Anything you don’t know needs to be public should be
    static, and you should be reluctant to change that.

  4. Don’t extern things in source files. Quake did this in several places,
    and even Project Twilight still does. This has proven to be nothing
    but a headache for us. Find the right header to put a thing into; if
    it doesn’t exist, create it! Don’t use extern for special-case things
    that shouldn’t be public (but aren’t static either) as is done in many
    commercial games. It will bite you in the ass later on. (See the small
    sketch after this list.)

  5. Classes aren’t everything! Yes, when you learned C++, you learned all
    about classes, inheritance, and maybe even multiple inheritance.
    These are all great things (with the possible exception of multiple
    inheritance which is abused more than used), but you would be
    surprised how little they benefit most parts of game programming. I
    am not saying they don’t have their uses in games - our Neither engine
    will use C++ specifically because we want classes and inheritance for
    all the cool things they are made so perfectly for. Use the right
    type of code for the right type of problem. There is absolutely
    nothing wrong with procedural code in C++, even alongside classes,
    methods, and inheritance.
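
To illustrate points 3 and 4 with a tiny, hypothetical module (the names are
made up, but the pattern is the point): the header is the module’s public
interface, and everything else stays static in its own file.

    /* render.h - the module's public interface. Anything other modules
       need is declared here, exactly once. */
    #ifndef RENDER_H
    #define RENDER_H

    extern int r_width;     /* deliberately shared globals */
    extern int r_height;

    void R_Init(void);
    void R_DrawFrame(void);

    #endif

    /* render.c - the implementation. */
    #include "render.h"

    int r_width  = 640;
    int r_height = 480;

    static int frame_count = 0;   /* nobody else needs this: keep it static */

    void R_Init(void)      { frame_count = 0; }
    void R_DrawFrame(void) { ++frame_count; }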

Naturally these are all meta-ideas. While I could give examples of things
done both the right way and the wrong way in games (and in other types of
programs too), the best advice I can really give is to use your head and
not make things any more complicated than they’ve got to be. It’s too
easy with games to code yourself into a corner compared with other types
of programs. If you’re careful to avoid that, suddenly everything starts
to seem easier.

And then you discover linear algebra, the joy of vector and matrix math,
and lose what vestiges of your sanity remained. =) If you survive this,
you may make it into an industry job where you work too long for too
little pay. =p If you’re really lucky, you don’t wind up at the next
company which ends as Loki and countless other game companies have. Well,
okay, Loki ended a bit worse than most, but that’s even further off topic
and a touchy subject for many former employees, so …

Okay, newbie lesson over, Ryan can have his soapbox back now. =)

On Tue, May 21, 2002 at 04:00:45AM -0400, Ryan C. Gordon wrote:


Joseph Carter Sooner or later, BOOM!

  • knghtbrd can already envision: “Subject: [INTENT TO PREPARE TO PROPOSE
    FILING OF BUG REPORT] Typos in the policy document”


Hi,

where can I find some information about how to make my games more ‘speedy’
in SDL?
I’m interested in video and input stuff.
(like, converting a surface from xx bpp to the screen surface’s bpp will
increase SDL_BlitSurface speed)

Someone should dedicate a web site to this… heh

We discuss this sort of stuff on the list every now and then, and
indeed, there are some important SDL related things to keep in mind, but
the most important stuff isn’t really SDL related.

To get real speed, regardless of API, you need to figure out how your
particular problem is best solved from a hardware POV, and then figure
out how to do it with whatever API you decide to use.

Because of the complexity (and occasionally braindead design) of PC
hardware, there isn’t a single “best way” to do things. Many games will
turn out to work best with two or more methods at once, which usually means
that you’ll end up with some sort of compromise.

For example, on most targets, opaque and colorkeyed blits are best done
with everything in VRAM, using h/w acceleration - but as soon as you
start doing software rendering, or alpha blending, VRAM is the worst
place to work in! Where the surfaces are best placed depends on the
balance between normal and alpha blits, and which platform you’re on. In
some cases, you might have to support two or three different methods of
rendering to get maximum performance on all targets.
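
As a rough illustration of the common case (a minimal sketch with error
handling omitted; the helper names are made up, and the best flags really do
depend on your target): convert colorkeyed sprites to the display format once,
up front, and keep surfaces you alpha-blend or render into with the CPU as
plain software surfaces.

    #include "SDL.h"

    /* Prepare a loaded sprite for fast opaque/colorkeyed blits: set the
       colorkey (with RLE acceleration) and convert it to the display's
       pixel format so SDL_BlitSurface() never has to convert per blit. */
    SDL_Surface *prepare_sprite(SDL_Surface *loaded, Uint32 key)
    {
        SDL_Surface *converted;

        SDL_SetColorKey(loaded, SDL_SRCCOLORKEY | SDL_RLEACCEL, key);
        converted = SDL_DisplayFormat(loaded);
        SDL_FreeSurface(loaded);
        return converted;
    }

    /* A scratch surface for software rendering / alpha blending: keep it
       in system RAM (SDL_SWSURFACE) rather than VRAM. */
    SDL_Surface *make_scratch_surface(SDL_Surface *screen, int w, int h)
    {
        return SDL_CreateRGBSurface(SDL_SWSURFACE, w, h,
                                    screen->format->BitsPerPixel,
                                    screen->format->Rmask,
                                    screen->format->Gmask,
                                    screen->format->Bmask,
                                    screen->format->Amask);
    }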

Another question: I see that SDL keyboard input is ‘message’ based, but
I think this is too slow in comparison with DirectInput (for example).

“message” here does not mean “message, as implemented by Microsoft on top
of a kernel with serious task switching problems”. :wink:

We are talking about a thread safe message queue, but it has nothing to
do with whatever a “message” might be on whatever platform you’re using.
AFAIK, the SDL message queue is implemented on top of the fastest and
lowest level synchronization objects provided by each platform, and as
such, it should be pretty much as fast as it gets on each platform.
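
In rough outline, such a queue can be as simple as a linked list guarded by
one mutex (just a sketch of the idea, not SDL’s actual implementation; the
struct and function names are made up):

    #include "SDL.h"

    typedef struct msg {
        int         code;         /* whatever the message carries */
        struct msg *next;
    } msg_t;

    static msg_t     *head = NULL, *tail = NULL;
    static SDL_mutex *lock = NULL;

    void queue_init(void) { lock = SDL_CreateMutex(); }

    void queue_push(msg_t *m)
    {
        m->next = NULL;
        SDL_LockMutex(lock);          /* the per-message "cost" is just this */
        if (tail) tail->next = m; else head = m;
        tail = m;
        SDL_UnlockMutex(lock);
    }

    msg_t *queue_pop(void)            /* returns NULL if the queue is empty */
    {
        msg_t *m;
        SDL_LockMutex(lock);
        m = head;
        if (m) {
            head = m->next;
            if (!head) tail = NULL;
        }
        SDL_UnlockMutex(lock);
        return m;
    }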

BTW, the new audio engine of Kobo Deluxe is using “messages” (events)
internally, all the way into the voice mixers. Sending an event means
that you grab a fixed size struct from a preallocated pool and add it to
a linked list. There are no system calls involved, and sending/receiving
these “messages” costs very little. In fact, switching to an “event based
design” has made it possible to make the engine a lot more flexible and
accurate, without adding significant overhead. Some things are actually a
lot more efficient this way, since a lot of "explicit context switches"
can be avoided.
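
A minimal sketch of that kind of preallocated pool (not Kobo Deluxe’s actual
code - the names and fields are invented, and this version assumes a single
thread or external locking):

    /* Fixed size events, all allocated up front - no malloc() afterwards. */
    typedef struct event {
        int           type;
        int           arg;
        struct event *next;    /* links the free list and message lists */
    } event_t;

    #define POOL_SIZE 1024
    static event_t  pool[POOL_SIZE];
    static event_t *free_list = NULL;

    static void ev_init(void)
    {
        int i;
        for (i = 0; i < POOL_SIZE - 1; ++i)
            pool[i].next = &pool[i + 1];
        pool[POOL_SIZE - 1].next = NULL;
        free_list = &pool[0];
    }

    /* "Sending" grabs a struct from the free list - no system calls... */
    static event_t *ev_alloc(void)
    {
        event_t *e = free_list;
        if (e)
            free_list = e->next;
        return e;              /* NULL means the pool is exhausted */
    }

    /* ...and the receiver just puts it back when it's done. */
    static void ev_free(event_t *e)
    {
        e->next = free_list;
        free_list = e;
    }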

In short, if a “message” based system is slow, it’s either because of a
bad implementation, or because the synchronization constructs (needed
for thread safe implementations) provided by the OS are slow.

Any improvement in the future?

Any ideas…? Dropping the thread safe queue? :wink:

//David Olofson — Programmer, Reologica Instruments AB

On Tuesday 21 May 2002 09:31, user at domain.invalid wrote:

Well, provided you have a playable frame rate, an FPS player should be
quite happy if the input response time is in the range of the duration of
one frame.

That said, it seems like at least on Linux, SDL’s keyboard input is quite
a bit faster than that. Using it to trigger sound effects (played by an
engine with ~5 ms latency), the average response time seems to be on par
with that of professional MIDI gear. No perceptible latency, that is.
(This is on standard Linux kernels; not anything with real time patches
and stuff.)

My qualified guess would be that the average keyboard latency (say 99% of
keyboard events) is way below 10 ms - and I would believe that’s the case
on most platforms.

//David Olofson — Programmer, Reologica Instruments AB

On Tuesday 21 May 2002 08:33, Ryan C. Gordon wrote:

Another question: I see that SDL keyboard input is ‘message’ based,
but I think this is too slow in comparison with DirectInput (for
example). Any improvement in the future?

It’s fast enough, unless you’re on some really REALLY low-end embedded
device, in which case you shouldn’t be coding to an abstraction layer.

Seriously. It’s good enough for first person shooters, it’s good enough
for anything. The real SDL bottlenecks are almost always video output.

[…]

I come from an era that used "short" instead of "int" when we could get away with it to save stack space, and had to be extremely nervous about "function call overhead", and even I don't think the event subsystem is slow.

In an age of dual CPUs and gigabytes of RAM, are we really going to
bust our balls over this interface? Most big commercial game engines
I’ve seen probably spend four times as long in unnecessary copy
constructors as they do handling input events, and no one considers
them inefficient.

Exactly. In the days of the C64, you had to worry about things done more
than 10 times per frame or so. On the Amiga, it was some 50-100 times per
frame. These days, that figure is probably beyond 1000 for most things.

Even so, new and malloc() (which includes copy constructors) in the game
loop still makes me nervous… :slight_smile:

//David Olofson — Programmer, Reologica Instruments AB

On Tuesday 21 May 2002 10:00, Ryan C. Gordon wrote:

“David Olofson” <david.olofson at reologica.se> wrote in message
news:mailman.1022029748.1083.sdl at libsdl.org

Even so, new and malloc() (which includes copy constructors)
in the game loop still makes me nervous… :slight_smile:

How so? Copy constructors don’t necessarily allocate any new heap
memory.

struct C {
    C() { }
    C(C const &) { }
};

C c1;
C c2(c1);

‘c2’ is copy constructed from ‘c1’. Memory for ‘c2’ is allocated on
the stack. The actual copy constructor call is inlined and reduced to
nothing.

And languages such as Python actually get away with allocating memory
from the heap when calculating 100 + 100.

--
Rainer Deyke | root at rainerdeyke.com | http://rainerdeyke.com

Even so, new and malloc() (which includes copy constructors) in the game
loop still makes me nervous… :slight_smile:

Wouldn’t that also cause memory fragmentation? On my machine that is a real
performance killer.

“David Olofson” <david.olofson at reologica.se> wrote in message
news:mailman.1022029748.1083.sdl at libsdl.org

Even so, new and malloc() (which includes copy constructors)
in the game loop still makes me nervous… :slight_smile:

How so? Copy constructors don’t necessarily allocate any new heap
memory.

Yeah, you’re right about that, of course. :slight_smile:

[…]

And languages such as Python actually get away with allocating memory
from the heap when calculating 100 + 100.

Yeah. That’s why they’re useless for hard real time work, and unsuitable
for any kind of real time applications. It’s not a matter of speed (no
problem there), but a matter of determinism. Memory allocation can
cause gigantic latency peaks, even on real time operating systems.

Whether or not a game is a hard real time application is a matter of how
tolerant the players are. The more shouting and cursing glitches cause,
the harder the real time requirement. :wink:

More seriously, most operating systems used for gaming these days have
enough problems with timing as it is, without asking for more.

//David Olofson — Programmer, Reologica Instruments AB

On Wednesday 22 May 2002 03:45, Rainer Deyke wrote:

Even so, new and malloc() (which includes copy constructors) in the
game loop still makes me nervous… :slight_smile:

Wouldn’t that also cause memory fragmentation?

Provided you have a decent suballocator (usually part of the runtime lib)
and sane allocation patterns, it shouldn’t be much of a problem - but
unfortunately, it’s impossible to design a memory manager that works
great for all applications.

On my machine that is a real performance killer.

Well, if you get fragmentation, a few things may happen:

1) The memory manager will burn more cycles merging
   adjacent memory blocks.

2) The brk limit will start running away. Although
   that (hopefully) means that there are unused
   pages "in the middle" that can be swapped out,
   this still means that you have to talk to the
   virtual memory manager - which is where you risk
   getting into *real* timing problems.

3) And of course, as soon as you touch the memory
   manager at all (fragmentation or not), you risk
   hitting brk, and consequently, all of the above.

In short, this is hairy stuff, and it behaves in very different ways on
different OSes, so the easiest and safest way to avoid trouble is to
simply stay away from it. Do what most memory hungry games do: allocate a
big pool of memory, and use your own memory manager.

Note that you’ll usually benefit from using a deterministic memory
manager, rather than a “fast” memory manager! “Fast” just means “fast
most of the time” - which implies that it may occasionally take several
times longer to run. Still, this is insignificant next to the impact of
messing with the virtual memory subsystem.
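
For example, a per-frame arena is about as deterministic as a memory manager
gets: allocation is a pointer bump, and “freeing” is a single reset once per
frame (a minimal sketch under those assumptions; the names are made up, and
there is no handling beyond the basic checks shown):

    #include <stddef.h>

    /* One big block grabbed up front; nothing touches malloc()/brk later. */
    #define ARENA_SIZE (4 * 1024 * 1024)
    static unsigned char arena[ARENA_SIZE];
    static size_t        arena_used = 0;

    /* O(1) allocation: round up for alignment and bump the offset. */
    static void *arena_alloc(size_t size)
    {
        void *p;

        size = (size + 15) & ~(size_t)15;      /* 16-byte alignment */
        if (arena_used + size > ARENA_SIZE)
            return NULL;                       /* pool exhausted */
        p = &arena[arena_used];
        arena_used += size;
        return p;
    }

    /* O(1) "free": discard everything allocated during the frame. */
    static void arena_reset(void)
    {
        arena_used = 0;
    }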

//David Olofson — Programmer, Reologica Instruments AB

On Wednesday 22 May 2002 04:05, David Moffatt wrote: