Nicholas Vining wrote:
Having written 3D rasterizing engines from scratch for about four years, I
get to lecture for a while.
Thanks for the advice.
I’ve written a 3D graphics library for SDL that does software
rendering. Unfortunately, Z buffer/Gouraud shading performance is still
abominable (I get 7 fps on a good day with a K6-2/450 with a Riva TNT on
SDL 1.0.6 fullscreen with most scenes), and I’m trying to work on
rewriting the 3D renderer to use a scan line algorithm (when my life
outside of side project development doesn’t catch up with me) which may
be more efficient for the smaller number of polygons I’ve always
intended this to be rendering. It’s just that I’m lazy, and Z buffer is
much easier to program. If the demand is great enough, I can write a
Umm… what does Z-Buffer have to do with scan line rendering? Scan line
rendering of polygons is, by most definitions, where you calculate all
slopes per side and store data into an array as min and max X coordinates,
then bang through your code with horizontal lines. (Which tends to be bad,
because of cache misses, memory reads, difficulty for pipeline optimization,
etc.) I think you MIGHT be talking about BSP trees. BSP trees are only good
for indoors scenes, a la Quake, where you have el grande mucho overdraw.
Anything where you want to do LOD, BSP trees suck for, and it starts to
descend into the realm of kludge. Ditto for outside 3D engines.
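The min/max X approach described above can be sketched in C roughly as follows. This is only an illustration for a single convex polygon with integer vertices; `SpanTable`, `MAX_ROWS`, and the `span_*` functions are hypothetical names, not part of SDL or of the library under discussion.

```c
#include <assert.h>
#include <limits.h>

#define MAX_ROWS 1024  /* hypothetical cap on viewport height */

/* Leftmost and rightmost covered pixel on each scanline. */
typedef struct {
    int min_x[MAX_ROWS];
    int max_x[MAX_ROWS];
} SpanTable;

/* Mark every row empty (min_x > max_x). */
static void span_reset(SpanTable *t, int height)
{
    int y;
    assert(height >= 0 && height <= MAX_ROWS);
    for (y = 0; y < height; y++) {
        t->min_x[y] = INT_MAX;
        t->max_x[y] = INT_MIN;
    }
}

/* Widen row y so that it includes column x; off-screen rows are skipped. */
static void span_touch(SpanTable *t, int height, int x, int y)
{
    if (y < 0 || y >= height) return;
    if (x < t->min_x[y]) t->min_x[y] = x;
    if (x > t->max_x[y]) t->max_x[y] = x;
}

/* Walk one polygon edge, taking one slope step per scanline. */
static void span_add_edge(SpanTable *t, int height,
                          int x0, int y0, int x1, int y1)
{
    double x, slope;
    int y, tmp;

    if (y0 == y1) {                 /* horizontal edge: endpoints only */
        span_touch(t, height, x0, y0);
        span_touch(t, height, x1, y1);
        return;
    }
    if (y0 > y1) {                  /* always walk top to bottom */
        tmp = x0; x0 = x1; x1 = tmp;
        tmp = y0; y0 = y1; y1 = tmp;
    }
    slope = (double)(x1 - x0) / (double)(y1 - y0);
    x = (double)x0;
    for (y = y0; y <= y1; y++) {
        /* round to nearest pixel column */
        span_touch(t, height, (int)(x < 0.0 ? x - 0.5 : x + 0.5), y);
        x += slope;
    }
}

/* Fill every non-empty row with a horizontal line; returns pixels written. */
static long span_fill(const SpanTable *t, int width, int height,
                      unsigned int *pixels, unsigned int color)
{
    long written = 0;
    int x, y, lo, hi;

    assert(height >= 0 && height <= MAX_ROWS);
    for (y = 0; y < height; y++) {
        lo = t->min_x[y] < 0 ? 0 : t->min_x[y];
        hi = t->max_x[y] >= width ? width - 1 : t->max_x[y];
        for (x = lo; x <= hi; x++) {   /* empty rows have lo > hi */
            pixels[y * width + x] = color;
            written++;
        }
    }
    return written;
}
```

A caller would run `span_reset`, then `span_add_edge` once per polygon side, then `span_fill`. Depth testing and shading are left out here to keep the sketch short.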
Well, the scan line renderer I’m talking about is the algorithm
described in Foley and van Dam. We place all the polygons in a buffer,
then use a variant of the regular polygon drawing code to perform
rendering. Every scanline, we see which is the closest polygon we
enter. I thought of using BSP trees, but the fact that my robotic
models change constantly every frame made me rethink this; the primary
advantage of BSP trees is lost.
You might be talking about sorting… if you want a fairly quick and easy to
use sorting method, look at STL as your option. I think STL’s pretty damn
sweet; I started working with it a few days ago and so far am really amazed
at what sort of power you can get from it. And I don’t even like
class-based code! (Kenton Varda, if he still hangs around here, is going to
be nagging me about this admission for ages)
Well, I haven’t got anything against class-based code. I do Java
sometimes. I don’t like C++. IMAO, it’s much too large a language, and
I don’t have the money to purchase another book that describes how to
use it (probably at least eight times thicker than K&R by now!).
7 FPS is lousy, especially from a beast of a machine like that. A few
I’m not sure where the trouble lies, actually. It’s not easy to see
just how much overhead falls on the X and SDL interface and what part on
my code, even with a profiler like gprof. But oddly enough, I’ve tested
the same code on a Pentium-133 with an S3 Trio64V+ and got a frame rate
not much lower than my K6-2 (I think around 5 fps).
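One way to separate the renderer's share of a frame from the display side is to take wall-clock timestamps around each phase. As far as I know, gprof samples only the process's own CPU time, so time spent inside the X server would not show up in its report. The sketch below only does the bookkeeping; `FrameStats` and the `frame_stats_*` functions are hypothetical names, and the caller is assumed to supply millisecond timestamps from some wall clock (SDL's `SDL_GetTicks()` would be one candidate).

```c
#include <assert.h>

typedef struct {
    unsigned long render_ms;   /* time spent rasterizing into the buffer */
    unsigned long present_ms;  /* time spent handing the frame to SDL/X */
    unsigned long frames;
} FrameStats;

/* Record one frame from three timestamps in milliseconds:
 * before rendering, after rendering, and after the frame was shown.
 * Timer wraparound is ignored in this sketch. */
static void frame_stats_add(FrameStats *s, unsigned long t_start,
                            unsigned long t_rendered,
                            unsigned long t_presented)
{
    assert(t_start <= t_rendered && t_rendered <= t_presented);
    s->render_ms  += t_rendered - t_start;
    s->present_ms += t_presented - t_rendered;
    s->frames++;
}

/* Average frames per second over everything recorded so far. */
static double frame_stats_fps(const FrameStats *s)
{
    unsigned long total = s->render_ms + s->present_ms;
    return total ? 1000.0 * (double)s->frames / (double)total : 0.0;
}

/* Fraction of the measured time that went into rendering (0.0 to 1.0). */
static double frame_stats_render_share(const FrameStats *s)
{
    unsigned long total = s->render_ms + s->present_ms;
    return total ? (double)s->render_ms / (double)total : 0.0;
}
```

If the render share turns out to be small, the bottleneck is more likely in the blit or display path than in the rasterizer, which might also explain why two very different machines give similar frame rates.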
- Use ASM. Trust me, ASM is good for things like this, and damn portability
to hell. Make sure you use the FPU, and that you always get FXCH in the
V-Pipe, so you don’t need to worry so much about this stack shit.
I can’t do this, because part of my purpose for actually doing all this
work was to write a more or less portable software renderer.
- If you’re not going to use ASM, make sure you keep everything as floats.
Compilers are stupid, and don’t realise that you should store EVERYTHING as
floats on Pentium systems, because MUL costs something like 32+ cycles
against a guaranteed 3 for FMUL. Also, nowadays fixed point is
the work of the devil.
I use doubles for everything but the Z buffer (which is floats). I
tried using unsigned long ints for that and realized that the amount of
time my machine spends converting my floating point into integers is
excessive. I also designed my code to be easily convertible from
doubles to floats, and found that using floats for everything caused a
significant performance hit.
- What is the size of your viewport? 3D code speed decreases as viewport
size increases, and it’s probably an exponential relationship, although I’ll
have to run that by the gurus in the math department.
If it’s 1600x1280, well, 7 FPS is looking not that bad. 320x200, you have
cause to worry.
The viewport is not so monstrous. It’s 800x600, and I think that it’s
still a pretty bad thing. But if what you’re saying is true, then why
doesn’t my performance increase significantly when moving down to
640x480? I get 9 fps there.
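The two measurements above allow a rough estimate. 800x600 is 480,000 pixels and 640x480 is 307,200, a ratio of about 1.56, so purely per-pixel work should have lifted 7 fps to nearly 11 rather than 9. Assuming (and it is only an assumption) that frame time is a fixed cost plus a cost proportional to pixel count, the two data points can be fitted as below; `FrameCost` and `fit_frame_cost` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    double fixed_ms;      /* per-frame cost independent of viewport size */
    double per_pixel_ms;  /* additional cost per viewport pixel */
} FrameCost;

/* Fit frame_ms = fixed_ms + per_pixel_ms * pixels through two
 * (frame rate, pixel count) measurements. */
static FrameCost fit_frame_cost(double fps_a, long pixels_a,
                                double fps_b, long pixels_b)
{
    FrameCost c;
    double ms_a, ms_b;

    assert(fps_a > 0.0 && fps_b > 0.0 && pixels_a != pixels_b);
    ms_a = 1000.0 / fps_a;
    ms_b = 1000.0 / fps_b;
    c.per_pixel_ms = (ms_a - ms_b) / (double)(pixels_a - pixels_b);
    c.fixed_ms = ms_a - c.per_pixel_ms * (double)pixels_a;
    return c;
}
```

Calling `fit_frame_cost(7.0, 800L * 600L, 9.0, 640L * 480L)` gives a fixed cost of roughly 55 ms per frame and roughly 88 ms of per-pixel work at 800x600. With only two rounded frame rates this is no more than a hint, but it suggests that a sizeable part of each frame does not depend on viewport size at all.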
- Precalculate all trig functions. This is one case where the speedup is
ridiculous. A simple SIN/COS table can slash cycles and cycles off of your
The 3D transformations are primarily the domain of the linking
application. My library doesn’t care about such things at this point.
It only uses trigonometric functions whenever a new view is requested by
the user (it uses quaternions to express rotations of the view vectors),
and as such would probably have minimal impact on my frame rate.
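To illustrate why trigonometric calls can stay out of the per-frame path: once a rotation is held as a unit quaternion, applying it to a vector needs only multiplications and additions. The sketch below is a generic formulation, not the library's actual code; `Quat`, `Vec3`, `vec3_cross`, and `quat_rotate` are hypothetical names, and the quaternion is assumed to be normalized already.

```c
typedef struct { double w, x, y, z; } Quat;  /* unit quaternion, w = scalar */
typedef struct { double x, y, z; } Vec3;

/* Cross product a x b. */
static Vec3 vec3_cross(Vec3 a, Vec3 b)
{
    Vec3 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}

/* Rotate v by the unit quaternion q using
 *   t  = 2 * (q.xyz x v)
 *   v' = v + q.w * t + (q.xyz x t)
 * No sin() or cos() is evaluated here. */
static Vec3 quat_rotate(Quat q, Vec3 v)
{
    Vec3 u, t, c, r;

    u.x = q.x; u.y = q.y; u.z = q.z;
    t = vec3_cross(u, v);
    t.x *= 2.0; t.y *= 2.0; t.z *= 2.0;
    c = vec3_cross(u, t);
    r.x = v.x + q.w * t.x + c.x;
    r.y = v.y + q.w * t.y + c.y;
    r.z = v.z + q.w * t.z + c.z;
    return r;
}
```

Sine and cosine are then needed only when the quaternion itself is built from an axis and an angle, which fits the point that trigonometry is evaluated only when a new view is requested.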
Watch out for cache misses. You’ll get quite a few of these, especially
if you have a z-buffer and a 24-bit/32-bit render target with a big screen.
There’s not a whole lot you can do about this, except try to avoid making
Always remember that the best optimizer is between your ears!
Absolutely! Again, thanks for the advice. Perhaps I’m still doing too
many unnecessary calculations…
--
| Rafael R. Sevilla @Rafael_R_Sevilla |
| Instrumentation, Robotics, and Control Laboratory |
| College of Engineering, University of the Philippines, Diliman |