Diablo 2 possible with SDL, without OpenGL?

Hi,

let me briefly introduce myself. I am working on the
www.fifengine.de project, a general-purpose 2D isometric
CRPG engine, but I have no experience in the
field of game programming.

I keep comparing FIFE to the performance Diablo 2 was able
to offer without using 3D-accelerating hardware.
We will use OpenGL when hardware acceleration is
available, but the following is about using SDL only.

Without any lighting I can get the current rendering engine
up to ~100 fps at a resolution of 800x600. Targeting 25 fps
with working lighting gives me the creeps :(

I am basically asking for assistance or advice.
My current design looks like this (it is not implemented yet):

  1. Blit all static background to buffer 0.

  2. Get a lightmap somehow, probably by Gouraud-shading
    some 3D representation of the world.

  3. Apply the lightmap to all static background, writing the result to buffer 1.

  4. Light critters (anything that moves) with an ambient value
    and blit them onto a second buffer.

  5. Blit parts of the background (from buffer 1) over the second
    buffer to preserve the obstruction order.

  6. Blit the second buffer over buffer 1 using colorkeying and RLE acceleration.

  7. Blit the first buffer to the screen. Back to step 1.
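Steps 3 to 6 of that design could be sketched in plain C roughly as follows. This is only an illustration under assumptions the post does not make: all buffers are hypothetical, tightly packed 32-bit XRGB arrays in system memory (no SDL calls), the lightmap holds one value per pixel in 0..256 (256 = colour unchanged), the colorkey is magenta, and step 5 is reduced to copying one occluder rectangle. Steps 1, 2 and 7 are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software buffer: 32-bit XRGB pixels, pitch == width. */
typedef struct {
    int w, h;
    uint32_t *px;
} Buffer;

#define COLORKEY 0x00FF00FFu /* assumed transparent colour (magenta) */

/* Scale each 8-bit channel by light/256; light is assumed to be 0..256. */
uint32_t modulate(uint32_t c, uint32_t light)
{
    uint32_t r = (((c >> 16) & 0xFFu) * light) >> 8;
    uint32_t g = (((c >> 8) & 0xFFu) * light) >> 8;
    uint32_t b = ((c & 0xFFu) * light) >> 8;
    return (r << 16) | (g << 8) | b;
}

/* Step 3: buffer 1 = static background (buffer 0) lit by the lightmap. */
void apply_lightmap(const Buffer *bg, const uint16_t *lightmap, Buffer *lit)
{
    size_t n = (size_t)bg->w * (size_t)bg->h;
    for (size_t i = 0; i < n; ++i)
        lit->px[i] = modulate(bg->px[i], lightmap[i]);
}

/* Step 4: light all non-transparent critter pixels with one ambient value. */
void light_critters(Buffer *layer, uint32_t ambient)
{
    size_t n = (size_t)layer->w * (size_t)layer->h;
    for (size_t i = 0; i < n; ++i)
        if (layer->px[i] != COLORKEY)
            layer->px[i] = modulate(layer->px[i], ambient);
}

/* Step 5: copy an occluding rectangle of lit background over the critter
   layer so critters behind it end up hidden. src and dst are assumed to be
   the same size; a real engine would use the occluder's shape, not a box. */
void copy_rect(const Buffer *src, Buffer *dst, int x, int y, int w, int h)
{
    assert(x >= 0 && y >= 0 && x + w <= dst->w && y + h <= dst->h);
    for (int row = y; row < y + h; ++row)
        for (int col = x; col < x + w; ++col)
            dst->px[(size_t)row * dst->w + col] = src->px[(size_t)row * src->w + col];
}

/* Step 6: colorkeyed blit of the critter layer over buffer 1. */
void blit_colorkey(const Buffer *src, Buffer *dst)
{
    size_t n = (size_t)src->w * (size_t)src->h;
    for (size_t i = 0; i < n; ++i)
        if (src->px[i] != COLORKEY)
            dst->px[i] = src->px[i];
}
```

Note that every step here is a full pass over the frame, which is exactly what the cost estimates below are about.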

The lower bound for steps 1+2 would be one multiplication per pixel,
but it is probably 3 to 5 times more expensive.
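To make the "one multiplication per pixel" bound concrete: a common trick (not from this thread) scales red and blue with a single 32-bit multiply and green with a second one, so a per-pixel lightmap costs two multiplies per pixel. A sketch, assuming XRGB8888 pixels and light values in 0..256:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale an XRGB8888 pixel by light/256 (light in 0..256) with two multiplies.
   Red and blue share one multiply: each product is at most 255 * 256 = 65280,
   so it fits in its own 16-bit field without spilling into the other. */
uint32_t modulate_packed(uint32_t c, uint32_t light)
{
    assert(light <= 256);
    uint32_t rb = ((c & 0x00FF00FFu) * light) >> 8;
    uint32_t g  = ((c & 0x0000FF00u) * light) >> 8;
    return (rb & 0x00FF00FFu) | (g & 0x0000FF00u);
}

/* Light a whole buffer of n pixels with a per-pixel lightmap. */
void light_buffer(uint32_t *px, const uint16_t *lightmap, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        px[i] = modulate_packed(px[i], lightmap[i]);
}
```

At 800x600 that is 480,000 pixels, i.e. 960,000 multiplies per frame, or 24 million per second at 25 fps, before any memory traffic is counted.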

Steps 3 to 5 are harder to estimate, but I’d say roughly one quarter
of the cost of steps 1+2.

It would be nice if steps 5+6 could be done on hw surfaces, as this is
probably the only part where we could make use of hw acceleration.

The 100 fps are possible when rendering buffer 0 once and blitting
it to the screen, trying to convince SDL along the way to keep it in hw
surfaces.

Now, am I missing something completely, or are my estimates inherently
flawed? Is there anything else I should keep in mind?

Cheers, phoku aka Klaus Blindert

Is it 100 fps because it’s being synced to your refresh rate (that’s a
high refresh rate), or is that the true speed?

Measure the true speed by not syncing to any refresh rate and timing,
say, 60 frames. My gut instinct is: try it and find
out. There are ways of optimizing, such as using SIMD streaming
instructions (SSE). I assume they will let you do some of the
block multiplication you are looking for, which would certainly help
with ambient lighting.
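A minimal sketch of that measurement: time a fixed number of frames with a monotonic clock and divide. `render_frame` stands for whatever draws and presents one frame with vsync disabled (hypothetical); `clock_gettime` with `CLOCK_MONOTONIC` is POSIX, so elsewhere you would swap in another timer such as `SDL_GetTicks()`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (POSIX). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Frames per second from a frame count and the time they took. */
double fps_from(int frames, double seconds)
{
    assert(frames > 0 && seconds > 0.0);
    return frames / seconds;
}

/* Time `frames` calls of a hypothetical render_frame callback (which should
   draw and present one frame, not synced to the refresh rate) and return
   the average frames per second. */
double measure_fps(void (*render_frame)(void), int frames)
{
    assert(frames > 0);
    double start = now_seconds();
    for (int i = 0; i < frames; ++i)
        render_frame();
    double elapsed = now_seconds() - start;
    return elapsed > 0.0 ? fps_from(frames, elapsed) : 0.0;
}
```

For example, 60 frames in 0.6 s gives `fps_from(60, 0.6)`, about 100 fps.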

Matthew


Thanks for replying!

I didn’t think about vsync, so I investigated the fps a bit:

Without an SDL_Flip() the engine reaches something like 230 fps,
but then the surface is seemingly put into video memory only
once, that is, I only see the first frame, no matter where the
viewport is.

With SDL_UpdateRect(screensurface,0,0,0,0) roughly 100 fps are
reached again. The DGA driver doesn’t wait for a retrace
in this case, while the X11 driver does… ^^

So I assume that one whole-screen blit costs roughly 1/100 s.
And 60 frames take roughly 600 milliseconds -> ~1/100 s per frame.

I conclude that one 800x600 screen blit on my system costs
roughly 1/100 s and that this is not retrace-related; the driver
probably only marks the screen as ready for update at the next
vsync and returns… (see SDL_Flip in the doc wiki)

So drawing has to be done with at most 4 complete screen blits
if I want to reach 25 fps in the end,
with my machine as a reference low-end system…
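The budget arithmetic above, written out as a tiny helper so it can be redone for other target rates or machines. The 10 ms per blit comes from the 60 frames / ~600 ms measurement; the helper deliberately ignores all non-blit work in the frame.

```c
#include <assert.h>

/* Milliseconds per frame from a timing run, e.g. 600 ms for 60 frames. */
double ms_per_frame(double total_ms, int frames)
{
    assert(frames > 0);
    return total_ms / frames;
}

/* How many full-screen blits of cost blit_ms fit into one frame at
   target_fps, ignoring every other kind of work in that frame. */
int max_full_blits(double target_fps, double blit_ms)
{
    assert(target_fps > 0.0 && blit_ms > 0.0);
    double frame_ms = 1000.0 / target_fps;   /* 25 fps -> 40 ms */
    return (int)(frame_ms / blit_ms);         /* truncate: whole blits only */
}
```

With `ms_per_frame(600.0, 60)` = 10 ms, `max_full_blits(25.0, 10.0)` gives 4. Treat that as a ceiling on screen blits only, since blits into video memory and work in system memory cost very different amounts.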

And finally, SIMD instructions are probably the way to
go, though I’d really like the compiler to figure out
which ones and how exactly…

klaus

[…]

So drawing has to be done with at most 4 complete screen blits
if I want to reach 25 fps in the end,
with my machine as a reference low-end system…

You’re forgetting something here: Blitting to the screen (that is,
into VRAM) is much more expensive than working in system memory.
(And reading from VRAM is even slower, so don’t even think about
massive blending directly into VRAM.)

Unless you have full h/w acceleration for everything you want to do
(which is highly unlikely unless you’re using plain OpenGL for
rendering), your best bet for massive blending or other
read-modify-write rendering is to do it all in a software surface
(system memory) and then do one final blit from there to the screen.
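A sketch of that approach without any SDL calls: the read-modify-write blending happens in a system-memory back buffer, and a single row-by-row copy moves the finished frame to a "screen" whose rows are `pitch` bytes apart, mirroring an SDL surface's `pixels`/`pitch` layout. The 50% blend is a common bit trick used here only as an example of blending work. In SDL 1.2 the final copy would typically be one `SDL_BlitSurface()` from a software surface to the screen, followed by `SDL_UpdateRect()` or `SDL_Flip()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 50% blend of two XRGB8888 pixels: halve each channel, then add.
   Clearing each channel's lowest bit first keeps the shifted bits from
   bleeding into the neighbouring channel (at the cost of one bit). */
uint32_t blend_half(uint32_t a, uint32_t b)
{
    return ((a & 0x00FEFEFEu) >> 1) + ((b & 0x00FEFEFEu) >> 1);
}

/* Read-modify-write blending, done entirely in the system-memory back buffer. */
void blend_into(uint32_t *back, const uint32_t *overlay, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        back[i] = blend_half(back[i], overlay[i]);
}

/* The one final copy to the "screen": rows are pitch bytes apart there,
   while the back buffer is tightly packed (w pixels per row). */
void present(const uint32_t *back, int w, int h, uint8_t *screen, size_t pitch)
{
    assert(pitch >= (size_t)w * sizeof *back);
    for (int y = 0; y < h; ++y)
        memcpy(screen + (size_t)y * pitch, back + (size_t)y * w,
               (size_t)w * sizeof *back);
}
```

The screen is only ever written, never read, which is the point of keeping blending out of video memory.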

And finally, SIMD instructions are probably the way to
go, though I’d really like the compiler to figure out
which ones and how exactly…

That’s only possible to some extent, and only with a few of the latest
generation compilers. It seems to be getting there, though…

//David Olofson - Programmer, Composer, Open Source Advocate

.------- http://olofson.net - Games, SDL examples -------.
| http://zeespace.net - 2.5D rendering engine |
| http://audiality.org - Music/audio engine |
| http://eel.olofson.net - Real time scripting |
’-- http://www.reologica.se - Rheology instrumentation --'

On Thursday 16 February 2006 18:32, kla wrote:

David Olofson wrote:

[…]

You’re forgetting something here: Blitting to the screen (that is,
into VRAM) is much more expensive than working in system memory.
(And reading from VRAM is even slower, so don’t even think about
massive blending directly into VRAM.)

Unless you have full h/w acceleration for everything you want to do
(which is highly unlikely unless you’re using plain OpenGL for
rendering), your best bet for massive blending or other
read-modify-write rendering is to do it all in a software surface
(system memory) and then do one final blit from there to the screen.

Hm, we’ll do that anyway, as lighting will require accessing single
pixels.

But what about hardware-accelerated blits? This is really a wild
guess, but aren’t hw-to-hw surface blits the ones with the highest
probability of actually being hw-accelerated? It’s frustrating
to know that there is hardware acceleration available, but you
can’t reach down and use it, at least not in a portable manner.

And finally, SIMD instructions are probably the way to
go, though I’d really like the compiler to figure out
which ones and how exactly…

That’s only possible to some extent, and only with a few of the latest
generation compilers. It seems to be getting there, though…


We’ll try GCC’s vector extensions and see how that
turns out.
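A sketch of what GCC's vector extensions look like for ambient lighting, assuming 8-bit channel values stored as a plain byte array and a light value in 0..256. The `vector_size` attribute, element-wise operators and lane subscripting are documented GCC extensions that Clang also accepts; whether this compiles to SSE2 instructions depends on the target and optimisation flags, so check the generated assembly rather than assuming it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang vector extension: 8 lanes of 16-bit unsigned integers (128 bits). */
typedef uint16_t u16x8 __attribute__((vector_size(16)));

/* Scale 8 channel values (each 0..255) by light values (each 0..256) at once.
   16-bit lanes are wide enough because 255 * 256 = 65280 < 65536. */
u16x8 modulate8(u16x8 channels, u16x8 light)
{
    return (channels * light) >> 8;
}

/* Apply one ambient light value (0..256) to n 8-bit channel values in place. */
void ambient_light(uint8_t *ch, size_t n, uint16_t ambient)
{
    assert(ambient <= 256);
    u16x8 light = {ambient, ambient, ambient, ambient,
                   ambient, ambient, ambient, ambient};
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        u16x8 v = {0};
        for (int k = 0; k < 8; ++k)
            v[k] = ch[i + k];                 /* widen 8 -> 16 bits */
        v = modulate8(v, light);
        for (int k = 0; k < 8; ++k)
            ch[i + k] = (uint8_t)v[k];        /* narrow back, result <= 255 */
    }
    for (; i < n; ++i)                        /* scalar tail */
        ch[i] = (uint8_t)((ch[i] * ambient) >> 8);
}
```

Hand-written SSE2 intrinsics would give more control over the widening and narrowing steps; the vector-extension version trades that for portability between compilers.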

klaus

[…]

But what about hardware-accelerated blits? This is really a wild
guess, but aren’t hw-to-hw surface blits the ones with the highest
probability of actually being hw-accelerated?

Yes, but there are virtually no SDL backends that implement
accelerated alpha blending at this point, and the primary reason is
probably that many of the underlying APIs don’t support it. Many
do support opaque blits, though, and I think most of those also support
colorkeyed blits.
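To illustrate what colorkeying with RLE acceleration buys in step 6 of the design, here is a simplified software version: each sprite row is encoded once into runs of "skip this many transparent pixels, copy this many opaque ones", so the per-frame blit does no per-pixel colorkey tests. The run format is made up for illustration; SDL's own RLE encoding (requested with `SDL_RLEACCEL` in `SDL_SetColorKey()`) is internal and differs from this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One run: skip `skip` transparent pixels, then copy `count` opaque ones. */
typedef struct { uint16_t skip, count; } Run;

/* Encode one sprite row once, ahead of time. `runs` needs room for
   (w + 1) / 2 entries. Returns the number of runs written. */
size_t rle_encode_row(const uint32_t *row, int w, uint32_t key, Run *runs)
{
    assert(w >= 0);
    size_t nruns = 0;
    int x = 0;
    while (x < w) {
        int start = x;
        while (x < w && row[x] == key) ++x;
        int skip = x - start;
        start = x;
        while (x < w && row[x] != key) ++x;
        int count = x - start;
        if (count == 0) break;            /* trailing transparency: no run */
        runs[nruns].skip = (uint16_t)skip;
        runs[nruns].count = (uint16_t)count;
        ++nruns;
    }
    return nruns;
}

/* Per-frame blit: copy only the opaque spans, no colorkey compares.
   Simplification: pixels are taken from the original row, not stored
   in the runs themselves. */
void rle_blit_row(const uint32_t *row, const Run *runs, size_t nruns, uint32_t *dst)
{
    size_t x = 0;
    for (size_t r = 0; r < nruns; ++r) {
        x += runs[r].skip;
        memcpy(dst + x, row + x, runs[r].count * sizeof *row);
        x += runs[r].count;
    }
}
```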

It’s frustrating
to know that there is hardware acceleration available, but you
can’t reach down and use it, at least not in a portable manner.

Yeah. :-/

Well, there is OpenGL, but unfortunately, that’s not the Holy Grail
either for various (rather “stupid”) reasons. (See the SDL 1.3
thread.)

//David Olofson - Programmer, Composer, Open Source Advocate

On Thursday 16 February 2006 19:38, kla wrote:

Hi klaus,

You get 230 fps, and you say that when you update the screen it drops to
100 fps. My first thought is that it is now bus-limited: there is only so
much throughput (bandwidth) you can push to the screen. I’m assuming
we’re still talking about software mode.
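The bandwidth behind that guess is easy to put a number on. The colour depth is not stated in the thread, so this sketch assumes either 32 or 16 bits per pixel:

```c
#include <assert.h>

/* Data moved by full-screen updates: width * height * bytes per pixel *
   updates per second, returned in MB/s (1 MB = 10^6 bytes here). */
double blit_mb_per_s(int w, int h, int bytes_per_pixel, double fps)
{
    assert(w > 0 && h > 0 && bytes_per_pixel > 0 && fps > 0.0);
    return (double)w * h * bytes_per_pixel * fps / 1e6;
}
```

At 800x600 and 100 updates per second that is about 192 MB/s at 32 bpp, or about 96 MB/s at 16 bpp. Whether that is close to the limit of a particular bus is something to measure, not assume.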

But that’s fine here. You want to add steps to the rendering engine,
and all these steps add time before you actually render. From what I
gather, your frame could take over 3 times as long to compute the
final pixels and still see the same frame rate. In other words, these
final numbers don’t mean much. In software mode, until you push the
pixels onto the frame buffer, everything is still just playing in memory and on the
CPU. So the goal is to use the CPU effectively first, then the
L1/L2 cache, and then any memory outside of that.
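One way to act on that in the design from the first post, sketched here as an assumption rather than anything proposed in the thread: fuse step 1 (blitting background tiles) with step 3 (applying the lightmap), so each frame pixel is written once while its tile data is still in cache, instead of being written by the tile blit and then read back by a separate lighting pass. It gives up keeping an unlit copy in buffer 0, assumes tiles lie fully inside the frame (no clipping), and reuses the two-multiply scaling trick with light in 0..256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-multiply channel scaling of an XRGB8888 pixel (light in 0..256). */
uint32_t modulate_packed(uint32_t c, uint32_t light)
{
    uint32_t rb = ((c & 0x00FF00FFu) * light) >> 8;
    uint32_t g  = ((c & 0x0000FF00u) * light) >> 8;
    return (rb & 0x00FF00FFu) | (g & 0x0000FF00u);
}

/* Steps 1 and 3 fused: copy a tw x th tile into the fw x fh frame at
   (dx, dy) and light it in the same pass from a frame-sized lightmap. */
void blit_tile_lit(const uint32_t *tile, int tw, int th,
                   uint32_t *frame, const uint16_t *lightmap, int fw, int fh,
                   int dx, int dy)
{
    assert(dx >= 0 && dy >= 0 && dx + tw <= fw && dy + th <= fh); /* no clipping */
    for (int y = 0; y < th; ++y) {
        const uint32_t *src = tile + (size_t)y * tw;
        size_t row = (size_t)(dy + y) * fw + dx;  /* same index in frame and lightmap */
        for (int x = 0; x < tw; ++x)
            frame[row + x] = modulate_packed(src[x], lightmap[row + x]);
    }
}
```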

Sounds like you’re on the right track. Remember that Quake 2 uses lightmaps,
and its cameras can rotate anywhere. I’m sure you can take
advantage of your limited precomputed visibility here, and of your
fixed camera angle(s), to have prerendered, anti-aliased tiles and
achieve high frame rates.

So just do it :-)

Matthew