Techniques for dirty-updates (converting from 2D SDL to OpenGL)

I’ve recently ported an SDL 2D emulator to use OpenGL. I have a texture
sized at 512x256, and use glTexSubImage2D to update the approx. 320x200
framebuffer at 60 fps.

When using SDL, I used SDL_FillRect() on dirty areas of the screen.
This resulted in a maximum CPU usage of 4-5%, which is quite
respectable (this is for the whole emulator, sound included).

Now when using OpenGL, the usage has risen to 12-13%. I’ve narrowed the
increase down to the glTexSubImage2D() command. While I still fill the
SDL_Surface using a dirty-update scheme, I send the whole thing with
glTexSubImage2D(). I’m wondering if it would be more efficient to have
multiple calls to glTexSubImage2D().
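
For reference, the kind of per-rectangle upload I have in mind would look
roughly like this (the function name is made up, and GL_BGRA stands in
for whatever format the texture was actually created with):

#include <SDL.h>
#include <SDL_opengl.h>

/* Upload one dirty rectangle of a surface into an existing texture. */
static void upload_dirty_rect(const SDL_Surface *fb, GLuint tex,
                              int x, int y, int w, int h)
{
    glBindTexture(GL_TEXTURE_2D, tex);

    /* Row length = full surface width in pixels, so GL steps over
       the pixels outside the rectangle. */
    glPixelStorei(GL_UNPACK_ROW_LENGTH,
                  fb->pitch / fb->format->BytesPerPixel);

    glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h,
                    GL_BGRA, GL_UNSIGNED_BYTE,
                    (const Uint8 *)fb->pixels
                        + y * fb->pitch
                        + x * fb->format->BytesPerPixel);

    glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);  /* restore the default */
}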

I’d appreciate advice, web sites, etc., on how to do efficient 2D
graphics in a 3D environment. Specifically, if I have an SDL surface
that uses dirty-updates and is filled with SDL_FillRect, what is the
fastest equivalent in OpenGL?

Thanks,
Steve

I’ve recently ported an SDL 2D emulator to use OpenGL. I have a
texture sized at 512x256, and use glTexSubImage2D to update the
approx. 320x200 framebuffer at 60 fps.

When using SDL, I used SDL_FillRect() on dirty areas of the screen.
This resulted in a maximum CPU usage of 4-5%, which is quite
respectable (this is for the whole emulator, sound included).

Now when using OpenGL, the usage has risen to 12-13%. I’ve
narrowed the increase down to the glTexSubImage2D() command. While
I still fill the SDL_Surface using a dirty-update scheme, I send
the whole thing with glTexSubImage2D(). I’m wondering if it
would be more efficient to have multiple calls to
glTexSubImage2D().

It should be, since the actual transfer is the expensive part, but I
wouldn’t bet on all drivers getting it right. Either way, try to keep
the number of rects reasonable (whatever that is), since there will
always be some overhead. Maybe provide a “simple update” option so
users can try that if they’re having performance issues.
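
For example, something like this made-up cutoff (both limits are
illustrative; the real break-even point is hardware and driver
dependent):

typedef struct { int x, y, w, h; } Rect;

/* Decide between per-rect updates and one full-texture update. */
static int use_simple_update(const Rect *rects, int n,
                             int screen_w, int screen_h)
{
    long dirty = 0;
    int i;

    if (n > 32)                 /* too many rects: overhead wins */
        return 1;
    for (i = 0; i < n; ++i)
        dirty += (long)rects[i].w * rects[i].h;

    /* more than half the screen dirty: just send everything */
    return dirty * 2 > (long)screen_w * screen_h;
}

With that, the renderer either walks the rect list or falls back to a
single full-texture glTexSubImage2D() call.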

I’d appreciate advice, web sites, etc., on how to do efficient 2D
graphics in a 3D environment. Specifically, if I have an SDL
surface that uses dirty-updates and is filled with SDL_FillRect,
what is the fastest equivalent in OpenGL?

That depends on how the rendering is done.

Machines like the Atari ST and some 8-bit machines that lack sprites
and (ab)usable text modes are problematic, as the games usually do
most rendering in software. You may not even be able to use dirty
rects without significant overhead. (You’d have to trap all VRAM
accesses the emulated CPU does, or do some brute force test on the
frame buffer after each frame.)

If you’re emulating a character generator + sprite based display, try
dropping the software rendering and doing it all in OpenGL.
Characters and sprites would be textures and normal rendering would
be done using “pure” OpenGL. The textures would become procedural
only if the emulated game/application keeps messing with the graphics
data.

Note that hardware scrolling of “true” graphic planes can be
implemented in pretty much the same way as on the real hardware. Make
the textures correspond to the VRAM; not the display. That way, you
can have the 3D accelerator emulate the plane + sprite combining
hardware, instead of emulating the whole thing pixel by pixel in
software.
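
For example, drawing the visible window of a scrolling plane straight
from a VRAM-sized texture might look roughly like this (sizes and names
invented, and assuming an orthographic projection where one unit is one
pixel):

/* Draw the visible 320x200 window of a 512x512 "VRAM" texture,
   offset by the emulated scroll registers. GL_REPEAT makes the
   wrap-around behave like the real scrolling hardware. */
static void draw_scrolled_plane(GLuint vram_tex, int scroll_x, int scroll_y)
{
    const float vram_w = 512.0f, vram_h = 512.0f;
    const float view_w = 320.0f, view_h = 200.0f;
    float u0 = scroll_x / vram_w;
    float v0 = scroll_y / vram_h;
    float u1 = u0 + view_w / vram_w;
    float v1 = v0 + view_h / vram_h;

    glBindTexture(GL_TEXTURE_2D, vram_tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);

    glBegin(GL_QUADS);
    glTexCoord2f(u0, v0); glVertex2f(0.0f,   0.0f);
    glTexCoord2f(u1, v0); glVertex2f(view_w, 0.0f);
    glTexCoord2f(u1, v1); glVertex2f(view_w, view_h);
    glTexCoord2f(u0, v1); glVertex2f(0.0f,   view_h);
    glEnd();
}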

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
--- http://olofson.net --- http://www.reologica.se ---

On Tuesday 18 November 2003 21.29, Stephen Anthony wrote:

Now when using OpenGL, the usage has risen to 12-13%. I’ve
narrowed the increase down to the glTexSubImage2D() command. While
I still fill the SDL_Surface using a dirty-update scheme, I send
the whole thing with glTexSubImage2D(). I’m wondering if it
would be more efficient to have multiple calls to
glTexSubImage2D().

It should be, since the actual transfer is the expensive part, but I
wouldn’t bet on all drivers getting it right. Either way, try to keep
the number of rects reasonable (whatever that is), since there will
always be some overhead. Maybe provide a “simple update” option so
users can try that if they’re having performance issues.

OK, that’s what I was wondering. The actual transfer is the problem, and
not necessarily the call to glTexSubImage. I was actually considering
the other part you mentioned, since there will be times that it would be
faster to simply send the whole thing instead of multiple rects.

I’d appreciate advice, web sites, etc., on how to do efficient 2D
graphics in a 3D environment. Specifically, if I have an SDL
surface that uses dirty-updates and is filled with SDL_FillRect,
what is the fastest equivalent in OpenGL?

That depends on how the rendering is done.

The emulation core gives me two 160x300 framebuffers which have been
filled with color indices. These color indices reference a color table,
and I’ve already pre-cached that CLUT to hold the colors in exactly the
format that the surface requires (32-bit). The two framebuffers represent
a front and back buffer, which are flipped every frame. If I want to see
any changes, I analyze these two buffers and note where they differ.
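
Roughly like this row-based scan, in other words (the Rect struct is
just for illustration; the real code could also trim unchanged pixels
at the row ends):

#include <string.h>

#define FB_W 160
#define FB_H 300

typedef struct { int x, y, w, h; } Rect;

/* Compare front and back index buffers row by row and emit one
   full-width rect per run of changed rows. */
static int diff_buffers(const unsigned char *front,
                        const unsigned char *back,
                        Rect *out, int max_rects)
{
    int y = 0, n = 0;

    while (y < FB_H && n < max_rects) {
        if (memcmp(front + y * FB_W, back + y * FB_W, FB_W) == 0) {
            ++y;                          /* row unchanged */
            continue;
        }
        out[n].x = 0;
        out[n].y = y;                     /* start of a changed run */
        while (y < FB_H &&
               memcmp(front + y * FB_W, back + y * FB_W, FB_W) != 0)
            ++y;
        out[n].w = FB_W;
        out[n].h = y - out[n].y;
        ++n;
    }
    return n;
}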

Machines like the Atari ST and some 8-bit machines that lack sprites
and (ab)usable text modes are problematic, as the games usually do
most rendering in software. You may not even be able to use dirty
rects without significant overhead. (You’d have to trap all VRAM
accesses the emulated CPU does, or do some brute force test on the
frame buffer after each frame.)

If you’re emulating a character generator + sprite based display, try
dropping the software rendering and doing it all in OpenGL.
Characters and sprites would be textures and normal rendering would
be done using “pure” OpenGL. The textures would become procedural
only if the emulated game/application keeps messing with the graphics
data.

Note that hardware scrolling of “true” graphic planes can be
implemented in pretty much the same way as on the real hardware. Make
the textures correspond to the VRAM; not the display. That way, you
can have the 3D accelerator emulate the plane + sprite combining
hardware, instead of emulating the whole thing pixel by pixel in
software.

I understand (some!) of what you’re saying above, but I think that’s at a
lower level than I can go. This is a multi-platform emulator, and I
can’t change the core too much. So basically, to reiterate my original
question, I only have access to a 2D framebuffer of pixel values. At
this level, there is no concept of a sprite or anything like that; it’s
just raw data. What would be the fastest way to send that data to
OpenGL?

I agree that a pure OpenGL solution at a lower level would rock, and
that’s something I may look into at some point. But it could never be
integrated back into the core code, since it would mean that OpenGL is
required. This emulator has run on everything from a Zaurus to a
mainframe, and most of the targets have no OpenGL capability.

Thanks,
Steve

On November 21, 2003 07:51 am, David Olofson wrote:

[…]

Machines like the Atari ST and some 8-bit machines that lack
sprites and (ab)usable text modes are problematic, as the games
usually do most rendering in software. You may not even be able
to use dirty rects without significant overhead. (You’d have to
trap all VRAM accesses the emulated CPU does, or do some brute
force test on the frame buffer after each frame.)

If you’re emulating a character generator + sprite based display,
try dropping the software rendering and doing it all in OpenGL.
Characters and sprites would be textures and normal rendering
would be done using “pure” OpenGL. The textures would become
procedural only if the emulated game/application keeps messing
with the graphics data.

Note that hardware scrolling of “true” graphic planes can be
implemented in pretty much the same way as on the real hardware.
Make the textures correspond to the VRAM; not the display. That
way, you can have the 3D accelerator emulate the plane + sprite
combining hardware, instead of emulating the whole thing pixel by
pixel in software.

I understand (some!) of what you’re saying above, but I think
that’s at a lower level than I can go.

Yeah, that goes for most emulators, AFAIK.

This is a multi-platform
emulator, and I can’t change the core too much. So basically, to
reiterate my original question, I only have access to a 2D
framebuffer of pixel values. At this level, there is no concept of
a sprite or anything like that; it’s just raw data. What would be
the fastest way to send that data to OpenGL?

You could try something like microtile arrays. Use a fixed grid when
checking the emulator buffers, and just mark the tiles that need
updating. Then the hard part: analyze the result and merge contiguous
areas into a minimal set of rectangles. (See the sketch below.)

Note that if it’s common to get complex patterns of small changes, it
might be better to update a bigger area than to have more rects. The
number of rects where overhead becomes bigger than the transfer cost
is highly hardware and driver dependent, of course, but I think it
would be possible to come up with tile sizes and/or "fuzziness factors"
that work OK on most hardware of interest.
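
A rough sketch of the grid part (tile size pulled out of thin air;
this merges runs of tiles per row only, and merging vertically as
well would be the obvious next refinement):

#define TILE    8
#define GRID_W  (320 / TILE)
#define GRID_H  (200 / TILE)

static unsigned char tile_dirty[GRID_H][GRID_W];

typedef struct { int x, y, w, h; } Rect;

/* Flag the tile covering a written pixel. */
static void mark_pixel_dirty(int px, int py)
{
    tile_dirty[py / TILE][px / TILE] = 1;
}

/* Turn horizontal runs of dirty tiles into update rects. */
static int collect_rects(Rect *out, int max_rects)
{
    int n = 0, tx, ty;

    for (ty = 0; ty < GRID_H; ++ty) {
        for (tx = 0; tx < GRID_W && n < max_rects; ) {
            int x0;
            if (!tile_dirty[ty][tx]) { ++tx; continue; }
            x0 = tx;
            while (tx < GRID_W && tile_dirty[ty][tx])
                tile_dirty[ty][tx++] = 0;      /* consume the run */
            out[n].x = x0 * TILE;
            out[n].y = ty * TILE;
            out[n].w = (tx - x0) * TILE;
            out[n].h = TILE;
            ++n;
        }
    }
    return n;
}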

I agree that a pure OpenGL solution at a lower level would rock,
and that’s something I may look into at some point. But it could
never be integrated back into the core code, since it would mean
that OpenGL is required. This emulator has run on everything
from a Zaurus to a mainframe, and most of the targets have no
OpenGL capability.

Actually, what I described doesn’t have to be implemented in OpenGL.
It would work just fine with any 2D API (such as the ones supported
by SDL), h/w accelerated or not. So, if you can get the core to talk
about surfaces, tiled maps (for character modes), sprites and stuff,
you could move the actual rendering into rendering backends. The s/w
rendering backend that is currently integrated into the core would
just be another backend, possibly used as an intermediate backend for
use with various direct rendering drivers. (Anything that’s better off
just blitting the full screen, basically.)
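
In C terms, the split might look something like this (all names
invented):

/* The core talks to an abstract backend; each target (OpenGL, SDL 2D,
   plain software) fills in its own implementation. */
typedef struct RenderBackend {
    void (*fill_rect)(int x, int y, int w, int h, unsigned long color);
    void (*blit)(const void *pixels, int x, int y, int w, int h);
    void (*present)(void);            /* flush dirty areas to screen */
} RenderBackend;

/* A do-nothing software backend, just to show the shape of it. */
static void sw_fill_rect(int x, int y, int w, int h, unsigned long c)
{ (void)x; (void)y; (void)w; (void)h; (void)c; /* draw into s/w buffer */ }
static void sw_blit(const void *p, int x, int y, int w, int h)
{ (void)p; (void)x; (void)y; (void)w; (void)h; }
static void sw_present(void) { /* blit dirty areas to the display */ }

static const RenderBackend software_backend =
    { sw_fill_rect, sw_blit, sw_present };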

The problem is “procedural surfaces”. If the emulator exposes internal
graphics data, weird pixel formats may cause major overhead when
running code that does a lot of s/w rendering into sprites,
characters or graphic display buffers. Conversion has to be done
somewhere, but if it’s done early, it can be hard to avoid a sprite
being changed and converted multiple times per frame, and stuff like
that.
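
One way around it, sketched here with invented names, is to keep the
emulated-format data authoritative and convert lazily, only when a
sprite is actually used for rendering; that way any number of writes
per frame costs at most one conversion:

/* 8x8 sprite kept in the emulated format; display-format pixels are
   produced on demand. */
typedef struct Sprite {
    unsigned char native[8 * 8];     /* emulated color indices */
    unsigned long converted[8 * 8];  /* display-format pixels */
    int dirty;                       /* set on every emulated write */
} Sprite;

static unsigned long clut[256];      /* index -> display pixel */

static const unsigned long *sprite_pixels(Sprite *s)
{
    if (s->dirty) {
        int i;
        for (i = 0; i < 8 * 8; ++i)
            s->converted[i] = clut[s->native[i]];
        s->dirty = 0;
    }
    return s->converted;
}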

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
--- http://olofson.net --- http://www.reologica.se ---

Sorry, but I keep seeing this technique referred to, but have no idea
what a ’dirty update’ actually is.
Thanks,
Jason

It means drawing on the screen only what has changed since the last
frame, thereby saving much time.
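
In SDL 1.2 terms, it is the difference between flipping the whole
screen and pushing only the changed areas (the function here is just
illustrative):

#include <SDL.h>

/* After redrawing only the changed parts of `screen`, push just
   those areas instead of flipping everything. */
static void present_dirty(SDL_Surface *screen, SDL_Rect *rects, int n)
{
    SDL_UpdateRects(screen, n, rects);   /* vs. SDL_Flip(screen) */
}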

Lic. Gabriel Gambetta
ARTech - GeneXus Development Team
ggambett at artech.com.uy