Help to increase performance in SDL_ttf marquee program

It also doesn’t work at all if you have any other layers that scroll at
a different speed or don’t scroll at all. Which is almost always the
case with modern games.

Any solution that doesn’t redraw the entire screen every frame is too
limited to be of any practical use, IMO.

On 4/2/2010 09:17, David Olofson wrote:

On Friday 02 April 2010, at 16.14.25, Andreas Schiffler wrote:
[…]

  • when you scroll, perform a screen-to-screen blit to move the existing
    (i.e. already-rendered) text; then draw only the fragment that is new;
    this should just be a vertical slice of one of the pre-rendered
    character surfaces

Be warned though, that this can be extremely slow unless you have hardware
acceleration, or “real” shared video memory.
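For illustration only, here is a minimal sketch of the "move what is already rendered, then draw only the new slice" idea. It works on a plain 32-bit pixel buffer standing in for the screen, not on an SDL surface; `marquee_step` and all of its parameters are hypothetical names, and the pre-rendered text strip is assumed to be as tall as the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of SDL): shift every row of a w x h buffer
 * of 32-bit pixels left by dx pixels, then fill the freed strip on the
 * right from a pre-rendered text strip `src` (src_w pixels wide, h rows),
 * starting at column src_x. Columns outside the strip get colour `bg`. */
static void marquee_step(uint32_t *dst, int w, int h,
                         const uint32_t *src, int src_w, int src_x,
                         int dx, uint32_t bg)
{
    if (dx <= 0 || w <= 0 || h <= 0)
        return;
    if (dx > w)
        dx = w;
    for (int y = 0; y < h; y++) {
        uint32_t *row = dst + (size_t)y * w;
        /* Move the already-rendered pixels; memmove copes with the overlap. */
        memmove(row, row + dx, (size_t)(w - dx) * sizeof *row);
        /* Draw only the newly exposed vertical slice. */
        for (int i = 0; i < dx; i++) {
            int sx = src_x + i;
            row[w - dx + i] = (sx >= 0 && sx < src_w)
                            ? src[(size_t)y * src_w + sx] : bg;
        }
    }
}
```

Calling it once per frame with `src_x` advanced by `dx` scrolls the text. The warning above still applies: if `dst` is real video memory, the read half of the move is the slow part.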


Rainer Deyke - rainerd at eldwood.com

[…]

Any solution that doesn’t redraw the entire screen every frame is too
limited to be of any practical use, IMO.

I would agree about anything that basically animates most of the screen most
of the time. Blits from the screen are nearly always very slow, and there is
generally very little to gain from tricks like these.

However, if you consider something like Fixed Rate Pig (still backgrounds + a
bunch of animated sprites, text etc), the difference between brute force
redraw and smart updates is several orders of magnitude. IMHO, it’s
just ridiculous to require multi GHz CPUs or OpenGL acceleration to get a game
like that to run at 60+ fps, when it can do thousands of fps on pretty much
any old PC that’s still in working order…

It even applies to a “nearly full screen scroller” like Kobo Deluxe, where
actually updating only the playfield improves the frame rate by about 50%,
compared to repainting the whole screen. Area is a tricky thing… Even a
small “border” that you don’t have to update makes a big difference.
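To put a number on "area is a tricky thing", here is a small sketch that compares the pixels touched by a set of dirty rectangles with a full-screen redraw. The `Rect` type and the function names are made up for this example (it is not `SDL_Rect`), and overlapping rectangles are simply counted twice, so the result is an upper bound.

```c
#include <stddef.h>

/* Hypothetical rectangle type for this sketch; not SDL_Rect. */
typedef struct { int x, y, w, h; } Rect;

static long rect_area(Rect r)
{
    return (r.w > 0 && r.h > 0) ? (long)r.w * r.h : 0;
}

/* Total pixels covered by the dirty rectangles. Overlaps are counted
 * twice, so this is an upper bound on the pixels actually redrawn. */
static long dirty_area(const Rect *rects, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += rect_area(rects[i]);
    return total;
}

/* Fraction of a full-screen redraw that the dirty rectangles represent. */
static double update_fraction(const Rect *rects, size_t n,
                              int screen_w, int screen_h)
{
    long full = (long)screen_w * screen_h;
    return full > 0 ? (double)dirty_area(rects, n) / (double)full : 0.0;
}
```

For one 32x32 sprite on a 640x480 screen (old position plus new position), that is 2048 pixels out of 307200, well under 1% of a full redraw.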

Oh, and then there are GUI toolkits… There is just no defending brute force
full screen refreshes there, I think, unless you’re doing some rapid
prototyping on very fast hardware, maybe.

On Friday 02 April 2010, at 22.35.02, Rainer Deyke wrote:


//David Olofson - Developer, Artist, Open Source Advocate

.— Games, examples, libraries, scripting, sound, music, graphics —.
| http://olofson.net http://kobodeluxe.com http://audiality.org |
| http://eel.olofson.net http://zeespace.net http://reologica.se |
’---------------------------------------------------------------------’

However, if you consider something like Fixed Rate Pig (still backgrounds + a
bunch of animated sprites, text etc), the difference between brute force
redraw and smart updates is several orders of magnitude.

If you have such a game, maybe. But the moment you decide to add a
scrolling animated alpha-blended fog layer to one of the levels, all of
that hard optimization work goes down the drain.

IMHO, it’s
just ridiculous to require multi GHz CPUs or OpenGL acceleration to get a game
like that to run at 60+ fps, when it can do thousands of fps on pretty much
any old PC that’s still in working order…

IMO, it’s just ridiculous that you would need a multi GHz CPU in order
to get 60+ fps when redrawing the whole screen each frame. I remember
getting that kind of frame rate on my old 486, and that was with
redrawing the screen every frame.

(As for OpenGL acceleration, pretty much any old PC that’s still in
working order has the necessary hardware. If only they all had working
drivers…)

On 4/2/2010 17:56, David Olofson wrote:


Rainer Deyke - rainerd at eldwood.com

However, if you consider something like Fixed Rate Pig (still backgrounds + a
bunch of animated sprites, text etc), the difference between brute force
redraw and smart updates is several orders of magnitude.

If you have such a game, maybe. But the moment you decide to add a
scrolling animated alpha-blended fog layer to one of the levels, all of
that hard optimization work goes down the drain.

Of course - but you have to keep the design within the restrictions of
whatever target platforms you decide on.

It’s not that different from the 8/16 bit days; “target platforms” are just
a lot more diffuse, and apparently most developers have since decided that 20
fps is a good playable frame rate. :-D

IMHO, it’s
just ridiculous to require multi GHz CPUs or OpenGL acceleration to get a
game like that to run at 60+ fps, when it can do thousands of fps on
pretty much any old PC that’s still in working order…

IMO, it’s just ridiculous that you would need a multi GHz CPU in order
to get 60+ fps when redrawing the whole screen each frame. I remember
getting that kind of frame rate on my old 486, and that was with
redrawing the screen every frame.

(As for OpenGL acceleration, pretty much any old PC that’s still in
working order has the necessary hardware. If only they all had working
drivers…)

Indeed. One would think that, as unreliable and hairy as the hardware acceleration
deal is, software rendering should be a sensible, viable option - but no. :-/

I guess it’s either adding support for that other 3D API, or focusing on Mac OS
X. (Seems like there are more Macs out there than there are PCs with working
OpenGL.)

Anyway, way off topic here, and I believe this has been beaten to death a few
times already. :-)

On Saturday 03 April 2010, at 05.36.09, Rainer Deyke wrote:

On 4/2/2010 17:56, David Olofson wrote:


//David Olofson - Developer, Artist, Open Source Advocate

On Wednesday 31 March 2010, at 21.11.26, Ricardo Leite wrote:

[…]

int w=1280; // Width of window
int h=300;  // Height of window and font

[…]

Just thinking, is that resolution really necessary? That’s an awful lot of
pixels to shuffle in software. (Not really actually, but PCs are just not
built for software rendering these days…)

If you use fullscreen mode, you should be able to switch to a lower
resolution, thus covering the area with fewer pixels. For a marquee with a
huge font, 640x480 (updating only 640x150) seems more than adequate - and that
would give you about four times the frame rate right away!
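The "about four times" figure is just the ratio of pixel counts. A tiny sketch of that arithmetic follows; `pixels` is a hypothetical helper, and the estimate assumes the frame rate is limited purely by how many pixels get pushed per frame.

```c
/* Hypothetical helper: number of pixels in a w x h region. */
static long pixels(int w, int h)
{
    return (long)w * h;
}

/* Ratio of pixels moved per frame between two update regions. If the
 * frame rate is bound by pixel throughput alone (an assumption), this is
 * also the expected speed-up. */
static double speedup(int old_w, int old_h, int new_w, int new_h)
{
    return (double)pixels(old_w, old_h) / (double)pixels(new_w, new_h);
}
```

Here `speedup(1280, 300, 640, 150)` is 384000 / 96000, i.e. 4.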


//David Olofson - Developer, Artist, Open Source Advocate

Fellows, in SDL fonts I found this:

/*
 * Set up a blit between two surfaces -- split into three parts:
 * The upper part, SDL_UpperBlit(), performs clipping and rectangle
 * verification.  The lower part is a pointer to a low level
 * accelerated blitting function.
 *
 * These parts are separated out and each used internally by this
 * library in the optimum places.  They are exported so that if
 * you know exactly what you are doing, you can optimize your code
 * by calling the one(s) you need.
 */
int SDL_LowerBlit (SDL_Surface *src, SDL_Rect *srcrect,
                   SDL_Surface *dst, SDL_Rect *dstrect)
{
(…)

IMHO: SDL_LowerBlit() is fast, but the parameters must already be clean
(clipped and validated by the caller).

Another idea: my hardware is fixed (Atom, i386 compatible), so the
SDL_BlitSurface() function could be rewritten in assembly.

And I found this in
http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html :

#define mov_blk(src, dest, numwords) \
    asm volatile ( \
        "cld\n\t" \
        "rep\n\t" \
        "movsl" \
        : \
        : "S" (src), "D" (dest), "c" (numwords) \
        : "%ecx", "%esi", "%edi" \
    )


Fellows, in SDL fonts I found this:

/*
 * Set up a blit between two surfaces -- split into three parts:
 * The upper part, SDL_UpperBlit(), performs clipping and rectangle
 * verification.  The lower part is a pointer to a low level
 * accelerated blitting function.
 *
 * These parts are separated out and each used internally by this
 * library in the optimum places.  They are exported so that if
 * you know exactly what you are doing, you can optimize your code
 * by calling the one(s) you need.
 */
int SDL_LowerBlit (SDL_Surface *src, SDL_Rect *srcrect,
                   SDL_Surface *dst, SDL_Rect *dstrect)
{
(…)

IMHO: SDL_LowerBlit() is fast, but the parameters must already be clean
(clipped and validated by the caller).

This is about eliminating the tiny overhead of clipping. It’s only relevant if
you’re doing thousands of blits per frame.
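For readers wondering what "the tiny overhead of clipping" amounts to, here is a rough sketch of rectangle clipping of the kind an upper blit layer has to do before handing off to the low-level copy. It is not SDL's actual code; `Rect` and `clip_rect` are hypothetical names for this example.

```c
/* Hypothetical rectangle type for this sketch; not SDL_Rect. */
typedef struct { int x, y, w, h; } Rect;

/* Clip *r against `bounds` in place. Returns 1 if anything is left to
 * blit, 0 if the rectangle lies completely outside the bounds. */
static int clip_rect(Rect *r, Rect bounds)
{
    int x0 = r->x > bounds.x ? r->x : bounds.x;
    int y0 = r->y > bounds.y ? r->y : bounds.y;
    int x1 = (r->x + r->w < bounds.x + bounds.w) ? r->x + r->w
                                                 : bounds.x + bounds.w;
    int y1 = (r->y + r->h < bounds.y + bounds.h) ? r->y + r->h
                                                 : bounds.y + bounds.h;
    if (x1 <= x0 || y1 <= y0) {
        r->w = r->h = 0;
        return 0;
    }
    r->x = x0;
    r->y = y0;
    r->w = x1 - x0;
    r->h = y1 - y0;
    return 1;
}
```

That is a handful of compares per blit, which is why skipping it only pays off when the blit count per frame is very high.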

Another idea: my hardware is fixed (Atom, i386 compatible), so the
SDL_BlitSurface() function could be rewritten in assembly.

Memory block copying (scanlines of a surface blit) is trivial for the
compiler to optimize, so I doubt you’ll improve the bandwidth by using asm.
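As a point of comparison with the asm macro, a plain C scanline copy might look like the sketch below. `copy_rows` and its parameters are hypothetical; it assumes both buffers use the same pixel format and that each has its own pitch (bytes per row), as surfaces typically do.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `rows` scanlines of `row_bytes` bytes each
 * from src to dst. The pitches are the distances in bytes between the
 * starts of consecutive rows and may be larger than row_bytes. */
static void copy_rows(uint8_t *dst, size_t dst_pitch,
                      const uint8_t *src, size_t src_pitch,
                      size_t row_bytes, int rows)
{
    for (int y = 0; y < rows; y++)
        memcpy(dst + (size_t)y * dst_pitch,
               src + (size_t)y * src_pitch,
               row_bytes);
}
```

The C library's `memcpy` is normally already tuned for the target CPU, so a hand-written `rep movsl` loop is unlikely to beat it by much.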

The real problem here is that PC hardware, since the days of Pentium or so, is
not designed for software rendering. The expansion slot bus (ISA, VLB, PCI,
AGP, etc…) forms a serious bottleneck between the CPU and the VRAM, as the
chipsets are really designed to use those busses for DMA.

Now, you might think that an integrated shared memory video solution would
eliminate this problem, making VRAM as fast as normal RAM, but no! It seems
like most of the time, the driver will point the CPU at the area where the
video chip maps “its” VRAM (which is the only way to access it “directly” with
a non-integrated video card), rather than directly to the physical RAM. Thus,
you’ll have both the bottleneck of the bus, and the slow-down caused by the
video chip fighting the CPU for RAM access while forwarding those "VRAM"
accesses.

You may be able to hack the driver to get it to tell you where the “VRAM
window” is, but that memory might be banked or interleaved in strange ways
that you can’t see when accessing through the video chip… Again, this is one
of those things that might be worth checking out if you’re coding for a
specific device, but this will be even more non-portable than hardware
scrolling.

//David Olofson - Developer, Artist, Open Source Advocate

On Monday 05 April 2010, at 16.12.41, Ricardo Leite wrote:

David Olofson wrote:

The real problem here is that PC hardware, since the days of Pentium or so, is
not designed for software rendering. The expansion slot bus (ISA, VLB, PCI,
AGP, etc…) forms a serious bottleneck between the CPU and the VRAM, as the
chipsets are really designed to use those busses for DMA.

Thanks, your message confirms what I read in the pouët.net topic “Is SDL slow or is it my code that sucks?” (http://www.pouet.net/topic.php?which=2625):

PCI has a bandwidth of 100 MB/s. So in 1024x768x32 the max. framerate is 32. In 640x480x32 it is 81. This somewhat corresponds to your numbers.

I just wonder why AGP isn’t used. AGP 1x has 0.5 GB/s (if I’m right). That would give you 159 FPS in 1024x768. Don’t ask me why this isn’t the case, maybe the back buffer can’t be accessed through AGP? I have no idea.

I’d go for a hardware accelerated backend anyway, as it will give you stretching, alpha-blending, bilinear filtering and what not for free, plus a dynamic texture that (hopefully) can be accessed through AGP.

Another example: in 1280x1024x32 you blit 5242880 bytes (5 MB) per frame, so with such a bandwidth the maximum you get is about 20 FPS. It seems surreal these days, because I thought the SDL_Surface & SDL_HWSURFACE combo would give me fast blits, but when the hardware is limited, the only thing you get is snail-paced acceleration.
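The figures above all come from the same division, bus bandwidth over bytes per frame. Here is a small sketch of that calculation; `max_fps` is a hypothetical helper, and the bandwidth values are the ones quoted in this thread, not measured or official numbers.

```c
/* Hypothetical helper: upper bound on full-frame updates per second when
 * every frame must cross a bus with the given bandwidth (bytes/second). */
static double max_fps(double bytes_per_sec, int w, int h, int bytes_per_pixel)
{
    double frame_bytes = (double)w * h * bytes_per_pixel;
    return bytes_per_sec / frame_bytes;
}
```

With 100 MB/s taken as 100e6 bytes/s, 1024x768x32 comes out at about 31.8 fps and 640x480x32 at about 81.4 fps; with 0.5 GB/s, 1024x768x32 is about 158.9 fps. For 1280x1024x32 the result is about 19.1 fps, or exactly 20 if "100 MB/s" is read as 100 * 1024 * 1024 bytes/s.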

Welcome to the world of tomorrow (Futurama) !

Note: Sorry for reviving this zombie topic, but I thought my contribution would help other coders save some precious time.