Blit speed problem

I wrote a very simple "paint" application. It runs slowly in bigger windows,
because it must blit the background to the main surface every frame. I have
seen other drawing apps and they work much faster. Can you tell me what I am
doing wrong? I create the background surface with SDL_HWSURFACE, so it should
blit fast…

Source code is here:
http://jpopl.w.interia.pl/SimpleSDLPaint.tar.gz

Thanks in advance.

--
Every time I climb the mountain
And it turned into a hill
I promised me that I’d move on
And I will.
“Strange Highways” - Ronnie James Dio

I wrote a very simple "paint" application. It runs slowly in bigger
windows, because it must blit the background to the main surface every
frame. I have seen other drawing apps and they work much faster. Can you
tell me what I am doing wrong?

I haven’t checked your code, but I’d guess from the wording that you’re
updating the whole window every time. If you’re painting with a tool
(instead of filling the whole image), you should update only the area of
the image that has changed. Commercial image manipulation programs use XOR
fills and whatnot to increase rendering speed.

- Mikko

I wrote a very simple "paint" application. It runs slowly in bigger
windows, because it must blit the background to the main surface every
frame. I have seen other drawing apps and they work much faster. Can you
tell me what I am doing wrong?

You’re updating the whole screen, rather than just the changed areas.
Either you’re doing that explicitly (by blitting with SDL_BlitSurface())
or implicitly, by flipping a double buffered screen surface that actually
is a “fake” double buffered surface, with a software back buffer. (That’s
what you’ll get in windowed mode on all current targets, with the
possible exceptions of Mac OS X and some high end X servers.)
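To make "just the changed areas" concrete, here is a minimal sketch of how the dirty rectangle for one line segment could be computed. It is plain C with no SDL dependency; `DirtyRect` and `line_dirty_rect` are hypothetical names, not part of SDL, and the `pad` parameter is an assumed brush radius.

```c
#include <assert.h>

/* Hypothetical rectangle type; the fields mirror SDL_Rect (x, y, w, h). */
typedef struct { int x, y, w, h; } DirtyRect;

static int min_i(int a, int b) { return a < b ? a : b; }
static int max_i(int a, int b) { return a > b ? a : b; }

/*
 * Bounding box of the segment (x0,y0)-(x1,y1), grown by `pad` pixels on
 * every side (e.g. the brush radius) and clipped to a surf_w x surf_h
 * surface. Returns a rectangle with w == 0 and h == 0 if nothing is visible.
 */
DirtyRect line_dirty_rect(int x0, int y0, int x1, int y1,
                          int pad, int surf_w, int surf_h)
{
    assert(pad >= 0 && surf_w > 0 && surf_h > 0);

    int left   = max_i(min_i(x0, x1) - pad, 0);
    int top    = max_i(min_i(y0, y1) - pad, 0);
    int right  = min_i(max_i(x0, x1) + pad + 1, surf_w);  /* exclusive */
    int bottom = min_i(max_i(y0, y1) + pad + 1, surf_h);  /* exclusive */

    DirtyRect r = { 0, 0, 0, 0 };
    if (right > left && bottom > top) {
        r.x = left;
        r.y = top;
        r.w = right - left;
        r.h = bottom - top;
    }
    return r;
}
```

With SDL 1.2 the result would typically be used to blit only that rectangle from the back buffer and then call SDL_UpdateRect(screen, r.x, r.y, r.w, r.h). An empty rectangle should be skipped rather than passed on, because SDL_UpdateRect treats all-zero arguments as "update the entire screen".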

I create the background surface with SDL_HWSURFACE, so it
should blit fast…

It’s not that simple, unfortunately. In fact (if you actually get a
hardware surface - do check that!), blitting will be slower in all
cases except when you’re blitting from other hardware surfaces and
you’re doing it on a target that accelerates h/w->h/w blits.

Source code is here:
http://jpopl.w.interia.pl/SimpleSDLPaint.tar.gz

I’ll have a look…

//David Olofson — Programmer, Reologica Instruments AB

.- M A I A -------------------------------------------------.
| Multimedia Application Integration Architecture           |
| A Free/Open Source Plugin API for Professional Multimedia |
`----------------------------> http://www.linuxdj.com/maia -'
.- David Olofson -------------------------------------------.
| Audio Hacker - Open Source Advocate - Singer - Songwriter |
`-------------------------------------> http://olofson.net -'

On Tuesday 06 November 2001 15:20, Jacek Popławski wrote:

Well… When I draw a line, I must update the rectangle around that line. So
drawing short lines will be fast, but long lines will slow down a lot.
But in Gimp it’s fast like hell!

On Tue, Nov 06, 2001 at 05:29:07PM +0200, Mikko Rantalainen wrote:

I haven’t checked your code, but I’d guess from the wording you’re
updating whole window every time.


Don’t talk to strangers
’Cause they’re only there to make you sad
Don’t dream of women
’Cause they’ll only bring you down
“Don’t Talk To Strangers” - Ronnie James Dio

Back again.

On Tuesday 06 November 2001 22:56, David Olofson wrote:

On Tuesday 06 November 2001 15:20, Jacek Popławski wrote:

Source code is here:
http://jpopl.w.interia.pl/SimpleSDLPaint.tar.gz

I’ll have a look…

Yeah, you’re doing a few things “wrong” here:

* Your "back" surface is not guaranteed to have the same pixel
  format as the display surface. This means that wherever the
  surfaces are located, software blitting + conversion is
  required. Very few 2D APIs - if any - support accelerated
  on-the-fly conversion.

* You're working as if you were able to actually access the
  hardware video surface, while this is not possible in windowed
  mode, and not possible at all on some targets.

* Blitting *from* VRAM to a "back" surface - very, very slow
  in all cases, except when you actually get 1) a real VRAM
  surface *and* 2) accelerated VRAM->VRAM blits. You won't get
  either on most targets, and you certainly won't get acceleration
  if the source and destination pixel formats differ.

* Blitting the whole back buffer to the screen to remove the
  "tracking effect" while a mouse button is held down. Blit only
  the changed areas, or even faster, use an XOR "rubberband
  effect", which can be negated by simply drawing the same thing
  again at the same location. (Note however, that you should
  *not* do that with large filled areas directly in VRAM, as it
  requires reading VRAM. Blitting the whole bounding rect from
  a back buffer will almost certainly be faster in such cases.)

* Blitting the whole screen after each operation is done.
  Although this doesn't happen too often, it's still very slow
  if it's unaccelerated. Blit only the changed areas.
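The XOR "rubberband" idea from the list above can be sketched as follows. This is an illustration only, not code from the application under discussion: it works on a plain 32-bit pixel array in system memory (so it reads and writes ordinary RAM, not VRAM), `xor_line` is a hypothetical name, and the pitch is assumed to equal the width.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * XOR every pixel on the line (x0,y0)-(x1,y1) with `mask`, using
 * Bresenham's algorithm. `pixels` is a w x h array of 32-bit pixels in
 * system memory; the pitch is assumed to be exactly w pixels.
 * Both endpoints must lie inside the buffer (no clipping is done here).
 */
void xor_line(uint32_t *pixels, int w, int h,
              int x0, int y0, int x1, int y1, uint32_t mask)
{
    assert(x0 >= 0 && x0 < w && y0 >= 0 && y0 < h);
    assert(x1 >= 0 && x1 < w && y1 >= 0 && y1 < h);

    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;

    for (;;) {
        pixels[y0 * w + x0] ^= mask;   /* each pixel is visited exactly once */
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```

Calling xor_line() a second time with the same arguments restores the buffer exactly, which is what makes the preview cheap to erase. Only the pixels on the line are touched, so the dirty area to push to the screen is still just the line's bounding rectangle.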

Hope that explains what you’re seeing. :-)

//David Olofson — Programmer, Reologica Instruments AB


Jacek Popławski wrote:

On Tue, Nov 06, 2001 at 05:29:07PM +0200, Mikko Rantalainen wrote:

I haven’t checked your code, but I’d guess from the wording you’re
updating whole window every time.

Well… When I draw line, I must update rectangle around that line. So drawing
short lines will be fast, but long lines will slow down a lot.
But in Gimp it’s fast like hell!

(Very new to SDL.) If you are talking about drawing a line while its
other end isn’t fixed yet, you could try XOR drawing (if that’s possible
in SDL; it should be, because it’s such a basic graphics operation). You
can erase an XOR-drawn line simply by XOR-drawing it again at the same
position.

I haven’t checked your code, but I’d guess from the wording you’re
updating whole window every time.

Well… When I draw line, I must update rectangle around that line. So
drawing short lines will be fast, but long lines will slow down a lot.

You don’t really have to update a full rectangle, although that’s
certainly the easiest way… :-)

But in Gimp it’s fast like hell!

Yeah, because it’s doing stuff the way I suggested. Besides, as GIMP is
an image manipulation program, rather than a traditional “paint program”,
it has very few tools that affect more than the brush area when you move
the cursor around.

Also note that GIMP is not all that fast if you try moving a large brush
around the window…

//David Olofson — Programmer, Reologica Instruments AB

On Tuesday 06 November 2001 23:15, Jacek Popławski wrote:

On Tue, Nov 06, 2001 at 05:29:07PM +0200, Mikko Rantalainen wrote:

Jacek Popławski wrote:

I haven’t checked your code, but I’d guess from the wording you’re
updating whole window every time.

Well… When I draw line, I must update rectangle around that line. So
drawing short lines will be fast, but long lines will slow down a
lot. But in Gimp it’s fast like hell!

(Very newbie to SDL) If you are talking about line drawing while the
end of line for the other end isn’t fixed yet, you could try
XOR drawing (if that’s possible in SDL. It should, because it’s so
basic GFX thing).

Actually, XOR rendering is a basic thing for 2D rendering libraries - and
SDL is not a 2D rendering library, but a hardware abstraction layer. That
is, no, SDL does not support XOR rendering.

(And as to whether or not it should support XOR blitting, the only
machine I’ve programmed that actually had both a usable fullscreen
rendering API and hardware with accelerated XOR blitting was the Amiga.
Sure, Win32 GDI + drivers support it as well, but to no avail - GDI is
pretty useless on DirectX surfaces.)

//David Olofson — Programmer, Reologica Instruments AB

On Tuesday 06 November 2001 23:42, Sami Näätänen wrote:

On Tue, Nov 06, 2001 at 05:29:07PM +0200, Mikko Rantalainen wrote:

David Olofson wrote:

Actually, XOR rendering is a basic thing for 2D rendering libraries - and
SDL is not a 2D rendering library, but a hardware abstraction layer. That
is, no, SDL does not support XOR rendering.

(And as to whether or not it should support XOR blitting, the only
machine I’ve programmed that actually had both a usable fullscreen
rendering API and hardware with accelerated XOR blitting was the Amiga.
Sure, Win32 GDI + drivers support it as well, but to no avail - GDI is
pretty useless on DirectX surfaces.)

There is too much Amiga in my mind here… ;-)
I was thinking only of line drawing, and forgot that these damn PCs have
blitters too ;-)
Not as good as the one the Amiga had in her time, but still a blitter. ;-)

  • Your “back” surface is not guaranteed to have the same pixel
    format as the display surface.

I give the same bpp to SDL_SetVideoMode and SDL_CreateRGBSurface. Is it possible
that the surface created with SDL_SetVideoMode has a different bpp? Or do you mean
SDL "emulating" a different depth (when I set bpp=24, but XFree86 is running
at Depth=16)?

  • Blitting from VRAM to a “back” surface

It doesn’t happen very often in my app, only after a “change”.
What is the other way? Should I have 3 surfaces instead of 2?
Then when I draw, I would change a buffer in RAM and only blit it to VRAM (like
in the good old DOS & asm times…).

Hope that explains what you’re seeing. :-)

Thanks a lot for the help! I am still not sure what to change (except not
blitting the whole area), but I will play with it now…

On Tue, Nov 06, 2001 at 11:34:28PM +0100, David Olofson wrote:



* Your "back" surface is not guaranteed to have the same pixel
format as the display surface.

I give the same bpp to SDL_SetVideoMode and SDL_CreateRGBSurface. Is it
possible that the surface created with SDL_SetVideoMode has a different
bpp? Or do you mean SDL “emulating” a different depth (when I set bpp=24,
but XFree86 is running at Depth=16)?

If you request an unsupported format, you’ll get a software "shadow"
surface - and of course, h/w acceleration is gone, and the surface is in
system RAM.

Either way, there’s no guarantee that the screen surface will have the
same pixel format despite the bpp arguments being the same! The display
surface may be RGB or BGR, for example - both 24 bits, but still
incompatible, which results in software blitting even if both surfaces
are in VRAM and h/w acceleration is present.
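A small sketch of why "same bpp" is not the same as "same pixel format". The `FormatDesc` struct is a hypothetical stand-in that copies only the mask and depth fields found in SDL_PixelFormat; palettized formats are ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the SDL_PixelFormat fields that matter here. */
typedef struct {
    int      bits_per_pixel;
    uint32_t rmask, gmask, bmask, amask;
} FormatDesc;

/* Returns 1 only if a pixel can be copied between the two formats unchanged. */
int formats_match(const FormatDesc *a, const FormatDesc *b)
{
    assert(a != 0 && b != 0);
    return a->bits_per_pixel == b->bits_per_pixel &&
           a->rmask == b->rmask && a->gmask == b->gmask &&
           a->bmask == b->bmask && a->amask == b->amask;
}
```

A 24-bit RGB layout (red mask 0xFF0000) and a 24-bit BGR layout (red mask 0x0000FF) fail this check even though both report 24 bits per pixel, so every blit between them needs a per-pixel conversion. In SDL 1.2, SDL_DisplayFormat() is the usual way to get a copy of a surface converted to the display's format.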

* Blitting *from* VRAM to a "back" surface

It’s not very often in my app - only after “change”.
What is other way? Should I have 3 surfaces instead 2?
So when I draw I will change buffer in RAM and only blit it to VRAM

Why not just perform the rendering operation on the “picture” buffer when
the button is released? I’m assuming that the code is solid enough that
you’ll get the same result if you perform the exact same operation on an
identical surface. ;-) (Note that this actually is a problem with “old
style” airbrush tools and other random/noise driven operations.)
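The "preview on the screen, commit on release" scheme might look like the sketch below. It is an assumption-laden illustration: two small in-memory arrays stand in for the “picture” buffer and the display surface, the tool is a simple horizontal stroke, and all names (`Canvas`, `preview_stroke`, `commit_stroke`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { CANVAS_W = 16, CANVAS_H = 8 };

typedef struct {
    uint32_t picture[CANVAS_H][CANVAS_W]; /* the real image, in system RAM    */
    uint32_t screen[CANVAS_H][CANVAS_W];  /* stand-in for the display surface */
} Canvas;

/* Copy one rectangle from the picture to the screen. The screen is only
 * written, never read, which is the cheap direction for VRAM. */
static void restore_rect(Canvas *c, int x, int y, int w, int h)
{
    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0);
    assert(x + w <= CANVAS_W && y + h <= CANVAS_H);
    for (int row = y; row < y + h; row++)
        memcpy(&c->screen[row][x], &c->picture[row][x],
               (size_t)w * sizeof(uint32_t));
}

/* The tool itself: a horizontal stroke drawn into whichever buffer is given. */
static void draw_stroke(uint32_t buf[CANVAS_H][CANVAS_W],
                        int x0, int x1, int y, uint32_t color)
{
    assert(0 <= x0 && x0 <= x1 && x1 < CANVAS_W && 0 <= y && y < CANVAS_H);
    for (int x = x0; x <= x1; x++)
        buf[y][x] = color;
}

/* Mouse moved with the button held: erase the old preview, draw the new one.
 * Pass old_x1 == x0 - 1 when there is no previous preview to erase. */
void preview_stroke(Canvas *c, int x0, int y, int old_x1, int new_x1,
                    uint32_t color)
{
    restore_rect(c, x0, y, old_x1 - x0 + 1, 1);
    draw_stroke(c->screen, x0, new_x1, y, color);
}

/* Button released: repeat the same operation on the picture buffer, then
 * refresh the affected area of the screen from it. */
void commit_stroke(Canvas *c, int x0, int y, int x1, uint32_t color)
{
    draw_stroke(c->picture, x0, x1, y, color);
    restore_rect(c, x0, y, x1 - x0 + 1, 1);
}
```

The picture buffer is never rebuilt by reading the screen back, so the slow VRAM-to-RAM direction disappears entirely. As the post points out, this relies on the tool producing the same result when run a second time.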

(like in old, good DOS&asm times…).

Yes. In fact, that method is even more relevant these days, as
read-from-VRAM performance is only getting worse for every PC generation
ever since the days of the 286.

Hope that explains what you’re seeing. :-)

Thanks a lot for help! I am still not sure what to change (except no
blitting whole area), but I will play with it now…

Well, there’s no such thing as a single perfect solution, but I think I
have a pretty good idea of what you want to do, and how it can be done
reasonably well - ask if anything is still unclear. :-)

//David Olofson — Programmer, Reologica Instruments AB

On Wednesday 07 November 2001 00:24, Jacek Popławski wrote:

On Tue, Nov 06, 2001 at 11:34:28PM +0100, David Olofson wrote:

Either way, there’s no guarantee that the screen surface will have the
same pixel format despite the bpp arguments being the same! The display
surface may be RGB or BGR, for example - both 24 bits, but still
incompatible, which results in software blitting even if both surfaces
are in VRAM and h/w acceleration is present.

Is there a way to create a surface with the same format as the VRAM surface?

Why not just perform the rendering operation on the “picture” buffer when
the button is released?

You are 100% right.

I’m assuming that the code is solid enough

It was just an exercise; I am much more interested in how to code future 2D SDL
applications. I have used SDL with OpenGL a lot, and written simple “demo 2D
effects” (you know, plasma, fire, tunnel…), but I have never written a “2D
editor”, so it’s nice to know how to create one.

(like in old, good DOS&asm times…).

Yes. In fact, that method is even more relevant these days, as
read-from-VRAM performance is only getting worse for every PC generation
ever since the days of the 286.

Hm. I thought it was better now. But RAM is very fast now, so VRAM can be much
slower…

Thanks again!

On Wed, Nov 07, 2001 at 09:39:53PM +0100, David Olofson wrote:


High noon, oh I’d sell my soul for water
Nine years worth of breakin’ my back
There’s no sun in the shadow of the wizard
See how he glides, why he’s lighter than air?
“Stargazer” - Ronnie James Dio

Either way, there’s no guarantee that the screen surface will have
the same pixel format despite the bpp arguments being the same! The
display surface may be RGB or BGR, for example - both 24 bits, but still
incompatible, which results in software blitting even if both
surfaces are in VRAM and h/w acceleration is present.

Is there a way to create a surface with the same format as the VRAM surface?

Yes, use the info in screen->format instead of assuming any constant
values for the masks.
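To illustrate what "use the masks instead of constants" means in practice, here is a sketch that packs a colour using only the channel masks. It assumes 8 bits per channel (formats such as 16-bit 5-6-5 would also need the per-channel precision loss handled), and `mask_shift` and `pack_rgb` are hypothetical helpers, not SDL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the lowest set bit in a channel mask (mask must be non-zero). */
static int mask_shift(uint32_t mask)
{
    int shift = 0;
    assert(mask != 0);
    while ((mask & 1u) == 0) { mask >>= 1; shift++; }
    return shift;
}

/* Pack 8-bit r, g, b into a pixel for a format with 8 bits per channel,
 * described only by its masks. */
uint32_t pack_rgb(uint32_t rmask, uint32_t gmask, uint32_t bmask,
                  uint8_t r, uint8_t g, uint8_t b)
{
    return (((uint32_t)r << mask_shift(rmask)) & rmask) |
           (((uint32_t)g << mask_shift(gmask)) & gmask) |
           (((uint32_t)b << mask_shift(bmask)) & bmask);
}
```

The same colour packs to 0x123456 with RGB masks and 0x563412 with BGR masks, which is why hard-coded mask constants break on some displays. With SDL 1.2 itself, the values in screen->format (BitsPerPixel, Rmask, Gmask, Bmask, Amask) can be passed straight to SDL_CreateRGBSurface(), and SDL_MapRGB() does this kind of packing for a given format.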

I’m assuming that the code is solid enough

It was just an exercise, I much more interested how to code future 2D
SDL applications. I used a lot SDL with OpenGL, and wrote simple “demo
2D effects” (you know, plasma, fire, tunnel…), I have never wrote any
"2D editor", so it’s nice to know how to create it.

Yeah, application GUIs (especially in high resolutions) are a bit
different from demos and action games, where you usually pump full
screens all the time.

(like in old, good DOS&asm times…).

Yes. In fact, that method is even more relevant these days, as
read-from-VRAM performance is only getting worse for every PC
generation ever since the days of the 286.

Hm. I thought it’s better now. But RAM is very fast now, so VRAM can be
much slower…

In fact, the VRAM on current 3D cards is running in circles around PC
SDRAM, and has been for quite a while. The latest cards have 128 bit wide
memory busses and DDR SDRAM running at up to 250 MHz (ie up to 500 MHz
"pulse"), to be compared with PC133 SDRAM on a 64 bit bus…
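As a rough back-of-the-envelope check of those figures, theoretical peak bandwidth is bus width in bytes times transfers per second. The helper below is a hypothetical illustration; the inputs are the numbers quoted in the post, and real sustained throughput is lower than these peaks.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak bandwidth in bytes per second:
 * (bus width in bytes) x (transfers per second). */
uint64_t peak_bandwidth(unsigned bus_bits, uint64_t transfers_per_sec)
{
    assert(bus_bits % 8 == 0);
    return (uint64_t)(bus_bits / 8) * transfers_per_sec;
}
```

A 128-bit bus at 500 million transfers per second gives 8,000,000,000 bytes/s, while PC133 on a 64-bit bus gives 1,064,000,000 bytes/s, so the card's local memory has roughly 7.5 times the peak bandwidth of main memory.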

The problem is that the AGP bus and the chipsets involved (main board and
graphics card) aren’t optimized for CPU access of VRAM. CPU access is
possible, but very, very inefficient, especially reads.

The hardware is designed with bus master DMA transfers in mind - and
meanwhile, exactly that is impossible to do with any of the current
Linux drivers… :-/

//David Olofson — Programmer, Reologica Instruments AB

On Wednesday 07 November 2001 21:59, Jacek Popławski wrote:

On Wed, Nov 07, 2001 at 09:39:53PM +0100, David Olofson wrote:

[clip]

(And as to whether or not it should support XOR blitting, the only
machine I’ve programmed that actually had both a usable fullscreen
rendering API and hardware with accelerated XOR blitting was the Amiga.
Sure, Win32 GDI + drivers support it as well, but to no avail - GDI is
pretty useless on DirectX surfaces.)

Be specific - it’s a driver issue <g>.
-ALL- 8514/X-descended cards have XOR. (all S3, all ATI afaik, all C&T,
and so on). Just 'cause the -drivers- can’t support it doesn’t mean the
-machine- can’t.

Actually you make me curious as the X core - at least under 8514-based
drivers - uses the XOR system (which is a subset of one of the pixel
control codes). Also I believe OpenGL has a way to do it - but I’m not
sure there.

G’day, eh? :-)
- Teunis

PS: If my manuals weren’t packed away at the moment I could quote
chapter and verse on this. I was the maintainer of some of the S3
drivers for GGI for a while - and wrote my own as well.

On Wed, 7 Nov 2001, David Olofson wrote:

[clip]

(And as to whether or not it should support XOR blitting, the only
machine I’ve programmed that actually had both a usable fullscreen
rendering API and hardware with accelerated XOR blitting was the
Amiga. Sure, Win32 GDI + drivers support it as well, but to no avail -
GDI is pretty useless on DirectX surfaces.)

Be specific - it’s a driver issue <g>.
-ALL- 8514/X-descended cards have XOR. (all S3, all ATI afaik, all
C&T, and so on). Just 'cause the -drivers- can’t support it doesn’t
mean the -machine- can’t.

Yes, but as we’re talking about SDL here (which is not the case with a
certain other thread - oops :-), that’s not the point - SDL is not a
driver architecture, and thus, can’t use any features not made available
by any drivers.

Now, if it is supported by a significant number of drivers/APIs,
though… Then again, how many games (or even multimedia applications
in general) use XOR blitting when there’s real colorkey blitting?

Actually you make me curious as the X core - at least under 8514-based
drivers - uses the XOR system (which is a subset of one of the pixel
control codes).

I would have been surprised if it didn’t as virtually all other 2D APIs
I’ve seen do support it.

However, low level APIs, such as DirectX, DGA, GGI, svgalib, fbdev etc,
and consequently SDL, rarely do - if they support anything at all but
opaque blits.

Also I believe OpenGL has a way to do it - but I’m not sure there.

I think so, yes - but I’m not expecting to find it accelerated in
"consumer" implementations - not even if the hardware supports it.

(Speaking of which, all GeForce2+ chips have virtually all features of
the Quadro chips. You can actually enable accelerated lines and stuff by
tweaking the registry, and/or messing with the GPU ID.)

//David Olofson — Programmer, Reologica Instruments AB

On Thursday 08 November 2001 06:57, winterlion wrote:

On Wed, 7 Nov 2001, David Olofson wrote:

(And just because I’m too tired to write a proper reply…)

[clip]

(And as to whether or not it should support XOR blitting, the only
machine I’ve programmed that actually had both a usable fullscreen
rendering API and hardware with accelerated XOR blitting was the
Amiga.

By this I meant that the Amiga had both API and h/w support for XOR
blitting - which is not the case with most APIs that are actually usable
for serious animation.

Of course, many, many Amiga games still used the hardware directly -
which may in fact count as an “API” as well, as it was a documented and
supported way of programming that machine. (The fact that many coders
ignored the rules for how you should go about shutting the multitasking
off and taking over the hardware, causing various problems, has nothing
to do with it - that was just sloppy programming.)

[…]

Now, if it is supported by a significant number of drivers/APIs,
though… Then again, how many games (or even multimedia applications
in general) use XOR blitting when there’s real colorkey blitting?

What’s XOR blitting got to do with colorkey blitting? Well, that’s not
what I was thinking of actually, but if you have AND blitting, you can do
masked blits using that and either OR or XOR blitting. OR + XOR would
work as well… As would OR + AND, and a number of other variants.
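The AND-then-OR masked blit mentioned above can be sketched like this. It is plain C over pixel arrays rather than any hardware blitter, and `masked_blit` is a hypothetical name; the mask and sprite conventions are stated in the comment and are an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Two-pass masked blit over n pixels.
 *   mask[i]   is all ones where the sprite is transparent, zero where opaque.
 *   sprite[i] must be zero wherever it is transparent.
 * Pass 1 (AND) punches a hole in the destination; pass 2 (OR) fills it.
 * Because the hole is zero, using XOR in pass 2 gives the same result.
 */
void masked_blit(uint32_t *dst, const uint32_t *sprite,
                 const uint32_t *mask, size_t n)
{
    assert(dst != NULL && sprite != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] &= mask[i];
    for (size_t i = 0; i < n; i++)
        dst[i] |= sprite[i];
}
```

The two loops are kept separate on purpose, to mirror two full blitter passes over the same area. A colorkey blit gets the same visual result in one pass, which is the comparison being made in the post.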

Anyway, sorry about the confusion.

//David Olofson — Programmer, Reologica Instruments AB

On Thursday 08 November 2001 07:14, David Olofson wrote:

On Thursday 08 November 2001 06:57, winterlion wrote:

On Wed, 7 Nov 2001, David Olofson wrote: