SDL_Renderer: Reducing the number of render calls

Tero_Lindeman · August 8, 2014, 5:39am

Hello folks,

I tried to look for discussion about this but could not find any, pardon me
if this has been discussed over and over again. Here goes.

Would it make sense to change the SDL_Renderer so that each RenderCopy()
etc. call would actually add the drawn quad into a vertex buffer object (or
whatever it is that people use in 2014) and then draw the big bunch of
quads whenever RenderPresent() is called or when the used texture changes
between RenderCopy calls and so on (it at least USED to be important to
change the texture as few times as possible)? How the current code does it
looks quite inefficient to me, unless modern hardware and drivers do
something similar behind the scenes. It especially looks funny in the GLES
driver with an absolutely minimal VBO of two triangles.

Generally, my idea works like this:

1. Frame starts, the quad buffer is zeroed.
2. RenderCopy() with texture 1, added to the buffer
3. Another RenderCopy() with texture 1, added to the buffer
4. RenderCopy() with texture 2, the buffer is sent to the GPU and is zeroed,
   and the new quad is added to the buffer
5. … more calls
6. RenderPresent() sends the buffer if there’s anything there
7. Go to 1
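
The steps above can be sketched in plain C without touching SDL at all. This is only an illustration of the idea, not how SDL_Renderer is implemented: `QuadBatch`, `batch_copy()`, `batch_flush()` and the other names are hypothetical, textures are reduced to integer ids, and the "send to the GPU" step is replaced by a counter so the number of submissions can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUADS 1024

/* Hypothetical types for illustration; none of these are SDL names. */
typedef struct { float x, y, w, h; } Quad;

typedef struct {
    int    texture;           /* id of the texture used by the buffered quads, -1 = none */
    Quad   quads[MAX_QUADS];  /* quads waiting to be submitted */
    size_t count;             /* number of buffered quads */
    int    draw_calls;        /* how many times the buffer was submitted */
} QuadBatch;

/* Step 1: the frame starts and the quad buffer is zeroed. */
void batch_begin_frame(QuadBatch *b) {
    b->texture = -1;
    b->count = 0;
    b->draw_calls = 0;
}

/* Stand-in for the real GPU submission: one draw call for all buffered quads. */
void batch_flush(QuadBatch *b) {
    if (b->count == 0) {
        return;               /* nothing buffered, nothing to send */
    }
    b->draw_calls++;
    b->count = 0;
}

/* Steps 2 to 5: buffer the quad; flush first if the texture changes
   (or if the buffer is full). */
void batch_copy(QuadBatch *b, int texture, Quad q) {
    assert(texture >= 0);
    if (texture != b->texture || b->count == MAX_QUADS) {
        batch_flush(b);
        b->texture = texture;
    }
    b->quads[b->count++] = q;
}

/* Step 6: present sends whatever is still in the buffer. */
void batch_present(QuadBatch *b) {
    batch_flush(b);
}
```

With this sketch, two copies from texture 1 followed by one copy from texture 2 and a present result in two submissions instead of three. A real implementation would also have to flush on other state changes, such as the blend mode.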

Have there been plans for something like this or is the consensus that if
one needs more performance, OpenGL etc. should be used directly?

-Tero

JonnyD · August 8, 2014, 2:10pm

This general idea has been discussed and it is good. It does take a bit of
work, though, as SDL would have to take care to flush the VBO whenever a
state change is issued.

The SDL rendering subsystem and API is very good for porting old projects,
but if you really need more performance or flexibility at the moment,
either look to OpenGL directly or SDL_gpu, which wraps OpenGL in a 2D API
with this optimization already implemented.

Jonny D

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Tero_Lindeman · August 8, 2014, 4:09pm

OK, thank you for the tip, SDL_gpu seems promising. Though, what I really
like about the vanilla SDL 2.0 renderer is that it also supports Direct3D
which in my experience has better support in some drivers (I used to get
crashes when changing display modes under OpenGL/SDL). That of course might
just be because of my limited experience.

One thing that might be quite easy to improve about the SDL_Renderer is the
SDL_RenderFillRects() routine because it seems all implementations are just
doing the same as RenderFillRect() many times over and missing the chance to
build a bigger VBO that contains all the rectangles. And it’s not hard to
guess I mentioned this because it would be a good base for a similar function
that takes two more parameters: source rects and the texture. Then it would
be up to the user to build the rectangle lists and keep track of state
changes.
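
As a rough illustration of what "a bigger VBO that contains all the rectangles" could look like, here is a sketch that turns a rectangle list into one vertex array plus one index array, so the whole list can go out in a single draw call. The `Rect` and `Vertex` types and the `build_fill_rects()` helper are made up for this example and are not part of SDL; the actual upload and draw are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not SDL's. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { float x, y; } Vertex;

/* Writes 4 vertices and 6 indices (two triangles) per rectangle.
   verts must hold count * 4 entries, indices count * 6 entries.
   Returns the number of indices written. */
size_t build_fill_rects(const Rect *rects, size_t count,
                        Vertex *verts, unsigned short *indices) {
    size_t n = 0;
    assert(rects != NULL && verts != NULL && indices != NULL);
    assert(count <= 65536 / 4);   /* indices must fit in 16 bits */
    for (size_t i = 0; i < count; i++) {
        unsigned short base = (unsigned short)(i * 4);
        float x0 = (float)rects[i].x;
        float y0 = (float)rects[i].y;
        float x1 = (float)(rects[i].x + rects[i].w);
        float y1 = (float)(rects[i].y + rects[i].h);

        verts[base + 0] = (Vertex){x0, y0};   /* top-left */
        verts[base + 1] = (Vertex){x1, y0};   /* top-right */
        verts[base + 2] = (Vertex){x1, y1};   /* bottom-right */
        verts[base + 3] = (Vertex){x0, y1};   /* bottom-left */

        /* first triangle */
        indices[n++] = base;
        indices[n++] = (unsigned short)(base + 1);
        indices[n++] = (unsigned short)(base + 2);
        /* second triangle */
        indices[n++] = base;
        indices[n++] = (unsigned short)(base + 2);
        indices[n++] = (unsigned short)(base + 3);
    }
    return n;
}
```

The same layout extends to the textured case by adding texture coordinates to each vertex.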

-Tero

Sik_the_hedgehog · August 8, 2014, 4:21pm

Good point, though honestly I doubt it’s commonly used, it’s likely
most programs just call SDL_RenderFillRect several times. Same deal
with SDL_RenderDrawRect(s).

There’s also SDL_RenderDrawLines, where the same situation applies.
However, that one may be more worth looking into, because rendering
multiple lines together in a single batch is actually pretty useful
(e.g. if you’re rendering a wireframe or a grid or something like
that).

How bad is this, anyway? They barely cause a state change, in contrast
with SDL_RenderCopy, which has a rather heavy state change (changing
the texture has a much more severe penalty). You’ll still need a large
number of blits to actually cause a slowdown, but even then.

Tero_Lindeman · August 8, 2014, 5:49pm

The truth indeed is that it is not that bad to have a bunch of calls,
at least on any computer bought after 2004. But
I have a tiny suspicion this might be a relevant worry on Android and
other less powerful platforms.

I think a use case for a RenderCopyMany, comparable to the wireframe case
for the RenderDrawLines routine,
is a tile-based map engine where you have a screen full of 16x16 rects.
Or, as in the case that prompted me to start talking about this, a font
renderer that takes all the characters from one texture filled with
characters. So, it probably wouldn’t be completely useless to have such
a built-in routine, if the lines routine is considered useful.
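
To make the font case concrete, here is a sketch of how the rectangle lists for one string might be built. The atlas layout (16 glyphs per row, 8x8 pixels each, ordered by character code) is an assumption for this example, and `layout_text()` is a hypothetical helper. A batched copy routine would then receive the two arrays together with the font texture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rectangle type for illustration; not SDL's. */
typedef struct { int x, y, w, h; } Rect;

/* Assumed atlas layout: 16 glyphs per row, each 8x8 pixels,
   ordered by character code starting at 0. */
#define GLYPH_W 8
#define GLYPH_H 8
#define GLYPHS_PER_ROW 16

/* Fills src[] with atlas rectangles and dst[] with screen rectangles for
   each character of text, starting at (x, y). Both arrays must hold at
   least strlen(text) entries. Returns the number of rectangle pairs. */
size_t layout_text(const char *text, int x, int y, Rect *src, Rect *dst) {
    size_t len;
    assert(text != NULL && src != NULL && dst != NULL);
    len = strlen(text);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];
        src[i] = (Rect){ (c % GLYPHS_PER_ROW) * GLYPH_W,
                         (c / GLYPHS_PER_ROW) * GLYPH_H,
                         GLYPH_W, GLYPH_H };
        dst[i] = (Rect){ x + (int)i * GLYPH_W, y, GLYPH_W, GLYPH_H };
    }
    return len;
}
```

Every glyph comes from the same texture, so a whole line of text needs no texture change between its quads.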

-Tero

Jared_Maddox · August 10, 2014, 9:32am

I spent so long looking for the link, I accidentally deleted my workspace file.

There have been suggestions of this before, and I’ve spent the time to
dig up the most recent instance that I actually remember.
Here’s the month:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/date.html
And here’s the first message, which was honestly not very informative:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/653855.html
And here’s my suggestion for an algorithm:
http://lists.libsdl.org/pipermail/sdl-libsdl.org/2013-April/088109.html

tl;dr: If you want to buffer, then you probably want to batch, and for
SDL’s Renderer API that means that you need to be careful not to
accidentally perform render B to a point when render A was supposed to
be done to that point first. This can be handled by grouping renders
into host nodes, and simply pushing a new node every time that you
have a “rendering collision”. For some reason I was thinking that a
tree implementation was needed, so if you want a tree for it I can
provide you with one that I’ve been needing to write tests for (there
were objections to involving an external library).
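
One possible reading of the node idea, sketched in C: renders are assigned to the current node for as long as their destination rectangles do not overlap anything already in that node, and the first overlap (a "rendering collision") starts a new node. This is only an interpretation of the tl;dr above, not the algorithm from the linked message; `Rect`, `rects_overlap()` and `assign_nodes()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rectangle type for illustration; not SDL's. */
typedef struct { int x, y, w, h; } Rect;

/* True if the two rectangles share at least one pixel. */
bool rects_overlap(Rect a, Rect b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

/* dst[] holds the destination rectangle of each render in submission
   order. node_of[i] receives the node index of render i. Returns the
   number of nodes used. */
size_t assign_nodes(const Rect *dst, size_t count, size_t *node_of) {
    size_t node = 0;
    size_t node_start = 0;   /* index of the first render in the current node */
    assert(dst != NULL && node_of != NULL);
    for (size_t i = 0; i < count; i++) {
        for (size_t j = node_start; j < i; j++) {
            if (rects_overlap(dst[i], dst[j])) {
                /* collision: render i must come after render j, so it
                   opens a new node */
                node++;
                node_start = i;
                break;
            }
        }
        node_of[i] = node;
    }
    return count ? node + 1 : 0;
}
```

Renders inside one node never overlap, so they could be reordered (for example sorted by texture) without changing the picture, while the nodes themselves have to be drawn in order. The overlap check here is quadratic within a node, which is part of the cost of this approach.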

krux · August 10, 2014, 4:31pm

I would love to have an SDL_RenderCopies that takes an array of source rects and an array of target rects, but only one source texture, so that I can build up my own texture atlas. And don’t underestimate the performance increase here. On modern hardware there is practically no difference between rendering 200 triangles and 2 triangles when you draw them in one single draw call. So the potential here is at least a factor of 100 if it is done right. And please, no tree-structured buffering. Don’t assume the programmer is too stupid to set up batched rendering properly.
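
A sketch of the data such a call might produce internally: one array of textured vertices for all source/target pairs, ready for a single draw call. SDL_RenderCopies does not exist in SDL, and `build_copies()`, `TexVertex` and `make_vertex()` are hypothetical names used only here. Texture coordinates are assumed to be normalized to the 0..1 range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not SDL's. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { float x, y, u, v; } TexVertex;

/* Builds one vertex from a screen position and a texel position. */
TexVertex make_vertex(int px, int py, int tx, int ty, int tex_w, int tex_h) {
    TexVertex v = { (float)px, (float)py,
                    (float)tx / (float)tex_w, (float)ty / (float)tex_h };
    return v;
}

/* Writes 6 vertices (two triangles) per src/dst pair into out, which must
   hold count * 6 entries. Returns the number of vertices written. */
size_t build_copies(const Rect *src, const Rect *dst, size_t count,
                    int tex_w, int tex_h, TexVertex *out) {
    size_t n = 0;
    assert(src != NULL && dst != NULL && out != NULL);
    assert(tex_w > 0 && tex_h > 0);
    for (size_t i = 0; i < count; i++) {
        Rect s = src[i], d = dst[i];
        TexVertex tl = make_vertex(d.x,       d.y,       s.x,       s.y,       tex_w, tex_h);
        TexVertex tr = make_vertex(d.x + d.w, d.y,       s.x + s.w, s.y,       tex_w, tex_h);
        TexVertex bl = make_vertex(d.x,       d.y + d.h, s.x,       s.y + s.h, tex_w, tex_h);
        TexVertex br = make_vertex(d.x + d.w, d.y + d.h, s.x + s.w, s.y + s.h, tex_w, tex_h);
        out[n++] = tl; out[n++] = tr; out[n++] = bl;   /* first triangle */
        out[n++] = tr; out[n++] = br; out[n++] = bl;   /* second triangle */
    }
    return n;
}
```

With 100 pairs this yields 200 triangles in one buffer, which corresponds to the "200 triangles in one single draw call" case above, compared with 100 separate calls of 2 triangles each.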

Tero_Lindeman · August 11, 2014, 8:43am

Thanks for the links to the earlier discussions.

IMHO a tree-based approach would be overkill compared to a very simple “batch polys until the texture/blending mode changes or the user wants to read pixel data” style batching, considering the worst case performance would be close to what it is now. SDL_RenderCopies with a single texture would be a nice solution that sits between the two extremes. I think I’ll experiment with this and post results.

krux · August 12, 2014, 2:12am

Nice to have somebody who is willing to spend some time here. I also spent some time thinking about how this could be done right. If a geometry shader is available and the GPU can do integers, then all vertex creation could be done in the geometry shader, passing only the raw arrays of rectangles. But sadly that’s not an option for OpenGL ES.

Jeffrey_Carpenter · August 13, 2014, 9:31pm

If you use a function that calls SDL_RenderDrawLine for dithered, linear-gradient-filled backgrounds, and then use it for three (roughly) 320x240 widgets (720-ish draw calls per frame, if I remember right), you can easily start seeing performance issues. A difference of roughly 15…30+ fps can be seen.

This doesn’t matter much to me on my MacBook (Intel Graphics 3000 with plenty of fps to spare), but it certainly matters a great deal on my older single-core AMD64 Windows dev box (GeForce 6200), where I’ve seen the frame rate drop as low as 4 fps and never peak above 10 fps.

I certainly have slight concern with the performance of the code under iOS, but unfortunately, that test won’t see the light of day for some time.

Admittedly, I haven’t tried very hard at optimizing the function, nor are any of these tests done on a release build, so you might have to take my comment with a grain of salt?

P.S. Sorry for my brevity, I’m emailing from my phone.

Cheers!

slouken · August 14, 2014, 5:57am

Don’t use SDL_RenderDrawLine for gradient backgrounds. Instead you should have
a pre-filled gradient texture. If you want a single color chosen dynamically,
you can make the texture greyscale and use texture color mod to color it. If you
want it multi-colored dynamically, you could use a render target, render
lines into that, and then use that as your gradient. The texture doesn’t
need to be very big, either.
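
To show why a small greyscale texture is enough, here is the arithmetic behind the greyscale-plus-color-mod approach in plain C. SDL documents color modulation as `srcC = srcC * (color / 255)` per channel; the helpers `fill_grey_ramp()` and `color_mod()` are hypothetical and only mimic that formula on a pixel array. The truncating integer division is an assumption, and a real backend may round differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills a one-pixel-wide greyscale ramp: 0 at the first entry,
   255 at the last. n must be at least 2. */
void fill_grey_ramp(uint8_t *pixels, size_t n) {
    assert(pixels != NULL && n >= 2);
    for (size_t i = 0; i < n; i++) {
        pixels[i] = (uint8_t)((i * 255) / (n - 1));
    }
}

/* One channel of color modulation: src * (mod / 255),
   computed with integer arithmetic (assumed, see above). */
uint8_t color_mod(uint8_t src, uint8_t mod) {
    return (uint8_t)((src * mod) / 255);
}
```

A 64-entry ramp stretched over the widget and modulated with, say, (200, 80, 40) gives a gradient from black to that color with a single textured copy instead of one line per row.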

This will save you a huge amount of performance. The SDL line and point API
functions are not designed for massive numbers of calls, just some
additional work to fill in gaps between textures or do a little background
decoration.

If you need something like a particle system or vector art, you should
probably use OpenGL directly.

Cheers!

Jeffrey_Carpenter · August 15, 2014, 2:33am

Don’t use SDL_RenderDrawLine for gradient backgrounds. Instead you should have a pre-filled gradient texture. If you want a single color chosen dynamically, you can make the texture greyscale and use texture color mod to color it. If you want it multi-colored dynamically, you could use a render target, render lines into that, and then use that as your gradient. The texture doesn’t need to be very big, either.

Thanks a lot for the suggestions. They help confirm what I had in mind for optimization once the time rolls around for it (I have the luxury of simply ignoring the issue for the time being). It is nice to know that I was thinking in the right direction.

When I discovered the performance issue (a few months ago), I experimented with rendering the gradient fill once to a render target and then rendering from that, and sure enough, the performance improved dramatically for me. At that time, the thought occurred to me that I could probably just use a regular texture for this.

I hadn’t thought of using a greyscale texture – interesting idea, I might just have to give it a shot… texture color modulation works wonders for bitmap fonts…

This will save you a huge amount of performance. The SDL line and point API functions are not designed for massive numbers of calls, just some additional work to fill in gaps between textures or do a little background decoration.

Indeed, the API has been wonderful for my other needs! (mostly as a bits and pieces decorator).

Cheers,
Jeffrey Carpenter