External dependencies in the renderer?

Mason_Wheeler · April 15, 2013, 6:46pm

Does anyone (particularly Sam and Ryan) have any objections to pulling an external library into SDL? Because I have an idea that could significantly improve the performance of SDL’s 3D-accelerated rendering, but it would require a multimap. Neither SDL nor the C standard library has a multimap implementation, but I could build one with uthash and utarray, which are both fairly small and BSD-licensed.

Mason

icculus · April 16, 2013, 1:20am

I’d rather we have a simple hashtable implementation in SDL.

What’s the plan?

–ryan.

On 4/15/13 2:46 PM, Mason Wheeler wrote:

Does anyone (particularly Sam and Ryan) have any objections to pulling
an external library into SDL? Because I have an idea that could
significantly improve the performance of SDL’s 3d-accelerated rendering,
but it would require a multimap. Neither SDL nor the C standard library
has a multimap implementation, but I could build one with uthash and
utarray http://troydhanson.github.io/uthash/, which are both fairly
small and BSD-licensed.

Jonathan_Greig · April 16, 2013, 1:44am

Ryan,
Can you elaborate on the reason why uthash is not attractive to you? Just
wondering since I was recently looking at possibly using it for the
Embroidermodder 2 project. I came across it after looking at some hash
benchmarks and the license is appealing. It’s a single header so if the
interface isn’t to your liking, making a small wrapper around it should be
fairly straightforward. Have you or Sam done any work on an SDL hash
implementation?

Swyped from my droid.

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Sik_the_hedgehog · April 16, 2013, 1:47am

I think the problem is the fact it’s an extra dependency.

That said, I’m not very fond of its use of macros at all :S (I guess
this is one place where C++ wins by far, templates would make this
trivial) I wonder if that’s an issue too.


Jonathan_Greig · April 16, 2013, 1:59am

Sik,
I completely understand about the extra dependency issue, although with it
being a single header, it should be hardly a problem shipping it with the
SDL sources. At least that’s the way I look at it.

I don’t particularly care for macros either so maybe that could be part of
it too.

Swyped from my droid.


John6 · April 16, 2013, 2:03am

What is the optimization?

Andreas_Schiffler · April 16, 2013, 2:08am

Same gut feel here - seems reasonable to extend SDL functionality via a
copy-and-add of the single uthash.h file as the uthash license allows
redistribution in source form. Judging from its test coverage, the code
seems reasonably stable so that SDL maintainers would not have to expect
a lot of future updates in the SDL source tree of this file either.

In my view it really comes down to what the user benefit would actually be
over a custom implementation inside an SDL-based app (if possible).

Mason_Wheeler · April 16, 2013, 2:41am

Here’s the basic idea.

The internals of SDL’s rendering API are atrocious, to put it bluntly. It does everything in Immediate Mode, which modern versions of OpenGL and Direct3D have moved away from because it’s so slow. GLES doesn’t even support Immediate Mode, so if you look at SDL’s GLES renderer, it does the closest thing it can find to Immediate Mode, sending one call to OpenGL every time someone calls SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls to a minimum, and pass as much data as possible all at once in an array. Of course, that’s not the way people use SDL; they use SDL to draw a bunch of sprites, one at a time. So to be fast, SDL has to keep track of the bookkeeping for them.

The way to do this is with a multimap, mapping textures to lists of drawing coordinates. You turn SDL_RenderCopy into an operation that adds a pair of rects to a texture’s mapped list, and SDL_RenderPresent into an operation that iterates over the multimap and, for each texture, builds two arrays of vertices (one for screen coordinates and one for texture coordinates) as buffers and passes them to the renderer all at once.
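To make the bookkeeping concrete, here is a minimal C sketch of a texture-to-rect-list multimap along the lines described above. It is not SDL code and not the Delphi implementation: `Rect`, `Batcher`, `queue_copy` and `flush_batches` are hypothetical names, textures are plain integer ids (0 is reserved for "empty slot"), the table is a fixed-size linear-probing hash that never resizes, and the flush only counts the submissions a real backend would make.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for SDL_Rect and SDL_Texture*; not SDL's real types. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { Rect src, dst; } CopyCmd;

/* One multimap entry: a texture key plus a growable list of queued copies. */
typedef struct {
    int      texture;   /* 0 marks an empty slot */
    CopyCmd *cmds;
    size_t   count, cap;
} Bucket;

#define TABLE_SIZE 64   /* power of two; assumes fewer than 64 live textures */
typedef struct { Bucket slots[TABLE_SIZE]; } Batcher;

/* Open addressing with linear probing, keyed on the texture id. */
static Bucket *find_bucket(Batcher *b, int texture)
{
    size_t i = ((unsigned)texture * 2654435761u) & (TABLE_SIZE - 1);
    size_t probes = 0;
    while (b->slots[i].texture != 0 && b->slots[i].texture != texture) {
        i = (i + 1) & (TABLE_SIZE - 1);
        probes++;
        assert(probes < TABLE_SIZE);   /* table full: this sketch does not resize */
    }
    b->slots[i].texture = texture;
    return &b->slots[i];
}

/* What SDL_RenderCopy would turn into: append to the texture's list, draw nothing. */
static int queue_copy(Batcher *b, int texture, Rect src, Rect dst)
{
    Bucket *bk = find_bucket(b, texture);
    if (bk->count == bk->cap) {
        size_t cap = bk->cap ? bk->cap * 2 : 16;
        CopyCmd *p = realloc(bk->cmds, cap * sizeof *p);
        if (!p) return -1;
        bk->cmds = p;
        bk->cap = cap;
    }
    bk->cmds[bk->count].src = src;
    bk->cmds[bk->count].dst = dst;
    bk->count++;
    return 0;
}

/* What SDL_RenderPresent would turn into: one submission per texture.
   Returns the number of submissions a real backend would have issued. */
static int flush_batches(Batcher *b)
{
    int draw_calls = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        Bucket *bk = &b->slots[i];
        if (bk->texture == 0 || bk->count == 0) continue;
        /* A real backend would bind bk->texture here and submit
           bk->count quads built from bk->cmds in a single call. */
        draw_calls++;
        bk->count = 0;   /* keep the allocation for the next frame */
    }
    return draw_calls;
}
```

Starting from a zero-initialized `Batcher`, queuing a thousand copies spread over three textures and then flushing yields three submissions instead of a thousand. Note that buckets come out in table order, not submission order, which is exactly where the Z-order question comes from.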

I’ve got a Delphi implementation that sped up my rendering significantly, about 3x faster than stock SDL rendering. With a multimap in C, I could port this concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL doesn’t, is Z-order. If you’re no longer deterministically drawing in the order in which draw calls are received, but instead grouping them by texture, which are in turn sorted by hash order (essentially random), you need a Z-order parameter to make sure the right things draw on top of the right things, and what you end up with is an array of multimaps.
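One way to picture an "array of multimaps" is the C sketch below: one small texture-to-rects map per Z level, walked back to front at flush time. Everything here is an assumption made for illustration. `ZBatcher`, `queue_copy_z` and `flush_z` are hypothetical names, the sizes are arbitrary, and each layer uses a linear search over a few buckets where a real version would presumably use a hash.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LAYERS   4    /* hypothetical number of Z levels */
#define MAX_TEXTURES 8    /* per layer; a real version would hash instead */
#define MAX_QUADS    32   /* per texture per layer */

typedef struct { int x, y, w, h; } Rect;

typedef struct {
    int  texture;          /* hypothetical texture id */
    int  count;
    Rect dst[MAX_QUADS];
} Bucket;

/* One "multimap" per Z level: texture -> queued destination rects. */
typedef struct {
    int    used;
    Bucket buckets[MAX_TEXTURES];
} Layer;

typedef struct { Layer layers[MAX_LAYERS]; } ZBatcher;

/* Queue one copy at the given Z level. Returns -1 if the sketch's fixed
   limits are exceeded. */
static int queue_copy_z(ZBatcher *zb, int z, int texture, Rect dst)
{
    assert(z >= 0 && z < MAX_LAYERS);
    Layer *l = &zb->layers[z];
    Bucket *bk = NULL;
    for (int i = 0; i < l->used; i++) {
        if (l->buckets[i].texture == texture) bk = &l->buckets[i];
    }
    if (!bk) {
        if (l->used == MAX_TEXTURES) return -1;
        bk = &l->buckets[l->used++];
        bk->texture = texture;
        bk->count = 0;
    }
    if (bk->count == MAX_QUADS) return -1;
    bk->dst[bk->count++] = dst;
    return 0;
}

/* Walk layers back to front; within a layer, one submission per texture.
   Records the (z, texture) pair of each submission in `order` and returns
   how many submissions were made. */
static int flush_z(ZBatcher *zb, int order[][2], int max_order)
{
    int n = 0;
    for (int z = 0; z < MAX_LAYERS; z++) {
        Layer *l = &zb->layers[z];
        for (int i = 0; i < l->used; i++) {
            if (n < max_order) {
                order[n][0] = z;
                order[n][1] = l->buckets[i].texture;
            }
            n++;   /* a real backend would bind and submit the bucket here */
        }
        l->used = 0;
    }
    return n;
}
```

Within a layer the draw order between textures is still arbitrary, so this only gives correct results if sprites that overlap and must stack in a particular order are placed on different layers.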

I know it probably sounds very complicated, but it’s only a few hundred lines of code (plus the implementations of the hash and the dynamic array, because C doesn’t have them built in) and it makes rendering much faster.

Mason

Scott_Percival · April 16, 2013, 2:54am

Looking at the GLES2 renderer (which is probably the cleanest
implementation we’ve got right now), isn’t there an expectation with the
design of the API that the RenderCopy operation is carried out immediately?
So in theory, couldn’t someone call RenderCopy with an SDL_Texture *
and two rects, then mess with the contents of the texture, then call
RenderPresent?

icculus · April 16, 2013, 2:55am

Can you elaborate on the reason why uthash is not attractive to you?

I haven’t even clicked on the link, so I can’t say anything about
uthash. As an external piece of code, I’m hesitant to add it to SDL,
since that has caused annoyances in the past, unless there was a really
good reason.

(Doubly-so for a hashtable. I mean, a hashtable? Do we really need to
scour the internet for a hashtable?)

I imagine it’s probably a fine piece of code in itself, though.

–ryan.

Sik_the_hedgehog · April 16, 2013, 3:01am

Um, is a hashtable needed for this idea as opposed to a regular array?
I mean, you’re literally just adding entries to a queue, you don’t
even need to retrieve them back. As for the Z order, just assign a
unique Z to each entry and be done with it. Sure, you may run out of
range, but at that point you probably have queued up enough primitives
to be worth flushing the batch.

Also yeah, I wonder about the textures too, although I guess you can
always force a flush in that case.
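A minimal C sketch of that simpler alternative, assuming a plain append-only array in which every entry gets the next Z value and the queue is flushed either when it fills up or just before a queued texture is modified. `Queue`, `queue_copy`, `before_texture_update` and `QUEUE_MAX` are hypothetical names; none of this is SDL API, and the flush is a stub that only counts.

```c
#include <assert.h>

#define QUEUE_MAX 1024   /* hypothetical batch size; also bounds the Z range */

typedef struct { int texture; int z; } Cmd;

typedef struct {
    Cmd cmds[QUEUE_MAX];
    int count;
    int flushes;   /* how many times the queue was submitted */
} Queue;

static void flush(Queue *q)
{
    if (q->count == 0) return;
    /* A real backend would submit all queued quads here. */
    q->count = 0;
    q->flushes++;
}

/* Append-only: each entry gets the next Z value, so later draws land on top. */
static void queue_copy(Queue *q, int texture)
{
    if (q->count == QUEUE_MAX) flush(q);   /* out of Z range: flush and restart */
    assert(q->count < QUEUE_MAX);
    q->cmds[q->count].texture = texture;
    q->cmds[q->count].z = q->count;
    q->count++;
}

/* Call before a texture's pixels change: if any queued copy still refers
   to it, those copies have to be drawn first. */
static void before_texture_update(Queue *q, int texture)
{
    for (int i = 0; i < q->count; i++) {
        if (q->cmds[i].texture == texture) {
            flush(q);
            return;
        }
    }
}
```

Updating a texture that is not in the queue costs nothing; updating one that is queued forces a single early flush, which keeps the "RenderCopy, then modify the texture, then RenderPresent" sequence behaving as it does today.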


Mason_Wheeler · April 16, 2013, 3:09am

A hashtable is needed because this is not just a queue. To get good performance out of it, it has to be grouped by texture. The idea is that you select each texture once, and perform all of the drawing for it all at once. What we have now is just a queue, and it’s horribly slow. On a complicated scene, it’s the difference between a few dozen API calls, or a few tens of thousands of them. (Yes, I have rendered scenes that involved with SDL.)
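The scale of that difference can be illustrated with a small counting sketch in C. It uses texture switches as a rough proxy for API calls, which is an assumption, and the function names and the `MAX_TEXTURE_ID` limit are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEXTURE_ID 256   /* hypothetical upper bound on texture ids */

/* Texture switches when drawing strictly in submission order:
   one every time the texture differs from the previous draw. */
static size_t binds_in_order(const int *textures, size_t n)
{
    size_t binds = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || textures[i] != textures[i - 1]) binds++;
    }
    return binds;
}

/* Texture switches when draws are grouped by texture first:
   one per distinct texture. */
static size_t binds_grouped(const int *textures, size_t n)
{
    unsigned char seen[MAX_TEXTURE_ID] = {0};
    size_t binds = 0;
    for (size_t i = 0; i < n; i++) {
        int t = textures[i];
        assert(t >= 0 && t < MAX_TEXTURE_ID);
        if (!seen[t]) {
            seen[t] = 1;
            binds++;
        }
    }
    return binds;
}
```

For 30,000 sprites that cycle through 24 textures, the in-order count is 30,000 and the grouped count is 24. That is the worst case for in-order drawing; a scene whose sprites already arrive sorted by texture would see no difference.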

Mason

John6 · April 16, 2013, 3:36am

Ok, so the optimization assumes that a rendering bottleneck is the cost of
switching textures, and intends to minimize the number of texture switches by
delaying primitives, then re-ordering them by texture and Z.

I’ve seen this before. It can be done, but there are caveats. The biggest
challenge is you need to cache the entire GL state for each delayed primitive.
The implementation is effectively an “intermediate mode” layer unto itself. The
layer is a massive todo buffer with three phases: queue everything, analyze
(re-order) the queue, then execute the queue as a batch. If you don’t choose the
batch size wisely, it’s possible to lose any parallelism that you might have had
when GL calls were mixed in with scene graph calls. The second challenge is to
support transparency and other effects that depend on multiple passes in a
specific order, or that play games with the z-buffer (or other tests.)
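The queue, analyze, execute structure John describes can be sketched in C roughly as follows. This is an illustration under stated assumptions, not an existing SDL mechanism: `Cmd`, `cmp_cmd` and `analyze_and_execute` are hypothetical names, a single `blend` integer stands in for the full cached render state, and the execute phase only counts the batches a real backend would submit.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int z;         /* layer: lower values draw first */
    int texture;   /* hypothetical texture id */
    int blend;     /* stand-in for the cached render state */
    int seq;       /* submission index, used as the final tie-break */
} Cmd;

/* Phase 2 ordering: by Z, then texture, then state, then submission order. */
static int cmp_cmd(const void *pa, const void *pb)
{
    const Cmd *a = pa, *b = pb;
    if (a->z != b->z)             return a->z < b->z ? -1 : 1;
    if (a->texture != b->texture) return a->texture < b->texture ? -1 : 1;
    if (a->blend != b->blend)     return a->blend < b->blend ? -1 : 1;
    return (a->seq > b->seq) - (a->seq < b->seq);
}

/* Phases 2 and 3 for a queue the caller filled during phase 1.
   Returns the number of batches, i.e. runs that share z, texture and state. */
static int analyze_and_execute(Cmd *cmds, int n)
{
    int batches = 0;
    assert(n >= 0);
    qsort(cmds, (size_t)n, sizeof *cmds, cmp_cmd);   /* analyze (re-order) */
    for (int i = 0; i < n; i++) {                    /* execute */
        if (i == 0 ||
            cmds[i].z != cmds[i - 1].z ||
            cmds[i].texture != cmds[i - 1].texture ||
            cmds[i].blend != cmds[i - 1].blend) {
            batches++;   /* a real backend would set state and submit here */
        }
    }
    return batches;
}
```

Because every command carries its own copy of the state it was queued with, the sort is free to move it; the cost is that the per-command record grows with every piece of state the renderer exposes.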

Gabriel_Jacobo · April 16, 2013, 3:45am

I’m beating my own drum by saying this, but the SDL_RenderGeometry function
I made may be a better compromise to enhance rendering speed, assuming the
task at hand implies rendering multiple parts of the same texture. If you
are rendering a low number of quads out of each texture, it’ll probably
give you the same performance as regular SDL_RenderCopy (it has no need for
a hash table though).

Anyway, it’ll probably come down to the same sort of arguments we saw
before, and the “why don’t you do it in OpenGL” will eventually pop up


–
Gabriel.

Scott_Percival · April 16, 2013, 3:46am

Blimey, forgot about transparency. John’s right, if you start including
semitransparent objects into your queue, then you can’t just throw them in
the texture-centric batch and let the depth test sort them out; you’d have
to run a separate pass afterwards in sequential painting order.
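A rough C sketch of that two-pass split: opaque copies are grouped by texture, and blended copies are kept in their original painting order for a second pass. `Cmd`, `cmp_pass` and `plan_passes` are hypothetical names, and the sketch assumes each copy is flagged as blended or not and that depth testing against a per-copy Z stays enabled in both passes.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int texture;   /* hypothetical texture id */
    int blended;   /* nonzero if the sprite needs alpha blending */
    int seq;       /* submission order, assumed unique */
} Cmd;

/* Opaque commands first, grouped by texture; blended commands afterwards,
   in the order they were submitted. */
static int cmp_pass(const void *pa, const void *pb)
{
    const Cmd *a = pa, *b = pb;
    if (!a->blended != !b->blended) return a->blended ? 1 : -1;
    if (!a->blended && a->texture != b->texture)
        return a->texture < b->texture ? -1 : 1;
    return (a->seq > b->seq) - (a->seq < b->seq);
}

/* Re-orders the queue in place and returns the number of opaque commands:
   cmds[0 .. result-1] form the batched pass, the rest the ordered pass. */
static int plan_passes(Cmd *cmds, int n)
{
    int opaque = 0;
    assert(n >= 0);
    qsort(cmds, (size_t)n, sizeof *cmds, cmp_pass);
    while (opaque < n && !cmds[opaque].blended) opaque++;
    return opaque;
}
```

Only the opaque part benefits from texture grouping. In a scene made mostly of blended sprites the second pass dominates, and the batch degenerates back to drawing in submission order.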

Sik_the_hedgehog · April 16, 2013, 4:48am

Another thing is that scenes that complex will most likely have many
textures anyway which is bound to completely negate the advantage. And
yeah, considering the SDL renderer would be most likely used to render
sprites in 2D, proper transparency support is pretty much a must (even
if you don’t draw “proper” translucent stuff you may be bound to be
doing it with antialiased borders).

Come to think of it, this also means sprites must be rendered in
order, otherwise the depth buffer will completely screw up the
transparency. Given the order of the primitives is completely up to
the GPU, there isn’t much that can be done short of multiple calls.
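Why order matters for translucent sprites can be shown with a few lines of arithmetic. The C sketch below assumes the common "source over" blend on a single color channel, result = src * alpha + dst * (1 - alpha); `blend_over` and `composite` are hypothetical helper names, not SDL functions.

```c
#include <assert.h>

/* "Source over" blend of one color channel: src at opacity alpha on top of dst. */
static double blend_over(double src, double alpha, double dst)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    return src * alpha + dst * (1.0 - alpha);
}

/* Composite n layers onto a background, in the order given. */
static double composite(double background, const double *color,
                        const double *alpha, int n)
{
    double out = background;
    for (int i = 0; i < n; i++) {
        out = blend_over(color[i], alpha[i], out);
    }
    return out;
}
```

With a black background (0.0) and two half-transparent sprites, one white (1.0) and one black (0.0), drawing white then black gives 0.25, while drawing black then white gives 0.5. The same two sprites produce different pixels depending only on order, which is why blended draws cannot be freely regrouped by texture.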

2013/4/16, Scott Percival :> Blimey, forgot about transparency. John’s right, if you start including

semitransparent objects into your queue, then you can’t just throw them in
the texture-centric batch and let the depth test sort them out; you’d have
to run a separate pass afterwards in sequential painting order.

On 16 April 2013 11:36, John wrote:

Ok, so the optimization assumes that a rendering bottleneck is the cost of
switching textures, and intends to minimize the number of texture switches by
delaying primitives, then re-ordering them by texture and Z.

I’ve seen this before. It can be done, but there are caveats. The biggest
challenge is you need to cache the entire GL state for each delayed primitive.
The implementation is effectively an “intermediate mode” layer unto itself. The
layer is a massive todo buffer with three phases: queue everything, analyze
(re-order) the queue, then execute the queue as a batch. If you don’t choose the
batch size wisely, it’s possible to lose any parallelism that you might have had
when GL calls were mixed in with scene graph calls. The second challenge is to
support transparency and other effects that depend on multiple passes in a
specific order, or that play games with the z-buffer (or other tests).

On 04/15/2013 10:41 PM, Mason Wheeler wrote:

Here’s the basic idea.

The internals of SDL’s rendering API are atrocious, to put it bluntly. It does
everything in Immediate Mode, which modern versions of OpenGL and Direct3D have
moved away from because it’s so slow. GLES doesn’t even support Immediate Mode,
so if you look at SDL’s GLES renderer, it does the closest thing it can find to
Immediate Mode, sending one call to OpenGL every time someone calls SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls to a
minimum, and pass as much data as possible all at once in an array. Of course,
that’s not the way people use SDL; they use SDL to draw a bunch of sprites, one
at a time. So to be fast, SDL has to keep track of the bookkeeping for them.

The way to do this is with a multimap, mapping textures to lists of drawing
coordinates. You turn SDL_RenderCopy into an operation that adds a pair of
rects to a texture’s mapped list, and SDL_RenderPresent into an operation that
iterates over the multimap and for each texture, builds two arrays of vertices
(one for screen coordinates and one for texture coordinates) as buffers and
passes them to the renderer all at once.
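
To make the idea concrete, here is a minimal sketch in C of a texture-to-rect-list map. It is not the Delphi implementation mentioned in this thread and not SDL code: `RectF`, `RectPair`, `Bucket`, `BatchMap` and every function name are hypothetical, textures are identified by a plain `int`, and a linear search stands in for the hash lookup that uthash (or an SDL hashtable) would provide.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not SDL's real internal types. */
typedef struct { float x, y, w, h; } RectF;
typedef struct { RectF src, dst; } RectPair;

typedef struct {
    int texture_id;      /* map key */
    RectPair *pairs;     /* growable list of queued copies */
    size_t count, cap;
} Bucket;

typedef struct {
    Bucket *buckets;
    size_t count, cap;
} BatchMap;

/* Find or create the bucket for a texture. A linear search stands in
   for the hash lookup; returns NULL if allocation fails. */
Bucket *batch_bucket(BatchMap *map, int texture_id)
{
    size_t i;
    Bucket *b;
    for (i = 0; i < map->count; i++)
        if (map->buckets[i].texture_id == texture_id)
            return &map->buckets[i];
    if (map->count == map->cap) {
        size_t cap = map->cap ? map->cap * 2 : 8;
        Bucket *p = realloc(map->buckets, cap * sizeof *p);
        if (!p) return NULL;
        map->buckets = p;
        map->cap = cap;
    }
    b = &map->buckets[map->count++];
    b->texture_id = texture_id;
    b->pairs = NULL;
    b->count = b->cap = 0;
    return b;
}

/* What a deferred SDL_RenderCopy might do: just record the rect pair. */
int batch_copy(BatchMap *map, int texture_id, RectF src, RectF dst)
{
    Bucket *b;
    assert(map != NULL);
    b = batch_bucket(map, texture_id);
    if (!b) return -1;
    if (b->count == b->cap) {
        size_t cap = b->cap ? b->cap * 2 : 16;
        RectPair *p = realloc(b->pairs, cap * sizeof *p);
        if (!p) return -1;
        b->pairs = p;
        b->cap = cap;
    }
    b->pairs[b->count].src = src;
    b->pairs[b->count].dst = dst;
    b->count++;
    return 0;
}

/* What SDL_RenderPresent might do per bucket: expand each destination
   rect into four (x, y) corners. `out` must hold 8 * b->count floats.
   Texture coordinates would be built the same way from `src`. */
void bucket_positions(const Bucket *b, float *out)
{
    size_t i;
    for (i = 0; i < b->count; i++) {
        RectF d = b->pairs[i].dst;
        float *v = out + 8 * i;
        v[0] = d.x;       v[1] = d.y;
        v[2] = d.x + d.w; v[3] = d.y;
        v[4] = d.x + d.w; v[5] = d.y + d.h;
        v[6] = d.x;       v[7] = d.y + d.h;
    }
}

/* Release everything; called after the batches have been submitted. */
void batch_free(BatchMap *map)
{
    size_t i;
    for (i = 0; i < map->count; i++)
        free(map->buckets[i].pairs);
    free(map->buckets);
    map->buckets = NULL;
    map->count = map->cap = 0;
}
```

With this shape, three copies using two textures end up in two buckets, so presenting would need two submissions instead of three.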

I’ve got a Delphi implementation that sped up my rendering significantly, about
3x faster than stock SDL rendering. With a multimap in C, I could port this
concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL doesn’t, is
Z-order. If you’re no longer deterministically drawing in the order in which
draw calls are received, but instead grouping them by texture, which are in turn
sorted by hash order (essentially random), you need a Z-order parameter to make
sure the right things draw on top of the right things, and what you end up with
is an array of multimaps.
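
One way to picture the Z-order requirement, sketched in C. This uses a flat queue sorted by (Z, texture) rather than the array of multimaps described above, so it is a swapped-in simplification, and `DrawCmd`, its `z` field and both function names are hypothetical. Because `qsort` is not guaranteed to be stable, a sequence number is used as the last tiebreak to keep submission order within a run.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical queued draw; SDL_RenderCopy has no Z parameter. */
typedef struct {
    int z;             /* layer: lower values are drawn first */
    int texture_id;
    unsigned seq;      /* submission order, final tiebreak */
} DrawCmd;

/* Order by Z, then texture, then submission order. */
int cmd_compare(const void *pa, const void *pb)
{
    const DrawCmd *a = pa;
    const DrawCmd *b = pb;
    if (a->z != b->z)
        return (a->z > b->z) - (a->z < b->z);
    if (a->texture_id != b->texture_id)
        return (a->texture_id > b->texture_id) - (a->texture_id < b->texture_id);
    return (a->seq > b->seq) - (a->seq < b->seq);
}

/* Sort the queue, then count runs that share both Z and texture.
   Each run could be submitted as a single draw call. */
size_t sort_and_count_batches(DrawCmd *cmds, size_t n)
{
    size_t i, batches = 0;
    assert(cmds != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(cmds, n, sizeof *cmds, cmd_compare);
    for (i = 0; i < n; i++)
        if (i == 0 || cmds[i].z != cmds[i - 1].z ||
            cmds[i].texture_id != cmds[i - 1].texture_id)
            batches++;
    return batches;
}
```

Within a single Z value the order between different textures is arbitrary (here, by texture id), so overlapping sprites that must stack in a particular way need distinct Z values.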

I know it probably sounds very complicated, but it’s only a few hundred lines of
code (plus the implementations of the hash and the dynamic array, because C
doesn’t have them built in) and it makes rendering much faster.

Mason


Mason_Wheeler · April 16, 2013, 4:58am

Yeah. That’s what the Z order is there for.

Mason_Wheeler · April 16, 2013, 5:03am

Not exactly. The optimization assumes that the principal rendering
bottleneck is the overhead involved in sending scene data to the
graphics card, which assumption is borne out by testing data. It
intends to minimize the number of drawing calls by delaying primitives
and sending them in batches, ordered by Z and texture.

You avoid having to cache “the entire GL state” by the simple expedient
of flushing the to-do buffer if a call comes in that changes the GL state.
All you need to keep cached is the map of textures to arrays of coordinates.
And transparency works fine as long as you have a Z parameter to order
by. Things get drawn on top of each other in the prescribed order. I’ve
been using this for a while now. The system works.
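
The flush-on-state-change rule can be sketched in a few lines of C. Everything here is hypothetical: `Deferred` and its functions are illustration only, a single `blend_mode` integer stands in for "the GL state", and the actual submission of vertex arrays is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical deferred renderer state; not SDL's internals. */
typedef struct {
    int blend_mode;    /* stand-in for the GL state */
    size_t pending;    /* queued draws not yet submitted */
    size_t flushes;    /* number of batches submitted so far */
} Deferred;

/* Submit whatever is queued. A real renderer would hand the
   per-texture vertex arrays to the GPU here. */
void deferred_flush(Deferred *d)
{
    assert(d != NULL);
    if (d->pending == 0)
        return;
    d->pending = 0;
    d->flushes++;
}

/* Queue a draw under the current state. */
void deferred_draw(Deferred *d)
{
    assert(d != NULL);
    d->pending++;
}

/* A state change flushes first, so every queued draw was recorded
   under one known state and no per-primitive state needs caching. */
void deferred_set_blend(Deferred *d, int mode)
{
    assert(d != NULL);
    if (mode == d->blend_mode)
        return;            /* redundant change: keep batching */
    deferred_flush(d);
    d->blend_mode = mode;
}
```

The cost of this shortcut is that frequent state changes shrink the batches, so the benefit depends on how often callers actually switch state between draws.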

Mason_Wheeler · April 16, 2013, 5:09am

Also, if having many textures "is bound to completely negate the advantage,"
why do I see a 3x performance improvement when rendering big, complex
scenes this way?

Don’t assume, don’t guess; measure. I’ve measured it, and this way works.

Sik_the_hedgehog · April 16, 2013, 5:25am

Is there any guarantee in OpenGL at all that primitives are drawn in
the order they appear in the buffer (which would seem inefficient)?
Otherwise, ordering by Z is bound to break eventually.

