External dependencies in the renderer?

Somehow this turned into a scenegraph discussion, to which I recommend this reading material:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Scene%20Graphs%20-%20just%20say%20no]]

On a topic of correctness however, isn’t it kind of implicit in a 2D graphics API that your draw order is sacred? You usually want things to overlap in a specific way.

An order-preserving technique has no need of hashes or any such thing, it only needs to skip issuing state calls that are the same, and some things that are not order-dependent can be combined regardless (like using glBufferSubData to write multiple quads into the vertex buffer before drawing any of them, for a considerable savings in driver overhead).

On 04/15/2013 10:25 PM, Sik the hedgehog wrote:

Is there any guarantee in OpenGL at all that primitives are drawn in
the order they appear in the buffer (which would seem inefficient)?
Otherwise ordering by Z is pretty much eventually going to break in
the future.

2013/4/16, Mason Wheeler :

Not exactly. The optimization assumes that the principal rendering
bottleneck is the overhead involved in sending scene data to the
graphics card, which assumption is borne out by testing data. It
intends to minimize the number of drawing calls by delaying primitives
and sending them in batches, ordered by Z and texture.

You avoid having to cache “the entire GL state” by the simple expedient
of flushing the to-do buffer if a call comes in that changes the GL state.
All you need to keep cached is the map of textures to arrays of coordinates.
And transparency works fine as long as you have a Z parameter to order
by. Things get drawn on top of each other in the prescribed order. I’ve
been using this for a while now. The system works.
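
To make the "flush the to-do buffer on a state change" point concrete, here is a minimal sketch in C. It is not Mason's Delphi code: the `Batcher` type, the function names, and the use of a single `blend_mode` integer as the only tracked state are all assumptions made for illustration, and the flush just counts primitives where a real one would hand them to the driver.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 256

typedef struct { int x, y, w, h; } Rect;

typedef struct {
    int    blend_mode;            /* the only piece of state tracked here (assumption) */
    Rect   pending[MAX_PENDING];  /* delayed primitives */
    size_t count;                 /* primitives currently waiting */
    size_t flushes;               /* batches sent so far */
    size_t sent;                  /* primitives sent so far */
} Batcher;

/* Stand-in for the real submission: counts the batch and empties the buffer. */
static void batcher_flush(Batcher *b)
{
    assert(b != NULL);
    if (b->count == 0)
        return;
    b->sent += b->count;
    b->flushes++;
    b->count = 0;
}

/* Delay a primitive; flush first only if the buffer is full. */
static void batcher_draw(Batcher *b, Rect r)
{
    assert(b != NULL);
    if (b->count == MAX_PENDING)
        batcher_flush(b);
    b->pending[b->count++] = r;
}

/* A state change flushes whatever was queued under the old state,
   so no per-primitive copy of the state is ever needed. */
static void batcher_set_blend_mode(Batcher *b, int mode)
{
    assert(b != NULL);
    if (mode == b->blend_mode)
        return;               /* redundant call: nothing to flush */
    batcher_flush(b);
    b->blend_mode = mode;
}
```

Setting the same mode twice costs nothing, and only an actual change forces the pending primitives out.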


From: John
To: sdl at lists.libsdl.org
Sent: Monday, April 15, 2013 8:36 PM
Subject: Re: [SDL] External dependencies in the renderer?

Ok, so the optimization assumes that a rendering bottleneck is the cost of
switching textures, and intends to minimize the number of texture switches
by delaying primitives, then re-ordering them by texture and Z.

I’ve seen this before. It can be done, but there are caveats. The biggest
challenge is you need to cache the entire GL state for each delayed
primitive. The implementation is effectively an “intermediate mode” layer
unto itself. The layer is a massive todo buffer with three phases: queue
everything, analyze (re-order) the queue, then execute the queue as a
batch. If you don’t choose the batch size wisely, it’s possible to lose any
parallelism that you might have had when GL calls were mixed in with scene
graph calls. The second challenge is to support transparency and other
effects that depend on multiple passes in a specific order, or that play
games with the z-buffer (or other tests).

On 04/15/2013 10:41 PM, Mason Wheeler wrote:

Here’s the basic idea.

The internals of SDL’s rendering API are atrocious, to put it bluntly. It
does everything in Immediate Mode, which modern versions of OpenGL and
Direct3D have moved away from because it’s so slow. GLES doesn’t even
support Immediate Mode, so if you look at SDL’s GLES renderer, it does the
closest thing it can find to Immediate Mode, sending one call to OpenGL
every time someone calls SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls to a
minimum, and pass as much data as possible all at once in an array. Of
course, that’s not the way people use SDL; they use SDL to draw a bunch of
sprites, one at a time. So to be fast, SDL has to keep track of the
bookkeeping for them.

The way to do this is with a multimap, mapping textures to lists of
drawing coordinates. You turn SDL_RenderCopy into an operation that adds a
pair of rects to a texture’s mapped list, and SDL_RenderPresent into an
operation that iterates over the multimap and for each texture, builds two
arrays of vertices (one for screen coordinates and one for texture
coordinates) as buffers and passes them to the renderer all at once.
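
A rough sketch of that texture-to-rect-list mapping in C might look like the following. It is only an illustration under stated assumptions: textures are plain integer ids, the "multimap" is a small fixed array searched linearly rather than a hash table, and `queue_copy`, `present`, and the `SubmitFn` callback are hypothetical names standing in for SDL_RenderCopy, SDL_RenderPresent, and the backend draw call.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEXTURES 16
#define MAX_QUADS    64   /* per texture, per frame */

typedef struct { float x, y, w, h; } RectF;

typedef struct {
    int    texture;              /* hypothetical texture id */
    size_t count;
    RectF  src[MAX_QUADS];       /* texture-space rects */
    RectF  dst[MAX_QUADS];       /* screen-space rects */
} Bucket;

typedef struct { Bucket buckets[MAX_TEXTURES]; size_t used; } TextureMap;

typedef void (*SubmitFn)(int texture, const float *screen_xy,
                         const float *tex_uv, size_t vertex_count, void *user);

static Bucket *find_or_add(TextureMap *m, int texture)
{
    for (size_t i = 0; i < m->used; i++)
        if (m->buckets[i].texture == texture)
            return &m->buckets[i];
    if (m->used == MAX_TEXTURES)
        return NULL;
    Bucket *b = &m->buckets[m->used++];
    b->texture = texture;
    b->count = 0;
    return b;
}

/* Plays the role of SDL_RenderCopy: record the rect pair, draw nothing. */
static int queue_copy(TextureMap *m, int texture, RectF src, RectF dst)
{
    assert(m != NULL);
    Bucket *b = find_or_add(m, texture);
    if (b == NULL || b->count == MAX_QUADS)
        return -1;            /* a real implementation would flush here */
    b->src[b->count] = src;
    b->dst[b->count] = dst;
    b->count++;
    return 0;
}

/* Write the four corners of a rect as x,y pairs. */
static void put_corners(float *out, RectF r)
{
    out[0] = r.x;       out[1] = r.y;
    out[2] = r.x + r.w; out[3] = r.y;
    out[4] = r.x + r.w; out[5] = r.y + r.h;
    out[6] = r.x;       out[7] = r.y + r.h;
}

/* Plays the role of SDL_RenderPresent: one submit call per texture. */
static size_t present(TextureMap *m, SubmitFn submit, void *user)
{
    static float screen_xy[MAX_QUADS * 8], tex_uv[MAX_QUADS * 8];
    size_t calls = 0;
    for (size_t i = 0; i < m->used; i++) {
        Bucket *b = &m->buckets[i];
        for (size_t q = 0; q < b->count; q++) {
            put_corners(&screen_xy[q * 8], b->dst[q]);
            put_corners(&tex_uv[q * 8], b->src[q]);
        }
        if (b->count > 0) {
            submit(b->texture, screen_xy, tex_uv, b->count * 4, user);
            calls++;
        }
    }
    m->used = 0;              /* start the next frame empty */
    return calls;
}

/* Example callback: adds up how many vertices were submitted. */
static void count_vertices(int texture, const float *screen_xy,
                           const float *tex_uv, size_t vertex_count, void *user)
{
    (void)texture; (void)screen_xy; (void)tex_uv;
    *(size_t *)user += vertex_count;
}
```

Three copies using two textures turn into two submit calls here, which is the saving being described. The buckets come out in first-use order, not draw order, so overlapping sprites from different textures can end up stacked differently from how they were issued.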

I’ve got a Delphi implementation that sped up my rendering significantly,
about 3x faster than stock SDL rendering. With a multimap in C, I could
port this concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL
doesn’t, is Z-order. If you’re no longer deterministically drawing in the
order in which draw calls are received, but instead grouping them by
texture, which are in turn sorted by hash order (essentially random,) you
need a Z-order parameter to make sure the right things draw on top of the
right things, and what you end up with is an array of multimaps.
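
One way to picture the "ordered by Z and texture" result without a hash table at all is a flat array sorted on a compound key. The sketch below is a swapped-in technique, not the array-of-multimaps structure itself: the `DrawCmd` fields and function names are hypothetical, and the submission index is added as a last tie-breaker because `qsort` is not guaranteed to be stable.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int      z;        /* layer: lower values are drawn first */
    int      texture;  /* hypothetical texture id */
    unsigned seq;      /* submission order, final tie-breaker */
} DrawCmd;

/* Order by Z, then texture, then original submission order. */
static int cmp_draw(const void *pa, const void *pb)
{
    const DrawCmd *a = pa, *b = pb;
    if (a->z != b->z)
        return (a->z > b->z) - (a->z < b->z);
    if (a->texture != b->texture)
        return (a->texture > b->texture) - (a->texture < b->texture);
    return (a->seq > b->seq) - (a->seq < b->seq);
}

/* Stamp each command with its position, then sort the frame's commands. */
static void sort_draws(DrawCmd *cmds, size_t n)
{
    assert(cmds != NULL || n == 0);
    if (n == 0)
        return;
    for (size_t i = 0; i < n; i++)
        cmds[i].seq = (unsigned)i;
    qsort(cmds, n, sizeof *cmds, cmp_draw);
}

/* Each run of equal (z, texture) could go out as one draw call. */
static size_t count_batches(const DrawCmd *cmds, size_t n)
{
    size_t batches = 0;
    for (size_t i = 0; i < n; i++)
        if (i == 0 || cmds[i].z != cmds[i - 1].z ||
            cmds[i].texture != cmds[i - 1].texture)
            batches++;
    return batches;
}
```

Within one Z layer, draws using different textures still get reordered relative to each other, so the caller has to put things that must overlap in a specific way on different layers.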

I know it probably sounds very complicated, but it’s only a few hundred
lines of code (plus the implementations of the hash and the dynamic array,
because C doesn’t have them built in) and it makes rendering much faster.

Mason


From: Ryan C. Gordon
To: SDL Development List
Sent: Monday, April 15, 2013 6:20 PM
Subject: Re: [SDL] External dependencies in the renderer?

On 4/15/13 2:46 PM, Mason Wheeler wrote:

Does anyone (particularly Sam and Ryan) have any objections to pulling
an external library into SDL? Because I have an idea that could
significantly improve the performance of SDL’s 3d-accelerated rendering,
but it would require a multimap. Neither SDL nor the C standard library
has a multimap implementation, but I could build one with uthash and
utarray http://troydhanson.github.io/uthash/, which are both fairly
small and BSD-licensed.

I’d rather we have a simple hashtable implementation in SDL.

What’s the plan?

–ryan.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org



LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier

If we want to be blunt, the real issue here isn’t switching to a
scenegraph (besides the complexity it may bring - it’s debatable
whether it’s worth it or just tell users to use OpenGL directly for
those extreme cases) but bringing in an external dependency to SDL…


Date: Tue, 16 Apr 2013 00:01:55 -0300
From: Sik the hedgehog <sik.the.hedgehog at gmail.com>
To: SDL Development List
Subject: Re: [SDL] External dependencies in the renderer?
Message-ID:
<CAEyBR+VZ6JeZ451fzAVUmMmXeDw8nF2GDpYFDTK2F4a4jd-tUw at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Um, is a hashtable needed for this idea as opposed to a regular array?
I mean, you’re literally just adding entries to a queue, you don’t
even need to retrieve them back. As for the Z order, just assign a
unique Z to each entry and be done with it. Sure, you may run out of
range, but at that point you probably have queued up enough primitives
to be worth flushing the batch.

As long as you have access to uintptr_t, there should be few to zero
concerns about the range of anything that can be described as an id
number, and it’s pretty easy to describe z-order as an id in this
case.

Date: Mon, 15 Apr 2013 23:58:24 -0700
From: Forest Hale
To: sdl at lists.libsdl.org
Subject: Re: [SDL] External dependencies in the renderer?
Message-ID: <516CF690.5060103 at ghdigital.com>
Content-Type: text/plain; charset=ISO-8859-1

Somehow this turned into a scenegraph discussion, to which I recommend this
reading material:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Scene%20Graphs%20-%20just%20say%20no]]

On a topic of correctness however, isn’t it kind of implicit in a 2D
graphics API that your draw order is sacred? You usually want things to
overlap in a specific way.

An order-preserving technique has no need of hashes or any such thing, it
only needs to skip issuing state calls that are the same, and some things
that are not order-dependent can be combined
regardless (like using glBufferSubData to write multiple quads into the
vertex buffer before drawing any of them, for a considerable savings in
driver overhead).

After double checking the header files to make certain I was
remembering correctly, it looks like a dirty-rect queue should work
fairly well. Algorithm:

  1. Check the current render command against the dirty-rect(s) of the
    current queue node, if
    1a: They overlap, then create a new queue node and make this command
    the first entry in that node, else
    1b: Add this command to the current node, and expand the node’s
    dirty-rect(s) to cover the new area.
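
The steps above can be sketched in C roughly as follows. This is a minimal illustration, assuming one bounding dirty-rect per node (the "(s)" case with several rects per node is not handled), integer texture ids, and fixed-size arrays; `Queue`, `Node`, `Cmd`, and `queue_push` are names made up for the example, not anything in the SDL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64
#define MAX_CMDS  64

typedef struct { int x, y, w, h; } Rect;
typedef struct { int texture; Rect dst; } Cmd;

typedef struct {
    Rect   dirty;            /* bounding box of every command in the node */
    size_t count;
    Cmd    cmds[MAX_CMDS];
} Node;

typedef struct { Node nodes[MAX_NODES]; size_t used; } Queue;

static bool rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

static Rect rect_union(Rect a, Rect b)
{
    int x0 = a.x < b.x ? a.x : b.x;
    int y0 = a.y < b.y ? a.y : b.y;
    int x1 = (a.x + a.w > b.x + b.w) ? a.x + a.w : b.x + b.w;
    int y1 = (a.y + a.h > b.y + b.h) ? a.y + a.h : b.y + b.h;
    Rect r = { x0, y0, x1 - x0, y1 - y0 };
    return r;
}

/* Step 1: compare against the current node, then branch to 1a or 1b. */
static int queue_push(Queue *q, Cmd c)
{
    assert(q != NULL);
    Node *cur = q->used ? &q->nodes[q->used - 1] : NULL;
    if (cur == NULL || cur->count == MAX_CMDS ||
        rects_overlap(cur->dirty, c.dst)) {
        /* 1a: overlap (or no usable node), so start a new node. */
        if (q->used == MAX_NODES)
            return -1;        /* a real implementation would flush here */
        cur = &q->nodes[q->used++];
        cur->count = 0;
        cur->dirty = c.dst;
    } else {
        /* 1b: no overlap, so grow the current node's dirty-rect. */
        cur->dirty = rect_union(cur->dirty, c.dst);
    }
    cur->cmds[cur->count++] = c;
    return 0;
}
```

A single bounding box is conservative: it can report an overlap where no individual command actually overlaps, which costs an extra node but never breaks the ordering.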

You can use a new node every time you use a different texture, you can
combine multiple textures in a single node (after all, you know they
won’t overlap), you can send point & line data either the same way, or
with custom nodes, you can provide hook functions to shoe-horn your
own rendering system into the queue, etc. This would also be a first
step towards the oft-requested (albeit somewhat bone-headed) feature
of issuing rendering calls from whichever thread you want.

The main issue is how the system would work. I think that what Mason’s
suggesting would require that you look through nodes until you find a
dirty-rect collision (even a partial dirty-rect collision would count,
and the only thing that gets looked at is the actual coordinates,
whether e.g. the texture is or isn’t the same doesn’t matter). At that
point you go back to the most-recently-checked node that used the same
texture and add your command there, or add your command in a new node
if there wasn’t a previous node.

That should (I think) provide the correct sequencing, while also
ensuring that you reuse textures as few times as you can get away
with.
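
Here is one possible reading of that look-back variant, sketched in C. It assumes each node holds a single texture and a single bounding dirty-rect, and that commands inside a node are drawn in the order they were added. The per-command vertex data is left out and only counted, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64

typedef struct { int x, y, w, h; } Rect;

typedef struct {
    int    texture;   /* every command in a node shares this texture */
    Rect   dirty;     /* bounding box of the node's commands */
    size_t count;     /* commands in the node; their data is omitted here */
} Node;

typedef struct { Node nodes[MAX_NODES]; size_t used; } Queue;

static bool rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

static Rect rect_union(Rect a, Rect b)
{
    int x0 = a.x < b.x ? a.x : b.x;
    int y0 = a.y < b.y ? a.y : b.y;
    int x1 = (a.x + a.w > b.x + b.w) ? a.x + a.w : b.x + b.w;
    int y1 = (a.y + a.h > b.y + b.h) ? a.y + a.h : b.y + b.h;
    Rect r = { x0, y0, x1 - x0, y1 - y0 };
    return r;
}

/* Returns the index of the node that received the command, or
   (size_t)-1 if the queue is full and should be flushed first. */
static size_t queue_place(Queue *q, int texture, Rect dst)
{
    assert(q != NULL);
    size_t target = q->used;                /* "no same-texture node yet" */
    for (size_t i = q->used; i-- > 0; ) {   /* newest node first */
        const Node *n = &q->nodes[i];
        if (n->texture == texture)
            target = i;                     /* most recently checked match */
        if (rects_overlap(n->dirty, dst))
            break;                          /* cannot move before this node */
    }
    if (target == q->used) {                /* nothing to join: new node */
        if (q->used == MAX_NODES)
            return (size_t)-1;
        Node *n = &q->nodes[q->used++];
        n->texture = texture;
        n->dirty = dst;
        n->count = 1;
        return target;
    }
    Node *n = &q->nodes[target];
    n->dirty = rect_union(n->dirty, dst);
    n->count++;
    return target;
}
```

Every node after the chosen one was checked and found not to overlap the new command, so drawing it earlier cannot change what ends up on top.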

If SDL Renderer is turning into a beast, it should be punted to its own library, much like SDL_mixer and so on.

As a matter of practicality however, I think it is fine being in the core, so long as it stays simple and direct.

My understanding of the problem that warranted this discussion is that it is abusive about draw calls and texture switches or some such? That has nothing at all to do with draw order, and if the draw
order is less important than performance then the app should take care of sorting them first; it isn’t the duty of SDL to fix an app performance issue.

As far as some underlying technical details of GL and D3D APIs, I would recommend implementing a draw queue (fully buffered API) that is flushed to real calls to the driver after enough vertex data
has accumulated to make it worthwhile. This also allows multiple consecutive draws to be merged if their state is the same; a lot of optimizations can be done once you have that “lookahead” capability
inherent in the flush routine.

The problem is that SDL issues calls for everything you draw, i.e. it
does things the naive way rather than being 100% optimized for GPUs
(which really are more optimized towards rendering entire complex 3D
scenes than rendering generic 2D stuff). He wants to change the way
the renderer works so it fits that better, the problem being he wants
to pull in an external dependency (uthash).

2013/4/16, Forest Hale :> If SDL Renderer is turning into a beast, it should be punted to its own

library, much like SDL_mixer and so on.

As a matter of practicality however, I think it is fine being in the core,
so long as it stays simple and direct.

My understanding of the problem that warranted this discussion is that it is
abusive about Draw calls and texture switches or some such? That has
nothing at all to do with draw order, and if the draw
order is less important than performance then the app should take care of
sorting them first, it isn’t the duty of SDL to fix an app performance
issue.

As far as some underlying technical details of GL and D3D APIs, I would
recommend implementing a draw queue (fully buffered API) that is flushed to
real calls to the driver after enough vertex data
has accumulated to make it worthwhile, this also allows multiple consecutive
draws to be merged if their state is the same, a lot of optimizations can be
done once you have that “lookahead” capability
inherent in the flush routine.

On 04/16/2013 12:30 AM, Sik the hedgehog wrote:

If we want to be blunt, the real issue here isn’t switching to a
scenegraph (besides the complexity it may bring - it’s debatable
whether it’s worth it or just tell users to use OpenGL directly for
those extreme cases) but bringing in an external dependency to SDL…

2013/4/16, Forest Hale :

Somehow this turned into a scenegraph discussion, to which I recommend
this
reading material:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Scene%20Graphs%20-%20just%20say%20no]]

On a topic of correctness however, isn’t it kind of implicit in a 2D
graphics API that your draw order is sacred? You usually want things to
overlap in a specific way.

An order-preserving technique has no need of hashes or any such thing,
it
only needs to skip issuing state calls that are the same, and some
things
that are not order-dependent can be combined
regardless (like using glBufferSubData to write multiple quads into the
vertex buffer before drawing any of them, for a considerable savings in
driver overhead).

On 04/15/2013 10:25 PM, Sik the hedgehog wrote:

Is there any guarantee in OpenGL at all that primitives are drawn in
the order they appear in the buffer (which would seem inefficient)?
Otherwise ordering by Z is pretty much eventually going to break in
the future.

2013/4/16, Mason Wheeler :

Not exactly. The optimization assumes that the principal rendering
bottleneck is the overhead involved in sending scene data to the
graphics card, which assumption is borne out by testing data. It
intends to minimize the number of drawing calls by delaying
primitives
and sending them in batches, ordered by Z and texture.

You avoid having to cache “the entire GL state” by the simple
expedient
of flushing the to-do buffer if a call comes in that changes the GL
state.
All you need to keep cached is the map of textures to arrays of
coordinates.
And transparency works fine as long as you have a Z parameter to order
by. Things get drawn on top of each other in the prescribed order.
I’ve
been using this for a while now. The system works.


From: John
To: sdl at lists.libsdl.org
Sent: Monday, April 15, 2013 8:36 PM
Subject: Re: [SDL] External dependencies in the renderer?

Ok, so the optimization assumes that a rendering bottleneck is the
cost
of
switching textures, and intends to minimize the number texture
switches
by
delaying primitives, then re-ordering them by texture and Z.

I’ve seen this before. It can be done, but there are caveats. The
biggest
challenge is you need to cache the entire GL state for each delayed
primitive.
The implementation is effectively an “intermediate mode” layer unto
itself.
The
layer is a massive todo buffer with three phases: queue everything,
analyze
(re-order) the queue, then execute the queue as a batch. If you don’t
choose
the
batch size wisely, it’s possible to lose any parallelism that you
might
have
had
when GL calls were mixed in with scene graph calls. The second
challenge
is
to
support transparency and other effects that depend on multiple passes
in
a
specific order, or that play games with the z-buffer (or other tests.)

On 04/15/2013 10:41 PM, Mason Wheeler wrote:

Here’s the basic idea.

The internals of SDL’s rendering API are atrocious, to put it
bluntly.
It
does
everything in Immediate Mode, which modern versions of OpenGL and
Direct3D
have
moved away from because it’s so slow. GLES doesn’t even support
Immediate
Mode,
so if you look at SDL’s GLES renderer, it does the closest thing it
can
find to
Immediate Mode, sending one call to OpenGL every time someone calls
SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls
to
a
minimum, and pass as much data as possible all at once in an array.
Of
course,
that’s not the way people use SDL; they use SDL to draw a bunch of
sprites, one
at a time. So to be fast, SDL has to keep track of the bookkeeping
for
them.

The way to do this is with a multimap, mapping textures to lists of
drawing
coordinates. You turn SDL_RenderCopy into an operation that adds a
pair
of
rects to a texture’s mapped list, and SDL_RenderPresent into an
operation
that
iterates over the multimap and for each texture, builds two arrays of
vertices
(one for screen coordinates and one for texture coordinates) as
buffers
and
passes them to the renderer all at once.

I’ve got a Delphi implementation that sped up my rendering
significantly,
about
3x faster than stock SDL rendering. With a multimap in C, I could
port
this
concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL
doesn’t, is
Z-order. If you’re no longer deterministically drawing in the order
in
which
draw calls are received, but instead grouping them by texture, which
are
in turn
sorted by hash order (essentially random,) you need a Z-order
parameter
to
make
sure the right things draw on top of the right things, and what you
end
up
with
is an array of multimaps.

I know it probably sounds very complicated, but it’s only a few
hundred
lines of
code (plus the implementations of the hash and the dynamic array,
because
C
doesn’t have them built in) and it makes rendering much faster.

Mason


From: Ryan C. Gordon
To: SDL Development List
Sent: Monday, April 15, 2013 6:20 PM
Subject: Re: [SDL] External dependencies in the renderer?

On 4/15/13 2:46 PM, Mason Wheeler wrote:

Does anyone (particularly Sam and Ryan) have any objections to pulling
an external library into SDL? Because I have an idea that could
significantly improve the performance of SDL’s 3d-accelerated rendering,
but it would require a multimap. Neither SDL nor the C standard library
has a multimap implementation, but I could build one with uthash and
utarray http://troydhanson.github.io/uthash/, which are both fairly
small and BSD-licensed.

I’d rather we have a simple hashtable implementation in SDL.

What’s the plan?

–ryan.


If you buffer draws, you get higher performance.

If you additionally sort them, you get even higher performance but break the most basic assumption of a 2D graphics API - that things occur in the order specified.

I see no reason to use uthash here; I do see great reason to buffer things.

Why is uthash still the subject of this discussion? We’re not going to reach a conclusion on the broad topic of outside dependencies; it’s better to focus on the specific problem at hand.

On 04/16/2013 02:08 AM, Sik the hedgehog wrote:

The problem is that SDL issues calls for every thing you draw, i.e. it
does things the naive way rather than being 100% optimized for GPUs
(which really are more optimized towards rendering entire complex 3D
scenes than rendering generic 2D stuff). He wants to change the way
the renderer works so it fits that better, the problem being he wants
to pull in an external dependency (uthash).

2013/4/16, Forest Hale <@Forest_Hale>:

If SDL Renderer is turning into a beast, it should be punted to its own
library, much like SDL_mixer and so on.

As a matter of practicality however, I think it is fine being in the core,
so long as it stays simple and direct.

My understanding of the problem that prompted this discussion is that the
renderer is wasteful with draw calls and texture switches, or some such? That
has nothing at all to do with draw order, and if the draw order is less
important than performance then the app should take care of sorting the draws
first; it isn’t the duty of SDL to fix an app performance issue.

As for the underlying technical details of the GL and D3D APIs, I would
recommend implementing a draw queue (a fully buffered API) that is flushed to
real driver calls once enough vertex data has accumulated to make it
worthwhile. This also allows multiple consecutive draws to be merged if their
state is the same; a lot of optimizations become possible once you have that
“lookahead” capability in the flush routine.
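The merging step of such a draw queue could look roughly like this in C. It is a sketch under assumptions: "state" is reduced to a texture id, `QueuedDraw` and `merge_consecutive` are hypothetical names, and the routine only reports the runs a flush would submit rather than talking to a driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queued draw: state is reduced to a texture id here. */
typedef struct {
    int texture;
    int quad_id;   /* placeholder for the quad's vertex data */
} QueuedDraw;

/* Walks the queue in submission order and merges neighbours that share
   the same state. Writes the length of each merged run to runs[] and
   returns the number of runs, i.e. the number of real draw calls. */
static size_t merge_consecutive(const QueuedDraw *q, size_t n,
                                size_t *runs, size_t max_runs)
{
    size_t nruns = 0, i;

    assert(runs != NULL);
    for (i = 0; i < n; i++) {
        if (nruns > 0 && q[i].texture == q[i - 1].texture) {
            runs[nruns - 1]++;          /* same state: extend the run */
        } else {
            assert(nruns < max_runs);
            runs[nruns++] = 1;          /* state change: start a new run */
        }
    }
    return nruns;
}
```

Because nothing is reordered, the result is pixel-for-pixel what the unbatched calls would have produced; the saving depends entirely on how often neighbouring draws happen to share state.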

On 04/16/2013 12:30 AM, Sik the hedgehog wrote:

If we want to be blunt, the real issue here isn’t switching to a
scenegraph (besides the complexity it may bring - it’s debatable
whether it’s worth it or just tell users to use OpenGL directly for
those extreme cases) but bringing in an external dependency to SDL…

LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier

The following idea came to my mind:

Have a todo queue, but only for one texture. Flush it when it’s full or when another texture should be rendered.
-> if speed is an issue for a specific program, the programmer can sort the calls by texture and get the speedup
-> we don’t need a complicated system or an external dependency in SDL
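As a rough C sketch of that single-texture queue (assumptions: textures are integer ids, the quad data itself is left out, the capacity is tiny for illustration, and `TexQueue`, `texqueue_add` and `texqueue_flush` are made-up names, not SDL functions):

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 4   /* deliberately tiny, for illustration only */

/* Hypothetical single-texture queue; quad vertex data is omitted. */
typedef struct {
    int    texture;   /* texture shared by everything currently queued */
    size_t count;     /* number of queued quads */
    size_t flushes;   /* draw calls issued so far */
} TexQueue;

/* Submit whatever is queued as one draw call. */
static void texqueue_flush(TexQueue *q)
{
    assert(q != NULL);
    if (q->count == 0)
        return;
    /* A real renderer would submit q->count quads in one call here. */
    q->flushes++;
    q->count = 0;
}

/* Queue one quad; flush first if the texture changes or the queue is full. */
static void texqueue_add(TexQueue *q, int texture)
{
    assert(q != NULL);
    if (q->count > 0 && (texture != q->texture || q->count == QUEUE_CAP))
        texqueue_flush(q);
    q->texture = texture;
    q->count++;
}
```

With a capacity of 4, the sequence 1, 1, 2, 2, 2, 2, 2, 1 costs four draw calls, while the same eight quads pre-sorted by texture cost three, so the speedup is left in the caller’s hands and the draw order is never changed behind its back.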

Optimization is a task of the programmer, not the library.
That said, SDL’s interface is too high-level to enable the programmer to optimize render performance.

I see a couple options here:

  1. Add another function for rendering the same texture multiple times:

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int nTimes, const SDL_Rect *);

  2. Add a “sprite batch” API:

Code:

typedef struct SDL_SpriteBatch SDL_SpriteBatch;
SDL_SpriteBatch * SDL_CreateSpriteBatch(SDL_Renderer *);
void SDL_DestroySpriteBatch(SDL_SpriteBatch *);
int SDL_BatchCopy(SDL_SpriteBatch *, SDL_Texture *, const SDL_Rect *);
int SDL_BatchFlush(SDL_SpriteBatch *);

------------------------
Nate Fries

GL likes to generate texture ids incrementing from 1. I don’t recall whether
that’s standard or reliable. If it is, you wouldn’t want a general purpose hash
table to map texture ids.

On 04/15/2013 11:09 PM, Mason Wheeler wrote:

A hashtable is needed because this is not just a queue. To get good
performance out of it, it has to be grouped by texture. The idea is that you
select each texture once, and perform all of the drawing for it all at once.
What we have now is just a queue, and it’s horribly slow. On a complicated
scene, it’s the difference between a few dozen API calls and a few tens of
thousands of them. (Yes, I have rendered scenes that involved that many calls
with SDL.)

Mason


From: Sik the hedgehog <sik.the.hedgehog at gmail.com>
To: SDL Development List
Sent: Monday, April 15, 2013 8:01 PM
Subject: Re: [SDL] External dependencies in the renderer?

Um, is a hashtable needed for this idea as opposed to a regular array?
I mean, you’re literally just adding entries to a queue, you don’t
even need to retrieve them back. As for the Z order, just assign a
unique Z to each entry and be done with it. Sure, you may run out of
range, but at that point you probably have queued up enough primitives
to be worth flushing the batch.

Also yeah, I wonder about the textures too, although I guess you can
always force a flush in that case.

2013/4/15, Ryan C. Gordon <icculus at icculus.org>:

Can you elaborate on the reason why uthash is not attractive to you?

I haven’t even clicked on the link, so I can’t say anything about
uthash. As an external piece of code, I’m hesitant to add it to SDL,
since that has caused annoyances in the past, unless there was a really
good reason.

(Doubly-so for a hashtable. I mean, a hashtable? Do we really need to
scour the internet for a hashtable?)

I imagine it’s probably a fine piece of code in itself, though.

–ryan.


It’s not “ordering by Z and texture” but “grouping by Z and texture”. Every render with a Z of 1 will get sent before every render with a Z of 2, and so on. That’s why I said you end up with an array of multimaps.

Mason

That sounds like ordering by Z to me, no?

The GLES device vendors advise against implementing your own depth sorting
because the GPU depth test does it much faster, more efficiently, can correctly
handle overlaps, and runs in parallel with the CPU.

Also, z is floating point in transformed view coordinates which means there may
not be many duplicate z values to group by.

Have you measured the cost of switching the active texture unit? The number of
switches that will be saved by this optimization is easy to calculate: it’s
roughly the number of primitives minus the number of textures.

On 04/16/2013 12:41 PM, Mason Wheeler wrote:

It’s not “ordering by Z and texture” but “grouping by Z and texture”. Every
render with a Z of 1 will get sent before every render with a Z of 2, and so
on. That’s why I said you end up with an array of multimaps.

Mason


The problem is that I think the idea is to use a single batch for
everything… Again, I’m not sure at all that this kind of Z ordering
is reliable in that case. The problem is that the safest way is
sending one thing at a time, i.e. one draw call per SDL function,
which is the very thing we’re trying to avoid…

Also yeah, the Z range is why I said we could run out of them. On PCs
we have a 24-bit depth buffer, OK (though somebody could still attempt
to set 16-bit, and I guess in 2D this could make sense), but on mobile
I wonder how the Z range is handled (especially on deferred renderers
as opposed to standard rasterizer ones).

And yes, OpenGL numbers textures from 1 onwards (this is true for
all objects, really), but remember you can create gaps by deleting
textures, and OpenGL will attempt to fill those if I recall correctly
(I’m not sure about the details).

2013/4/16, John:

That sounds like ordering by Z to me, no?

The GLES device vendors advise against implementing your own depth sorting
because the GPU depth test does it much faster, more efficiently, can
correctly handle overlaps, and runs in parallel with the CPU.

Also, z is floating point in transformed view coordinates which means there
may not be many duplicate z values to group by.

Have you measured the cost of switching the active texture unit? The number
of switches that will be saved by this optimization is easy to calculate, it’s
roughly the number of primitives minus the number of textures.
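To put numbers on that estimate, here is a small C sketch that counts texture binds for a given submission sequence, once in submission order and once after grouping by texture. The function names are hypothetical and textures are assumed to be plain integer ids.

```c
#include <assert.h>
#include <stddef.h>

/* Binds needed when drawing strictly in submission order: one bind each
   time the texture differs from the previous draw. */
static size_t binds_in_order(const int *tex, size_t n)
{
    size_t binds = 0, i;

    assert(tex != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (i == 0 || tex[i] != tex[i - 1])
            binds++;
    return binds;
}

/* Binds needed after grouping by texture: one per distinct texture id. */
static size_t binds_grouped(const int *tex, size_t n)
{
    size_t distinct = 0, i, j;

    assert(tex != NULL || n == 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++)       /* seen this id before? */
            if (tex[j] == tex[i])
                break;
        if (j == i)
            distinct++;
    }
    return distinct;
}
```

For the alternating sequence 1, 2, 1, 2, 1, 2 this gives 6 binds in order and 2 grouped, a saving of 4, which matches "primitives minus textures". For 1, 1, 1, 2, 2, 2 both counts are 2 and nothing is saved, so the estimate reads best as an upper bound that is reached only when neighbouring draws never share a texture.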

On 04/16/2013 12:41 PM, Mason Wheeler wrote:

It’s not “ordering by Z and texture” but “grouping by Z and texture”.
Every
render with a Z of 1 will get sent before every render with a Z of 2, and
so
on. That’s why I said you end up with an array of multimaps.

Mason


From: Sik the hedgehog <@Sik_the_hedgehog>
To: Mason Wheeler ; SDL Development List

Sent: Monday, April 15, 2013 10:25 PM
Subject: Re: [SDL] External dependencies in the renderer?

Is there any guarantee in OpenGL at all that primitives are drawn in
the order they appear in the buffer (which would seem inefficient)?
Otherwise ordering by Z is pretty much eventually going to break in
the future.

2013/4/16, Mason Wheeler <masonwheeler at yahoo.com
<mailto:masonwheeler at yahoo.com>>:

Not exactly. The optimization assumes that the principal rendering
bottleneck is the overhead involved in sending scene data to the
graphics card, which assumption is borne out by testing data. It
intends to minimize the number of drawing calls by delaying
primitives
and sending them in batches, ordered by Z and texture.

You avoid having to cache “the entire GL state” by the simple
expedient
of flushing the to-do buffer if a call comes in that changes the GL
state.
All you need to keep cached is the map of textures to arrays of
coordinates.
And transparency works fine as long as you have a Z parameter to order
by. Things get drawn on top of each other in the prescribed order.
I’ve
been using this for a while now. The system works.


From: John <john at leafygreengames.com
<mailto:john at leafygreengames.com>>
To: sdl at lists.libsdl.org <mailto:sdl at lists.libsdl.org>
Sent: Monday, April 15, 2013 8:36 PM
Subject: Re: [SDL] External dependencies in the renderer?

Ok, so the optimization assumes that a rendering bottleneck is the cost
of
switching textures, and intends to minimize the number texture switches
by
delaying primitives, then re-ordering them by texture and Z.

I’ve seen this before. It can be done, but there are caveats. The
biggest
challenge is you need to cache the entire GL state for each delayed
primitive.
The implementation is effectively an “intermediate mode” layer unto
itself.
The
layer is a massive todo buffer with three phases: queue everything,
analyze
(re-order) the queue, then execute the queue as a batch. If you don’t
choose
the
batch size wisely, it’s possible to lose any parallelism that you might
have
had
when GL calls were mixed in with scene graph calls. The second
challenge is
to
support transparency and other effects that depend on multiple passes
in a
specific order, or that play games with the z-buffer (or other tests.)

On 04/15/2013 10:41 PM, Mason Wheeler wrote:

Here’s the basic idea.

The internals of SDL’s rendering API are atrocious, to put it bluntly.
It

does
everything in Immediate Mode, which modern versions of OpenGL and
Direct3D

have
moved away from because it’s so slow. GLES doesn’t even support
Immediate

Mode,
so if you look at SDL’s GLES renderer, it does the closest thing it
can

find to
Immediate Mode, sending one call to OpenGL every time someone calls
SDL_RenderCopy.

The way to do rendering fast is to keep the number of library calls to
a

minimum, and pass as much data as possible all at once in an array.
Of

course,
that’s not the way people use SDL; they use SDL to draw a bunch of
sprites, one
at a time. So to be fast, SDL has to keep track of the bookkeeping
for

them.

The way to do this is with a multimap, mapping textures to lists of
drawing coordinates. You turn SDL_RenderCopy into an operation that adds
a pair of rects to a texture’s mapped list, and SDL_RenderPresent into
an operation that iterates over the multimap and, for each texture,
builds two arrays of vertices (one for screen coordinates and one for
texture coordinates) as buffers and passes them to the renderer all at
once.
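
A minimal C sketch of the bookkeeping described above, not Mason's Delphi code and not SDL's internals. The names `Batch`, `batch_copy`, and `batch_flush` are hypothetical, integer ids stand in for `SDL_Texture *`, and a linear search stands in for the hash table that uthash would provide.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for SDL_Rect and a queued source/destination pair. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { Rect src, dst; } RectPair;

typedef struct {
    int texture;        /* texture id used as the key */
    RectPair *pairs;    /* growable list of queued copies */
    size_t count, cap;
} TextureBucket;

typedef struct {
    TextureBucket *buckets;
    size_t count, cap;
} Batch;

/* Find the bucket for a texture, creating it on first use.
   Linear search keeps the sketch short; a hash table would replace it. */
static TextureBucket *batch_bucket(Batch *b, int texture)
{
    size_t i;
    for (i = 0; i < b->count; ++i)
        if (b->buckets[i].texture == texture)
            return &b->buckets[i];
    if (b->count == b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 8;
        TextureBucket *nb = realloc(b->buckets, ncap * sizeof *nb);
        if (!nb) return NULL;
        b->buckets = nb;
        b->cap = ncap;
    }
    b->buckets[b->count].texture = texture;
    b->buckets[b->count].pairs = NULL;
    b->buckets[b->count].count = 0;
    b->buckets[b->count].cap = 0;
    return &b->buckets[b->count++];
}

/* The role SDL_RenderCopy would play: record the rect pair, draw nothing. */
int batch_copy(Batch *b, int texture, Rect src, Rect dst)
{
    TextureBucket *t = batch_bucket(b, texture);
    if (!t) return -1;
    if (t->count == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 16;
        RectPair *np = realloc(t->pairs, ncap * sizeof *np);
        if (!np) return -1;
        t->pairs = np;
        t->cap = ncap;
    }
    t->pairs[t->count].src = src;
    t->pairs[t->count].dst = dst;
    t->count++;
    return 0;
}

typedef void (*DrawFn)(int texture, const RectPair *pairs, size_t n, void *user);

/* The role SDL_RenderPresent would play: one draw callback per texture.
   Returns the number of batches; draw may be NULL to only count them. */
size_t batch_flush(Batch *b, DrawFn draw, void *user)
{
    size_t i, calls = 0;
    for (i = 0; i < b->count; ++i) {
        TextureBucket *t = &b->buckets[i];
        if (t->count) {
            if (draw) draw(t->texture, t->pairs, t->count, user);
            calls++;
        }
        free(t->pairs);
    }
    free(b->buckets);
    b->buckets = NULL;
    b->count = b->cap = 0;
    return calls;
}
```

Note that this version visits textures in first-seen order, so it already shows the reordering problem discussed in the rest of the thread: a quad queued late can be drawn before a quad queued early if its texture was seen first.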

I’ve got a Delphi implementation that sped up my rendering
significantly, about 3x faster than stock SDL rendering. With a multimap
in C, I could port this concept to the SDL internals.

The one tricky thing here, the concept that my renderer has that SDL
doesn’t, is Z-order. If you’re no longer deterministically drawing in
the order in which draw calls are received, but instead grouping them by
texture, which are in turn sorted by hash order (essentially random),
you need a Z-order parameter to make sure the right things draw on top
of the right things, and what you end up with is an array of multimaps.

I know it probably sounds very complicated, but it’s only a few hundred
lines of code (plus the implementations of the hash and the dynamic
array, because C doesn’t have them built in) and it makes rendering much
faster.

Mason


From: Ryan C. Gordon <icculus at icculus.org>
To: SDL Development List <sdl at lists.libsdl.org>

Sent: Monday, April 15, 2013 6:20 PM
Subject: Re: [SDL] External dependencies in the renderer?

On 4/15/13 2:46 PM, Mason Wheeler wrote:

Does anyone (particularly Sam and Ryan) have any objections to pulling
an external library into SDL? Because I have an idea that could
significantly improve the performance of SDL’s 3d-accelerated rendering,
but it would require a multimap. Neither SDL nor the C standard library
has a multimap implementation, but I could build one with uthash and
utarray http://troydhanson.github.io/uthash/, which are both fairly
small and BSD-licensed.

I’d rather we have a simple hashtable implementation in SDL.

What’s the plan?

--ryan.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

I see a couple options here:

  1. Add another function for rendering the same texture multiple times:

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int nTimes,
                        const SDL_Rect *);

Just my 2 cents, but this would be very lovely to have in any case.

--
driedfruit

OK, since it’s apparently not clear from my original proposal, I wasn’t
talking about sending Z coordinates to OpenGL or Direct3D in any way.
I was talking about using them on the SDL side. You’d end up with a
certain number of layers, (most 2D games draw 4 or 5 distinct layers
IME,) and each layer would have its own Z number.

Each Z layer would have its own texture-to-coordinates multimap.
When it’s time to render everything, it looks like this (pseudocode):

for each multimap in layers:
  for each texture in multimap:
    CreateCoordArrays(multimap[texture])
    SelectTexture(texture)
    RenderArrays
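
The pseudocode above could be sketched in C roughly as follows. This is an illustration under stated assumptions, not the Delphi implementation: `Scene`, `scene_add`, `scene_render`, and the fixed caps are hypothetical, texture ids are plain integers, and the render step only records which batches it would draw.

```c
#include <stddef.h>

#define MAX_LAYERS   8    /* assumed cap; the post mentions 4 or 5 layers */
#define MAX_TEXTURES 16   /* assumed cap on textures per layer */
#define MAX_QUADS    32   /* assumed cap on quads per texture */

typedef struct { int x, y, w, h; } Quad;

typedef struct {
    int texture;
    Quad quads[MAX_QUADS];
    size_t count;
} Bucket;

typedef struct {
    Bucket buckets[MAX_TEXTURES];   /* one bucket per texture in this layer */
    size_t count;
} Layer;

typedef struct { Layer layers[MAX_LAYERS]; } Scene;   /* indexed by Z */

typedef struct { int z, texture; size_t quads; } DrawCall;

/* Queue a quad under (z, texture). Returns 0, or -1 if a cap is hit. */
int scene_add(Scene *s, int z, int texture, Quad q)
{
    Layer *layer;
    Bucket *b = NULL;
    size_t i;
    if (z < 0 || z >= MAX_LAYERS) return -1;
    layer = &s->layers[z];
    for (i = 0; i < layer->count; ++i) {
        if (layer->buckets[i].texture == texture) {
            b = &layer->buckets[i];
            break;
        }
    }
    if (!b) {
        if (layer->count == MAX_TEXTURES) return -1;
        b = &layer->buckets[layer->count++];
        b->texture = texture;
        b->count = 0;
    }
    if (b->count == MAX_QUADS) return -1;
    b->quads[b->count++] = q;
    return 0;
}

/* Walk layers from lowest Z to highest and record one batch per texture.
   Returns the number of batches, i.e. the draw calls a renderer would issue. */
size_t scene_render(Scene *s, DrawCall *out, size_t max)
{
    size_t n = 0;
    int z;
    for (z = 0; z < MAX_LAYERS; ++z) {
        Layer *layer = &s->layers[z];
        size_t i;
        for (i = 0; i < layer->count; ++i) {
            if (n < max) {
                out[n].z = z;
                out[n].texture = layer->buckets[i].texture;
                out[n].quads = layer->buckets[i].count;
            }
            n++;
        }
        layer->count = 0;   /* the next frame starts empty */
    }
    return n;
}
```

Inside a single layer the order between textures is whatever the container yields (first-seen here, hash order in the proposal), so overlap between two textures on the same Z value is not controlled by the caller.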

It’s really that simple, in concept. Everything draws on top of what
it’s supposed to draw on top of. There’s no need to send Z ordering
to the GPU. There’s no atrociously slow one-API-render-per-call.
I’ve tested it. It works, and it’s about 3x faster than the current
system on large, complicated scenes.

There are only two real downsides: 1) it requires a multimap to work
properly, which we need a library for because libc provides neither a
multimap implementation nor the fundamental primitives needed to
build one (a map and a dynamic array).
And 2) SDL_RenderCopy does not currently have a Z parameter on
it, which is needed to make layering work correctly.

Mason

From: Sik the hedgehog <sik.the.hedgehog at gmail.com>
To: SDL Development List
Sent: Tuesday, April 16, 2013 4:37 PM
Subject: Re: [SDL] External dependencies in the renderer?

The problem is that I think the idea is to use a single batch for
everything… Again, I’m not sure at all that this kind of Z ordering
is reliable in that case. The problem is that the safest way is
sending one thing at a time, i.e. one draw call per SDL function,
which is the very thing we’re trying to avoid…

Also yeah, the Z range is why I said we could run out of them. On PCs
we have a 24-bit depth buffer, OK (though somebody could still attempt
to set 16-bit, and I guess on 2D this could make sense), but on mobile
I wonder how the Z range is handled (especially on deferred renderers
as opposed to standard rasterizer ones).

And yes, OpenGL numbers textures from 1 onwards (this is true for
all objects, really), but remember you can create gaps by deleting
textures, and OpenGL will attempt to fill those if I recall correctly
(I’m not sure about the details).

2013/4/16, John :

That sounds like ordering by Z to me, no?

The GLES device vendors advise against implementing your own depth
sorting because the GPU depth test does it much faster, more
efficiently, can correctly handle overlaps, and runs in parallel with
the CPU.

Also, z is floating point in transformed view coordinates which means
there may not be many duplicate z values to group by.

Have you measured the cost of switching the active texture unit? The
number of switches that will be saved by this optimization is easy to
calculate, it’s roughly the number of primitives minus the number of
textures.
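
To make that estimate concrete, here is a small C sketch (function names are hypothetical) that counts texture binds for a draw sequence issued in order, skipping a bind when the texture is already active, and compares it with the count after grouping by texture.

```c
#include <stddef.h>

/* Binds needed when draws are issued in the given order and a bind is
   skipped whenever the wanted texture is already the active one. */
size_t binds_in_order(const int *textures, size_t n)
{
    size_t i, binds = 0;
    for (i = 0; i < n; ++i)
        if (i == 0 || textures[i] != textures[i - 1])
            binds++;
    return binds;
}

/* Binds needed once draws are grouped by texture: one per distinct texture. */
size_t binds_grouped(const int *textures, size_t n)
{
    size_t i, j, distinct = 0;
    for (i = 0; i < n; ++i) {
        for (j = 0; j < i; ++j)
            if (textures[j] == textures[i])
                break;
        if (j == i)        /* no earlier draw used this texture */
            distinct++;
    }
    return distinct;
}
```

For the fully interleaved sequence 1, 2, 1, 2, 1, 2 the in-order count is 6 and the grouped count is 2, so 4 binds are saved, which matches primitives minus textures. For 1, 1, 1, 2, 2, 2 both counts are 2 and nothing is saved, so the figure is an upper bound that is only reached when no two consecutive draws share a texture.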

On 04/16/2013 12:41 PM, Mason Wheeler wrote:

It’s not “ordering by Z and texture” but “grouping by Z and texture”.
Every render with a Z of 1 will get sent before every render with a Z
of 2, and so on. That’s why I said you end up with an array of
multimaps.

Mason


Not just SDL_RenderCopy, nothing in the rendering API has a Z parameter.

It looks like basically what you’re doing is just telling the API that
the order doesn’t matter as long as specific groups are kept in order,
if I’m understanding correctly (which isn’t how the SDL API works).
That’s a valid optimization, but not one that would work with the
current API, if that’s the case. Is that correct?

Can you guys please trim your replies? The quotes are getting way too long.

Date: Tue, 16 Apr 2013 02:23:02 -0700
From: Forest Hale
To: sdl at lists.libsdl.org
Subject: Re: [SDL] External dependencies in the renderer?
Message-ID: <516D1876.8030506 at ghdigital.com>
Content-Type: text/plain; charset=ISO-8859-1

If you buffer draws, you get higher performance.

If you additionally sort them, you get even higher performance but break the
most basic assumption of a 2D graphics API - that things occur in the order
specified.

I see no reason to use uthash here, I do see great reason to buffer things.
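
A minimal C sketch of buffering without sorting, assuming a hypothetical `RunBuffer` type: quads are accumulated while the texture stays the same, and the run is submitted as one batch as soon as the texture changes. Submission order is never altered, so no hash table is involved.

```c
#include <stddef.h>

#define RUN_CAP 256   /* assumed buffer size for the sketch */

typedef struct { int x, y, w, h; } Quad;

typedef struct {
    int texture;          /* texture of the run being accumulated */
    Quad quads[RUN_CAP];
    size_t count;         /* quads buffered in the current run */
    size_t draw_calls;    /* batches submitted so far */
} RunBuffer;

/* Submit the buffered run as one batch. A real renderer would upload the
   vertices and issue its draw call here; the sketch only counts it. */
void run_flush(RunBuffer *rb)
{
    if (rb->count == 0)
        return;
    rb->draw_calls++;
    rb->count = 0;
}

/* Queue a quad. The run is flushed first if the texture differs from the
   buffered one or the buffer is full, so draw order is preserved. */
void run_add(RunBuffer *rb, int texture, Quad q)
{
    if (rb->count > 0 && (rb->texture != texture || rb->count == RUN_CAP))
        run_flush(rb);
    rb->texture = texture;
    rb->quads[rb->count++] = q;
}
```

With textures submitted as 1, 1, 1, 2, 2, 1 this issues three batches instead of six draw calls. A sorting scheme would get it down to two, at the cost of moving the last quad ahead of the texture-2 quads.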

Why is uthash still the subject of this discussion? We’re not going to
reach a conclusion on the broad topic of outside dependencies, it’s better
to focus on the specific problem at hand.

Because Mason wants to SORT things, instead of just buffering them.

Date: Tue, 16 Apr 2013 07:32:38 -0700
From: “Nathaniel J Fries”
To: sdl at lists.libsdl.org
Subject: Re: [SDL] External dependencies in the renderer?
Message-ID: <1366122758.m2f.36641 at forums.libsdl.org>
Content-Type: text/plain; charset=“iso-8859-1”

Optimization is a task of the programmer, not the library.
That said, SDL’s interface is too high-level to enable the programmer to
optimize render performance.

I see a couple options here:

  1. Add another function for rendering the same texture multiple times:

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int nTimes, const
SDL_Rect *);

I think that’s a good idea.

  2. Add a “sprite batch” API:

Code:

typedef struct SDL_SpriteBatch SDL_SpriteBatch;
SDL_SpriteBatch * SDL_CreateSpriteBatch(SDL_Renderer *);
void SDL_DestroySpriteBatch(SDL_SpriteBatch *);
int SDL_BatchCopy(SDL_SpriteBatch *, SDL_Texture *, const SDL_Rect *);
int SDL_BatchFlush(SDL_SpriteBatch *);

Eh, I think this should really go in an external library.

Date: Tue, 16 Apr 2013 09:41:29 -0700 (PDT)
From: Mason Wheeler
To: Sik the hedgehog <sik.the.hedgehog at gmail.com>, SDL Development
List
Subject: Re: [SDL] External dependencies in the renderer?
Message-ID:
<1366130489.95579.YahooMailNeo at web122502.mail.ne1.yahoo.com>
Content-Type: text/plain; charset=“iso-8859-1”

It’s not “ordering by Z and texture” but “grouping by Z and texture”. Every
render with a Z of 1 will get sent before every render with a Z of 2, and so
on. That’s why I said you end up with an array of multimaps.

Mason

A multimap is a high-level structure, and this would be written in C,
so you’re thinking about this incorrectly. I don’t support your
particular perspective on this (I prefer Forest’s "buffer it"
approach, because it preserves API behavior), but what you’re talking
about can be done trivially with any data structure, and if you use
sorting data structures (such as balanced binary search trees) then
you can do it pretty quickly.

However, regardless of that, there’s something that’s still true: you’re
talking about ordering by Z, as well as ordering by texture. If you
don’t understand why we keep repeating this then you need to go back
to whatever dictionary you’re using and try to find a way to reconcile
"Every render with a Z of 1 will get sent before every render with a Z
of 2" and “Sorting by Z”. If you don’t understand how those two things
are the same, then you should drop this line of enquiry until you do
understand it.

Date: Tue, 16 Apr 2013 17:48:58 -0700 (PDT)
From: Mason Wheeler
To: SDL Development List
Subject: Re: [SDL] External dependencies in the renderer?
Message-ID:
<1366159738.92011.YahooMailNeo at web122502.mail.ne1.yahoo.com>
Content-Type: text/plain; charset=“iso-8859-1”

Each Z layer would have its own texture-to-coordinates multimap.
When it’s time to render everything, it looks like this (pseudocode):

for each multimap in layers:
  for each texture in multimap:
    CreateCoordArrays(multimap[texture])
    SelectTexture(texture)
    RenderArrays

This is, in fact, ordering by Z and texture, which you said was not
being done. You need to recheck your terminology.

It’s really that simple, in concept. Everything draws on top of what
it’s supposed to draw on top of. There’s no need to send Z ordering
to the GPU. There’s no atrociously slow one-API-render-per-call.
I’ve tested it. It works, and it’s about 3x faster than the current
system on large, complicated scenes.

The current SDL2 api is a 2d api. As a result, call 2 draws on top of
call 1, meaning that render calls do actually matter. So this isn’t
how you do things? That’s fine, but don’t try to force your system
down everyone else’s throats. I think that the api should be improved
to reduce the number of calls, but preserving draw order is required.

In case you’ve forgotten, the api is currently “locked”, and since
this has the potential to break api behavior for existing games, the
change is not acceptable. Buffering (such as that provided by the
queue method that I suggested earlier) is fine as long as it preserves
draw order, but what you’re suggesting is not reliably acceptable.

There are only two real downsides: 1) it requires a multimap to work
properly, which we need a library for because libc provides neither a
multimap implementation nor the fundamental primitives needed to
build one (a map and a dynamic array).

Actually, all that you need is a searchable data structure. This
covers everything from arrays, to linked lists, to trees, and doesn’t
even have to be sortable. Even then, C does provide some array sorting
functions (e.g. qsort) which can be used to implement this. Thus, C
provides a route to a concept demo.
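
As a rough illustration of the qsort route, here is a C sketch with a hypothetical `DrawCmd` record. It sorts queued commands by Z, then texture, then submission index, and counts the resulting (Z, texture) groups, each of which would become one batch.

```c
#include <stdlib.h>

typedef struct {
    int z;          /* layer number */
    int texture;    /* texture id */
    unsigned seq;   /* submission index, used as the final tie-break */
} DrawCmd;

/* Order by z, then texture, then submission index. qsort is not guaranteed
   to be stable, so comparing seq explicitly keeps ties deterministic. */
static int cmd_compare(const void *a, const void *b)
{
    const DrawCmd *x = a, *y = b;
    if (x->z != y->z)
        return (x->z > y->z) - (x->z < y->z);
    if (x->texture != y->texture)
        return (x->texture > y->texture) - (x->texture < y->texture);
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Sort the commands in place and return how many (z, texture) groups
   result; each group could be submitted as a single batch. */
size_t group_commands(DrawCmd *cmds, size_t n)
{
    size_t i, groups = 0;
    qsort(cmds, n, sizeof *cmds, cmd_compare);
    for (i = 0; i < n; ++i)
        if (i == 0 || cmds[i].z != cmds[i - 1].z ||
            cmds[i].texture != cmds[i - 1].texture)
            groups++;
    return groups;
}
```

The sort costs O(n log n) per frame on a plain array, with no map or tree required. Whether that beats a hash-based grouping is something only measurement would show.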

If you want decent speeds then you want a sorted tree, so that rules
out C’s standard library, but at the end of the day a customizable
tree (where you can have multiple customizations) is all that’s
needed. After all, a map is just the association of one value with a
data slot, and any searchable data structure does that fine, including
trees. A dynamic array is a general enough term that it can cover any
extensible data structure, and since tree insertions are quicker than
copying a large block of memory when you need to perform an extension,
you might as well use a tree there too.

So, no need for a “map” nor for a “dynamic array”, all that’s needed
for YOUR preference is a balanced tree.

And 2) SDL_RenderCopy does not currently have a Z parameter on
it, which is needed to make layering work correctly.

Adding Z would be a backwards-compatibility break, which is now forbidden.

Just in case it got lost in the conversation, here’s my suggestion
again, which unlike Mason’s should presumably maintain compatibility
with the current version:

  1. Search through the queue, from most recent node to oldest node,
    looking for collisions between the current call’s bounding box and the
    bounding box of the queue nodes.
  2. If a collision is found, or the oldest node is reached without a
    collision, add the current command to the node that was most recently
    encountered, which also used the same texture as the command, and
    expand that node’s bounding box.
  3. If no node has been found that uses the same texture, add the
    command in a new node.
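
One possible reading of the three steps above, sketched in C. The types and names (`Queue`, `Node`, `queue_add`) are hypothetical, texture ids are integers, and only axis-aligned rectangles are handled; points and lines would need their own bounding boxes.

```c
#include <stddef.h>

#define MAX_NODES 64   /* assumed cap for the sketch */

typedef struct { int x, y, w, h; } Rect;

typedef struct {
    int texture;
    Rect bounds;       /* union of every command merged into this node */
    size_t commands;   /* how many commands the node holds */
} Node;

typedef struct {
    Node nodes[MAX_NODES];   /* oldest first */
    size_t count;
} Queue;

static int rects_overlap(const Rect *a, const Rect *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

static Rect rect_union(Rect a, Rect b)
{
    Rect r;
    int right  = (a.x + a.w > b.x + b.w) ? a.x + a.w : b.x + b.w;
    int bottom = (a.y + a.h > b.y + b.h) ? a.y + a.h : b.y + b.h;
    r.x = a.x < b.x ? a.x : b.x;
    r.y = a.y < b.y ? a.y : b.y;
    r.w = right - r.x;
    r.h = bottom - r.y;
    return r;
}

/* Queue one command. Returns the index of the node that received it,
   or -1 if the queue is full and should be flushed first. */
int queue_add(Queue *q, int texture, Rect box)
{
    int candidate = -1;
    size_t i = q->count;
    while (i-- > 0) {                        /* step 1: newest to oldest */
        if (q->nodes[i].texture == texture)
            candidate = (int)i;              /* latest same-texture node seen */
        if (rects_overlap(&q->nodes[i].bounds, &box))
            break;                           /* cannot move past an overlap */
    }
    if (candidate >= 0) {                    /* step 2: merge and grow bounds */
        Node *n = &q->nodes[candidate];
        n->bounds = rect_union(n->bounds, box);
        n->commands++;
        return candidate;
    }
    if (q->count == MAX_NODES)
        return -1;
    q->nodes[q->count].texture = texture;    /* step 3: start a new node */
    q->nodes[q->count].bounds = box;
    q->nodes[q->count].commands = 1;
    return (int)q->count++;
}
```

A command is only moved earlier past nodes whose bounding boxes it does not touch, so the visible result matches in-order drawing. The test is conservative: a node's union box can report an overlap where no individual quad actually overlaps, which costs a batch but never correctness.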

Point, line, and rectangle render commands would go into the same
queue. The main issue would be where the queue should be flushed, I
figure that belongs in SDL_RenderPresent. That one is required on all
platforms, right? It looks like (with the possible exception of the
software renderer) all of the platforms need that to reliably render.

2013/4/16, Jared Maddox :

Point, line, and rectangle render commands would go into the same
queue. The main issue would be where the queue should be flushed, I
figure that belongs in SDL_RenderPresent. That one is required on all
platforms, right? It looks like (with the possible exception of the
software renderer) all of the platforms need that to reliably render.

I believe the software renderer still needs SDL_RenderPresent
(otherwise how does SDL know that it can safely draw the surface on
the window?).

As for flushing, there’d be two points where this should happen for
correct behavior:

  1. In SDL_RenderPresent, right before it does its job.
  2. When the buffer becomes so big that going any bigger would nullify
    the benefits.

I don’t actually think point 2 is valid. The bigger and more complicated the
scene, the more this scheme benefits it. If you’re only drawing 20 sprites
per frame, it doesn’t matter how inefficient your drawing techniques are; on
modern hardware you’ll get good performance anyway. But if you’re drawing
20,000 or 200,000, that’s when you’ll really see the benefit of something
like this.

The only real “hard limit” you’d see is when the whole thing gets big
enough that you start to run into system-level limitations. And at that
point, you’ve got bigger problems to worry about.

The second point at which you would want to flush the buffer is when the
rendering state changes. To keep complexity down, the buffer operates
under the assumption that everything draws in the same way. If you change
the transparency settings, for example, or (even more obviously) change the
active rendering target, you need to execute all existing buffered draw
commands first and then start over with a clean slate.
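
A rough sketch of that flush-on-state-change rule, in C. The RenderState fields and the batch_* functions are hypothetical stand-ins, not SDL API, and the flush only counts executions instead of issuing real draw calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for whatever state the renderer tracks. */
typedef struct {
    int blend_mode;
    int render_target;
} RenderState;

typedef struct {
    RenderState state;  /* state shared by every buffered command */
    int pending;        /* number of buffered draw commands */
    int flushes;        /* how many times the buffer has been executed */
} Batch;

static void batch_flush(Batch *b)
{
    if (b->pending > 0) {
        /* A real renderer would issue the buffered draw calls here. */
        b->pending = 0;
        b->flushes++;
    }
}

static void batch_draw(Batch *b)
{
    b->pending++;  /* buffer the command instead of drawing immediately */
}

static void batch_set_state(Batch *b, RenderState next)
{
    assert(b != NULL);

    /* A redundant state call changes nothing, so the buffer can keep growing. */
    if (b->state.blend_mode == next.blend_mode &&
        b->state.render_target == next.render_target)
        return;

    /* Everything buffered so far assumed the old state: execute it first. */
    batch_flush(b);
    b->state = next;
}
```

Comparing against the cached state before flushing means repeated calls with the same settings cost nothing, which is the same "skip state calls that are the same" idea mentioned earlier in the thread.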

Mason

________________________________
From: Sik the hedgehog <sik.the.hedgehog at gmail.com>
To: SDL Development List
Sent: Tuesday, April 16, 2013 8:02 PM
Subject: Re: [SDL] External dependencies in the renderer?

2013/4/16, Jared Maddox :

Point, line, and rectangle render commands would go into the same
queue. The main issue would be where the queue should be flushed, I
figure that belongs in SDL_RenderPresent. That one is required on all
platforms, right? It looks like (with the possible exception of the
software renderer) all of the platforms need that to reliably render.

I believe the software renderer still needs SDL_RenderPresent
(otherwise how does SDL know that it can safely draw the surface on
the window?).

As for flushing, there’d be two points where this should happen for
correct behavior:

  1. In SDL_RenderPresent, right before it does its job.
  2. When the buffer becomes so big that going any bigger would nullify
    the benefits.

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Point 2 does matter on real hardware, sadly. If you try to send too
big a batch, it’ll end up being slower because you’ll overwhelm the GPU
trying to transfer all that data into its own memory (in particular,
memory latency will become a massive issue here). To give an idea, I
recall the general suggestion is to not use indices larger than 16 bits
(i.e. that makes for 64K entries max in a buffer object).

(there’s a debate about whether anybody will ever reach that point -
but I guess that 10,000~15,000 entries probably make a good place to
break up, if you consider most of them will be quads and thereby eat
up four vertices each, although I guess you can optimize this to reuse
primitives and use transformations to work around it instead, but even
then that just doubles the acceptable limit)
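
To make the arithmetic behind those numbers concrete, here is a small C sketch. It assumes each quad owns four vertices and is drawn as two indexed triangles (six indices); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { VERTS_PER_QUAD = 4, INDICES_PER_QUAD = 6 };

/* 16-bit indices can address 65536 vertices, i.e. 16384 four-vertex quads. */
static unsigned max_quads_for_16bit_indices(void)
{
    unsigned addressable = (unsigned)UINT16_MAX + 1u;
    return addressable / VERTS_PER_QUAD;
}

/* Fills an index buffer for `quads` quads, two triangles per quad. */
static void fill_quad_indices(uint16_t *out, unsigned quads)
{
    unsigned q;

    assert(quads <= max_quads_for_16bit_indices());

    for (q = 0; q < quads; ++q) {
        uint16_t base = (uint16_t)(q * VERTS_PER_QUAD);
        uint16_t *i = out + q * INDICES_PER_QUAD;

        i[0] = base;
        i[1] = (uint16_t)(base + 1);
        i[2] = (uint16_t)(base + 2);
        i[3] = (uint16_t)(base + 2);
        i[4] = (uint16_t)(base + 3);
        i[5] = base;
    }
}
```

Under those assumptions the ceiling is 16,384 quads per buffer, so a break-up point somewhere in the 10,000 to 15,000 range stays below it with some headroom.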

I don’t think translucency parameters affect the state though. You
could just feed those in the buffer itself and let the shader handle
it (if you’re doing this method you definitely are going the shader
route anyway). In this sense textures really should be the only state
change, unless I’m missing something. (oh, and yes, changing the
shader is bad too as it can’t be parallelized at all)


This optimization is a good thing, but the benefit does not scale directly
with the number of sprites drawn. When you get into the thousands of
sprites, flushing the buffer every few thousand sprites is of negligible
cost. This approach helps avoid the incremental cost of several OpenGL
calls and a blocking “flush” for every single sprite (the way SDL currently
does it).

Sik’s point 2 is relevant and applies even more clearly when your buffer is
a simple fixed-size array. At some point, you have to decide how much
memory you want to allocate for this buffer, and it must be flushed before
it overflows. As far as that goes, I’d rather not be allocating memory as
I assume the map does. Also, Mason is right that you have to flush before
every state change that could change the rendering.
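
For the fixed-size array case, a minimal C sketch could look like the following. BATCH_CAPACITY is an arbitrary assumed value, the buffer_* names are hypothetical, and the flush only counts executions where a real renderer would submit the quads.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAPACITY 4096  /* assumed size; a real value would be tuned */

typedef struct { float x, y, w, h; } Quad;

typedef struct {
    Quad quads[BATCH_CAPACITY];  /* allocated once, never resized */
    int  count;
    int  flushes;
} QuadBuffer;

static void buffer_flush(QuadBuffer *b)
{
    if (b->count > 0) {
        /* A real renderer would submit b->quads[0 .. count - 1] here. */
        b->count = 0;
        b->flushes++;
    }
}

static void buffer_push(QuadBuffer *b, Quad q)
{
    assert(b != NULL);

    /* Flush before the array would overflow, then keep batching. */
    if (b->count == BATCH_CAPACITY)
        buffer_flush(b);

    b->quads[b->count++] = q;
}

/* Stand-in for the flush that would happen at the start of a present call. */
static void buffer_present(QuadBuffer *b)
{
    buffer_flush(b);
}
```

With this layout no memory is allocated per frame, and the cost of the occasional extra flush is spread over thousands of sprites.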

As was said before, we need to guarantee rendering order because the OpenGL
depth test is not enough to make alpha blending work in the right order. Z
layers are an okay concept, but not terribly widespread in practice. It
would be strange to make the SDL API embrace such a high-level concept that
doesn’t apply to most applications.

Jonny D

On Mon, Apr 15, 2013 at 8:04 PM, Driedfruit wrote:

I see a couple options here:

  1. Add another function for rendering the same texture multiple times:

Code:

int SDL_RenderCopyMulti(SDL_Renderer *, SDL_Texture *, int nTimes,
                        const SDL_Rect *);

Just my 2 cents, but this would be very lovely to have in any case.


driedfruit

