To be honest, guys, I’m expecting something much simpler. Take the
following functions as an example:
Code:
SDL_RenderCopies(SDL_Renderer* renderer,
                 SDL_Texture* texture,
                 const SDL_Rect** srcrect,
                 const SDL_Rect** dstrect)
Code:
SDL_RenderCopiesEx(SDL_Renderer* renderer,
                   SDL_Texture* texture,
                   const SDL_Rect** srcrect,
                   const SDL_Rect** dstrect,
                   const double angle,
                   const SDL_Point* center,
                   const SDL_RendererFlip flip)
The only difference between these functions and their counterparts,
SDL_RenderCopy and SDL_RenderCopyEx, is that they take arrays of source and
destination rectangles. The rest can be taken care of by the programmer.
That way the implementation of these functions should be straightforward and
should take no more than 10 minutes for both OpenGL and Direct3D. I can do
it myself, yes, but I would like to stay up to date with SDL2 itself and
I do not like modifying external libraries I’m relying on.
D3D_RenderCopy uses Direct3D’s DrawPrimitiveUP, which is almost ready for
batch drawing: just append the vertices of the remaining rectangles. The
desktop OpenGL renderer uses old immediate mode (draw arrays would be much
better), but it’s easy to implement there as well. All of the renderers
available in SDL2 are pretty much ready to batch draws of the same texture;
they only need minor changes to SDL_RenderCopy and SDL_RenderCopyEx to make
the new functions out of them.
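A rough sketch of the "just append more vertices" idea, not actual SDL2
renderer code. Rect, Vertex and build_batch are hypothetical stand-ins so
the snippet compiles on its own, and it assumes contiguous arrays of
rectangles rather than the arrays of pointers in the signatures above. It
emits two triangles per rectangle so that one draw call could submit the
whole batch:

```c
#include <stddef.h>

/* Hypothetical stand-ins; a real renderer would use SDL_Rect and its own
 * vertex format. */
typedef struct { int x, y, w, h; } Rect;
typedef struct { float x, y, u, v; } Vertex;

/* Appends two triangles (6 vertices) per rectangle to `out`, which must
 * have room for 6 * count vertices. Texture coordinates are normalized by
 * the texture size. Returns the number of vertices written. */
size_t build_batch(const Rect *src, const Rect *dst, size_t count,
                   float tex_w, float tex_h, Vertex *out)
{
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        float x0 = (float)dst[i].x;
        float y0 = (float)dst[i].y;
        float x1 = x0 + (float)dst[i].w;
        float y1 = y0 + (float)dst[i].h;
        float u0 = (float)src[i].x / tex_w;
        float v0 = (float)src[i].y / tex_h;
        float u1 = (float)(src[i].x + src[i].w) / tex_w;
        float v1 = (float)(src[i].y + src[i].h) / tex_h;
        const Vertex quad[6] = {
            { x0, y0, u0, v0 }, { x1, y0, u1, v0 }, { x0, y1, u0, v1 },
            { x1, y0, u1, v0 }, { x1, y1, u1, v1 }, { x0, y1, u0, v1 }
        };
        for (int k = 0; k < 6; ++k) {
            out[n++] = quad[k];
        }
    }
    return n;
}
```

The resulting buffer is the kind of thing a backend could hand to a single
triangle-list draw instead of issuing one draw per rectangle.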
I agree that it should be simpler than XNA and I personally like this
line of thinking.
I think the goal of the batching API should be stated explicitly so
everybody is on the same page. In my opinion, the goal should be to
allow performance optimizations and that’s it. (XNA conflates multiple
things…performance plus read-my-mind-do-everything-I-want which
ultimately makes things more complicated.) Convenience wrappers can
always be written on the outside, but you can’t wrap around
API/performance bottlenecks and expect them to go faster.
Additionally, I suspect that a good SIMD backend could make the
software renderer go a lot faster too. (Watch Handmade Hero for a
great demonstration of how he made a chunky software renderer go to
60fps at 1080p using SSE2.)
I would suggest constraining the API as much as possible for speed. To
make SIMD or any other vectorization go fast, you generally want
predictable data layouts and no branches.
So for example, with
SDL_RenderCopiesEx(SDL_Renderer* renderer,
                   SDL_Texture* texture,
                   const SDL_Rect** srcrect,
                   const SDL_Rect** dstrect,
                   const double angle,
                   const SDL_Point* center,
                   const SDL_RendererFlip flip)
I might suggest that the individual array elements for src/dst rect must
always have values and can’t be NULL; that way the code that is trying
to shuffle things into registers doesn’t need to check for NULL all
the time.
Along that line of thought, we may want explicit array sizes for
srcrect/dstrect as additional parameters. An algorithm may want to
work out up front how it is going to deal with the leftover cases where
the number of objects doesn’t divide evenly into the wide registers.
This has the added convenience that a user who already has a large
array of rects but only needs a subset can pass it directly, without
making a new copy.
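To illustrate the explicit-count idea, here is a minimal sketch in plain C
(no real SIMD intrinsics). Rect, LANES and translate_rects are names made up
for the example, and a lane width of 4 is only an assumption. The count is
split up front into a body that fills whole groups and a scalar tail for the
leftovers, and no element is ever checked for NULL:

```c
#include <stddef.h>

/* Hypothetical stand-in for SDL_Rect. */
typedef struct { int x, y, w, h; } Rect;

/* Assumed number of rects processed per "wide" step. */
#define LANES 4

/* Copies `count` rects from `in` to `out`, offset by (dx, dy).
 * Every element is required to be valid, so there are no NULL checks. */
void translate_rects(const Rect *in, size_t count, int dx, int dy, Rect *out)
{
    /* Decide up front how many rects fill whole groups of LANES. */
    size_t body = count - (count % LANES);
    size_t i = 0;

    /* Main loop: fixed-size groups, the part a SIMD backend would widen. */
    for (; i < body; i += LANES) {
        for (size_t k = 0; k < LANES; ++k) {
            out[i + k] = in[i + k];
            out[i + k].x += dx;
            out[i + k].y += dy;
        }
    }

    /* Tail loop: the leftover rects that do not fill a whole group. */
    for (; i < count; ++i) {
        out[i] = in[i];
        out[i].x += dx;
        out[i].y += dy;
    }
}
```

Because the count is explicit, a caller holding a larger array can pass a
subset directly, for example translate_rects(all_rects + 1, 6, dx, dy, out),
without copying anything first.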
The srcrect or dstrect arrays themselves being NULL/empty could probably
be handled efficiently by dispatching to specialized versions early,
before entering the inner loops.
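One way this early split might look, as a sketch only. Rect and resolve_src
are hypothetical names, and treating a NULL source array as "use the whole
texture" is an assumption borrowed from how SDL_RenderCopy treats a NULL
srcrect. The NULL test happens once per call, so neither inner loop contains
a branch on it:

```c
#include <stddef.h>

/* Hypothetical stand-in for SDL_Rect. */
typedef struct { int x, y, w, h; } Rect;

/* Writes the effective source rect for each of `count` copies into
 * `resolved`. A NULL `src` array is assumed to mean "whole texture". */
void resolve_src(const Rect *src, size_t count, int tex_w, int tex_h,
                 Rect *resolved)
{
    if (src == NULL) {
        /* Specialized version: every copy uses the full texture. */
        const Rect whole = { 0, 0, tex_w, tex_h };
        for (size_t i = 0; i < count; ++i) {
            resolved[i] = whole;
        }
    } else {
        /* Specialized version: every element is assumed to be valid. */
        for (size_t i = 0; i < count; ++i) {
            resolved[i] = src[i];
        }
    }
}
```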
I’m a little ambivalent about flip. It seems like for performance, the
user should have pre-oriented the texture. On the other hand, since it
is already in the core SDL API, consistency is nice, and I don’t think
this needs to incur a noticeable cost, as it can also be separated out
into specialized versions early.
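A sketch of how flip could stay out of the per-rect path, under the
assumption that flipping is done by swapping texture coordinates. UV,
apply_flip and the FLIP_* constants are hypothetical; the constants mirror
the values of SDL_FLIP_NONE, SDL_FLIP_HORIZONTAL and SDL_FLIP_VERTICAL. The
flag is tested once per batch rather than once per rectangle:

```c
#include <stddef.h>

/* Hypothetical constants mirroring SDL_FLIP_NONE / _HORIZONTAL / _VERTICAL. */
enum { FLIP_NONE = 0, FLIP_H = 1, FLIP_V = 2 };

/* Hypothetical per-rect texture coordinates: (u0, v0) is the top-left
 * corner and (u1, v1) the bottom-right corner. */
typedef struct { float u0, v0, u1, v1; } UV;

/* Applies a batch-wide flip by swapping texture coordinates. The flag is
 * checked outside the loops, so an unflipped batch does no work at all. */
void apply_flip(UV *uv, size_t count, int flip)
{
    if (flip & FLIP_H) {
        for (size_t i = 0; i < count; ++i) {
            float t = uv[i].u0;
            uv[i].u0 = uv[i].u1;
            uv[i].u1 = t;
        }
    }
    if (flip & FLIP_V) {
        for (size_t i = 0; i < count; ++i) {
            float t = uv[i].v0;
            uv[i].v0 = uv[i].v1;
            uv[i].v1 = t;
        }
    }
}
```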
-Eric
On 8/1/15, .3lite wrote: