Feasibility/correctness of calling GL in another thread

Hi all,

In the game engine that I am working on, I am designing a rendering thread that essentially executes all OpenGL calls (including SDL_GL_SwapWindow) instead of the main thread. The problem is that I am not quite sure whether the scenario I have in mind is safe, or whether it will lead to undefined behavior.

The idea is that the main thread will:

Code:
SDL_Init(SDL_INIT_VIDEO | SDL_INIT_JOYSTICK | SDL_INIT_EVENTS | …);
SDL_GL_SetAttribute(SDL_GL_SHARE_WITH_CURRENT_CONTEXT, 1); // Enable context sharing
window = SDL_CreateWindow(…);
context_A = SDL_GL_CreateContext(window); // Create context A
context_B = SDL_GL_CreateContext(window); // Create context B
SDL_GL_MakeCurrent(window, context_A); // Make context A current
start_rendering_thread();
// Game loop begins
while(true) {
    poll_input_and_joystick_events_using_SDL();
    do_other_things();
}

In the rendering thread:

Code:
SDL_GL_MakeCurrent(window, context_B); // Make context B current
while(true) {
    execute_GL_calls();
    SDL_GL_SwapWindow(window);
}

The thing that bugs me is that I am calling SDL_GL_SwapWindow(window) in another thread, and that I am doing some SDL work in the main thread (polling events) and other SDL work in the rendering thread.
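
For concreteness, here is a minimal sketch (SDL2 assumed) of what start_rendering_thread() could look like with SDL_CreateThread; the argument struct and the loop body are illustrative, not real engine code:

Code:
#include <SDL.h>

// Sketch only: the argument struct and loop body are illustrative.
struct RenderArgs {
    SDL_Window*   window;
    SDL_GLContext context;
};

static int rendering_thread(void* data)
{
    RenderArgs* args = static_cast<RenderArgs*>(data);
    // A GL context can be current on only one thread at a time, so
    // context B is made current here, on the rendering thread itself.
    SDL_GL_MakeCurrent(args->window, args->context);
    for (;;) {
        // ... issue GL calls ...
        SDL_GL_SwapWindow(args->window);
    }
    return 0;
}

// Main thread, after creating both contexts:
// static RenderArgs args = { window, context_B };
// SDL_Thread* thread = SDL_CreateThread(rendering_thread, "render", &args);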

What are your thoughts? Will this work or not?

------------------------
Panagiotis Christopoulos Charitos
AnKi 3D Engine (http://www.anki3d.org/)

This should work, provided your GPU drivers can do context sharing without going belly up. (Known problem cases include first-gen Atoms with PowerVR IGPs, and some Core / Core2 mobile IGPs with old drivers.)

MonoGame does the exact same thing and it appears to be working fine.

That said, why do you need two OpenGL contexts?



I’m doing almost exactly the same thing as you described in my engine: polling/processing of SDL events and setting the state of the window happen in the main thread, and the rendering happens in another dedicated thread. The only difference is that I create the window in the main thread, pass that pointer into the rendering thread, and create the GL context there (I also use only one thread).

Haven’t had any problems with this setup on Mac/Linux (Windows untested, but should be fine).


Whoops, meant to say “I also only use one GL context”.


I had no problems with 2 contexts on Windows and Mac, but I got crashes on iOS. I used the 2nd GL context to upload textures in the background while the main thread was doing the rendering. In the end I disabled background uploading (and the 2nd context) on iOS; I didn’t have enough time to investigate.

Short version:
Never use shared contexts for performance-conscious code; it costs way more than the failed (more on that later) overlap of the texture uploads is worth.

Long version:
During early development of a major product in the past (Steam Big Picture Mode) that used multiple contexts for background uploading of OpenGL textures, we were told by multiple desktop GPU vendors that the drivers flatly mutex every OpenGL call when you have shared contexts. This can result in a major (~20%) fps loss even if you don’t use the other context at all, it gets worse if you do, and in particular the texture upload does NOT happen in parallel with rendering, due to that mutexing.

So my advice is: never do this. We changed the product not to do this before launch because it was completely non-performant; we had been struggling to keep up 60fps until we did, and then it easily exceeded 200fps with that one change.

The hitching of texture uploads is pretty much unavoidable in OpenGL ES (iOS, Android, etc). On desktop OpenGL you can somewhat hide it with GL_ARB_pixel_buffer_object: you glMapBuffer on the main thread and then write the pixels from another thread; when done, you glUnmapBuffer on the main thread and then issue the glTexImage2D with the pixel buffer object bound, so that it sources its pixels from that object rather than blocking on a client-memory copy. But I’m sure this isn’t free, and I have not tried it in practice. It also requires that you more or less queue your uploads for the main thread to prepare in stages, so that’s some lovely ping-pong there.
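
A rough sketch of that PBO scheme, assuming a desktop GL context with ARB_pixel_buffer_object and an extension loader (GLEW here); the worker hand-off is only indicated in comments:

Code:
#include <GL/glew.h>

GLuint g_pbo = 0;
GLuint g_tex = 0;

// Main thread: create and map the buffer, then hand the pointer to a worker.
void* begin_upload(int w, int h)
{
    glGenBuffers(1, &g_pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, g_pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, w * h * 4, NULL, GL_STREAM_DRAW);
    void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    return dst; // a worker thread memcpy()s the pixels into this pointer
}

// Main thread again, after the worker signals that it has finished writing.
void finish_upload(int w, int h)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, g_pbo);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    glGenTextures(1, &g_tex);
    glBindTexture(GL_TEXTURE_2D, g_tex);
    // With a PBO bound, the last parameter is an offset into the buffer,
    // not a client-memory pointer, so the call sources from the PBO.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, (const void*)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}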

While I too would greatly appreciate the addition of some background object upload functionality in OpenGL, or even an entire deferred command buffer system (I proposed this in a hardware-agnostic way, but it didn’t gain traction), the reality today is that OpenGL contexts and threading are completely non-viable.

I should note that Doom 3 BFG Edition seems to glMapBuffer each of 3 buffer objects (vertex, index, uniforms) at the beginning of the frame, queue jobs for all of the processing it wants to do so that threads write into those mapped buffers, and then at the end of the frame do the glUnmapBuffer and walk its own command list to issue all the real GL calls that depend on that data. This works very well, but is out of the scope of most OpenGL threading discussions.
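
For illustration, that per-frame pattern might look roughly like this (heavily simplified guesswork, not actual Doom 3 BFG code; real engines add fencing and multi-buffering, and GL_UNIFORM_BUFFER assumes a GL 3.1-class context):

Code:
// Three buffers are mapped at frame start, worker jobs write through the
// pointers while recording an engine-side command list, and everything is
// unmapped and replayed at frame end, all on this one thread.
GLuint g_vertex_buf, g_index_buf, g_uniform_buf;

void begin_frame(void** vertex_ptr, void** index_ptr, void** uniform_ptr)
{
    glBindBuffer(GL_ARRAY_BUFFER, g_vertex_buf);
    *vertex_ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, g_index_buf);
    *index_ptr = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_WRITE_ONLY);
    glBindBuffer(GL_UNIFORM_BUFFER, g_uniform_buf);
    *uniform_ptr = glMapBuffer(GL_UNIFORM_BUFFER, GL_WRITE_ONLY);
    // queue_frame_jobs(...); // hypothetical: workers fill the mapped memory
}

void end_frame()
{
    // wait_for_frame_jobs(); // hypothetical: join this frame's workers
    glBindBuffer(GL_ARRAY_BUFFER, g_vertex_buf);
    glUnmapBuffer(GL_ARRAY_BUFFER);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, g_index_buf);
    glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);
    glBindBuffer(GL_UNIFORM_BUFFER, g_uniform_buf);
    glUnmapBuffer(GL_UNIFORM_BUFFER);
    // replay_command_list(); // hypothetical: issue the real GL draw calls
}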



LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier

Very interesting stuff, thanks a lot for sharing. Is there anything more you could provide on the topic (links, possibly)?

That said, I do not intend to use it for performance-critical stuff, but rather for the loading screen: the main thread renders the loading animation while a background thread uploads the whole level along with textures. In fact I did notice that this takes slightly longer than doing everything in the main thread, but the user experience is much better with the main thread still operational, showing animations and gameplay tips.


The problem is that as long as there are shared contexts, you incur the massive performance penalty - even if all calls are from one thread.

Hence, don’t use them - even if this means you have to queue texture uploads, vertex/index buffer creation, and such for the main thread (showing the loading screen) to handle at its leisure. People won’t care about microstutter/hitching on a loading screen; it will still be pretty smooth, because you’re still running all your file I/O and other heavy operations on the other thread.

Very interesting stuff, thanks a lot for sharing. Is there anything more you could provide on the topic (links possibly) ?

That said, I do not intend to use it for performance critical stuff but rather for loading screen. Main thread renders loading animation while background thread uploads whole level along with
textures. In fact I did notice that this takes slightly longer than doing everything in main but user experience is much better with main thread still operational, showing anims and gameplay tips.

Forest Hale wrote:
Short version:
Never use shared contexts for performance-conscious code, it costs way more than the failed (more on that later) overlap of the texture uploads.

Long version:
During early development of a major product (Steam Big Picture Mode) in the past that used multiple contexts for background uploading of OpenGL textures, we were told by multiple desktop GPU vendors
that the drivers flatly mutex every OpenGL call when you have shared contexts, this can result in major (~20%) fps loss even if you don’t use the other context at all, it gets worse if you do, and in
particular the texture upload does NOT happen in parallel with rendering due to that mutexing.

So my advice is never do this, we changed the product to not do this before launch because it was completely not performant, we had been struggling to keep up 60fps until we did, then it easily
exceeded 200fps with that one change.

The hitching of texture uploads is pretty much unavoidable in OpenGL ES (iOS, Android, etc), on desktop OpenGL you can somewhat hide it with GL_ARB_pixel_buffer_object - where you glMapBuffer on the
main thread and then write the pixels from another thread, when done you glUnmapBuffer on the main thread and then issue the glTexImage2D with the pixel buffer object bound, so that it sources its
pixels from that object rather than blocking on a client memory copy, but I’m sure this isn’t free and I have not tried it in practice, it also requires that you more or less queue your uploads for
the main thread to prepare in stages so that’s some lovely ping-pong there.

While I too would greatly appreciate the addition of some background object upload functionality in OpenGL, or even an entire deferred command buffer system (I proposed this in a hardware-agnostic way
but it didn’t gain traction), the reality today is that OpenGL contexts and threading are completely non-viable.

I should note that Doom 3 BFG Edition seems to use a glMapBuffer on each of 3 buffer objects (vertex, index, uniforms) at the beginning of the frame, queue jobs for all of the processing it wants to
do, so that threads write into those mapped buffers, and then at end of frame it does the glUnmapBuffer and walks its own command list to issue all the real GL calls that depend on that data - this
works very well, but is out of the scope of most OpenGL threading discussions.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier


What you are saying is really scary. You mean that even after I have loaded a level using the 2nd shared context and it is not used anymore, the mere fact that it exists causes the main thread’s context to go through some kind of locking mechanism? What if I then destroy the 2nd context? Does the lock go away too?

Is it specific to a driver or a platform? I deal with Win and iOS. Is it specific to the GL version used? I am still limiting myself to GL 1.0, as there are too many driver issues on Windows with anything above that, and I am doing 2D games.



There is simply no way for the driver to know that you won’t be using that context if it’s still around, so how can it do otherwise? Graphics card vendors don’t normally sell programs intended to optimize your NON-graphics code, and knowing that you won’t be using the context again basically falls into the same category of things as that.

What if I then destroy 2nd ctx? Does the lock go too?

That will depend on the driver. Thus, you should assume “No”.

Is it specific to a driver or a platform?

I believe that Forest (or was it someone else? it was a few days ago) already said that he was told by someone who’s involved in the production of video cards that it happens with everything. Indeed, it would surely be extremely difficult, and maybe impossible, for it to be otherwise.

I deal with Win and iOS. Is it specific to the GL version used?

It’s possible that it could happen in DirectX as well. I don’t know if they have any “lockless” APIs, but even if they do, it doesn’t mean that everyone implements it without locking.

For Direct3D, the HAL always locks (like OpenGL’s shared contexts), but the locks are on resources rather than API entry points, so there is a performance loss inherent in that API design choice compared to OpenGL (which goes “full throttle” in the single-threaded case). This gives some scalability with threading, but performance gains fall off sharply with additional threads (so one additional thread may be justified, but not more, unless you like wasting electricity on spin locks - and that second thread just brings you up to OpenGL performance!).

Multiple PC driver vendors directly told me that their OpenGL drivers lock on every call in the case of shared contexts; they make no attempt at overlapping operations like this. It is considered exotic behavior in the context of OpenGL API usage, something that games and other consumer apps do not do. It could be accelerated somewhat on their CAD-specific drivers (such as the NVIDIA Quadro and AMD FirePro series), but I do not have data on those.

I would be quite wary of shared contexts on mobile operating systems such as iOS and Android, as the driver vendors have been known to have countless bugs throughout their API even in single-threaded usage. I don’t know how they handle shared contexts, and it might vary by make and model. Or it could be the unicorn feature in their driver that always works despite everything else being randomly broken; I’m not placing bets.


Great stuff guys, thanks. I removed my 2nd context and now do all texture operations on the main thread (the 2nd thread pushes texture data to a queue and waits for the main thread to do the upload). It was actually (surprisingly) easy to do with std::promise/future.

The level loads a bit faster now, and I have the additional benefit of things working the same way on Windows and iOS. Good to know that I should do the same for the D3D implementation.
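
For anyone curious, a minimal sketch of that promise/future hand-off might look like this (names are illustrative; create_gl_texture() stands in for the engine’s actual glTexImage2D wrapper and must run on the GL thread):

Code:
#include <future>
#include <mutex>
#include <queue>
#include <vector>

unsigned int create_gl_texture(const std::vector<unsigned char>& px,
                               int w, int h); // engine-specific, hypothetical

struct UploadRequest {
    std::vector<unsigned char> pixels;
    int width = 0, height = 0;
    std::promise<unsigned int> result; // fulfilled with the GL texture name
};

static std::mutex                g_queue_mutex;
static std::queue<UploadRequest> g_uploads;

// Loader thread: enqueue the pixel data and block until it is uploaded.
unsigned int upload_texture(std::vector<unsigned char> pixels, int w, int h)
{
    UploadRequest req;
    req.pixels = std::move(pixels);
    req.width  = w;
    req.height = h;
    std::future<unsigned int> done = req.result.get_future();
    {
        std::lock_guard<std::mutex> lock(g_queue_mutex);
        g_uploads.push(std::move(req));
    }
    return done.get(); // blocks until the main thread fulfils the promise
}

// Main thread (owner of the GL context): call once per frame.
void drain_uploads()
{
    for (;;) {
        UploadRequest req;
        {
            std::lock_guard<std::mutex> lock(g_queue_mutex);
            if (g_uploads.empty()) return;
            req = std::move(g_uploads.front());
            g_uploads.pop();
        }
        // The GL upload happens outside the lock.
        req.result.set_value(create_gl_texture(req.pixels, req.width, req.height));
    }
}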


You can do this with drivers that support context sharing, sure. But it would make for simpler code and be more portable to do the opposite: render in the main thread, process events in a secondary thread.

SDL_PumpEvents will still need to be called from the main thread on most OSes. But except for a couple of user-initiated loops on Windows, this should have no effect on framerate (I’ve benchmarked the equivalent of SDL_PumpEvents and it usually takes about 5 microseconds to run on a Pentium 4; 60fps requires a loop time under 16ms, roughly 3000x that).

------------------------
Nate Fries
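
A minimal sketch of that arrangement, assuming SDL2 (where SDL_PeepEvents is documented as thread-safe): the main thread pumps and renders, while a secondary thread drains the queue:

Code:
#include <SDL.h>

static int event_thread(void* unused)
{
    (void)unused;
    for (;;) {
        SDL_Event ev;
        // Pull queued events without pumping the OS message loop.
        int n = SDL_PeepEvents(&ev, 1, SDL_GETEVENT,
                               SDL_FIRSTEVENT, SDL_LASTEVENT);
        if (n == 1) {
            if (ev.type == SDL_QUIT) return 0;
            // handle_event(&ev); // hypothetical, game-specific
        } else {
            SDL_Delay(1); // nothing queued; yield to the scheduler
        }
    }
}

// Main thread:
// SDL_Thread* t = SDL_CreateThread(event_thread, "events", NULL);
// for (;;) {
//     SDL_PumpEvents();          // must stay on the video/main thread
//     render_frame();            // hypothetical
//     SDL_GL_SwapWindow(window);
// }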

You should also question whether you need a second thread at all. In the times when processor frequencies averaged in the lower megahertz range, it made sense to do non-graphical processing on another thread, which was capable of simulating concurrency with the graphical thread (gaming systems were single-processor back then). But that was the '90s, and this is the 2010s - processor frequencies on gaming rigs can be as much as 20x higher than in the nineties, and while multi-core processors have made the use of threads even less costly, they have done nothing to alleviate the design issues associated with them or the limitations in graphics drivers.

Which is not to say that it makes no sense to have another thread, depending on your needs. But unless this engine you’re developing is strictly in-house, needs are for the programmer using the engine to decide, not the engine itself - the aim of the engine merely ought to be to provide an easier means of meeting such needs.

If you aren’t doing this to leverage multicore execution (which would most likely be a premature optimization, the root of all programming evils), but for concurrency, there are also better options. You might consider a task queue, sketched below. It has the benefit that it can easily be made multi-threaded if the programmer using the engine does find a need to leverage multicore execution, without at all necessitating it; it also carries lower execution overhead than context switching, which is necessary for multithreading on a uniprocessor or an overburdened multiprocessor system; and it may even wind up costing less memory (once you consider all the locks, thread-local variables, and the memory for the thread context itself). It’s also extremely simple to implement - in C or C++, it can be nothing more than a singly-linked list of function pointers (having a tail pointer may make things even simpler and faster).

------------------------
Nate Fries
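
A minimal sketch of such a queue, single-threaded on purpose (a lock could be bolted on later); the node type and function names are illustrative:

Code:
// A singly-linked list of function pointers with head and tail pointers,
// so that appending is O(1).
typedef void (*TaskFn)(void* arg);

struct Task {
    TaskFn fn;
    void*  arg;
    Task*  next;
};

struct TaskQueue {
    Task* head = nullptr;
    Task* tail = nullptr;
};

void push_task(TaskQueue& q, TaskFn fn, void* arg)
{
    Task* t = new Task{fn, arg, nullptr};
    if (q.tail) q.tail->next = t; else q.head = t;
    q.tail = t; // the tail pointer is what makes appending constant-time
}

// Run everything queued so far, e.g. once per iteration of the main loop.
void run_tasks(TaskQueue& q)
{
    while (q.head) {
        Task* t = q.head;
        q.head = t->next;
        if (!q.head) q.tail = nullptr;
        t->fn(t->arg);
        delete t;
    }
}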

Having a tail pointer in a singly linked list is always a good idea when optimizing for performance: appending an item to the end of the list becomes constant-time, O(1), and if you are accessing the last element frequently, that’s icing on the cake :)


I hear you, but I am in the niche of older machines - in fact, I am having trouble even getting users’ OpenGL 1.4 to work correctly (for VBOs; I think it is about time I implement a D3D renderer) - and mobile devices are not what you’d consider 2010s “gaming rigs”. That being said, the secondary thread is (as I said before) not used to speed up level loading, but rather to keep the main (event processing, rendering) thread responsive.

I do use task queues (double-buffered, something like the sketch below), but they are per-thread. Since cross-thread tasks are very rare, I don’t want the queues taking locks all the time.
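
A minimal sketch of a double-buffered queue along those lines (exact semantics assumed, since they weren’t spelled out): producers take a short lock to append to the back buffer, and the owning thread swaps buffers once per frame and runs tasks without holding the lock:

Code:
#include <functional>
#include <mutex>
#include <vector>

class DoubleBufferedQueue {
public:
    void push(std::function<void()> task)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        back_.push_back(std::move(task));
    }

    void drain() // called by the owning thread, once per frame
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            front_.swap(back_); // O(1); the lock is held only for the swap
        }
        for (auto& task : front_) task(); // runs outside the lock
        front_.clear();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> front_, back_;
};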

BTW, since we are on the threading/GL topic: do you guys render from the main thread? What are your update vs. render step strategies? If you do them on separate threads, how do you sync them afterwards (condition variables seem the obvious choice)?


I might be missing something here, but how do you even implement a list without a tail pointer? You always keep at least one end, otherwise it would be inaccessible. In any case, node-based lists suck :P



After trying several threading strategies, my current preference is to keep rendering and window management on the main thread, but handle input on a secondary thread. So far, this has proven the best method to maintain responsiveness without impacting compatibility.

Regarding D3D… I prefer to use ANGLE to get OpenGL ES 2.0 on systems without proper OpenGL support. This way, I only need to maintain two renderers: OpenGL everywhere, and OpenGL ES for smartphones and (Windows & ~(Nvidia | AMD)).

This way, I can also use shaders across the board. ANGLE works all the way down to the GMA 950 (and probably the GMA 500/Poulsbo, although I haven’t tested that), so there’s very little reason to use the fixed-function pipeline. Microsoft recently announced that they will be working with Google to port ANGLE to WinPhones and Metro, so D3D will be strictly unnecessary going forward - as an indie developer, this suits me perfectly.



But how does that help? If the main thread is blocked, then you don’t refresh your screen to show the impact of the processed events. In my experience, event handling is a tiny fraction of a frame. Do you mean that with 2nd-thread event handling you avoid the “busy” system cursor / the window appearing to hang?


I had ANGLE on my radar, but now you really got me interested in it. I am only really interested in 2 platforms, Windows and iOS, which means I would only need to maintain a GL ES renderer. That would be great. Definitely going to look into it.


The point is not to improve performance, but to minimize the latency between the user pressing a button and the world reacting to that button press.

If you handle input in your rendering thread, then any dip in the framerate will increase input latency, which can be jarring (especially on slower systems that cannot maintain a stable framerate). By spawning a separate thread for input, the OS scheduler will smooth out input latency even when your framerate dips below 10 fps.

Of course, this only helps if your world update rate is decoupled from your framerate. In my case, I will skip up to 12 frames in order to guarantee a pseudo-fixed update rate. In other words, I prioritize world updates (60 updates/sec no matter what) and only render frames as a best effort.

This way, if the player presses the “fire” trigger, she will shoot the enemy immediately even if she is running at 5 fps.

If the input were handled in the same thread, then the “fire” button would take up to 200ms to register - or it would be skipped completely, if the player lifted her finger before the 200ms mark. This would place the player at a severe disadvantage (hi, Diablo 3!)
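
A minimal sketch of such a decoupled loop, assuming SDL’s millisecond tick counter (update_world() and render_frame() are hypothetical engine functions):

Code:
#include <SDL.h>

void update_world(Uint32 dt_ms); // hypothetical, engine-specific
void render_frame();             // hypothetical, engine-specific

// World updates run at a pseudo-fixed 60 Hz; frames are rendered only as a
// best effort, skipping up to 12 of them when the machine falls behind.
void game_loop()
{
    const Uint32 step_ms  = 1000 / 60; // fixed update step, ~16 ms
    const int    max_skip = 12;
    Uint32 next_update = SDL_GetTicks();

    for (;;) {
        int updates = 0;
        while (SDL_GetTicks() >= next_update && updates < max_skip) {
            update_world(step_ms); // input is applied here, not per frame
            next_update += step_ms;
            ++updates;
        }
        render_frame(); // may run far less often than 60 Hz
    }
}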