[PATCH] SDL 1.3: Drawing from one texture to another

That might be helpful, but what I’d really like is to not have to keep track of it at
all, especially for setup functions that don’t actually involve drawing anything.
Look at SDL_video.h and check out how many functions operate on “the current
rendering context.” Once we get render targets put in, trying to keep track of
what the current rendering context is could end up becoming very hectic
very quickly.

I think the key here is that it looks like Mason’s Delphi unit would
like to change the context sometimes (when there’s more than one
SDL_Window objects embedded in the same Delphi form, he was saying),
so he’d like to set it back the way it was after, but there’s no
SDL_GetRenderer() to know which SDL_WindowID is currently active, so
he needs to fake it, by keeping the last value passed into his
SDL_SelectRenderer() wrapper in some global variable of his wrapper,
when this information is most certainly available inside of SDL (did I
get this correctly, Mason?).
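
A minimal sketch of the workaround described above, assuming a library that has a "select" call but no matching getter. Every name here (`WindowID`, `backend_select`, `wrapper_select`, `draw_in_window`) is a hypothetical stand-in, not the real SDL 1.3 API.

```c
/* Hypothetical stand-ins for illustration; not the real SDL 1.3 API. */
typedef unsigned int WindowID;

static WindowID backend_active = 0; /* known inside the library, but not exposed */
static WindowID wrapper_last = 0;   /* the wrapper's own copy of the same value */
static int draws = 0;

/* Stand-in for the library's "select renderer" call. */
static void backend_select(WindowID id) { backend_active = id; }

/* Wrapper: remember the last value, because there is no getter to ask. */
static void wrapper_select(WindowID id)
{
    backend_select(id);
    wrapper_last = id;
}

/* Draw into `target`, then put the previous selection back. */
static void draw_in_window(WindowID target)
{
    WindowID saved = wrapper_last;
    wrapper_select(target);
    draws++;                 /* the actual drawing would happen here */
    wrapper_select(saved);
}
```

The duplicated bookkeeping in `wrapper_last` is exactly what a getter on the library side would make unnecessary.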

There is a glXGetCurrentContext; he’d need the SDL equivalent.

From: pphaneuf@gmail.com (Pierre Phaneuf)
Subject: Re: [SDL] [PATCH] SDL 1.3: Drawing from one texture to another

That might be helpful, but what I’d really like is to not have to keep track of it at
all, especially for setup functions that don’t actually involve drawing anything.
Look at SDL_video.h and check out how many functions operate on “the current
rendering context.” Once we get render targets put in, trying to keep track of
what the current rendering context is could end up becoming very hectic
very quickly.

I think what I’d say is to look at how OpenGL does things with its
framebuffer objects (or uniform buffer objects? or whatever it is that
you can render to and then blit elsewhere, I’m not that much of a GL
guy!)…

Also, to keep in mind, from what I understand of the upcoming “Longs
Peak” new OpenGL version, there’s supposed to be less (possibly
nothing?) in the global context, and more that is passed in
parameters, kind of like we’re saying now, to the point where they
might support multi-threading in OpenGL (I guess it only makes sense
if you access one drawable per thread or something?).

So change is in the making…

On Fri, Jul 3, 2009 at 7:35 PM, Mason Wheeler wrote:


Bob Pendleton wrote:

You might be tempted to say my explanation boils down to “because it
has always been done that way” which is the worst possible answer
(Admiral Grace would haunt me forever for saying that.) What I am
saying is that back in the bad old days there were real good reasons
for doing it that way and as a result it might still be a good idea to do
it that way even though it might be possible to do it a completely
different way now.

Ironically, the global state tends to lead to extra state changes.

With global state:

  • Call SetRenderer (which always causes a state change, because that’s
    the way the system is designed).
  • Call rendering functions.

Without global state:

  • Call rendering functions, passing a reference to the renderer as an
    argument.
  • Inside the rendering function, compare the passed renderer to the
    current renderer, and only perform a state change if they differ.

Now, you could perform this check even with global state, either inside
SetRenderer or in the client code. However, because there are two
places where the check can take place, it’s easy to accidentally perform
the check twice, or not at all.

In conclusion: the one supposed advantage of using global state is
actually a disadvantage. Global state is evil, and should never be used.

--
Rainer Deyke - rainerd at eldwood.com

Global state keeps it simpler for when you just have one Screen, one Mouse,
one Keyboard - and is how SDL 1.2 acts (since it only has one of each).

However, for joysticks in SDL 1.2 you pass in the joystick ID for each
function. So with an OO wrapper for it, you store the joystick id, so the
call becomes… joystick.method() rather than joy_method(joystick_id).
Just as easy as global state in an OO wrapper.

Would be nice to be consistent across all of SDL, rather than having
separate .

Global state is required for some drivers. E.g., OpenGL uses state for the
context.

It might be useful to start writing down the options as proposals, and
enumerate all of the pros/cons for each.

Proposal 1 (global state with default NULL args):

Make passing in NULL call the matching getCurrent* function.

getCurrentMouseId(), getCurrentContext(), getCurrentKeyId() etc would allow
easy emulation of global state.

That way each video API call just requires you to pass in a NULL if you want it
to use the current.

It also allows external modules to store the ids for which ones they want to
work on. So they don’t need to
- call getCurrent*(),
- compare current to what I want,
- then setCurrent*().

The getCurrent* calls and state are required anyway, since some drivers
require global state.
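
A rough sketch of what Proposal 1 could look like in C. `getCurrentContext` follows the naming above; `setCurrentContext`, the `Context` struct and `drawPoint` are assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical context type; draw_calls only exists so the effect is visible. */
typedef struct Context { int id; int draw_calls; } Context;

static Context *current_context = NULL;

static void setCurrentContext(Context *c) { current_context = c; }
static Context *getCurrentContext(void) { return current_context; }

/* Hypothetical drawing call: NULL means "use the current context".
   Returns the context actually used, or NULL if there was none. */
static Context *drawPoint(Context *c, int x, int y)
{
    if (c == NULL)
        c = getCurrentContext();
    if (c == NULL)
        return NULL;          /* no explicit and no current context */
    (void)x; (void)y;         /* coordinates are unused in this sketch */
    c->draw_calls++;
    return c;
}
```

Passing an explicit context never touches the current one, so an external module can keep its own context without the get/compare/set dance.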

Proposal 2 (No global state):

All state is stored by the application.

Pass in the context/ID for every call.

No need for getCurrent*/setCurrent* calls in SDL (or is there?).

No need for every SDL driver to implement state (video, keyboard, joystick,
etc. on every platform). Only the drivers that require state to be stored
store state, e.g. for optimising state changes.

Note, for OO wrappers of SDL storing the state in a Window/Keyboard/Joystick
object is almost a no-op.
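
As an illustration of why the wrapper is almost a no-op, here is a sketch in C of an object that stores the ID and forwards it. `joy_get_axis` is a fake stand-in for an ID-taking library call and returns a made-up value; none of these names come from SDL.

```c
/* Fake stand-in for a library call that takes the joystick ID explicitly.
   It returns a made-up value so the forwarding is observable. */
static int joy_get_axis(int joystick_id, int axis)
{
    return joystick_id * 100 + axis;
}

/* The wrapper object only has to remember the ID. */
typedef struct Joystick { int id; } Joystick;

/* "Method" on the wrapper: forwards the stored ID to the library call. */
static int Joystick_getAxis(const Joystick *self, int axis)
{
    return joy_get_axis(self->id, axis);
}
```

The caller writes `Joystick_getAxis(&j, 2)` and never handles the ID directly, which is as convenient as global state without sharing anything.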

René Dudfield wrote:

Proposal 1 (global state with default NULL args):
Proposal 2 (No global state):

I much prefer proposal 2 over proposal 1, because:

  • Proposal 1 is bug-prone on the client side. If you accidentally
    pass a null argument, you get global state instead of a clean error.
  • Proposal 1 is bug-prone on the implementation side, since there is
    more functionality to implement and test.
  • Proposal 1 encourages sloppy client code. By having global state in
    the library, the client is encouraged to use it.
  • In a way, proposal 1 is even more dangerous than just having global
    state, since the use of global state can lurk undetected.

--
Rainer Deyke - rainerd at eldwood.com

I much prefer proposal 2 over proposal 1, because:
  • Proposal 1 is bug-prone on the client side. If you accidentally
    pass a null argument, you get global state instead of a clean error.
  • Proposal 1 is bug-prone on the implementation side, since there is
    more functionality to implement and test.
  • Proposal 1 encourages sloppy client code. By having global state in
    the library, the client is encouraged to use it.
  • In a way, proposal 1 is even more dangerous than just having global
    state, since the use of global state can lurk undetected.

+1

On Mon, Jul 6, 2009 at 1:55 AM, Rainer Deyke wrote:


Global state does have a few advantages:

  1. Fewer arguments. This doesn’t seem like much, and it isn’t really, but
    each additional argument adds a PUSH in the calling function as well as
    additional code in the function itself.
  2. When translating a passed state system to global state hardware or system
    calls, there is an additional (potentially very high) cost for switching
    states, or at the very least, checking to see if a state change is required.
    Going the other way around (global state to passed state) is nowhere near as
    expensive.
  3. Backwards compatibility. When switching from a single state system to a
    multistate system, global state requires the addition of functions to get and
    set state but the functions which make use of that state need no changes to
    their declaration. With passed state you need to add an additional parameter
    to each function that uses that state.

It should also be noted that these are not mutually exclusive. You can make a
global state wrapper around a passed state function or vice versa. This may
be the best option for backwards compatibility.

On Monday, 6 July 2009 02:21:10 Pierre Phaneuf wrote:

On Mon, Jul 6, 2009 at 1:55 AM, Rainer Deyke wrote:

I much prefer proposal 2 over proposal 1, because:

  • Proposal 1 is bug-prone on the client side. If you accidentally
    pass a null argument, you get global state instead of a clean error.
  • Proposal 1 is bug-prone on the implementation side, since there is
    more functionality to implement and test.
  • Proposal 1 encourages sloppy client code. By having global state in
    the library, the client is encouraged to use it.
  • In a way, proposal 1 is even more dangerous than just having global
    state, since the use of global state can lurk undetected.

+1

Kenneth Bull wrote:

Global state does have a few advantages:

  1. Fewer arguments. This doesn’t seem like much, and it isn’t really, but
    each additional argument adds a PUSH in the calling function as well as
    additional code in the function itself.

I agree, this isn’t much.

  2. When translating a passed state system to global state hardware or system
    calls, there is an additional (potentially very high) cost for switching
    states, or at the very least, checking to see if a state change is required.
    Going the other way around (global state to passed state) is nowhere near as
    expensive.

When the global state in question is a 32 bit handle, this is dirt
cheap. No need to query the underlying system, and no need to
change the underlying state unless a state change is necessary.

static handle global_handle;

/* Touch the underlying system only when the handle actually changes. */
static inline void set_global_handle(handle h) {
    if (h != global_handle) {
        set_underlying_state(h);
        global_handle = h;
    }
}

void do_something(handle h, …) {
    set_global_handle(h);

}
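
To make the cost concrete, here is a self-contained variant of the snippet above that counts how often the underlying state is really switched. The `handle` typedef, the counter and `run_sequence` are assumptions added for the illustration.

```c
typedef unsigned int handle;        /* assumed: a plain 32-bit style handle */

static handle global_handle = 0;
static int underlying_changes = 0;  /* counts the expensive switches */

/* Stand-in for the expensive call into the underlying system. */
static void set_underlying_state(handle h)
{
    (void)h;
    underlying_changes++;
}

static void set_global_handle(handle h)
{
    if (h != global_handle) {
        set_underlying_state(h);
        global_handle = h;
    }
}

static void do_something(handle h)
{
    set_global_handle(h);
    /* the real work with h would go here */
}

/* One call with handle 1, five with handle 2, one more with handle 1. */
static int run_sequence(void)
{
    int i;
    underlying_changes = 0;
    global_handle = 0;
    do_something(1);
    for (i = 0; i < 5; ++i)
        do_something(2);
    do_something(1);
    return underlying_changes;
}
```

Seven calls lead to three real switches (0 to 1, 1 to 2, 2 to 1); the other four calls cost only the comparison.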

  3. Backwards compatibility. When switching from a single state system to a
    multistate system, global state requires the addition of functions to get and
    set state but the functions which make use of that state need no changes to
    their declaration. With passed state you need to add an additional parameter
    to each function that uses that state.

Note that SDL 1.3 already breaks backwards compatibility. Now is the
ideal time to make compatibility-breaking changes.

--
Rainer Deyke - rainerd at eldwood.com

Kenneth Bull wrote:

Global state does have a few advantages:

  1. Fewer arguments. This doesn’t seem like much, and it isn’t really,
    but each additional argument adds a PUSH in the calling function as well
    as additional code in the function itself.

I agree, this isn’t much.

It adds up. And while I understand the general trend is for hardware to get
faster over time, there’s no reason why we should slow software down at the
same rate. It really would be nice to see optimised code once in a while at
least, especially in a multimedia library.

  2. When translating a passed state system to global state hardware or
    system calls, there is an additional (potentially very high) cost for
    switching states, or at the very least, checking to see if a state change
    is required. Going the other way around (global state to passed state) is
    nowhere near as expensive.

When the global state in question is a 32 bit handle, this is dirt
cheap. No need to query the underlying system, and no need to
change the underlying state unless a state change is necessary.

It may be only a 32 bit handle at your end, but on the system side that handle
is often associated with large blocks of memory, a lot of port IO and who
knows what else. Even if on the system side all that’s involved is setting a
pointer, there’s still a system call involved which is potentially quite
expensive all by itself. And, of course, you still have the if statement
to check if the change is required at all.

  3. Backwards compatibility. When switching from a single state system to
    a multistate system, global state requires the addition of functions to
    get and set state but the functions which make use of that state need no
    changes to their declaration. With passed state you need to add an
    additional parameter to each function that uses that state.

Note that SDL 1.3 already breaks backwards compatibility. Now is the
ideal time to make compatibility-breaking changes.

SDL 1.3 is supposed to break ABI compatibility, not API compatibility in
general. That is, code must be recompiled to work with the new library, but
no other change should be necessary. Changing a function’s declaration breaks
both, adding new functions does not.

On Monday, 6 July 2009 16:26:32 Rainer Deyke wrote:

Kenneth Bull wrote:

  2. When translating a passed state system to global state hardware or
    system calls, there is an additional (potentially very high) cost for
    switching states, or at the very least, checking to see if a state change
    is required. Going the other way around (global state to passed state) is
    nowhere near as expensive.

When the global state in question is a 32 bit handle, this is dirt
cheap. No need to query the underlying system, and no need to
change the underlying state unless a state change is necessary.

It may be only a 32 bit handle at your end, but on the system side that handle
is often associated with large blocks of memory, a lot of port IO and who
knows what else. Even if on the system side all that’s involved is setting a
pointer, there’s still a system call involved which is potentially quite
expensive all by itself. And, of course, you still have the if statement
to check if the change is required at all.

Wrong and wrong.

Part of the point of using a 32-bit handle is that all you have to
check is the handle itself, not the associated data structure. That only has to be
referenced if the handle has changed, which would happen anyway even if you kept
global state.

As for the “statement to check if the change is required at all”, the overhead there
is negligible. This if statement translates to three ASM statements at most: Load
one value into a register (32 bits, that works pretty well), compare the register value
against a second value, which is most likely in the L1 or L2 cache, and a JZ or JNZ
opcode to implement the branching. That’s all. Nobody’s going to notice the
performance hit from that unless they’re in a very, very tight loop, and if you’re
switching contexts in a tight loop you’re probably doing something wrong.

From: llubnek@gmail.com (Kenneth Bull)

Subject: Re: [SDL] [PATCH] SDL 1.3: Drawing from one texture to another

On Monday, 6 July 2009 16:26:32 Rainer Deyke wrote:

It adds up. And while I understand the general trend is for hardware to get
faster over time, there’s no reason why we should slow software down at the
same rate. It really would be nice to see optimised code once in a while at
least, especially in a multimedia library.

The general trend is actually for hardware to get more parallel, and
global state kills that right off, neat and simple.

SDL 1.3 is supposed to break ABI compatibility, not API compatibility in
general. That is, code must be recompiled to work with the new library, but
no other change should be necessary. Changing a function’s declaration breaks
both, adding new functions does not.

Isn’t that already broken? Changes to the layout of some structures,
for example?

On Mon, Jul 6, 2009 at 6:16 PM, Kenneth Bull wrote:


Layout’s an ABI matter. Have the signatures of any of the functions from the public
interface changed? I think that’s what he’s referring to.

SDL 1.3 is supposed to break ABI compatibility, not API compatibility in
general. That is, code must be recompiled to work with the new library, but
no other change should be necessary. Changing a function’s declaration breaks
both, adding new functions does not.

Isn’t that already broken? Changes to the layout of some structures,
for example?

From: pphaneuf@gmail.com (Pierre Phaneuf)
Subject: Re: [SDL] [PATCH] SDL 1.3: Drawing from one texture to another

Layout’s an ABI matter. Have the signatures of any of the functions from the public
interface changed? I think that’s what he’s referring to.

I thought a few members disappeared, actually, but those might have
been bugs, and possibly corrected since then.

Isn’t anything being deprecated/removed? SDL_PumpEvents? The “get
keyboard/joystick/mouse/foobar state” family of functions? Please? :-)

On Mon, Jul 6, 2009 at 7:50 PM, Mason Wheeler wrote:


Wrong and wrong.

Part of the point of using a 32-bit handle is that all you have to
check is the handle itself, not the associated data structure. That only
has to be referenced if the handle has changed, which would happen anyway
even if you kept global state.

This is true. But with global state it is much more obvious to the programmer
using the library when such a change will occur (that is, only when they ask
for it).

As for the “statement to check if the change is required at all”, the
overhead there is negligible. This if statement translates to three ASM
statements at most: Load one value into a register (32 bits, that works
pretty well), compare the register value against a second value, which is
most likely in the L1 or L2 cache, and a JZ or JNZ opcode to implement the
branching. That’s all. Nobody’s going to notice the performance hit from
that unless they’re in a very, very tight loop, and if you’re switching
contexts in a tight loop you’re probably doing something wrong.

That check occurs even when the state is not changed. So while I agree that
changing state in a tight loop is bad (which also means state changes should
be more obvious), the performance loss from that if statement is still there.

Here’s the logic with global state:

doSomethingRelatedToState();
storedstate = getState();
setState(requiredstate);
for (i = 0; i < 5; ++i)
    doSomethingRelatedToState();
setState(storedstate);
doSomethingRelatedToState();

which expands to:

doSomethingRelatedToState();
storedstate = getState();
setState(requiredstate);
doSomethingRelatedToState();
doSomethingRelatedToState();
doSomethingRelatedToState();
doSomethingRelatedToState();
doSomethingRelatedToState();
setState(storedstate);
doSomethingRelatedToState();

Here’s the logic with passed state:

doSomethingRelatedToState(state0);
for (i = 0; i < 5; ++i)
    doSomethingRelatedToState(state1);
doSomethingRelatedToState(state0);

but this expands to:

state = state0;
_doSomethingRelatedToState();
for (i = 0; i < 5; ++i) {
    if (state != state1)
        state = state1;
    _doSomethingRelatedToState();
}
if (state != state0)
    state = state0;
_doSomethingRelatedToState();

which expands to:

state = state0;
_doSomethingRelatedToState();
if (state != state1)
    state = state1;
_doSomethingRelatedToState();
if (state != state1)
    state = state1;
_doSomethingRelatedToState();
if (state != state1)
    state = state1;
_doSomethingRelatedToState();
if (state != state1)
    state = state1;
_doSomethingRelatedToState();
if (state != state1)
    state = state1;
_doSomethingRelatedToState();
if (state != state0)
    state = state0;
_doSomethingRelatedToState();

Which of these looks better to you?

On Monday, 6 July 2009 18:27:47 Mason Wheeler wrote:

The general trend is actually for hardware to get more parallel, and
global state kills that right off, neat and simple.

You could have a separate state per thread, as long as the hardware/system
calls aren’t global state. Otherwise, you really should just use a mutex.

Isn’t that already broken? Changes to the layout of some structures,
for example?

Mouse state functions break API compatibility because they add an index
parameter (I’ve already submitted a bug report for this and, surprise, it’s
still open, assigned, and apparently being worked on). Changing the layout of
structures breaks ABI compatibility, not API compatibility, except in cases
where the user code is doing stuff it really shouldn’t anyway.

On Monday, 6 July 2009 19:47:40 Pierre Phaneuf wrote:

Which of these looks better to you?

The first one is buggy, since you need to check the state. So they both
expand to the same in the end, unless you want bugs.

Likely bugs with first approach:

  • the code might be used in a thread.
  • separate components might change the state. So not passing in the state
    explicitly, or not checking the state, causes bugs.

On Tue, Jul 7, 2009 at 10:29 AM, Kenneth Bull wrote:

On Monday, 6 July 2009 18:27:47 Mason Wheeler wrote:

This is what mutexes are for. You should really be using them already anyway.

On Monday, 6 July 2009 20:28:46 René Dudfield wrote:

The first one is buggy, since you need to check the state. So they both
expand to the same in the end, unless you want bugs.

Likely bugs with first approach:

  • the code might be used in a thread.
  • separate components might change the state. So not passing in the
    state explicitly, or not checking the state, causes bugs.
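
The second bullet can be reproduced in a few lines. This is only a sketch: `setState` and `doSomethingRelatedToState` follow the pseudocode earlier in the thread, while `other_component`, `doSomethingWithState` and the `last_used` variable are hypothetical additions so the effect can be observed.

```c
static int current_state = 0;  /* the shared global state */
static int last_used = -1;     /* which state the last operation really ran with */

static void setState(int s) { current_state = s; }

/* Global-state style: silently uses whatever happens to be current. */
static void doSomethingRelatedToState(void) { last_used = current_state; }

/* Passed-state style: checks, switches if needed, then does the work. */
static void doSomethingWithState(int s)
{
    if (s != current_state)
        current_state = s;
    last_used = current_state;
}

/* Some unrelated component that selects its own state and never restores it. */
static void other_component(void) { setState(99); }
```

After `setState(1); other_component();` the global-style call runs with state 99, not 1, while `doSomethingWithState(1)` still runs with state 1.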

The general trend is actually for hardware to get more parallel, and
global state kills that right off, neat and simple.

You could have a separate state per thread, as long as the hardware/system
calls aren’t global state. Otherwise, you really should just use a mutex.

Well, global state is global, right, not per thread. And using a mutex
kind of negates the point of threads (especially as you’d have to
keep it for long periods of time, or randomly insert “give up the
mutex, re-take it, and re-select the context” in your long
operations).

For example, you could have two separate windows (or a window and a
texture target, or two texture targets, soon enough) that you could
draw to using two threads, and there’s really no reason why I’d hold a
single common mutex for the two separate resources. It’s not obvious
at all to me that you really should use a mutex. If the underlying
platform can’t support this kind of thing, then SDL can hide that, but
by having the global state, it entirely precludes the possibility of
ever doing it, even on platforms that are fully multi-thread safe.

That said, I’m playing both sides here, because I’m not very big on
threads, and I think that there’s good value in having an API that
can’t be abused (if the API had all the trappings of being
thread-safe, but the implementation wasn’t really, it would work “most
of the time”, then fail in some difficult to debug ways, but a clearly
thread-unsafe API like this will fail spectacularly and reliably at
the first attempt to use it wrong). I just want people to realize that
this is painting SDL into a corner.

Maybe it’s a fine corner to be in, but when you do these things, you
should think about it first. :wink:

Isn’t that already broken? Changes to the layout of some structures,
for example?

Mouse state functions break API compatibility because they add an index
parameter (I’ve already submitted a bug report for this and, surprise, it’s
still open, assigned, and apparently being worked on). Changing the layout of
structures breaks ABI compatibility, not API compatibility, except in cases
where the user code is doing stuff it really shouldn’t anyway.

Hmm, I think you meant the reverse, right? ABI <=> API?

There’s no “select mouse” function, like for the rest? Or is that the
fix? Haven’t used that bit yet…

On Mon, Jul 6, 2009 at 8:36 PM, Kenneth Bull wrote:


Well, global state is global, right, not per thread. And using a mutex
kind of negates the point of threads (especially as you’d have to
keep it for long periods of time, or randomly insert “give up the
mutex, re-take it, and re-select the context” in your long
operations).

I think in some cases it ends up thread local anyway.

For example, most of the “state like” variables in SDL would end up copied to
a new thread. Once in that new thread, changes there would generally not be
reflected back to the main thread.

If these variables were referred to through pointers instead, then it would
remain global, but they aren’t now.

For example, you could have two separate windows (or a window and a
texture target, or two texture targets, soon enough) that you could
draw to using two threads, and there’s really no reason why I’d hold a
single common mutex for the two separate resources. It’s not obvious
at all to me that you really should use a mutex. If the underlying
platform can’t support this kind of thing, then SDL can hide that, but
by having the global state, it entirely precludes the possibility of
ever doing it, even on platforms that are fully multi-thread safe.

You’d probably end up holding a mutex for the video system in general. This I
do not recommend, but fortunately the texture/renderer/multi-window system is
new in 1.3 so backwards compatibility at least is not an issue. Using a
passed state system there does not break any existing API from 1.2.

That said, I’m playing both sides here, because I’m not very big on
threads, and I think that there’s good value in having an API that
can’t be abused (if the API had all the trappings of being
thread-safe, but the implementation wasn’t really, it would work “most
of the time”, then fail in some difficult to debug ways, but a clearly
thread-unsafe API like this will fail spectacularly and reliably at
the first attempt to use it wrong). I just want people to realize that
this is painting SDL into a corner.

Maybe it’s a fine corner to be in, but when you do these things, you
should think about it first. :wink:

Personally, I like threads. If I could I’d have a separate thread for
networking, another for video, a third for sound, fourth for events, and
fifth, sixth, whatever, for everything else. But I know better.

In general, I prefer to assume the programmer is fairly intelligent and knows
what he/she is doing. That said, I prefer APIs that don’t put many
restrictions on how they can be used. I personally believe SDL should support
multithreading, but let the user handle mutexes and such to avoid the
performance penalty in single threaded applications. Where such mutexes might
be necessary should, of course, be thoroughly documented though…

Hmm, I think you meant the reverse, right? ABI <=> API?

ABI => Binary
API => Programming/source
changing position of members in structures breaks ABI, but not API.
changing function declarations breaks both.
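
A small illustration of the first case, using two made-up versions of a struct (`RectV1` and `RectV2` are hypothetical, not SDL types). The same source expression `r->w` compiles against both layouts, so the API is intact, but the member sits at a different offset, so code compiled against the old header reads the wrong bytes.

```c
#include <stddef.h>

/* Hypothetical "old" layout. */
struct RectV1 { short x, y; unsigned short w, h; };

/* Hypothetical "new" layout: same members, different order. */
struct RectV2 { unsigned short w, h; short x, y; };

/* Identical source text works with either version (API unchanged)... */
static int width_v1(const struct RectV1 *r) { return r->w; }
static int width_v2(const struct RectV2 *r) { return r->w; }

/* ...but the compiled code embeds a different offset (ABI changed). */
static int layouts_differ(void)
{
    return offsetof(struct RectV1, w) != offsetof(struct RectV2, w);
}
```

Recompiling fixes the offset problem, which is why a layout change is an ABI break only; changing a function's parameter list would also require editing the calling source.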

There’s no “select mouse” function, like for the rest? Or is that the
fix? Haven’t used that bit yet…

There is actually a SDL_SelectMouse() function in SDL 1.3, but the state it
sets is only ever used by the cursor functions, not by the rest of the mouse
API (though the source documentation says otherwise). It’s listed under bug
758.

On Monday, 6 July 2009 20:47:13 Pierre Phaneuf wrote:

Most threads use a loop of some sort to repeatedly perform some operation.
Usually, you want the operation itself to happen fairly quickly, but how much
time passes between each cycle is not particularly important.

Considering all that, how to set up your mutexes and state changes is fairly
obvious:

while (running) {
    lockMutex();
    setState(mystate);

    performOperationUsingState();

    unlockMutex();

}

Any code in another thread that you’re likely to care about then runs only
after unlockMutex and before the next lockMutex.

And again, you need that mutex even if you’re using passed state. The mutex
code could be placed in whatever library function you’re calling, but that
would be very bad, since locking a mutex is fairly expensive.

On Monday, 6 July 2009 20:47:13 Pierre Phaneuf wrote:

On Mon, Jul 6, 2009 at 8:36 PM, Kenneth Bull wrote:

The general trend is actually for hardware to get more parallel, and
global state kills that right off, neat and simple.

you could have a separate state per thread, as long as the
hardware/system calls aren’t global state. Otherwise, you really should
just use a mutex.

Well, global state is global, right, not per thread. And using a mutex
kind of negates the point of threads (especially as you’d have to
keep it for long periods of time, or randomly insert “give up the
mutex, re-take it, and re-select the context” in your long
operations).