“Sam” == Sam Lantinga writes:
Sam> [NOTE: I’m being blunt, so I’m sending this to you only. If
Sam> you’d like a less-blunt version sent to the list, I can do it
Sam> sometime after vacation.]
>> I don't see where the global variable comes in. I assume Sam's
>> thead library maintains some structure per thread. Add an error
>> pointer to that structure, and initialize it to zero. It is
>> meant to point to an instance of an error structure, but on
>> thread launch will contain NULL. A thread will then allocate an
>> error structure on its own, and call the new SDL_Thread_Error
>> function (or whatever it is called) which updates that thread's
>> error structure pointer. On error a thread will look for that
>> pointer, and if it is non zero the error structure it points to
>> will be filled in. If the programmer doesn't call
>> SDL_Thread_Error with a unique error structure, he deserves the
>> resultant bug(s).
Sam> That's the current implementation. Part of the trickiness is
Sam> the fact that per-thread variables aren't easy to implement
Sam> in a safe way.
>> The talk of lock/unlock/mutex...It leaves me thinking of race
>> conditions, lockups where both threads are waiting for the
>> other to unlock something (I won't release mine until he
>> releases his :^). In my mind all this is theory anyway. In my
>> bomber game where I replaced my sound fork() code with SDL's
>> sound callback mechanism, I had the main program pass sound
>> commands in a circular buffer with a put pointer that wraps
>> around. The callback function, in another thread (sdl audio is
>> done with threads I assume), looks into the same buffer but
>> with a different pointer ("take"). When take!=put, the audio
>> callback function will keep fetching commands until the
>> pointers are the same again. There is no locking, no conflicts,
>> and no problems.
Most likely bomber is broken. It may be broken in such a way that the
error won’t show up on a non SMP machine, and here’s even a chance
that the compiler has been nice to him and set things up in such a way
that he won’t see the problem at all, but more likely there are little
glitches that are happening here and there that he isn’t aware of.
The callback function has to look at both take and put in order to
determine whether or not they are equal. Since they are operating
asynchronously without any synchronization, there may be times when
the value of put is incorrectly read from the callback function. Most
likely this just results in a slight glitch of the sound, so it’s not
a big deal. If the hardware he’s using does atomic writes of 4-byte
quantities, it may be that synchronization isn’t necessary, but since
his is presumably a C program and C doesn’t guarantee that the pointer
will be updated atomicly, his program is incorrect. A port or a new
piece of hardware may suddenly introduce bugs that he’s not aware of.
>> I would imagine most thread stuff that requires locking could
>> be implemented in some other way so as not to require
>> locking. But it is just a feeling, not from actual
>> experience. I've been programming for 23 years and my instincts
>> have usually served me well :^).
What hubris.
It’s highly unlikely that this guy has been programming SMP machines
for 23 years. I doubt he’s even been programming anything that is
marginally threaded for very long. He clearly doesn’t understand what
C specifies. He’s wearing his ignorance on his sleeve.
I too have been programming 23 years, including substantial time
programming OS-level things. I know how things should work, so I push
them hard and find tons of bugs both by running into them and visual
inspection of code. I have a good understanding of the difference
between what a language specifies and what an actual implementation
does. If the language (+ runtime) doesn’t specify it, you can’t count
on it happening just because you get lucky.
Sam> What you have in bomberman is busy waiting. If you wanted to
Sam> wait on a mutex when the buffer was full, it would get a
Sam> little more complex. :) Just an observation.
Indeed.
>> All I really care about is that the SDL library function calls
>> don't change. Now, if they must change because they weren't set
>> up correctly with threads in mind, then I'd say make the
>> minimum changes necessary to fix the problem. But speaking of
>> changing every SDL function call is a scary prospect. I picture
>> a nightmare of tech support emails, "My program won't compile
>> with the new version but it works with the old," "I'm getting
>> core dumps on programs that worked before I upgraded to a new
>> version of SDL," etc, etc.
Sam> I am ambivalent about the change. It would be the "Right
Sam> Thing To Do", but the API gets more complicated, and the
Sam> benefit only touches a few people:
Sam> How many people actually care about the precise return values
Sam> of functions?
Everyone does. They just don’t realize it. They care because there
are going to be people who will encounter real problems due to faulty
architectures and they won’t even know what’s going wrong. Let’s take
the mystery _vsnprintf bug as an example. NOBODY, (not even you or I)
has yet tracked that puppy down. Instead everyone is waiting for
someone else to do it, although the waiting is implicit for most of
the SDL users.
People may not realize their SDL-based Apps (at least ones using the
version of SDL that Executor ships with) won’t run on some NT
machines, so they may not think that they benefit from clean
architectures that help eliminate mystery bugs, but they do.
Sam> Part of this is lack of planning on the design of the API,
Sam> i.e. I didn't have a set way of returning error codes and a
Sam> set of meanings for those error codes. Adding an error
Sam> pointer parameter adds an official way of representing both
Sam> the error code and the raw information needed to display an
Sam> error message in a thread-safe way. However, it adds
Sam> complexity to the API which isn't necessarily desireable.
I disagree. You’re confusing two separate issues. One is whether you
pass information back via a non-reentrant way, the other is how much
information you pass back. Even if you were just returning a number,
like errno, the correct way to do it is via passing in a pointer.
Just because there are tons of applications that don’t do this (and
are rarely multi-threaded) doesn’t mean it’s not an issue.
Sam> So. The question is, what are the benefits v.s. the pains?
Sam> Pro: Thread-safe error handling Internationalized error
Sam> messages Function-specific errors (i.e. no global errno
Sam> value) Better defined error conditions
I think you really need to separate the issues.
Sam> Con: Adds complexity to the API
Adds optional complexity. You know that I’m a big fan of backward
compatibility. You should be able to make it so that the mod is
bakward compatible at the source level. (Something you didn’t do when
you changed the the ReadLE32 routines … grr…).
Sam> So, what is the actual effect of the proposed change on the
Sam> API?
Sam> Let's take for example SDL_BlitSurface()
Sam> Old method: if ( SDL_BlitSurface(src, &srcrect, dst,
Sam> &dstrect) < 0 ) { fprintf(stderr, "Blit failed: %s\n",
Sam> SDL_GetError()); }
Sam> New method: if ( SDL_BlitSurface(&error, src, &srcrect, dst,
Sam> &dstrect) < 0 ) { fprintf(stderr, "Blit failed: %s\n",
Sam> SDL_GetError(&error)); /* This is actually a possible error
Sam> */ if ( error.code == SDL_ERR_LOSTDIRECTXVIDEO ) {
Sam> ReloadSurfaces(); } } or if you don't care about
Sam> thread-safety (using a global error location): if (
Sam> SDL_BlitSurface(NULL, src, &srcrect, dst, &dstrect) < 0 ) {
Sam> fprintf(stderr, "Blit failed: %s\n", SDL_GetError(NULL)); }
Sam> or if you don't care about errors at all: SDL_BlitSurface(0,
Sam> src, &srcrect, dst, &dstrect);
DO NOT DO WHAT YOU PROPOSED.
IT IS EVIL TO CHANGE API’S OUT FROM UNDER PEOPLE.
You did this with SDL_ReadLE16 and friends. That was not the right
way to do it, and it broke code (mine).
Add “Err” to discriminate between the versions of the routines that take
an error pointer and the ones that don’t. Allow your existing users to
use the ones that don’t.
You’ll then have
SDL_BlitSurface
and SDL_BlitSurfaceErr
No code of consequence will break.
Sam> By the way, to put things in perspective, the new POSIX
Sam> thread-safe way of returning errors is to return -errno and
Sam> have errno be an index into a static table of error strings.
Sam> I want SDL to be a little more flexible in that you can have
Sam> informative error messages like: "Couldn't set video mode:
Sam> Must be less than 2048x2048"
The POSIX way of doing things is broken, but at least POSIX
implementations have a little more constraint on their environment
than you do. Signals can’t interrupt system calls, a distinction that
SDL doesn’t have, so POSIX’s problems aren’t as bad as SDL’s, and
POSIX is trying to not make waves in a very big pond. I’m impressed
by the number of people using SDL (although not necessarily the
caliber), but there’s still room for a few more orders of magnitude of
growth. It really makes sense to do things right now while the user
base is still relatively small.
Sam> On the pro side of the complexity, _all_ SDL functions will
Sam> take exactly the same error parameter, so you won't have to
Sam> think about it once you get used to it, and you'll no longer
Sam> need to check the return value of functions, just check the
Sam> error code at certain points in your code:
Yup. People can also mimic the compatibility package and use their
own macros that jam that extra arg in there if it worries them so
much.
Sam> (Notice the changing of error to result)
Sam> SDL_ClearResult(&result); screen =
Sam> SDL_SetVideoMode(&result, 640, 480, 8, SDL_SWSURFACE); if (
Sam> result.error ) { SDL_PrintError(&result, "Couldn't set video
Sam> mode"); exit(0); } loaded = SDL_LoadBMP(&result,
Sam> "image.bmp"); image = SDL_DisplayFormat(&result, loaded);
Sam> SDL_FreeSurface(NULL, loaded); if ( result.error ) {
Sam> SDL_PrintError(&result, "Couldn't load and convert
Sam> image.bmp"); exit(0); } for ( i=0; i<screen->w/2; ++i ) {
Sam> SDL_Rect rect;
Sam> rect.x = i; rect.y = 0; rect.w = image->w;
Sam> rect.h = image->h; SDL_BlitSurface(&result, image, NULL,
Sam> screen, &rect); SDL_UpdateRects(&result, screen, 1, &rect); }
Sam> if ( result.error ) { SDL_PrintError(&result, "Couldn't move
Sam> image across screen"); exit(0); }
Sam> SDL_PrintError() could be defined as:
Sam> SDL_PrintError(SDL_result *result, const char *m) { char
Sam> message[BUFSIZ]; fprintf(stderr, "%s: %s\n", m,
Sam> SDL_GetResult(&result, message, BUFSIZ)); } It could also pop
Sam> up a message box on Windows and/or MacOS.
Sam> I will decide whether or not to implement this based on user
Sam> response, since you all are the reason this library exists in
Sam> the first place, but either way this is decided, it must be
Sam> done before 1.0.
There are three issues here:
Do you use a global (or thread-global -- almost as bad) for passing
back information? There is only one right answer to this question if
you want to have a uniform method of returning errors, and that is
no, you don't use a global.
Do you want to pass back more than a simple int as part of the error?
I think a good case can be made for yes, but it's not nearly as clear
cut as the first question. This is a matter of convenience, the
previous question is a matter of correctness/flexibility (it is either
incorrect and flexible or correct and inflexible; it can't be both).
If you do want to pass back more information, do you want to do it
via the API that you've constructed to date. I think the proper answer
is that it's too early to tell. I simply haven't had enough time to
review what you're doing, and from the commentary you've passed on to
me, I doubt that the other people who have looked at it are qualified
to see things that will bite you in the future. If you can't see
why global (or thread-global) variables are bad, you're not qualified
to judge.
>> I've noticed things in the SDL-demos that don't even match the
>> latest versions of the library...Speaking of Sam or whoever
>> making sweeping changes to the library makes me nervous. How
>> many bugs are going to be introduced? Will the examples be
>> updated correctly? Lots of details haven't been dealt with
>> before as regards minor changes...
Sam> That's because the development versions are places where new
Sam> features are tested, documentation isn't updated, things may
Sam> or may not work, etc. The stable releases get all the bug
Sam> fixes, and the API stays stable.
Sam> What happened with 0.9 is that it stayed out way too long and
Sam> became a stable version even though the documentation and
Sam> examples hadn't been updated. This was because I didn't want
Sam> to go to version 1.0 without addressing problems such as
Sam> thread-safety, and internationalization support which are
Sam> important in production level code. However, they were not
Sam> so important that I took time off of Loki work to work on
Sam> them.
Exactly. I don’t blame you for what has happened with 0.9 at all.
>> I'm thinking specifically of the audio stereo stuff, where
>> stereo used to be a flag in the init audio function, but then
>> switched to a member in the structure (channels). And the audio
>> example program on the latest docs said to use the old method,
>> not the new one!
Sam> I still need to update the docs. Once this issue is
Sam> resolved, I update the docs, and release the next stable
Sam> version of SDL, hopefully 1.0. :)
Personally I hope the next release is 0.10. I would like some time to
look at a lot of what you’ve done to see if there are other gotchas on
the way.
Sam> The stable version of SDL will stay, with no API changes,
Sam> just bug-fixes, as the version of SDL which can be used for
Sam> production code. The next development version will start
Sam> having official MacOS support and I'll start on multi-monitor
Sam> support.
Sam> BTW, I'm also planning a tutorial on making a game with
Sam> SDL. :)
Sam> So... what do you all think? Cliff, you might want to join
Sam> the SDL mailing list temporarily to join the discussion.
Sam> Lurkers, I want to hear from you too. :)
I’m about to go away on vacation. I’ll be gone the 29th through the
2nd and from the 6th through the 11th. I don’t think it makes sense
for me to join the discussion. Sorry. I like to think of myself as a
valuable resource. Perhaps we can cover more ground on the phone
sometime. I think you really botched the introduction of this by
suggesting that you would keep the same names and add a parameter.
Sam> -Sam Lantinga (slouken at devolution.com)
Sam> Lead Programmer, Loki Entertainment Software -- "Any
Sam> sufficiently advanced bug is indistinguishable from a
Sam> feature" -- Rich Kulawiec
–Cliff
@Clifford_T_Matthews