SDL_CreateThread hanging (Emscripten)

Emscripten/WebAssembly has for some time supported multi-threading, which I am very much reliant on. If I build my app with Emscripten 2.0.5 and SDL 2.0.10 everything is fine: I can start a worker thread and it works perfectly.

However with Emscripten 2.0.34 and SDL 2.0.17 (I think) SDL_CreateThread() hangs and never returns. Of course I have no idea whether the issue arises from a change in Emscripten between 2.0.5 and 2.0.34, or a change in SDL between 2.0.10 and 2.0.17.

I don’t know how to debug this: I can’t add debugging code to SDL itself because I’m using the Emscripten port, not building from source. Can anybody suggest how I can make progress?

There could be several things that caused this. As a checklist, off the top of my head:

  • Make sure you compile every file in your program, and link your program, with -pthread on the command line.
  • See if __EMSCRIPTEN_PTHREADS__ is #defined when you compile your code. The compiler defines this for you; you don’t define it yourself. If it isn’t defined, it’s possible this just sort of panicked because something isn’t built right.
  • Check the console in the web browser to see if an exception was thrown; Javascript/WASM exceptions tend to make it look like a function never returned (which technically, it didn’t), but really it has terminated your app with an uncaught exception. If this is the case, you’ll usually have a pretty clear idea on how to fix it from the error message.
  • Are you talking to an HTTPS connection? Several important pieces of web browser APIs will refuse to work without one–not because it needs it but because they’re trying to push people to encrypt everything…which is fine, but potentially a giant pain when you’re just trying to test a local build.
  • This, too: Making your website "cross-origin isolated" using COOP and COEP
  • If you can’t get further, you can build SDL2 from source code with Emscripten and link to it, instead of using Emscripten’s included copy–they are actually just providing their build of the same source code. This might help you to get some clarity on where it’s blowing up, but it might add a little hassle to try to get it to build.
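The __EMSCRIPTEN_PTHREADS__ check from the list can be made visible at run time with something like the sketch below. The function names pthreads_enabled and report_thread_support are made up for illustration; only the macro comes from Emscripten. The idea is to compile this into the app and call the report function before SDL_CreateThread(), so the browser console shows whether the build really had -pthread in effect.

```c
#include <stdio.h>

/* Returns 1 if this file was compiled with Emscripten's pthreads
   support enabled (i.e. with -pthread), 0 otherwise.
   The macro is set by the compiler; do not define it yourself. */
int pthreads_enabled(void)
{
#ifdef __EMSCRIPTEN_PTHREADS__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: call once at startup, before creating threads. */
void report_thread_support(void)
{
    printf("pthreads support compiled in: %s\n",
           pthreads_enabled() ? "yes" : "no");
}
```

If this prints "no" in a build that is supposed to be threaded, the compile flags are the first thing to look at. On a native (non-Emscripten) build it will always print "no", which is expected.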

I think I’m already doing all of those, not least because in most cases it wouldn’t be running with Emscripten 2.0.5 otherwise. I have made one discovery, however, which is that the version of SDL2 bundled with Emscripten 2.0.34 is, to my surprise, 2.0.10 which is the same version as comes with Emscripten 2.0.5!

So as the object of the exercise was to update SDL2 to something more recent, switching to Emscripten 2.0.34 wouldn’t achieve anything anyway! I will either need to build SDL2 from source, or bite the bullet and update to Emscripten 3. Rather like changing from SDL2 to SDL3, that’s scary!

I’ve now tried building my app with the latest version of Emscripten 3, and the good news is that it doesn’t hang at SDL_CreateThread(). So perhaps that was a genuine bug which has since been fixed.

The bad news is that it’s crashing with the report below. It’s unlikely that there’s a genuine issue with either FT_Stream_Seek() or dlmalloc() (after all everything works perfectly in Emscripten 2.0.5) but perhaps some general memory access problem. Any thoughts?

Uncaught RuntimeError: table index is out of bounds
    at FT_Stream_Seek (bbcsdl.wasm:0x194a49)
    at tt_face_goto_table (bbcsdl.wasm:0x1b3d00)
    at tt_face_load_cvt (bbcsdl.wasm:0x1bb7dc)
    at tt_face_init (bbcsdl.wasm:0x1bb127)
    at open_face (bbcsdl.wasm:0x190e9f)
    at FT_Open_Face (bbcsdl.wasm:0x19077d)
    at TTF_OpenFontIndexDPIRW (bbcsdl.wasm:0x60135)
    at TTF_OpenFont (bbcsdl.wasm:0x606be)
    at openfont_ (bbcsdl.wasm:0x2a016)
    at mainloop (bbcsdl.wasm:0x2e1a2)

bbcsdl.worker.js:202 Uncaught RuntimeError: memory access out of bounds
    at dlmalloc (bbcsdl.wasm:0x1e1da1)
    at real_malloc (bbcsdl.wasm:0xf72c2)
    at SDL_malloc (bbcsdl.wasm:0xf72f2)
    at SDL_PeepEventsInternal (bbcsdl.wasm:0xee700)
    at SDL_PushEvent (bbcsdl.wasm:0xf0293)
    at UserTimerCallback (bbcsdl.wasm:0x2d807)
    at SDL_TimerThread (bbcsdl.wasm:0x15bcc1)
    at SDL_RunThread (bbcsdl.wasm:0x15b927)
    at RunThread (bbcsdl.wasm:0x15aead)
    at Object.invokeEntryPoint (bbcsdl.js:4210:42)

Is this something I could reasonably build over here, to try to reproduce it?

I’ve spent most of today doing a bisection of how my app behaves on different Emscripten versions. As you can see, it’s versions 2.0.25 to 3.1.15 (a broad range) that are suffering from SDL_CreateThread() hanging:

2.0.5 - 2.0.12:  Runs OK, stable
2.0.13 - 2.0.24: Build fails when attempting to build 'harfbuzz'
2.0.25 - 3.1.15: Code hangs in SDL_CreateThread()
3.1.16 - 3.1.20: Build fails, claiming that embedding a file below the current directory is invalid!?
3.1.21 - 3.1.30: Code runs, but is unstable and crashes unpredictably

Do you happen to have a cross-reference between Emscripten versions and which versions of SDL2 are bundled with them? There is a complication that in some cases there are both libsdl2.a and libsdl2-mt.a (multi-threaded) libraries bundled.

I’ve found what is causing the memory access out of bounds error. It’s a code-reordering issue, presumably as a result of optimization. I have code which is structurally like this:

if (cond1)
  {
    int *ptr = memory;
    *ptr = input;
  }
if (cond2)
  {
    int *ptr = memory;
    output = *ptr;
  }

My code is relying on output being equal to input whenever both cond1 and cond2 are true. This has been the case in every other compiler I have used.

But the Emscripten compiler is seemingly not spotting the dependence, and is executing the ‘cond2’ block before the ‘cond1’ block. So when I read the memory in the second block I don’t see the data which was written in the first block! Adding a print statement between the blocks makes the code work as intended.

The question is: am I making an invalid assumption about the code ordering, or is the compiler incorrectly re-ordering it? If it’s my mistake, how can I modify the code to make the dependence obvious to the compiler? I may have made similar assumptions elsewhere in the code (it was hand-translated from assembler).

Have you tested it in Firefox?

I had a program that worked when building with Emscripten 2.0.16 and didn’t when using one of the 3.x versions. When investigating, it worked in Firefox but not Chrome, and the problem was fixed in Chrome: the fix made it into Chrome 100, and it was a bit of a pain to get it in.

Upon investigation, there was a function with a big switch in the part of my code that implements a bytecode interpreter, and the way Emscripten generated the wasm code for it had changed significantly. Below is the issue in question, and the comment from someone who still had a problem later, even though the fix worked for me.

No, but obviously it’s got to work in all the major browsers. I principally test in Chrome because most browsers (Brave, Edge, Vivaldi etc.) use the Chrome engine internally.

Adding volatile to the two pointer declarations works, but I don’t know how legitimate it is as a cure:

if (cond1)
  {
    volatile int *ptr = memory;
    *ptr = input;
  }
if (cond2)
  {
    volatile int *ptr = memory;
    output = *ptr;
  }

If the store is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this store with other volatile operations.

It looks good according to llvm docs!

While the fix is apparently stable according to llvm docs, the reordering itself looks like a compiler bug, given the specifics of your code.

Do you have a reference I can quote to the Emscripten people?

I don’t, I meant that going from the bit you posted we have something close to

b = a
b[0] = c

b = a
d = b[0]

It looks like it should pick up the memory being used there in this specific order. I can’t tell more without seeing the actual code.

I don’t know if it’s possible to move the int *ptr = memory declaration out of the if clauses and into the enclosing scope, but perhaps that would make it easier for the compiler to pick this up.
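For illustration, hoisting the pointer could look roughly like the sketch below. The array memory, the function name store_then_load and the fallback parameter are stand-ins, since the real code hasn’t been posted. Whether this changes what Emscripten generates is untested here; it only shows the shape of the change.

```c
/* Hypothetical stand-in for the interpreter's memory block. */
static int memory[16];

/* Writes input when cond1 is true, then reads it back when cond2 is true.
   Returns fallback when cond2 is false. */
int store_then_load(int cond1, int cond2, int input, int fallback)
{
    int *ptr = memory;      /* declared once, outside both if blocks */
    int output = fallback;

    if (cond1)
      {
        *ptr = input;
      }
    if (cond2)
      {
        output = *ptr;      /* same pointer as the store above */
      }
    return output;
}
```

With both conditions true, store_then_load(1, 1, 42, -1) should give back 42 on any conforming compiler, because the load follows the store to the same object in program order.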

Possibly it could be moved, but your quote from the LLVM docs shows that the better (and safer) way is to add volatile. This has directly the required effect: it guarantees that the order of accesses will be preserved, i.e. the write to memory will always occur before the read.

I am happy that the cause is understood and that the fix is effective. What I’m unsure about is whether what the compiler has done in this case is strictly compliant with the rules or not. It hinges on whether accessing the same directly-addressed memory location in two clauses should or should not be recognized as a dependency.

There is a complication in that in my original code the pointer declarations are actually of the form:

int *ptr = memory + offset;

where offset is (in this particular instance) #defined to be zero. Maybe the fact that it’s an expression (albeit one that can be evaluated at compile-time) makes a difference.

Instead of relying on volatile (which keeps the compiler from storing the variable in a register, so every single use results in a memory read or write), IMHO it would be better to use a barrier because it’s more explicit about what you’re trying to do.

Luckily SDL gives us SDL_CompilerBarrier(). This explicitly tells the compiler not to reorder across it (it does not stop the CPU from doing so). So try putting SDL_CompilerBarrier() between the two if blocks.

Another possibility is to put any sort of explicit atomic operation between the if blocks. Since SDL’s implementation of atomic variables uses SDL_CompilerBarrier() and full memory barriers, this will stop the compiler from reordering instructions across it, and stop the CPU from reordering reads and writes across it as well. See SDL_atomic.h

edit: the reason putting a printf between the two if blocks works is because the compiler won’t reorder instructions across a function call, in case the function call has visible side effects. So it acts like an implicit compiler barrier.

Interesting.

My objection to using SDL_CompilerBarrier() is that this particular module is in a generic part of the interpreter’s source code, which is shared between the SDL2 build, the console build and the Raspberry Pi Pico build. So the code has to be generic C, it can’t contain anything platform-specific.

There is probably a suitable generic C barrier that I could use (perhaps calling an empty extern function would be enough), but fortunately the code containing the volatile keyword is not particularly time-critical so forcing the variable concerned to be kept in memory rather than a register is not a serious penalty.
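One candidate for a generic C barrier, assuming every target toolchain supports C11, is atomic_signal_fence() from <stdatomic.h>. It constrains compiler reordering without emitting a hardware fence, so it plays the same role as SDL_CompilerBarrier() but is standard C. The sketch below uses made-up names (memory, store_fence_load, fallback), and it has not been tried against the Emscripten versions discussed in this thread, so treat it as something to experiment with rather than a confirmed cure.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the interpreter's memory block. */
static int memory[16];

/* Same structure as the original two if blocks, with a compiler-only
   fence between them. Returns fallback when cond2 is false. */
int store_fence_load(int cond1, int cond2, int input, int fallback)
{
    int output = fallback;

    if (cond1)
      {
        int *ptr = memory;
        *ptr = input;
      }

    /* C11 compiler barrier: asks the compiler not to move memory
       accesses across this point. No CPU fence instruction is needed. */
    atomic_signal_fence(memory_order_seq_cst);

    if (cond2)
      {
        int *ptr = memory;
        output = *ptr;
      }
    return output;
}
```

Compared with volatile, this leaves the individual accesses free to be optimized as normal and only pins their order relative to the fence, which might matter if the same pattern turns up in time-critical code.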