Inexplicable race condition in SDL Renderer

What is this SDL_PIXELFORMAT_UNKNOWN supposed to mean?

Yes, I did try SDL_LockTexture(), but with no effect. As for the pixel format this also has no effect.

In fact I read your first post several times but still cannot work out exactly what the first sentences mean: what works, what doesn’t, under what circumstances, and which texture does not show.
You seem to say that “by externally setting blitUI” you get what you want, but it is clear from your code that you will render your texture only when blitUI is true.

It seems that the uiTexture is sporadically not drawn although uiBufferTmp contains the correct data. Sometimes it works, sometimes it doesn’t. It seems to depend on timing. If I force updateUI to be true (without modifying uiBufferTmp), it then draws the contents.

Are you checking the return value of SDL_UpdateTexture()? It’s possible that the texture is in use and can’t be updated. Maybe have two textures, and alternate between them, with a semaphore or something to keep you from getting ahead more frames than you have textures.
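To make the two-texture idea concrete, here is a minimal sketch of the bookkeeping only. It deliberately contains no SDL calls: FrameRing and its functions are hypothetical names, not part of SDL or of the project discussed here, and the plain counter stands in for the semaphore. As written it is single-threaded; with a real producer thread the counter would have to be a semaphore or be protected by a mutex.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 2  /* one slot per texture; two textures as suggested above */

/* Hypothetical bookkeeping for a small pool of textures (not an SDL type). */
typedef struct {
    int next_write;  /* slot the producer fills next */
    int next_read;   /* slot the renderer presents next */
    int in_flight;   /* slots filled but not yet presented */
} FrameRing;

static void ring_init(FrameRing *r)
{
    assert(r != NULL);
    r->next_write = 0;
    r->next_read = 0;
    r->in_flight = 0;
}

/* Producer side: returns the slot to update (this is where a call such as
 * SDL_UpdateTexture() on that slot's texture would go), or -1 if every
 * slot is still waiting to be presented. */
static int ring_acquire_write(FrameRing *r)
{
    assert(r != NULL);
    if (r->in_flight == NUM_SLOTS)
        return -1;
    int slot = r->next_write;
    r->next_write = (slot + 1) % NUM_SLOTS;
    r->in_flight++;
    return slot;
}

/* Renderer side: returns the slot to present, or -1 if nothing is pending. */
static int ring_acquire_read(const FrameRing *r)
{
    assert(r != NULL);
    return r->in_flight == 0 ? -1 : r->next_read;
}

/* Renderer side: call once the slot has been copied and presented. */
static void ring_release_read(FrameRing *r)
{
    assert(r != NULL && r->in_flight > 0);
    r->next_read = (r->next_read + 1) % NUM_SLOTS;
    r->in_flight--;
}
```

The point of the -1 return from ring_acquire_write() is that the producer can never get more frames ahead than there are textures, so a texture is never overwritten while it is still waiting to be drawn.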

Furthermore, are you sure you need a second thread just to copy a buffer to the screen? Is that where your app’s bottleneck is? I was working on a game that did software “3D” rendering a while back. Everything was done on the main thread: event handling, game logic, rendering, and calling SDL_LockTexture() / SDL_UnlockTexture(), and it still got hundreds of frames per second with vsync off. That was with the CPU having to perform calculations for every pixel on the screen and hitting 100% single-core utilization (if I ever get around to finishing it, I’ll probably make the renderer multi-threaded, but that’d just be to speed up the actual rendering; it’s not needed for updating the on-screen texture).

edit:

The joy of debugging multi-threaded code.

As described above, it still happens after removing the repainter thread. In this case everything involving the rendering is done in the main thread without any connection to the secondary thread (while UI is visible, all other activities are suspended). This issue is not related to multi-threading.

I’ll check the return value of SDL_UpdateTexture() later today. But it also happens when using SDL_LockTexture()/SDL_UnlockTexture() instead.

it’s difficult to answer your question because your code above does not show how (and how often) you update blitUI

blitUI is set when there is a change in the UI, for example if a dialog needs to be drawn. The problem is that most of the time the dialog is drawn, but sometimes it isn’t. Again, this also happens if I move all drawing to the main thread. Then of course blitUI, uiBufferTmp and the while-loop are no longer necessary and I just call SDL_UpdateTexture(), SDL_RenderClear(), SDL_RenderCopy() and SDL_RenderPresent() as needed.

more precisely: do you get a blank (or black) screen, or garbage pixels, or nothing (the previous display is not modified)?

The previous display is not modified.

did you make sure that the display part of the code is actually executed when necessary (for instance, by printing something to the log when calling SDL_UpdateTexture and so on)?

Yes, I added some printf calls to make sure all relevant parts of the code are reached.

hmm. did you try to set the blend mode to NONE to make sure it’s not a transparency issue?

Is all this guessing leading anywhere?

If someone is able to compile the code and reproduce the problem then it should probably not be too difficult to debug. Running it through a sanitizer or valgrind might also reveal some issues.

Otherwise I think we should just ask the OP to make a “minimal reproducible example”.

It is very likely that the root cause of the problem will be found in the process of doing so but if not then we will have something that is easier to debug. “Help us help you”. I wouldn’t be surprised if the problem is in some totally different part of the program. Don’t underestimate undefined behaviour.

I tested the return value of SDL_UpdateTexture() and it is always 0. Same for SDL_RenderCopy(). Setting blend mode to NONE has no effect.

This is very hard to debug. I cannot make a reproducible example because I have no idea what causes the issue. I can’t set breakpoints to step through the code because the breakpoints affect the timing and the problem won’t occur. Even adding too many debug prints alters the timing so that the issue disappears. I have the most success reproducing it by starting my application, clicking on “Display” and then clicking on “Color”. It fails to show the “Select slot:” dialog about 75 % of the time.
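One thing that sometimes helps when printf itself changes the timing is to record events into a small in-memory ring buffer and only print it afterwards, once the failure has been observed. The following is a sketch under assumptions: TraceLog, the event codes and the capacity are all made up for illustration, it is not thread-safe as written, and whether it disturbs the timing little enough for this particular bug is not something I can promise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TRACE_CAPACITY 8  /* small for illustration; a real log would be larger */

typedef struct {
    int event_id;  /* hypothetical code, e.g. 1 = texture updated, 2 = frame presented */
    long value;    /* free-form payload, e.g. a frame counter */
} TraceEntry;

typedef struct {
    TraceEntry entries[TRACE_CAPACITY];
    size_t total;  /* number of events ever recorded */
} TraceLog;

static void trace_init(TraceLog *log)
{
    assert(log != NULL);
    log->total = 0;
}

/* Record one event: two stores and an increment, no I/O. */
static void trace_record(TraceLog *log, int event_id, long value)
{
    assert(log != NULL);
    TraceEntry *e = &log->entries[log->total % TRACE_CAPACITY];
    e->event_id = event_id;
    e->value = value;
    log->total++;
}

/* Number of entries currently held (at most TRACE_CAPACITY). */
static size_t trace_count(const TraceLog *log)
{
    assert(log != NULL);
    return log->total < TRACE_CAPACITY ? log->total : TRACE_CAPACITY;
}

/* Entry i in chronological order; i = 0 is the oldest surviving entry. */
static const TraceEntry *trace_get(const TraceLog *log, size_t i)
{
    assert(log != NULL && i < trace_count(log));
    size_t first = log->total - trace_count(log);
    return &log->entries[(first + i) % TRACE_CAPACITY];
}

/* Print the surviving entries, oldest first; call this after the fact. */
static void trace_dump(const TraceLog *log, FILE *out)
{
    for (size_t i = 0; i < trace_count(log); i++) {
        const TraceEntry *e = trace_get(log, i);
        fprintf(out, "event %d value %ld\n", e->event_id, e->value);
    }
}
```

The idea would be to call trace_record() next to each of the update, clear, copy and present calls, then call trace_dump(&log, stderr) from a key press or at exit, so the order of events around a missing dialog can be compared with a run where it was drawn.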

Maybe if you could share the “one-thread” version it would already be easier to debug

If you apply the appended patch on top of branch_softfloat you get the variant with all drawing done in the main thread. Note that this is only for testing the rendering of the GUI. Running the emulator might cause problems like unexpected quit or endless loops.

thread.diff (16.6 KB)

I tried to compile your code with

mkdir -p build
cd build
cmake ..
cmake --build .

but it failed with

/tmp/previous-code-r1206-branches-branch_softfloat/src/includes/host.h:12:10: fatal error: SDL.h: No such file or directory
   12 | #include <SDL.h>
      |          ^~~~~~~
compilation terminated.

Of course I do have /usr/include/SDL2/SDL.h on my system (Ubuntu), so I suppose that the SDL flags are not passed to the compiler?

Strange. Maybe you have to add

include_directories(${SDL2_INCLUDE_DIRS})

to all CMakeLists.txt? They are in src, src/debug, src/dimension, src/gui-sdl and src/slirp.

yes it’s better, but now I have

/tmp/previous-code-r1206-branches-branch_softfloat/src/debug/log.h:130:8: error: unknown type name ‘uint64_t’
  130 | extern uint64_t LogTraceFlags;
      |        ^~~~~~~~

so I had to add #include <stdint.h> in debug/log.h to pass this,

and finally it fails with

[ 93%] Linking CXX executable Previous
/usr/bin/ld: CMakeFiles/Previous.dir/fast_screen.c.o: in function `Screen_Init':
/tmp/previous-code-r1206-branches-branch_softfloat/src/fast_screen.c:364: undefined reference to `SDL_GetWindowSizeInPixels'
collect2: error: ld returned 1 exit status
gmake[2]: *** [src/CMakeFiles/Previous.dir/build.make:828: src/Previous] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:278: src/CMakeFiles/Previous.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2