[thread issue] [Linux (mint 19.2)] [version : 2.0.9 and 2.0.10]


#1

Hello,

First, thanks a lot for providing this nice library, with such portability. I have been reading this forum for a long while, but I never needed to ask anything, since all the questions I had were already answered :slight_smile:

This time, I have a serious issue with (SDL) threads on Linux only, and I’m here because I need help.

First, I have to say there is no problem on Windows, where everything works as expected (tested on Windows 7 and later with the current code, cross-compiled with mingw and the posix version of gcc/g++).

The issue occurs only on the Linux side, and I only tested on one Linux machine, with LinuxMint 19.2. No issue so far when using older Linuxmint versions …

The SDL versions tested, 2.0.9 and 2.0.10, were built by me (see the links for the compilation flags), and IMHO the issue is perhaps caused by incorrect use of SDL2. Very probably my fault, but I’d like to understand what exactly.

Note : I already reported the issue at linuxmint blog , but got no feedback yet.

The issue : everything works, except when I open a video : when I want to close it, SDL_DestroyCond() never returns, and the application is definitely frozen.

strace miniDart stops at :
futex_wait_queue_me … (the application hangs indefinitely …)

The code is available on framagit : https://framagit.org/ericb/miniDart , and I provided instructions to build the software. Please read:

Of course, there is a lot to do and I’m not a pro, e.g. there is a lot of code factorization to do, but since I’m alone, I’ll take my time quietly :wink:

To save your time, the problematic parts are probably located in FFmpegPlayer.cpp, + in miniDart.cpp. Just follow LoadFile() calls, to understand how it works.

Last, here are some debug information, just in case it could help :

me@my_machine ~/Devel/minidart/miniDart_0.99/build $ ./miniDart
fps : 0
drawable area is 0 x 0
dpi : 166.444
defaultDpi : 150
windowDpiScaledWidth : 1553
windowDpiScaledHeight : 865
SDL2 Window created
SDL_VERSION_ATLEAST(2,0,9) 1
SDL_VERSION_ATLEAST(2,0,8) 1
SDL_VERSION_ATLEAST(2,0,7) 1
SDL_HAS_CAPTURE_AND_GLOBAL_MOUSE SDL_VERSION_ATLEAST(2,0,4) = 1
SDL_HAS_WINDOW_ALPHA SDL_VERSION_ATLEAST(2,0,5) = 1
SDL_HAS_ALWAYS_ON_TOP SDL_VERSION_ATLEAST(2,0,5) = 1
SDL_HAS_USABLE_DISPLAY_BOUNDS SDL_VERSION_ATLEAST(2,0,5) = 1
SDL_HAS_PER_MONITOR_DPI SDL_VERSION_ATLEAST(2,0,4) = 1
SDL_HAS_VULKAN SDL_VERSION_ATLEAST(2,0,6) = 1
SDL_HAS_MOUSE_FOCUS_CLICKTHROUGH SDL_VERSION_ATLEAST(2,0,5) = 1
We compiled against SDL version 2.0.9 …
And we are linking against SDL version 2.0.9.
OpenGL version: 4.5 (Core Profile) Mesa 19.1.6 (git-4ec2325dd0)
GLSL version: 4.50
Vendor: Intel Open Source Technology Center
Renderer: Mesa DRI Intel® HD Graphics (Whiskey Lake 3x8 GT2)
Audio device (no recording capability) 0: HDA Intel PCH, ALC293 Analog
Audio device (with recording capability) 0: HDA Intel PCH, ALC293 Analog
Audio device (no recording capability) 1: HDA Intel PCH, HDMI 0
Audio device (with recording capability) 1: (null)
Audio device (no recording capability) 2: HDA Intel PCH, HDMI 1
Audio device (with recording capability) 2: (null)
Audio device (no recording capability) 3: HDA Intel PCH, HDMI 2
Audio device (with recording capability) 3: (null)
Audio device (no recording capability) 4: HDA Intel PCH, HDMI 3
Audio device (with recording capability) 4: (null)
Audio device (no recording capability) 5: HDA Intel PCH, HDMI 4
Audio device (with recording capability) 5: (null)
Audio device (no recording capability) 6: (null)
Audio device (with recording capability) 6: (null)
Audio device (no recording capability) 7: (null)
Audio device (with recording capability) 7: (null)
Found Audio device (with recording capability) device number : 0 name : HDA Intel PCH, ALC293 Analog
1 recordable audio devices.
Number of Audio devices (with recording capability) 1
Audio subsystem initialized; driver = alsa.
io = 0x55d3cf282918
style = 0x55d3cf283f50
isFullVideo vaut 1
Gtk-Message: 18:46:44.551: GtkDialog mapped without a transient parent. This is discouraged.
video_init done
calling main2(filename), filename in video_init contains : /home/my_login/Devel/minidart/miniDart_0.99/build/some_video.mkv
av_sync_type = 0
b_video_running = 1 (currently in video_init)
Initialization : origin = 0
max_position = 0
video_duration = 0
video_duration = 223,621000 (currently in read_thread)
do exit
entering in do_exit(is)
is->tead_tid : 0x55d3d0aa0420
closing audio stream …
… done …
closing video stream …
… done …
closing input file …
… done …
destroying video queue …
… done …
destroying video queue …
… done …
destroying frame queue …
I’m here : 383 in frame_queue_destroy
I’m here : 392 before SDL_DestroyMutex()
I’m here : 397 before SDL_DestroyCond() // never returning …

Other important information : the linux kernel I’m using is one I built myself (the Debian way), and it was working perfectly with LinuxMint 17, and 18.4

Other possible leads: some changes in libc? Or in some important library, or a kernel mitigation, or …?

Last but not least, I tested building everything with gcc / g++ 5.x and 7.1.0, with no change so far. Old versions of the application (known to work perfectly) are also broken when tested on this machine, and I’m running out of ideas …

Thanks a lot in advance for any pointers …

Eric Bachard


#2

I’m not sure what your problem is, but I can see that SDL_DestroyCond() seems to call pthread_cond_destroy.

The help for this function says : “Attempting to destroy a condition variable upon which other threads are currently blocked results in undefined behavior.”

Is it possible that your condition is still being used, and that it crashes another thread?
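
A minimal sketch of the teardown order this implies, written with the C++ standard library rather than SDL so that it stands alone. `DemoQueue`, `wait_for_item`, `request_abort` and `run_shutdown_demo` are hypothetical names, not taken from miniDart or ffplay. The idea is to flag the abort, wake every waiter, join the waiting thread, and only then let the condition variable be destroyed. With SDL2 the corresponding calls would presumably be SDL_CondSignal() or SDL_CondBroadcast(), SDL_WaitThread(), and finally SDL_DestroyCond().

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a frame queue; not the miniDart/ffplay FrameQueue.
struct DemoQueue {
    std::mutex mutex;
    std::condition_variable cond;
    int size = 0;                // number of items available
    bool abort_request = false;  // set once when shutting down
};

// Consumer: blocks until an item is available or an abort is requested.
// Returns true if an item was consumed, false if it was woken up by an abort.
bool wait_for_item(DemoQueue &q) {
    std::unique_lock<std::mutex> lock(q.mutex);
    q.cond.wait(lock, [&q] { return q.size > 0 || q.abort_request; });
    if (q.abort_request) {
        return false;
    }
    --q.size;
    return true;
}

// Flags the abort under the mutex, then wakes every thread blocked on the condition.
void request_abort(DemoQueue &q) {
    {
        std::lock_guard<std::mutex> lock(q.mutex);
        q.abort_request = true;
    }
    q.cond.notify_all();
}

// Runs the whole sequence once; returns true if the consumer left through the abort path.
bool run_shutdown_demo() {
    bool consumed = true;
    {
        DemoQueue q;
        std::thread consumer([&] { consumed = wait_for_item(q); });
        request_abort(q);
        consumer.join();  // after this, no thread can still be blocked on q.cond
    }                     // q.cond is destroyed here, with no waiters left
    return !consumed;
}
```

The important part is the join before the destruction: signalling alone does not guarantee that the waiter has already left the wait function when the condition variable is destroyed.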


#3

Hello Martin,

Thanks a lot for your help. To answer you, the code seems to follow the SDL2 documentation, but as I wrote, I’m not a threads (SDL_Threads here) specialist :-/

I have to add it works very well under Windows, and was working without any issue on Linux with all the previous versions of LinuxMint. That’s the reason why I was wondering whether it could be another change somewhere in the Linux distribution (say some library miniDart depends on).

If this can help, I’ll add what I can see with a simple gdb session. Reminder : the code is available here : https://framagit.org/ericb/miniDart

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/$LOGIN/Devel/minidart/miniDart_0.99/build/miniDart
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.
fps : 0
[New Thread 0x7fffdfc1a700 (LWP 23582)]
[New Thread 0x7fffdd011700 (LWP 23583)]
[New Thread 0x7fffdc810700 (LWP 23584)]
drawable area is 0 x 0
dpi : 166.444
defaultDpi : 150
windowDpiScaledWidth : 1553
windowDpiScaledHeight : 865
SDL2 Window created
SDL_VERSION_ATLEAST(2,0,9) 1
SDL_VERSION_ATLEAST(2,0,8) 1
SDL_VERSION_ATLEAST(2,0,7) 1
SDL_HAS_CAPTURE_AND_GLOBAL_MOUSE SDL_VERSION_ATLEAST(2,0,4) = 1
SDL_HAS_WINDOW_ALPHA SDL_VERSION_ATLEAST(2,0,5) = 1
SDL_HAS_ALWAYS_ON_TOP SDL_VERSION_ATLEAST(2,0,5) = 1
SDL_HAS_USABLE_DISPLAY_BOUNDS SDL_VERSION_ATLEAST(2,0,5) = 1
SDL_HAS_PER_MONITOR_DPI SDL_VERSION_ATLEAST(2,0,4) = 1
SDL_HAS_VULKAN SDL_VERSION_ATLEAST(2,0,6) = 1
SDL_HAS_MOUSE_FOCUS_CLICKTHROUGH SDL_VERSION_ATLEAST(2,0,5) = 1
We compiled against SDL version 2.0.10 …
And we are linking against SDL version 2.0.10.
OpenGL version: 4.5 (Core Profile) Mesa 19.1.6 (git-4ec2325dd0)
GLSL version: 4.50
Vendor: Intel Open Source Technology Center
Renderer: Mesa DRI Intel® HD Graphics (Whiskey Lake 3x8 GT2)
Audio device (no recording capability) 0: HDA Intel PCH, ALC293 Analog
Audio device (with recording capability) 0: HDA Intel PCH, ALC293 Analog
Audio device (no recording capability) 1: HDA Intel PCH, HDMI 0
Audio device (with recording capability) 1: (null)
Audio device (no recording capability) 2: HDA Intel PCH, HDMI 1
Audio device (with recording capability) 2: (null)
Audio device (no recording capability) 3: HDA Intel PCH, HDMI 2
Audio device (with recording capability) 3: (null)
Audio device (no recording capability) 4: HDA Intel PCH, HDMI 3
Audio device (with recording capability) 4: (null)
Audio device (no recording capability) 5: HDA Intel PCH, HDMI 4
Audio device (with recording capability) 5: (null)
Audio device (no recording capability) 6: (null)
Audio device (with recording capability) 6: (null)
Audio device (no recording capability) 7: (null)
Audio device (with recording capability) 7: (null)
Found Audio device (with recording capability) device number : 0 name : HDA Intel PCH, ALC293 Analog
1 recordable audio devices.
Number of Audio devices (with recording capability) 1
Audio subsystem initialized; driver = alsa.
io = 0x555555b393a8
style = 0x555555b3a9e0
isFullVideo vaut 1
[New Thread 0x7fffd74e3700 (LWP 23589)]
[New Thread 0x7fffd6ce2700 (LWP 23590)]
[New Thread 0x7fffd62c4700 (LWP 23592)]
[New Thread 0x7fffd4922700 (LWP 23593)]
[New Thread 0x7fffc2456700 (LWP 23594)]
[New Thread 0x7fffc1a53700 (LWP 23595)]
Gtk-Message: 08:51:44.343: GtkDialog mapped without a transient parent. This is discouraged.
[New Thread 0x7fffc1252700 (LWP 23597)]
[New Thread 0x7fffc0a51700 (LWP 23598)]
[New Thread 0x7fffabfff700 (LWP 23599)]
[New Thread 0x7fffa37fe700 (LWP 23600)]
[New Thread 0x7fffab7fe700 (LWP 23601)]
[New Thread 0x7fffaaffd700 (LWP 23602)]
[New Thread 0x7fffaa7fc700 (LWP 23603)]
[Thread 0x7fffd4922700 (LWP 23593) exited]
[Thread 0x7fffab7fe700 (LWP 23601) exited]
[Thread 0x7fffa37fe700 (LWP 23600) exited]
[Thread 0x7fffabfff700 (LWP 23599) exited]
[Thread 0x7fffc1252700 (LWP 23597) exited]
[Thread 0x7fffd62c4700 (LWP 23592) exited]
[Thread 0x7fffc2456700 (LWP 23594) exited]
[Thread 0x7fffaaffd700 (LWP 23602) exited]
[Thread 0x7fffc0a51700 (LWP 23598) exited]
[New Thread 0x7fffc0a51700 (LWP 23605)]
[New Thread 0x7fffaaffd700 (LWP 23606)]
[Thread 0x7fffaaffd700 (LWP 23606) exited]
[Thread 0x7fffaa7fc700 (LWP 23603) exited]
video_init done
calling main2(filename), filename in video_init contains : /home/$LOGIN/Devel/minidart/miniDart_0.99/build/some_song.mp4
av_sync_type = 0
[New Thread 0x7fffaa7fc700 (LWP 23607)]
b_video_running = 1 (currently in video_init)
Initialization : origin = 0
max_position = 0
video_duration = 0
video_duration = 208,003333 (currently in read_thread)
[New Thread 0x7fffaaffd700 (LWP 23608)]
[New Thread 0x7fffc2456700 (LWP 23609)]
[New Thread 0x7fffd62c4700 (LWP 23610)]
do exit
entering in do_exit(is)
[Thread 0x7fffaa7fc700 (LWP 23607) exited]
is->tead_tid : 0x5555572fb870
closing audio stream …
[Thread 0x7fffd62c4700 (LWP 23610) exited]
[Thread 0x7fffc2456700 (LWP 23609) exited]
… done …
closing video stream …
… done …
closing input file …
… done …
destroying video queue …
… done …
destroying video queue …
… done …
destroying frame queue …
Thread 1 “miniDart” hit Breakpoint 1, frame_queue_destroy (f=0x55555740c5e0) at src/FFmpegPlayer.cpp:380
380 {
(gdb) n
381 SDL_LockMutex(f->mutex);
(gdb)
382 std::cout << "I'm here : " << __LINE__ << " in frame_queue_destroy" << "\n";
(gdb)
I’m here : 382 in frame_queue_destroy
384 for(int i = 0; i < f->max_size; i++) {
(gdb)
385 av_frame_unref(f->queue[i].frame);
(gdb)
384 for(int i = 0; i < f->max_size; i++) {
(gdb)
385 av_frame_unref(f->queue[i].frame);
(gdb)
[Thread 0x7fffc0a51700 (LWP 23605) exited]
386 av_frame_free(&f->queue[i].frame);
(gdb)
384 for(int i = 0; i < f->max_size; i++) {
(gdb) p *f
$1 = {queue = {{frame = 0x0, pts = 4.2000000000000002, width = 1280, height = 720, format = 0, uploaded = 1, flip_v = 0}, {
frame = 0x555557423a00, pts = 4.2400000000000002, width = 1280, height = 720, format = 0, uploaded = 0, flip_v = 0},
{frame = 0x55555718fc40, pts = 4.2800000000000002, width = 1280, height = 720, format = 0, uploaded = 0, flip_v = 0}},
rindex = 0, windex = 0, size = 3, max_size = 3, rindex_shown = 1, mutex = 0x5555575246e0, cond = 0x555557463750}
(gdb) p *f->mutex
$2 = {id = pthread_mutex_t = {Type = Recursive, Status = Acquired, possibly with no waiters, Owner ID = 23581,
Robust = No, Shared = No, Protocol = None}}
(gdb) p *f->cond
$3 = {cond = pthread_cond_t = {Threads known to still execute a wait function = 1, Clock ID = CLOCK_REALTIME, Shared = No}}
(gdb) n
385 av_frame_unref(f->queue[i].frame);
(gdb)
384 for(int i = 0; i < f->max_size; i++) {
(gdb)
385 av_frame_unref(f->queue[i].frame);
(gdb)
386 av_frame_free(&f->queue[i].frame);
(gdb)
384 for(int i = 0; i < f->max_size; i++) {
(gdb)
385 av_frame_unref(f->queue[i].frame);
(gdb)
384 for(int i = 0; i < f->max_size; i++) {
(gdb)
385 av_frame_unref(f->queue[i].frame);
(gdb)
386 av_frame_free(&f->queue[i].frame);
(gdb)
384 for(int i = 0; i < f->max_size; i++) {
(gdb)
388 SDL_UnlockMutex(f->mutex);
(gdb)
391 std::cout << "I'm here : " << __LINE__ << " before SDL_DestroyMutex()" << "\n";
(gdb)
I’m here : 391 before SDL_DestroyMutex()
393 SDL_DestroyMutex(f->mutex);
(gdb)
396 std::cout << "I'm here : " << __LINE__ << " before SDL_DestroyCond()" << "\n";
(gdb)
394 f->mutex = nullptr;
(gdb)
396 std::cout << "I'm here : " << __LINE__ << " before SDL_DestroyCond()" << "\n";
(gdb)
I’m here : 396 before SDL_DestroyCond()
397 SDL_DestroyCond(f->cond);
(gdb) n
^C
Thread 1 “miniDart” received signal SIGINT, Interrupt.
0x00007fffef481449 in futex_wait (private=, expected=12, futex_word=0x555557463774)
at …/sysdeps/unix/sysv/linux/futex-internal.h:61
61 …/sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb)

… searching :-/


#4

If you use clang or gcc, you can compile your code with -fsanitize=thread.
https://clang.llvm.org/docs/ThreadSanitizer.html (this page only talks about data races, but the tool can detect other thread bugs, such as destroying a locked mutex)
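
A small sketch to try the sanitizer on before running it against a large code base. The file name `demo.cpp` and the function names are assumptions made for illustration. As written, every increment is protected by the mutex, so no report is expected. Removing the `lock_guard` line turns the increments into an unsynchronized access from two threads, which is the kind of data race ThreadSanitizer is designed to report.

```cpp
#include <mutex>
#include <thread>

// Build with the sanitizer enabled (demo.cpp is an assumed file name):
//   g++ -std=c++11 -g -O1 -fsanitize=thread demo.cpp -o demo -pthread

static std::mutex counter_mutex;
static long counter = 0;  // shared between the two writer threads

// Adds `n` to the shared counter, one increment at a time.
// Without the lock_guard line below, this would be a data race.
void add_n(long n) {
    for (long i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
    }
}

// Runs two writers concurrently and returns the final counter value.
long run_counter_demo(long n) {
    counter = 0;
    std::thread a(add_n, n);
    std::thread b(add_n, n);
    a.join();
    b.join();
    return counter;
}
```

Calling `run_counter_demo(1000)` from a `main()` gives a program that can be built both with and without the sanitizer flag to compare the behaviour.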


#5

Hi,

Incredible, that’s exactly what I did :slightly_smiling_face:

More precisely, I discovered this option reading “man gcc”, and I started to debug using
-fsanitize=thread, as you suggested.

Looks like there are several bugs, and things are complicated. If I understand something, and can progress, I’ll repost, to inform people following the issue.

Thanks a lot for your help !

Edit : I forgot : I’ll also try other options, such as -fsanitize=leak, -fsanitize=address and some others, but separately (to avoid too much confusion).

Edit 2 :

With -fsanitize=thread, I got 64 warnings, triggered when I use LoadFile(), i.e. when opening a new file (a video). Most of the warnings seem to be caused by libglib and libgio.

The full output is ~ 70 kB in a text file, but I don’t know whether I’m allowed to attach such a file in this forum. Keeping only the first lines gives:

head -100 thread_sanitize_output.txt
==================
WARNING: ThreadSanitizer: data race (pid=1094)
Read of size 1 at 0x7b040000a0a0 by thread T4:
#0 strlen (libtsan.so.0+0x31275)
#1 pthread_setname_np …/sysdeps/unix/sysv/linux/pthread_setname.c:38 (libpthread.so.0+0x1391c)
#2 (libglib-2.0.so.0+0x7417d)
Previous write of size 1 at 0x7b040000a0a0 by main thread (mutexes: write M651):
#0 memcpy (libtsan.so.0+0x32505)
#1 g_strdup (libglib-2.0.so.0+0x6b44c)
#2 openFileDialog(char*) src/open_file_dialog.cpp:24 (miniDart+0x13fee4)
#3 LoadFile src/main.cpp:428 (miniDart+0x13c733)
#4 main src/main.cpp:3125 (miniDart+0x24842)
Location is heap block of size 6 at 0x7b040000a0a0 allocated by main thread:
#0 malloc (libtsan.so.0+0x2ae03)
#1 g_malloc (libglib-2.0.so.0+0x51ab8)
#2 openFileDialog(char*) src/open_file_dialog.cpp:24 (miniDart+0x13fee4)
#3 LoadFile src/main.cpp:428 (miniDart+0x13c733)
#4 main src/main.cpp:3125 (miniDart+0x24842)
Mutex M651 (0x7b0c0000e430) created at:
#0 pthread_mutex_init (libtsan.so.0+0x2c5ad)
#1 (libglib-2.0.so.0+0x91e4c)
#2 openFileDialog(char*) src/open_file_dialog.cpp:24 (miniDart+0x13fee4)
#3 LoadFile src/main.cpp:428 (miniDart+0x13c733)
#4 main src/main.cpp:3125 (miniDart+0x24842)
Thread T4 ‘gmain’ (tid=1103, running) created by main thread at:
#0 pthread_create (libtsan.so.0+0x2bcee)
#1 (libglib-2.0.so.0+0x9242f)
#2 openFileDialog(char*) src/open_file_dialog.cpp:24 (miniDart+0x13fee4)
#3 LoadFile src/main.cpp:428 (miniDart+0x13c733)
#4 main src/main.cpp:3125 (miniDart+0x24842)
SUMMARY: ThreadSanitizer: data race (/usr/lib/x86_64-linux-gnu/libtsan.so.0+0x31275) in __interceptor_strlen
==================

And just after closing the video (causing the deadlock):
==================
do exit
entering in do_exit(is)
==================
WARNING: ThreadSanitizer: data race (pid=1094)
Write of size 4 at 0x7b4c0000aee8 by main thread:
#0 do_exit(_VideoState*) src/FFmpegPlayer.cpp:168 (miniDart+0x28924)
#1 main src/main.cpp:1153 (miniDart+0x21c8e)
Previous read of size 4 at 0x7b4c0000aee8 by thread T20:
#0 read_thread(void*) src/FFmpegPlayer.cpp:731 (miniDart+0x2a716)
#1 SDL_RunThread /home/eric/Devel/minidart/CROSS_COMPILATION/SDL2-2.0.10/src/thread/SDL_thread.c:283 (libSDL2-2.0.so.0+0x96a6b)
Location is heap block of size 432 at 0x7b4c0000ad40 allocated by main thread:
#0 posix_memalign (libtsan.so.0+0x2ba5c)
#1 av_malloc (libavutil.so.56+0x33a82)
#2 video_init src/main.cpp:5267 (miniDart+0x13c696)
#3 LoadFile src/main.cpp:412 (miniDart+0x13c696)
#4 LoadFile src/main.cpp:428 (miniDart+0x13c77d)
#5 main src/main.cpp:3125 (miniDart+0x24842)
Thread T20 ‘read_thread’ (tid=1122, running) created by main thread at:
#0 pthread_create (libtsan.so.0+0x2bcee)
#1 SDL_SYS_CreateThread /home/eric/Devel/minidart/CROSS_COMPILATION/SDL2-2.0.10/src/thread/pthread/SDL_systhread.c:120 (libSDL2-2.0.so.0+0x135d36)
#2 video_init src/main.cpp:5267 (miniDart+0x13c696)
#3 LoadFile src/main.cpp:412 (miniDart+0x13c696)
#4 LoadFile src/main.cpp:428 (miniDart+0x13c77d)
#5 main src/main.cpp:3125 (miniDart+0x24842)
SUMMARY: ThreadSanitizer: data race src/FFmpegPlayer.cpp:168 in do_exit(_VideoState*)
==================
is->tead_tid : 0x7b5c00007000
closing audio stream …
… done …
closing video stream …
… done …
closing input file …
… done …
destroying video queue …
… done …
destroying video queue …
… done …
destroying frame queue …
I’m here : 382 in frame_queue_destroy
I’m here : 391 before SDL_DestroyMutex()
I’m here : 396 before SDL_DestroyCond()


#6

First thing to know is that ThreadSanitizer can report false positives. So you need to carefully analyze and understand what’s happening in your code.

Then, to correct an error, ThreadSanitizer gives you all the info you need.
Your first data race report tells you that you write to a variable with mutex M651 locked:
Previous write of size 1 at 0x7b040000a0a0 by main thread (mutexes: write M651):
then read the same variable without any mutex locked:
Read of size 1 at 0x7b040000a0a0 by thread T4:
So you look at where the mutex was created to identify it:
Mutex M651 (0x7b0c0000e430) created at:
and add a lock before the read.
However, in this report we can see that mutex, thread and data were created by glib. It is probably a false positive.
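
For a report that does point into application code, such as the second one in post #5 (a write in do_exit() by the main thread, a read in read_thread()), the fix described above can be sketched as follows. This uses the C++ standard library instead of SDL mutexes, and `PlayerState`, its fields and the helper functions are hypothetical: the thread does not show which field is accessed at FFmpegPlayer.cpp:168 and :731. The guarded field is read and written under the same mutex, and the lone stop flag is made atomic so that it needs no lock at all.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical shared state; field names are illustrative, not taken from miniDart.
struct PlayerState {
    std::mutex mutex;
    std::string filename;                    // guarded by `mutex`
    std::atomic<bool> abort_request{false};  // lone flag: an atomic avoids the lock
};

// Writer side: takes the mutex before touching the guarded field.
void set_filename(PlayerState &s, const std::string &name) {
    std::lock_guard<std::mutex> lock(s.mutex);
    s.filename = name;
}

// Reader side: takes the same mutex, which is the fix suggested for a
// "write with mutex held, read without" report.
std::string get_filename(PlayerState &s) {
    std::lock_guard<std::mutex> lock(s.mutex);
    return s.filename;
}

// Starts a reader thread, then asks it to stop through the atomic flag.
// Returns the file name the reader thread observed.
std::string run_reader_demo(const std::string &name) {
    PlayerState s;
    set_filename(s, name);
    std::string seen;
    std::thread reader([&s, &seen] {
        seen = get_filename(s);
        while (!s.abort_request.load()) {
            std::this_thread::yield();
        }
    });
    s.abort_request.store(true);  // plays the role of the write in do_exit()
    reader.join();                // `seen` is only read after the join
    return seen;
}
```

Whether an atomic or a mutex is the better choice depends on how many fields have to change together; a single flag is the simple case.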


#7

Thanks a lot for your help, much appreciated, and apologies for the delay : I was afk and don’t have much free time for coding these days …

After following your advice, I got a ~ 2700 lines file, very probably containing the origin of the deadlock. Work in progress.

Another idea I’m investigating : rewrite all the player code using pthread or something other than SDL threads. Maybe this will work around the issue, in case it comes from an SDL thread not being created in the main thread, or something similar.

But I’m still wondering, why oh why, everything works perfectly on Windows, and not on Linux :frowning: (I meant : since LinuxMint 19.x)