SDL_mixer accuracy

hi all,

I’m using SDL and SDL_mixer to present very short audio probes (50 ms) to a test subject in a psychology experiment. However, it is rather important that we can present these probes as accurately as possible, time-wise. So when I call Mix_PlayChannel() at a certain time, I really need to be sure that the sound is coming out of the headphones as fast as possible (preferably within the same millisecond, although I realize that is a bit much to expect), but definitely with a CONSTANT delay.

I’ve been doing some tests with a sound-trigger setup (which reads the audio output and records the exact time the sound rises above a certain threshold), and it looks as if the audio sometimes arrives 50 ms too late, but sometimes up to 140 ms too late.

Is that possible? Does anyone have any experience with this? Are there parameters I can play with to improve timing accuracy?

The code I use to initialize is like this (code for return-value checking omitted for clarity):

Mix_OpenAudio(22050, AUDIO_S16SYS, 2, SDL_AUDIO_BUFFER_SIZE);
Mix_AllocateChannels(16);
memset(m_chunk_buffer, 0, sizeof(m_chunk_buffer));
m_chunk_movie = Mix_QuickLoad_RAW((Uint8 *)m_chunk_buffer, SDL_AUDIO_BUFFER_SIZE);
Mix_PlayChannel(0, m_chunk_movie, -1);
Mix_RegisterEffect(0, mixer_effect_ffmpeg_cb, NULL, &m_video);
SDL_PauseAudio(0);

and playing audio is simply a call to

Mix_PlayChannel(0, m_wave, 0);

where m_wave is a Mix_Chunk loaded well in advance with

m_wave = Mix_LoadWAV(ascii.c_str());

Any advice would be much appreciated!

Hi,

I’m not familiar with SDL_mixer, but in general, to reduce audio latency: 1) use a high sample rate, combined with 2) a small buffer size. Also, avoid Bluetooth. :slight_smile:

The latency added due to buffering will be anywhere from 0ms to ‘BufferSize / SampleRate * 1000’ ms.

What is the value of SDL_AUDIO_BUFFER_SIZE in your example?

Assuming it’s 2048, combined with your relatively low sample rate of 22050 Hz, that’s anywhere from 0 ms to 93 ms of additional latency, on top of other processing and buffering delays which may happen.

Thanks for the feedback! I have reduced the buffer size and increased the sample rate, and indeed: I managed to get the latency down to 44–55 ms. I was wondering, however, if there is a way to decrease even that? Is there ANY API that has lower latencies, or is this inherent in audio playback?

What is your platform? In my experience latency varies considerably between Windows, Linux, macOS, Android etc. If you’re on Windows, you may find that switching from the WASAPI driver to the DirectSound driver (or vice versa) makes a difference. Also, ensure that you are outputting at the native sample rate expected by the audio driver: if any sample-rate conversion is required, that will add to the latency. You may find it helpful not to use SDL_mixer at all, but rather the native audio capabilities in SDL2 (i.e. SDL_OpenAudioDevice() etc.)
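Bypassing SDL_mixer might look roughly like this. This is only a minimal sketch assuming SDL2; the 48000 Hz rate and the 64-sample buffer are illustrative choices, not requirements, and real code should check every return value:

```c
#include <SDL.h>
#include <string.h>

/* Audio callback: SDL pulls 'len' bytes whenever the device needs data.
   Here we just emit silence; a real experiment would copy the probe
   sample into 'stream' when it is triggered. */
static void audio_cb(void *userdata, Uint8 *stream, int len) {
    (void)userdata;
    memset(stream, 0, len);
}

int main(void) {
    if (SDL_Init(SDL_INIT_AUDIO) != 0) return 1;

    SDL_AudioSpec want, have;
    SDL_zero(want);
    want.freq = 48000;          /* ask for a rate the hardware likely runs at */
    want.format = AUDIO_S16SYS;
    want.channels = 2;
    want.samples = 64;          /* small buffer -> callback fires often */
    want.callback = audio_cb;

    /* allowed_changes = 0: SDL converts behind the scenes if the hardware
       disagrees; 'have' reports what was actually obtained. */
    SDL_AudioDeviceID dev = SDL_OpenAudioDevice(NULL, 0, &want, &have, 0);
    if (dev == 0) {
        SDL_Log("couldn't open device: %s", SDL_GetError());
        return 1;
    }
    SDL_Log("got %d Hz, %d-sample buffer", have.freq, have.samples);

    SDL_PauseAudioDevice(dev, 0);  /* start the callback */
    SDL_Delay(1000);
    SDL_CloseAudioDevice(dev);
    SDL_Quit();
    return 0;
}
```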

Many things go into achieving the lowest possible latency.

  1. First, how are you measuring the delay between the moment you trigger and hear the sound? How sure are you the method of measurement doesn’t introduce latency and error itself?

  2. If you are using the audio hardware that’s inside whatever machine you’re using, consider using an (external, if it has to be) audio interface with known characteristics, like a high native sample rate, small buffer sizes optimized for low latency, etc. A professional audio interface can be expensive. If you get a consumer-level one, make sure to do your research!

  3. How is your code organized? Do you also draw stuff to the screen, and are you perhaps using vsync? A 60Hz monitor with vsync enabled will introduce a fixed 16ms input latency. Consider turning off vsync.

  4. As rtrussell suggested, consider using a different audio API. Even when addressing the same hardware, the API may have different characteristics.

Thanks Marcel & Russell!

  • I am on Windows 10, on a really speedy machine (12-core Xeon @ 3.8 GHz)
  • my file is a simple Windows PCM sound @ 44100 Hz, 16-bit mono
  • measuring is done with a special piece of hardware which we know to be reliable: it receives line-in audio and triggers a TTL pulse within 1 millisecond when the audio reaches a certain volume threshold. When using a purely analog source with a logic analyser attached to both, I can see that the audio is detected within 1 millisecond. So I’m fairly confident here :slight_smile:
  • my code is multithreaded, with the audio running in a separate thread, and no VSync in use. But even if it were VSync, one would expect latencies in the range of 10 to 15 ms (depending on 100 Hz to 85 Hz monitors), not a 50 ms delay
  • ideal would indeed be to use external audio hardware for the sound: something that does not have a computer inside, but generates the sound itself based on a millisecond trigger. Such “probe generators” do exist, but they’re cumbersome to use and would reduce the number of setups we can use (right now it all runs on one computer, without the need for audio generators)
  • I will see if I can write a small SDL application that bypasses SDL_mixer entirely, and check if that reduces the latencies. But I wonder if that will help: isn’t SDL_mixer mainly a layer on top that handles formats? Do you really think this will make a huge difference?

thanks again for all the info already ! I’m learning a lot here :sunglasses:

I really would not recommend SDL_mixer for the absolute lowest latency. That being said, best practice here would be to make sure you are never converting audio between formats (as SDL might be buffering the audio you hand it in order to convert it) and to choose the absolute smallest buffer size you can, which makes the audio callback fire more often for smaller amounts of data at a time (which is to say: with less time between you feeding it data and that data coming out of the speakers).

For example, it’s possible the Windows hardware wants 48000Hz audio, so SDL doesn’t feed it directly to the hardware when you give it 44100Hz.
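One way to discover what the device actually wants is to open it with SDL_AUDIO_ALLOW_ANY_CHANGE, so SDL performs no conversion at all, and then read back the obtained spec. A sketch, assuming SDL2:

```c
#include <SDL.h>

/* Open the default output with "any change allowed" so SDL hands back the
   device's preferred format instead of converting to ours, then print it. */
int main(void) {
    if (SDL_Init(SDL_INIT_AUDIO) != 0) return 1;

    SDL_AudioSpec want, have;
    SDL_zero(want);
    want.freq = 44100;
    want.format = AUDIO_S16SYS;
    want.channels = 2;
    want.samples = 512;

    SDL_AudioDeviceID dev = SDL_OpenAudioDevice(NULL, 0, &want, &have,
                                                SDL_AUDIO_ALLOW_ANY_CHANGE);
    if (dev != 0) {
        /* Feed audio at have.freq / have.format and no resampling is needed. */
        SDL_Log("device wants %d Hz, format 0x%x, %d channels",
                have.freq, have.format, have.channels);
        SDL_CloseAudioDevice(dev);
    }
    SDL_Quit();
    return 0;
}
```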

Hi icculus,

Thanks for the info! I’ve been playing with all the parameters, and 44100 Hz with a buffer of 512 seems to be the best solution at the moment: a delay of about 40–50 ms, which is still more than I expected. Is there a way to figure out these native Windows hardware requirements?

Also, do you think it is actually possible AT ALL to play audio with delays of less than 5 ms? Or is this simply something that cannot be done on today’s hardware? What would you expect the lower limit to be? 10 ms?

If you’re on Android, you could try Oboe: A C++ library for low latency audio .

I’m on Windows & Mac :cry:

Threads can introduce some variability, especially with context switching, and you’ve got at least two more than you need. Try turning off other apps and services that aren’t needed. Make sure you’re not in power-saving mode, or on batteries.

Reiterating the advice:

  • use SDL_OpenAudioDevice directly
  • try different backend drivers. Pretty sure DirectSound is often lower latency.
  • the audio hardware and driver have an impact. There are often multiple sound drivers for one piece of hardware, and some are a lot better for low latency than the defaults.
  • waiting for events rather than polling could be much better; failing that, turn off VSYNC. However you’re triggering the sound, try to measure that latency somehow.
  • as well as bluetooth, USB audio is often laggy, and so are any ‘effects’ that might be enabled.

For Windows, this page mentions that you can change the buffer size: https://docs.microsoft.com/en-us/windows-hardware/drivers/audio/low-latency-audio#windows_audio_session_api_wasapi

“WASAPIAudio sample (available on GitHub: https://github.com/Microsoft/Windows-universal-samples/tree/master/Samples/WindowsAudioSession) shows how to use IAudioClient3 for low latency.”

If you look at the SDL WASAPI driver, you’ll see that it does not use the IAudioClient3 low-latency features available in Windows 10: https://hg.libsdl.org/SDL/file/0789a425e8d7/src/audio/wasapi/SDL_wasapi.c This means it uses the 10 ms buffer rather than e.g. a 1 ms buffer. (I think? Corrections welcome.)

Hi Jeroen,

Less than 5 ms delay is absolutely possible. Machine spec is almost irrelevant; even a twenty-year-old machine could give you near-zero latency. It all depends on the audio hardware, drivers, OS (optimizations) etc. If you just want to output some noise, probably nothing beats an Arduino with a piezo hooked to one of its digital outputs. That should give you near-0 ms latency. It’s a bit hard to output anything other than noise or synthetic sounds on an Arduino, though, and if you want to use a speaker, it requires some basic electronics knowledge. On a modern OS, you’ll have to carefully select the audio hardware, audio drivers and driver settings, make sure you’re not blocking your input-polling thread, and make sure thread priorities are OK.

With regard to latency measurement: how do you know for sure that the special piece of hardware you use starts measuring the moment you activate the sound? How do you tell it to start measuring? It could also add latency (meaning your actual latency would be even higher…).

With regard to an audio interface, I was alluding to something like the ones RME or MOTU produce. These have much better specs than most built-in audio, and optimized low-latency drivers. They’re costly, but maybe the place you work has this equipment lying around somewhere.

You’re currently using a 512-sample buffer. In my opinion, that’s huge. When doing live music, that already introduces a noticeable delay between hitting a key on a MIDI keyboard and hearing the sound. Does the audio start to crackle below 512 samples? Instead of SDL with SDL_mixer, you could use a library like PortAudio, and only use SDL_mixer to load the samples. PortAudio will just give you an audio stream, and you can enumerate and configure the driver to your liking.
http://www.portaudio.com/
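A PortAudio stream with a small frame count might look like this. This is only a sketch: 64 frames per buffer is an illustrative value, and real code should check every PaError return:

```c
#include <portaudio.h>
#include <string.h>

/* PortAudio pulls 'frames' stereo frames per callback; small values keep
   the time between a trigger and audible output short. */
static int pa_cb(const void *in, void *out, unsigned long frames,
                 const PaStreamCallbackTimeInfo *timeInfo,
                 PaStreamCallbackFlags status, void *user) {
    (void)in; (void)timeInfo; (void)status; (void)user;
    memset(out, 0, frames * 2 * sizeof(short)); /* silence; copy probe here */
    return paContinue;
}

int main(void) {
    Pa_Initialize();
    PaStream *stream;
    Pa_OpenDefaultStream(&stream, 0 /* no input */, 2 /* stereo out */,
                         paInt16, 44100, 64 /* frames per buffer */,
                         pa_cb, NULL);
    Pa_StartStream(stream);
    Pa_Sleep(1000);
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}
```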

It’s perhaps worth adding that a buffer (of any size) does not necessarily imply a latency. In principle there’s no reason why the audio drivers/hardware cannot start outputting the contents of the buffer as soon as they have received it, so the latency could be close to zero even if the buffer is quite large. But this requires a ‘push’ model in which the app sends the buffer to the driver at a time of its choosing, rather than a ‘pull’ model in which the driver requests continuous data (e.g. via a callback).
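SDL2 itself offers such a ‘push’ model: open the device without a callback and hand it data with SDL_QueueAudio() at a moment of your choosing. A sketch (the probe here is just 50 ms of silence, and error handling is omitted):

```c
#include <SDL.h>

int main(void) {
    if (SDL_Init(SDL_INIT_AUDIO) != 0) return 1;

    SDL_AudioSpec want, have;
    SDL_zero(want);
    want.freq = 48000;
    want.format = AUDIO_S16SYS;
    want.channels = 2;
    want.samples = 512;
    want.callback = NULL;  /* no callback -> queueing ("push") mode */

    SDL_AudioDeviceID dev = SDL_OpenAudioDevice(NULL, 0, &want, &have, 0);
    if (dev == 0) return 1;
    SDL_PauseAudioDevice(dev, 0);

    /* Push 50 ms of probe data the instant we decide to play it:
       48000 / 20 frames, 4 bytes per stereo 16-bit frame. */
    Uint8 probe[48000 / 20 * 4] = {0};
    SDL_QueueAudio(dev, probe, sizeof(probe));

    SDL_Delay(100);
    SDL_CloseAudioDevice(dev);
    SDL_Quit();
    return 0;
}
```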

Does any hardware/driver exist which does that though, on a modern desktop OS?

Also, for continuous audio streams, the driver/hardware should at least implement a double-buffering scheme (or guarantee extreme real-time behaviour) to avoid audible glitches.