[SDL 2.28.4] [alsa] how does SDL handle SND_PCM_FORMAT_U24_LE alsa format ?

Context :

  • I’m working on Linux, x86_64, recent kernel (customized 6.1.68), but the code will be cross-compiled for Windows later (no problem with that)
  • sdl2-config --version returns : 2.28.4
  • inxi -F returns, for the audio part :

Audio:     Device-1: Intel Tiger Lake-LP Smart Sound Audio driver: snd_hda_intel 
           Device-2: Logitech HD Pro Webcam C920 type: USB driver: snd-usb-audio,uvcvideo 
           Sound Server: ALSA v: k6.1.68-ericb

And I’m trying to record audio+video with a Logitech C920 webcam.

For obvious portability reasons, my choice is to use SDL2 (I’ll try the promising SDL3 asap).

In summary: recording audio+video currently works, but the audio is suboptimal, and I’d like to fix the problem properly.

Separately, there is NO problem at all:

When using ALSA directly, the AUDIO_S16LSB format works as expected in stereo mode, and the sound is perfect for my needs.

(see https://github.com/ebachard/Linux_Alsa_Audio_Record/blob/master/src/alsa_record.cpp in the sources)

I observed that, when recording, the SDL2 alsa audio driver does not work as expected. More precisely, the AUDIO_S16LSB format is NOT detected, so I tried to find out what happens inside SDL2.

Methodology: I compared the SDL alsa driver initialisation to my own alsa driver initialisation.

I first studied how the alsa driver works. After adding some verbosity, my webcam, in recording mode, is seen by the SDL2 alsa audio driver as:

INFO: Using audio driver: alsa
this->spec.format : 8 (should be AUDIO_S16LSB)
this->spec.format = SND_PCM_FORMAT_U24_LE. It means, it’s an unsigned 24 bit Little Endian
using low three bytes in 32-bit word (not handled by SDL …)

Investigating the alsa sources (for example enum snd_pcm_format_t in the ALSA Library Documentation), I found:

	/** Unsigned 24 bit Little Endian using low three bytes in 32-bit word */
	/** Unsigned 24 bit Big Endian using low three bytes in 32-bit word */

That means SND_PCM_FORMAT_U24* formats are seen by alsa as 32-bit words (indeed, after some tries, it looks like the bytes are grouped by 3, a sort of LLL RRR planar layout).

This case is obviously managed internally by the alsa library (see https://github.com/alsa-project/alsa-lib/blob/master/src/pcm/pcm_misc.c for further information).

Note: full C920 detected features are provided at the end of my post.

Back to the SDL2 source code: there is NO trace of SND_PCM_FORMAT_U24*, at least in version 2.28.4, and I wonder what I can do to find a clean solution.

My questions now :

Is the SDL2 alsa audio driver missing some cases?

Do you have a solution/workaround? Of course, if the feature is simply not implemented and not too complicated, I can propose a patch or something close to it, but I need answers from SDL audio experts first.

Any suggestion is welcome. Thanks in advance for any help :slight_smile:

Complement: full C920 detected features

ALSA library version: 1.2.2

PCM stream types:

PCM access types:

PCM formats:
  S8 (Signed 8 bit)
  U8 (Unsigned 8 bit)
  S16_LE (Signed 16 bit Little Endian)
  S16_BE (Signed 16 bit Big Endian)
  U16_LE (Unsigned 16 bit Little Endian)
  U16_BE (Unsigned 16 bit Big Endian)
  S24_LE (Signed 24 bit Little Endian)
  S24_BE (Signed 24 bit Big Endian)
  U24_LE (Unsigned 24 bit Little Endian)
  U24_BE (Unsigned 24 bit Big Endian)
  S32_LE (Signed 32 bit Little Endian)
  S32_BE (Signed 32 bit Big Endian)
  U32_LE (Unsigned 32 bit Little Endian)
  U32_BE (Unsigned 32 bit Big Endian)
  FLOAT_LE (Float 32 bit Little Endian)
  FLOAT_BE (Float 32 bit Big Endian)
  FLOAT64_LE (Float 64 bit Little Endian)
  FLOAT64_BE (Float 64 bit Big Endian)
  IEC958_SUBFRAME_LE (IEC-958 Little Endian)
  IEC958_SUBFRAME_BE (IEC-958 Big Endian)
  MU_LAW (Mu-Law)
  A_LAW (A-Law)
  S20_LE (Signed 20 bit Little Endian in 4 bytes, LSB justified)
  S20_BE (Signed 20 bit Big Endian in 4 bytes, LSB justified)
  U20_LE (Unsigned 20 bit Little Endian in 4 bytes, LSB justified)
  U20_BE (Unsigned 20 bit Big Endian in 4 bytes, LSB justified)
  SPECIAL (Special)
  S24_3LE (Signed 24 bit Little Endian in 3bytes)
  S24_3BE (Signed 24 bit Big Endian in 3bytes)
  U24_3LE (Unsigned 24 bit Little Endian in 3bytes)
  U24_3BE (Unsigned 24 bit Big Endian in 3bytes)
  S20_3LE (Signed 20 bit Little Endian in 3bytes)
  S20_3BE (Signed 20 bit Big Endian in 3bytes)
  U20_3LE (Unsigned 20 bit Little Endian in 3bytes)
  U20_3BE (Unsigned 20 bit Big Endian in 3bytes)
  S18_3LE (Signed 18 bit Little Endian in 3bytes)
  S18_3BE (Signed 18 bit Big Endian in 3bytes)
  U18_3LE (Unsigned 18 bit Little Endian in 3bytes)
  U18_3BE (Unsigned 18 bit Big Endian in 3bytes)
  G723_24 (G.723 (ADPCM) 24 kbit/s, 8 samples in 3 bytes)
  G723_24_1B (G.723 (ADPCM) 24 kbit/s, 1 sample in 1 byte)
  G723_40 (G.723 (ADPCM) 40 kbit/s, 8 samples in 3 bytes)
  G723_40_1B (G.723 (ADPCM) 40 kbit/s, 1 sample in 1 byte)
  DSD_U8 (Direct Stream Digital, 1-byte (x8), oldest bit in MSB)
  DSD_U16_LE (Direct Stream Digital, 2-byte (x16), little endian, oldest bits in MSB)
  DSD_U32_LE (Direct Stream Digital, 4-byte (x32), little endian, oldest bits in MSB)
  DSD_U16_BE (Direct Stream Digital, 2-byte (x16), big endian, oldest bits in MSB)
  DSD_U32_BE (Direct Stream Digital, 4-byte (x32), big endian, oldest bits in MSB)

PCM subformats:
  STD (Standard)

PCM states:

I don’t understand what the exact problem is: is there a time-sync issue, grainy vocals, or are you getting mono when you expect stereo?

Is there code that we can view?

In general, the driver and SDL will convert the audio samples to the nearest available/requested format. You would likely find more formats available in libraries that focus on audio, but SDL provides a sane number of options.

If you are planning to edit the audio samples, then it might be convenient to use 32 bit sample size to help prevent overflow while mixing or applying effects, but if you just plan on playing the audio directly as recorded, then 16 bit audio should suffice.

Thanks for your answer,

To answer you, I only want to play my recorded audio+video as a normal video (whatever container works, see my previous post). With my current implementation, audio and video synchronization is perfect, and as I wrote, the video has been fine for years. What does not work as expected is the low-level audio recording.

Maybe I forgot to say that the recording device is a USB webcam, a Logitech C920 (same results with the C922 and Brio 4K, btw). This is a very good webcam, and it has worked perfectly for years. The audio capabilities and parameters are in my previous post.

Now, what I did :

Case 1. I record audio (audio only, no video) using the alsa driver directly, with a little command-line application. SND_PCM_FORMAT_U24_LE is correctly detected AND the sound is correctly recorded, meaning the SND_PCM format is correctly interpreted and the recorded sound is perfect.

More interesting: the alsa driver initialization is ~ the same as what the SDL alsa audio driver does, except that the SDL alsa audio driver does not know SND_PCM_FORMAT_U24 (LE or BE).

FYI, the code of my alsa (recording .wav) application is there : GitHub - ebachard/Linux_Alsa_Audio_Record: Very simple Linux audio recording using Alsa

Case 2. Using the libav API + SDL + OpenCV, I wrote a “muxer” to record audio + video. On the audio side, for portability reasons, I’d like to use SDL + the SDL alsa driver, so that the Windows port would be direct. The issue is that the sound is obviously not correct (noisy).

More precisely, the PCM format detected by the SDL alsa wrapper returns the correct enum number, but SND_PCM_FORMAT_U24_LE does not exist in the SDL code, and the result is, how to say … undefined? (while this format is perfectly interpreted by the Linux alsa driver).

The NOT working (blurry audio) application is here: Sources/step8 · master · Eric Bachard / AudioRecord · GitLab

This means something is missing in the SDL alsa driver, but I don’t know what exactly.

I would simply like someone to help me figure out what exactly does not work (very probably, something in the SDL alsa driver is not correct, imho).

Apologies if my english is not correct, I’m not fluent.

In line #947 of your src/muxer.cpp file
want_in.format = audio_st.enc->sample_fmt;
It looks like you are providing an AV_SAMPLE_FMT_etc value to the SDL_AudioSpec structure format field.

I’m pretty certain that AV and SDL would not have compatible internal values between them. (FFMPEG defines them using an enum and SDL defines them using #define macros)

One option I’d recommend is to create an std::unordered_map and map AV variables to the suitable SDL variables.

// Something like this (SDL audio formats are SDL_AudioFormat values):
        std::unordered_map<AVSampleFormat, SDL_AudioFormat> map;
        map[AV_SAMPLE_FMT_S16] = AUDIO_S16;
        map[AV_SAMPLE_FMT_S32] = AUDIO_S32;
        map[AV_SAMPLE_FMT_U8] = AUDIO_U8;
        // etc

// and here it is being used
        SDL_AudioSpec want_in;
        SDL_AudioSpec have_in;

        want_in.freq = audio_st.enc->sample_rate;
        auto got = map.find(audio_st.enc->sample_fmt);
        if (got == map.end()) {
                SDL_Log("Error, sample format is not currently supported");
                // cleanly exit or try something else?
        } else {
                want_in.format = got->second;
        }

        want_in.channels = 1;
        want_in.samples = audio_samples_number;
        want_in.callback = nullptr;
Unfortunately, this does mean that you would have to write the conversion functions for those data sample types that you want to use but SDL doesn’t support.


@GuildedDoughnut thank you very much for your time and the precious information, including the code sample! Now I have a very serious lead for correctly mapping the samples :slight_smile:

In the meantime, I’ll continue to learn and patiently read the code on both sides (ffmpeg and SDL) to see what can be done (and how), in order to obtain the same result alsa already provides (that would be a nice result for me).

To be honest, I’m surprised nobody did such a “translation” between the two “worlds” (SDL and ffmpeg) before.

FYI, I did some experiments in the meantime, and I’ll document what I do as regularly as possible. See: Sources/step8/documentation/aspect_spectral.pdf · master · Eric Bachard / AudioRecord · GitLab

So SDL probably doesn’t directly support 24-bit audio because it isn’t really necessary for games (I understand that you aren’t writing a game). SDL supports 32-bit floating-point samples because a lot of OSes support it (it’s the default audio format on macOS IIRC), and it makes writing mixers easier. Generally speaking, if you’re recording audio from a webcam microphone then the microphone itself is never going to be able to make use of the dynamic range afforded by recording 24-bit samples.

24-bit audio is fine for recording with a high-end, professional audio setup if you’re going to be manipulating the audio later. If not, 16-bit more than covers what you need, both in fidelity and dynamic range, especially from a webcam microphone. And audio that is recorded in 24-bit always gets mixed down to 16-bit for delivery anyway.

@sjr : thank you for your answer and your explanations.

To answer you, I’m sorry if it wasn’t clear, but I’m perfectly fine with recording at 16 bits. I don’t need more than that for recording handball events / the important match situations I need for analysis.

I’m simply trying to solve one remaining issue: how to remap the sound coming from alsa to match ffmpeg AVFrames and obtain correct sound (the current 16-bit signed sound is not good enough yet).

This is very important for me, because it’s the very last technical difficulty I have to solve before a big code refactoring that could let me finalize a piece of software I started writing -for fun- 7 years ago.

What do you mean by “not good enough”?

The difference between 24-bit and 16-bit audio is basically inaudible; 24-bit gets you more dynamic range, so sounds can get louder before clipping.

Not good enough means the sound is distorted and “drools” (not sure about the translation; I mean the low frequencies are abnormally amplified + some extra harmonics are there). The difference is very audible, and the problem is maybe simple to solve. Searching … :slight_smile:

For further information, see the link I provided in my previous answer (document: aspect_spectral.pdf). I recorded the same letter with Audacity (good sound, clean and not distorted) and the same sound using SDL+alsa converted into AVFrames.

Looking at your code, as somebody mentioned earlier you’re directly assigning an FFMPEG audio format enum to your SDL_AudioSpec, which is almost certainly not going to be correct.
Line 945 muxer.cpp: want_in.format = audio_st.enc->sample_fmt;

You’re going to need to map FFMPEG’s audio format enums to SDL’s instead, and the method @GuildedDoughnut suggested is the way I’d go.

The way the code is right now, the format you’re telling SDL you want is not what you think you’re telling SDL you want.

Further down, on line 950, you have:
input_dev = SDL_OpenAudioDevice(NULL /* default */, 1 /* isCapture */, &want_in, &have_in, SDL_AUDIO_ALLOW_ANY_CHANGE);

The SDL_AUDIO_ALLOW_ANY_CHANGE flag tells SDL it can make any changes to any of the values you’ve passed in, but you never check have_in to see what you actually got.

So what seems to be happening is that you’re opening your audio output stream with a given format and sample rate, then initializing SDL for audio input (with the wrong values), not checking what audio format SDL is actually giving you, and then you’re not doing any conversion between the audio format you’re getting from SDL and what you’re handing to FFMPEG.

I’d initialize SDL first (using SDL values for SDL_AudioSpec), then init FFMPEG with what you get from SDL (converting from SDL’s format enum’s to FFMPEG’s). That way you don’t have to do any audio format or sample rate conversion on the fly.

If you want to keep things more or less the way they are, you’ll have to store have_in somewhere, and in your captureUpdate() function you’ll need to access it and convert the audio samples you’re getting from SDL to the sample format you want to give FFMPEG. SDL has functions to do the conversion for you, see SDL_AudioCVT, but you have to call them manually.

Also, beware that SDL_DequeueAudio() is dealing with bytes, not samples, so
Line 82 muxer.cpp: static unsigned int maxBytes = SAMPLES;
is probably wrong unless you’re dealing with 1-channel 8-bit audio (a 2048 sample buffer, for stereo 16-bit audio, is gonna be 8192 bytes)


So the reason you’re getting poor-quality audio seems to be that you are passing input audio in one format to an output stream in a different format, without doing any conversion, and miraculously this isn’t coming out as speaker-busting noise.


Wow … one more time, thank you very much for your code review, which really enlightened me!

As I wrote, I knew I was plain wrong, and of course I’ll follow your advice precisely: this will save me a lot of time, and the TODO list seems very consistent. I’ll now stop spamming the forum and take some time to prepare a plan for the next steps. Of course, if I make progress, I’ll update the topic.

To be continued :slight_smile:

BTW, just in case this can help someone else, I found a link explaining a lot about audio: Decoding audio files with ffmpeg.

With best regards


Glad to be of help!

One thing to consider is ditching the SDL_AUDIO_ALLOW_ANY_CHANGE flag in your call to SDL_OpenAudioDevice(). I can’t imagine there are many audio devices these days that can’t handle signed 16-bit 44.1kHz mono audio input.

That way you can focus on making your program work, and once it’s working correctly you can add SDL_AUDIO_ALLOW_ANY_CHANGE if you want (just make sure to check what you’re actually getting, pass that on to FFMPEG, etc., like I mentioned in my earlier post).
