[need help] record and convert audio

ericb · July 27, 2021, 8:30am

Hello,

To explain my need, what works already:

create a list of all the present recordable or playback audio devices on the machine ;
display them ;
select one of them ;
choose playback or record ;
record (the buffer contains ~ 2 s) ;
play 2s.

The code works very well (tested on Linux only, Windows untested yet), and some links are provided there : [SDL_OpenAudioDevice] : can we open several recordable devices and select /change the used one? - #6 by ericb (search for audiodevice and audiomanager classes)

On the video side, using OpenCV, I can create any .mp4 or .mpv or .avi too (works well on both Linux and Windows).

As you probably already understood, the next main goal is to create a file containing the recorded audio from the selected sources.

Note : I already know how to mux both streams and create some .mp4 or .mkv after, not a problem.

First step (Current WIP, close to work)

How to record the sound continuously ? (investigating, I think I’m close: use a circular buffer, i.e. a ring buffer)

Current status : I can record 2 seconds of audio, and hear the sound selecting default system + playback device. Works well, and I can record audio from one selected between several connected webcams (perfect for my need).
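The ring-buffer idea from the first step can be sketched as follows. This is a minimal, single-threaded illustration, not the code from the linked classes: the class name ByteRing and its methods are hypothetical, and a real capture path where a callback writes on another thread would also need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity byte ring buffer (illustration only, not thread-safe).
class ByteRing {
public:
    explicit ByteRing(std::size_t capacity) : buf_(capacity) { assert(capacity > 0); }

    // Copies up to `len` bytes in; returns how many were accepted.
    // Bytes that do not fit are refused rather than overwriting old data.
    std::size_t write(const std::uint8_t* src, std::size_t len) {
        std::size_t n = 0;
        while (n < len && size_ < buf_.size()) {
            buf_[(head_ + size_) % buf_.size()] = src[n++];
            ++size_;
        }
        return n;
    }

    // Copies up to `len` bytes out, oldest first; returns how many were read.
    std::size_t read(std::uint8_t* dst, std::size_t len) {
        std::size_t n = 0;
        while (n < len && size_ > 0) {
            dst[n++] = buf_[head_];
            head_ = (head_ + 1) % buf_.size();
            --size_;
        }
        return n;
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t head_ = 0;  // index of the oldest stored byte
    std::size_t size_ = 0;  // number of bytes currently stored
};
```

The recording side would call write() with each new chunk and the file-writing side would call read() at its own pace; the capacity decides how long the writer can fall behind before data is refused.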

Second step : convert the recorded audio to .aac or .mp3

But I’m stuck there:
What is the format of the recorded audio in the buffer ? And how to “convert” it into something “.aac” or “mp3” able ? Said differently : what does SDL2 callback record and how to convert it to .aac (low profile) or .mp3 ?

BTW, does a solution exist to directly create a file containing all the audio ? do SDL_rwops fit my needs ? (which one if so ? ) I think I have to copy the recorded sound (copied from some buffer) and create a file containing all the audio, but I have no precise plan yet.

FYI, some time ago, I implemented audio recording directly with alsa (Linux only), NOT using SDL, producing a .wav at the end. It works very well, but it is not portable (it won’t work on Windows), so it is not a good solution. That’s the reason why I prefer to use SDL, which does the job under the hood.

Apologies for my poor wording (I’m not fluent with english), if I’m not precise, but any suggestion or track or even some links would be very appreciated

Thanks in advance !!

rtrussell · July 27, 2021, 10:17am

That is returned by SDL_OpenAudioDevice() in the obtained structure. If you don’t allow SDL to make any changes, it will be the same as you specify in the desired structure.

The way I do it is to save the audio as an uncompressed WAV file (which is simply the raw data from the buffer with a standard WAV header attached) and then use a separate utility to convert the WAV file to something else.

No doubt there is a library that you could use to do the same thing in code, but I’ve not tried that.
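To make "raw data with a standard WAV header attached" concrete, here is a small sketch that builds the usual 44-byte header for integer PCM. The function name make_wav_header and its helpers are made up for this illustration; the layout assumes the plain RIFF/WAVE form with a 16-byte fmt chunk and no extra chunks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends the low `bytes` bytes of `v` in little-endian order, as RIFF/WAV requires.
static void put_le(std::vector<std::uint8_t>& out, std::uint32_t v, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
}

// Appends a four-character chunk tag such as "RIFF" or "data".
static void put_tag(std::vector<std::uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Builds a 44-byte header for integer PCM (hypothetical helper, illustration only).
std::vector<std::uint8_t> make_wav_header(std::uint32_t freq, std::uint16_t channels,
                                          std::uint16_t bits, std::uint32_t data_bytes) {
    assert(channels > 0 && bits % 8 == 0);
    const std::uint16_t block_align = static_cast<std::uint16_t>(channels * (bits / 8));
    std::vector<std::uint8_t> h;
    put_tag(h, "RIFF"); put_le(h, 36 + data_bytes, 4);  // file size minus 8, at offset 4
    put_tag(h, "WAVE");
    put_tag(h, "fmt "); put_le(h, 16, 4);               // size of the fmt chunk
    put_le(h, 1, 2);                                    // format tag 1 = integer PCM
    put_le(h, channels, 2);
    put_le(h, freq, 4);
    put_le(h, freq * block_align, 4);                   // bytes per second
    put_le(h, block_align, 2);                          // bytes per sample frame
    put_le(h, bits, 2);
    put_tag(h, "data"); put_le(h, data_bytes, 4);       // data size, at offset 40
    return h;
}
```

For a recording in progress the data size is not known yet, so one would write the header with data_bytes = 0 first and correct the two size fields once the capture is finished.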

ericb · July 28, 2021, 7:59am

Hello,

Thanks a lot for your detailed answer

That is returned by SDL_OpenAudioDevice() in the obtained structure. If you don’t allow SDL to make any changes, it will be the same as you specify in the desired structure.

Anyway, I think I have understood the idea, and I’ll add “check” what is obtained in my TODO list.

FYI, currently I’m using these flags : (SDL_AUDIO_ALLOW_FREQUENCY_CHANGE | SDL_AUDIO_ALLOW_CHANNELS_CHANGE). In fact, I don’t know if this is optimal or not, and I could allow ANY change, but I’m a complete newbie with that, and I’ll continue to search for the best compromise (e.g. by reading more examples).

The "expected" structure contains the values below (C++):

auto set_playback_audio_spec_S32 = [this](SDL_AudioSpec &s, SDL_AudioCallback c)
    {
        // TODO : use better values. Probably reduce the buffer size, to decrease the delay
        SDL_zero(s);
        s.freq = 44100;
        s.format = AUDIO_S32; // AUDIO_S16SYS;//
        s.channels = 2 ;  // stereo
        s.silence = 0;
        s.samples = SDL_AUDIO_MIN_BUFFER_SIZE * 2; // 2048;  /!\ Less could be better, to avoid a hole in the last buffer plot
        s.callback = c;
        s.userdata = this;
    };

TODO : investigate.

In fact, I have a more precise question: currently, the callback records the sound coming from any webcam microphone, seen as a recording device. Maybe I’m wrong, but for me the buffer where all the data is written is simply some reserved area in (RAM) memory (because it is faster), and it has to be copied to the disk to be captured permanently. Am I wrong ?

If so, what I need to understand is: how to write everything the fastest way on the disk, in a file ? (pardon me if I’m plain wrong, but I’d like to understand what happens)

In this case, and if I’m not mistaken, I had in mind to read the (recorded) buffer with a second (ring) callback, to write in some file, as regularly as possible. Is it correct ? Of course, feel free to correct me if I’m plain wrong

No doubt there is a library that you could use to do the same thing in code, but I’ve not tried that.

Ok, at this step, I see what I can do using ffmpeg API. I had in mind to mux the audio (as .aac) + the video (as h264 high profile, already working), but this is just one more step, and I see better what has to be done.

e.g. see : Convert mp3 to aac, use AVAudioFifo to buffer pcm data - Programmer Sought

**To be continued**

rtrussell · July 28, 2021, 8:52am

I’m not sure that the WAV format supports 32-bit samples so it would be safer to use AUDIO_S16LSB so you can copy the buffer directly to a file without any reformatting.

In my application I don’t use a callback function, instead I call SDL_DequeueAudio(). That way I can write the data in the buffer directly to a file, without any concerns about what can and can’t safely be done in a callback.

You can use the SDL_RWwrite() function to write the audio data from the buffer to a file. The only slight complication is that the WAV header contains the total length of the data which of course is not known until the capture is complete! So on completion I use SDL_RWseek() to reposition the pointer, and then update the header with the length.
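A rough sketch of such a dequeue-and-write loop is shown below. To keep it independent of SDL, the dequeue step is passed in as a callable; the name drain_to_file and the 4096-byte chunk size are assumptions made for this example, and plain fwrite is used in place of SDL_RWwrite().

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// `dequeue(buf, len)` is a stand-in with the same shape as SDL_DequeueAudio()
// minus the device id: it fills `buf` with at most `len` bytes and returns
// the number of bytes it actually delivered.
template <typename Dequeue>
std::uint64_t drain_to_file(Dequeue dequeue, std::FILE* out,
                            std::uint32_t chunk_bytes = 4096) {
    assert(out != nullptr && chunk_bytes > 0);
    std::vector<std::uint8_t> buf(chunk_bytes);
    std::uint64_t total = 0;
    for (;;) {
        const std::uint32_t got = dequeue(buf.data(), chunk_bytes);
        if (got == 0) break;  // nothing queued right now
        // Write only the bytes actually returned, never the whole buffer.
        if (std::fwrite(buf.data(), 1, got, out) != got) break;
        total += got;
    }
    return total;  // bytes appended to the file by this call
}
```

With SDL2 the callable could be a lambda that forwards to SDL_DequeueAudio(dev, buf, len) for the opened capture device, and the function would be called periodically from the main loop while recording. The running total is also what the WAV header needs once the capture ends.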

ericb · July 29, 2021, 12:31pm

Hello,

Sorry for the delay. I was trying, and one bug took me a while before I fixed it. That’s now done, and it works : I can select and record any webcam (at least on Linux) using SDL2 only (means alsa on Linux or DirectSound on Windows).

I’ll commit the code very soon (look at #ifdef TESTING_DEQUEUE … #endif changes), and after a testing time, I’ll definitely replace the old code.

I’m not sure that the WAV format supports 32-bit samples so it would be safer to use AUDIO_S16LSB so you can copy the buffer directly to a file without any reformatting.

Good advice : I changed for AUDIO_S16LSB and it works perfectly. I didn’t notice this important thing. Thanks !

In my application I don’t use a callback function, instead I call SDL_DequeueAudio(). That way I can write the data in the buffer directly to a file,

Well, after some tries, SDL_DequeueAudio() works very well indeed. The callback was working well too, but here you have nothing to do for memory allocation / release and that’s fine for me ! => adopted !

You can use the SDL_RWwrite() function to write the audio data from the buffer to a file.

I tested it first and it was working. But I introduced a stupid bug, and I fixed it using fwrite (very similar, except there is no context). I’ll probably return to SDL_RWwrite() soon.

The only slight complication is that the WAV header contains the total length of the data which of course is not known until the capture is complete! So on completion I use SDL_RWseek() to reposition the pointer, and then update the header with the length.

Indeed. This is a very elegant way to fix the wrong header we write at the beginning. Currently it works somewhat (probably a miscalculation on my side) but I’ll investigate.

As summary: I can select one webcam, and record the sound the time I want, and keep a .wav containing the sound. Next step : mux and synchronize video + audio (yet some work to come …)

@rtrussell: Thank you very much for your help !

HTH others with SDL2 + audio

rtrussell · July 29, 2021, 1:06pm

There are two header records that you need to update: file size (at offset 4 in the file, immediately after ‘RIFF’) and data size (immediately after ‘data’, probably at offset 40 in the file).
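Here is a possible sketch of that fix-up step using std::fstream instead of SDL_RWseek(). It assumes the simple 44-byte header layout (so the data size really is at offset 40), and the function names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Writes `v` as 4 little-endian bytes at absolute file offset `pos`.
static void patch_u32_le(std::fstream& f, std::streamoff pos, std::uint32_t v) {
    assert(pos >= 0);
    char b[4];
    for (int i = 0; i < 4; ++i) b[i] = static_cast<char>((v >> (8 * i)) & 0xFF);
    f.seekp(pos, std::ios::beg);
    f.write(b, 4);
}

// Rewrites both size fields of a finished capture (hypothetical helper).
// Assumes a 44-byte header: RIFF size at offset 4, data size at offset 40.
bool finalize_wav_sizes(const std::string& path) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    f.seekg(0, std::ios::end);
    const std::streamoff total = f.tellg();
    if (total < 44) return false;  // too short to hold the header
    patch_u32_le(f, 4, static_cast<std::uint32_t>(total - 8));    // file size minus 8
    patch_u32_le(f, 40, static_cast<std::uint32_t>(total - 44));  // audio bytes only
    return static_cast<bool>(f);
}
```

Calling finalize_wav_sizes() once, after the file has been closed by the recorder, should be enough; if the header contains extra chunks before 'data', offset 40 no longer applies and the position of the data chunk has to be searched for instead.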

icculus · July 29, 2021, 1:33pm

I’m not sure that the WAV format supports 32-bit samples

It does, but it’s likely lots of tools that handle WAV files do not.

ericb · July 29, 2021, 5:00pm

@rtrussell: once again you are right. In fact, I discovered that the structure I currently use to create the header does not respect the .wav specification. I’ll take some time to reorder everything cleanly. If somebody tests the code, there is even one big remaining issue: I designed the class badly, and this causes a crash when we try to record a second audio file (the first one is very good).

It’s too late today (got a lot of things to do) but I’ll try to add some set() / reset() methods to fix that tomorrow or asap.

Thanks again for your nice support !

EDIT: after some refactoring + simple encapsulation, everything works and one can record several times without a crash, including when closing the application, and a new .wav is created every time. Uff ! (what remains is to fix the fileSize + dataSize fields)

iniitu · December 18, 2022, 7:45pm

hello @ericb @rtrussell ! thanks for the precious information hereabove…
i’m working on an audio-visual software for which i’d like to access the recording buffer as well ( manipulating the data for visualization )
i’m curious to take a look at the code you mention, using SDL_DequeueAudio() - is it somewhere in your Framagit repository, Eric ? i couldn’t manage to find it.
thanks,
Sylvain

ericb · December 23, 2022, 11:37pm

@iniitu : short answer, the SDL_DequeueAudio() relevant code is there : https://framagit.org/ericb/miniDart/-/blob/master/Sources/src/Audio/audiodevice.cpp

For further information, the interface is defined in the includes : Sources/inc · master · Eric Bachard / miniDart · GitLab. The main files concerned are: audiodevice.hpp, audiomanager.hpp and alsa_recorder.hpp

My application allows to detect, select and record one audio stream + one video stream from several sources and create a final video with both (means mux both sources together)

Thus, the main need was to identify, on a given machine, every audio device (its name, all properties and so on), whether it can record or not, and to be able to select one among several, then do the same with a given video device (e.g. a webcam, a smartphone or a video). Currently everything works on Linux (audio is a problem for me on Windows, since I have no Windows machine), and the last remaining step is to implement the muxer. The main issue for me is that I haven’t had time to code for a year.

Thanks again to SDL library for everything.

dgm5555 · April 9, 2023, 7:09pm

@ericb I am looking for exactly what you are aspiring to produce (ie audio saved to .wav) so wondering if you ever fixed all the errors in the code - and if you could share/update your gitlab repo if you did?
Edit: I’m a total SDL novice (having only started to look at using it a couple of days ago), so don’t take the following as any expert advice.

Obviously the code posted on your gitlab will save fairly random noise with all the errors it currently has.
There are a fair few you’ve probably already picked up over the couple of months since you posted it but on initial scanning through your code the biggest ones would seem to be:

0: I’m not sure why you have created your save loop - instead why not just use a callback and save the data direct from the hardware queue when the callback is triggered rather than copying data to a secondary/temporary memory buffer - that would seem easier and would be more efficient use of resources?

1: The strange mashup of file saves randomly alternating between standard fwrite and SDL_RWwrite streams is surely a recipe for errors. Especially since you don’t seem to be opening and closing them off correctly in sequence so you must be getting errors with file locks.

If you really wanted to keep the save loop:
2. I assume the biggest cause of the noise in the current code is that your save loop seems to save an entire MAX_BUFFER load of data every loop without accounting for how much data is actually loaded into it (and as per 1. why not just stick with SDL_RWwrite for consistency.)
eg something like this: SDL_RWwrite(rwAudioStrm, &gData, bytesReturned, 1);
3. I suspect you need SDL_Delay(45) in that loop to allow time for an approximately full MAX_BUFFER load - but you will still get very random variation in the underfilling/overrun(=dataloss) and fundamentally poor quality audio as the 45 is likely only an approximate thread sleep time (and fill-time would depend on sampling rate, etc). Given that this could easily vary as you allow the device to set the format, it is definitely a “suboptimal” strategy.
4. Instead of collecting/copying the buffer data and then only saving if (!b_paused), calling SDL_PauseAudioDevice( recordingDeviceId, SDL_FALSE ); and just sleeping the loop until it’s unpaused would be much better.
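To put numbers on point 3, the time a buffer takes to fill follows directly from its length in sample frames and the sample rate. The helpers below are hypothetical names for that arithmetic; the 2048 frames at 44100 Hz used in the note are the values from ericb’s spec earlier in the thread, and whether MAX_BUFFER matches them is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds of audio held by `frames` sample frames at `freq` Hz.
double buffer_duration_ms(std::uint32_t frames, std::uint32_t freq) {
    assert(freq > 0);
    return 1000.0 * frames / freq;
}

// Bytes occupied by `frames` sample frames (one sample per channel per frame).
std::uint32_t buffer_bytes(std::uint32_t frames, std::uint16_t channels, std::uint16_t bits) {
    assert(bits % 8 == 0);
    return frames * channels * (bits / 8u);
}
```

buffer_duration_ms(2048, 44100) comes out at roughly 46.4 ms, which is close to the 45 ms guess, but the value changes as soon as the device negotiates a different frequency or buffer size. That is one more reason to write the byte count the dequeue call reports rather than to rely on a fixed sleep.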

dgm5555 · April 22, 2023, 7:39am

Oh and given the recording device output format is unlikely to exactly match the .wav uint8_t or int16_t format, you will also likely need to convert the data before saving = SDL_ConvertAudio
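As an illustration of what such a conversion involves, here is a hand-written sketch that scales signed 32-bit samples down to signed 16-bit. It is a stand-in, not SDL’s implementation: SDL_ConvertAudio also handles channel and rate changes, and the function name s32_to_s16 is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scales each signed 32-bit sample into the signed 16-bit range (no dithering).
std::vector<std::int16_t> s32_to_s16(const std::int32_t* src, std::size_t count) {
    assert(src != nullptr || count == 0);
    std::vector<std::int16_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Dividing by 65536 maps the full 32-bit range onto -32768..32767.
        out.push_back(static_cast<std::int16_t>(src[i] / 65536));
    }
    return out;
}
```

The resulting samples can be written as 16-bit PCM as long as the header says 16 bits per sample. Byte order is left to the host here, which matches the little-endian layout WAV expects on x86 and most ARM machines.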

Also, for reasons which escape me, using either traditional fwrite or SDL_RWwrite seemed to keep repeatedly appending the same entire stream to the file each loop, rather than just appending the new buffer data, even though the file was opened using “wb” and the buffer didn’t contain the entire file data. This meant the .wav data was badly corrupted. Sadly it took me some time to figure out why, and then only after manually looking at the binary file with HxD.
An fstream worked perfectly first time.

ericb · June 11, 2023, 2:52pm

@dgm5555

Apologies for the loooong delay before answering you, but I was not able to answer you seriously before (new job, no more spare time).

Thanks a lot for reading my code and trying to help me, I really appreciate it ! And you are right, the way I record data is not very rigorous and needs more work. When I have more time (starting next week) I’ll have a deeper look, and I’ll try to improve my code. Of course, any feedback is welcome.

FYI, I did not continue in that direction, and I’m doing something completely different : currently, I’m able to select any (among several) existing recordable audio source and any image source, e.g. a webcam or a video, and create an .mp4 from that. I didn’t commit the code yet, because I need to clean it up a bit, e.g. choose between static callbacks included in a class (C++ style) or C style with static callbacks (easier to use but not nice in the middle of C++ source code).

Short answer, what works:

collect audio from the webcam micro : ok
collect video : conversion from ffmpeg into cv::Mat is ok and works as expected.
mux both into a final .mp4 : looks ok

What does not work is audio : the .mp4 (or .mkv or .avi) is valid, including quality and synchronization, but the audio is broken (a sort of chopped sound, in sync with the video, but not the original sound). Probably a type issue with some pointer, and very probably I’m misusing userdata in my callback. It may need some time … no idea how long this issue will take to solve.

Last : I’ll very probably commit everything in a new branch soon (within 2 weeks at most, I mean).

Further information here : Convert SDL2 audio into ffmpeg AVFrame (recording from webcam + muxing)

Thanks again for your support