[need help] record and convert audio

Hello,

To explain my need, what works already:

  • create a list of all the present recordable or playback audio devices on the machine ;
  • display them ;
  • select one of them ;
  • choose playback or record ;
  • record (the buffer contains ~ 2 s) ;
  • play 2s.

The code works very well (tested on Linux only; not yet tested on Windows), and some links are provided in this thread: "[SDL_OpenAudioDevice] : can we open several recordable devices and select /change the used one?", post #6 by ericb (search for the audiodevice and audiomanager classes)

On the video side, using OpenCV, I can create any .mp4 or .mpv or .avi too (works well on both Linux and Windows).

As you probably already understood, the next main goal is to create a file containing the recorded audio from the selected sources.

Note : I already know how to mux both streams and create some .mp4 or .mkv after, not a problem.

First step (Current WIP, close to work)

  • how to record the sound continuously? (investigating, I think I’m close: use a circular buffer, i.e. a ring buffer)
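A minimal sketch of the ring-buffer idea, in plain C++ with no SDL calls. The class name `ByteRing` and its methods are hypothetical, and the sketch is single-threaded: a real capture callback runs on SDL's audio thread, so shared access would need a lock or atomics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity byte ring buffer (not part of SDL).
class ByteRing {
public:
    explicit ByteRing(std::size_t capacity) : buf_(capacity) {
        assert(capacity > 0);
    }

    // Stores up to len bytes; returns how many were actually stored.
    // Bytes that do not fit are dropped, never overwritten.
    std::size_t write(const std::uint8_t* src, std::size_t len) {
        const std::size_t n = std::min(len, buf_.size() - size_);
        for (std::size_t i = 0; i < n; ++i) {
            buf_[(head_ + size_ + i) % buf_.size()] = src[i];
        }
        size_ += n;
        return n;
    }

    // Removes up to len of the oldest bytes; returns how many were read.
    std::size_t read(std::uint8_t* dst, std::size_t len) {
        const std::size_t n = std::min(len, size_);
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = buf_[(head_ + i) % buf_.size()];
        }
        head_ = (head_ + n) % buf_.size();
        size_ -= n;
        return n;
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t head_ = 0;  // index of the oldest stored byte
    std::size_t size_ = 0;  // number of bytes currently stored
};
```

The recording side would call `write()` with each captured block and the file-writing side would call `read()` regularly. If `write()` returns less than `len`, the reader is too slow and audio is being lost.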

Current status : I can record 2 seconds of audio, and hear the sound selecting default system + playback device. Works well, and I can record audio from one selected between several connected webcams (perfect for my need).

Second step : convert the recorded audio to .aac or .mp3

But I’m stuck there:
What is the format of the recorded audio in the buffer ? And how to “convert” it into something “.aac” or “mp3” able ? Said differently : what does SDL2 callback record and how to convert it to .aac (low profile) or .mp3 ?

BTW, does a solution exist to directly create a file containing all the audio ? do SDL_rwops fit my needs ? (which one if so ? ) I think I have to copy the recorded sound (copied from some buffer) and create a file containing all the audio, but I have no precise plan yet.

FYI, some time ago, I implemented direct audio recording using ALSA (Linux only), NOT using SDL, giving a .wav at the end. It works very well, but it is neither portable (it won’t work on Windows) nor a good solution. That’s the reason why I prefer to use SDL, which does the job under the hood.

Apologies for my poor wording (I’m not fluent with english), if I’m not precise, but any suggestion or track or even some links would be very appreciated :slight_smile:

Thanks in advance !! :slight_smile:

The format of the recorded audio is returned by SDL_OpenAudioDevice() in the obtained structure. If you don’t allow SDL to make any changes, it will be the same as you specify in the desired structure.

The way I do it is to save the audio as an uncompressed WAV file (which is simply the raw data from the buffer with a standard WAV header attached) and then use a separate utility to convert the WAV file to something else.
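A minimal sketch of what such a header looks like, in plain C++. It assumes the simplest case: the canonical 44-byte header for uncompressed PCM (format tag 1), no extra chunks, all fields little-endian. The function name `makeWavHeader` and the helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Store 16-bit and 32-bit values as little-endian bytes.
static void putLE16(std::uint8_t* p, std::uint16_t v) {
    p[0] = static_cast<std::uint8_t>(v & 0xFF);
    p[1] = static_cast<std::uint8_t>((v >> 8) & 0xFF);
}

static void putLE32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        p[i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF);
    }
}

// Builds a canonical 44-byte PCM WAV header (hypothetical helper).
std::array<std::uint8_t, 44> makeWavHeader(std::uint32_t sampleRate,
                                           std::uint16_t channels,
                                           std::uint16_t bitsPerSample,
                                           std::uint32_t dataBytes) {
    assert(channels > 0 && bitsPerSample % 8 == 0);
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(channels * (bitsPerSample / 8));

    std::array<std::uint8_t, 44> h{};
    std::memcpy(&h[0], "RIFF", 4);
    putLE32(&h[4], 36u + dataBytes);            // file size minus 8
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    putLE32(&h[16], 16);                        // size of the fmt chunk
    putLE16(&h[20], 1);                         // format tag 1 = PCM
    putLE16(&h[22], channels);
    putLE32(&h[24], sampleRate);
    putLE32(&h[28], sampleRate * blockAlign);   // bytes per second
    putLE16(&h[32], blockAlign);                // bytes per sample frame
    putLE16(&h[34], bitsPerSample);
    std::memcpy(&h[36], "data", 4);
    putLE32(&h[40], dataBytes);                 // size of the raw audio
    return h;
}
```

The raw buffer contents follow immediately after these 44 bytes, provided the samples are already little-endian signed PCM.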

No doubt there is a library that you could use to do the same thing in code, but I’ve not tried that.

Hello,

Thanks a lot for your detailed answer :slight_smile:

That is returned by SDL_OpenAudioDevice() in the obtained structure. If you don’t allow SDL to make any changes, it will be the same as you specify in the desired structure.

Anyway, I think I have understood the idea, and I’ll add “check” what is obtained in my TODO list.

FYI, currently I’m using these flags: (SDL_AUDIO_ALLOW_FREQUENCY_CHANGE | SDL_AUDIO_ALLOW_CHANNELS_CHANGE). In fact, I don’t know if this is optimal or not, and I could use SDL_AUDIO_ALLOW_ANY_CHANGE, but I’m a complete newbie with that, and I’ll continue to search for the best compromise (e.g. by reading more examples).

The “desired” structure contains the values below (C++):

auto set_playback_audio_spec_S32 = [this](SDL_AudioSpec &s, SDL_AudioCallback c)
{
    // TODO: use better values. Probably reduce the buffer size, to decrease the delay.
    SDL_zero(s);
    s.freq     = 44100;
    s.format   = AUDIO_S32;   // alternative: AUDIO_S16SYS
    s.channels = 2;           // stereo
    s.silence  = 0;
    s.samples  = SDL_AUDIO_MIN_BUFFER_SIZE * 2; // 2048; /!\ less could be better, to avoid a hole in the last buffer plot
    s.callback = c;
    s.userdata = this;
};

TODO : investigate.
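To help with that TODO, here is a rough way to see what a given `samples` value costs in delay and memory. It is a minimal sketch in plain C++ (no SDL calls), assuming `samples` counts sample frames as SDL2 documents it. `BufferInfo` and `describeBuffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an SDL API.
struct BufferInfo {
    double milliseconds;   // time covered by one buffer
    std::uint32_t bytes;   // size of one buffer in bytes
};

// frames: value of SDL_AudioSpec::samples (sample frames per buffer).
BufferInfo describeBuffer(int freq, int channels, int bitsPerSample, int frames) {
    assert(freq > 0 && channels > 0 && frames > 0 && bitsPerSample % 8 == 0);
    BufferInfo info;
    info.milliseconds = 1000.0 * frames / freq;
    info.bytes = static_cast<std::uint32_t>(frames) * channels * (bitsPerSample / 8);
    return info;
}
```

With the values above (44100 Hz, stereo, 32-bit, 2048 frames) this gives 16384 bytes and roughly 46 ms per buffer. Halving `samples` halves both.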

In fact, I have a more precise question: currently, the callback records the sound coming from any webcam microphone, seen as a recording device. Maybe I’m wrong, but for me the buffer where all the data is written is simply some reserved area in RAM (because it is faster), and it has to be copied to the disk to be kept permanently. Am I wrong ?

If so, what I need to understand is: how to write everything the fastest way on the disk, in a file ? (pardon me if I’m plain wrong, but I’d like to understand what happens)

In this case, and if I’m not mistaken, I had in mind to read the (recorded) buffer with a second (ring) callback, to write in some file, as regularly as possible. Is it correct ? Of course, feel free to correct me if I’m plain wrong :slight_smile:

No doubt there is a library that you could use to do the same thing in code, but I’ve not tried that.

Ok, at this step, I see what I can do using ffmpeg API. I had in mind to mux the audio (as .aac) + the video (as h264 high profile, already working), but this is just one more step, and I see better what has to be done.

e.g. see : Convert mp3 to aac, use AVAudioFifo to buffer pcm data - Programmer Sought

**To be continued** :slight_smile:

I’m not sure that the WAV format supports 32-bit samples so it would be safer to use AUDIO_S16LSB so you can copy the buffer directly to a file without any reformatting.

In my application I don’t use a callback function, instead I call SDL_DequeueAudio(). That way I can write the data in the buffer directly to a file, without any concerns about what can and can’t safely be done in a callback.
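A minimal sketch of that polling pattern, not taken from any real application. To stay compilable without SDL, the dequeue call is passed in as a callable with the same shape as SDL_DequeueAudio() (fill a buffer, return the number of bytes obtained). `drainToFile` is a hypothetical name, and plain C stdio stands in for the file writes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Pulls audio until the source reports 0 bytes, appending it to `out`.
// Returns the total number of bytes written.
template <typename Dequeue>
std::uint64_t drainToFile(Dequeue&& dequeue, std::FILE* out,
                          std::uint32_t chunkBytes = 4096) {
    assert(out != nullptr && chunkBytes > 0);
    std::vector<std::uint8_t> chunk(chunkBytes);
    std::uint64_t total = 0;
    for (;;) {
        const std::uint32_t got = dequeue(chunk.data(), chunkBytes);
        if (got == 0) break;                       // queue is empty for now
        if (std::fwrite(chunk.data(), 1, got, out) != got) break;  // disk error
        total += got;
    }
    return total;
}

// With SDL2 this could be called as follows (assumption, untested here),
// where `dev` is a capture device opened with a NULL callback:
//   drainToFile([dev](void* p, std::uint32_t n) {
//       return SDL_DequeueAudio(dev, p, n);
//   }, file);
```

Calling this from the main loop every few milliseconds keeps the queue short. The running total is also the data size needed later for the WAV header.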

You can use the SDL_RWwrite() function to write the audio data from the buffer to a file. The only slight complication is that the WAV header contains the total length of the data which of course is not known until the capture is complete! So on completion I use SDL_RWseek() to reposition the pointer, and then update the header with the length.

Hello,

Sorry for the delay. I was trying, and one bug took me a while before I fixed it. That’s now done, and it works : I can select and record any webcam (at least on Linux) using SDL2 only (means alsa on Linux or DirectSound on Windows).

I’ll commit the code very soon (look at #ifdef TESTING_DEQUEUE … #endif changes), and after a testing time, I’ll definitely replace the old code.

I’m not sure that the WAV format supports 32-bit samples so it would be safer to use AUDIO_S16LSB so you can copy the buffer directly to a file without any reformatting.

Good advice : I changed for AUDIO_S16LSB and it works perfectly. I didn’t notice this important thing. Thanks !

In my application I don’t use a callback function, instead I call SDL_DequeueAudio(). That way I can write the data in the buffer directly to a file,

Well, after some tries, SDL_DequeueAudio() works very well indeed. The callback was working well too, but here there is nothing to do for memory allocation / release, and that’s fine for me ! => adopted !

You can use the SDL_RWwrite() function to write the audio data from the buffer to a file.

I tested it first and it was working. But then I introduced a stupid bug, and I fixed it using fwrite (very similar, except there is no context). I’ll probably return to SDL_RWwrite() soon.

The only slight complication is that the WAV header contains the total length of the data which of course is not known until the capture is complete! So on completion I use SDL_RWseek() to reposition the pointer, and then update the header with the length.

Indeed. This is a very elegant way to correct the wrong header we write at the beginning. Currently it only partly works (probably a miscalculation on my side), but I’ll investigate.

In summary: I can select one webcam, record the sound for as long as I want, and keep a .wav containing the sound. Next step : mux and synchronize video + audio (still some work to come …)

@rtrussell: Thank you very much for your help !

HTH others working with SDL2 + audio


There are two header fields that you need to update: the RIFF chunk size (at offset 4 in the file, immediately after ‘RIFF’; it holds the total file size minus 8 bytes) and the data size (immediately after ‘data’, probably at offset 40 in the file).
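A minimal sketch of that update using C stdio (fseek/fwrite) rather than SDL_RWseek()/SDL_RWwrite(). It assumes the canonical 44-byte PCM header, so the ‘data’ size really is at offset 40, and it assumes the file was opened for update. `patchWavSizes` and `writeLE32At` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Writes v as 4 little-endian bytes at absolute offset pos.
static bool writeLE32At(std::FILE* f, long pos, std::uint32_t v) {
    const unsigned char b[4] = {
        static_cast<unsigned char>(v & 0xFF),
        static_cast<unsigned char>((v >> 8) & 0xFF),
        static_cast<unsigned char>((v >> 16) & 0xFF),
        static_cast<unsigned char>((v >> 24) & 0xFF)};
    return std::fseek(f, pos, SEEK_SET) == 0 && std::fwrite(b, 1, 4, f) == 4;
}

// Call once capture is complete; dataBytes is the number of audio bytes
// written after the 44-byte header.
bool patchWavSizes(std::FILE* f, std::uint32_t dataBytes) {
    assert(f != nullptr);
    const bool ok = writeLE32At(f, 4, 36u + dataBytes)   // file size minus 8
                 && writeLE32At(f, 40, dataBytes);       // size of 'data' chunk
    return ok && std::fflush(f) == 0;
}
```

For a header with extra chunks before ‘data’, the second offset would differ, so it is safer to remember where the ‘data’ size field was written than to hard-code 40.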

I’m not sure that the WAV format supports 32-bit samples

It does, but it’s likely lots of tools that handle WAV files do not.

@rtrussell: once again you are right. In fact, I discovered that the structure I currently use to create the header does not respect the .wav specification. I’ll take some time to reorder everything cleanly. If somebody tests the code, there is even one big remaining issue: I designed the class badly, and this causes a crash when we try to record a second audio file (the first one is very good).

Today is too late (got lot of things to do) but I’ll try to add some set() / reset() methods to fix that tomorrow or asap.

Thanks again for your nice support !

EDIT: after some refactoring + simple encapsulation, everything works and one can record several times without a crash, including when closing the application, and a new .wav is created every time. Phew ! (what remains is to fix the fileSize and dataSize fields)