Convert SDL2 audio into ffmpeg AVFrame (recording from webcam + muxing)

Hello,

I’m trying to record audio and video from a webcam (currently a Logitech C920, but any webcam with a microphone should do), and I have several questions.

First, I’m using a sort of circular (ring) buffer, together with callbacks, to:

  • record the audio stream from the webcam
  • play it back on the default device

I have tested it a lot (on Linux only, but it should be easily portable to Windows) and it seems to work very well, but maybe something is plain wrong, and I need advice.
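For readers who want to see the idea without opening the repository, here is a minimal sketch of such a ring buffer. It is not the code from step4: the class name SampleRing and its methods are hypothetical, and it is deliberately not thread-safe. In a real SDL2 program the callbacks run on audio threads, so access would need protecting (for example with SDL_LockAudioDevice() or a mutex).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity ring buffer of signed 16-bit samples.
// Not thread-safe: a sketch of the data structure only.
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity) : buf_(capacity) {
        assert(capacity > 0);
    }

    // Capture side: copies up to n samples, returns how many actually fit.
    std::size_t write(const int16_t* src, std::size_t n) {
        std::size_t done = 0;
        while (done < n && size_ < buf_.size()) {
            buf_[(head_ + size_) % buf_.size()] = src[done++];
            ++size_;
        }
        return done;
    }

    // Playback side: reads up to n samples, returns how many were available.
    std::size_t read(int16_t* dst, std::size_t n) {
        std::size_t done = 0;
        while (done < n && size_ > 0) {
            dst[done++] = buf_[head_];
            head_ = (head_ + 1) % buf_.size();
            --size_;
        }
        return done;
    }

    std::size_t size() const { return size_; }

private:
    std::vector<int16_t> buf_;
    std::size_t head_ = 0;  // index of the oldest stored sample
    std::size_t size_ = 0;  // number of samples currently stored
};
```

When the buffer is full, write() drops the excess instead of overwriting old samples; whether dropping or overwriting is preferable depends on the application.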

The code is here: Sources/step4 · master · Eric Bachard / AudioRecord · GitLab

FYI: I added some links in the code, to explain what I did and who inspired me.

Second question:

I have integrated this code into a muxer based on muxing.c (available in the ffmpeg documentation), written by Fabrice Bellard.

The idea is to create an mp4 video containing both images and sound (and then to complete miniDart, recording any selected audio + video sources, but that’s the next step).

More precisely:

  • the input is the webcam (Logitech C920): audio + video;
  • the output is an mp4 container with aac for audio and h264 for video;
  • to stop recording, just hit the ESC key.

Currently, the video recording works fine, and the last problem to solve is the audio, which does not work.

If I’m not mistaken, the sound coming from the microphone is in AUDIO_S16LSB format (SDL side). That part works very well: talking into the microphone, I can even record the sound with Audacity while playback is running. SDL does a great job here.

The muxer (chosen for portability reasons) needs libav AVFrames to create the final video file, and my problem is that I don’t know how to convert SDL audio into audio AVFrames.

The current state of my work in progress: the video is nice, but the audio (perfectly synchronized!) is not right. It is mostly noise, but something can still be heard, enough to verify that video and audio are synchronized.

The code is available here: Sources/step6 · master · Eric Bachard / AudioRecord · GitLab (see muxer.cpp). There is a script to compile it, and the script lists the dependencies (easy to test under Linux).

In case it helps: according to the log (see below), the output codec is AAC and it expects the fltp audio format. Of course, I searched a lot on the web, but found nothing usable to solve my problem. What do I have to do with such data? Is there another SDL audio format that would be a closer match than 16LSB, for example, or …?
How many steps are missing? Am I close or …?
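One common way to bridge the two formats can be sketched as follows, under stated assumptions. AUDIO_S16LSB is interleaved signed 16-bit audio, while fltp is planar float in the range [-1.0, 1.0], so each sample is divided by 32768 and written to the plane of its channel. The function name is hypothetical, the code is not taken from muxer.cpp, and it assumes a little-endian host so that the SDL bytes can be read directly as int16_t. libswresample's swr_convert() performs the same kind of conversion and is probably the more robust choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits interleaved signed 16-bit samples (L R L R ...) into one float
// plane per channel, scaled to [-1.0, 1.0).
std::vector<std::vector<float>> s16_interleaved_to_planar_float(
    const int16_t* in, std::size_t frames, int channels)
{
    assert(in != nullptr && channels > 0);
    std::vector<std::vector<float>> planes(
        static_cast<std::size_t>(channels), std::vector<float>(frames));

    for (std::size_t i = 0; i < frames; ++i) {
        for (int c = 0; c < channels; ++c) {
            // Sample i of channel c sits at index i * channels + c.
            planes[c][i] = in[i * channels + c] / 32768.0f;
        }
    }
    return planes;
}
```

With an AV_SAMPLE_FMT_FLTP frame, each plane would then be copied into frame->data[c] for channel c, since planar formats keep one buffer per channel.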

Thanks in advance for any suggestion, advice or help :slight_smile:
Eric Bachard

FYI, the log says (on Linux):

FPS : 24
Adresse de cb_out = 0x556495e4beb8
Adresse de cb_in = 0x556495e4bd13
have_out.freq = 48000
have_out.samples = 1024
Found encoder : ‘h264’
Found encoder : ‘aac’
(*codec)->sample_fmts[0] contains : = 8
(*codec)->supported_samplerates[0] = 96000
(*codec)->supported_samplerates[1] = 88200
(*codec)->supported_samplerates[2] = 64000
(*codec)->supported_samplerates[3] = 48000
(*codec)->supported_samplerates[4] = 44100
(*codec)->supported_samplerates[5] = 32000
(*codec)->supported_samplerates[6] = 24000
We got : c->sample_rate = 48000
ost->st->time_base = 1/48000
Found AV sample_fmt of type : 8, means : fltp
this sample format has 4 bytes per sample ,
and its buffer size is equal to 8192
[libx264 @ 0x556496523b40] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
[libx264 @ 0x556496523b40] profile High, level 3.1, 4:2:0, 8-bit
[libx264 @ 0x556496523b40] 264 - core 164 r3098 7628a56 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - x264, the best H.264/AVC encoder - VideoLAN - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=12 keyint_min=1 scenecut=40 intra_refresh=0 rc_lookahead=12 rc=abr mbtree=1 bitrate=3000 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
nb_samples = 1024
Output #0, mp4, to ‘outfile.mp4’:
Stream #0:0: Video: h264, yuv420p, 1280x720, q=2-31, 3000 kb/s, 24 tbn
Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp, 192 kb/s
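
As a sanity check, the numbers in the log are consistent with each other: 1024 samples per frame, 2 channels (stereo) and 4 bytes per fltp sample give the reported buffer size of 8192 bytes. The helper below is a hypothetical illustration of that arithmetic only; in ffmpeg itself, av_samples_get_buffer_size() computes this kind of value (including alignment, which is ignored here).

```cpp
#include <cassert>
#include <cstddef>

// Total bytes needed for one audio frame, ignoring any alignment padding.
std::size_t audio_buffer_bytes(int nb_samples, int channels, int bytes_per_sample)
{
    assert(nb_samples >= 0 && channels > 0 && bytes_per_sample > 0);
    return static_cast<std::size_t>(nb_samples) *
           static_cast<std::size_t>(channels) *
           static_cast<std::size_t>(bytes_per_sample);
}
```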

Answering myself: I committed some changes today, and things are starting to work.

Currently it’s possible to record .mp4, .avi and .mkv files (basic support for all of them; the codecs are autodetected, since the code is ffmpeg 4.4.x compatible). Tested with a Logitech C920 webcam on Linux only.

See : Sources/step6 · master · Eric Bachard / AudioRecord · GitLab

To record video + sound, simply type ./muxer filename.${ext}, where ${ext} can be mp4, mkv or avi. For example, to record an .mp4:

./muxer filename.mp4

or, to record an .mkv:

./muxer anotherFilename.mkv

… etc. Hit the Escape key to stop recording.

The sound and video are perfectly synchronized, except for some echoes/noise appearing in some cases (mostly when the microphone is too close to the laptop, a sort of Larsen effect). Maybe Linux is just not well suited to sound. In any case, I probably still have a lot to do.

For the record, some (probably many) of the obvious remaining issues:

  • the recorded sound level is too low (no idea why yet);
  • mono only (stereo produces some strangeness, which needs further investigation);
  • the sound is not balanced, and a bit distorted;
  • a strange echo appears from time to time (it looks like it comes from pulseaudio on Linux, or something like that);
  • some values are randomly chosen; understanding them is a work in progress;
  • there remains a dark side: maybe some values are wrong, but I don’t have the knowledge yet, I’m learning …

Thanks a lot in advance for any help or suggestion.

… and any help is welcome :slight_smile:

EDIT:
IMPORTANT: while recording, setting the volume to 0 and the microphone recording level to 60% or 70% (max) seems to give good results.

(UPDATE) Recently, I made some progress:

  • removed the playback that was causing the echo
  • modified the way the AVFrame is filled in
  • my webcam (Logitech C920) can now easily record synchronized audio + video files: .avi and .mkv work, .mp4 is still broken (metallic sound?), and the code is online
  • now using the ALSA module
  • at launch, argv[1] contains the output file name and, once the extension is known, some parameters change (.mkv and .avi do not work the same way as .mp4)
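
The extension lookup mentioned in the last item could look like the sketch below. The function name file_extension is hypothetical and this is not the code from the repository; it only shows one way to get "mp4", "mkv" or "avi" out of argv[1] before choosing parameters.

```cpp
#include <cassert>
#include <string>

// Returns the text after the last '.' of the file name, or an empty
// string when there is no extension (a '.' in a directory name is ignored).
std::string file_extension(const std::string& filename)
{
    assert(!filename.empty());
    const std::size_t slash = filename.find_last_of("/\\");
    const std::size_t dot = filename.find_last_of('.');

    if (dot == std::string::npos || dot + 1 == filename.size())
        return "";
    if (slash != std::string::npos && dot < slash)
        return "";
    return filename.substr(dot + 1);
}
```

The result can then be compared with strcmp() or operator== to select the per-container settings.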

Now the sound recording seems to work far better, and I’d like to say a big THANK YOU to the Audacity developers: their software greatly helped me analyse what was happening with the sound!

The working code (Linux only, but it should work on Windows too with a few small changes) is here: Sources/step7.1 · master · Eric Bachard / AudioRecord · GitLab

N.B.: if you want to test, the dependencies are OpenCV (+ dev packages), ffmpeg 4.4.2 and SDL2. Once they are all installed, the build should be straightforward.

The most important changes concern get_audio_frame(), because the sound was not correctly formatted and had a DC offset. The workaround I found consists of a few conversions:

  • first, convert the current sample to int;
  • then convert it to float, dividing by 128 and adding an offset of -1;
  • last, multiply by the volume level and cast the result back to int;
  • once done, create pseudo-stereo by filling left and right with the same value (see below).
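
    int current_read_pos = read_pos;

    // Destination buffer, chosen according to the container
    int16_t *q;
    if (!strcmp(extension, "mp4"))
        q = (int16_t*)frame->extended_data[0];
    else
        q = (int16_t*)frame->data[0];

    const int channels = ost->enc->channels;

    for (int j = 0; j < frame->nb_samples; j++)
    {
        // value/128 - 1 recentres the sample around zero (removes the DC offset)
        int audioBuffer2 = (int)audioBuffer[current_read_pos + j];
        float audioBuffer3 = (float)audioBuffer2/128.0f - 1.0f ;

        // pseudo-stereo: the same value goes to every channel
        for (int k = 0; k < channels; k++)
        {
            q[channels*j + k] = (int)(audioBuffer3 * VOLUME_AMPLIFICATION_FACTOR);
        }
        ost->t += ost->tincr;
    }

    // Audio pts is counted in samples
    frame->pts     = ost->next_pts;
    ost->next_pts += frame->nb_samples;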
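
The per-sample conversion above can be isolated as the small sketch below. It rests on assumptions that the post does not confirm: that audioBuffer holds unsigned 8-bit samples (0..255, silence at 128), and that the destination is signed 16-bit. The function name and the gain parameter (standing in for VOLUME_AMPLIFICATION_FACTOR) are hypothetical. The clamp is an addition of mine: without it, a large gain would overflow the int16_t range, which is one possible source of distortion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Converts one unsigned 8-bit sample to signed 16-bit:
// recentre around zero, apply the gain, then clamp to the int16_t range.
int16_t u8_to_s16(uint8_t sample, float gain)
{
    assert(gain >= 0.0f);
    const float centred = sample / 128.0f - 1.0f;  // 0..255 -> [-1, 1)
    const float scaled = centred * gain;
    const float clamped = std::min(32767.0f, std::max(-32768.0f, scaled));
    return static_cast<int16_t>(clamped);
}
```

A gain of 32767 uses the full 16-bit range; anything above that only pushes more samples into the clamp.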

Remaining issues:

  • if the buffer size is too big, latency occurs
  • crackling and glitches appear because of the ring buffer. Maybe the current ring buffer should be replaced with a FIFO (queue) buffer to avoid this issue?
  • .mp4 can be recorded (the video is fine), but the sound is still broken: we can hear a sort of metallic sound (the samples probably need special treatment?)
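
The FIFO idea from the second item could be sketched like this. The capture side pushes whatever amount SDL delivers, and the encoder side only pops complete frames of a fixed size (the log above shows nb_samples = 1024 for AAC). The class FrameFifo is hypothetical, not thread-safe, and not taken from the repository; ffmpeg also ships its own AVAudioFifo (av_audio_fifo_write() / av_audio_fifo_read()), which may be a better fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical FIFO that accepts arbitrary chunks and hands out
// fixed-size frames only when enough samples have accumulated.
class FrameFifo {
public:
    explicit FrameFifo(std::size_t frame_size) : frame_size_(frame_size) {
        assert(frame_size > 0);
    }

    // Appends n samples at the back of the queue.
    void push(const int16_t* src, std::size_t n) {
        q_.insert(q_.end(), src, src + n);
    }

    // Moves exactly frame_size samples into out; returns false
    // (and leaves the queue untouched) when not enough are pending.
    bool pop_frame(std::vector<int16_t>& out) {
        if (q_.size() < frame_size_)
            return false;
        const auto end = q_.begin() + static_cast<std::ptrdiff_t>(frame_size_);
        out.assign(q_.begin(), end);
        q_.erase(q_.begin(), end);
        return true;
    }

    std::size_t pending() const { return q_.size(); }

private:
    std::size_t frame_size_;
    std::deque<int16_t> q_;
};
```

Unlike a ring buffer with fixed read and write positions, nothing is overwritten here; the queue simply grows if the consumer falls behind.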

Thanks in advance for any suggestion or any help !

To be continued :smile:

[Update]

It’s not perfect, but I made some progress with sound recording:

  • video and audio are now perfectly synchronized (I can’t detect any desynchronization when SDL_DequeueAudio() is used in step8);
  • the crackling is gone (the cause was the samples value: 128 is too low, while 2048 eliminated the crackling but caused latency);
  • the callback seems to work correctly now (first solution only, see below).
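
The trade-off on the samples value can be put into numbers: one buffer of N samples at a given frequency lasts N / freq seconds, which is the minimum delay that buffer adds. The helper below is a hypothetical illustration of that arithmetic, using the 48000 Hz rate from the log.

```cpp
#include <cassert>

// Duration of one audio buffer in milliseconds.
double buffer_latency_ms(int samples, int freq)
{
    assert(samples >= 0 && freq > 0);
    return 1000.0 * samples / freq;
}
```

At 48000 Hz this gives roughly 2.7 ms for 128 samples, 21.3 ms for 1024 and 42.7 ms for 2048.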

Now, I propose 2 solutions:

Next steps: improve the sound, optimize, create a class and integrate everything into miniDart (see: Eric Bachard / miniDart · GitLab). A lot of working code, allowing the audio and video sources to be selected separately, should be added this year.

Of course, I am fully aware that the way I create the audio frames is hackish (and wrong), but I never found any serious documentation about how to proceed, and the code can certainly be improved.

Thanks in advance for ANY HELP or SUGGESTION to improve !


ericb