Setting volume while decoding AVI audio chunks

Hi all.

First off, thanks a lot for this library. I’ve been playing around with SDL2 for a bit more than a year now and I’m still far from having grasped all of its concepts.

The project I’m working on at the moment is the rewrite of a game, Bermuda Syndrome, which was published by Gregory Montoir a few years ago. I’m currently stuck on enhancing sound playback in the custom AVI player. The player itself works perfectly fine; I just have one question about audio.

The game uses SDL 2.0.9 and SDL_mixer 2.0.4. I have implemented a volume setting (from 1 to 128) on the command line and it works great with Mix_Chunk and Mix_Music playback. The custom AVI player has been completely rewritten and I’d like to use SDL_QueueAudio() to send the audio samples fetched from the AVI sound chunks. The thing is, I’d like to set the volume according to the game’s main volume parameter.

I have checked the Mix_xxx functions, which do have a volume parameter, but those work on chunks and as far as I can tell they play an audio chunk and then stop, whereas I need a continuous buffer feed. That’s why I’ve used SDL_QueueAudio(). The catch is that that function doesn’t allow changing the volume of the raw audio samples… unless I write a callback, which is exactly what I’d like to avoid.

I’ve also checked SDL_MixAudioFormat(), but it takes two buffers and I have only one: the one containing the raw samples from the AVI audio chunk. Writing my own callback would require me to account for different audio formats… which is exactly what SDL already does for me.

I may well have missed something, so does anyone have any hints?

Thanks a lot in advance.

To use SDL_MixAudioFormat() as a simple volume control, allocate a temporary ‘destination’ buffer the same size as the one containing your raw samples, and initialise it with ‘silence’ (usually zero; SDL reports the correct value for your format in the silence member of SDL_AudioSpec), for example using memset().

Thank you very much, rtrussel, for the hint. In my case the samples are unsigned 8-bit integers. And thanks to your response I’ve just understood the purpose of the silence member in struct SDL_AudioSpec :sunglasses: .


Well, I’m not done yet. I don’t know what I did wrong, if anything, but the volume doesn’t change at all when I use SDL_MixAudioFormat(). I’ve checked that my silence buffer is not empty, that it’s large enough and filled with the silence value, that the silence value matches the 8-bit unsigned sample format, and that the target buffer is writable. I even set the volume to zero and it still plays as loud as if I hadn’t done anything. I’m using std::vector for the sound buffers.

Can that function reduce the volume? Or is it only for making the volume louder?

Okay, after some quick investigation, I finally found out why SDL_MixAudioFormat() doesn’t change the volume of my raw samples. The reason is that the function adjusts the volume of the samples in the source buffer and then mixes them into the destination buffer. In my code the source buffer was the one filled with the silence value (because I wanted to minimize the number of byte copy/move loops), which results in no audible volume change at all.

However this is a bit annoying because it implies the following every time an audio chunk is to be enqueued:

  1. Read the raw source chunk into a buffer (1 buffer iteration loop)
  2. Fill the destination buffer with silence (1 buffer iteration loop)
  3. Mix both buffers (1 buffer iteration loop)
  4. Enqueue the resulting buffer (1 buffer iteration loop)

This makes 4 buffer iteration loops in total, which is far from optimal. Since I’d rather not reinvent the wheel, and from what I’ve seen in the source code, a better option seemed to be to mix the audio chunk with itself after dividing the volume setting by two. I’d still have 3 buffer iteration loops but it’s the best I can do so far.

That said, it would be interesting, IMHO, if SDL2 provided a way to change the volume when enqueuing audio buffers, to keep the number of buffer copy/move operations as small as possible.

EDIT: And BTW, I believe it would also be best if the documentation reflected that observation. It currently says:

This takes an audio buffer src of len bytes of format data and mixes it into dst, performing addition, volume adjustment, and overflow clipping.

which wrongly suggests that the volume adjustment takes place as part of the mixing. It first adjusts the volume of the source buffer and then mixes it into the destination buffer, performing addition and overflow clipping.

Just my 2¢.

Personally I don’t think it does; I have always understood it to mean that it performs the operation destination = clip(destination + source * volume). In my earlier reply I said that you should “allocate a temporary ‘destination’ buffer … and initialise it with ‘silence’”, which I hoped would make that clear.

I would advise against that, especially if you are using only 8-bit samples, because it will increase the quantising noise. Have you done any profiling to confirm that your worry about inefficiency is justified? If the buffer copies are done internally using memcpy() they probably take very little time compared with the gain adjustment and clipping.

Well, it did make it clear that I needed a silence buffer :smile:. My destination buffer was already allocated, so I probably translated your suggestion into “allocate some buffer initialised with silence”.

Can you elaborate? Is it because of the internal mix8[] array?

I may be splitting hairs, but “worry” or “inefficiency” is a bit of an exaggeration: “it could be better” doesn’t necessarily mean “it’s bad”, you know. Anyway, why would I need a profiler to confirm the obvious: the less you move data around, the better? If I can (above all: quite easily) spare one buffer copy out of 4, winner-winner, chicken dinner, right?

As a matter of fact, since I tweaked the video player code it runs better on a Raspberry Pi 2 B+; before that it was stuttering like crazy. It still suffers from a bit of lag though, so if I can avoid cheap buffer copies, well, you know… (I mean “cheap” as in “easily avoided”.)

Disclaimer: I’m not Australian. Just a “fan” of EEVblog Dave Jones.

I didn’t entirely understand your proposal, but it sounded as though, when no volume adjustment was required, it would be halving the sample values and then doubling them again. Obviously if you do that with integers, the quantising noise increases because the LSB gets thrown away (the 8-bit data becomes equivalent to 7-bit data).

If there is no undesirable side-effect, yes, but I haven’t so far understood how you can reduce the number of copies whilst achieving the same end result.

In fact it’s the other way around: halving the volume (a constant) because the samples would otherwise each be doubled (it turns out SDL_MixAudioFormat() does not work like that at all). Of course this becomes useless when little to no volume adjustment is to be done, which is a case I can account for.

That said, I now see what you mean about quantisation noise. However that becomes true at low volume levels anyway.

For instance, when copying samples from the file audio buffer to the destination buffer while taking care of the volume adjustment: something like *dst++ = *src++ * volume. However I’d need a callback for that.

Anyway, mixing a buffer with itself is totally pointless, as the volume is only adjusted on the source buffer, not on the result: I’d get the initial samples at the initial volume level plus the same samples at the adjusted volume level, hence a louder result regardless. I indeed need to silence the destination buffer and constantly mix the raw source into it if I want to rely on SDL_MixAudioFormat(). It’s that filling with silence that could easily be avoided, but then I’d have to use a callback and account for every possible audio format, as I said earlier.