Why is the SDL_MixAudio volume 128?

… since the actual range of audio data is different? It seems pointless.
And it would be worth specifying in the manual what the loudness refers to, e.g. the source, the destination, or everything… and what is the source and what is the destination… A lot of time and work is wasted deciphering the documentation…

It’s a gain control: 128 is a gain of 1.0 (0 dB), 64 is a gain of 0.5 (-6 dB), 32 is a gain of 0.25 (-12 dB) etc. It’s as simple as that.

0.5 is -3dB and 0.25 is -6dB

No, my figures were correct, -3dB is (approximately) half the power, not half the amplitude.

I should know, I worked for BBC Research & Development for 33 years!


Actually, it turns out there’s no standard for how gain is measured, so it happens that where I used to work it meant power, but where you did it means amplitude.

We are here talking about the volume parameter of the SDL_MixAudio() function. That controls the amplitude of the audio (128 is full amplitude, 64 is half-amplitude etc.). There is no ambiguity about what ‘gain’ means in this case, and the approximate dB equivalents are as I listed.

But since the max. level is usually 32767, that value would seem natural. Where did those 128s come from?

16-bit audio might be most common but it seems like SDL supports 8-bit and 32-bit audio formats as well. 128 is the largest value that can be stored in a signed 8-bit integer (using the negative range). I don’t know but my guess is that it probably has something to do with that.


That’s my guess too.

Fascinating career Richard! I remember the old BBC electronic clock and the Acorn Electron was my favourite of my home computers as a kid, so thanks for your input into my own life!


Also, why use negative numbers? The diaphragm of the speaker swings from 0 to x… How does the mixer work with negative numbers?

If you’re talking about the sample values themselves, it’s because sound is vibration in a medium, aka a wave, and to correctly represent it digitally you need negative values.

Driving the speaker is another matter.

edit: if you’re talking about volume in SDL_MixAudio(), according to the documentation the valid value range is 0-128. No negative numbers.

I’m talking about normal values.

So the AUDIO_U16 format is incorrect?

For AUDIO_S16:
If I play two sounds out of phase, after mixing there will be silence.
If I play these sounds on separate channels, I will hear both.
Should it be like that?

AUDIO_U16 presumably just shifts everything into positive range, so silence is 32767 instead of 0.

And yes, IIRC if you have two identical waveforms played 90 (180?) degrees out of phase on the same channel they cancel each other out.

Pedantically silence is 32768, but the difference is insignificant.

Why on the same channel should they cancel, and on separate channels add …?

silence is 32767 instead of 0

The sound that we hear comes from the change in amplitude. If all sample values are the same we won’t hear anything, no matter what that value is.

That said, for signed audio formats it makes sense to oscillate around 0 and approach that value when there is silence to avoid a “tick” sound when you start and stop playing, and to get the same range in both directions.

IIRC if you have two identical waveforms played 90 (180?) degrees out of phase on the same channel they cancel each other out.

Mixing two waves just means adding the amplitudes. If you have two sine waves where one is shifted half a period compared to the other, then the amplitudes would add up to zero.

Example:

wave1:

 4       ####                              ####
 3    ###    ###                        ###    ###
 2  ##          ##                    ##          ## 
 1 #              #                  #              # 
 0#----------------#----------------#----------------#----------------#
-1                  #              #                  #              #
-2                   ##          ##                    ##          ##
-3                     ###    ###                        ###    ###
-4                        ####                              ####

wave2:

 4                        ####                              ####       
 3                     ###    ###                        ###    ###    
 2                   ##          ##                    ##          ##  
 1                  #              #                  #              # 
 0#----------------#----------------#----------------#----------------#
-1 #              #                  #              #
-2  ##          ##                    ##          ##
-3    ###    ###                        ###    ###
-4       ####                              ####

If you “mix” these two waves, by adding the sample values at each point in time, you will get silence.

wave1 + wave2:

 4                                                                      
 3                                                                      
 2                                                                     
 1                                                                     
 0####################################################################
-1                                                   
-2                                                  
-3                                                
-4

Mixing a wave with itself will result in the original wave but with double amplitude (volume).

Real world sound is usually much more complicated and contains a combination of many different frequencies (I believe any sound can be described as a combination of sine waves) so the sort of cancellation effects described above are unlikely to happen by chance.

Why on the same channel should they cancel, and on separate channels add …?

If you have two different channels, doesn’t that mean you are outputting to two different speakers? If you use headphones you will hear both channels, one in each ear.

If you listen to both speakers with one ear, I still think you could get at least a partial cancellation effect if the distances between ear and speakers are right. The adding/cancellation behaviour described above is not just an arbitrary hardware limitation. It’s actually how sound works. Noise-cancelling headphones take advantage of this.

I know that in theory it works like that, but how does it relate to reality… We usually don’t listen with only one ear…

If I use AUDIO_U16, the waves don’t cancel, they just add. I repeat the question - is this incorrect?

What do we usually want to achieve by mixing sound? In reality, sounds do not come from one source but from many different sources, so it seems that in practice they should add rather than cancel…

If I use AUDIO_U16, the waves don’t cancel, they just add. I repeat the question - is this incorrect?

I have no experience with unsigned audio formats but it doesn’t necessarily sound wrong to me.

If we change my previous example to use unsigned/positive values…

wave1:

8       ####                              ####
7    ###    ###                        ###    ###
6  ##          ##                    ##          ## 
5 #              #                  #              # 
4#                #                #                #                #
3                  #              #                  #              #
2                   ##          ##                    ##          ##
1                     ###    ###                        ###    ###
0                        ####                              ####

wave2:

8                        ####                              ####       
7                     ###    ###                        ###    ###    
6                   ##          ##                    ##          ##  
5                  #              #                  #              # 
4#                #                #                #                #
3 #              #                  #              #
2  ##          ##                    ##          ##
1    ###    ###                        ###    ###
0       ####                              ####

… then adding them still results in all sample values being the same.

wave1 + wave2:

8####################################################################
7
6
5
4
3
2
1
0

This should mean silence. Is this not what you get?

EDIT: I looked at the code and SDL_MixAudio doesn’t seem to just add the unsigned sample values. It makes some adjustments to keep everything centered around the middle value, like sjr and rtrussell discussed above. If it worked like this in my unsigned wave example, and if we assume the range is 0-8 instead of 0-65535, then wave1 + wave2 would instead result in all samples having the middle value 4, which is still silence but a different value.


You draw nicely :slight_smile: and it turns out that you are right … I have to think about it …