Extract and manipulate the amplitude of every sample in an audio file


#1

Hi everyone,

I’m creating an audio player using C++ and SDL in Visual Studio.
I’ve been following an example provided by armornick to cover the basics of loading a WAV file and creating a callback function:

So I can play an audio file fine, but what I’m interested in now is being able to draw a waveform of that audio file. I’m very new to SDL, and most of my audio programming has been with Pure Data, so there’s been a lot to learn.

My thought process is that if I can collect the amplitude of each sample into an array, I can use graphics to render it. It would also be useful for creating some DSP functions if I had direct access to each sample.
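Here is a minimal sketch of that idea. It assumes the audio has already been decoded to signed 16-bit mono samples; stereo data would need de-interleaving first. compute_peaks is a hypothetical helper name, not an SDL function. It reduces the buffer to one peak value per display column, and each peak could then be drawn as a vertical line with whatever graphics API is in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce a mono buffer of signed 16-bit samples to one peak value per
   column (e.g. one per horizontal pixel of a waveform display).
   peaks[c] receives the largest absolute sample in column c (0..32768). */
void compute_peaks(const int16_t *samples, size_t count,
                   int32_t *peaks, size_t columns)
{
    for (size_t c = 0; c < columns; c++) {
        /* Split the buffer into `columns` roughly equal ranges. */
        size_t start = c * count / columns;
        size_t end = (c + 1) * count / columns;
        int32_t peak = 0;
        for (size_t i = start; i < end; i++) {
            int32_t v = samples[i];
            if (v < 0)
                v = -v; /* int32_t can hold 32768, the magnitude of -32768 */
            if (v > peak)
                peak = v;
        }
        peaks[c] = peak;
    }
}
```

Using one peak per column instead of every sample keeps the drawing cost proportional to the window width rather than the file length.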

The confusion I’m having now is that the audio is supposedly 16-bit, yet only 8-bit values are being passed from the buffer to the stream. Does the callback concatenate the two bytes to make it a single 16-bit value before passing it to the sound card? To collect an accurate reading of the sample’s amplitude in bytes, would I need to do this process within the callback function and then send it to an array?

For another example, say I wanted to have the audio play at half the amplitude. My thinking is that I would concatenate each byte in the callback with its subsequent byte to make a single unsigned 16-bit value, convert it to a signed 16-bit value, divide that value, and split it back into two separate bytes before passing it through to the sound card. My assumption is that every sample’s amplitude is stored in two bytes which are then combined.

Or perhaps I’ve missed something entirely, or all my facts are muddled up. Any clarity on the subject would be greatly appreciated!

In particular, it’d be helpful to know:
- How SDL creates a 16-bit value for each sample amplitude from the 8-bit values provided to the callback function
- What would be the best way of changing the amplitude of each sample

As well as anything else that may be helpful to know.

//some rough code for dividing the amplitude by 2:


Uint16 combinedSample;
long currentSample;

for (int s = 0; s <= streamLength; s += 2)
{
//concatenate every other byte with the subsequent one to produce a 16-bit binary
	combinedSample = audio_pos[s+1] | (audio_pos[s] << 8); 
//essentially convert it to signed, making the range from -32768 to 32767
	currentSample = combinedSample - 32768;
//divide while it is still signed so the waveform retains its shape
	currentSample /= 2;
//convert back to an unsigned 16 bit
	currentSample += 32768;
	currentSample = Uint16(currentSample);
	combinedSample = currentSample;

//split back into two separate bytes and assign the samples their new value /2
	audio_pos[s] = *((uint8_t*)&(combinedSample)+1);
	audio_pos[s + 1] = *((uint8_t*)&(combinedSample)+0);
}

I’ve put this in the callback function, trying to change the amplitude of a short snippet of speech. The speech is still discernible, indicating not everything is ruined, but it becomes excessively distorted, so I’m doing something wrong. The first part of this function seems suitable for simply obtaining the amplitude of the sample.

I’ve also tried using this function in main, applying it to the entire file before the callback is ever called, but the result is the same.

Thanks in advance for any help.


#2

It depends on the specs of your file. When you load a file, for instance with https://wiki.libsdl.org/SDL_LoadWAV_RW, the spec parameter will tell you the format of the audio. See https://wiki.libsdl.org/SDL_AudioSpec#Remarks
Even for 16-bit audio you have several possibilities: signed or unsigned, little- or big-endian.
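The format value itself encodes these properties as bit fields. According to the SDL_AudioFormat remarks on the SDL wiki, bits 0 to 7 hold the sample size, bit 8 flags float data, bit 12 flags big-endian, and bit 15 flags signed. The sketch below uses locally defined macros so it compiles without SDL. With SDL2 included, the SDL_AUDIO_BITSIZE, SDL_AUDIO_ISFLOAT, SDL_AUDIO_ISBIGENDIAN and SDL_AUDIO_ISSIGNED macros do the same job. The FMT_* names and describe_format are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit layout of an SDL2 audio format value (local names, no SDL needed). */
#define FMT_BITSIZE(f)      ((f) & 0xFF)
#define FMT_IS_FLOAT(f)     (((f) >> 8) & 1)
#define FMT_IS_BIGENDIAN(f) (((f) >> 12) & 1)
#define FMT_IS_SIGNED(f)    (((f) >> 15) & 1)

/* Same numeric value as SDL2's AUDIO_S16LSB. */
#define FMT_S16LSB 0x8010

/* Print a human-readable description of a format value. */
void describe_format(uint16_t f)
{
    printf("%d-bit %s %s, %s-endian\n", FMT_BITSIZE(f),
           FMT_IS_SIGNED(f) ? "signed" : "unsigned",
           FMT_IS_FLOAT(f) ? "float" : "integer",
           FMT_IS_BIGENDIAN(f) ? "big" : "little");
}
```

For example, describe_format(FMT_S16LSB) prints "16-bit signed integer, little-endian".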

Knowing this is crucial for converting two consecutive bytes into the correct sound value.
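As an illustration, here is one way to turn a byte pair into a sample value for each 16-bit integer layout. It assumes b points at the first byte of the sample in the buffer. The decode_* names are hypothetical. For the sign handling, the code subtracts 65536 when bit 15 is set rather than casting, so the result does not depend on implementation-defined conversions.

```c
#include <stdint.h>

/* Reinterpret a 0..65535 bit pattern as a two's complement value. */
static int sign16(int v) { return v >= 32768 ? v - 65536 : v; }

/* b points at the first byte of the sample in the buffer. */
int decode_s16lsb(const uint8_t *b) { return sign16(b[0] | (b[1] << 8)); }
int decode_s16msb(const uint8_t *b) { return sign16((b[0] << 8) | b[1]); }

/* Unsigned formats put silence at 32768, so subtract it to centre on 0. */
int decode_u16lsb(const uint8_t *b) { return (b[0] | (b[1] << 8)) - 32768; }
int decode_u16msb(const uint8_t *b) { return ((b[0] << 8) | b[1]) - 32768; }
```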

An additional difficulty is that the format of your mixer is fixed, so if you want to play two files with the same mixer, you have to make sure they have the same format, or convert them. The doc is not terribly clear on this, see https://wiki.libsdl.org/SDL_OpenAudioDevice#device


#3

Hi, thanks for the reply.

I’ve set wav_spec.format = AUDIO_S16LSB, i.e. signed 16-bit, little-endian. When leaving it at its default setting, printing wav_spec.format gave 32768, which also seems to indicate a signed 16-bit audio format.

I’ve now included Uint16 *stream16 = (Uint16 *)stream; in my callback function so that the stream buffer being sent to the sound card is in an unsigned 16-bit format (the SDL_memcpy function doesn’t appear to take signed values). Doing it this way appears to do the same thing as concatenating the 2 bytes, and the sound file plays without issue.

The problem now is, how do I manipulate this figure so that the file plays back at half the amplitude?
I’ve tried something like this:

for (int s = 0; s <= streamLength; s++)
{
	long currentSample = stream16[s]; //copy the sample to a signed 32-bit figure, large enough to convert an unsigned 16-bit to signed
	currentSample -= 32768; //replicate the range of a signed 16-bit figure
	currentSample /= 2; //divide this figure by 2
	currentSample += 32768; //return the figure to the format of an unsigned 16-bit figure
	stream16[s] = currentSample; //assign the newly calculated value to the stream data to be passed to the sound card
}

But I get the same distortion as if I simply divided each value by 2 as I did in the first place. What am I missing here?

Apologies about how noob-ish this probably sounds.

Thanks for any help.


#4

I may be wrong but I think that the callback can only deal with Uint8 buffers.

In your code you should check explicitly that wav_spec.format == AUDIO_S16LSB after loading the sound file.

Then you can modify the volume in your callback function, having in mind that the buffer is Uint8: so you have to concatenate your bytes in the correct order.


#5

I have multiple checks now throughout the code that confirm that the format remains AUDIO_S16LSB. I’ve tried considering the problem logically to no avail, and I’ve also had no luck through trial and error with the order of concatenating bytes.

I think I might be even more confused, so I’ve run a few tests to try and better understand the problem before trying to do any byte-concatenation. Some examples:

1.) If I do a for loop to go through every sample in the stream buffer and set it to 0, I get silence for the duration of the audio file, as expected.

2.) If I set every other sample to 0 (stream[0], stream[2], stream[4] etc.), the file still plays back as it did unmodified.

3.) If I repeat the above, only offsetting which sample is set to 0 by 1 (stream[1], stream[3], stream[5] etc.), then I get silence again. Tests 2 and 3 almost seem to indicate that half the data in the buffer (stream[even]) has no effect on the playback of the audio file.

4.) Setting every other pair of samples to 0 (stream[0] and stream[1] = 0, stream[2] and stream[3] remain unchanged etc.), the audio plays back fine, but only through the right ear. This makes sense from reading the docs: audio is sent in left, right order.
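Test 4 is consistent with the interleaved layout described in the SDL_AudioSpec remarks. For 16-bit stereo, each frame is four bytes holding the left sample and then the right sample, each stored low byte first under AUDIO_S16LSB. Here is a sketch of separating the channels under that assumption. deinterleave_s16lsb is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved S16LSB stereo byte buffer into left and right
   sample arrays. Each frame is 4 bytes: left low, left high,
   right low, right high. Returns the number of frames written. */
size_t deinterleave_s16lsb(const uint8_t *bytes, size_t len,
                           int16_t *left, int16_t *right)
{
    size_t frames = len / 4;
    for (size_t f = 0; f < frames; f++) {
        const uint8_t *p = bytes + 4 * f;
        int l = p[0] | (p[1] << 8);
        int r = p[2] | (p[3] << 8);
        /* Reinterpret the 0..65535 bit pattern as two's complement. */
        left[f]  = (int16_t)(l >= 32768 ? l - 65536 : l);
        right[f] = (int16_t)(r >= 32768 ? r - 65536 : r);
    }
    return frames;
}
```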

So if I have direct access to the amplitude of every sample in the stream buffer, why can I not simply perform stream[i] /= 2 on every sample and have the audio play back at half the amplitude? Instead it plays back louder than it did when unaltered, with all this horrible distortion. Is there something I’m missing?

Thanks again for all the help.


#6

Well, what you describe is perfectly consistent with what I just said above: the callback wants Uint8.
So, since you defined a Uint16 stream, clearly only the first byte is used. This explains 1,2,3.
4 is also logical, as you said.

So, again, my suggestion is: stick to a Uint8 stream, and compute values from stream[0] and stream[1].
I’m not a C programmer, so I’m not sure about the best way to convert to 16 bits. If your target variable is defined as signed 16-bit, it is enough to concatenate the bytes: stream[0] | (stream[1]<<8).

If your target variable is a regular int, you have to take care of the sign with two’s complement. Here is what I would use, in OCaml (I’m sure you can understand the language):

let bytes_to_value_s16le b1 b2 =
  let value = b1 lor (b2 lsl 8) in
  if value land 32768 <> 0 (* negative sign *)
  then -1 - value lxor 65535
  else value
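For readers more at home in C, a direct translation of the OCaml function above might look like the following sketch. As in the original, b1 is assumed to be the low byte and b2 the high byte. For a negative value, -1 - (value ^ 0xFFFF) works out to value - 65536.

```c
#include <stdint.h>

/* C version of bytes_to_value_s16le: b1 = low byte, b2 = high byte. */
int bytes_to_value_s16le(uint8_t b1, uint8_t b2)
{
    int value = b1 | (b2 << 8);
    if (value & 0x8000)              /* negative sign */
        return -1 - (value ^ 0xFFFF);
    return value;
}
```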

#7

Are you sure it sounds exactly the same? Since your audio data is supposedly 16 bits, you will have zeroed the LS 8-bits of every sample; the expected result would be an increase in quantising noise (a kind of ‘graininess’ in the reproduced sound).

I would have expected the result to be very quiet and noiselike (since you have discarded the MS 8-bits of the audio data) but not completely silent.

If zeroing the LS byte really is making no difference at all, and zeroing the MS byte results in total silence, it would suggest that your source audio data is actually only 8-bits but has been padded to 16-bit samples. You could dump the WAV file in hex to confirm that.
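To make that concrete, the sketch below applies both experiments to a single sample. It assumes AUDIO_S16LSB, so even-indexed bytes in the stream are low bytes and odd-indexed bytes are high bytes. The helper names are hypothetical.

```c
#include <stdint.h>

/* Zeroing the low byte (stream[even]) keeps the coarse waveform and adds
   an error of at most 255; zeroing the high byte (stream[odd]) leaves a
   value in 0..255, i.e. near-silent noise. */
int16_t zero_low_byte(int16_t s)  { return (int16_t)(s & ~0xFF); }
int16_t zero_high_byte(int16_t s) { return (int16_t)(s & 0xFF); }
```

For 0x1234 (4660), zeroing the low byte gives 0x1200 (4608). That small error shows up as added noise. Zeroing the high byte leaves 0x34 (52), which is close to silence.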


#8

Casting the stream to 16-bits should be more efficient, unless the compiler notices what your code is doing and achieves the same result as would a cast.


#9

You’re right, rtrussell. I tried again with some studio-grade reference headphones and a dedicated headphone amp. Zeroing the LS 8 bits introduces a background noise, similar to that of a recording made in an untreated room.

Zeroing the MS 8-bits resulted in what sounded like just this background noise. Just to double check, having every sample set to 0 did result in complete silence, no added noise.

I don’t think I actually need to cast the stream to 16-bit as the results are the same either way - it appears specifying the format upon loading the WAV is sufficient (the above tests were done without casting to 16-bit).

To ensure that there was no padding (previously I was using a free sound effect I’d found online), I replaced it with a sound I had recorded myself through a 24-bit 192kHz audio interface and exported from my digital audio workstation at 16-bit, 44.1kHz, but the same issue still persists.

Based on the advice given here, I included the following in my callback function:

for (int s = 0; s <= streamLength; s += 2)
	{
		Uint16 currentSample = stream[s] | (stream[s + 1] << 8);
		//currentSample *= 0.5;
		stream[s] = *((uint8_t*)&(currentSample)+0);
		stream[s + 1] = *((uint8_t*)&(currentSample)+1);
	}

When the sample multiplication is commented out, playback works as expected, no noise, no graininess, no distortion, which seems to indicate that the two bytes are being concatenated, and then being split back from 16-bit to the correct bytes. However, trying to divide this figure by 2 results in the same distorted signal, louder than it was when unchanged. That last part is the biggest mystery to me; if the numerical values indicating the amplitude of the audio are being reduced by a significant magnitude, how could the sound be louder?

If anyone knows how SDL_MIX_MAXVOLUME or the functions in the SDL_mixer library control volume (the docs don’t particularly explain it), perhaps I could try implementing those methods instead?
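For reference, SDL2’s SDL_MixAudioFormat takes a volume from 0 to SDL_MIX_MAXVOLUME (128) and mixes the scaled source into a destination buffer, clipping the result. The code below is not SDL’s implementation. It is only a sketch of the same integer-volume idea, applied in place to signed 16-bit samples. MAX_VOLUME and scale_volume_s16 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VOLUME 128 /* same value as SDL2's SDL_MIX_MAXVOLUME */

/* Scale signed 16-bit samples in place by volume / MAX_VOLUME.
   volume is clamped to 0..MAX_VOLUME, so the result never overflows. */
void scale_volume_s16(int16_t *samples, size_t count, int volume)
{
    if (volume < 0)
        volume = 0;
    if (volume > MAX_VOLUME)
        volume = MAX_VOLUME;
    for (size_t i = 0; i < count; i++)
        samples[i] = (int16_t)((samples[i] * volume) / MAX_VOLUME);
}
```

Multiplying before dividing keeps precision. The product fits comfortably in an int, since 32768 * 128 is only 4194304.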

Thanks everyone for the help.


#10

Hi Tewii,

did you try my previous suggestion to take care of the sign before dividing by 2?
(and then, after this, convert back, of course)

Again, I’m not a C programmer, so there must be better ways to convert Unsigned16 to Signed16, but I suppose you get the idea.

I will be very busy the next few days, so I hope rtrussell can help you with this. He’s much more knowledgeable than I am anyway.


#11

I think unsigned 16-bit audio is unusual (it implies that ‘silence’ is the value 0x8000 rather than 0x0000) but if that’s really what you have then converting it to signed is simply a case of inverting the MSB. That can be done using the exclusive-or operator.
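Here is a minimal sketch of the exclusive-or conversion in both directions. The function names are hypothetical. Flipping bit 15 maps unsigned silence, 0x8000, to 0, and it is equivalent to subtracting 32768 modulo 65536.

```c
#include <stdint.h>

/* Unsigned -> signed: flip the MSB, then read the result as two's complement. */
int16_t u16_to_s16(uint16_t u)
{
    int v = u ^ 0x8000;                            /* 0x8000 -> 0x0000 */
    return (int16_t)(v >= 32768 ? v - 65536 : v);
}

/* Signed -> unsigned: take the raw bit pattern and flip the MSB back. */
uint16_t s16_to_u16(int16_t s)
{
    return (uint16_t)((uint16_t)s ^ 0x8000);
}
```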

But the OP doesn’t need to do any conversion, his audio samples are signed to start with (AUDIO_S16LSB). The problem seems to be that he has declared his buffer in the callback function as containing unsigned samples (Uint16 *) which of course means that when dividing-by-two the wrong calculation is performed.

For example if the audio sample is 0xC000 (-16384) halving it should give the value 0xE000 (-8192) but as a result of declaring the data as unsigned it will produce 0x6000 (+24576) instead! Casting the stream to (Sint16 *) or (signed short*) should result in the division-by-two reducing the amplitude as expected:

for (int s = 0; s < streamLength; s += 2)  // '<' keeps stream[s + 1] inside the buffer
	{
		Sint16 currentSample = stream[s] | (stream[s + 1] << 8);  // low byte first (LSB)
		currentSample *= 0.5;
		stream[s] = *((uint8_t*)&(currentSample)+0);  // assumes a little-endian host
		stream[s + 1] = *((uint8_t*)&(currentSample)+1);
	}
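The 0xC000 example can be checked directly. This sketch holds the raw 16-bit pattern and halves it once as unsigned and once as signed (two’s complement). The function names are hypothetical.

```c
#include <stdint.h>

/* Halve the bit pattern as if it were an unsigned sample. */
uint16_t halve_as_unsigned(uint16_t bits) { return (uint16_t)(bits / 2); }

/* Halve the bit pattern as a signed (two's complement) sample. */
uint16_t halve_as_signed(uint16_t bits)
{
    int s = bits >= 32768 ? (int)bits - 65536 : (int)bits;
    s /= 2;               /* e.g. -16384 / 2 == -8192 */
    return (uint16_t)s;   /* back to the raw bit pattern */
}
```

halve_as_unsigned(0xC000) returns 0x6000, a large positive value. halve_as_signed(0xC000) returns 0xE000, i.e. -8192. This is why the unsigned version sounds louder and distorted.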

But I still think creating a 16-bit value from two 8-bit values is an unnecessary overhead (unless the compiler optimises it out) because you could declare it as an array of signed 16-bit samples in the first place and simply access it as stream[s]:

Sint16 *sstream = (Sint16 *) stream ;
for (int s = 0; s < streamLength / 2; s++)
    sstream[s] /= 2 ;
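For gains other than one half, the result can exceed the 16-bit range, so a clamp is needed. Below is a sketch shaped like an SDL audio callback (userdata, stream, length in bytes). It makes three assumptions:

- the device was opened with AUDIO_S16LSB on a little-endian machine;
- the userdata field of the SDL_AudioSpec points at a float gain;
- with SDL, the parameter types would be Uint8 * and int.

gain_callback is a hypothetical name.

```c
#include <stdint.h>

/* Callback-shaped gain stage: userdata points to a float gain,
   len is the buffer length in bytes (2 bytes per S16 sample). */
void gain_callback(void *userdata, uint8_t *stream, int len)
{
    float gain = *(const float *)userdata;
    int16_t *samples = (int16_t *)stream;
    int count = len / 2;
    for (int i = 0; i < count; i++) {
        float v = samples[i] * gain;
        /* Clamp so gains above 1.0 clip instead of wrapping around. */
        if (v > 32767.0f)
            v = 32767.0f;
        if (v < -32768.0f)
            v = -32768.0f;
        samples[i] = (int16_t)v;
    }
}
```

With a gain of 0.5, this behaves like the sstream[s] /= 2 loop above. With a gain of 2.0, samples that would exceed the 16-bit range are held at 32767 or -32768 instead of wrapping around to the opposite sign.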

#12

Hello, sorry it’s been a few days.
Thank you so much, this works great! Both methods work as intended, so if the latter one is more efficient (which, from some basic tests, it seems to be) I’ll opt for that one.

This has all been very informative. It’s reassuring to know that I had the general concept correct - now I just need to brush up on some of my computer science and C knowledge!

Thanks again to both of you.