OT: How do you transpose the pitch of a WAV/sample?

Andreas_Schiffler · May 6, 2011, 4:59am

And there is the “little brother” libresample (no pitch shifting):
Digital Audio Resampling Home Page (LGPL)

On 5/5/11 5:05 PM, Patrick Baggett wrote:

libsamplerate does sample rate conversion, if you don’t mind the GPL license.

Patrick

On Thu, May 5, 2011 at 5:09 PM, SparkyNZ <pj74 at xtra.co.nz> wrote:

Rainer Deyke wrote:

Wouldn't it make more sense to iterate over the indexes to newsample?

I wouldn't have thought so - the resulting sample would be shorter
than the original in some cases and longer in others, depending on
whether the sample is transposed up or down. It makes sense to me to
use the original length. Yeah?

_______________________________________________
SDL mailing list
SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

SparkyNZ · May 6, 2011, 9:46am

OK, well, the first algorithm suggested isn’t working for me, so I’ll try one of the other ones. Here’s what I knocked up with the first suggestion. It produces a pulsating waveform with a higher amplitude than my starting triangle wave, so either there’s a bug or the approach is wrong.

Code:
// short abWaveC[] : the original triangle wave is stored here. Playing this sample data produces a nice triangle wave, no problem.
// short abWave[]  : this is where the transposed triangle wave should be stored.

float Interpolate( float f, float a, float b, float fRatio )
{
    // interpolate(a,b,r) does something like return (ai+b(1-i))/2.
    return ( a * f + b * ( 1 - f ) ) / 2;
}

void Transpose( void )
{
    memcpy( abWaveC, abWave, NUM_SAMPLES * sizeof( short ) );
    memset( abWave, 0, NUM_SAMPLES * sizeof( short ) );

    float freqC = 523.25F;
    float freqD = 587.33F;
    float fRatio = freqD / freqC;

    int k = 0;

    for( float f = 0; f < NUM_SAMPLES; f += fRatio )
    {
        abWave[ k ++ ] = Interpolate( f, abWaveC[ (int) f ], abWaveC[ (int) f + 1 ], f - (float) floor( f ) );
    }

    g_NewWaveSize = k;
}

I’ll go try one of the other suggestions now…

SparkyNZ · May 6, 2011, 9:59am

Rainer Deyke wrote (in reply to SparkyNZ, 5/5/2011 16:09):

The opposite - iterating over the original samples - is also possible,
but it would look completely different from the code to which I was
responding. Something like this:

/* new_samples must be initialized to zeros. */
float ratio = (float) org_samples_length / new_samples_length;
for (int i = 0; i < org_samples_length; ++i) {
    float new_pos = i / ratio;
    float fractional = new_pos - floor(new_pos);
    new_samples[(int) new_pos] += org_samples[i] * (1.0 - fractional) * ratio;
    new_samples[(int) new_pos + 1] += org_samples[i] * fractional * ratio;
}

Hi Rainer… Using your code above, I tried this:

void Transpose( void )
{
    memcpy( abWaveC, abWave, NUM_SAMPLES * sizeof( short ) );
    memset( abWave, 0, NUM_SAMPLES * sizeof( short ) );

    /* new_samples must be initialized to zeros. */
    //float ratio = (float) org_samples_length / new_samples_length;
    float freqC = 523.25F;
    float freqD = 587.33F;
    float fRatio = freqC / freqD;

    for( int i = 0; i < NUM_SAMPLES; ++ i )
    {
        float new_pos = i / fRatio;
        float fractional = new_pos - floor( new_pos );

        abWave[(int) new_pos     ] += abWaveC[i] * (1.0 - fractional) * fRatio;
        abWave[(int) new_pos + 1 ] += abWaveC[i] * fractional * fRatio;
    }
}

…and new_pos ends up bigger than NUM_SAMPLES (the original number of samples). Since I want to transpose a wave from C up to D, it should occupy fewer samples, not more. For the sake of this test I increased the size of the resulting wave array, but I can’t hear anything in the playback. Do you have any idea what’s wrong?

Now on to the next algorithm… it looks a lot bigger… let’s see if that works.

Gato · May 6, 2011, 9:49am

Without having tested the code, I’d change the following:

No need to divide by 2; swap “a” and “b”; and you also need to use
fRatio as the interpolating parameter, not “f”. Try:

float Interpolate( float f, float a, float b, float fRatio )
{
   // interpolate(a,b,r) does something like return (b*i+a*(1-i)).
   return ( b * fRatio + a * ( 1 - fRatio ) );
}

or even better:

float Interpolate(float a, float b, float fRatio )
{
   return (b-a)*fRatio+a;
}

and call it like so (without the first parameter, which isn’t used):

abWave[ k ++ ] = Interpolate( abWaveC[ (int) f ], abWaveC[ (int) f + 1 ], f - (float) floor( f ) );
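To see why the first version produced a louder, pulsating wave, here is a small comparison sketch (the names lerp and original_interpolate are illustrative, not from the thread). Gato's form returns a at t = 0 and b at t = 1; the original formula is fed the running position f instead of the fractional part, so at f = 100 it returns a value far outside the range of the two neighbouring samples.

```c
#include <assert.h>

/* Linear interpolation: a at t = 0, b at t = 1 (Gato's second form). */
static double lerp(double a, double b, double t)
{
    assert(t >= 0.0 && t <= 1.0);
    return (b - a) * t + a;
}

/* The formula from the first post, kept only for comparison:
   weights are swapped, halved, and driven by the loop position f. */
static double original_interpolate(double f, double a, double b)
{
    return (a * f + b * (1 - f)) / 2;
}
```

For neighbouring samples 10 and 20, lerp stays between them, while original_interpolate(100.0, 10.0, 20.0) gives -490.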

-Gato (replying to SparkyNZ, 05/06/2011 04:46 AM)

SparkyNZ · May 6, 2011, 10:09am

Kenneth Bull wrote (5 May 2011 22:52):

struct SampleSet
{
    sample *data;      ///< The samples.
    unsigned count;    ///< The number of samples in \a data.
    unsigned capacity; ///< The maximum number of samples \a data can hold.
    double rate;       ///< The sample rate in samples / second.
};

template <typename T> T min(T x, T y) { return (x>y)? y: x; }
template <typename T> T max(T x, T y) { return (y>x)? y: x; }

…
if (out.rate == in.rate)
{
    // min, not max: copy only as many samples as out.data can hold.
    out.count = min(out.capacity, in.count);
    memcpy( out.data, in.data, out.count * sizeof( sample ) );
    return;
}

Hi Ken. I’m not sure this is what I’m after either. The comment for rate says “samples / second”. In my code the rate is always 44100 samples per second, so the incoming rate will always be the same as the required outgoing rate, and it’s just going to copy the incoming samples and bail.

SparkyNZ · May 6, 2011, 10:19am

Gato wrote:
Without having tested the code, I’d change the following:

No need to divide by 2, swap “a”, and “b”, and also, you need to use fRatio as the interpolating parameter, not “f”. Try:

float Interpolate( float f, float a, float b, float fRatio )
{
// interpolate(a,b,r) does something like return (bi+a(1-i)).
return ( b * fRatio + a * ( 1 - fRatio ) );
}

Well done Gato! This, combined with the first algorithm, does the trick and it works. What’s more, the first algorithm makes sense to me. I’ve only tried the first interpolation suggestion, but that works a treat! Are the other two suggestions merely optimisations, or are they supposed to yield better results?

SparkyNZ · May 6, 2011, 10:37am

OK, I have uploaded the VC++ 6 project here: http://www.mediafire.com/?444665xa4efn8sz

The .exe is also included if anybody would just like to run it to hear the slight pitch increase that I refer to.

Any ideas what could be wrong?

Thanks for all your help so far, everyone!

Patrick_Baggett · May 6, 2011, 12:03pm

My understanding is that you’re trying to increase the pitch:

float freqC = 523.25F;
float freqD = 587.33F;
float fRatio = freqD / freqC;

fRatio is approximately 1.1225, an increase of about 12%. Say you have 100
sample points representing a sound that plays for a time ‘T’. If you add
fRatio > 1.0 to the loop counter, you move across the source wave faster and
end up with fewer than 100 points. In effect, you’ve compressed the wave into
a shorter time (T / fRatio), so the pitch rises. That’s no surprise if you’ve
ever heard human voices sped up (usually fRatio = 2.0): you get ‘chipmunk’
voices that speak twice as fast (T / 2.0 = half the time, so they speak twice
as fast).

If fRatio < 1.0 in the original example, you end up with more than 100
points, because the loop runs f from 0 to 100 but the counter is incremented
by less than 1 each time. In effect, you’ve extended the time it takes to
play those same samples. This makes sense: T’ = T / fRatio, and since
fRatio < 1.0, T’ is larger than T, so the pitch is lower. It’s really obvious
when you play a human voice back at 1/2 speed: everyone sounds like ogres.

I hope that helps.

Patrick

Andreas_Schiffler · May 6, 2011, 1:11pm

One more resource I found… it seems you might need some heavy-duty
math to get the job done right.
http://www.dspdimension.com/admin/time-pitch-overview/

Kenneth_Bull · May 6, 2011, 4:13pm

You’d use it like this:

out.rate = in.rate;
in.rate *= 523.25 / 587.33;
resample(&out, &in);

You’d do this for sample rate conversion:

out.rate = 22050.0;
resample(&out, &in);

… assuming you’ve already allocated space in ‘out’ and set up ‘in’, of course.

All the same, no guarantee that’ll work. It’s untested, and I’ve
probably got a bug in there somewhere.

Kenneth_Bull · May 6, 2011, 4:54pm

Here’s another one:

#include <cstring>  // memcpy

// 'sample' is not defined in the post; it is assumed to be a numeric
// typedef declared elsewhere, e.g. typedef short sample;

struct SampleSet {
    sample* data;      ///< The samples.
    unsigned count;    ///< The number of samples in \a data.
    unsigned capacity; ///< The maximum number of samples \a data can hold.
    double rate;       ///< The sample rate in samples / second.
};

template <typename T> T min(T x, T y) { return (x>y)? y: x; }
template <typename T> T max(T x, T y) { return (y>x)? y: x; }

// Average of the input over the interval [start, end) (a box filter).
sample interpolate(
    const sample* in,
    double start,
    double end
) {
    double range = end - start;
    double scale = 1.0 / range;
    int c = (int) start;
    int n = (int) end;
    double out;
    if (c == n) {
        // The whole interval lies inside one input sample.
        out = in[c];
    }
    else {
        // Partial first sample (the post's in[c++] * (c - start) reads and
        // modifies c without sequencing, so c is incremented separately).
        out = in[c] * (c + 1 - start) * scale;
        ++c;
        // Whole samples in between.
        while (c < n) {
            out += in[c++] * scale;
        }
        // Partial last sample; skipped when end falls exactly on c,
        // which also avoids reading one past the end of the input.
        if (end > c) {
            out += in[c] * (end - c) * scale;
        }
    }
    return (sample) out;
}

int resample(
    SampleSet* out,
    const SampleSet* in,
    sample (*interp)(const sample*, double, double) = interpolate
) {
    if (out->rate == in->rate) {
        out->count = min(out->capacity, in->count);
        memcpy(out->data, in->data, out->count*sizeof(sample));
        return 0;
    }
    // Each output sample averages 'ratio' input samples,
    // so out->count ends up close to in->count / ratio.
    double ratio = out->rate / in->rate;
    double ic = 0.0, ic2 = ratio;
    unsigned oc = 0;
    while (oc < out->capacity) {
        if (ic2 >= in->count) {
            out->data[oc] = interp(in->data, ic, (double) in->count) *
                ((double) in->count - ic) / ratio;
            out->count = oc + 1;
            return 0;
        }
        out->data[oc] = interp(in->data, ic, ic2);
        ic = ic2;
        ic2 += ratio;
        ++oc;
    }
    out->count = oc;
    return 1;  // ran out of output capacity
}

Kenneth_Bull · May 6, 2011, 4:56pm

Both of those should be returning 0 after the out.rate == in.rate bit.

SparkyNZ · May 7, 2011, 1:11am


Thanks again Ken - I’ll give it a try later when the kids go to bed. I hope it works. It hadn’t occurred to me, but I have the source code for MilkyTracker - I should have looked at how they handle the portamento commands. I’ll let you know how your code works later on too.

SparkyNZ · May 7, 2011, 4:03am

Kenneth, that last version you sent also works. I can’t tell whether it has any glitches like the smaller algorithm, because for some reason it reduces the volume of my original C wave. It is definitely transposed up and sounds clean from what I can hear, but it is soooo quiet, even with the volume turned right up, that I’ll have to investigate why first.

SparkyNZ · May 7, 2011, 4:51am

OK everyone, here is the algorithm that works. The changes suggested by Adrien/Gato weren’t working because of the precision of the floating-point math. I changed the float variables to doubles and the transpose function now works perfectly!

Thank you for your help everyone - I hope the code below comes in handy for somebody else in the future…

Code:
#include <math.h>    // floor
#include <string.h>  // memcpy, memset

//--------------------------------------------------------------------------------------------
// Interpolate()
//--------------------------------------------------------------------------------------------
double Interpolate( double a, double b, double fRatio )
{
    return ( b - a ) * fRatio + a;
}

//--------------------------------------------------------------------------------------------
// Transpose()
//--------------------------------------------------------------------------------------------
void Transpose( void )
{
    // Keep a copy of the current wave in abWaveC, then clear abWave for the output.
    memcpy( abWaveC, abWave, NUM_SAMPLES * sizeof( short ) );
    memset( abWave, 0, NUM_SAMPLES * sizeof( short ) );

    // Plain double literals: an F suffix would round the values to float first.
    double freqC = 523.25;
    double freqD = 587.33;
    double fRatio = freqD / freqC;

    int k = 0;

    // Stop one sample early so abWaveC[ (int) f + 1 ] never reads past the end.
    for( double f = 0; f < NUM_SAMPLES - 1; f += fRatio )
    {
        abWave[ k ++ ] = Interpolate( abWaveC[ (int) f ], abWaveC[ (int) f + 1 ], f - floor( f ) );
    }

    g_NewWaveSize = k;
}

SparkyNZ · May 7, 2011, 5:07am

Now that I’ve got transposition sorted, I’ve been able to amuse my son with some up-and-down pitch sliding using the same code…

//--------------------------------------------------------------------------------------------
// Slide()
//--------------------------------------------------------------------------------------------
void Slide( void )
{
    memcpy( abWaveC, abWave, NUM_SAMPLES * sizeof( short ) );
    memset( abWave, 0, NUM_SAMPLES * sizeof( short ) );

    double freqC = 523.25;
    double freqD = 587.33;
    double fRatio = 1.0;
    double fRatioEnd = freqD / freqC;    // not used below
    double freqAmount = freqD - freqC;   // not used below
    double fRatioInc = 0.00004;

    int k = 0;

    // Stop one sample early so abWaveC[ (int) f + 1 ] stays in bounds.
    for( double f = 0; f < NUM_SAMPLES - 1; f += fRatio )
    {
        abWave[ k ++ ] = Interpolate( abWaveC[ (int) f ], abWaveC[ (int) f + 1 ], f - floor( f ) );

        fRatio += fRatioInc;

        // PDS: Police siren here we come!! :-)
        // Every 10000 output samples, reverse the direction of the slide.
        if( k % 10000 == 0 )
            fRatioInc = -fRatioInc;
    }

    g_NewWaveSize = k;
}

Rainer_Deyke · May 7, 2011, 6:28pm

Don’t do that! I only posted that code for comparison, not for actual
use! You should iterate over the transposed samples, not the original
samples. Iterating over the transposed samples means there are fewer
corner cases to worry about, and it leads to cleaner, simpler, and hence
more robust code.


–
Rainer Deyke - rainerd at eldwood.com