[Audio] [Enhancement] SDL_SetStreamSampleRate?

I’m currently integrating SDL_AudioStream into FACT, a reimplementation of XACT that I’ve been working on, and it’s been incredibly nice to work with! I was dreading having to deal with resampling, but it has coped very well with some of the bizarre sample rates that XACT likes to spit out.

It works so well, in fact, that I’ve been toying with the idea of using it as a pitch shifting tool as well… it’s mostly there already; just set the source sample rate to the adjusted rate based on the requested pitch, and the output data is both resampled and pitch shifted in one pass!

The problem is that for my scenario I would be adjusting the source sample rate a LOT, not just once at load time. To explain a little better, here’s where we currently use SDL_AudioStream in FACT:

So what I’d like to do is add something like…

void SDL_SetStreamSampleRate(
    SDL_AudioStream *stream,
    const int src_rate,
    const int dst_rate
);

By just tweaking a couple of numbers I could get real-time pitch shifting without having to do a whole extra pass just for this one effect. The reason I set both src and dst despite only caring about src is that, looking internally, it seems these two actually do quite a lot with one another when calling SDL_NewAudioStream:

https://hg.libsdl.org/SDL/file/fbfdee28682d/src/audio/SDL_audiocvt.c#l1281

My concern is that there’s no cheap way to actually adjust the rate of the stream after it’s been made, which would be a shame since freeing/reallocating an entire stream for every wave pitch adjustment sounds awful enough on paper…

Any thoughts on this? I’d be more than happy to implement and test ideas, I just wanted to see if there was any interest in something like this before I started hacking up audiocvt.c to bits.

There are a few challenges here, in no particular order:

  • We need to keep a buffer of unresampled data, because resampling needs some padding to not introduce artifacts. Changing the src_rate means we have some data buffered that hasn’t been resampled yet, and is at neither the new src_rate nor the dst_rate.
  • If the dst_rate changes, we might need more or less padding, which means we would have to tapdance to make sure the padding fills in or drains to the right level before operations can continue as usual, and we might have to realloc the padding buffers.
  • Data we’ve already resampled (which happens during SDL_AudioStreamPut()) will already be queued, but at the wrong sample rate, if we change the rate before SDL_AudioStreamGet() is called to drain it.

I’m not saying it’s impossible, but it would be a big mess, doubly so if this is something you’d likely be adjusting every frame.
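To illustrate what "buffered data" means in the first bullet, here is a toy streaming resampler. It uses plain linear interpolation, which is a deliberate simplification and not SDL's resampler, and every name in it (ToyResampler, toy_init, toy_set_rates, toy_process) is hypothetical. It carries exactly one input sample between calls; a resampler that needs more padding carries more, and all of that carried data was captured under the old rate.

```c
#include <stddef.h>

/* Hypothetical toy state, not SDL's internal layout. Mono float only. */
typedef struct {
    double pos;   /* read position in input frames; 0.0 is `prev`, 1.0 is in[0] */
    float  prev;  /* last input sample of the previous chunk (the "padding") */
    double step;  /* src_rate / dst_rate: input frames consumed per output frame */
} ToyResampler;

static void toy_init(ToyResampler *r, int src_rate, int dst_rate)
{
    r->pos = 0.0;
    r->prev = 0.0f;
    r->step = (double)src_rate / (double)dst_rate;
}

/* Changing rates only touches `step`; `prev` and `pos` are left as they are,
 * even though they were produced under the old rate. */
static void toy_set_rates(ToyResampler *r, int src_rate, int dst_rate)
{
    r->step = (double)src_rate / (double)dst_rate;
}

/* Resamples one chunk and returns the number of output frames written. */
static size_t toy_process(ToyResampler *r, const float *in, size_t in_len,
                          float *out, size_t out_cap)
{
    size_t n = 0;
    while (n < out_cap && r->pos < (double)in_len) {
        size_t i = (size_t)r->pos;
        double frac = r->pos - (double)i;
        float a = (i == 0) ? r->prev : in[i - 1]; /* sample at position i */
        float b = in[i];                          /* sample at position i + 1 */
        out[n++] = (float)(a + (b - a) * frac);
        r->pos += r->step;
    }
    if (in_len > 0) {
        r->prev = in[in_len - 1];
        r->pos -= (double)in_len;
        if (r->pos < 0.0) {
            r->pos = 0.0; /* output was full: the rest of this chunk is dropped */
        }
    }
    return n;
}
```

With only one carried sample, a mid-stream rate change is cheap here. The concerns listed above come from carrying more than that: a larger padding buffer whose required size may change, plus already-converted output that is still queued.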

Maybe pitch-shift in the frequency domain instead of the time domain?