Problem converting large WAV files

I’m using the code below to load and convert .wav files into my project’s native format (sample rate is proj->sample_rate, and proj->fmt is the SDL_AudioFormat.)

It works perfectly most of the time, but it seems to break when upsampling (44.1 → 48kHz) large wav files. The first part of the file will convert fine, but at some seemingly arbitrary time (12m42s in the main test file I’m using), the converted signal becomes extremely distorted. The desired signal is still discernible through the noise, but barely. Here’s what the boundary between the correctly-converted signal and distorted portion looks like:

Anyone know what might be causing this? I’d be suspicious of type overflow somewhere in my code, but can’t find an error. The sanity-check print statements show reasonable values.

    SDL_AudioSpec wav_spec;
    uint8_t* audio_buf = NULL;
    uint32_t audio_len_bytes = 0;
    if (!(SDL_LoadWAV(filename, &wav_spec, &audio_buf, &audio_len_bytes))) {
        fprintf(stderr, "Error loading wav %s: %s\n", filename, SDL_GetError());
        return NULL;
    }
    SDL_AudioCVT wav_cvt;
    int ret = SDL_BuildAudioCVT(&wav_cvt, wav_spec.format, wav_spec.channels, wav_spec.freq, proj->fmt, proj->channels, proj->sample_rate);
    if (ret < 0) {
        fprintf(stderr, "Error: unable to build SDL_AudioCVT. %s\n", SDL_GetError());
        return NULL;
    } else if (ret == 1) { // Needs conversion
        fprintf(stderr, "Converting. Len mult: %d\n", wav_cvt.len_mult);
        fprintf(stderr, "WAV specs: freq: %d, format: %s, channels: %d\n", wav_spec.freq, get_fmt_str(wav_spec.format), wav_spec.channels);
        fprintf(stderr, "dst specs: freq: %d, format %s, channels: %d\n", proj->sample_rate, get_fmt_str(proj->fmt), proj->channels);
        fprintf(stderr, "Len ratio: %f\n", wav_cvt.len_ratio);
        wav_cvt.needed = 1;
        wav_cvt.len = audio_len_bytes;
        size_t alloc_len = (size_t)audio_len_bytes * wav_cvt.len_mult;
        fprintf(stdout, "Alloc len: %zu\n", alloc_len);
        wav_cvt.buf = malloc(alloc_len);
        if (!wav_cvt.buf) {
            fprintf(stderr, "ERROR: unable to allocate space for conversion buffer\n");
            exit(1);
        }
        memcpy(wav_cvt.buf, audio_buf, audio_len_bytes);
        if (SDL_ConvertAudio(&wav_cvt) < 0) {
            fprintf(stderr, "Error: Unable to convert audio. %s\n", SDL_GetError());
            return NULL;
        }
        audio_len_bytes *= wav_cvt.len_ratio;
    } else if (ret == 0) { // No conversion needed
        fprintf(stderr, "No conversion needed. copying directly to track.\n");
        wav_cvt.buf = malloc(audio_len_bytes);
        memcpy(wav_cvt.buf, audio_buf, audio_len_bytes);
    } else {
        fprintf(stderr, "Error: unexpected return value for SDL_BuildAudioCVT.\n");
        return NULL;
    }

Output:

Converting. Len mult: 8
WAV specs: freq: 44100, format: AUDIO_S16LSB, channels: 2
dst specs: freq: 48000, format AUDIO_S16LSB, channels: 2
Len ratio: 1.088435
Alloc len: 6117130240

I think you might be overflowing your RAM into swap-files? If I’m reading that correctly, your initial file is allocating 6 GB of memory, and I see at least one memcpy() on this data. Do you have more than 12 GB of RAM on the system that you are using?
You could use a task manager to confirm the memory usage.

It does seem likely that swap files would be needed to satisfy the massive allocation request. But is that relevant? My understanding of the malloc family of functions is that they’ll supply an appropriately-sized block of virtual memory, and actual hardware allocations/address mapping are the domain of the OS. If malloc were unable to supply a block of contiguous virtual memory, I would expect it to fail. So swap memory slows things down, but shouldn’t break them this way.

My understanding is that for some operating systems the most recently used data will be given priority in RAM and that could even mean that currently running programs will have to access the hard-drive as if it were RAM. It’s not just affecting the playing of the audio file, but rather it is affecting your entire system.
(For example, try running your program while watching a video or using a different audio player to listen to music.)

Also this is not a great situation for the lifespan of your HDD/SSD.

Ok, but I still don’t see the relevance to the problem I’m encountering. In any case, loading a wav file >1hr in length is something I expect to happen only very rarely. I just want it to work on those rare occasions when it’s needed.

You need to split your data up into frames somehow. I don’t think SDL has that option in the base API.

Personally I would create my own wav loading function, it’s one of the easier specifications to learn, and I would then have full control of how much data to load and could implement my own file seek functionality.

You could use ffmpeg to split the file into smaller, more manageable chunks (I know this does not suit your situation).

Otherwise I think you are stuck with adding another library like SDL_Mixer or OpenAL. There are probably a lot of other libraries that can handle large WAV files on the fly.

It’s actually not SDL_LoadWAV that’s distorting the signal, it’s SDL_ConvertAudio. I should have tested and made that clear earlier; that’s my bad.

The way I see it either SDL_ConvertAudio has a length limit, or it does not. The documentation does not reference a limit, and I’m not going to assume one based on testing or hearsay alone. Additionally, if the length is causing a problem for that function, then I’d expect it to return an error code (or SDL_BuildAudioCVT should). You could be right that splitting the data into frames before converting will fix the issue, but how can I be confident that I’ve actually fixed the problem without knowing what it is? How do I decide what is an acceptable length for conversion? I might just do this and move on, but then we’re seemingly left with an SDL function that doesn’t behave as expected, and that sucks. Or, there’s still a bug somewhere else in my code that we haven’t spotted!

[Edit: ignore this post at the moment, I think I have the answer in the next post. This is still relevant as it indicates another boundary, but let’s fix the 12 minute issue first. See next post. ]

If we are looking at a file that’s too long, then let me redirect the suspicion back at SDL_LoadWAV().
Here’s the function prototype:

SDL_bool SDL_LoadWAV(const char *path, SDL_AudioSpec *spec, Uint8 **audio_buf, Uint32 *audio_len);

The problem as I see it is that audio_len is an unsigned int. The largest number that it can represent is 4,294,967,295 which means that any source that contains more than 4.3GB of uncompressed data will return inappropriate information about the length of the allocated data buffer.

At 44,100 Hz, 2 channels, and 16 bits per sample, my calculator says this audio file limit is about 6.7 hours.

Anything longer than 6.7 hours will report the wrong buffer length. Do you mind testing your original code on a file that is 2 hours long or so? Does that still glitch at 12 minutes, or does it play through without issue?
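
The arithmetic behind that figure can be sketched in C as below. This is only an illustration, assuming uncompressed PCM and a byte count stored in a 32-bit unsigned integer; the helper name max_seconds_for_u32_len is made up for this post and is not part of SDL.

```c
#include <stdint.h>

/* Hypothetical helper (not part of SDL): the longest duration, in seconds,
 * whose uncompressed PCM data still fits in a 32-bit unsigned byte count. */
static double max_seconds_for_u32_len(uint32_t sample_rate, uint32_t channels,
                                      uint32_t bytes_per_sample)
{
    /* Bytes consumed by one second of audio; widened to 64 bits so the
     * multiplication itself cannot overflow. */
    uint64_t bytes_per_second =
        (uint64_t)sample_rate * channels * bytes_per_sample;

    return (double)UINT32_MAX / (double)bytes_per_second;
}
```

Calling max_seconds_for_u32_len(44100, 2, 2) and dividing by 3600.0 gives a little under 6.8 hours, in line with the estimate above.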

I think I have the answer!

You might see an indication by calling SDL_GetError() right after SDL_LoadWAV on a huge file.
If you look at the source, SDL_LoadWAV() calls “WaveLoad()”. In it there is this variable “Uint32 chunkcountlimit = 10000;”, which translates to a default file limit of about 80MB [about 15 minutes, give or take; since you are playing at 48000 Hz, it makes sense that the time is shortened further, but that would mean the converted audio is playing faster than expected, which is another issue you might want to test, sorry].

Here’s the snippet that I’m looking at:

    hint = SDL_GetHint(SDL_HINT_WAVE_CHUNK_LIMIT);
    if (hint) {
        unsigned int count;
        if (SDL_sscanf(hint, "%u", &count) == 1) {
            chunkcountlimit = count <= SDL_MAX_UINT32 ? count : SDL_MAX_UINT32;
        }
    }

The important bit:
Use SDL_SetHint to set SDL_HINT_WAVE_CHUNK_LIMIT up to a length of SDL_MAX_UINT32.

If it works how I expect it to, I think you could increase your maximum audio length to 4GB, or about 6.7 hours [See previous post for explanation of that limit].

There are a couple of other hints that may be worth investigating, I’m less certain about what they do:

SDL_HINT_WAVE_RIFF_CHUNK_SIZE
SDL_HINT_WAVE_TRUNCATION

Unfortunately no, that’s not it. As I mentioned above, the issue occurs in the conversion step, not in loading the wav file. I’ve confirmed this in two ways:

  1. Setting my program’s native sample rate to 44.1k to match the wav file I’m testing with. The entire wav loads correctly.
  2. Writing the data loaded by SDL_LoadWAV directly back to a new wav file (I have my own wav writer). The resulting file plays back as expected.

Also, a limit on the number of allowed chunks in a wav file does not limit the allowed length of the file; the size of a given chunk can vary. It is possible to have an hour-long wav file consisting of a single chunk.

OK, I thought I had it, sorry.
So, do you have a guess at the largest file size you are able to play/convert without distortion?

After doing lots more testing and chasing a whole school of red herrings I finally found my error.

I was determining the length of the converted buffer by multiplying the original audio length by the len_ratio member of the SDL_AudioCVT struct, as suggested in the doc for that struct. What I should have been doing is using the len_cvt member, as suggested in the doc for SDL_ConvertAudio. Because of floating point rounding error, the result of the multiplication is not quite correct. If the number of bytes it produces is odd, then the resulting 2-byte-width audio samples can get offset by a single byte up until another such offset occurs. The distortion I was seeing was therefore exactly the same as if I had been misreading big-endian samples as little-endian ones.
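
To make the rounding problem concrete, here is a small sketch that reproduces the multiplication without calling SDL at all. The helper names are made up for illustration, and the 4-byte frame size assumes 16-bit stereo as in my test file.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the multiplication in the code at the top
 * of this thread: scale a byte count by a floating-point ratio, then
 * truncate back to an integer. */
static uint32_t estimate_converted_len(uint32_t len, double len_ratio)
{
    return (uint32_t)(len * len_ratio);
}

/* Nonzero if a byte count holds a whole number of audio frames.
 * frame_bytes is channels * bytes per sample (4 for 16-bit stereo). */
static int is_frame_aligned(uint32_t len, uint32_t frame_bytes)
{
    return len % frame_bytes == 0;
}
```

With the source length from the output above (764641280 bytes) and a ratio of 48000.0 / 44100.0, estimate_converted_len returns 832262617, which is odd and therefore not a whole number of 2-byte samples, let alone 4-byte frames. Reading len_cvt after SDL_ConvertAudio avoids having to estimate at all.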

I should’ve caught that that multiplication was fishy, but this excerpt from the SDL_AudioCVT doc is also wrong, or at least needs qualification:

len_ratio is the length ratio of the converted data to the original data. When you have finished converting your audio data, you need to know how much of your audio buffer is valid. len*len_ratio is the size of the converted audio data in bytes.

I’d put in a PR myself, but I actually can’t find that page anywhere in the source code, so not sure how.

EDIT:
Ugh, this is all true but actually still doesn’t explain the original problem. The byte offset became the issue when I started splitting the audio into frames, and was getting intermittent periods of distorted (offset) audio.

I still need to figure out the original issue, but I will mention that len_cvt on the test audio, which is over an hour in length, is giving me only 12m42s-worth, even when I feed it the whole audio file.

What would you suggest for the wording on len_ratio?

Are you running into overflow on len_cvt when feeding it the whole file?

Something like this, maybe? Calling out len_cvt in the remarks right before seems like a good idea.

len_cvt is the length of the converted data in bytes. After successful conversion, only len_cvt bytes of buf will be valid.

len_ratio is the length ratio of the converted data to the original data. len*len_ratio is therefore roughly the size of the converted audio data in bytes, but may not be accurate due to floating point rounding error; therefore len_cvt should be taken as the source of truth.

As for overflow, I’m not sure. It seems likely, but I don’t think it’s occurring in an obvious place. Or, I’m overlooking something obvious.
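
One thing that can be checked without stepping into SDL: the len, len_cvt and len_mult members of SDL_AudioCVT are plain ints in SDL2, so it is worth asking whether len * len_mult still fits in an int for this file. The sketch below is only a sanity check with a made-up helper name; whether SDL_ConvertAudio actually forms this product internally is an assumption I haven't verified, so a failing check here is a hint, not a diagnosis.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of SDL): nonzero if len * len_mult can be
 * represented in an int, the type SDL2 uses for the len, len_cvt and
 * len_mult members of SDL_AudioCVT. */
static int product_fits_in_int(uint32_t len, int len_mult)
{
    if (len_mult <= 0) {
        return 0; /* treat a nonsensical multiplier as a failure */
    }
    /* Do the multiplication in 64 bits so the check itself cannot overflow. */
    int64_t product = (int64_t)len * len_mult;
    return product <= INT_MAX;
}
```

For the values printed above (764641280 bytes, len_mult of 8), the product is 6117130240, which does not fit in a 32-bit int, so product_fits_in_int returns 0. The source length on its own does fit.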

I was planning to step into SDL source to examine this and memory errors that are cropping up in SDL_ConvertAudio in similar circumstances, but I’m having trouble getting my program to run correctly with my debug build of SDL right now. I’m also on a hiking trip this week, but I intend to do some more digging when I have time!