Changing buffer size causes inconsistent audio output

I am using SDL 2.0.20 on Pop!_OS 22.04 (I really need to migrate to SDL3 when I get a chance)

I am trying to output a constant square wave to the audio device via SDL_QueueAudio, but I run into a very strange issue where instead of a constant stream, I get a series of short “blips” about a second apart.

#include <SDL2/SDL.h>
#include <SDL2/SDL_audio.h>
#include <cstdint>
#include <iostream>

// fixed-width aliases used below (declarations were elided from the post)
typedef std::int32_t int32;
typedef std::uint32_t uint32;

int main() {

	/* Initialization for window and input management */

	if (SDL_InitSubSystem(SDL_INIT_AUDIO | SDL_INIT_VIDEO) < 0) {
		std::cerr << "failed to initialize SDL audio, video, or events: " << SDL_GetError() << "\n";
		return -1;
	}

	/* SDL window creation */

	// Audio?
	int32 devNum = SDL_GetNumAudioDevices(0) - 1;
	const char* devName = SDL_GetAudioDeviceName(devNum, 0);

	SDL_AudioSpec specTarget;
	specTarget.format   = AUDIO_S32MSB; // signed, 32-bit, big-endian (?), integer format
	specTarget.freq     = 48000; // samples in Hz
	specTarget.channels =     2;

This is where I think the issue is originating from (described further below). I set the buffer size to 2 seconds, which to my understanding is a very large buffer. I have noticed that if I make this buffer smaller, the delay between the disconnected audio blips seems to also get smaller.

	specTarget.samples  = 48000; // buffer size in sample frames, 2 seconds?
	specTarget.callback =     0; // will not be using a callback function
	specTarget.userdata =     0; // ignored because no callback function

	SDL_AudioSpec spec;
	SDL_AudioDeviceID devID = SDL_OpenAudioDevice(devName, 0, &specTarget, &spec, 0);
	SDL_PauseAudioDevice(devID, 0);

	uint32 wavePhase = 0;
	uint32 waveFreqHz = 260;

	bool loopCont = true; // cleared by the elided input handling on quit
	while (loopCont) {
		
		/* Input management and exit condition */

		/* Graphics output */

My approach to ensuring that the audio is continuous was to check whether there is any audio currently queued. If there is none, queue more square wave (arbitrarily set to 1/60th of a second’s worth of audio), then wait for it to drain again. However, when I run the program it seems that the queued audio isn’t actually draining to the device: SDL_GetQueuedAudioSize repeatedly reports 6400 bytes queued until, about every second or so, it suddenly reports 0 and the program states that it is queueing audio. This happens at the same frequency as the audio blips, but not in sync with them.

		uint32 audioQueuedBytes = SDL_GetQueuedAudioSize(devID);
		std::cout << audioQueuedBytes << " bytes queued."; // DEBUG
		if (audioQueuedBytes == 0) {

			uint32 lenSamples = spec.freq * spec.channels / 60; // queue 1/60th seconds of audio?
			uint32 lenBytes = lenSamples * SDL_AUDIO_BITSIZE(spec.format) / 8;

			int32 audioData[lenSamples];
			for (uint32 i = 0; i < lenSamples; i++) {
				++wavePhase;
				if (wavePhase > waveFreqHz) wavePhase = 0;
				if (wavePhase <= waveFreqHz / 2) audioData[i] =  6000;
				if (wavePhase  > waveFreqHz / 2) audioData[i] = -6000;
			}
			std::cout << "Queueing audio!"; // DEBUG
			SDL_QueueAudio(devID, &audioData, lenBytes);
		}
		std::cout << "\n"; // DEBUG

	}

	/* Window destruction */
}

As mentioned above, when I reduce the buffer size, it also causes the audio blips to come more frequently, which I think means I have a fundamental misunderstanding of what the buffer means. I thought that the buffer is simply the block of memory where you can store audio while it is continuously drained to the device at a constant rate regardless of the buffer size. But it seems like the size of the buffer is actually delaying the output of audio, as if SDL is waiting for the duration of the buffer to pass so it can output the audio I queued.

I also checked the SDL3 SDL_AudioSpec wiki page and noticed that it doesn’t have a buffer member at all! I’m not sure what that implies about the significance of the buffer in SDL2.

Here is an example for SDL3 that produces a roughly 700 Hz sine tone (a phase step of 0.1 radians per sample at 44100 Hz works out to about 702 Hz), hope that helps.
You should always keep at least a small amount of audio buffered.
if (audioQueuedBytes == 0) { … is also not optimal: if that is ever the case, SDL has already run out of audio to play.
Warning: example with no event handling.

#include <SDL3/SDL.h>
#include <math.h>

int main(int argc, char *argv[]) {
    SDL_AudioSpec AudioSpec;
    SDL_AudioStream *pAudioStream;
    short value;
    short audio_buf[2000];  // 1000 sample frames x 2 channels
    float i = 0;
    int z;

    if (SDL_Init(SDL_INIT_AUDIO)) {
        SDL_Log("Hello SDL3");

        AudioSpec.freq = 44100;
        AudioSpec.format = SDL_AUDIO_S16LE;
        AudioSpec.channels = 2;
        pAudioStream = SDL_OpenAudioDeviceStream(SDL_AUDIO_DEVICE_DEFAULT_PLAYBACK,&AudioSpec,NULL,NULL);
        if (pAudioStream != NULL) {
            if (SDL_ResumeAudioStreamDevice(pAudioStream)) {
                while (1) {
                    if (SDL_GetAudioStreamQueued(pAudioStream) < 4000) {
                        // Fill audio buffer
                        for (z = 0; z < 1000; z++) {
                            i = i + 0.1;
                            value = sin(i) * 32767;
                            audio_buf[z * 2] = value;  // left channel
                            audio_buf[z * 2 + 1] = value; // right channel
                        }
                        if (!SDL_PutAudioStreamData(pAudioStream,audio_buf,sizeof(audio_buf))) {
                            SDL_Log("%s: SDL_PutAudioStreamData() failed: %s",__FUNCTION__,SDL_GetError());
                        }
                    } else {
                        SDL_Delay(10);
                    }
                }
            } else {
                SDL_Log("%s: SDL_ResumeAudioStreamDevice() failed: %s",__FUNCTION__,SDL_GetError());
            }
        } else {
            SDL_Log("%s: SDL_OpenAudioDeviceStream() failed: %s",__FUNCTION__,SDL_GetError());
        }
    }
    return 0;
}

You can use Audacity (or a similar program) in record mode to see your waveform while your program is running.


Okay, wait, there’s a lot of issues to talk about here.

First, let’s just open whatever the default device on the system is: SDL_OpenAudioDevice() with a NULL devName. Using SDL_GetNumAudioDevices() - 1 is probably not a good approach.

Next: you almost certainly don’t want specTarget.format = AUDIO_S32MSB; …you’re probably on an Intel-based computer, which is little-endian, but AUDIO_S32MSB is big-endian. Since you’re generating your own audio (a square wave), it’s always going to be in the system’s byte order, so AUDIO_S32 will suffice here and SDL will pick the right one for your system.

Samples:

specTarget.samples  = 48000; // buffer size in sample frames, 2 seconds?

Since we’re running with a freq of 48000 Hz, this is 1 second. And yes, this is a massive buffer size to choose; usually this is more like 1024. This is (more or less) the hardware buffer, and how much data the device will try to consume at once, not how much audio is buffered up to be played. You want this to be much, much lower, and not just for latency: I wouldn’t be surprised if the system misbehaves or outright fails with numbers that high. You can still queue more data than this value, so definitely lower it to something like 1024.

check if there is any audio currently queued. If there is none, queue more square wave

You don’t want to wait for the amount queued to get to zero. If it hits zero, it means the system has run out of audio to play and is now playing silence until you feed it more data. Even if you’re doing this quickly, you’ll still have gaps in your audio output.

I would say queue about 1/60th of a second (about 16 milliseconds) of audio every time there’s less than 1/60th of a second queued, so at most you’ve got about 32 milliseconds buffered, and you have a good chunk of time to fill in more when it gets low before it totally runs out. The specific amounts can be tweaked, but that’s the idea.

SDL_GetQueuedAudioSize repeatedly reports 6400 bytes queued until about every second or so, where it suddenly reports 0 and then states that it is queueing audio.

That’s because the specTarget.samples is making the device come along once a second to pull in a full second’s worth of audio, finding only 1/60th of a second queued, taking all of it, and filling in silence for the rest. Then your app finds the queue empty and adds another 1/60th of a second worth of data, which SDL will come pick up when it finishes playing the remaining 59/60ths of a second of silence.

SDL3 works similarly: it just moves the SDL_QueueAudio functionality to an SDL_AudioStream object, but the exact same theory applies. (SDL3 also chooses a smaller samples value for you when opening the device.)

A nice thing about SDL_QueueAudio and SDL_AudioStream, though, is that you don’t have to be exact in things. Just buffer a bunch of audio and let SDL nibble on it as it needs to. If you want to give it 3/16th of a second at a time, you can, and it’ll still do the right thing, at the cost of a little more buffered memory, or you can give 2/16ths here and 1/16th here, whatever, as long as you stay ahead of playback.

Anyhow, you’re mostly there, just some small tweaks to fix small misunderstandings and you’ll be playing sound just fine!


About the spec format: I tried both LSB and MSB, and both of them had their own weird issues that I’ll have to treat as a separate problem. But for the moment, in SDL_audio.h I see that AUDIO_S32 is just a macro for AUDIO_S32LSB, so I don’t understand how that would solve my problem? It doesn’t look like it’s doing any kind of automatic decision-making based on my device settings.

#define AUDIO_S32LSB    0x8020  /**< 32-bit integer samples */
#define AUDIO_S32MSB    0x9020  /**< As above, but big-endian byte order */
#define AUDIO_S32       AUDIO_S32LSB

Additionally, would I need to check the endianness of the user’s environment if I plan on shipping my program to other devices? If I tell SDL to use the endianness of my machine while compiling, presumably that would not work on a user’s device with a different endianness?

For all intents and purposes, your program isn’t going to run on a big-endian system. x86 and ARM are both little-endian.

If you were to cross-compile for MIPS or PowerPC or some other big-endian CPU architecture, the compiler would take care of generating big-endian code and the only time you’d have to worry about it would be swapping byte order when loading/saving data.

Correct. However, this isn’t really a concern. If you compiled your program for your machine (presumably x86-based) the user couldn’t run it on a device with a different endianness because a different endianness implies a different CPU architecture.


I fixed a lot of the obvious issues, such as the excessively large buffer, the incorrect wait times, and the incorrect frequency. I tried fiddling around with various buffer sizes such as 1/60, 1/30, and 1/20 of a second, and it seems like 1/16 of a second provides a good balance of latency and stability.

But I still have another issue which is that when I set little-endian format (or the default format which evaluates to little-endian anyway), the output becomes completely inaudible. Since this is technically a distinct problem (albeit a related one) from my original question, I don’t know if it should be considered its own post rather than being resolved here.

#include <SDL2/SDL.h>
#include <SDL2/SDL_audio.h>
#include <cstdint>
#include <iostream>

// fixed-width aliases used below (declarations were elided from the post)
typedef std::int32_t int32;
typedef std::uint32_t uint32;

int main() {

	/* Initialization for window and input management */

	if (SDL_InitSubSystem(SDL_INIT_AUDIO | SDL_INIT_VIDEO) < 0) {
		std::cerr << "failed to initialize SDL audio, video, or events: " << SDL_GetError() << "\n";
		return -1;
	}

	/* SDL window creation */

	SDL_AudioSpec specTarget;
	specTarget.format   = AUDIO_S32; // signed, 32-bit, little-endian (?), integer format
	// BUG: setting audio format to little-endian makes output inaudible
	specTarget.freq     = 48000; // sample rate in Hz
	specTarget.channels =     2;
	specTarget.samples  =  3000; // buffer size in sample frames, 1/16th of a second
	specTarget.size     =     0; // value is automatically overwritten by SDL
	specTarget.callback =     0; // will not be using a callback function
	specTarget.userdata =     0; // ignored because no callback function
	SDL_AudioSpec spec;
	SDL_AudioDeviceID devID = SDL_OpenAudioDevice(NULL, 0, &specTarget, &spec, 0);
	SDL_PauseAudioDevice(devID, 0);

	int32 waveAmp = 1000;
	uint32 waveFreqHz = 65;
	uint32 waveLenSamples = spec.freq / waveFreqHz;
	uint32 wavePhase = 0;

	bool loopCont = true; // cleared by the elided input handling on quit
	while (loopCont) {
		
		/* Input management and exit condition */

		/* Graphics output */

		uint32 queueBytes = SDL_GetQueuedAudioSize(devID);
		if (queueBytes <= spec.size / 2) {

			uint32 lenBytes = spec.size - queueBytes; // fill remainder of buffer
			uint32 lenSamples = lenBytes / 4; // 4 bytes per 32-bit sample

			int32 audioData[lenSamples];
			for (uint32 i = 0; i < lenSamples; i++) {
				++wavePhase;
				if (wavePhase > waveLenSamples) wavePhase = 0;
				if (wavePhase <= waveLenSamples / 2) audioData[i] =  waveAmp;
				if (wavePhase  > waveLenSamples / 2) audioData[i] = -waveAmp;
			}
			SDL_QueueAudio(devID, &audioData, lenBytes);
		}
	}

	/* Window destruction */
}


An amplitude of 1000 is tiny compared to the huge range of a 32-bit sample. Maybe that’s why it’s “inaudible”.

Yes, that solved it. Even though I have 32-bit sample sizes, I’ve been mistakenly calculating my target values as if they were 16-bit.

I see that AUDIO_S32 is just a macro for AUDIO_S32LSB

Sorry, it’s AUDIO_S32SYS not AUDIO_S32. That was a mistake on my part.