Questions about SDL 1.3 and audio callback function

Hello!

I am starting to work on a VoIP/video chat application as a project, and to learn a few things. I think SDL 1.3 looks ideal for my needs (even though it's still in development), because it appears from the API documentation that it allows hardware-accelerated streaming texture uploads and color-space conversion (which saves me from having to mess with PBOs and ARB fragment shaders in OpenGL). Please correct me if I'm wrong here.
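
For reference, the kind of thing I'm picturing is below, based on my reading of a recent 1.3 snapshot. Since the API is still in flux the exact signatures may well differ, and the YV12 format, window setup, and dimensions are just placeholders of mine:

#include "SDL.h"

SDL_Window   *window;
SDL_Renderer *renderer;
SDL_Texture  *texture;

void init_video(int w, int h)
{
    SDL_Init(SDL_INIT_VIDEO);
    window   = SDL_CreateWindow("chat", SDL_WINDOWPOS_UNDEFINED,
                                SDL_WINDOWPOS_UNDEFINED, w, h, 0);
    renderer = SDL_CreateRenderer(window, -1, 0);

    /* Streaming texture in a YUV format; the renderer should do the
       color-space conversion in hardware where it can. */
    texture = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_YV12,
                                SDL_TEXTUREACCESS_STREAMING, w, h);
}

void show_frame(const Uint8 *yuv, int pitch)
{
    SDL_UpdateTexture(texture, NULL, yuv, pitch);  /* upload new frame */
    SDL_RenderCopy(renderer, texture, NULL, NULL);
    SDL_RenderPresent(renderer);
}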

However, I'm a bit confused by the somewhat sparsely documented audio callback function. I understand the audio driver will call the function to get a buffer of samples, which it will then play. Because I want to reduce latency and processor overhead as much as possible while keeping the audio tight, I need to figure out when it will need those samples, and how best to populate the buffer it hands me.

Firstly, how much execution time is reasonable for the audio callback function? If it buffers ahead a fair ways, I could call my audio decoder and populate the buffer directly from it within the callback function, but if it is expected that the callback will return quickly, I need to decode the audio somewhere else.

Also, I want to be able to sync video to the audio clock, essentially delaying or skipping frames based on audio timing (again, to minimize latency). It seems reasonable that I should get the timestamp of the audio segment being passed in the callback function, and update some global variable, so the video (probably running in a different thread, once I get that figured out) can know when to refresh, or skip a frame. However, this again depends on how much buffering is going to be done within SDL/the audio device.

Does anyone know about the details involved in this?

Thanks in advance for your help!

> Firstly, how much execution time is reasonable for the
> audio callback function? If it buffers ahead a fair ways,
> I could call my audio decoder and populate the buffer
> directly from it within the callback function, but if it is
> expected that the callback will return quickly, I need to
> decode the audio somewhere else.

The execution time should always be a (fairly small) fraction of the
buffer duration.

Buffer duration in seconds = samples per buffer / sample rate (in
samples per second). I'm pretty sure you want this value to be a
small fraction of a second for your particular app, rather than a
big buffer (see below for why), but I'm equally certain you don't
actually get to choose it on some platforms.
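
For what it's worth, here's roughly how those numbers fall out of the
SDL_AudioSpec you pass to SDL_OpenAudio() -- the 48 kHz / 512-sample
values and the callback name are just examples I made up:

#include <string.h>
#include "SDL.h"

extern void my_callback(void *userdata, Uint8 *stream, int len);

void open_audio(void)
{
    SDL_AudioSpec want, have;

    memset(&want, 0, sizeof want);
    want.freq     = 48000;         /* sample rate (example value) */
    want.format   = AUDIO_S16SYS;  /* 16-bit, native byte order */
    want.channels = 1;
    want.samples  = 512;           /* samples per buffer -- a request, not a guarantee */
    want.callback = my_callback;

    SDL_OpenAudio(&want, &have);

    /* Buffer duration = samples per buffer / sample rate,
       e.g. 512 / 48000 ~= 10.7 ms here. The callback should finish
       in a small fraction of that. Check 'have' for what the driver
       actually gave you, since it can differ from the request. */
    double buffer_seconds = (double)have.samples / have.freq;
    (void)buffer_seconds;
}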

I'd suggest having a dedicated thread do the actual decoding and
store the data into a temporary buffer, with the callback reading
from that buffer.
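
Something like this is what I mean -- a minimal sketch, with the
names and buffer size invented by me and error handling left out.
The decoder thread pushes decoded PCM into a mutex-protected ring
buffer; the callback drains it and zero-fills on underrun instead of
blocking:

#include <string.h>
#include "SDL.h"

#define RING_SIZE (64 * 1024)  /* bytes; should comfortably exceed one audio buffer */

static Uint8      ring[RING_SIZE];
static int        ring_head, ring_tail, ring_fill;  /* bytes currently buffered */
static SDL_mutex *ring_lock;  /* create once with SDL_CreateMutex() at startup */

/* Decoder thread: push newly decoded PCM. Silently drops data if the
   ring is full (fine for a sketch; a real program might block here). */
void ring_push(const Uint8 *data, int len)
{
    SDL_LockMutex(ring_lock);
    while (len-- > 0 && ring_fill < RING_SIZE) {
        ring[ring_head] = *data++;
        ring_head = (ring_head + 1) % RING_SIZE;
        ring_fill++;
    }
    SDL_UnlockMutex(ring_lock);
}

/* SDL audio callback: copy what's available, zero-fill the rest.
   Never block in here waiting for the decoder. */
void audio_callback(void *userdata, Uint8 *stream, int len)
{
    int i;
    SDL_LockMutex(ring_lock);
    for (i = 0; i < len && ring_fill > 0; i++) {
        stream[i] = ring[ring_tail];
        ring_tail = (ring_tail + 1) % RING_SIZE;
        ring_fill--;
    }
    SDL_UnlockMutex(ring_lock);
    if (i < len)
        memset(stream + i, 0, len - i);  /* underrun: play silence */
}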

> Also, I want to be able to sync video to the audio clock,
> essentially delaying or skipping frames based on audio
> timing (again, to minimize latency). It seems reasonable
> that I should get the timestamp of the audio segment being
> passed in the callback function, and update some global
> variable, so the video (probably running in a different
> thread, once I get that figured out) can know when to
> refresh, or skip a frame. However, this again depends on
> how much buffering is going to be done within SDL/the
> audio device.

I’m not seeing any mention in the documentation of a timestamp being
passed to the callback, so I’m guessing you intend to provide that
yourself.

As best I recall, the callback is NORMALLY only called when the buffer
that was previously provided starts being played (so, when buffer 1
starts playing, SDL will request buffer 2), but I recommend providing
some way for the user to adjust the video/audio sync offset/speed
anyway, since this can potentially be an OS-specific trait.

Regardless, I believe that you can't get progress information that's
more accurate than the duration of the buffer without specialty
hardware (the actual playback is often done by a cheap chip with one
or more analog outputs, and those chips apparently often don't
provide progress info).
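
That said, one thing you can do (my own sketch, nothing SDL gives
you) is record, in the callback, how many samples you've handed over
and when, and let the video thread interpolate with SDL_GetTicks().
The 48 kHz / 16-bit mono assumptions below are placeholders:

#include "SDL.h"

/* Updated by the audio callback, read by the video thread. A real
   program should protect these with a mutex or atomics; this sketch
   ignores that. Use the rate/format SDL_OpenAudio actually gave you. */
static double audio_clock_base;   /* stream time at last callback, seconds */
static Uint32 ticks_at_callback;  /* SDL_GetTicks() at last callback */
static Uint64 samples_delivered;  /* total samples handed to SDL so far */
static int    sample_rate = 48000;

void audio_callback(void *userdata, Uint8 *stream, int len)
{
    /* ... fill 'stream' from the decode buffer as before ... */

    audio_clock_base   = (double)samples_delivered / sample_rate;
    ticks_at_callback  = SDL_GetTicks();
    samples_delivered += len / 2;  /* 2 bytes per sample: 16-bit mono */
}

/* Video thread: rough estimate of the current audio position.
   Accuracy is limited by whatever buffering SDL/the driver does
   behind your back, hence the user-adjustable offset suggestion. */
double get_audio_clock(void)
{
    return audio_clock_base + (SDL_GetTicks() - ticks_at_callback) / 1000.0;
}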


> The execution time should always be a (fairly small) fraction of the
> buffer duration.
>
> Buffer duration in seconds = samples per buffer / sample rate (in
> samples per second). I'm pretty sure you want this value to be a
> small fraction of a second for your particular app, rather than a
> big buffer (see below for why), but I'm equally certain you don't
> actually get to choose it on some platforms.
>
> I'd suggest having a dedicated thread do the actual decoding and
> store the data into a temporary buffer, with the callback reading
> from that buffer.

Ah, OK, I was trying to avoid that to reduce overhead and simplify the program (especially since this is my first program using “heavyweight” threads), but if I can't afford that kind of delay, I guess I'll need a different thread to decode. In any case, I'm hoping this will run on older, single-core machines, so I'm not expecting much real concurrency (hence limiting the number of threads).

But in any case, wouldn't the decoding take the same amount of time no matter where I do it (hopefully much shorter than the audio buffer duration)? I'm just not sure whether there's a "quick return" expectation for the callback (i.e., whether it gets called right before the samples are needed, rather than as soon as the previous buffer begins playing). It would be helpful to know exactly when it gets called, but maybe I'll have to actually grok the SDL source to figure that out, unless a dev can comment on it.
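
In the meantime, I suppose I can instrument the callback and measure
it myself; a quick hack like this should at least show how regularly
it fires relative to the buffer duration:

#include <stdio.h>
#include "SDL.h"

/* Quick-and-dirty timing check: log the gap between successive
   callback invocations and compare it to the buffer duration.
   (printf in an audio callback is bad form, but it'll do for a test.) */
void audio_callback(void *userdata, Uint8 *stream, int len)
{
    static Uint32 last = 0;
    Uint32 now = SDL_GetTicks();

    if (last != 0)
        printf("callback after %u ms (asked for %d bytes)\n",
               (unsigned)(now - last), len);
    last = now;

    /* ... fill 'stream' ... */
}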

The documentation on the callback seems to be somewhat lacking in general. Maybe once I get it figured out better, I can improve that.

> I'm not seeing any mention in the documentation of a timestamp being
> passed to the callback, so I'm guessing you intend to provide that
> yourself.
>
> As best I recall, the callback is NORMALLY only called when the buffer
> that was previously provided starts being played (so, when buffer 1
> starts playing, SDL will request buffer 2), but I recommend providing
> some way for the user to adjust the video/audio sync offset/speed
> anyway, since this can potentially be an OS-specific trait.
>
> Regardless, I believe that you can't get progress information that's
> more accurate than the duration of the buffer without specialty
> hardware (the actual playback is often done by a cheap chip with one
> or more analog outputs, and those chips apparently often don't
> provide progress info).

Right, I was planning on grabbing a timestamp from the decoder and sticking that info in a struct with the audio data. The video doesn't need really fine-grained timing, since it'll max out at 30 fps, which should be pretty coarse compared to audio chunks. My requested buffer size will probably be based on the minimum the decoder can decode at once, and I do want it to be small, since I'm trying to minimize delay in the audio (hence why I'm using lower-level APIs rather than something like GStreamer). I'll probably have to experiment a bit on different platforms, but hopefully I can get the audio pretty steady and low-delay cross-platform.
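
Concretely, something like this is what I have in mind for the
decoder's output (the field names are just my guesses at what I'll
need):

#include "SDL.h"

/* One decoded audio chunk, as produced by the decoder thread. The
   callback (or whoever consumes the chunk) publishes 'pts' so the
   video thread can compare frame timestamps against it. */
typedef struct AudioChunk {
    Uint8  *data;              /* decoded PCM samples */
    int     len;               /* length in bytes */
    double  pts;               /* presentation timestamp from the decoder, seconds */
    struct AudioChunk *next;   /* simple linked-list queue */
} AudioChunk;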

Thanks for your help!