SDL audio: push vs. pull model

There are two basic ways to handle audio output in an application:
pushing the data out to a device, or having the data be pulled.
SDL currently uses the latter model, with its audio callback functions
and all. While this can be beneficial for people writing games, where it
makes for an easier interface to SDL_mixer, it is also a pain when
writing a multithreaded application that only outputs linear audio data,
or does its own mixing. When writing such an application, I had to come
up with a way of interfacing the push model of the main program with the
pull model of SDL audio, and I ended up needing an extra buffer in
between my final output stage and the SDL audio thread stage.
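Roughly, the workaround looks like this - a simplified sketch, not my actual
code, with made-up names and sizes:

/* Simplified sketch: a ring buffer bridging the "push" side (my output
   thread) and SDL's "pull" side (the audio callback). */
#include "SDL.h"
#include <string.h>

#define RING_SIZE 65536    /* bytes; room for a few callbacks' worth of audio */

static Uint8 ring[RING_SIZE];
static volatile int ring_read = 0, ring_write = 0;

/* Called from my output thread; politely waits if the ring is full. */
void push_audio(const Uint8 *data, int len)
{
    int i = 0;
    while (i < len) {
        SDL_LockAudio();          /* keep the callback out while we touch the ring */
        while (i < len) {
            int next = (ring_write + 1) % RING_SIZE;
            if (next == ring_read)
                break;            /* ring full */
            ring[ring_write] = data[i++];
            ring_write = next;
        }
        SDL_UnlockAudio();
        if (i < len)
            SDL_Delay(1);         /* wait for the callback to drain some */
    }
}

/* Installed as the SDL audio callback via SDL_OpenAudio(). */
void pull_callback(void *userdata, Uint8 *stream, int len)
{
    int i = 0;
    while (i < len && ring_read != ring_write) {
        stream[i++] = ring[ring_read];
        ring_read = (ring_read + 1) % RING_SIZE;
    }
    memset(stream + i, 0, len - i); /* underrun: silence (assuming a signed format) */
}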

I would like to hear what others think about the relative merits of
these two models, and whether a push model could possibly be included in
SDL 1.3 as an option.

Note that SDL video (like just about any other video output framework)
uses the push model, where you call SDL functions (LockSurface,
UnlockSurface, UpdateRect) and put data directly onto the surface, in
order to output data.

~jeffrey :j

From the timing perspective (an ever so important perspective in real time
applications!), the only difference between the push and pull models is where
the synchronization with the hardware is done. In the pull model, it’s done
by the driver or API layer, while in the push case, you have to do it
yourself.

IMHO, the push model is pretty much pointless in real time applications, and
only serves to encourage flawed design methods. After the Linux/lowlatency
patch was introduced, this became very obvious, as the majority of audio
applications had to be fixed to take advantage of the “very firm real time
scheduling” that it provides! The most common mistake is mixing DSP code and
GUI code in the same thread.

The problem is that, unless you use substantial buffering, the push model
simply doesn’t give you any advantages over the pull model. It just makes
your code messier. You cannot sit in a big main loop doing all sorts of
things, and occasionally push a buffer to the audio device, as that will
result in audio drop-outs as soon as something in the main loop takes a bit
longer than expected.

Now, this may not be a serious issue in a game that’s expected to run at full
frame rate, and possibly drop an occasional frame - a fixed audio latency
slightly larger than 2 video frame periods should be ok.
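
(For a sense of scale, assuming 60 fps video and 44.1 kHz audio: two frame
periods is about 33 ms, or roughly 1470 sample frames, so something like a
2048 sample buffer - about 46 ms - would be comfortably on the safe side.)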

However, the ear is several times faster than the eye, making audio a lot
more sensitive to latency, and further, missing deadlines result in audio
drop-outs (usually resulting in moments of total silence), while dropping a
frame just produces a tiny “jerk” in the animations - something that’s rather
visible in scrolling 2D games, but hard to see in most 3D games.

In short, unless you’re doing something that’s very insensitive to latency, I
think you should fix your code rather than break SDL. :wink:

The current audio API is one of the simplest and cleanest ones I’ve seen so
far, and it doesn’t create any problems WRT hard RT scheduling and the like.
(I have ideas for additions, but they don’t include a pure push style API.)

//David

On Wednesday 14 February 2001 20:02, Jeffrey Bridge wrote:

There are two basic ways to handle audio output in an application:
pushing the data out to a device, or having the data be pulled.
SDL currently uses the latter model, with its audio callback functions
and all. While this can be beneficial for people writing games, where it
makes for an easier interface to SDL_mixer, it is also a pain when
writing a multithreaded application that only outputs linear audio data,
or does its own mixing. When writing such an application, I had to come
up with a way of interfacing the push model of the main program with the
pull model of SDL audio, and I ended up needing an extra buffer in
between my final output stage and the SDL audio thread stage.

I would like to hear what others think about the relative merits of
these two models, and whether a push model could possibly be included in
SDL 1.3 as an option.

Note that SDL video (like just about any other video output framework)
uses the push model, where you call SDL functions (LockSurface,
UnlockSurface, UpdateRect) and put data directly onto the surface, in
order to output data.

Now, this may not be a serious issue in a game that’s expected to run at full
frame rate, and possibly drop an occasional frame - a fixed audio latency
slightly larger than 2 video frame periods should be ok.

As a data point, Quake III Arena suffers from occasional sound problems
because of this, and on the list of things to do is rewrite the audio to
run in a separate thread.

See ya,
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

David Olofson wrote:

From the timing perspective (an ever so important perspective in real time
applications!), the only difference between the push and pull models is where
the synchronization with the hardware is done. In the pull model, it’s done
by the driver or API layer, while in the push case, you have to do it
yourself.

Sometimes you want to do the synchronization yourself, like when writing an
audio/video player, where you need to maintain sync between those two…

IMHO, the push model is pretty much pointless in real time applications, and
only serves to encourage flawed design methods. After the Linux/lowlatency
patch was introduced, this became very obvious, as the majority of audio
applications had to be fixed to take advantage of the “very firm real time
scheduling” that it provides! The most common mistake is mixing DSP code and
GUI code in the same thread.

My code is extremely multithreaded, there is a dedicated thread for audio output
that simply reads from a buffer and calls an output function. The problem comes
when this thread has to wait until SDL calls the audio callback, which is in a
separate thread.

The problem is that, unless you use substantial buffering, the push model
simply doesn’t give you any advantages over the pull model. It just makes
your code messier. You cannot sit in a big main loop doing all sorts of
things, and occasionally push a buffer to the audio device, as that will
result in audio drop-outs as soon as something in the main loop takes a bit
longer than expected.

Yes, big main loops are stupid, the BeOS way is the way to go. That’s why I have
many threads, and I also DO have substantial buffering.

Now, this may not be a serious issue in a game that’s expected to run at full
frame rate, and possibly drop an occasional frame - a fixed audio latency
slightly larger than 2 video frame periods should be ok.

Right.

However, the ear is several times faster than the eye, making audio a lot
more sensitive to latency, and further, missing deadlines result in audio
drop-outs (usually resulting in moments of total silence), while dropping a
frame just produces a tiny “jerk” in the animations - something that’s rather
visible in scrolling 2D games, but hard to see in most 3D games.

Well, in my case, if SDL calls the audio callback, but my audio out thread
doesn’t have any data to give it, then you get REPEATED RANDOM data being
output, until such point as the audio out thread has something to give. And I’d
rather have a moment of silence instead of repeated buffers, the latter is MUCH
more annoying.

In short, unless you’re doing something that’s very insensitive to latency, I
think you should fix your code rather than break SDL. :wink:

My code has nothing wrong with it, it’s just DIFFERENT! :slight_smile: It’s coming from the
standpoint of a media player, instead of a game.

The current audio API is one of the simplest and cleanest ones I’ve seen so
far, and it doesn’t create any problems WRT hard RT scheduling and the like.
(I have ideas for additions, but they don’t include a pure push style API.)

Right. The API is good right now, except that if the system bogs down, in my
case, you get repeated buffers.

~jeffrey :j

I noticed that, but only when I f*cked the retrace sync code up so it
dropped the frame rate to a few frames/second. I don’t think I’ve
managed to get it to drop-out under normal circumstances - but that’s
on a P-III 933 with a G400 MAX… :slight_smile:

Anyway, how about supporting the “mix ahead + add” method with shared
memory? That’s probably the only way to relax the scheduling latency
dependency without increasing latency, at least without a real time
mixing engine. I’m thinking of a nice API for that, but the method is
inherently a bit messy… (Maybe it belongs in SDL_mixer, but the
method still requires basic support on the SDL level.) It’s possible
to do with ALSA, most OSS drivers, DirectSound and similar APIs, so I
think it should cover most gaming targets. (It’s easy to fake when
shared buffers aren’t available, I think.)

//David

On Thursday 15 February 2001 01:48, Sam Lantinga wrote:

Now, this may not be a serious issue in a game that’s expected to
run at full frame rate, and possibly drop an occasional frame - a
fixed audio latency slightly larger than 2 video frame periods
should be ok.

As a data point, Quake III Arena suffers from occasional sound
problems because of this, and on the list of things to do is
rewrite the audio to run in a separate thread.

David Olofson wrote:

From the timing perspective (an ever so important perspective in
real time applications!), the only difference between the push
and pull models is where the synchronization with the hardware is
done. In the pull model, it’s done by the driver or API layer,
while in the push case, you have to do it yourself.

Sometimes you want to do the synchronization yourself, like when
writing an audio/video player, where you need to maintain sync
between those two…

You can’t do that without occasional drop-outs. An audio stream
must be continuous, so you can’t sync the actual stream to anything
but the audio card. What you sync with the video is the contents of
the stream, and that is preferably done by making the audio thread
aware of where the video time is, so that it can timestretch or
whatever it wants to do to stay in sync. (If you really want to sync
audio to video, and not vice versa, that is…)

IMHO, the push model is pretty much pointless in real time
applications, and only serves to encourage flawed design methods.
After the Linux/lowlatency patch was introduced, this became very
obvious, as the majority of audio applications had to be fixed to
take advantage of the “very firm real time scheduling” that it
provides! The most common mistake is mixing DSP code and GUI code
in the same thread.

My code is extremely multithreaded, there is a dedicated thread for
audio output that simply reads from a buffer and calls an output
function. The problem comes when this thread has to wait until SDL
calls the audio callback, which is in a separate thread.

Why do you use an extra audio thread when SDL already sets one
up…?

The problem is that, unless you use substantial buffering, the
push model simply doesn’t give you any advantages over the pull
model. It just makes your code messier. You cannot sit in a big
main loop doing all sorts of things, and occasionally push a
buffer to the audio device, as that will result in audio
drop-outs as soon as something in the main loop takes a bit
longer than expected.

Yes, big main loops are stupid, the BeOS way is the way to go.
That’s why I have many threads, and I also DO have substantial
buffering.

Now, this may not be a serious issue in a game that’s expected to
run at full frame rate, and possibly drop an occasional frame - a
fixed audio latency slightly larger than 2 video frame periods
should be ok.

Right.

However, the ear is several times faster than the eye, making
audio a lot more sensitive to latency, and further, missing
deadlines result in audio drop-outs (usually resulting in
moments of total silence), while dropping a frame just produces a
tiny “jerk” in the animations - something that’s rather visible
in scrolling 2D games, but hard to see in most 3D games.

Well, in my case, if SDL calls the audio callback, but my audio out
thread doesn’t have any data to give it, then you get REPEATED
RANDOM data being output, until such point as the audio out thread
has something to give. And I’d rather have a moment of silence
instead of repeated buffers, the latter is MUCH more annoying.

Why does this happen at all? Your callback function could do
something useful rather than letting the audio card underrun, as some
audio drivers would do.

In short, unless you’re doing something that’s very insensitive to latency, I
think you should fix your code rather than break SDL. :wink:

My code has nothing wrong with it, it’s just DIFFERENT! :slight_smile: It’s
coming from the standpoint of a media player, instead of a game.

Well, different or not; I don’t see why media players should be
different from other real time applications.

I can see why you’d prefer a write() style API where you can pass
buffers of varying sizes, but that brings worse determinism. Thus, it
inherently requires additional buffering, and I can’t see why you
can’t do that properly in your callback function.

As to latency and reliability, SDL API support would possibly help a
little on targets with an underlying API that is capable of write()
style operations, or that supports circular buffers, or at least many
small fragments.

The current audio API is one of the simplest and cleanest ones
I’ve seen so far, and it doesn’t create any problems WRT hard RT
scheduling and the like. (I have ideas for additions, but they
don’t include a pure push style API.)

Right. The API is good right now, except that if the system bogs
down, in my case, you get repeated buffers.

The API can’t do much about that. Increased buffering is the only
cure for platforms with poor real time scheduling. (The “mix-ahead +
add” method is just a compromise that makes it possible to avoid having
all the audio drop out - it’s of no use if you have only one,
adequately buffered signal source.)

//David

On Thursday 15 February 2001 03:17, Jeffrey Bridge wrote:

As a data point, Quake III Arena suffers from occasional sound
problems because of this, and on the list of things to do is
rewrite the audio to run in a separate thread.

I noticed that, but only when I f*cked the retrace sync code up so it
dropped the frame rate to a few frames/second. I don’t think I’ve
managed to get it to drop-out under normal circumstances - but that’s
on a P-III 933 with a G400 MAX… :slight_smile:

Well, it’s definitely a problem on more “modest” hardware, like a
P-III 450. Heh.

See ya,
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

David Olofson wrote:

You can’t do that without occasional drop-outs. An audio stream
must be continuous, so you can’t sync the actual stream to anything
but the audio card. What you sync with the video is the contents of
the stream, and that is preferably done by making the audio thread
aware of where the video time is, so that it can timestretch or
whatever it wants to do to stay in sync. (If you really want to sync
audio to video, and not vice versa, that is…)

Well, I suppose I should sync video to audio, right?

Why do you use an extra audio thread when SDL already sets one
up…?

Because the player is plugin-based, and you can use other output
targets. I.E., you could write an audio out driver that pushed audio
across a network pipe or something…

If I removed my thread, it would cause two problems:

  1. for an SDL audio out target, the callback function would have to know
    about many internals of the player core. This is bad programming
    practice.
  2. for a different audio out target, each plugin therein would have to
    write its own threading code.

Additionally, suppose the user wants to write the audio back into a
file, instead of outputting it? Then the audio output plugin really WILL
be doing write()… or fwrite().

Why does this happen at all? Your callback function could do
something useful rather than letting the audio card underrun, as some
audio drivers would do.

What would be something useful to do when there’s no data? I have an
open mind here.

Well, different or not; I don’t see why media players should be
different from other real time applications.

I can see why you’d prefer a write() style API where you can pass
buffers of varying sizes, but that brings worse determinism. Thus, it
inherently requires additional buffering, and I can’t see why you
can’t do that properly in your callback function.

Once again, the callback is encapsulated into the SDL audio output
plugin.

As to latency and reliability, SDL API support would possibly help a
little on targets with an underlying API that is capable of write()
style operations, or that supports circular buffers, or at least many
small fragments.

Howso?

The API can’t do much about that. Increased buffering is the only
cure for platforms with poor real time scheduling. (The “mix-ahead +
add” method is just a compromise that makes it possible to avoid having
all the audio drop out - it’s of no use if you have only one,
adequately buffered signal source.)

I guess the real problem comes from wanting to support both physical
audio output and output to a file… I didn’t think it’d be a problem,
since you can open /dev/dsp and write to it just as easily as you would
a file, but what do I know? Heh.

~jeffrey :j

My code is extremely multithreaded, there is a dedicated thread for
audio output that simply reads from a buffer and calls an output
function. The problem comes when this thread has to wait until SDL
calls the audio callback, which is in a separate thread.

Why do you use an extra audio thread when SDL already sets one
up…?

SMPEG does this as well. It’s because the audio decoding needs to
happen asynchronously to the audio callback. On low end systems,
there just isn’t time to decode an entire buffer of audio data in
the time allowed by the SDL audio callback. SDL calls the audio
callback at the beginning of the audio timeslice, but when the audio
decompression takes close to the entire time to perform, or the system
is heavily loaded and the app starves, a separate thread queuing buffers
is the only way to go.

See ya,
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

David Olofson wrote:

You can’t do that without occasional drop-outs. An audio stream
must be continuous, so you can’t sync the actual stream to anything
but the audio card. What you sync with the video is the contents of
the stream, and that is preferably done by making the audio thread
aware of where the video time is, so that it can timestretch or
whatever it wants to do to stay in sync. (If you really want to sync
audio to video, and not vice versa, that is…)

Well, I suppose I should sync video to audio, right?

Since the eye is more tolerant to slight variations in frame rate (especially
on “normal” video material) than the ear is to drop-outs or crappy time
stretching; yes, I think so.

Why do you use an extra audio thread when SDL already sets one
up…?

Because the player is plugin-based, and you can use other output
targets. I.E., you could write an audio out driver that pushed audio
across a network pipe or something…

Hmmm… I’m actually designing a multimedia plugin API that’s meant to be
used for that kind of stuff, as well as for ultra low latency hard real time
audio. Guess what - plugins are normally callback driven! :slight_smile:

Many plugin APIs use separate threads for plugins, but they all have problems
with plugins confusing the scheduler, and IPC latencies in general. That kind
of design does work for low latency stuff on hard RTOSes, but it still just
complicates things tremendously.

One DSP thread per CPU and plugins being called one by one in a fixed order
is the way to go; it’s simple, fast, efficient, reliable and works in nearly
any environment, including inside ISRs. Most importantly (for multitrack
audio and similar systems), it doesn’t make the lowest latency a function of
the number of plugins in the net.
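
(Schematically, and purely as an illustration - this is not MAIA code, just
the shape of the idea:)

/* Illustration only: one DSP thread runs the whole net in a fixed order. */
typedef void (*process_fn)(void *state, float *buf, int frames);

typedef struct {
    process_fn process;
    void *state;
} plugin;

void run_chain(plugin *chain, int nplugins, float *buf, int frames)
{
    int i;
    for (i = 0; i < nplugins; i++)  /* no IPC, no extra context switches */
        chain[i].process(chain[i].state, buf, frames);
}

Every plugin processes the same buffer within the same period, so adding
plugins adds CPU load, but never another buffering stage.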

If I removed my thread, it would cause two problems:

  1. for an SDL audio out target, the callback function would have to know
    about many internals of the player core. This is bad programming
    practice.

Why does it have to know anything about that? I think it’s bad programming
practice in the first place if the API wrappers aren’t separated from the
player core. :slight_smile:

  2. for a different audio out target, each plugin therein would have to
    write its own threading code.

The average audio application with plugins just loads the plugins as shared
libraries and runs them as callbacks, no matter what the target audio API is.
I don’t see the problem with moving something like that between OSS, ALSA,
DirectSound, Windows MMSystem audio and SDL…

It’s the audio plugin host that’s connected to whatever API you use, so the
plugins need to know nothing about that. It would even be possible to make
the host a form of plugin similar to the SDL audio callback, and then call
that from the different audio API wrappers.

Additionally, suppose the user wants to write the audio back into a
file, instead of outputting it? Then the audio output plugin really WILL
be doing write()… or fwrite().

Yeah… So what? VST and LADSPA handle that just fine, and so will yet
another callback driven plugin API; MAIA. If you’re just going to render
off-line to disk, all you do is:

while(playing)
{
	...
	call_plugin(..., buffer, buffer_size);
	write(fd, buffer, buffer_size);
}

For recording during real time playback, you must use a "disk butler thread"
as we call it around the Linux Audio Dev list, and some FIFO buffering in
between the audio thread and this disk thread. This is required to keep the
disk access from causing drop-outs in the real time playback.

The easiest way to implement that would be to just use a single reader-single
writer lock-free FIFO that’s written to by an interface plugin called as
usual from within the audio thread, and then have the disk butler run in its
own thread polling the other end of the FIFO. The FIFO should preferably have
several seconds worth of space, and the polling can be throttled by sleeping
for a sensible amount of ms calculated from the amount of data in the FIFO
when going to sleep.
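
(A bare-bones sketch of such a FIFO, with made-up names and sizes. Each index
is written by exactly one thread, which is what makes it safe without locks:)

#define FIFO_SIZE (1 << 20)    /* ~6 s of 16 bit stereo at 44.1 kHz */

static unsigned char fifo[FIFO_SIZE];
static volatile unsigned int fifo_read_pos = 0, fifo_write_pos = 0;

/* Audio thread side: never blocks; returns how much was actually stored. */
unsigned int fifo_write(const unsigned char *buf, unsigned int len)
{
    unsigned int i;
    for (i = 0; i < len; i++) {
        unsigned int next = (fifo_write_pos + 1) % FIFO_SIZE;
        if (next == fifo_read_pos)
            break;             /* full - better to lose data than to block */
        fifo[fifo_write_pos] = buf[i];
        fifo_write_pos = next;
    }
    return i;
}

/* Disk butler side: drains whatever is available, up to len bytes. */
unsigned int fifo_read(unsigned char *buf, unsigned int len)
{
    unsigned int i;
    for (i = 0; i < len && fifo_read_pos != fifo_write_pos; i++) {
        buf[i] = fifo[fifo_read_pos];
        fifo_read_pos = (fifo_read_pos + 1) % FIFO_SIZE;
    }
    return i;
}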

This method works very well in real life, and is used in applications that
record and/or playback 20-60 channels of 44.1 kHz 16 or 32 bit (depending on
hard drive speed) audio while doing rock solid real time processing with 3 ms
input->output latency.

Why does this happen at all? Your callback function could do
something useful rather than letting the audio card underrun, as some
audio drivers would do.

What would be something useful to do when there’s no data? I have an
open mind here.

I don’t, actually. :wink: An underrun is a fatal error, almost as severe as file
corruption in a recording studio, so I tend not to accept them.

And there is no need to, at least not with Linux/lowlatency or some other
RTOS solution. Just get the design right (that is, apply control engineering
design rules - sorry, no shortcuts), and it will be rock solid as long as you
have enough CPU power. Guaranteed. We’re doing it all the time. :slight_smile:

Anyway, if you really do want to provide some emergency solution for users
without a hard real time capable OS, you could just generate silence when
there is an underrun in the input buffer. That’s what the audio drivers do,
and actually, if you’re driving the whole audio engine from the callback,
that’s what will happen automatically if the CPU is overloaded, so you don’t
have to do anything special. I’m more concerned with streaming from disk;
that’s where you’d have a thread->thread connection that could get underruns
that you might be able to handle in some nicer way than just freaking out.
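
(In SDL terms the fallback is tiny - a sketch, with a hypothetical
fetch_audio() standing in for whatever feeds the callback; the one detail
worth noting is that the proper silence value comes from the obtained
SDL_AudioSpec, since silence is 0x80 rather than 0 for unsigned 8 bit
formats:)

#include "SDL.h"
#include <string.h>

extern SDL_AudioSpec audiospec;                 /* as filled in by SDL_OpenAudio() */
extern int fetch_audio(Uint8 *dst, int len);    /* hypothetical; returns bytes copied */

void callback(void *userdata, Uint8 *stream, int len)
{
    int got = fetch_audio(stream, len);
    if (got < len)      /* underrun: pad with silence, don't repeat old data */
        memset(stream + got, audiospec.silence, len - got);
}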

Well, different or not; I don’t see why media players should be
different from other real time applications.

I can see why you’d prefer a write() style API where you can pass
buffers of varying sizes, but that brings worse determinism. Thus, it
inherently requires additional buffering, and I can’t see why you
can’t do that properly in your callback function.

Once again, the callback is encapsulated into the SDL audio output
plugin.

As to latency and reliability, SDL API support would possibly help a
little on targets with an underlying API that is capable of write()
style operations, or that supports circular buffers, or at least many
small fragments.

Howso?

Because you have a kernel driver at the other end of write(), while you have
a user space thread at the other end of your “emulated” write(). The
difference is that the latter is a lot more likely to lose the CPU when the
code it’s trying to “protect” does. Also, adding an extra thread between the
audio API and the data source thread multiplies the risk of getting huge
scheduling latency peaks and other related timing problems.

As a matter of fact, as soon as you have more than one thread per CPU
interacting in a chain style manner, you’re asking for trouble, even if
you’re using a real RTOS. In most cases it’s considered poor design from a
real time programming perspective.

The API can’t do much about that. Increased buffering is the only
cure for platforms with poor real time scheduling. (The “mix-ahead +
add” method is just a compromise that makes it possible to avoid having
all the audio drop out - it’s of no use if you have only one,
adequately buffered signal source.)

I guess the real problem comes from wanting to support both physical
audio output and output to a file… I didn’t think it’d be a problem,
since you can open /dev/dsp and write to it just as easily as you would
a file, but what do I know? Heh.

There’s a lot more to it than that, indeed… heh

Just put any supposedly “non-blocking” IPC construct in between a thread that
deals with disk I/O and an audio thread, and you’ll have nice drop-outs every
now and then, for no obvious reason. You need substantial buffering, and you
need lock-free access to it, at least from the low latency (audio) thread.
Ring buffers + atomic offset updates in shared memory is the way to go; ie do
the sync using the system bus hardware to keep the OS from getting in the way.

That’s real time audio programming on workstation platforms in a nutshell. :slight_smile:

//David

On Thursday 15 February 2001 18:26, Jeffrey Bridge wrote:

That sounds very similar to streaming to/from disk into a low latency DSP
engine… I’d propose basically the same solution; by all means, keep the
"high latency" thread, and make sure to use adequate buffering and preferably
a lock-free IPC mechanism. If that doesn’t cut it, I’m afraid you have to fix
the OS…

//David

On Thursday 15 February 2001 18:30, Sam Lantinga wrote:

My code is extremely multithreaded, there is a dedicated thread for
audio output that simply reads from a buffer and calls an output
function. The problem comes when this thread has to wait until SDL
calls the audio callback, which is in a separate thread.

Why do you use an extra audio thread when SDL already sets one
up…?

SMPEG does this as well. It’s because the audio decoding needs to
happen asynchronously to the audio callback. On low end systems,
there just isn’t time to decode an entire buffer of audio data in
the time allowed by the SDL audio callback. SDL calls the audio
callback at the beginning of the audio timeslice, but when the audio
decompression takes close to the entire time to perform, or the system
is heavily loaded and the app starves, a separate thread queuing buffers
is the only way to go.

David Olofson wrote:

Since the eye is more tolerant to slight variations in frame rate (especially
on “normal” video material) than the ear is to drop-outs or crappy time
stretching; yes, I think so.

Allrighty, I’ll be making the minor adjustments to the sync code needed to
accomplish this. Right now, there’s no software sync between audio and video, I
just trust the sound card to output stuff at a throttled rate, and keep careful
tabs on the video frames being output at exactly the rate they should be.
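
Roughly what I have in mind - a sketch with made-up names, ignoring the sound
card's own buffer latency: the audio callback keeps a running sample count,
and the video side uses that as the master clock.

#include "SDL.h"

#define RATE 44100

static volatile Uint32 samples_played = 0;   /* updated only by the audio callback */

void audio_callback(void *userdata, Uint8 *stream, int len)
{
    /* ... fill 'stream' from the audio FIFO as usual ... */
    samples_played += len / 4;               /* 16 bit stereo: 4 bytes per frame */
}

/* Video thread: hold each frame until the audio clock reaches its timestamp. */
void wait_for_frame(double frame_time_sec)
{
    while ((double)samples_played / RATE < frame_time_sec)
        SDL_Delay(1);
}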

Hmmm… I’m actually designing a multimedia plugin API that’s meant to be
used for that kind of stuff, as well as for ultra low latency hard real time
audio. Guess what - plugins are normally callback driven! :slight_smile:

Well, yes! My plugins are callback driven – but not in the same sense as
SDL audio is callback driven.

Specifically… I have a disk read/decode thread that writes to a FIFO. The
audio out thread then reads from that FIFO. The audio out thread then calls a
plugin function of the form

int (*DoOutput)(void *data, AFCA_Frame *f);

which is approximately akin to

int (*DoOutput)(void *data, void *buf, long length);

The issue is that for the SDL aout plugin, its DoOutput function can’t just
output the data, it has to essentially put a pointer to buf into a shared
location, and wait for SDL to call the separate callback function, and the
callback function to complete.
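
In rough terms, the handoff looks like this - a sketch of the shape of it,
not the real plugin code:

#include "SDL.h"
#include <string.h>

/* Shared between DoOutput() (my audio out thread) and the SDL callback.
   lock/consumed are created at init with SDL_CreateMutex()/SDL_CreateCond(). */
static SDL_mutex *lock;
static SDL_cond *consumed;
static Uint8 *pending_buf = NULL;
static int pending_len = 0;

int DoOutput(void *data, void *buf, long length)
{
    SDL_LockMutex(lock);
    pending_buf = (Uint8 *)buf;
    pending_len = (int)length;
    while (pending_len > 0)           /* wait for the callback to consume it all */
        SDL_CondWait(consumed, lock);
    SDL_UnlockMutex(lock);
    return 0;
}

void sdl_callback(void *userdata, Uint8 *stream, int len)
{
    int n;
    SDL_LockMutex(lock);
    n = (pending_len < len) ? pending_len : len;
    if (n > 0)
        memcpy(stream, pending_buf, n);
    memset(stream + n, 0, len - n);   /* nothing (or not enough) ready: silence */
    pending_buf += n;
    pending_len -= n;
    if (pending_len == 0)
        SDL_CondSignal(consumed);
    SDL_UnlockMutex(lock);
}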

Many plugin APIs use separate threads for plugins, but they all have problems
with plugins confusing the scheduler, and IPC latencies in general. That kind
of designs do work for low latency stuff on hard RTOSes, but still just
complicate things tremendously.

Mine is pretty simple, when it gets down to it :slight_smile:

One DSP thread per CPU and plugins being called one by one in a fixed order
is the way to go; it’s simple, fast, efficient, reliable and works in nearly
any environment, including inside ISRs. Most importantly (for multitrack
audio and similar systems), it doesn’t make the lowest latency a function of
the number of plugins in the net.

Mmmmm, but when you’re playing audio and video, on a single CPU box, it can be
very difficult to schedule things if it’s all one thread. My philosophy is to
basically make lots and lots of threads, and use mutexes and condition variables
to synchronize the FIFOs in between them. This way, when a thread is waiting, it
sleep()s, and it gets woken up on the appropriate event. It works no matter how
many CPUs you have, it trusts the OS to do the scheduling, and it uses more CPUs
if you have them – automatically. (I.E., you don’t have to try and figure out
how many CPUs there are and spawn specifically that many threads.)

Why does it have to know anything about that? I think it’s bad programming
practice in the first place if the API wrappers aren’t separated from the
player core. :slight_smile:

As I said above, the plugin DoOutput just gets a pointer and a buffer length, and
it takes it from there.

If I wanted to have the callback function handle reading from the first FIFO (the
one written to by diskread/decode), it would have to KNOW about the FIFO, which
it currently does not. Right now, the FIFO is hidden away from the plugins.

The average audio application with plugins just load the plugins as shared
libraries and run them as callbacks, no matter what the target audio API is.
I don’t see the problem with moving something like that between OSS, ALSA,
DirectSound, Windows MMSystem audio and SDL…

It’s the audio plugin host that’s connected to whatever API you use, so the
plugins need to know nothing about that. It would even be possible to make
the host a form of plugin similar to the SDL audio callback, and then call
that from the different audio API wrappers.

I don’t quite get how that all works, i.e. which specific thing is the “audio API
wrappers”… but that’s okay, I get the gist of this message, I think :slight_smile:

Yeah… So what? VST and LADSPA handles that just fine, and so will yet
another callback driven plugin API; MAIA. If you’re just going to render
off-line to disk, all you do is:

    while(playing)
    {
            ...
            call_plugin(..., buffer, buffer_size);
            write(fd, buffer, buffer_size);
    }

Which thread would this run in, though? Would it be a special type of thread
spawned only for disk writing? If so, then that’s dumb, because my threads
shouldn’t care what type of output they’re doing.

For recording during real time playback, you must use a "disk butler thread"
as we call it around the Linux Audio Dev list, and some FIFO buffering in
between the audio thread and this disk thread. This is required to keep the
disk access from causing drop-outs in the real time playback.

Right. I have plenty of FIFOs, if/when I get around to supporting one input and
two output targets, there will be plenty of buffers.

The easiest way to implement that would be to just use a single reader-single
writer lock-free FIFO that’s written to by an interface plugin called as
usual from within the audio thread, and then have the disk butler run in its
own thread polling the other end of the FIFO. The FIFO should preferably have
several seconds worth of space, and the polling can be throttled by sleeping
for a sensible amount of ms calculated from the amount of data in the FIFO
when going to sleep.

Ahhh, but I thought we realized that polling was a bad idea back around when
Ethernet cards used to work that way… (you had to poll the card to see if
there was new data). By using condition variables, it sleeps for the right amount
of time, automatically.

This method works very well in real life, and is used in applications that
record and/or playback 20-60 channels of 44.1 kHz 16 or 32 bit (depending on
hard drive speed) audio while doing rock solid real time processing with 3 ms
input->output latency.

Wow. For my case, I don’t really care about in/out latency, because the input is
a file and there’s lots of buffering, but that’s pretty impressive.

I don’t, actually. :wink: An underrun is a fatal error, almost as severe as file
corruption in a recording studio, so I tend not to accept them.

Well, the underrun is only for audio being played back to a user, and the only
time it would happen is when they have a slow computer and they try to open up
windows explorer (like my roommate’s comp :-).

In that case, Winamp seems to start repeating the last good buffer, getting that
stuttering effect which is so annoying.

And there is no need to, at least not with Linux/lowlatency or some other
RTOS solution. Just get the design right (that is, apply control engineering
design rules - sorry, no shortcuts), and it will be rock solid as long as you
have enough CPU power. Guaranteed. We’re doing it all the time. :slight_smile:

Hehehehe, I’m not taking any shortcuts. But all the realtime junk isn’t as
important either.

Anyway, if you really do want to provide some emergency solution for users
without a hard real time capable OS, you could just generate silence when

Or without enough CPU power to run ${media player} and open up IE at the same
time.

there is an underrun in the input buffer. That’s what the audio drivers do,

Okey dokey.

and actually, if you’re driving the whole audio engine from the callback,
that’s what will happen automatically if the CPU is overloaded, so you don’t
have to do anything special. I’m more concerned with streaming from disk;
that’s where you’d have a thread->thread connection that could get underruns
that you might be able to handle in some nicer way than just freaking out.

Yes, when there’s streaming to disk, it’ll be handled better. I’m actually
working out a whole new API for doing that, and leaving the current one
behind… but that’s another issue.

Because you have a kernel driver at the other end of write(), while you have
a user space thread at the other end of your “emulated” write(). The

Ummm, no, if there was an SDL audio_write() call, I would expect there to be no
extra threads, I would expect that call to simply output the data, returning when
it was ready.

difference is that the latter is a lot more likely to lose the CPU when the
code it’s trying to “protect” does. Also, adding an extra thread between the
audio API and the data source thread multiplies the risk of getting huge
scheduling latency peaks and other related timing problems.

Heh, when I’m playing audio I currently have three threads total, and in my new
API it might even be four. But it has worked extremely well using just mutexes
and condition variables.

As a matter of fact, as soon as you have more than one thread per CPU
interacting in a chain style manner, you’re asking for trouble, even if
you’re using a real RTOS. In most cases it’s considered poor design from a
real time programming perspective.

I have not had a single bit of trouble with the way I’m doing it. When I’m
playing an MPEG file with audio and video, I end up with around 10 threads, on a
two-CPU machine, but the audio never skips and the video is perfectly
synchronized.

There’s a lot more to it than that, indeed… heh

Just put any supposedly “non-blocking” IPC construct in between a thread that
deals with disk I/O and an audio thread, and you’ll have nice drop-outs every
now and then, for no obvious reason. You need substantial buffering, and you

Interesting, no problems for me…

need lock-free access to it, at least from the low latency (audio) thread.
Ring buffers + atomic offset updates in shared memory is the way to go; ie do
the sync using the system bus hardware to keep the OS from getting in the way.

Aha.

That’s real time audio programming on workstation platforms in a nutshell. :slight_smile:

I appreciate the discussion and the education I’m getting out of this.

~jeffrey :j

David Olofson wrote:

Since the eye is more tolerant to slight variations in frame rate
(especially on “normal” video material) than the ear is to drop-outs or
crappy time stretching; yes, I think so.

Allrighty, I’ll be making the minor adjustments to the sync code needed to
accomplish this. Right now, there’s no software sync between audio and
video, I just trust the sound card to output stuff at a throttled rate, and
keep careful tabs on the video frames being output at exactly the rate they
should be.

Hmmm… I’m actually designing a multimedia plugin API that’s meant to be
used for that kind of stuff, as well as for ultra low latency hard real
time audio. Guess what - plugins are normally callback driven! :slight_smile:

Well, yes! My plugins are callback driven – but not in the same sense as
SDL audio is callback driven.

Specifically… I have a disk read/decode thread that writes to a FIFO.
The audio out thread then reads from that FIFO. The audio out thread then
calls a plugin function of the form

int (*DoOutput)(void *data, AFCA_Frame *f);

which is approximately akin to

int (*DoOutput)(void *data, void *buf, long length);

The issue is that for the SDL aout plugin, its DoOutput function can’t just
output the data, it has to essentially put a pointer to buf into a shared
location, and wait for SDL to call the separate callback function, and the
callback function to complete.

Who calls this plugin, and what is “who” synced to?

The normal solution in audio application is that “who” is the plugin host,
and would in this case be executed by the SDL audio callback.

One DSP thread per CPU and plugins being called one by one in a fixed
order is the way to go; it’s simple, fast, efficient, reliable and works
in nearly any environment, including inside ISRs. Most importantly (for
multitrack audio and similar systems), it doesn’t make the lowest latency
a function of the number of plugins in the net.

Mmmmm, but when you’re playing audio and video, on a single CPU box, it can
be very difficult to schedule things if it’s all one thread.

Well, the video is dealing with higher latencies, so it’s kind of natural to
run that as a lower priority thread.

My philosophy
is to basically make lots and lots of threads, and use mutexes and
condition variables to synchronize the FIFOs in between them. This way,
when a thread is waiting, it sleep()s, and it gets woken up on the
appropriate event. It works no matter how many CPUs you have, it trusts the
OS to do the scheduling,

That’s the problem… This doesn’t work with low latency processing on any
"normal" OS, and results in extremely complex and practically unpredictable
timing phenomena, which is why it doesn’t work very well on hard RTOS kernels
either.

and it uses more CPUs if you have them –
automatically. (I.E., you don’t have to try and figure out how many CPUs
there are and spawn specifically that many threads.)

Why does it have to know anything about that? I think it’s bad programming
practice in the first place if the API wrappers aren’t separated from the
player core. :slight_smile:

As I said above, the plugin DoOutput just gets a pointer and a buffer
length, and it takes it from there.

If I wanted to have the callback function handle reading from the first
FIFO (the one written to by diskread/decode), it would have to KNOW about
the FIFO, which it currently does not. Right now, the FIFO is hidden away
from the plugins.

What’s the problem with actually having a host to deal with everything that
doesn’t belong in the plugin API? Works for all sorts of hard disk recording
apps with DSP plugins.

The average audio application with plugins just load the plugins as
shared libraries and run them as callbacks, no matter what the target
audio API is. I don’t see the problem with moving something like that
between OSS, ALSA, DirectSound, Windows MMSystem audio and SDL…

It’s the audio plugin host that’s connected to whatever API you use, so
the plugins need to know nothing about that. It would even be possible to
make the host a form of plugin similar to the SDL audio callback, and
then call that from the different audio API wrappers.

I don’t quite get how that all works, i.e. which specific thing is the
"audio API wrappers"… but that’s okay, I get the read of this message, I
think :slight_smile:

Well, for example, if you have a disk I/O thread and one or more DSP plugins,
the typical audio application solution would be to implement “something” that
can host the plugins, read/write data from/to the I/O thread via the FIFO
interface, and that could run as an SDL audio callback. Only this "something"
would know about the FIFO and SDL audio. This “something” would be what audio
hackers generally call a plugin host.

In order to save time and code, you may create a generic callback interface
to drive this host, rather than plugging it in as an SDL audio callback
directly. Then you can write a bunch of simple wrappers that either "forward"
callbacks, or run their own threads, doing read()/write() style I/O and
calling the host from their I/O loops.
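
(For the "forward the callback" case, the SDL wrapper could be nearly trivial
- a hypothetical sketch, with host_process() standing in for the "something"
above, and assuming SDL_Init(SDL_INIT_AUDIO) has already been done:)

#include "SDL.h"

extern void host_process(Sint16 *buf, int frames);   /* the plugin host entry point */

static void sdl_audio_wrapper(void *userdata, Uint8 *stream, int len)
{
    host_process((Sint16 *)stream, len / 4);         /* 16 bit stereo: 4 bytes/frame */
}

int open_sdl_audio(void)
{
    SDL_AudioSpec spec;
    spec.freq = 44100;
    spec.format = AUDIO_S16;
    spec.channels = 2;
    spec.samples = 1024;
    spec.callback = sdl_audio_wrapper;
    spec.userdata = NULL;
    if (SDL_OpenAudio(&spec, NULL) < 0)
        return -1;
    SDL_PauseAudio(0);
    return 0;
}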

As a simplistic example, here’s the “wrapper” function that’s called from an
"infinite" loop to run a simple callback driven soft synth against the OSS
(read/write style) API. Non-blocking I/O is used for MIDI rather than a
separate thread, for simplicity. This function is just called repeatedly
until the application wants to stop, or the function returns an error code.

int engine_cycle()
{
    int frames = adev.fragmentsize>>2;
    if(external_level)
        if( read(adev.infd, extbuf_s16, adev.fragmentsize) < 0 )
            return -1;
    player_play(&player, frames);
    check_midi();
    engine_process(buf_s16, frames);
    if( write(adev.outfd, buf_s16, adev.fragmentsize) < 0)
        return -2;
    return 0;
}

player_play() - song file player
check_midi() - MIDI input decoder
engine_process() - synth engine (calls lots of callbacks in turn)

Blocking is done on either audio in or audio out, depending on whether or not
the input is used. (If external_level == 0, the input is never opened by the
init code, so the blocking is done on the writes.)

Now, your FIFO would work just like the MIDI input in this code; non-blocking
read/write. If the data isn’t there, you can’t do much - if you try to wait,
you’ll only move the drop-out beyond your control; to the audio card driver.

Yeah… So what? VST and LADSPA handles that just fine, and so will yet
another callback driven plugin API; MAIA. If you’re just going to render
off-line to disk, all you do is:

    while(playing)
    {
            ...
            call_plugin(..., buffer, buffer_size);
            write(fd, buffer, buffer_size);
    }

Which thread would this run in, though? Would it be a special type of
thread spawned only for disk writing? If so, then that’s dumb, because my
threads shouldn’t care what type of output they’re doing.

Cannot be done. If you want to do output to disk while still running in real
time, you have to pass the data via a FIFO to a separate disk writer thread.

The above was an example of off-line processing, which basically means that
you render the data to disk as fast as possible, blocking on the write() if
the CPU is faster than the drive. (This obviously cannot be used if you’re
processing "live" signals, as opposed to data from another file. :slight_smile:)

For recording during real time playback, you must use a “disk butler
thread” as we call it around the Linux Audio Dev list, and some FIFO
buffering in between the audio thread and this disk thread. This is
required to keep the disk access from causing drop-outs in the real time
playback.

Right. I have plenty of FIFOs, if/when I get around to supporting one input
and two output targets, there will be plenty of buffers.

The easiest way to implement that would be to just use a single
reader-single writer lock-free FIFO that’s written to by an interface
plugin called as usual from within the audio thread, and then have the
disk butler run in its own thread polling the other end of the FIFO. The
FIFO should preferably have several seconds worth of space, and the
polling can be throttled by sleeping for a sensible amount of ms
calculated from the amount of data in the FIFO when going to sleep.

Ahhh, but I thought we realized that polling was a bad idea back around
when Ethernet cards used to work that way… (you had to poll the card to
see if there was new data). By using condition variables, it sleeps for the
right amount of time, automatically.

Well, “polling” isn’t the correct term, actually; it’s more like “sleep for a
while, then check the FIFO, and go back to sleep if there’s nothing to do”.
It has to be done that way if you have a 2.1 ms latency SCHED_FIFO thread in
the other end of that FIFO, as you cannot call any IPC syscalls from within
the audio thread without occasionally missing deadlines. Deal with it, or fix
the kernel. (Or use an OS designed 100% for hard RT - lots of other issues
with those, however, so few audio hackers care much for them.)
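
(A hypothetical sketch of such a butler loop for the recording direction;
fifo_used() and fifo_read() stand for whatever lock-free FIFO primitives you
have:)

#include <unistd.h>

extern unsigned int fifo_used(void);            /* bytes waiting in the FIFO */
extern unsigned int fifo_read(unsigned char *buf, unsigned int len);

#define CHUNK 65536                             /* write to disk in big blocks */
#define BYTES_PER_MS (44100 * 2 * 2 / 1000)     /* 16 bit stereo at 44.1 kHz */

void disk_butler(int fd)
{
    unsigned char buf[CHUNK];
    for (;;) {
        unsigned int used = fifo_used();
        if (used >= CHUNK) {
            unsigned int n = fifo_read(buf, CHUNK);
            if (write(fd, buf, n) < 0)          /* may block; that's fine here */
                break;
        } else {
            /* sleep roughly until half a chunk more has arrived */
            unsigned int ms = (CHUNK - used) / 2 / BYTES_PER_MS;
            usleep((ms ? ms : 1) * 1000);
        }
    }
}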

This method works very well in real life, and is used in applications
that record and/or playback 20-60 channels of 44.1 kHz 16 or 32 bit
(depending on hard drive speed) audio while doing rock solid real time
processing with 3 ms input->output latency.

Wow. For my case, I don’t really care about in/out latency, because the
input is a file and there’s lots of buffering, but that’s pretty
impressive.

Well, it sure demonstrates that there is a bit of real time potential in
mainstream PC hardware after all… :slight_smile:

Any problems are due to broken or flawed hardware, OS shortcomings,
application design errors - or any combination of those.

I don’t, actually. :wink: An underrun is a fatal error, almost as severe as
file corruption in a recording studio, so I tend not to accept them.

Well, the underrun is only for audio being played back to a user, and the
only time it would happen is when they have a slow computer and they try to
open up windows explorer (like my roommate’s comp :-).

hehe

In that case, Winamp seems to start repeating the last good buffer, getting
that stuttering effect which is so annoying.

DirectSound… The same would happen with mmap() on OSS or ALSA, or any other
"shared DMA buffer" interface, as the buffer just keeps looping and there
isn’t anyone around to even clear it.

And there is no need to, at least not with Linux/lowlatency or some other
RTOS solution. Just get the design right (that is, apply control
engineering design rules - sorry, no shortcuts), and it will be rock
solid as long as you have enough CPU power. Guaranteed. We’re doing it
all the time. :slight_smile:

Hehehehe, I’m not taking any shortcuts. But all the realtime junk isn’t as
important either.

It’s real time as soon as there is any deadline to meet - be it within 50
µs, or the next day! :slight_smile: If you miss a deadline, you’re out of control. If
you want control, don’t miss deadlines. :wink:

Anyway, if you really do want to provide some emergency solution for
users without a hard real time capable OS, you could just generate
silence when

Or without enough CPU power to run ${media player} and open up IE at the
same time.

Well, that’s not really a CPU power issue - it’s perfectly safe to fire up
Netscape and other hogs while doing 3 ms latency processing on
Linux/lowlatency, and the same applies for some 100 µs with RTLinux or RTAI.
It’s just a matter of using an OS with a real time capable scheduler.

there is an underrun in the input buffer. That’s what the audio drivers
do,

Okey dokey.

(Well, provided you’re not using a shared memory style interface, that is.
DirectSound, regardless of “mode”, is a shared memory interface from the
kernel driver perspective…)

Because you have a kernel driver at the other end of write(), while you
have a user space thread at the other end of your “emulated” write(). The

Ummm, no, if there was an SDL audio_write() call, I would expect there to
be no extra threads, I would expect that call to simply output the data,
returning when it was ready.

Yeah… I might have lost track when writing that, but that was exactly the
point; that’s the only technical advantage of such an interface - provided it
can actually be implemented that way on the actual target.

difference is that the latter is a lot more likely to lose the CPU when
the code it’s trying to “protect” does. Also, adding an extra thread
between the audio API and the data source thread multiplies the risk of
getting huge scheduling latency peaks and other related timing problems.

Heh, when I’m playing audio I currently have three threads total, and in my
new API it might even be four. But it has worked extremely well using just
mutexes and condition variables.

(Almost) anything works with enough buffering… :wink:

As a matter of fact, as soon as you have more than one thread per CPU
interacting in a chain style manner, you’re asking for trouble, even if
you’re using a real RTOS. In most cases it’s considered poor design from
a real time programming perspective.

I have not had a single bit of trouble with the way I’m doing it. When I’m
playing an MPEG file with audio and video, I end up with around 10 threads,
on a two-CPU machine, but the audio never skips and the video is
perfectly synchronized.

Sounds good. :slight_smile:

There’s a lot more to it than that, indeed… heh

Just put any supposedly “non-blocking” IPC construct in between a thread
that deals with disk I/O and an audio thread, and you’ll have nice
drop-outs every now and then, for no obvious reason. You need substantial
buffering, and you

Interesting, no problems for me…

It usually doesn’t make much difference on operating systems without a proper
real time scheduling mode that entirely bypasses timesharing and similar
systems - you’re likely to get drop-outs because of all sorts of other
interference long before you hit the OS IPC latency problems. This differs
between OS kernels though, so I won’t say this is always the case.

need lock-free access to it, at least from the low latency (audio)
thread. Ring buffers + atomic offset updates in shared memory is the way
to go; ie do the sync using the system bus hardware to keep the OS from
getting in the way.

Aha.

That’s real time audio programming on workstation platforms in a
nutshell. :slight_smile:

I appreciate the discussion and the education I’m getting out of this.

Might come in handy some day, although I think you can live without the
hardcore constructs with sensible amounts of buffering in this case. :slight_smile:

//David

On Thursday 15 February 2001 23:23, Jeffrey Bridge wrote: