Audio documentation

hi ppl,

i’m new to the list (and sdl) and i’ve got a question. :)

does anyone have some good docs about the audio part of sdl?
i’m having some speed issues and i’m not sure what’s causing them.
(working on windows / msdev6 / c)

i do some calcs in the callback function, but at a certain point (amount
of calcs) the buffer starts skipping. but i’m SURE i have loads of
cycles left to do it…

i’m filling the buffer by hand (just a for loop that fills the stream
pointer).

could someone explain when it’s best to call SDL_LockAudio() /
SDL_UnlockAudio()?
at the moment i call them in the callback function before and after
accessing the buffer pointer, because the docs say that they temporarily
disable the callback. but i’ve seen some examples where they were used
in another place.
i don’t get this.

so i have basically two questions
1 where should i generate the data that would go into the buffer
2 when should i call SDL_LockAudio() / SDL_UnlockAudio()

greets,
aka.

hi ppl,

i’m new to the list (and sdl) and i’ve got a question. :)

does anyone have some good docs about the audio part of sdl?
i’m having some speed issues and i’m not sure what’s causing them.
(working on windows / msdev6 / c)

i do some calcs in the callback function but at a certain point
(amount of calcs) the buffer starts skipping. but i’m SURE i have
loads of cycles left to do it…

This is because you’re not on a real time operating system… There is
a hard deadline for delivering the audio buffer (ie returning from
the callback), and if you miss it, audio skips. Meanwhile, a standard
OS scheduler and some background system load will cause the audio
callback to start anywhere from a few µs through tens or
(occasionally) hundreds of ms late, wasting the cycles you were going
to use for processing, or even causing audio drop-outs before you get
a chance to do anything at all. Not much you can do about that, short
of switching to Linux/lowlatency or BeOS. :-/

What you can do is increase the buffering (latency) so that the
scheduling jitter accounts for a smaller part of the audio buffer
period. That way, you can use more CPU power for audio with less
drop-outs, but on an OS like Windows, there is no way ever you’re
going to get anywhere near 100%, and/or totally eliminate the
drop-outs when doing low latency audio. (I think you can get pretty
close with Win2k or XP, a decent professional sound card and running
your audio code in kernel space as Kernel Stream filters, but that’s
insane for “consumer” multimedia such as games.)

[…]

so i have basically two questions
1 where should i generate the data that would go into the buffer

In the callback. (Otherwise, you’ll need extra buffering and thread
safe communication between the audio engine and the callback. That
can make sense in some cases, but definitely not when you want
interactive low latency sound effects.)
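[Not from the original thread — a minimal, self-contained sketch of “generate the data in the callback”. The SDL calls themselves are left out so this compiles on its own; with SDL you would register fill_square via the callback field of the SDL_AudioSpec passed to SDL_OpenAudio(), and the state struct via userdata. All names and the square-wave math here are made up for illustration.]

```c
#include <stdint.h>

/* Hypothetical render state; with SDL, the callback receives it
 * through the `userdata` pointer of the SDL_AudioSpec. */
typedef struct {
    int phase;       /* sample position within the current period */
    int period;      /* samples per full square-wave period */
    int16_t volume;  /* peak amplitude */
} square_state;

/* Fill `len` bytes of `stream` with a naive square wave. This matches
 * the SDL 1.2 callback signature: SDL hands you `stream` and its byte
 * size, and you must fill it completely before returning. */
void fill_square(void *userdata, uint8_t *stream, int len)
{
    square_state *s = (square_state *)userdata;
    int16_t *out = (int16_t *)stream;       /* assuming 16-bit output */
    int frames = len / (int)sizeof(int16_t);
    for (int i = 0; i < frames; ++i) {
        out[i] = (s->phase < s->period / 2) ? s->volume
                                            : (int16_t)-s->volume;
        if (++s->phase >= s->period)
            s->phase = 0;                    /* wrap the oscillator */
    }
}
```

The important property for the discussion above is that the work per call is constant: every invocation does the same amount of arithmetic per sample, so the callback’s execution time doesn’t spike.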

2 when should i call SDL_LockAudio() / SDL_UnlockAudio()

Preferably never. I would recommend using some sort of lock-free
communication between the main thread and the audio callback context
instead, to avoid increasing the callback “scheduling” jitter.

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Wednesday 21 April 2004 11.56, redman wrote:

i do some calcs in the callback function but at a certain point
(amount of calcs) the buffer starts skipping. but i’m SURE i have
loads of cycles left to do it

(occasionally) hundreds of ms late, wasting the cycles you were going
to use for processing, or even causing audio drop-outs before you get
a chance to do anything at all. Not much you can do about that, short
of switching to Linux/lowlatency or BeOS. :-/

It’s not any better under Linux from my experience. I have an emulator
that uses SDL as its audio/video layer, doing audio chip emulation, and I
get dropouts under Linux occasionally (from machine to machine) and never
do under any Windows boxen, so your mileage may vary. Never tried BeOS,
though. I’m running with buffer sizes around 1/30th of a second.

period. That way, you can use more CPU power for audio with less
drop-outs, but on an OS like Windows,

Or Linux (haven’t yet tried OSX)… neither of which are real time OSes.

there is no way ever you’re
going to get anywhere near 100%, and/or totally eliminate the
drop-outs when doing low latency audio.

Unless you switch to an ASIO interface, for example. I’m running with
about 2.3ms of latency in my studio, but that’s a completely different
animal than a video game (using MOTU gear).

(I think you can get pretty
close with Win2k or XP, a decent professional sound card and running
your audio code in kernel space as Kernel Stream filters, but that’s
insane for “consumer” multimedia such as games.)

Agreed.

so i have basically two questions
1 where should i generate the data that would go into the buffer

In the callback. (Otherwise, you’ll need extra buffering and thread
safe communication between the audio engine and the callback. That
can make sense in some cases, but definitely not when you want
interactive low latency sound effects.)

Agreed 100%. The callback is where you want to render the audio. Make sure
your buffering is set up so that it’s small enough to stay in reasonable
sync with your gameplay, but large enough so you don’t incur too much
callback overhead or fall outside the non-real-time-ness of modern-day
OSes. Some experimentation will be required.

2 when should i call SDL_LockAudio() / SDL_UnlockAudio()

Preferably never. I would recommend using some sort of lock-free
communication between the main thread and the audio callback context
instead, to avoid increasing the callback “scheduling” jitter.

In my emulator, I created timestamping code that would translate emulated
change times->sound card based time, and applied them on a per-sample
basis. In most cases it wasn’t necessary (just syncing on a per-frame
basis was OK), but for several games I had to do sample accuracy to get
consistent sound - especially when envelopes, filter points, and pitch
changes occurred.
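[A hypothetical sketch of that translation, with made-up names and clock rates: map an event’s emulated-clock timestamp to a sample offset inside the audio buffer currently being rendered.]

```c
#include <stdint.h>

/* Translate an event timestamp in emulated-clock ticks (`emu_hz`
 * ticks/sec) into a sample offset within the audio buffer that
 * starts at `buf_start_emu` ticks. The callback can then apply the
 * register write (envelope, filter point, pitch change...) exactly
 * at that sample instead of quantizing it to the buffer boundary. */
int event_sample_offset(uint64_t event_time, uint64_t buf_start_emu,
                        uint32_t emu_hz, uint32_t sample_hz)
{
    uint64_t dt = event_time - buf_start_emu;  /* ticks into this buffer */
    return (int)(dt * sample_hz / emu_hz);     /* ticks -> sample frames */
}
```

The 64-bit intermediate keeps the multiply from overflowing for realistic clock rates and buffer lengths.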

–>Neil
-------------------------------------------------------------------------------
Neil Bradley "Your mistletoe is no match for my T.O.W. missile!"
Synthcom Systems, Inc. - Santabot - Futurama
ICQ #29402898

David Olofson wrote:

This is because you’re not on a real time operating system… There is
a hard deadline for delivering the audio buffer (ie returning from
the callback),

aah, i didn’t know that actually returning from the callback is
important… i thought that the buffer contents
would be played even if i didn’t return in time.

and if you miss it, audio skips. Meanwhile, a standard
OS scheduler and some background system load will cause the audio
callback to start anywhere from a few ?s through tens or
(occasionally) hundreds of ms late, wasting the cycles you were going
to use for processing, or even causing audio drop-outs before you get
a chance to do anything at all. Not much you can do about that, short
of switching to Linux/lowlatency or BeOS. :-/

i’m pretty sure it’s not due to scheduling jitter since the dropouts are
regular (it plays once and drops the second time, then plays again,
etc.). with any kind of jitter i would expect some sort of fluctuation
in the dropouts.
also, my 2.8GHz pc is doing very little at the moment so the scheduling
queue should be pretty clean.

what seems to be the problem is that sdl doesn’t seem to tolerate hanging
around in the callback function too much… to my knowledge there is no
reason why you couldn’t do that, unless the callback is called very
shortly before the actual playing buffer (somewhere in the guts of sdl)
runs out of data.

i know that in directx sound you can define multiple points in the
buffer where the callback will be called, so you can make sure that the
buffer is filled in time.
but with sdl i’ve got no clue how much data there is still left in the
playing buffer (and thus how long you can hang around) when the callback
is called.

anyone got this info? is it changeable? without recompiling the libs? :)

the resulting problem for me is that since i cannot seem to put too much
in the callback i need another buffer which should be filled by something
that is polling whether the callback has been called.
… and i have never written multitasked stuff.

What you can do is increase the buffering (latency) so that the
scheduling jitter accounts for a smaller part of the audio buffer
period.

changing the buffer size unfortunately does not influence the skipping.
the time/cycles you have to pump data into the buffer seems constant.

That way, you can use more CPU power for audio with less
drop-outs, but on an OS like Windows, there is no way ever you’re
going to get anywhere near 100%, and/or totally eliminate the
drop-outs when doing low latency audio. (I think you can get pretty
close with Win2k or XP, a decent professional sound card and running
your audio code in kernel space as Kernel Stream filters, but that’s
insane for “consumer” multimedia such as games.)

latency is of no issue to me since i’m using sdl as a test platform for
developing a vst plugin.

since there are several software sequencers that handle multiple streams
of audio to multiple buffers/channels that do a lot more calcs than i’m
trying to do, i figure that it shouldn’t be too hard to achieve both low
latency and use most of my cpu time for stuff…
i don’t see the problem. unless, again, the callback stuff is set up to
work just in time…

[…]

so i have basically two questions
1 where should i generate the data that would go into the buffer

In the callback. (Otherwise, you’ll need extra buffering and thread
safe communication between the audio engine and the callback. That
can make sense in some cases, but definitely not when you want
interactive low latency sound effects.)

multithreading seems like the only solution at the moment… damn.

2 when should i call SDL_LockAudio() / SDL_UnlockAudio()

Preferably never.

hehe… interesting. :)

I would recommend using some sort of lock-free
communication between the main thread and the audio callback context
instead, to avoid increasing the callback “scheduling” jitter.

this is the problem, i’m not sure how to set up the communication
between the callback (which needs to tell me that it finished filling
its buffer) and another thread or something that will fill its own
buffer to be used at the next callback.

anyway, thnx for the help. it triggered some ideas which i will need
to work out a bit.
in the mean time i would like to call/CRY out for some more
documentation !! :))

greets,
aka.

David Olofson wrote:

This is because you’re not on a real time operating system…
There is a hard deadline for delivering the audio buffer (ie
returning from the callback),

aah, i didn’t know that actually returning from the callback is
important… i thought that the buffer contents
would be played even if i didn’t return in time.

Nope - though it would definitely be possible to have some backends
work that way, provided you use a sample format that’s supported by
the hardware. Many games achieve very low latencies without
dangerously small buffers by directly mixing into the DMA buffer. New
sound FX can be started right in front of the DMA pointer, and then
mixed ahead to build up to a more reliable amount of buffering.

and if you miss it, audio skips. Meanwhile, a standard
OS scheduler and some background system load will cause the audio
callback to start anywhere from a few µs through tens or
(occasionally) hundreds of ms late, wasting the cycles you were
going to use for processing, or even causing audio drop-outs
before you get a chance to do anything at all. Not much you can
do about that, short of switching to Linux/lowlatency or BeOS.
:-/

i’m pretty sure it’s not due to scheduling jitter since the
dropouts are regular (it plays once and drops the second time, then
plays again, etc.). with any kind of jitter i would expect some sort
of fluctuation in the dropouts.

Yeah, like random crackling, or the occasional irregular drop-out
every now and then…

also, my 2.8GHz pc is doing very little at the moment so the
scheduling queue should be pretty clean.

what seems to be the problem is that sdl doesn’t seem to tolerate
hanging around in the callback function too much… to my knowledge
there is no reason why you couldn’t do that, unless the callback is
called very shortly before the actual playing buffer (somewhere in
the guts of sdl) runs out of data.

Actually, it’s not only about that. SDL may have to convert your audio
data, and if it doesn’t, the buffer you’re writing into doesn’t have
to be the DMA buffer. (Some drivers and some sound APIs don’t support
direct access to the DMA buffer at all, and there may still be SDL
backends that don’t use that feature even if it’s available.) So,
it’s quite likely that nothing at all happens to the DMA buffer until
you return from the callback.

i know that in directx sound you can define multiple points in the
buffer where the callback will be called, so you can make sure that
the buffer is filled in time.
but with sdl i’ve got no clue how much data there is still left in
the playing buffer (and thus how long you can hang around) when the
callback is called.

AFAIK, SDL should behave pretty much as if you were doing this:

while(playing)
{
	a->callback(a->userdata, a->buf, a->bufsize);
	a->convert_some(a);
	write(a->outfd, a->outbuf, a->outsize);
}

in a dedicated audio thread on a Un*x system with OSS or similar. What
happens is that the sound card driver blocks in write() whenever the
whole buffer doesn’t fit between the last written byte and the current
play position in the DMA buffer. write() gets on with the job and
returns as soon as there’s enough room for the data.

“Polling” is usually done in the sound card ISR. In the OSS API, the
"fragment" size determines how frequently IRQs occur. SDL tries to
get two “fragments” of the size indicated by ‘samples’ in the
AudioSpec, so there’s usually (at least) one IRQ every ‘samples’
sample frames. What we get is this:

* The audio callback is called when the previous buffer
  has just been queued. Since the DMA buffer accepts only
  two buffers' worth of data, this happens right after
  the second last buffer has been played, and one buffer
  remains queued for output. That is, when the callback
  is entered, there are two buffers between you and the
  DAC.

* The callback's return deadline is (theoretically) at
  most the duration of two buffers ahead. At that point,
  the sound card runs out of data, so we have to be in
  the driver's write() callback before then.

anyone got this info?

Read The Source. :)

is it changeable? without recompiling the libs? :)

Setting the ‘samples’ field of the SDL_AudioSpec you pass to
SDL_OpenAudio() to a sensible value should do the trick.

(And if it doesn’t, Use The Source, Luke. ;-)
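[A worked example of the arithmetic, not from the original posts: one fragment lasts samples/freq seconds, and since SDL tries to keep two fragments queued (as described above), total buffering is roughly twice that. A trivial helper with a hypothetical name:]

```c
/* Milliseconds of audio in one fragment of `samples` sample frames
 * at `freq` Hz. With SDL's usual two queued fragments, the total
 * buffering latency is about twice this value. */
double fragment_ms(int samples, int freq)
{
    return 1000.0 * (double)samples / (double)freq;
}
```

So samples = 1024 at 44100 Hz is about 23.2 ms per fragment, or roughly 46 ms of total buffering; halving ‘samples’ halves both numbers at the cost of a tighter callback deadline.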

the resulting problem for me is that since i cannot seem to put too
much in the callback i need another buffer which should be filled by
something that is polling whether the callback has been called.
… and i have never written multitasked stuff.

If you really can’t do all you need in the callback, you’re either
doing it the wrong way, or you simply have more processing to do than
your CPU can handle. (The latter is rather unlikely, unless you have
high quality reverb effects and stuff. :)

As to the former, what kind of processing are you doing? I suspect
that you do stuff in ways that make the CPU load vary a lot between
calls.

Uncompressing a frame of compressed audio every now and then would be
a typical example. If you can’t or won’t redesign your code to do the
job incrementally, you need to add some buffering (increase the
latency) and do the occasional heavy job in another thread.

[…]

That way, you can use more CPU power for audio with less
drop-outs, but on an OS like Windows, there is no way ever you’re
going to get anywhere near 100%, and/or totally eliminate the
drop-outs when doing low latency audio. (I think you can get
pretty close with Win2k or XP, a decent professional sound card
and running your audio code in kernel space as Kernel Stream
filters, but that’s insane for “consumer” multimedia such as
games.)

latency is of no issue to me since i’m using sdl as a test platform
for developing a vst plugin.

Ok. (Of course, VST plugins play by the exact same rules in this
regard. Keep those process*() execution times as constant as
possible…)

since there are several software sequencers that handle multiple
streams of audio to multiple buffers/channels that do a lot more
calcs than i’m trying to do, i figure that it shouldn’t be too hard
to achieve both low latency and use most of my cpu time for stuff…
i don’t see the problem. unless, again, the callback stuff is set
up to work just in time…

Well, there’s no other way to do it… Both VST and SDL audio (and DX,
CoreAudio, JACK, ASIO, EASI, PortAudio and various other APIs) are
based on cooperative multitasking, using callbacks to drive the
"units". A callback keeps the CPU until it returns, stalling the
whole host/driver/whatever. The only way to get things to run
smoothly at high CPU loads is to ensure that all units (plugins,
clients etc…) consume the same amount of CPU time for every buffer,
as far as possible.

(Actually, this is true for thread + IPC based plugin systems as well.
They just use a different plugin API, at the expense of making low
latency operation practically impossible without an RTOS, and very
hard to do reliably even with an RTOS. That’s why all major standards
use the callback model in one form or another.)

[…]

so i have basically two questions
1 where should i generate the data that would go into the buffer

In the callback. (Otherwise, you’ll need extra buffering and
thread safe communication between the audio engine and the
callback. That can make sense in some cases, but definitely not
when you want interactive low latency sound effects.)

multithreading seems like the only solution at the moment…
damn.

Well, if you can add substantial buffering between the offending code
and the audio callback, there’s no major problem. Depending on what
you’re doing, it may well be the easiest way to get the job done
reliably.

Doing it in a VST plugin could be more problematic, though… Not sure
what your average host thinks about you creating your own worker
threads, but I think it’s doable in most cases.

[…]

I would recommend using some sort of lock-free
communication between the main thread and the audio callback
context instead, to avoid increasing the callback "scheduling"
jitter.

this is the problem, i’m not sure how to set up the communication
between the callback (which needs to tell me that it finished
filling its buffer) and another thread or something that will fill
its own buffer to be used at the next callback.

Actually, that would be the wrong way to do it. The whole point is to
make it asynchronous, allowing multiple buffers to be on the way from
the worker thread to the callback. That’s what allows the worker
thread to generate a few buffers at a time, every now and then, so
you can avoid breaking up and “smoothing” your DSP code.

Anyway, try a lock-free ring of buffers (FIFO) between the thread and
the callback.

I have a general purpose, portable, single reader-single writer,
lock-free FIFO over at http://olofson.net/mixed.html - sfifo. It’s
using a simple read()/write() style API (that is, copying), but you
could use it for passing pointers to buffers, buffer indices or
something, or just implement your own buffer ring, using the same
"safe order" approach to updating read and write indices. (Very
simple. Just make sure the compiler doesn’t reorder stuff too much
when optimizing…)
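[A minimal sketch of that “safe order” idea, for the archives: a single-reader/single-writer ring of values (here plain ints, but they could be buffer indices or pointers). It is illustrative only — sfifo itself differs in detail, and production code should use real atomics/memory barriers rather than relying on `volatile` and luck with compiler reordering.]

```c
/* Single-reader/single-writer lock-free FIFO. The writer only ever
 * advances write_pos, the reader only ever advances read_pos, and
 * each index is published AFTER the slot data is written/read - so
 * the other side never observes a half-filled slot. */
#define RING_SIZE 8  /* must be a power of two */

typedef struct {
    int slots[RING_SIZE];
    volatile unsigned read_pos;   /* advanced by the reader only */
    volatile unsigned write_pos;  /* advanced by the writer only */
} ring;

int ring_write(ring *r, int value)
{
    unsigned w = r->write_pos;
    if (w - r->read_pos >= RING_SIZE)
        return 0;                        /* full */
    r->slots[w & (RING_SIZE - 1)] = value;
    r->write_pos = w + 1;                /* publish after the data */
    return 1;
}

int ring_read(ring *r, int *value)
{
    unsigned rd = r->read_pos;
    if (rd == r->write_pos)
        return 0;                        /* empty */
    *value = r->slots[rd & (RING_SIZE - 1)];
    r->read_pos = rd + 1;                /* release after the copy */
    return 1;
}
```

The worker thread calls ring_write() whenever it has a buffer ready, and the audio callback calls ring_read() once per invocation; neither side ever blocks, so the callback’s execution time stays bounded.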

anyway, thnx for the help. it triggered some ideas which i will
need to work out a bit.
in the mean time i would like to call/CRY out for some more
documentation !! :))

Well, there isn’t much to document, really. (Except perhaps for
clarifying the exact relation between the AudioSpec ‘samples’ field
and the resulting latency - if you’re actually supposed to expect any
specific relation at all.)

This is standard real time audio stuff, and although there isn’t much
documentation on this specifically (you’d have to read up on DSP,
general RT programming and stuff instead), it’s rather intuitive once
you grasp the fundamental concepts.

The hard part is probably to let go of the "best average performance"
way of thinking that applies to programming in general. (That’s what
most well known algorithms are based on.) In RT programming, it’s all
about worst case performance; it has to be good enough to make the
deadlines at all times.

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Wednesday 21 April 2004 20.47, redman wrote:

i do some calcs in the callback function but at a certain point
(amount of calcs) the buffer starts skipping. but i’m SURE i have
loads of cycles left to do it

(occasionally) hundreds of ms late, wasting the cycles you were
going to use for processing, or even causing audio drop-outs
before you get a chance to do anything at all. Not much you can
do about that, short of switching to Linux/lowlatency or BeOS.
:-/

It’s not any better under Linux from my experience.

No, without a recent (2.4+) kernel and/or lowlatency patches, it can
actually be worse than Windows, at least in terms of worst case
scheduling latency. (Average latency is pretty much irrelevant to RT
applications. It’s usually in the µs range regardless of OS anyway.)

I have an
emulator that uses SDL as its audio/video layer, doing audio chip
emulation, and I get dropouts under Linux occasionally (from
machine to machine) and never do under any Windows boxen, so your
mileage may vary.

The problem with emulators is that they generally emulate everything
frame by frame, including audio. The result is that if you get a low
frame rate (which you usually do on Linux, unless you use OpenGL,
DirectFB or DGA, and/or very low resolutions), you have to use more
audio buffering to avoid drop-outs.

BTW, Audiality (the sound engine used in Kobo Deluxe) happily runs
with less than 10 ms latency even on standard 2.4 Linux kernels,
whereas I can’t seem to get it to work at all with less than ~70 ms
latency on Win32. I’m still not totally sure why that is, though;
could be an Audiality/Win32 specific problem.

Never tried BeOS, though.

It’s claimed to be capable of around 3 ms latency (on par with
Linux/lowlatency), though I’ve been told you have to bypass the
"official" audio API to get there.

I’m running with
buffer sizes around 1/30th of a second.

Around 33 ms… Should work on a good system with some luck, but don’t
count on it. I suggest you make that user configurable, just in case,
and to allow “power users” to tweak things if they like.

period. That way, you can use more CPU power for audio with less
drop-outs, but on an OS like Windows,

Or Linux (haven’t yet tried OSX)… neither of which are real time
OSes.

Right, although Linux/lowlatency is very different from standard
Linux kernels in this regard. For multimedia applications, it can be
considered “extremely firm” real time for all practical matters, or
even hard RT on a properly tuned system.

Last thing I heard about OSX is that it has low latency potential, but
can’t quite handle the “magic” 3 ms yet for some reason. This may
well have been fixed by now.

there is no way ever you’re
going to get anywhere near 100%, and/or totally eliminate the
drop-outs when doing low latency audio.

Unless you switch to an ASIO interface, for example. I’m running
with about 2.3ms of latency in my studio, but that’s a completely
different animal than a video game (using MOTU gear).

Well, ASIO avoids some of the brokenness of the Win32 kernel, and the
drivers for studio audio interfaces are generally of much higher
quality than consumer stuff. It doesn’t make Windows a hard real time
OS, but it gives you a much better chance of getting high quality
"firm real time".

Meanwhile, Linux/lowlatency uses the standard (OSS/Free or ALSA)
drivers, and doesn’t care much what kind of h/w you use. Old consumer
ISA cards perform as well as high end audio interfaces WRT latency,
and in fact, consumer cards often have the benefit of having fewer
restrictions on DMA buffer count and size.

That said, while it’s definitely possible to set up a Linux/lowlatency
system that happily runs at 80% DSP load with 2.1 ms latency for
countless hours, while stressing all subsystems so hard you can
hardly access the computer, you do have to do some tuning - like
making sure DMA is enabled for all hard drives. And of course, no OS
on the planet is immune to broken drivers, broken RT applications,
broken hardware, crappy “super-NMI” BIOS power management and things
like that.

BTW, how much stress (background processes, video, disk, network etc)
can your system tolerate before you start getting drop-outs?

[…]

so i have basically two questions
1 where should i generate the data that would go into the
buffer

In the callback. (Otherwise, you’ll need extra buffering and
thread safe communication between the audio engine and the
callback. That can make sense in some cases, but definitely not
when you want interactive low latency sound effects.)

Agreed 100%. The callback is where you want to render the audio.
Make sure your buffering is set up so that it’s small enough to
stay in reasonable sync with your gameplay, but large enough so you
don’t incur too much callback overhead or fall outside the
non-real-time-ness of modern-day OSes. Some experimentation will be
required.

Actually, experimentation doesn’t cut it. You still have to make the
buffer size user configurable, unless you set it very high.
Experimental tuning is especially useless if you do it on machines
with great low latency performance. (Doesn’t have to mean high
performance machines, BTW. Linux/lowlatency does <3 ms even on
Pentium CPUs, for example.)

2 when should i call SDL_LockAudio() / SDL_UnlockAudio()

Preferably never. I would recommend using some sort of lock-free
communication between the main thread and the audio callback
context instead, to avoid increasing the callback "scheduling"
jitter.

In my emulator, I created timestamping code that would translate
emulated change times->sound card based time, and applied them on a
per-sample basis. In most cases it wasn’t necessary (just syncing
on a per-frame basis was OK), but for several games I had to do
sample accuracy to get consistent sound - especially when envelopes,
filter points, and pitch changes occurred.

Yeah, I added that to Kobo Deluxe as well recently. It helps a lot,
especially when you have to use a lot of buffering. Constant latency
is a lot less audible than the “random” latencies introduced by
quantizing to the buffer size.

In games that (ab)use sounds as grains for game logic driven granular
synthesis, timestamping is pretty much required for useful results,
unless you’re developing for specific hardware.

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Wednesday 21 April 2004 18.53, Neil Bradley wrote:

emulation, and I get dropouts under Linux occasionally (from
machine to machine) and never do under any Windows boxen, so your
mileage may vary.
The problem with emulators is that they generally emulate everything
frame by frame, including audio.

Some emulators do, mine doesn’t. I treat the audio stream as a separate
"sound device", mapping linear time with emulated time and applying
changes as writes to the virtual sound chip occur. It just happily runs
itself in the background receiving events as if the chips were attached to
the hardware, and I never get dropouts or odd problems.

BTW, Audiality (the sound engine used in Kobo Deluxe) happily runs
with less than 10 ms latency even on standard 2.4 Linux kernels,
whereas I can’t seem to get it to work at all with less than ~70 ms
latency on Win32. I’m still not totally sure why that is, though;
could be an Audiality/Win32 specific problem.

Are you using SDL or talking to the DirectSound APIs directly? I’ve
noticed that changing SDL’s thread priority for audio under Windows yields
significantly better responsiveness.

I’m running with
buffer sizes around 1/30th of a second.
Around 33 ms… Should work on a good system with some luck, but don’t
count on it. I suggest you make that user configurable, just in case,
and to allow “power users” to tweak things if they like.

It has worked on a few hundred systems with no problems, so I’m good to
go.

Or Linux (haven’t yet tried OSX)… neither of which are real time
OSes.
Right, although Linux/lowlatency is very different from standard
Linux kernels in this regard. For multimedia applications, it can be
considered “extremely firm” real time for all practical matters, or
even hard RT on a properly tuned system.

Is it a userland or kernel mode callback?

BTW, how much stress (background processes, video, disk, network etc)
can your system tolerate before you start getting drop-outs?

I’ve loaded it up to 100% with no loss or dropouts/starvation (worst case
was a DVD encode, which pushes the CPU utilization through the roof and
makes the system otherwise unresponsive).

Agreed 100%. The callback is where you want to render the audio.
Make sure your buffering is set up so that it’s small enough to
stay in reasonable sync with your gameplay, but large enough so you
don’t incur too much callback overhead or fall outside the
non-real-time-ness of modern-day OSes. Some experimentation will be
required.
Actually, experimentation doesn’t cut it. You still have to make the
buffer size user configurable, unless you set it very high.

Most people don’t know what “audio buffer size” even means, let alone how
to set it. And I’m not sure what you mean by “very high”, but 1/30th of a
second has always worked for me.

In games that (ab)use sounds as grains for game logic driven granular
synthesis, timestamping is pretty much required for useful results,
unless you’re developing for specific hardware.

Case in point - there are some subtle nuances in the sound engine used in
the Midway Pacman series of games. You can update them on a per frame
basis, as that’s when the code updates them, but there are some subtle
phase shift changes that occur due to the code not writing to the device
completely synchronously that won’t properly be emulated without doing so.
For games that use samples (like Space Invaders) it doesn’t matter - just
fire the sound whenever and it’ll work OK. But there are other games that
use the YM2151 for samples, which require extreme sample-based emulation
to sound even remotely correct. Yuck. ;-(

Too bad we don’t have an “ISR level” equivalent in the “real OS” world
like we did in DOS land…

–>Neil
-------------------------------------------------------------------------------
Neil Bradley "Your mistletoe is no match for my T.O.W. missile!"
Synthcom Systems, Inc. - Santabot - Futurama
ICQ #29402898

No, without a recent (2.4+) kernel and/or lowlatency patches, it can
actually be worse than Windows, at least in terms of worst case
scheduling latency. (Average latency is pretty much irrelevant to RT
applications. It’s usually in the µs range regardless of OS anyway.)

Huh? Average scheduling latency in Linux 2.4 is at least 10ms. If
you set SCHED_RR, usleep() will give you smaller sleeps than that, but
it’s just busy looping–it isn’t actually giving up any CPU (so you
can’t actually do this constantly, or you’ll hang the system).

Are you talking about something else when you say “scheduling latency”?

BTW, Audiality (the sound engine used in Kobo Deluxe) happily runs
with less than 10 ms latency even on standard 2.4 Linux kernels,
whereas I can’t seem to get it to work at all with less than ~70 ms
latency on Win32. I’m still not totally sure why that is, though;
could be an Audiality/Win32 specific problem.

The “prefetch size” (that is, the amount DirectSound tells you to write
ahead in the buffer: the distance between the play and write cursors)
is typically very large. I think the only way to get around this right
now is to use ASIO, or to use kernel streaming, which is 2k and XP-
specific, documented as “probably won’t work in future releases”, and
prevents other apps from playing sounds.

I think this is a limitation of DirectSound, and not something that
drivers can fix.

On Wed, Apr 21, 2004 at 11:11:08PM +0200, David Olofson wrote:


Glenn Maynard

emulation, and I get dropouts under Linux occasionally (from
machine to machine) and never do under any Windows boxen, so
your mileage may vary.

The problem with emulators is that they generally emulate
everything frame by frame, including audio.

Some emulators do, mine doesn’t. I treat the audio stream as a
separate “sound device”, mapping linear time with emulated time and
applying changes as writes to the virtual sound chip occur. It just
happily runs itself in the background receiving events as if the
chips were attached to the hardware, and I never get dropouts or
odd problems.

That’s the way to do it in most cases, I think. Machines where the CPU
reads stuff back from the sound chip(s) (such as the C64, which can
(ab)use the SID for random number generation, among other things)
could be problematic, though…

BTW, Audiality (the sound engine used in Kobo Deluxe) happily
runs with less than 10 ms latency even on standard 2.4 Linux
kernels, whereas I can’t seem to get it to work at all with less
than ~70 ms latency on Win32. I’m still not totally sure why that
is, though; could be an Audiality/Win32 specific problem.

Are you using SDL or talking to the DirectSound APIs directly?

As I’m doing all development on Linux these days, with occasional
crosscompiling and testing on Win32, the only APIs supported at this
point are SDL, OSS and ALSA.

I’ve
noticed that changing SDL’s thread priority for audio under Windows
yields significantly better responsiveness.

Slightly odd, actually… Do you have other CPU hungry threads that
block properly and frequently? (The video stuff perhaps, if your
drivers implement retrace sync the right way.)

If the main loop just plain hogs the CPU constantly (as it does if you
run full speed with no retrace sync), any serious scheduler should
lower its dynamic priority, which automatically gets it out of the
way whenever the audio thread is ready to run.

At least, this works very reliably on Linux, so messing with thread
priorities is just pointless - unless you go all the way and switch
to real time scheduling. (RT scheduling on Linux totally bypasses the
timesharing rules. The CPU is yours until you explicitly let it go.)

I’m running with buffer sizes around 1/30th of a second.

Around 33 ms… Should work on a good system with some luck, but
don’t count on it. I suggest you make that user configurable,
just in case, and to allow “power users” to tweak things if they
like.

It has worked on a few hundred systems with no problems, so I’m
good to go.

BTW, what are your device parameters, exactly?

Or Linux (haven’t yet tried OSX)… neither of which are real
time OSes.

Right, although Linux/lowlatency is very different from
standard Linux kernels in this regard. For multimedia
applications, it can be considered “extremely firm” real time for
all practical matters, or even hard RT on a properly tuned
system.

Is it a userland or kernel mode callback?

No callbacks at all. It’s all done in user space, using the usual
blocking file I/O API. (With a wrapper library, in the case of ALSA.)
What you do is essentially take a plain application, lock the memory
and switch the audio thread to RT scheduling.

Unfortunately, you currently need to be root to do memory locking,
since it’s not covered by the capabilities system for some reason.
That can be fixed though, and either way, it doesn’t screw up your
application code, as opposed to running things in kernel space.

BTW, how much stress (background processes, video, disk, network
etc) can your system tolerate before you start getting drop-outs?

I’ve loaded it up to 100% with no loss or dropouts/starvation
(worst case was a DVD encode which pushes the CPU utilization to
the roof and makes the system otherwise unresponsive).

Sounds pretty solid. :slight_smile:

Agreed 100%. The callback is where you want to render the
audio. Make sure your buffering is set up so that it’s small
enough to stay in reasonable sync with your gameplay, but large
enough that you don’t incur too much callback overhead or fall
outside the non-realtime behavior of modern-day OSes. Some
experimentation will be required.

Actually, experimentation doesn’t cut it. You still have to make
the buffer size user configurable, unless you set it very high.

Most people don’t know what “audio buffer size” even means, let
alone how to set it. And I’m not sure what you mean by “very high”,
but 1/30th of a second has always worked for me.

“Very high” would be well over 50 ms, but I don’t think you should
need that on any system if everything’s working properly. It depends
a bit on the game and the sounds, though.

[…]

Too bad we don’t have an “ISR level” equivalent in the "real OS"
world like we did in DOS land…

Well, that would be Kernel Streams and similar “hacks”, I guess…
Still, that alone doesn’t solve the problem, since there’s still tons
of stuff (ISRs of other drivers, among other things) that can get in
the way.

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Thursday 22 April 2004 00.15, Neil Bradley wrote:

David Olofson wrote:

Nope - though it would definitely be possible to have some backends
work that way, provided you use a sample format that’s supported by
the hardware. Many games achieve very low latencies without
dangerously small buffers by directly mixing into the DMA buffer. New
sound FX can be started right in front of the DMA pointer, and then
mixed ahead to build up to a more reliable amount of buffering.

ok.

Yeah, like random crackling, or the occasional irregular drop-out
every now and then…

exactly.

also, my 2.8GHz pc is doing very little at the moment, so the
scheduling queue should be pretty clean.

what seems to be the problem is that sdl doesn’t seem to tolerate
hanging around in the callback function too much. to my knowledge
there is no reason why you couldn’t do that, unless the callback is
called very shortly before the actual playing buffer (somewhere in
the guts of sdl) runs out of data.

Actually, it’s not only about that. SDL may have to convert your audio
data, and if it doesn’t, the buffer you’re writing into doesn’t have
to be the DMA buffer. (Some drivers and some sound APIs don’t support
direct access to the DMA buffer at all, and there may still be SDL
backends that don’t use that feature even if it’s available.) So,
it’s quite likely that nothing at all happens to the DMA buffer until
you return from the callback.

hmm, that would make sense (dma stall) and would indeed lead to the
skipping of a buffer if there is too much time spent in the callback
function.

how i imagine it working (bar any conversions and stuff) is that sdl
basically keeps track of two buffers (not sure if they are directly
in dma or a wrap around directsound). while one is being played
(buff1), the other one needs to be filled with data (buff2).

so at some point during the playing of buff1 there is a call to the
callback function.
but i suspect that this call doesn’t take place immediately but near
the end of buff1.
it could be that there is a communication lag down the line that
prevents sdl from detecting when the soundcard is actually playing
buff1, and thus from triggering the callback in time; i’m not sure.

i AM pretty sure that my calculations are not too heavy to be done in
time, because in the main thread i’m doing a lot of other stuff which
i time, and that takes a minimal hit when i increase the calcs in the
callback function. the main thread slows by about 5% or so; nothing
to worry about.

what i’m basically doing is additive synthesis in the callback function.
adding 100+ sine waves shouldn’t be so hard, especially since sin is
hardwired into the math core.

anyway, it seems that in the callback function i have just about
enough time to fill the buffer that is hanging behind the pointer, so
maybe the sdl design actually wants me to only fill the buffer in the
callback function. not sure either.

if you calculate the amount of data that needs to be generated
against the cpu speed, there should be plenty of time to do all kinds
of fancy stuff.
i need to write 44100 samples per second into the buffer; that’s ~44kHz.
44MHz is about a thousand times more, and 2.8GHz is another ~64 times
more cycles.
so, theoretically, i should have about 63000 cycles for every sample
i want to write (bar any overhead).
that’s a lot; way more than i need.
also, when i slow my (screen) redraw, the audio skipping stays put.
this also indicates this is not a speed issue.

so at the moment i have two suspects that could cause this.
one is lag of the playback trigger coming from the layer beyond sdl
(be it directsound or the normal sound card drivers).
the other is that sdl calls the callback just in time for us to fill
the buffer, and this last one could be coupled to a locking mechanism
(as you described for the dma) beyond sdl.

so what i would like to do is to start calculating my data for the
next callback right after the callback finishes, which sdl doesn’t
provide for, in the sense that i cannot call a function and return
from the callback at the same time.

that’s why i was thinking about having a separate thread poll some
memory location that i could set at the end of the callback.
then i could start calculating stuff as soon as the scheduler hits
that thread, and my problem should be over.

but then again, how do i make a thread? (rtfm, luke)

the point is that i really don’t want to dig this deep into sdl,
since the whole point of me using sdl was to have a wrapper around
directx and actually get some work done.
i got really, really scared of directx after seeing that putting one
pixel on the screen would take as much code as my whole project is at
the moment (minus the sdl libs), and that includes both sound and
screen stuff.

boohoohoo. why are the designers not nice to me? mommy!!!

anyway, i’ll try some stuff tomorrow; see if i can really localize
this ‘feature’. :wink:

i know that in directx sound you can define multiple points in the
buffer where the callback will be called, so you can make sure that
the buffer is filled in time.
but with sdl i’ve got no clue how much data there is still left in
the playing buffer (and thus how long you can hang around) when the
callback is called.

AFAIK, SDL should behave pretty much as if you were doing this:

while(playing)
{
    a->callback(a->userdata, a->buf, a->bufsize);
    a->convert_some(a);
    write(a->outfd, a->outbuf, a->outsize);
}

if this was the case then i would probably not have a problem :).

in a dedicated audio thread on a Un*x system with OSS or similar. What
happens is the sound card driver blocks in write() whenever the whole
buffer doesn’t fit between the last written byte and the current play
position in the DMA buffer. write() gets on with the job and returns
as soon as there’s enough room for the data.

in reality on win32 there are probably going to be several layers of
buffers between the sdl callback and the soundcard: sdl buffer,
directx buffer, another directx buffer, dma buffer.

since they all need to satisfy each other, the spell could break in
any one of these; probably not the dma one, since it’s wrapped by the
drivers and they didn’t give me any trouble with other apps. it could
be a bad setup of the directsound buffers, or the structure of the
sdl buffers and how the callback is handled.

“Polling” is usually done in the sound card ISR.

i wasn’t referring to hardware polling.
what i meant was software polling, i.e. one thread checking the
contents of a shared memory location to see if it’s set to a value,
and if so doing some stuff, resetting the memory and starting to poll
again.
this should give me enough headroom to generate the data i need in
time for the next callback.

In the OSS API, the
"fragment" size determines how frequently IRQs occur. SDL tries to
get two “fragments” of the size indicated by ‘samples’ in the
AudioSpec, so there’s usually (at least) one IRQ every 'samples'
sample frames. What we get is this:

  • The audio callback is called when the previous buffer
    has just been queued. Since the DMA buffer accepts only
    two buffers’ worth of data, this happens right after
    the second last buffer has been played, and one buffer
    remains queued for output. That is, when the callback
    is entered, there are two buffers between you and the
    DAC.

  • The callback’s return deadline is (theoretically) at
    most the duration of two buffers ahead. At that point,
    the sound card runs out of data, so we have to be in
    the driver’s write() callback before then.

yes, yes, i know this :slight_smile:

anyone got this info?

Read The Source. :slight_smile:

hmm, let me sleep on this one for a night before i start digging.

is it changeable? without recompiling the libs? :slight_smile:

Setting the ‘samples’ field of the SDL_AudioSpec you pass to
SDL_OpenAudio() to a sensible value should do the trick.

this doesn’t work (no real effect when changing buffer size, except
that it gets longer or shorter).
the skipping persists and seems only influenced by the amount of work
done in the callback.
(which makes sense to me) (somehow)

(And if it doesn’t, Use The Source, Luke. :wink:

i could use a SDLightsaber at the moment.

the resulting problem for me is that, since i cannot seem to put too
much in the callback, i need another buffer which should be filled by
something that is polling whether the callback has been called.
…and i have never written multitasked stuff.

If you really can’t do all you need in the callback, you’re either
doing it the wrong way, or you simply have more processing to do than
your CPU can handle. (The latter is rather unlikely, unless you have
high quality reverb effects and stuff. :slight_smile:

yeah, cpu is unlikely. so that’s why i need more docs about how the
writers of sdl intended it to be used. the examples are rather
minimalistic; i was happy to get any sound through it :slight_smile:

As to the former, what kind of processing are you doing? I suspect
that you do stuff in ways that makes the CPU load vary a lot between
calls.

basically adding a lot of sine waves together; nothing really cpu
intensive.
it works until about 150 adds per sample, then it starts skipping.
buffer size doesn’t change this, since the calcs are done for every
sample that needs to be shoved into the buffer.

Uncompressing a frame of compressed audio every now and then would be
a typical example. If you can’t or won’t redesign your code to do the
job incrementally, you need to add some buffering (increase the
latency) and do the occasional heavy job in another thread.

threading and adding a layer of buffering was exactly my plan :slight_smile:

Ok. (Of course, VST plugins play by the exact same rules in this
regard. Keep those process*() execution times as constant as
possible…)

afaik vst hosts give you a good deal of cpu time, otherwise most
plugins wouldn’t work at all.

since there are several software sequencers that handle multiple
streams of audio to multiple buffers/channels, and that do a lot more
calcs than i’m trying to do, i figure that it shouldn’t be too hard
to achieve both low latency and use most of my cpu time for stuff.
i don’t see the problem, unless, again, the callback stuff is set up
to work just in time.

Well, there’s no other way to do it… Both VST and SDL audio (and DX,
CoreAudio, JACK, ASIO, EASI, PortAudio and various other APIs) are
based on cooperative multitasking, using callbacks to drive the
"units".

yes, i realise this. that’s why i figure that i’m doing stuff in the
wrong place.
it seems that the callback doesn’t expect you to do a lot of stuff
right there.

A callback keeps the CPU until it returns, stalling the
whole host/driver/whatever. The only way to get things to run
smoothly at high CPU loads is to ensure that all units (plugins,
clients etc…) consume the same amount of CPU time for every buffer,
as far as possible.

this is totally the case here; actually, i’m doing identical calcs
for every sample.

(Actually, this is true for thread + IPC based plugin systems as well.
They just use a different plugin API, at the expense of making low
latency operation practically impossible without an RTOS, and very
hard to do reliably even with an RTOS. That’s why all major standards
use the callback model in one form or another.)

it seems the best way to work with buffers, i agree.

[…]

so i have basically two questions
1 where should i generate the data that would go into the buffer

In the callback. (Otherwise, you’ll need extra buffering and
thread safe communication between the audio engine and the
callback. That can make sense in some cases, but definitely not
when you want interactive low latency sound effects.)

multithreading seems like the only solution at the moment. damn.

Well, if you can add substantial buffering between the offending code
and the audio callback, there’s no major problem. Depending on what
you’re doing, it may well be the easiest way to get the job done
reliably.

now i need to find out what they want from me to make threads.
probably gonna need other stuff as well, like threads sharing memory
with other threads. or can i just use globals?? :slight_smile:

Doing it in a VST plugin could be more problematic, though… Not sure
what your average host thinks about you creating your own worker
threads, but I think it’s doable in most cases.

it should work, but i’m not planning to do it threaded in the vst
version; the host should provide the necessary cpu time when calling.

[…]

I would recommend using some sort of lock-free
communication between the main thread and the audio callback
context instead, to avoid increasing the callback "scheduling"
jitter.

this is the problem: i’m not sure how to set up the communication
between the callback (which needs to tell me that it finished filling
its buffer) and another thread or something that will fill its own
buffer to be used at the next callback.

Actually, that would be the wrong way to do it. The whole point is to
make it asynchronous, allowing multiple buffers to be on the way from
the worker thread to the callback. That’s what allows the worker
thread to generate a few buffers at a time, every now and then, so
you can avoid breaking up and “smoothing” your DSP code.

since i’m sure there is enough cpu time left, i only need one buffer
to keep the callback function satisfied.
jitter is not my problem; all i need to do is calculate the stuff in
a place that is less critical, somehow, and copy that to the buffer
during the callback function call.

Anyway, try a lock-free ring of buffers (FIFO) between the thread and
the callback.

I have a general purpose, portable, single reader-single writer,
lock-free FIFO over at http://olofson.net/mixed.html - sfifo. It’s
using a simple read()/write() style API (that is, copying), but you
could use it for passing pointers to buffers, buffer indices or
something, or just implement your own buffer ring, using the same
"safe order" approach to updating read and write indices. (Very
simple. Just make sure the compiler doesn’t reorder stuff too much
when optimizing…)
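As a rough illustration of that “safe order” idea (not sfifo itself), a minimal single-reader/single-writer ring can look like the sketch below. A real implementation would add memory barriers or volatile accesses to keep the compiler and CPU from reordering the index updates:

```c
#include <stdint.h>

/* Minimal single-producer/single-consumer ring in the "safe order"
 * style described above: the writer only ever advances 'write', the
 * reader only advances 'read', so one producer and one consumer need
 * no locks. (Production code needs memory barriers here - this sketch
 * ignores reordering for clarity.) */
#define RING_SIZE 8   /* must be a power of two */

typedef struct {
    int      slots[RING_SIZE];
    unsigned read;
    unsigned write;
} Ring;

/* Returns 1 on success, 0 if the ring is full. */
int ring_push(Ring *r, int value)
{
    if (r->write - r->read >= RING_SIZE)
        return 0;
    r->slots[r->write % RING_SIZE] = value;
    r->write++;   /* publish only AFTER the slot is written */
    return 1;
}

/* Returns 1 on success, 0 if the ring is empty. */
int ring_pop(Ring *r, int *value)
{
    if (r->read == r->write)
        return 0;
    *value = r->slots[r->read % RING_SIZE];
    r->read++;    /* release only AFTER the slot is read */
    return 1;
}
```

The slots could hold buffer indices or pointers, so a worker thread can push a few rendered buffers at a time while the audio callback pops them.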

ok, i’ll have a look. thnx :slight_smile:

anyway, thnx for the help; it triggered some ideas which i will need
to work out a bit.
in the mean time i would like to call/CRY out for some more
documentation !! :))

Well, there isn’t much to document, really. (Except perhaps for
clarifying the exact relation between the AudioSpec ‘samples’ field
and the resulting latency - if you’re actually supposed to expect any
specific relation at all.)

what i’d like to know is how the callback was intended precisely, and
how it relates to the buffers ‘upstream’.

This is standard real time audio stuff, and although there isn’t much
documentation on this specifically (you’d have to read up on DSP,
general RT programming and stuff instead), it’s rather intuitive once
you grasp the fundamental concepts.

i do get the fundamentals; done something similar in directsound a
long while ago.

The hard part is probably to let go of the "best average performance"
way of thinking that applies to programming in general. (That’s what
most well known algorithms are based on.) In RT programming, it’s all
about worst case performance; it has to be good enough to make the
deadlines at all times.

yeah, and i’m wondering why exactly it doesn’t. :slight_smile:

thnx for going over this.
i definitely got a clearer picture of what i probably need to do :slight_smile:

gotta go to sleep now.
greets,
aka.

No, without a recent (2.4+) kernel and/or lowlatency patches, it
can actually be worse than Windows, at least in terms of worst
case scheduling latency. (Average latency is pretty much
irrelevant to RT applications. It’s usually in the µs range
regardless of OS anyway.)

Huh? Average scheduling latency in Linux 2.4 is at least 10ms. If
you set SCHED_RR, usleep() will give you smaller sleeps than that,
but it’s just busy looping–it isn’t actually giving up any CPU (so
you can’t actually do this constantly, or you’ll hang the system).

Are you talking about something else when you say “scheduling
latency”?

By “scheduling latency” I mean exactly that; the latency from the time
a thread is supposed to be scheduled in, until it actually gets the
CPU.

You’re confusing it with the “jiffy” rate (the HZ #define), which
defines the granularity of timesharing, timeouts, timers and whatnot
in pre 2.6 Linux versions. (2.5+ has HZ set to 1000 or 1024 by
default on most platforms including x86, and there are high
resolution timers that do not depend on HZ.)

What I’m talking about is blocking in drivers for hardware that can
generate IRQs on its own. ISRs can trigger instant reschedules, so
things become very different when you have a “real” time base - such
as a sound card or a network interface - that drives your thread.

BTW, Audiality (the sound engine used in Kobo Deluxe) happily
runs with less than 10 ms latency even on standard 2.4 Linux
kernels, whereas I can’t seem to get it to work at all with less
than ~70 ms latency on Win32. I’m still not totally sure why that
is, though; could be an Audiality/Win32 specific problem.

The “prefetch size” (that is, the amount DirectSound tells you to
write ahead in the buffer: the distance between the play and write
cursors) is typically very large. I think the only way to get
around this right now is to use ASIO, or to use kernel streaming,
which is 2k and XP- specific, documented as “probably won’t work in
future releases”, and prevents other apps from playing sounds.

Using ASIO in a game? Well, why not - I guess there are quite a few
gamers with ASIO supported sound cards these days. :wink:

I think this is a limitation of DirectSound, and not something that
drivers can fix.

Though it seems like applications can bypass the problem to some
extent…

How come the “direct DMA mixing” approach still works, despite this
limitation? (Doesn’t Quake III do this on both Win32 and Linux?) One
would think that if you can write to any place in the DMA buffer, it
would just be a matter of scheduling, so you could just use a 1 kHz
mmtimer or something.

//David Olofson - Programmer, Composer, Open Source Advocate

On Thursday 22 April 2004 00.52, Glenn Maynard wrote:

On Wed, Apr 21, 2004 at 11:11:08PM +0200, David Olofson wrote:

Glenn Maynard wrote:

The “prefetch size” (that is, the amount DirectSound tells you to write
ahead in the buffer: the distance between the play and write cursors)
is typically very large. I think the only way to get around this right
now is to use ASIO, or to use kernel streaming, which is 2k and XP-
specific, documented as “probably won’t work in future releases”, and
prevents other apps from playing sounds.

I think this is a limitation of DirectSound, and not something that
drivers can fix.

afaik you can set up your own prefetch size in directsound, or at
least that is what i remember. :slight_smile:
i think you can set up multiple points in the buffer where the
callback should be called;
directx then calls back to fill the bit that is between the previous
2 points.
so if you have a buffer that has 5 callback points in it, and it
currently plays the part between point 4 and 5, it will make a call
to fill the bit between 3 and 4. something like that, anyway.

greets,
aka.

I think this is a limitation of DirectSound, and not something that
drivers can fix.

Though it seems like applications can bypass the problem to some
extent…

You can ignore it, but that causes underruns, at least on my SBLive. I
think there’s an actual reason for the prefetch buffer being so large,
and it can’t be worked around by simply ignoring it.

How come the “direct DMA mixing” approach still works, despite this
limitation? (Doesn’t Quake III do this on both Win32 and Linux?) One
would think that if you can write to any place in the DMA buffer, it
would just be a matter of scheduling, so you could just use a 1 kHz
mmtimer or something.

Q3 gives me underruns if I set s_mixahead to less than .065, which is
around 3000 frames. It seems to be under the same buffering constraints
as my code, at least on my system.

I don’t think you typically get direct DMA access, at least with WDM
drivers[1]. You get a buffer, but it may be a DMA buffer or it may
just be a mixing buffer for kmixer, I believe, which mixes and resamples
buffers for hardware that can’t. (I suppose it’s equivalent to ALSA’s
"dmix".) If sound is being mixed through another thread, it’d explain
why the prefetch is so high.

I don’t know why this is a problem on SBLives, since they do support hardware
mixing and resampling: I’d expect them to support having a DMA buffer per
stream. This isn’t a hardware issue, since I can use much smaller writeaheads
in ALSA than DirectSound.

I also don’t know why this would be required; forcing applications to go
through an extra mixing thread is completely broken for apps that actually
need direct access. It should shut down kmixer if an application requests a
primary buffer.

[1] http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/directx9_c/directx/htm/writingtotheprimarybuffer.asp

On Thu, Apr 22, 2004 at 01:56:19AM +0200, David Olofson wrote:


Glenn Maynard

I think this is a limitation of DirectSound, and not something
that drivers can fix.

Though it seems like applications can bypass the problem to
some extent…

You can ignore it, but that causes underruns, at least on my
SBLive. I think there’s an actual reason for the prefetch buffer
being so large, and it can’t be worked around by simply ignoring
it.
[…]
Q3 gives me underruns if I set s_mixahead to less than .065, which
is around 3000 frames. It seems to be under the same buffering
constraints as my code, at least on my system.

That definitely sounds like something (probably kmixer) is getting
in between the application and the real DMA buffer… If you can
actually get at the DMA buffer, the PCI burst size is what defines
your minimum latency, regardless of OS, API and drivers.

I don’t think you typically get direct DMA access, at least with
WDM drivers[1].

So I’ve heard… I think you need to explicitly tell DSound that you
want exclusive access to the device, or it will assume you want to
play nicely with other apps - which means kmixer, unless possibly if
you have h/w mixing and proper drivers.

You get a buffer, but it may be a DMA buffer or it
may just be a mixing buffer for kmixer, I believe, which mixes and
resamples buffers for hardware that can’t. (I suppose it’s
equivalent to ALSA’s “dmix”.)

Sort of, but AFAIK, dmix (direct mixing) runs in the application
context, and has no central s/w mixer anywhere. I think mixing (with
saturation) is done into a shared 32 bit buffer which is then
converted and copied into the DMA buffer, all inside the "write"
call. That way, it can be done without additional latency.
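A sketch of that mixing scheme, with hypothetical function names (not ALSA’s actual API): each stream adds its 16-bit samples into a shared 32-bit accumulation buffer, which is then saturated and converted to 16-bit on its way to the DMA buffer:

```c
#include <stdint.h>

/* dmix-style mixing as described above. Each client adds into a
 * shared 32-bit accumulation buffer; the result is clamped
 * (saturated) and converted to 16-bit for the DMA buffer.
 * Function names are illustrative only. */

void mix_into(int32_t *accum, const int16_t *src, int frames)
{
    for (int i = 0; i < frames; i++)
        accum[i] += src[i];
}

void convert_to_dma(int16_t *dma, const int32_t *accum, int frames)
{
    for (int i = 0; i < frames; i++) {
        int32_t v = accum[i];
        if (v > 32767)  v = 32767;    /* saturate instead of wrapping */
        if (v < -32768) v = -32768;
        dma[i] = (int16_t)v;
    }
}
```

The point of the wide intermediate buffer is that two loud streams sum past 16-bit range without wrapping; saturation distorts gracefully instead.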

If sound is being mixed through
another thread, it’d explain why the prefetch is so high.

Yeah…

I don’t know why this is a problem on SBLives, since they do
support hardware mixing and resampling: I’d expect them to support
having a DMA buffer per stream. This isn’t a hardware issue, since
I can use much smaller writeaheads in ALSA than DirectSound.

Right; at least my Audigy supports multiple open with both ALSA and
OSS (used Creative’s driver for a while) without latency issues. It’s
perfectly possible to use XMMS, some SCHED_FIFO synths and various
other apps together, without them interfering. (Well, they have to
fight for the CPU, but that’s not much of an issue as long as the low
latency apps don’t burn too many cycles in total.)

I also don’t know why this would be required; forcing applications
to go through an extra mixing thread is completely broken for apps
that actually need direct access. It should shut down kmixer if an
application requests a primary buffer.

Right. In fact, using kmixer at all with cards that support h/w
mixing seems incredibly stupid and counterproductive to me. What’s
the point in getting a h/w mixing card at all then…?

The only (and rather silly) excuse I can think of for Live! and
similar cards to use kmixer would be to avoid lots of open "wave"
devices stealing from the synth voice reserve. (Live! and Audigy
cards have no h/w PCM channels, only DMA streaming sampleplayer
voices - so they use two of those for each stereo PCM output channel
opened. Or at least, that’s what they do with the Linux drivers…)

//David Olofson - Programmer, Composer, Open Source Advocate

On Thursday 22 April 2004 03.05, Glenn Maynard wrote:

On Thu, Apr 22, 2004 at 01:56:19AM +0200, David Olofson wrote:

I’ve
noticed that changing SDL’s thread priority for audio under Windows
yields significantly better responsiveness.
Slightly odd, actually… Do you have other CPU hungry threads that
block properly and frequently? (The video stuff perhaps, if your
drivers implement retrace sync the right way.)

Nope. In most emulations, the overall CPU utilization is sitting around 3%
or so. I’m not too knowledgeable about the internals of Windows’ scheduling,
so I don’t know what the net effect is of changing thread priorities.

If the main loop just plain hogs the CPU constantly (as it does if you
run full speed with no retrace sync),

These old games have fixed frame rates, so I have a callback timer that
locks me to the specific framerate the game used. Much better than things
like MAME that will sit at 100% even for extremely simple emulations and
drain your laptop’s battery. :wink:

At least, this works very reliably on Linux, so messing with thread
priorities is just pointless - unless you go all the way and switch
to real time scheduling. (RT scheduling on Linux totally bypasses the
timesharing rules. The CPU is yours until you explicitly let it go.)

In other words, cooperative multitasking? Circa Windows 3.1 / MacOS < 10?

It has worked on a few hundred systems with no problems, so I’m
good to go.
BTW, what are your device parameters, exactly?

Sample rate is 44100 and the audio buffer size is 1024 bytes, or about
43 FPS. Is that what you were asking for?

Most people don’t know what “audio buffer size” even means, let
alone how to set it. And I’m not sure what you mean by “very high”,
but 1/30th of a second has always worked for me.
“Very high” would be well over 50 ms, but I don’t think you should
need that on any system if everything’s working properly. It depends
a bit on the game and the sounds, though.

Hm… at my buffer settings it’s sitting at around 23ms and seems to work
reliably/fine.

Well, that would be Kernel Streams and similar “hacks”, I guess…
Still, that alone doesn’t solve the problem, since there’s still tons
of stuff (ISRs of other drivers, among other things) that can get in
the way.

It’s unlikely that another ISR would take tens of milliseconds to
execute, but it’s not out of the question for various threads to do so.

-->Neil
-------------------------------------------------------------------------------
Neil Bradley "Your mistletoe is no match for my T.O.W. missile!"
Synthcom Systems, Inc. - Santabot - Futurama
ICQ #29402898

[…]

If the main loop just plain hogs the CPU constantly (as it does
if you run full speed with no retrace sync),

These old games have fixed frame rates, so I have a callback timer
that locks me to the specific framerate the game used. Much better
than things like MAME that will sit at 100% even for extremely
simple emulations and drain your laptop’s battery. :wink:

Yeah… I added a frame rate throttling feature to Kobo Deluxe
specifically to avoid that. :slight_smile:

At least, this works very reliably on Linux, so messing with
thread priorities is just pointless - unless you go all the way
and switch to real time scheduling. (RT scheduling on Linux
totally bypasses the timesharing rules. The CPU is yours until
you explicitly let it go.)

In other words, cooperative multitasking? Circa Windows 3.1 / MacOS <
10?

Not quite. Unlike those environments, Linux RT threads still have
blocking I/O calls and the usual threading stuff - just like any
other threads. The difference is that SCHED_FIFO threads have fixed
priority, and can only be preempted by higher priority SCHED_FIFO
threads.

(BTW, SCHED_FIFO has been around for ages. The lowlatency patches just
transform it from a system stability hazard into a really useful
feature for serious multimedia.)

[…]

Most people don’t know what “audio buffer size” even means, let
alone how to set it. And I’m not sure what you mean by “very
high”, but 1/30th of a second has always worked for me.

“Very high” would be well over 50 ms, but I don’t think you
should need that on any system if everything’s working properly.
It depends a bit on the game and the sounds, though.

Hm… at my buffer settings it’s sitting at around 23ms and seems
to work reliably/fine.

Nice. I guess I’m just using too much CPU time for the SDL buffer
configuration. Double buffering is suboptimal when it comes to
handling CPU load in the audio thread with significant scheduling
jitter. Using three or more buffers allows you to use more CPU time
and/or handle more jitter, even if you change the buffer size to get
the same total latency.

Unfortunately, some studio sound card drivers and some APIs are
hardwired for exactly two buffers. :-/

Well, that would be Kernel Streams and similar “hacks”, I
guess… Still, that alone doesn’t solve the problem, since
there’s still tons of stuff (ISRs of other drivers, among other
things) that can get in the way.

It’s unlikely that another ISR would take tens of milliseconds to
execute, but it’s not out of the question for various
threads to do so.

Well, I dunno what’s happening in there, but it seems to be impossible
to guarantee solid RT scheduling if you need sub-ms worst case
latency, even in kernel space… On NT, it wasn’t even possible with third
party RT kernels. (Some people tried really hard to run hard RT
industrial control stuff on NT, but it’s just not possible.) In
Win2k, you can at least get firm RT (ie “rock solid most of the time,
but don’t let it fly a plane”) with decent h/w and some luck. Dunno
about XP.

I’d say “too little, too late”. I gave up on Windows years ago because
of these issues (long before people started abusing DSound and
messing with Win98 internals in softsynths), and I’ve still not
heard anything from Windows audio hackers that indicates MS have
really fixed it. External DSP solutions are still the way to go for
rock solid low latency audio processing.

Either way, I’m not going back even if Windows eventually gets real
RT support - though that’s mostly for other reasons, totally
unrelated to low latency audio. :wink:

//David Olofson - Programmer, Composer, Open Source Advocate

On Thursday 22 April 2004 19.05, Neil Bradley wrote:

redman:

Just verify that your buffer size is OK after the call to SDL_OpenAudio();
if your dropouts are regular, it’s typically a wrong buffer size.
In other words, after:

SDL_OpenAudio(&req,&got)

verify that you calculate your sample buffer size from got.size, and not
from got.samples, req.size, req.samples, etc.
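In SDL 1.2 terms, that check looks roughly like this (a sketch; field names per SDL_AudioSpec, with a hypothetical fill_audio callback standing in for the real one):

```c
#include <SDL/SDL.h>
#include <stdlib.h>

/* Hypothetical callback defined elsewhere. */
extern void fill_audio(void *userdata, Uint8 *stream, int len);

SDL_AudioSpec req, got;

int open_audio(void)
{
    req.freq     = 44100;
    req.format   = AUDIO_S16SYS;
    req.channels = 2;
    req.samples  = 1024;        /* sample frames, not bytes */
    req.callback = fill_audio;
    req.userdata = NULL;

    if (SDL_OpenAudio(&req, &got) < 0)
        return -1;

    /* Size any buffers of your own from what you actually got:
     * got.size is the fragment size in BYTES, and may differ
     * from anything you requested. */
    Uint8 *mixbuf = malloc(got.size);
    if (!mixbuf)
        return -1;
    free(mixbuf);
    return 0;
}
```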

This could be the cause.

On Thursday 22 April 2004 00:02, redman wrote:

than things like MAME that will sit at 100% even for extremely
simple emulations and drain your laptop’s battery. :wink:
Yeah… I added a frame rate throttling feature to Kobo Deluxe
specifically to avoid that. :slight_smile:

It’s amazing the number of people I come across that still don’t
understand the “don’t hog the CPU” concept, but have no problem
complaining about poor system performance. :wink:

Hm… at my buffer settings it’s sitting at around 23ms and seems
to work reliably/fine.
Nice. I guess I’m just using too much CPU time for the SDL buffer
configuration. Double buffering is suboptimal when it comes to
handling CPU load in the audio thread with significant scheduling
jitter. Using three or more buffers allows you to use more CPU time
and/or handle more jitter, even if you change the buffer size to get
the same total latency.

At least with emulation, audio can lag 3-5 frames behind the actual
gameplay and still be tolerable (and unnoticeable to most people), so my
scenario is probably a bit unique.

Well, I dunno what’s happening in there, but it seems to be impossible
to guarantee solid RT scheduling if you need sub-ms worst case
latency, even in kernel space…

Right, though I wasn’t suggesting that level of insanity for a game. :wink:

industrial control stuff on NT, but it’s just not possible.) In
Win2k, you can at least get firm RT (ie “rock solid most of the time,
but don’t let it fly a plane”) with decent h/w and some luck. Dunno
about XP.

XP is significantly better. I’d get pops/clicks in ASIO based stuff under
NT, but not with XP. The scheduler is quite a bit faster at coming around.

Either way, I’m not going back even if Windows eventually gets real
RT support - though that’s mostly for other reasons, totally
unrelated to low latency audio. :wink:

And I wouldn’t touch Linux with a 10 foot pole for probably the same
reasons you wouldn’t touch Windows. :wink:

-->Neil

FYI, prefetch is a DirectSound internal driver setting; it can’t be
changed by applications, and it isn’t the buffer size.

On Thu, Apr 22, 2004 at 08:30:12PM +0000, William Petiot wrote:

Just verify that your buffer size is OK after the call to SDL_OpenAudio();
if your dropouts are regular, it’s typically a wrong buffer size.
In other words, after:

SDL_OpenAudio(&req,&got)

verify that you calculate your sample buffer size from got.size, and not
from got.samples, req.size, req.samples, etc.


Glenn Maynard

William Petiot wrote:

On Thursday 22 April 2004 00:02, redman wrote:

redman:

Just verify that your buffer size is OK after the call to SDL_OpenAudio();
if your dropouts are regular, it’s typically a wrong buffer size.
In other words, after:

SDL_OpenAudio(&req,&got)

verify that you calculate your sample buffer size from got.size, and not
from got.samples, req.size, req.samples, etc.

This could be the cause.

nope - the dropouts only occur when i do a lot of calculations in
the callback function;
otherwise everything is fine.

grts,
aka.
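One way to confirm that kind of callback overload (a debugging sketch of my own, not something from the thread; it assumes the SDL 1.2 API and a spec filled in by SDL_OpenAudio()):

```c
#include <SDL/SDL.h>
#include <stdio.h>

static SDL_AudioSpec spec;   /* the 'got' spec from SDL_OpenAudio() */

/* Time each callback against its deadline: one fragment's worth of
 * audio (samples / freq). If this ever overruns, dropouts follow,
 * even if average CPU use looks low. */
static void fill_audio(void *userdata, Uint8 *stream, int len)
{
    Uint32 t0 = SDL_GetTicks();

    /* ... the expensive per-sample calculations writing into
     * stream go here ... */

    Uint32 took   = SDL_GetTicks() - t0;
    Uint32 budget = (Uint32)(1000.0 * spec.samples / spec.freq);
    if (took > budget)
        fprintf(stderr, "callback overran: %u ms > %u ms\n",
                (unsigned)took, (unsigned)budget);
    (void)userdata; (void)stream; (void)len;
}
```

Note that, as discussed earlier in the thread, the callback can also start late due to scheduling jitter, so staying just under the budget on average is not enough headroom.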