Audio API

I’ve written a NES emulator that generates about 800 bytes of audio data every
1/60th of a second, and needs to output that data (with blocking). SDL
doesn’t have any (that I’ve seen) sort of FIFO or atomic variable
increment/decrement functions, so it is a bit annoying.

Right now I’m using the SDL_LockAudio() call, and use an internal
buffer (written to in the emulation thread, and read in the callback
function), but it has more latency than I would like, and using SDL_Delay()
to help with CPU usage doesn’t seem to work on all systems very well (probably
due to timer granularity… dunno).

By default, under platforms that have OSS (/dev/dsp) support, I just open, set
up, and write() to the device, which works well. I would like to use the SDL
audio API by default, but it (or maybe the entire SDL API) seems to be
inherently lacking for my needs. Any suggestions?

I’ve written a NES emulator that generates about 800 bytes of audio
data every 1/60th of a second, and needs to output that data (with
blocking). SDL doesn’t have any (that I’ve seen) sort of FIFO or
atomic variable increment/decrement functions, so it is a bit
annoying.

Well, you don’t really need OS support for a
single-reader/single-writer interface… A circular buffer with
atomic input and output pointers or indices will work just fine. You
could try my “sfifo” that I’ve been using in all sorts of
environments for a good while:

http://olofson.net/mixed.html
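For reference, here is a minimal sketch of that idea. This is not the
actual sfifo code, just an illustration with made-up names, and it
assumes (as discussed later in this thread) that plain int loads and
stores are atomic:

/* Single-reader/single-writer ring buffer sketch (illustrative,
 * not sfifo). Indices are kept in [0, RB_SIZE); RB_SIZE must be
 * a power of two. */
#define RB_SIZE 4096

typedef struct {
    unsigned char buf[RB_SIZE];
    volatile int read_pos;      /* only the reader writes this */
    volatile int write_pos;     /* only the writer writes this */
} ringbuf_t;

static int rb_used(ringbuf_t *rb)
{
    return (rb->write_pos - rb->read_pos) & (RB_SIZE - 1);
}

static int rb_space(ringbuf_t *rb)
{
    return RB_SIZE - 1 - rb_used(rb);
}

/* Writer side (emulation thread): copy the data first, THEN bump
 * the write index, so the reader never sees half-written data. */
static int rb_write(ringbuf_t *rb, const unsigned char *data, int len)
{
    int i, wp = rb->write_pos;
    if (len > rb_space(rb))
        return 0;               /* full; caller may retry later */
    for (i = 0; i < len; ++i)
        rb->buf[(wp + i) & (RB_SIZE - 1)] = data[i];
    rb->write_pos = (wp + len) & (RB_SIZE - 1);
    return len;
}

/* Reader side (audio callback): copy the data first, THEN bump
 * the read index, releasing the space back to the writer. */
static int rb_read(ringbuf_t *rb, unsigned char *data, int len)
{
    int i, rp = rb->read_pos;
    if (len > rb_used(rb))
        len = rb_used(rb);
    for (i = 0; i < len; ++i)
        data[i] = rb->buf[(rp + i) & (RB_SIZE - 1)];
    rb->read_pos = (rp + len) & (RB_SIZE - 1);
    return len;
}

Each index is owned by exactly one side, which is what makes this
safe without locks, provided the index updates really are atomic.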

Right now I’m using the SDL_LockAudio() call, and use an internal
buffer (written to in the emulation thread, and read in the callback
function), but it has more latency than I would like, and using
SDL_Delay() to help with CPU usage doesn’t seem to work on all
systems very well (probably due to timer granularity… dunno).

There are probably three main reasons why you get high latency:

1) On many platforms (including Windows), you simply
   can't get very low latencies without various tricks,
   such as "last instant" mixing with shared memory
   (not supported by SDL as of now), or deep hacks in
   kernel space.

2) Your synchronization might be too coarse if you're
   using whole buffers of substantial size. You might
   want to transfer very small buffers, or maybe even
   variable size chunks with single-sample granularity
   (see the callback sketch after this list).
   That way, you can get close to the minimum latency
   of the system, even if your emulator doesn't generate
   exactly one buffer per audio callback. (And yes, in
   general, it's a better idea to run the audio engine
   inside the audio callback, to avoid streaming audio
   from one context to another - but that might not
   always be a viable solution with emulators.)

3) As you suggest, timer granularity may indeed be an
   issue. If all emulator code, audio synthesis included,
   runs in the main loop, audio granularity depends on
   the exact timing of the main loop - which effectively
   means that you get one "chunk" of audio (N buffers of
   around 800 bytes) per generated video frame. Provided
   you skip frames to keep up (if you don't, the emulator
   will run too slow when the machine can't keep up...),
   that means you may get some 1600 bytes every 1/30 s,
   or even more data less frequently - which in turn
   means you need more buffering, which results in higher
   latency.
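
As a concrete illustration of 2), here is a sketch (untested, and
using the hypothetical ringbuf_t from above, not any SDL facility) of
an audio callback that just drains whatever the emulator has managed
to produce, at byte granularity, padding with silence on underrun
instead of blocking:

#include <string.h>
#include "SDL.h"

extern ringbuf_t audio_fifo;    /* filled by the emulation thread */

static void audio_callback(void *userdata, Uint8 *stream, int len)
{
    int got = rb_read(&audio_fifo, stream, len);
    (void)userdata;
    if (got < len)                        /* underrun: play silence */
        memset(stream + got, 0, len - got);
}

Hooked up through SDL_OpenAudio() with a small "samples" value in the
SDL_AudioSpec, this gets the buffering granularity down to whatever
the driver allows, independently of how the emulator chunks its
output.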

By default, under platforms that have OSS (/dev/dsp) support, I just
open, set up, and write() to the device, which works well.

Does that mean you’re running the audio synthesis in a separate thread
that blocks on write()…?

I would
like to use the SDL audio API by default, but it (or maybe the
entire SDL API) seems to be inherently lacking for my needs. Any
suggestions?

How do you maintain audio/video synchronization? There are a few
alternatives, including:

* Audio runs "free", driven by async commands from the
  main emulator thread. (The way it's usually done in
  native games.)

* Using audio as the time base. The emulated 60 Hz video
  refresh rate is derived from the audio sample rate, and
  frames are somehow rendered without retrace sync, or
  passed to the video subsystem in some other non-blocking
  manner, not to interfere with audio rendering.

* Using video as the time base. (Actively timer driven,
  or driven by the rendering frame rate and/or retrace
  sync and "throttled" by SDL_GetTicks() or similar.)
  This means you have to track the current audio playback
  position, so you can decide how much audio you should
  generate for each frame. (Of course, this is just
  guessing, as you can't be sure how long it takes to
  render the next frame; see the sketch after this list.)

* Same as above, but the audio callback does some form of
  time stretching or resampling to stay in sync with the
  main emulator thread.
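
To make the "video as time base" alternative concrete, here is a
rough sketch of the per-frame bookkeeping. All names are
illustrative, and it assumes 8-bit mono so FIFO bytes equal samples,
plus the ringbuf_t sketch from earlier; emulate_frame_audio() is
hypothetical:

#define SAMPLE_RATE  48000      /* ~800 bytes * 60 fields/s */
#define TARGET_FILL   1600      /* desired FIFO fill, ~33 ms */

static void frame_audio(ringbuf_t *fifo)
{
    /* Nominal amount per 60 Hz frame... */
    int want = SAMPLE_RATE / 60;
    /* ...nudged gently toward the target fill. The division keeps
     * the correction small, so the pitch error stays a fraction of
     * a percent instead of audibly warbling. */
    int error = TARGET_FILL - rb_used(fifo);
    want += error / 32;
    emulate_frame_audio(fifo, want);    /* hypothetical emulator call */
}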

Either way, if it’s done right, I can’t see how you can get different
results with SDL audio and OSS.

(True, you can’t have variable size blocks with SDL audio, but you
generally want to avoid that anyway, as most modern audio APIs work
very much like SDL’s callback driven audio API. The read()/write()
paradigm, in its traditional interpretation, just doesn’t mix with
low latency audio.)

//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Saturday 16 August 2003 00.50, xodsdl at starmen.net wrote:

After quickly looking over that (sfifo) code, it appears that you’re expecting
certain C operations to be atomic. Is that a wise decision? Would it work
at all on RISC architectures? Would the compiler’s optimizer produce
assembly that you didn’t want? That code doesn’t seem to be very thread-safe
to me, but am I missing something?

My emulator can’t have the sound emulation code run in the sound callback
function, as the sound emulation is synchronized and tied into the cpu
emulation, and the cpu can read from various registers to get the state of
the sound channels.

Having the main thread block on sound writes does create noticeable screen
tearing (less noticeable if you use a high refresh rate, but that’s not very
feasible on LCD monitors). I have thought about dynamically adjusting the
output rate of the emulator (and not the actual sound device) to synchronize
the sound updates and video updates, but that would cause a little sound
distortion (such as a 4000 Hz square wave being output as a 3993 Hz square
wave). The real NES has a frame rate of about 60.1 fps. A refresh rate of
"60" Hz is probably close enough to not cause noticeable sound problems, but
what about 70 Hz, or 72 Hz? Unless I do some sort of blurring, I’d expect the
video updates to appear choppy. And this is assuming I could even get double
buffering to work right under X with OpenGL. My Geforce2 card does double
buffering when I set that environment variable, but it seems to consistently
perform the flips in the middle of the screen in my code, and not in vblank.
Probably a bug in my code…

Well, you don’t really need OS support for a
single-reader/single-writer interface… A circular buffer with
atomic input and output pointers or indices will work just fine. You
could try my “sfifo” that I’ve been using in all sorts of
environments for a good while:

http://olofson.net/mixed.html

After quickly looking over that (sfifo) code, it appears that you’re
expecting certain C operations to be atomic. Is that a wise
decision? Would it work at all on RISC architectures? Would the
compiler’s optimizer produce assembly that you didn’t want? That
code doesn’t seem to be very thread-safe to me, but am I missing
something?

The only requirements are that reading and writing sfifo_atomic_t is
atomic, and that operations are done very roughly in the right order.
RISC/CISC is not an issue, as all operations but the actual reads and
writes to those variables are irrelevant. As to order, that’s just a
matter of bumping the read/write indices after reading/writing the
data in the buffer, so you keep the “other end” away from the part of
the buffer you’re messing with. Both sides read both indices, but the
writer only changes the write index, and the reader only changes the
read index.

The only environment I’m expecting problems with is SPARC SMP systems.
AFAIK, those have a maximum atomic transfer size (between CPUs) of 24
bits, which means that my “volatile int” typedef won’t work.

Actually, the problem is one of alignment rather than the actual size
of the index variables (only the low N bits are used, where N depends
on the FIFO buffer size), so it might actually work anyway, but I
wouldn’t bet on it…

A compiler that thinks int is 64 bits would create a similar
situation; indices may not be atomic (N bit CPU bus != N bit atomic
size on SMP systems), but only the 31 low bits are actually used, so
with normal alignment, it should work anyway.
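
(In present-day C (C11 and later), those size and alignment
assumptions can at least be turned into compile-time checks. A
sketch, not something sfifo itself does:)

#include <assert.h>

typedef volatile int sfifo_atomic_t;

/* The lockless scheme assumes the index type is 32 bits and
 * naturally aligned, so that plain loads/stores have a chance
 * of being atomic. Fail the build if that doesn't hold. */
static_assert(sizeof(sfifo_atomic_t) == 4,
              "index type expected to be 32 bits");
static_assert(_Alignof(sfifo_atomic_t) >= 4,
              "index type expected to be naturally aligned");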

Anyway, here’s a list of compiler / OS / architecture combos sfifo is
known to work on:

gcc / Linux / x86:              Works
gcc / Linux / x86 kernel:       Works
gcc / FreeBSD / x86:            Works
gcc / NetBSD / x86:             Works
gcc / OpenBSD / x86:            Works
gcc / OpenBSD / PPC:            Works
gcc / OpenBSD / SPARC:          Works
gcc / OpenBSD / SPARC64:        Works
gcc / Solaris / x86:            Works
gcc / Solaris / SPARC:          Works
gcc / Mac OS X / PPC:           Works
gcc / BeOS / x86:               Works
gcc / BeOS / PPC:               Works
gcc / AmigaOS / 68k:            Works
gcc / AmigaOS / PPC:            Works
gcc / Win32 / x86:              Works
gcc / Linux / PlayStation 2:    Works
Borland C++ / DOS / x86RM:      Works

Unfortunately, I don’t have any reliable info on SMP systems. Anyone
who got Kobo Deluxe to work properly with sound effects on SMP
systems, please report! :-)

My emulator can’t have the sound emulation code run in the sound
callback function, as the sound emulation is synchronized and tied
into the cpu emulation, and the cpu can read from various registers
to get the state of the sound channels.

Yeah, that’s the usual situation with emulators, apparently…

Having the main thread block on sound writes does create noticeable
screen tearing (less noticeable if you use a high refresh rate, but
that’s not very feasible on LCD monitors). I have thought about
dynamically adjusting the output rate of the emulator (and not the
actual sound device) to synchronize the sound updates and video
updates, but that would cause a little sound distortion (such as a
4000 Hz square wave being output as a 3993 Hz square wave).

And more seriously, that kind of sync is very hard to get right unless
you have a real time OS. If you measure “current time” in the audio
callback and in the main emulator loop, or something like that,
you’ll have to do some substantial filtering to extract useful data.
Way too much scheduling jitter to do anything with the raw data.
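
As a sketch of what such filtering might look like (all names
illustrative), even a one-pole lowpass over the raw position
measurements removes most of the scheduling jitter:

/* Smooth a jittery "current audio position" estimate. raw_pos
 * could be samples-written minus a driver latency guess; the
 * coefficient needs tuning per system. */
static double filter_position(double raw_pos)
{
    static double smooth = 0.0;
    const double k = 0.05;     /* smaller = heavier smoothing */
    smooth += k * (raw_pos - smooth);
    return smooth;
}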

The real
NES has a frame rate of about 60.1 fps. A refresh rate of "60" Hz
is probably close enough to not cause noticeable sound problems,
but what about 70 Hz, or 72 Hz?

Not close enough… And then there are the occasional dropped frames.
If you “hard sync” audio to video, a dropped video frame causes an
audio glitch. (Audio glitches are much more annoying than occasional
dropped video frames…)

Unless I do some sort of blurring,
I’d expect the video updates to appear choppy.

I don’t think there’s any reasonably simple filtering that can solve
that. You can cross fade or something, but I doubt that will do much
good on this kind of video. (Sort of works if there’s motion blur, I
think, but it’s still far from perfect.) The proper solution would be
to interpolate the scroll and sprite positions, but that might cause
weird effects with some games…

And this is
assuming I could even get double buffering to work right under X
with OpenGL. My Geforce2 card does double buffering when I set
that environment variable, but it seems to consistently perform the
flips in the middle of the screen in my code, and not in vblank.
Probably a bug in my code…

I doubt it’s your code. (OpenGL doesn’t give you that kind of control,
AFAIK.) Retrace sync causing tearing seems to be “normal” for current
Linux drivers. The problem is that nearly all drivers do "fake"
double buffering (back->front blits instead of h/w page flipping),
which means that you get tearing if the “flip” isn’t started at the
exact right moment.

//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Saturday 16 August 2003 19.13, xodnizel wrote:

And this is
assuming I could even get double buffering to work right under X
with OpenGL. My Geforce2 card does double buffering when I set
that environment variable, but it seems to consistently perform the
flips in the middle of the screen in my code, and not in vblank.
Probably a bug in my code…

I doubt it’s your code. (OpenGL doesn’t give you that kind of control,
AFAIK.) Retrace sync causing tearing seems to be “normal” for current
Linux drivers. The problem is that nearly all drivers do "fake"
double buffering (back->front blits instead of h/w page flipping),
which means that you get tearing if the “flip” isn’t started at the
exact right moment.

I think OpenML is supposed to solve the audio/video sync problem somehow.

And this is assuming I could even get double buffering to work right
under X with OpenGL. My Geforce2 card does double buffering when I
set that environment variable, but it seems to consistently perform
the flips in the middle of the screen in my code, and not in vblank.
Probably a bug in my code…

I doubt it’s your code. (OpenGL doesn’t give you that kind of
control, AFAIK.) Retrace sync causing tearing seems to be "normal"
for current Linux drivers. The problem is that nearly all drivers do
"fake" double buffering (back->front blits instead of h/w page
flipping), which means that you get tearing if the "flip" isn’t
started at the exact right moment.

I’ll assume you have:
SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
and:
SDL_GL_SwapBuffers();
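
For context, a minimal SDL 1.2 + OpenGL skeleton around those two
calls might look like this. Whether SDL_GL_SwapBuffers() actually
waits for the retrace is up to the driver; with NVidia's Linux
driver, the environment variable __GL_SYNC_TO_VBLANK=1 is what
usually enables retrace sync, as far as I know:

#include <stdio.h>
#include <stdlib.h>
#include "SDL.h"

int main(int argc, char *argv[])
{
    if (SDL_Init(SDL_INIT_VIDEO) < 0) {
        fprintf(stderr, "init: %s\n", SDL_GetError());
        return 1;
    }
    atexit(SDL_Quit);
    SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
    if (!SDL_SetVideoMode(640, 480, 0, SDL_OPENGL)) {
        fprintf(stderr, "video: %s\n", SDL_GetError());
        return 1;
    }
    for (;;) {
        SDL_Event ev;
        while (SDL_PollEvent(&ev))
            if (ev.type == SDL_QUIT)
                return 0;
        /* ... render the frame with OpenGL here ... */
        SDL_GL_SwapBuffers();   /* may or may not wait for retrace */
    }
}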

For double buffering that syncs, you’ll need DRI
drivers installed. The ones for my NVidia TNT2 card
work fine… considering the better specs of yours,
and their unified code base (for easy porting) I’m
assuming you don’t have a DRI driver installed… or a
recent enough version of XFree.

(It MIGHT just be that the DRI driver doesn’t support that
card yet…)

A link to NVidia’s linux driver section:
http://www.nvidia.com/object/linux_display_ia32_1.0-4496.html

A link to the DRI homepage:
http://dri.sourceforge.net/

Note: I had to upgrade both from what RH9 installed.

I only saw a “typedef int sfifo_atomic_t”, no “volatile”, in sfifo 1.3.
That’s why I asked about compiler optimizations. Is this a typo?

Btw, I couldn’t send email directly to you. I’m not sure if it’s a problem
with my mail server or yours…

Hi. This is the qmail-send program at starmen.net.
I’m afraid I wasn’t able to deliver your message to the following addresses.
This is a permanent error; I’ve given up. Sorry it didn’t work out.

:
TLS found no client cert in control/clientcert.pem
I’m not going to try again; this message has been in the queue too long.

On Sunday 17 August 2003 04:46, David Olofson wrote:

The only environment I’m expecting problems with is SPARC SMP
systems. AFAIK, those have a maximum atomic transfer size
(between CPUs) of 24 bits, which means that my "volatile int"
typedef won’t work.

I only saw a “typedef int sfifo_atomic_t”, no “volatile”, in sfifo
1.3. That’s why I asked about compiler optimizations. Is this a
typo?

IIRC, I removed the “volatile” early on, but it’s back in 1.4 or
something… (Well, it’s obviously in my local version.)

Anyway, I have yet to see a compiler + CPU combo that requires the
"volatile" for the code to function properly. There’s just too much
code for any normal optimizations to change the order of the buffer
access and index access code.

That’s just luck, though; the “volatile” should obviously be there.
One should also use “atomic_t” or whatever there is in environments
that have such a thing, but I haven’t bothered with it so far, as I
don’t know how to detect the systems that really need it. (No
autotools.)
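
(For what it’s worth, in present-day C (C11 and later) the intent can
be stated explicitly instead of relying on "volatile" and luck. A
sketch with illustrative names, not part of sfifo itself:)

#include <stdatomic.h>

typedef struct {
    unsigned char buf[4096];
    atomic_int read_pos;
    atomic_int write_pos;
} afifo_t;

/* Writer: fill buf[...] first, then publish with a release store,
 * so a reader's acquire load is guaranteed to see the data. */
static void afifo_publish_write(afifo_t *f, int new_wp)
{
    atomic_store_explicit(&f->write_pos, new_wp, memory_order_release);
}

/* Reader: the acquire load pairs with the writer's release store. */
static int afifo_observe_write(afifo_t *f)
{
    return atomic_load_explicit(&f->write_pos, memory_order_acquire);
}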

Btw, I couldn’t send email directly to you. I’m not sure if it’s a
problem with my mail server or yours…

Well, there’s a virus + spam filter on my server, but the error
message indicates some actual problem. (The filter just eats the
offending mail.)

//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Sunday 24 August 2003 19.29, xodnizel wrote:
