David Olofson wrote:
Since the eye is more tolerant of slight variations in frame rate
(especially on “normal” video material) than the ear is to drop-outs or
crappy time stretching, yes, I think so.
Allrighty, I’ll be making the minor adjustments to the sync code needed to
accomplish this. Right now, there’s no software sync between audio and
video, I just trust the sound card to output stuff at a throttled rate, and
keep careful tabs on the video frames being output at exactly the rate they
should be.
Hmmm… I’m actually designing a multimedia plugin API that’s meant to be
used for that kind of stuff, as well as for ultra low latency hard real
time audio. Guess what - plugins are normally callback driven!
Well, yes! My plugins are callback driven – but not in the same sense as
SDL audio is callback driven.
Specifically… I have a disk read/decode thread that writes to a FIFO.
The audio out thread then reads from that FIFO and calls a plugin
function of the form
int (*DoOutput)(void *data, AFCA_Frame *f);
which is approximately akin to
int (*DoOutput)(void *data, void *buf, long length);
The issue is that for the SDL aout plugin, its DoOutput function can’t just
output the data, it has to essentially put a pointer to buf into a shared
location, and wait for SDL to call the separate callback function, and the
callback function to complete.
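Roughly, the handoff looks like this (a sketch with made-up names;
partial-fragment handling is ignored for brevity):

#include <string.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t consumed = PTHREAD_COND_INITIALIZER;
static unsigned char *pending_buf = NULL;
static long pending_len = 0;

/* DoOutput for the SDL aout plugin; runs in the audio out thread. */
int sdl_do_output(void *data, void *buf, long length)
{
	pthread_mutex_lock(&lock);
	pending_buf = buf;
	pending_len = length;
	/* Can't just write the data; park it and wait until SDL's
	 * callback has picked it up and completed. */
	while(pending_buf)
		pthread_cond_wait(&consumed, &lock);
	pthread_mutex_unlock(&lock);
	return 0;
}

/* The separate callback, invoked by SDL from its own audio thread. */
void sdl_fill_audio(void *userdata, unsigned char *stream, int len)
{
	pthread_mutex_lock(&lock);
	if(pending_buf && (long)len <= pending_len)
	{
		memcpy(stream, pending_buf, len);
		pending_buf = NULL;		/* mark it consumed... */
		pthread_cond_signal(&consumed);	/* ...and wake DoOutput */
	}
	else
		memset(stream, 0, len);	/* nothing ready: silence */
	pthread_mutex_unlock(&lock);
}

That park-and-wait step is exactly the overhead a plain blocking
audio_write() style call wouldn’t have.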
Who calls this plugin, and what is “who” synced to?
The normal solution in audio application is that “who” is the plugin host,
and would in this case be executed by the SDL audio callback.
One DSP thread per CPU and plugins being called one by one in a fixed
order is the way to go; it’s simple, fast, efficient, reliable and works
in nearly any environment, including inside ISRs. Most importantly (for
multitrack audio and similar systems), it doesn’t make the lowest latency
a function of the number of plugins in the net.
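For illustration, a minimal sketch of such a fixed-order chain host
(names made up, not from any particular API):

typedef void (*process_fn)(void *state, float *buf, int frames);

struct plugin
{
	process_fn process;
	void *state;
};

/* One pass over the whole net, called once per buffer from the DSP
 * thread (or audio callback, or ISR). Every plugin processes the
 * same buffer in place, in fixed order, so total latency stays at
 * one buffer regardless of how many plugins are in the chain. */
void host_run(struct plugin *chain, int nplugins,
		float *buf, int frames)
{
	int i;
	for(i = 0; i < nplugins; ++i)
		chain[i].process(chain[i].state, buf, frames);
}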
Mmmmm, but when you’re playing audio and video, on a single CPU box, it can
be very difficult to schedule things if it’s all one thread.
Well, the video is dealing with higher latencies, so it’s kind of natural to
run that as a lower priority thread.
My philosophy
is to basically make lots and lots of threads, and use mutexes and
condition variables to synchronize the FIFOs in between them. This way,
when a thread is waiting, it sleep()s, and it gets woken up on the
appropriate event. It works no matter how many CPUs you have, it trusts the
OS to do the scheduling,
That’s the problem… This doesn’t work with low latency processing on any
"normal" OS, and results in extremely complex and practically unpredictable
timing phenomena, which is why it doesn’t work very well on hard RTOS kernels
either.
and it uses more CPUs if you have them – automatically. (I.e., you
don’t have to try to figure out how many CPUs there are and spawn
specifically that many threads.)
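Roughly, the read side of such a condition variable FIFO looks like
this (fifo_t and fifo_read() here are hypothetical, not the player’s
actual code):

#include <pthread.h>

typedef struct
{
	pthread_mutex_t lock;
	pthread_cond_t not_empty;
	char *data;		/* ring buffer storage */
	int size, used, readpos;
} fifo_t;

/* Blocking read: sleeps on the condition variable until a writer
 * has signalled that enough data is available. */
int fifo_read(fifo_t *f, char *buf, int len)
{
	int n;
	pthread_mutex_lock(&f->lock);
	while(f->used < len)
		pthread_cond_wait(&f->not_empty, &f->lock);
	for(n = 0; n < len; ++n)
	{
		buf[n] = f->data[f->readpos];
		f->readpos = (f->readpos + 1) % f->size;
	}
	f->used -= len;
	pthread_mutex_unlock(&f->lock);
	return len;
}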
Why does it have to know anything about that? I think it’s bad programming
practice in the first place if the API wrappers aren’t separated from the
player core.
As I said above, the plugin DoOutput just gets a pointer and a buffer
length, and it takes it from there.
If I wanted to have the callback function handle reading from the first
FIFO (the one written to by diskread/decode), it would have to KNOW about
the FIFO, which it currently does not. Right now, the FIFO is hidden away
from the plugins.
What’s the problem with actually having a host to deal with everything that
doesn’t belong in the plugin API? Works for all sorts of hard disk recording
apps with DSP plugins.
The average audio application with plugins just loads the plugins as
shared libraries and runs them as callbacks, no matter what the target
audio API is. I don’t see the problem with moving something like that
between OSS, ALSA, DirectSound, Windows MMSystem audio and SDL…
It’s the audio plugin host that’s connected to whatever API you use, so
the plugins need to know nothing about that. It would even be possible to
make the host a form of plugin similar to the SDL audio callback, and
then call that from the different audio API wrappers.
I don’t quite get how that all works, i.e. which specific thing the
“audio API wrappers” are… but that’s okay, I get the gist of this
message, I think.
Well, for example, if you have a disk I/O thread and one or more DSP plugins,
the typical audio application solution would be to implement “something” that
can host the plugins, read/write data from/to the I/O thread via the FIFO
interface, and that could run as an SDL audio callback. Only this "something"
would know about the FIFO and SDL audio. This “something” would be what audio
hackers generally call a plugin host.
In order to save time and code, you may create a generic callback interface
to drive this host, rather than plugging it in as an SDL audio callback
directly. Then you can write a bunch of simple wrappers that either "forward"
callbacks, or run their own threads, doing read()/write() style I/O and
calling the host from their I/O loops.
As a simplistic example, here’s the “wrapper” function that’s called from an
"infinite" loop to run a simple callback driven soft synth against the OSS
(read/write style) API. Non-blocking I/O is used for MIDI rather than a
separate thread, for simplicity. This function is just called repeatedly
until the application wants to stop, or the function returns an error code.
int engine_cycle()
{
	int frames = adev.fragmentsize >> 2;
	if(external_level)
		if( read(adev.infd, extbuf_s16, adev.fragmentsize) < 0 )
			return -1;
	player_play(&player, frames);
	check_midi();
	engine_process(buf_s16, frames);
	if( write(adev.outfd, buf_s16, adev.fragmentsize) < 0 )
		return -2;
	return 0;
}
player_play() - song file player
check_midi() - MIDI input decoder
engine_process() - synth engine (calls lots of callbacks in turn)
Blocking is done on either audio in or audio out, depending on whether or not
the input is used. (If external_level == 0, the input is never opened by the
init code, so the blocking is done on the writes.)
Now, your FIFO would work just like the MIDI input in this code; non-blocking
read/write. If the data isn’t there, you can’t do much - if you try to wait,
you’ll only move the drop-out beyond your control: to the audio card driver.
Yeah… So what? VST and LADSPA handle that just fine, and so will yet
another callback driven plugin API; MAIA. If you’re just going to render
off-line to disk, all you do is:
while(playing)
{
	...
	call_plugin(..., buffer, buffer_size);
	write(fd, buffer, buffer_size);
}
Which thread would this run in, though? Would it be a special type of
thread spawned only for disk writing? If so, then that’s dumb, because my
threads shouldn’t care what type of output they’re doing.
Cannot be done. If you want to do output to disk while still running in real
time, you have to pass the data via a FIFO to a separate disk writer thread.
The above was an example of off-line processing, which basically means that
you render the data to disk as fast as possible, blocking on the write() if
the CPU is faster than the drive. (This obviously cannot be used if you’re
processing “live” signals, as opposed to data from another file.)
For recording during real time playback, you must use a “disk butler
thread” as we call it around the Linux Audio Dev list, and some FIFO
buffering in between the audio thread and this disk thread. This is
required to keep the disk access from causing drop-outs in the real time
playback.
Right. I have plenty of FIFOs, if/when I get around to supporting one input
and two output targets, there will be plenty of buffers.
The easiest way to implement that would be to just use a single
reader-single writer lock-free FIFO that’s written to by an interface
plugin called as usual from within the audio thread, and then have the
disk butler run in its own thread polling the other end of the FIFO. The
FIFO should preferably have several seconds’ worth of space, and the
polling can be throttled by sleeping for a sensible number of ms
calculated from the amount of data in the FIFO when going to sleep.
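A sketch of such a butler loop (the lf_* FIFO interface is a
hypothetical stand-in; one possible lock-free implementation is
sketched near the end of this message):

#include <unistd.h>

#define SAMPLE_RATE	44100
#define FRAME_BYTES	4	/* 16 bit stereo */
#define BLOCK_SIZE	65536

/* Hypothetical lock-free FIFO. */
typedef struct lf_fifo lf_fifo_t;
unsigned lf_used(lf_fifo_t *f);
unsigned lf_read(lf_fifo_t *f, char *buf, unsigned len);

extern volatile int recording;

void disk_butler(lf_fifo_t *f, int fd)
{
	static char block[BLOCK_SIZE];
	while(recording)
	{
		/* Drain whole blocks; write() may block on the drive,
		 * which is fine - this is not the real time thread. */
		while(lf_used(f) >= BLOCK_SIZE)
		{
			lf_read(f, block, BLOCK_SIZE);
			write(fd, block, BLOCK_SIZE);
		}
		/* Sleep roughly until the next block should have
		 * accumulated, judging from the current fill level. */
		{
			long long deficit =
				(long long)BLOCK_SIZE - lf_used(f);
			if(deficit > 0)
				usleep((unsigned)(deficit * 1000000 /
					(SAMPLE_RATE * FRAME_BYTES)));
		}
	}
}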
Ahhh, but I thought we realized that polling was a bad idea back around
when Ethernet cards used to work that way… (you had to poll the card to
see if there was new data). By using condition variables, it sleeps for the
right amount of time, automatically.
Well, “polling” isn’t the correct term, actually; it’s more like “sleep for a
while, then check the FIFO, and go back to sleep if there’s nothing to do”.
It has to be done that way if you have a 2.1 ms latency SCHED_FIFO thread at
the other end of that FIFO, as you cannot call any IPC syscalls from within
the audio thread without occasionally missing deadlines. Deal with it, or fix
the kernel. (Or use an OS designed 100% for hard RT - lots of other issues
with those, however, so few audio hackers care much for them.)
This method works very well in real life, and is used in applications
that record and/or playback 20-60 channels of 44.1 kHz 16 or 32 bit
(depending on hard drive speed) audio while doing rock solid real time
processing with 3 ms input->output latency.
Wow. For my case, I don’t really care about in/out latency, because the
input is a file and there’s lots of buffering, but that’s pretty
impressive.
Well, it sure demonstrates that there is a bit of real time potential in
mainstream PC hardware after all…
Any problems are due to broken or flawed hardware, OS shortcomings,
application design errors - or any combination of those.
I don’t, actually. An underrun is a fatal error, almost as severe as
file corruption in a recording studio, so I tend not to accept them.
Well, the underrun is only for audio being played back to a user, and the
only time it would happen is when they have a slow computer and they try to
open up Windows Explorer (like my roommate’s comp :-).
hehe
In that case, Winamp seems to start repeating the last good buffer, getting
that stuttering effect which is so annoying.
DirectSound… The same would happen with mmap() on OSS or ALSA, or any other
"shared DMA buffer" interface, as the buffer just keeps looping and there
isn’t anyone around to even clear it.
And there is no need to, at least not with Linux/lowlatency or some other
RTOS solution. Just get the design right (that is, apply control
engineering design rules - sorry, no shortcuts), and it will be rock
solid as long as you have enough CPU power. Guaranteed. We’re doing it
all the time.
Hehehehe, I’m not taking any shortcuts. But all the realtime junk isn’t as
important either.
It’s real time as soon as there is any deadline to meet - be it within 50
µs, or the next day! If you miss a deadline, you’re out of control. If
you want control, don’t miss deadlines.
Anyway, if you really do want to provide some emergency solution for
users without a hard real time capable OS, you could just generate
silence when
Or without enough CPU power to run ${media player} and open up IE at the
same time.
Well, that’s not really a CPU power issue - it’s perfectly safe to fire up
Netscape and other hogs while doing 3 ms latency processing on
Linux/lowlatency, and the same applies for some 100 µs with RTLinux or RTAI.
It’s just a matter of using an OS with a real time capable scheduler.
there is an underrun in the input buffer. That’s what the audio drivers
do,
Okey dokey.
(Well, provided you’re not using a shared memory style interface, that is.
DirectSound, regardless of “mode”, is a shared memory interface from the
kernel driver perspective…)
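To illustrate the “generate silence” fallback (lf_read() as in the
butler sketch above; hypothetical, never blocks, returns the number of
bytes actually copied):

#include <string.h>

typedef struct lf_fifo lf_fifo_t;
unsigned lf_read(lf_fifo_t *f, char *buf, unsigned len);

/* SDL-style audio callback: never waits for the decoder. If the FIFO
 * has run dry, fill the rest of the fragment with silence, like the
 * kernel audio drivers do on write() underruns. (Zero is silence for
 * signed sample formats.) */
void audio_callback(void *userdata, unsigned char *stream, int len)
{
	unsigned got = lf_read((lf_fifo_t *)userdata,
			(char *)stream, (unsigned)len);
	if(got < (unsigned)len)
		memset(stream + got, 0, len - got);
}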
Because you have a kernel driver at the other end of write(), while you
have a user space thread at the other end of your “emulated” write(). The
Ummm, no, if there was an SDL audio_write() call, I would expect there to
be no extra threads, I would expect that call to simply output the data,
returning when it was ready.
Yeah… I might have lost track when writing that, but that was exactly the
point; that’s the only technical advantage of such an interface - provided it
can actually be implemented that way on the actual target.
difference is that the latter is a lot more likely to lose the CPU when
the code it’s trying to “protect” does. Also, adding an extra thread
between the audio API and the data source thread multiplies the risk of
getting huge scheduling latency peaks and other related timing problems.
Heh, when I’m playing audio I currently have three threads total, and in my
new API it might even be four. But it has worked extremely well using just
mutexes and condition variables.
(Almost) anything works with enough buffering…
As a matter of fact, as soon as you have more than one thread per CPU
interacting in a chain style manner, you’re asking for trouble, even if
you’re using a real RTOS. In most cases it’s considered poor design from
a real time programming perspective.
I have not had a single bit of trouble with the way I’m doing it. When I’m
playing an MPEG file with audio and video, I end up with around 10 threads,
on a two-CPU machine, but the audio never skips and the video is
perfectly synchronized.
Sounds good.
There’s a lot more to it than that, indeed… heh
Just put any supposedly “non-blocking” IPC construct in between a thread
that deals with disk I/O and an audio thread, and you’ll have nice
drop-outs every now and then, for no obvious reason. You need substantial
buffering, and you
Interesting, no problems for me…
It usually doesn’t make much difference on operating systems without a proper
real time scheduling mode that entirely bypasses timesharing and similar
systems - you’re likely to get drop-outs because of all sorts of other
interference long before you hit the OS IPC latency problems. This differs
between OS kernels though, so I won’t say this is always the case.
need lock-free access to it, at least from the low latency (audio)
thread. Ring buffers + atomic offset updates in shared memory are the way
to go; i.e. do the sync using the system bus hardware to keep the OS from
getting in the way.
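In a nutshell, such a single-reader/single-writer lock-free ring
buffer might look like this (C11 stdatomic is an anachronism here - in
2001 you’d rely on aligned int stores being atomic - but it makes the
intent explicit; all names are ours, not from any particular library):

#include <stdatomic.h>

#define LF_SIZE 65536	/* must be a power of two */

struct lf_fifo
{
	char data[LF_SIZE];
	_Atomic unsigned read_pos;	/* advanced only by the reader */
	_Atomic unsigned write_pos;	/* advanced only by the writer */
};
typedef struct lf_fifo lf_fifo_t;

unsigned lf_used(lf_fifo_t *f)
{
	/* Unsigned wrap-around keeps w - r correct even when the
	 * offsets overflow. */
	return atomic_load(&f->write_pos) - atomic_load(&f->read_pos);
}

/* Writer side - e.g. the real time audio thread. Never blocks. */
int lf_write(lf_fifo_t *f, const char *buf, unsigned len)
{
	unsigned i;
	unsigned w = atomic_load(&f->write_pos);
	if(len > LF_SIZE - lf_used(f))
		return -1;	/* full - overrun; caller decides */
	for(i = 0; i < len; ++i)
		f->data[(w + i) & (LF_SIZE - 1)] = buf[i];
	/* Publish the new offset only after the data is in place. */
	atomic_store(&f->write_pos, w + len);
	return 0;
}

/* Reader side - e.g. the disk butler thread. Never blocks. */
unsigned lf_read(lf_fifo_t *f, char *buf, unsigned len)
{
	unsigned i;
	unsigned r = atomic_load(&f->read_pos);
	unsigned used = lf_used(f);
	if(len > used)
		len = used;
	for(i = 0; i < len; ++i)
		buf[i] = f->data[(r + i) & (LF_SIZE - 1)];
	atomic_store(&f->read_pos, r + len);
	return len;
}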
Aha.
That’s real time audio programming on workstation platforms in a
nutshell.
I appreciate the discussion and the education I’m getting out of this.
Might come in handy some day, although I think you can live without the
hardcore constructs with sensible amounts of buffering in this case.
//David
.- M A I A -------------------------------------------------.
|      Multimedia Application Integration Architecture      |
| A Free/Open Source Plugin API for Professional Multimedia |
`----------------------> http://www.linuxaudiodev.com/maia -'

.- David Olofson -------------------------------------------.
| Audio Hacker - Open Source Advocate - Singer - Songwriter |
`--------------------------------------> david at linuxdj.com -'