David Olofson wrote:
Nope - though it would definitely be possible to have some backends
work that way, provided you use a sample format that’s supported by
the hardware. Many games achieve very low latencies without
dangerously small buffers by directly mixing into the DMA buffer. New
sound FX can be started right in front of the DMA pointer, and then
mixed ahead to build up to a more reliable amount of buffering.
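In rough C, the idea looks something like this (just a sketch; dma_buf,
get_play_pos() and render_fx_sample() are made-up stand-ins for whatever
the driver actually gives you):

#include <stdint.h>

#define DMA_SIZE  4096   /* DMA ring size in samples */
#define MIX_AHEAD 1024   /* safety margin to keep mixed ahead */

extern int16_t dma_buf[DMA_SIZE];
extern unsigned get_play_pos(void);     /* current DMA read position */
extern int16_t render_fx_sample(void);  /* one sample of the new effect */

void mix_new_fx(void)
{
        /* Start right in front of the play pointer and mix forward,
           adding the new effect on top of what is already there. */
        unsigned pos = (get_play_pos() + 1) % DMA_SIZE;
        unsigned i;
        for(i = 0; i < MIX_AHEAD; ++i)
                dma_buf[(pos + i) % DMA_SIZE] += render_fx_sample();
}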
Ok.
Yeah, like random crackling, or the occasional irregular drop-out
every now and then…
Exactly.
Also, my 2.8 GHz PC is doing very little at the moment, so the
scheduling queue should be pretty clean.
What seems to be the problem is that SDL doesn't seem to tolerate
hanging around in the callback function for too long. To my knowledge
there is no reason why you couldn't do that, unless the callback is
called very shortly before the actual playing buffer (somewhere in the
guts of SDL) runs out of data.
Actually, it’s not only about that. SDL may have to convert your audio
data, and if it doesn’t, the buffer you’re writing into doesn’t have
to be the DMA buffer. (Some drivers and some sound APIs don’t support
direct access to the DMA buffer at all, and there may still be SDL
backends that don’t use that feature even if it’s available.) So,
it’s quite likely that nothing at all happens to the DMA buffer until
you return from the callback.
Hmm, that would make sense (a DMA stall), and would indeed lead to
skipping a buffer if too much time is spent in the callback function.
How I imagine it working (barring any conversions and such) is that SDL
basically keeps track of two buffers (not sure whether they live
directly in DMA or are a wrapper around DirectSound). While one is
being played (buf1), the other one needs to be filled with data (buf2),
so at some point during the playback of buf1 there is a call to the
callback function. But I suspect that this call doesn't take place
immediately, but near the end of buf1.
It could be that there is a communication lag down the line that
prevents SDL from detecting when the sound card actually starts playing
buf1, and thus from triggering the callback in time; I'm not sure.
I AM pretty sure that my calculations are not too heavy to be done in
time, because in the main thread I'm doing a lot of other stuff which I
time, and it takes only a minimal hit when I increase the calculations
in the callback function. The main thread slows down by about 5% or so;
nothing to worry about.
What I'm basically doing is additive synthesis in the callback
function. Adding 100+ sine waves shouldn't be that hard, especially
since sin is hardwired into the math core.
Anyway, it seems that in the callback function I have just about enough
time to fill the buffer that is hanging behind the pointer, so maybe
the SDL design actually wants me to do nothing but fill the buffer in
the callback function. Not sure about that either.
If you calculate the amount of data that needs to be generated against
the CPU speed, there should be plenty of time to do all kinds of fancy
stuff. I need to write 44,100 samples per second into the buffer;
that's ~44 kHz. 44 MHz is about a thousand times more, and 2.8 GHz is
another ~64 times more cycles. So, theoretically I should have about
64,000 cycles for every sample I want to write (barring any overhead).
That's a lot, way more than I need.
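(To be exact: 2,800,000,000 cycles/s divided by 44,100 samples/s gives
about 63,500 cycles per sample; even at 100 sine adds per sample that
is still more than 600 cycles per add.)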
Also, when I slow down my (screen) redraw, the audio skipping stays the
same. This also indicates that this is not a speed issue.
So at the moment I have two suspects that could cause this. One is lag
in the playback trigger coming from the layer beyond SDL (be it
DirectSound or the normal sound card drivers). The other is that SDL
calls the callback just in time for us to fill the buffer, and this
last one could be coupled to a locking mechanism (as you described for
the DMA) beyond SDL.
So what I would like to do is start calculating my data for the next
callback right after the callback finishes, which SDL doesn't provide
for, in the sense that I cannot call a function and return from the
callback at the same time.
That's why I was thinking about having a separate thread poll some
memory location that I could set at the end of the callback. Then I
could start calculating stuff as soon as the scheduler hits that
thread, and my problem should be over.
But then again, how do I make a thread? (RTFM, Luke.)
The point is that I really don't want to dig this deep into SDL, since
the whole point of me using SDL was to have a wrapper around DirectX
and actually get some work done.
I got really, really scared of DirectX after seeing that putting one
pixel on the screen would take as much code as my whole project does at
the moment (minus the SDL libs), and that includes both the sound and
the screen stuff.
Boohoohoo, why are the designers not nice to me? Mommy!!!
Anyway, I'll try some stuff tomorrow and see if I can really localize
this 'feature'.
I know that in DirectSound you can define multiple points in the buffer
at which the callback will be called, so you can make sure that the
buffer is filled in time. But with SDL I've got no clue how much data
is still left in the playing buffer (and thus how long you can hang
around) when the callback is called.
AFAIK, SDL should behave pretty much as if you were doing this:
while(playing)
{
        a->callback(a->userdata, a->buf, a->bufsize);
        a->convert_some(a);                      /* format conversion, if any */
        write(a->outfd, a->outbuf, a->outsize);  /* blocks until there's room */
}
If this were the case, then I would probably not have a problem. :)
in a dedicated audio thread on a Un*x system with OSS or similar. What
happens is the sound card driver blocks in write() whenever the whole
buffer doesn’t fit between the last written byte and the current play
position in the DMA buffer. write() gets on with the job and returns
as soon as there’s enough room for the data.
In reality, on Win32 there are probably going to be several layers of
buffers between the SDL callback and the sound card: an SDL buffer, a
DirectX buffer, another DirectX buffer, the DMA buffer.
Since they all need to satisfy each other, the spell could break in any
one of these; probably not the DMA one, since it's wrapped by the
drivers and they haven't given me any trouble with other apps. It could
be a bad setup of the DirectSound buffers, or the structure of the SDL
buffers and how the callback is handled.
“Polling” is usually done in the sound card ISR.
I wasn't referring to hardware polling.
What I meant was software polling, i.e. one thread checking the
contents of a shared memory location to see whether it's set to a
value, and if so, doing some stuff, resetting the memory and starting
to poll again.
This should give me enough headroom to generate the data I need in time
for the next callback.
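Something like this is what I have in mind (an SDL 1.2 style sketch;
render_next_buffer() is hypothetical, and a plain volatile flag is not
a real synchronisation primitive, but it shows the mechanism):

#include "SDL.h"
#include "SDL_thread.h"

static volatile int buffer_needed = 0;  /* set by the audio callback */

extern void render_next_buffer(void);   /* the heavy synthesis work */

static int worker(void *unused)
{
        (void)unused;
        for(;;)
        {
                if(buffer_needed)
                {
                        render_next_buffer();  /* refill the spare buffer */
                        buffer_needed = 0;
                }
                SDL_Delay(1);  /* don't burn the whole CPU polling */
        }
        return 0;
}

/* At startup:  SDL_CreateThread(worker, NULL);
   In the audio callback, after copying the precomputed buffer:
        buffer_needed = 1;
*/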
In the OSS API, the
“fragment” size determines how frequently IRQs occur. SDL tries to
get two “fragments” of the size indicated by ‘samples’ in the
AudioSpec, so there's usually (at least) one IRQ every 'samples'
sample frames. What we get is this:
-
The audio callback is called when the previous buffer
has just been queued. Since the DMA buffer accepts only
two buffers’ worth of data, this happens right after
the second last buffer has been played, and one buffer
remains queued for output. That is, when the callback
is entered, there are two buffers between you and the
DAC.
-
The callback’s return deadline is (theoretically) at
most the duration of two buffers ahead. At that point,
the sound card runs out of data, so we have to be in
the driver’s write() callback before then.
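For concreteness: with 'samples' = 512 at 44100 Hz, one buffer lasts
512/44100 ≈ 11.6 ms, so when the callback is entered there are roughly
23 ms of queued audio between you and the DAC, and that is also roughly
your return deadline.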
Yes, yes, I know this.
Anyone got this info?
Read The Source. 
Hmm, let me sleep on this one for a night before I start digging.
Is it changeable? Without recompiling the libs?
Setting the ‘samples’ field of the SDL_AudioSpec you pass to
SDL_OpenAudio() to a sensible value should do the trick.
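For example (a minimal sketch; my_callback is whatever you already pass
in):

#include <string.h>
#include "SDL.h"

extern void my_callback(void *userdata, Uint8 *stream, int len);

int open_audio(void)
{
        SDL_AudioSpec desired, obtained;

        memset(&desired, 0, sizeof(desired));
        desired.freq     = 44100;
        desired.format   = AUDIO_S16SYS;
        desired.channels = 2;
        desired.samples  = 512;  /* frames per buffer; try powers of two
                                    like 256, 512 or 1024 */
        desired.callback = my_callback;
        desired.userdata = NULL;

        if(SDL_OpenAudio(&desired, &obtained) < 0)
                return -1;

        /* obtained.samples tells you what the driver actually gave you. */
        SDL_PauseAudio(0);
        return 0;
}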
This doesn't work (no real effect when changing the buffer size, except
that the buffers get longer or shorter). The skipping persists and
seems only influenced by the amount of work done in the callback
(which makes sense to me, somehow).
(And if it doesn't, Use The Source, Luke.)
I could use an SDLightsaber at the moment.
The resulting problem for me is that, since I cannot seem to put too
much in the callback, I need another buffer, which should be filled by
something that polls whether the callback has been called...
...and I have never written multithreaded stuff.
If you really can’t do all you need in the callback, you’re either
doing it the wrong way, or you simply have more processing to do than
your CPU can handle. (The latter is rather unlikely, unless you have
high quality reverb effects and stuff.)
Yeah, the CPU is an unlikely culprit... That's why I need more docs
about how the writers of SDL intended it to be used. The examples are
rather minimalistic... I was happy to get any sound through it at all.
As to the former, what kind of processing are you doing? I suspect
that you do stuff in ways that makes the CPU load vary a lot between
calls.
Basically adding a lot of sine waves together; nothing really CPU
intensive. It works until about 150 adds per sample, then it starts
skipping. Buffer size doesn't change this, since the calculations are
done for every sample that needs to be shoved into the buffer.
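Roughly what my callback is doing (a simplified sketch; mono 16 bit,
and the numbers are illustrative):

#include <math.h>
#include "SDL.h"

#define NVOICES 150
#define RATE    44100
#define TWO_PI  6.28318530718f

static float phase[NVOICES];
static float freq[NVOICES];  /* per-voice frequency in Hz */

void my_callback(void *userdata, Uint8 *stream, int len)
{
        Sint16 *out = (Sint16 *)stream;
        int frames = len / (int)sizeof(Sint16);  /* mono, 16 bit */
        int i, v;

        (void)userdata;
        for(i = 0; i < frames; ++i)
        {
                float s = 0.0f;
                for(v = 0; v < NVOICES; ++v)
                {
                        s += sinf(phase[v]);
                        phase[v] += TWO_PI * freq[v] / RATE;
                        if(phase[v] > TWO_PI)
                                phase[v] -= TWO_PI;
                }
                out[i] = (Sint16)(s * (32767.0f / NVOICES));
        }
}

(Replacing sinf() with a wavetable or a recurrence would probably cut
the per-voice cost a lot, if it turns out to matter.)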
Uncompressing a frame of compressed audio every now and then would be
a typical example. If you can’t or won’t redesign your code to do the
job incrementally, you need to add some buffering (increase the
latency) and do the occasional heavy job in another thread.
Threading and adding a layer of buffering was exactly my plan.
Ok. (Of course, VST plugins play by the exact same rules in this
regard. Keep those process*() execution times as constant as
possible…)
AFAIK VST hosts give you a good deal of CPU time; otherwise most
plugins wouldn't work at all.
Since there are several software sequencers that handle multiple
streams of audio to multiple buffers/channels, and that do a lot more
calculations than I'm trying to do, I figure it shouldn't be too hard
to achieve both low latency and use most of my CPU time for stuff...
I don't see the problem, unless, again, the callback stuff is set up to
work just in time...
Well, there’s no other way to do it… Both VST and SDL audio (and DX,
CoreAudio, JACK, ASIO, EASI, PortAudio and various other APIs) are
based on cooperative multitasking, using callbacks to drive the
“units”.
Yes, I realise this. That's why I figure I'm doing stuff in the wrong
place. It seems that the callback doesn't expect you to do a lot of
stuff right there.
A callback keeps the CPU until it returns, stalling the
whole host/driver/whatever. The only way to get things to run
smoothly at high CPU loads is to ensure that all units (plugins,
clients etc…) consume the same amount of CPU time for every buffer,
as far as possible.
This is totally the case here; actually, I'm doing identical
calculations for every sample.
(Actually, this is true for thread + IPC based plugin systems as well.
They just use a different plugin API, at the expense of making low
latency operation practically impossible without an RTOS, and very
hard to do reliably even with an RTOS. That’s why all major standards
use the callback model in one form or another.)
It seems the best way to work with buffers; I agree.
[…]
So I have basically two questions:
1. Where should I generate the data that goes into the buffer?
In the callback. (Otherwise, you’ll need extra buffering and
thread safe communication between the audio engine and the
callback. That can make sense in some cases, but definitely not
when you want interactive low latency sound effects.)
Multithreading seems like the only solution at the moment. Damn.
Well, if you can add substantial buffering between the offending code
and the audio callback, there’s no major problem. Depending on what
you’re doing, it may well be the easiest way to get the job done
reliably.
Now I need to find out what they want from me to make threads. Probably
gonna need other stuff as well, like threads sharing memory with other
threads. Or can I just use globals??
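Maybe something like this? (An SDL 1.2 sketch; the names are made up,
and anything both threads touch needs guarding):

#include "SDL.h"
#include "SDL_thread.h"

static SDL_mutex *lock;
static float shared_params[16];  /* e.g. partial amplitudes */

static int worker(void *unused)
{
        (void)unused;
        for(;;)
        {
                SDL_LockMutex(lock);
                /* ... read/update shared_params ... */
                SDL_UnlockMutex(lock);
                SDL_Delay(10);
        }
        return 0;
}

void start_worker(void)
{
        lock = SDL_CreateMutex();
        SDL_CreateThread(worker, NULL);
}

/* Just don't take this lock inside the audio callback itself;
   that path is what the lock-free approach is for. */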
Doing it in a VST plugin could be more problematic, though… Not sure
what your average host thinks about you creating your own worker
threads, but I think it’s doable in most cases.
It should work, but I'm not planning to do it threaded in the VST
version. The host should provide the necessary CPU time when calling...
[…]
I would recommend using some sort of lock-free
communication between the main thread and the audio callback
context instead, to avoid increasing the callback “scheduling”
jitter.
This is the problem: I'm not sure how to set up the communication
between the callback (which needs to tell me that it has finished
filling its buffer) and another thread or something, which will fill
its own buffer to be used at the next callback.
Actually, that would be the wrong way to do it. The whole point is to
make it asynchronous, allowing multiple buffers to be on the way from
the worker thread to the callback. That’s what allows the worker
thread to generate a few buffers at a time, every now and then, so
you can avoid breaking up and “smoothing” your DSP code.
Since I'm sure there is enough CPU time left, I only need one buffer to
keep the callback function satisfied. Jitter is not my problem; all I
need to do is calculate the stuff in a place that is less critical,
somehow, and copy it to the buffer during the callback function call.
Anyway, try a lock-free ring of buffers (FIFO) between the thread and
the callback.
I have a general purpose, portable, single reader-single writer,
lock-free FIFO over at Mixed Downloads - sfifo. It’s
using a simple read()/write() style API (that is, copying), but you
could use it for passing pointers to buffers, buffer indices or
something, or just implement your own buffer ring, using the same
“safe order” approach to updating read and write indices. (Very
simple. Just make sure the compiler doesn’t reorder stuff too much
when optimizing…)
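A stripped-down version of the idea (power-of-two size, one reader, one
writer; on aggressive compilers you'd want explicit memory barriers,
but the ordering rule is the point):

#define RING_SIZE 8192  /* must be a power of two */

typedef struct
{
        unsigned char buf[RING_SIZE];
        volatile unsigned read;   /* owned by the reader */
        volatile unsigned write;  /* owned by the writer */
} ring_t;

static unsigned ring_used(ring_t *r)  { return r->write - r->read; }
static unsigned ring_space(ring_t *r) { return RING_SIZE - ring_used(r); }

/* Writer side: copy the data first, THEN publish it by bumping 'write'. */
static int ring_write(ring_t *r, const unsigned char *data, unsigned len)
{
        unsigned i, w = r->write;
        if(len > ring_space(r))
                return -1;
        for(i = 0; i < len; ++i)
                r->buf[(w + i) & (RING_SIZE - 1)] = data[i];
        r->write = w + len;  /* publish */
        return 0;
}

/* Reader side: copy the data first, THEN release it by bumping 'read'. */
static int ring_read(ring_t *r, unsigned char *data, unsigned len)
{
        unsigned i, rd = r->read;
        if(len > ring_used(r))
                return -1;
        for(i = 0; i < len; ++i)
                data[i] = r->buf[(rd + i) & (RING_SIZE - 1)];
        r->read = rd + len;  /* release */
        return 0;
}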
Ok, I'll have a look. Thanks!
Anyway, thanks for the help; it triggered some ideas which I will need
to work out a bit.
In the meantime I would like to call/CRY out for some more
documentation!! :))
Well, there isn’t much to document, really. (Except perhaps for
clarifying the exact relation between the AudioSpec ‘samples’ field
and the resulting latency - if you’re actually supposed to expect any
specific relation at all.)
What I'd like to know is precisely how the callback was intended to be
used, and how it relates to the buffers 'upstream'.
This is standard real time audio stuff, and although there isn’t much
documentation on this specifically (you’d have to read up on DSP,
general RT programming and stuff instead), it's rather intuitive once
you grasp the fundamental concepts.
I do get the fundamentals; I did something similar in DirectSound a
long while ago.
The hard part is probably to let go of the “best average performance”
way of thinking that applies to programming in general. (That’s what
most well known algorithms are based on.) In RT programming, it’s all
about worst case performance; it has to be good enough to make the
deadlines at all times.
Yeah, and I'm wondering why exactly it doesn't.
Thanks for going over this.
I definitely got a clearer picture of what I probably need to do.
Gotta get to sleep now.
Greets,
aka.