AW: Re: Audio Input: Needed? Wanted? Implementation planned?

Oisin_Mulvihill · March 27, 2001, 9:13am

Hi,

this would definitely be a nice idea, but you’d need serious
DSP (digital signal processing) to get from voice commands to
an SDL event. DSP is also quite computationally intensive. Is
there a GNU speech recognition library?

om

> It just occurred to me that you might like to have an “Audio Trigger”
> event, i.e. that you can set a volume to trigger an SDL event. That would
> be like a voice-activated mike. It’d be very handy for a voice-command
> system, and would seem to fit into the SDL API reasonably well. Voice
> events could be interpreted by application code just like keyboard events
> etc. The callback function would probably have to go and get the buffered
> audio input datastream. You could also use this threshold to determine
> when audio should be recorded to the buffer.
>
> Just an idea.
>
> –
> Terry Hancock
> hancock at earthlink.net

Sascha_Gunther · March 27, 2001, 9:52am

As I said before, I do not think that SDL should handle that. Maybe there
could be an event like SDL_EVENT_MICON, where SDL checks for incoming sound
between a minimum and a maximum frequency with a minimum volume
and pushes this event to the queue, but voice commands… ohhh, I guess this
should be handled above SDL, since this does not seem to be a general input
source. (Maybe one wants to handle audio input for other reasons, such as
simply recording the input for later output or stuff…)
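
A minimal sketch of the check such an event would need, assuming the input arrives as blocks of float samples. The names `dominant_frequency` and `mic_on` are hypothetical and not part of SDL, and the naive DFT is used only to keep the example self-contained; a real implementation would more likely use an FFT.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring DC."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def mic_on(samples, sample_rate, min_hz, max_hz, min_volume):
    """True if the block is loud enough and its strongest component is in band."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < min_volume:
        return False  # too quiet: no event
    return min_hz <= dominant_frequency(samples, sample_rate) <= max_hz
```

An application (or SDL itself) would call `mic_on` on each incoming block and push the event to the queue whenever it returns true.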

I think audio input should be as “simple” as possible, or better, let’s say,
as basic as possible…

Regards,

Sascha


Sam_Hart · March 27, 2001, 3:25pm

On Tue, 27 Mar 2001, you wrote:

> Hi,
>
> this would definitely be a nice idea, but you’d need serious
> DSP (digital signal processing) to get from voice commands to
> an SDL event.

Not necessarily.

I could envision some games where simple “claps” or other loud noises could be
used as controls for a game (perhaps you must clap to the beat, or something…).
Heck, I could use it in another educational game!

If sound input were to be used as an event generator in basic SDL, then this
would be all I would expect it to do.

> DSP is also quite computationally intensive. Is
> there a GNU speech recognition library?

I actually have heard of some, but don’t know of any of them off the top of
my head (as I recall, there was a BSD-licensed one available from some
university somewhere as well…)

–
Sam “Criswell” Hart <@Sam_Hart> AIM, Yahoo!:
Homepage: < http://www.geekcomix.com/snh/ >
PGP Info: < http://www.geekcomix.com/snh/contact/ >
Advogato: < http://advogato.org/person/criswell/ >

David_Olofson · March 27, 2001, 3:34pm

Right; just pure audio I/O. As long as the hardware/driver interface is
there, the rest can be handled by external libs. It’s the “external libs
trying to fight SDL for control of the hardware” scenario that I don’t want
to see. (And that’s what you get if you don’t do audio output AND input in
one place - which must be in SDL, as some silly platforms want to bind audio
to video…)

//David

.- M A I A -------------------------------------------------.
| Multimedia Application Integration Architecture           |
| A Free/Open Source Plugin API for Professional Multimedia |
----------------------> http://www.linuxaudiodev.com/maia -'
.- David Olofson -------------------------------------------.
| Audio Hacker - Open Source Advocate - Singer - Songwriter |
--------------------------------------> david at linuxdj.com -'

On Tuesday 27 March 2001 11:52, Sascha Günther wrote:

> As I said before, I do not think that SDL should handle that. Maybe there
> could be an event like SDL_EVENT_MICON, where SDL checks for incoming sound
> between a minimum and a maximum frequency with a minimum volume
> and pushes this event to the queue, but voice commands… ohhh, I guess this
> should be handled above SDL, since this does not seem to be a general input
> source. (Maybe one wants to handle audio input for other reasons, such as
> simply recording the input for later output or stuff…)
>
> I think audio input should be as “simple” as possible, or better, let’s say,
> as basic as possible…

Oisin_Mulvihill · March 27, 2001, 4:15pm

Hi,

you could go a step further. I don’t know if you’ve
done much DSP (it’s also been a while since I did any),
but you could recognise a set of simple sounds. This
could be achieved by using an FIR (finite impulse
response) or IIR (infinite impulse response) filter
and training it to recognise a simple set of tones/sounds:
drum beat, whistle, etc. There could be more to it
than that; however, as I mentioned earlier, it’s been a
while. It would be an interesting project alright.
Maybe someone out there with more experience in this
could correct my errors or give a little more info about
it.
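
As a rough illustration of the filter idea, here is a sketch of a small FIR filter bank: one crude band-pass filter per target sound, with the winner chosen by output energy. Everything here is an assumption for the sake of the example. The centre frequencies are hand-picked rather than trained, and the function names are hypothetical.

```python
import math


def tone_taps(freq_hz, sample_rate, length=64):
    """FIR taps: a Hann-windowed cosine at freq_hz (a crude band-pass)."""
    taps = []
    for i in range(length):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
        taps.append(window * math.cos(2 * math.pi * freq_hz * i / sample_rate))
    return taps


def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k] (valid part only)."""
    m = len(taps)
    return [sum(taps[k] * samples[n - k] for k in range(m))
            for n in range(m - 1, len(samples))]


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


def classify(samples, sample_rate, tones):
    """Return the label whose band-pass filter passes the most energy."""
    return max(tones, key=lambda label: energy(
        fir_filter(samples, tone_taps(tones[label], sample_rate))))
```

With `tones = {"drum": 200.0, "whistle": 2000.0}`, a block dominated by a 2 kHz tone would be labelled `"whistle"`. Real drum beats and whistles are not pure tones, so this only shows the mechanism.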

om

Terry_Hancock · March 27, 2001, 7:11pm

WHOA! No, I didn’t mean SDL should catch voice commands! I meant
that a “Mic-On” event would be useful for an application that
wanted to use voice commands (there are probably other applications
too, like recording voice notes). (I.e. I agree with what Sascha
Gunther says below.)

I figured that rather than having to poll the audio stream continuously,
it would be good to set a threshold and do two things on receiving
sound “in range”: 1) trigger an event, and 2) buffer the data so
the application can capture it and process it as a block. This only
requires SDL to check the volume periodically (unless the soundcard
itself can do it, which would be even better).
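
The two steps above can be sketched as a small state machine. This is only an illustration under stated assumptions: `AudioTrigger`, the event names `"mic_on"`/`"mic_off"`, and the `hold_chunks` parameter are all hypothetical and not part of any SDL API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AudioTrigger:
    """Hypothetical volume-triggered recorder; not an SDL interface."""
    threshold: float       # RMS level that counts as sound "in range"
    hold_chunks: int = 2   # quiet chunks tolerated before the block is closed
    _buffer: list = field(default_factory=list)
    _quiet: int = 0
    _active: bool = False

    def feed(self, chunk):
        """Feed one chunk of float samples; return a list of (event, data)."""
        events = []
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0
        loud = rms >= self.threshold
        if loud and not self._active:
            # Step 1: trigger an event when the level first crosses the threshold.
            self._active = True
            self._buffer = []
            events.append(("mic_on", None))
        if self._active:
            # Step 2: buffer the data so it can be processed as one block.
            self._buffer.extend(chunk)
            self._quiet = 0 if loud else self._quiet + 1
            if self._quiet >= self.hold_chunks:
                events.append(("mic_off", self._buffer))
                self._active = False
                self._quiet = 0
                self._buffer = []
        return events
```

The application only sees two events per utterance and receives the whole recording with the second one, so it never has to poll the stream itself.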

Sascha Günther wrote:

> As I said before, I do not think that SDL should handle that. Maybe there
> could be an event like SDL_EVENT_MICON, where SDL checks for incoming sound
> between a minimum and a maximum frequency with a minimum volume
> and pushes this event to the queue, but voice commands… ohhh, I guess this
> should be handled above SDL, since this does not seem to be a general input
> source. (Maybe one wants to handle audio input for other reasons, such as
> simply recording the input for later output or stuff…)
>
> I think audio input should be as “simple” as possible, or better, let’s say,
> as basic as possible…
>
> On Tuesday, 27 March 2001 11:13, you wrote:
>
>> this would definitely be a nice idea, but you’d need serious
>> DSP (digital signal processing) to get from voice commands to
>> an SDL event. DSP is also quite computationally intensive. Is
>> there a GNU speech recognition library?

There’s some open-source voice command software; one package
is called “ears”, but I haven’t checked it out. Mind you,
I don’t have any voice command applications that I’m working
on, though I have thought about it some as a no-hands solution
for inputting macro commands in a teleoperation system (where
both hands are normally, or at least often, busy). That is not a really
harsh environment for voice commands: it would have a very
limited vocabulary and could be trained for individual operators.
Not anything like a speech-to-text system.

–
Terry Hancock
@Terry_Hancock

David_Olofson · March 27, 2001, 6:08pm

For few bands and high signal quality requirements, a bunch of filters can
indeed be a viable solution, but in this case, I think a small FFT performed
on adequately preprocessed data could be more efficient and much more
powerful.

Low-pass filter and downsample to, say, 8 kHz, then perform a few FFTs per
second with a window size of 32…256 samples. (I don’t know what you’d need;
a bigger window handles low frequencies better.) What you get is a stream of
frequency spectra that you can run through a weighted best fit algorithm of
some kind to find the best match in a database of commands.
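
A minimal sketch of that front end, assuming float samples and an input rate that is an integer multiple of 8 kHz. The function names are hypothetical, the moving average stands in for a proper low-pass filter, and the O(N²) DFT stands in for an FFT so that the example needs nothing beyond the standard library.

```python
import cmath
import math


def decimate(samples, factor):
    """Crude low-pass (mean over `factor` samples) plus downsampling."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


def magnitude_spectrum(frame):
    """Hann-windowed DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectra(samples, in_rate, out_rate=8000, window=64, hop=None):
    """Yield one magnitude spectrum per analysis window of the decimated signal."""
    hop = hop or window  # default: non-overlapping windows
    low = decimate(samples, in_rate // out_rate)
    for start in range(0, len(low) - window + 1, hop):
        yield magnitude_spectrum(low[start:start + window])
```

At 8 kHz, a `hop` of 2000 samples would give four spectra per second, which is in the range of "a few FFTs per second".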

The last step is probably the most complicated part, as “best fit” for this
kind of data is rather hard to define in a way that’s useful to a computer…
Voice pitch (low and strong frequency components) and speed must be allowed
to vary a lot, while timbre (relative formant frequency distribution) is very
sensitive, as that’s what makes it possible to tell vowels apart.
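
One very simple form of weighted best fit is a weighted squared distance between level-normalised spectra, sketched below. This is an assumption-laden illustration: the names are hypothetical, the templates are made-up numbers, and it does nothing about the pitch and speed variation described above, which is the hard part.

```python
import math


def normalise(spectrum):
    """Scale a magnitude spectrum to unit length so loudness is ignored."""
    norm = math.sqrt(sum(v * v for v in spectrum)) or 1.0
    return [v / norm for v in spectrum]


def weighted_distance(a, b, weights):
    """Weighted sum of squared per-bin differences."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def best_match(spectrum, templates, weights=None):
    """Return (label, distance) of the closest template in the database."""
    probe = normalise(spectrum)
    weights = weights or [1.0] * len(probe)
    scored = ((weighted_distance(probe, normalise(t), weights), label)
              for label, t in templates.items())
    dist, label = min(scored)
    return label, dist


# Hypothetical command database: one reference spectrum per command.
TEMPLATES = {"stop": [0, 1, 4, 1, 0], "go": [4, 1, 0, 0, 1]}
```

The `weights` list is where bins that carry timbre could be emphasised over bins that mostly carry pitch.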

//David

On Tuesday 27 March 2001 18:15, Oisin Mulvihill wrote:

> Hi,
>
> you could go a step further. I don’t know if you’ve
> done much DSP (it’s also been a while since I did any),
> but you could recognise a set of simple sounds. This
> could be achieved by using an FIR (finite impulse
> response) or IIR (infinite impulse response) filter
> and training it to recognise a simple set of tones/sounds:
> drum beat, whistle, etc. There could be more to it
> than that; however, as I mentioned earlier, it’s been a
> while. It would be an interesting project alright.
> Maybe someone out there with more experience in this
> could correct my errors or give a little more info about
> it.

Oisin_Mulvihill · March 28, 2001, 9:23am

Creating an SDL_Voice lib would be how I’d do it. I’d be
interested in looking further into this. I’ve been
looking for an opportunity to use the DSP stuff I learned
in college for some time. Perhaps there are others out
there who would be interested in working on this type
of project as well?

om

----- Original Message -----

From: Terry Hancock [mailto:hancock@earthlink.net]
Sent: Tuesday, March 27, 2001 8:48 PM
To: sdl at lokigames.com
Subject: Re: AW: [SDL] Re: Audio Input: Needed? Wanted? Implementation
planned?

Oisin Mulvihill wrote:

> you could go a step further. I don’t know if you’ve
> done much DSP (it’s also been a while since I did any),
> but you could recognise a set of simple sounds. This
> could be achieved by using an FIR (finite impulse
> response) or IIR (infinite impulse response) filter
> and training it to recognise a simple set of tones/sounds:
> drum beat, whistle, etc. There could be more to it
> than that; however, as I mentioned earlier, it’s been a
> while. It would be an interesting project alright.
> Maybe someone out there with more experience in this
> could correct my errors or give a little more info about
> it.

Surely, though, that’s separate enough to take out of SDL –
it ought to go into an SDL_Voice lib or something: The
main thing is, once you have the chunk of audio data you
are no longer hardware dependent and you don’t have
hardware acceleration of any kind. At that point you’re
just doing DSP on the data, so it’s separable from SDL.

That’s also desirable, because although you talk about
very simple voice recognition, someone is unquestionably
going to want more, so the voice features will creep
over time to include more and more. Better to keep it
separate now, I would think – then there’s no reason
why it shouldn’t grow.

Given the “simple core + add ons” strategy of SDL so far,
that sounds like the right approach. IMHO this is like
vector graphics primitives with respect to video –
SDL doesn’t do them, but you can use SGE or some other
SDL-based library to do them.

–
Terry Hancock
hancock at earthlink.net