[CM] Snd: matching utterances with a DTW-like algorithm

Ville Koskinen j.v.o.koskinen@gmail.com
Sun, 22 Jul 2007 17:42:39 +0300


Hello,

I've been pondering the possibility of scripting Snd to automatically
detect reader errors from spoken word audio. Usually the reading is
like this:

"At the beginning of July, during a spell of exceptionally hot WOTHER,
... <sigh> <silence>... , during a spell of exceptionally hot weather
..."

That is, most of the time an error in the reading is followed by a
period of silence, and after that the reader repeats (overlaps with)
part of what was already recorded.

I've found out that in speech recognition an algorithm called "dynamic
time warping" is often used to match utterances to templates. With DTW
you should be able to match a spectrogram of "speech" to the
spectrogram of "sspeechh".
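
As a rough illustration of the idea (plain Python, not Snd code), here
is a minimal sketch of the classic DTW recurrence. It uses letters as
stand-ins for spectral frames and a 0/1 mismatch cost; the function
name dtw_distance and the cost function are assumptions made up for
this example.

```python
def dtw_distance(a, b, cost=lambda x, y: 0 if x == y else 1):
    """Accumulated cost of the cheapest monotonic alignment of a and b.

    Each element of one sequence may be matched to one or more
    consecutive elements of the other, which is what lets a stretched
    utterance line up with its template.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = best cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed steps: advance in a only, in b only, or in both.
            step = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = cost(a[i - 1], b[j - 1]) + step
    return acc[n][m]


print(dtw_distance("speech", "sspeechh"))  # 0: the stretched copy aligns exactly
print(dtw_distance("speech", "speed"))     # 2: "c" and "h" have nothing to match
```

With real audio the elements would be spectral frames and the cost a
distance between frames (for example Euclidean), but the recurrence
stays the same.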

I wonder if Snd could be scripted to:
1) create a local spectrogram from the data at the beginning of the
reader's correction (assuming that a long pause is always followed by
a correction)
2) go backwards to the beginnings of previous sentences (this I can
already do) and create local spectrograms from these positions
3) compare the past spectrograms to the one from the correction in
order to detect the overlap.
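
The three steps above could be prototyped outside Snd roughly as
follows. This is only a sketch in Python with NumPy, under several
assumptions: the audio is already loaded as a 1-D array of samples,
the sentence start positions and the correction start are known
sample indices, and the names local_spectrogram, dtw_cost and
find_overlap as well as the frame and hop sizes are hypothetical.

```python
import numpy as np


def local_spectrogram(samples, start, length, frame=256, hop=128):
    """Step 1/2: magnitude spectra of Hann-windowed frames taken from
    samples[start:start + length]; one row per frame."""
    seg = np.asarray(samples[start:start + length], dtype=float)
    if len(seg) < frame:
        raise ValueError("segment is shorter than one analysis frame")
    window = np.hanning(frame)
    frames = [seg[i:i + frame] * window
              for i in range(0, len(seg) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))


def dtw_cost(a, b):
    """Step 3: DTW cost between two spectrograms (rows are frames),
    divided by the combined length so segments stay comparable."""
    n, m = len(a), len(b)
    # Euclidean distance between every frame of a and every frame of b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(acc[i - 1, j],
                                              acc[i, j - 1],
                                              acc[i - 1, j - 1])
    return acc[n, m] / (n + m)


def find_overlap(samples, correction_start, sentence_starts, length):
    """Return the earlier sentence start whose local spectrogram is
    closest (lowest DTW cost) to the one at the correction, plus all
    the costs for inspection."""
    target = local_spectrogram(samples, correction_start, length)
    costs = [dtw_cost(local_spectrogram(samples, s, length), target)
             for s in sentence_starts]
    best = int(np.argmin(costs))
    return sentence_starts[best], costs
```

A call such as find_overlap(samples, correction_start, [s1, s2, s3],
length) returns the most likely overlap position. In practice one
would probably also want a threshold on the winning cost, so that a
pause which is not followed by a correction is not reported as one.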

I wonder if anyone has tried to implement anything like this (or a
subset)? Any suggestions?

Kind Regards,
Ville Koskinen