[ This post has moved to - http://tempuri.org/tempuri.html ]

But we don't have that luxury. We have a primitive Answering Machine Detection (AMD) system that is based primarily on cadence analysis. Although we do have a Goertzel-based fax tone detection algorithm, until now the core algorithm has had no sophisticated DSP-based processing. Then one of our customers in Turkey reported a really bad AMD experience when dialing two particular Turkish numbers. Every call to these two numbers should have been classified as an answering machine; instead, our software declared them human (live). They had every reason to be disgusted. When we analyzed what was going on, we found the following sequence:

1. We placed a call through an SER (SIP router) server at 88.x.y.z to one of the offending numbers, say P (P@88.x.y.z).

2. The initial 100/180 SIP provisional responses came back promptly.

3. Then early media was started by a parallel media server at 88.x.y.z1. The media sounded very much like what one usually hears after a call gets connected: "You have reached Mr. Gordon's residence..."

4. After some time, the 200 OK (call connected) reached our end, followed by a tone and then silence.

5. Our primitive algorithm took the tone for a short "Hello", and the silence that followed made it look like "Hello, ..." followed by a pause, so it declared the call a potential human.
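To make the failure mode concrete, here is a deliberately simplified cadence rule of the kind described in step 5. It is a hypothetical sketch, not our actual AMD logic: the enum, `classify_cadence`, and both thresholds are illustrative assumptions.

```c
/* Hypothetical result of a simple cadence-based AMD decision. */
typedef enum { AMD_UNKNOWN, AMD_HUMAN, AMD_MACHINE } amd_result;

/* Illustrative thresholds only (assumed values, tuned per deployment). */
#define MAX_GREETING_MS 1500 /* a live "Hello" is usually short */
#define MIN_SILENCE_MS   800 /* ...and is followed by a pause   */

/* Classify using the length of the first sound burst after answer and
 * the silence that follows it. A short burst plus a long pause looks
 * like a person saying "Hello?" and waiting; a long burst looks like a
 * recorded greeting. Note that a beep tone also counts as a "burst". */
amd_result classify_cadence(int first_burst_ms, int following_silence_ms)
{
    if (first_burst_ms <= 0)
        return AMD_UNKNOWN;
    if (first_burst_ms <= MAX_GREETING_MS &&
        following_silence_ms >= MIN_SILENCE_MS)
        return AMD_HUMAN;
    if (first_burst_ms > MAX_GREETING_MS)
        return AMD_MACHINE;
    return AMD_UNKNOWN;
}
```

A 400 ms tone followed by two seconds of silence, `classify_cadence(400, 2000)`, comes back as `AMD_HUMAN`: exactly the misclassification we saw, because a pure cadence rule cannot tell a tone from a voice.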

To make our life even tougher, the tone came at several different frequencies, none of which was a standard DTMF tone. Even our Goertzel DFT-based fax detection was tuned to 2100 Hz (the fax answering tone frequency). What we needed was Singular Frequency (SF) tone detection where the frequency is not fixed, and we needed a quick solution, otherwise a big customer might be on its way out. One of these recorded media streams is shown below.

The first challenge was to find a pattern that identifies the tone. We knew we might need to turn to frequency/spectrum analysis to solve this puzzle. We used Adobe Audition's built-in frequency analysis, and the results were enough to get us going:

The peak is clearly there. There are various tone detection algorithms out there, including SETI's, but we decided to roll our own, a decision that would prove useful later. We already had a license for the Intel IPP libraries (http://www.intel.com/cd/software/products/asmo-na/eng/302910.htm), a high-performance signal processing library. Initially we applied an FFT to the whole recorded WAV sample (tone/voice). The results were disappointing, and time was running out fast. We shifted our attention to a windowed FFT with a window size of 1024 samples, the same FFT size shown in Adobe Audition's frequency analysis panel. The FFT size determines the frequency resolution (F.R), i.e. the spacing between adjacent frequency bins:

F.R = sampling rate / FFT size (number of points)
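As a quick worked example, assuming 8 kHz narrowband telephony audio (our assumption; check your codec), a 1024-point FFT gives bins 7.8125 Hz apart, and each window covers 128 ms of audio. The helper names below are hypothetical.

```c
/* Frequency resolution F.R (spacing between FFT bins), in Hz. */
double fft_resolution_hz(double fs, int fft_size)
{
    return fs / (double)fft_size;
}

/* Time span of one analysis window, in milliseconds. */
double window_duration_ms(double fs, int fft_size)
{
    return 1000.0 * (double)fft_size / fs;
}
```

The trade-off is visible here: doubling the FFT size halves the bin spacing but also doubles the time each window covers, so detection reacts more slowly.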

Since we split the whole sample into 1024-sample windows, and the signal is not periodic within a window, spectral leakage kicks in: energy from one frequency smears into neighbouring bins. To reduce this effect, a window function is usually applied to each block before the FFT. Converting the FFT output to power was the obvious next step. Let's jot down the steps we went through:

1. Apply a Hamming window to each 1024-sample block

//1. Hamming window (smooth out the block edges)...
status = ippsWinHamming_32f_I(in_dbl, windowSize);
// Show a message box if status indicates an error
IppErrorMessage("ippsWinHamming_32f_I", status);
if (status < 0)
    return;

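2. Apply a forward FFT of order 10 (i.e. log2(1024))

////////////////////
//Core FFT: real input, packed CCS output (out holds windowSize + 2 floats)
//spec: FFT specification previously initialized for order 10
////////////////////
status = ippsFFTFwd_RToCCS_32f(in_dbl, out, spec, NULL);
IppErrorMessage("ippsFFTFwd_RToCCS_32f", status);
if (status < 0)
    return;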

3. Obtain the magnitude vector from the FFT and convert it to power

///////////////////
//Magnitude of the first windowSize / 2 complex bins
status = ippsMagnitude_32fc((Ipp32fc*)out, power_spectrum, windowSize / 2);
IppErrorMessage("ippsMagnitude_32fc", status);
if (status < 0)
    return;

//Scale the magnitudes by N/2 so they do not depend on the window length
status = ippsDivC_32f_I((Ipp32f)(windowSize / 2), power_spectrum, windowSize / 2);
IppErrorMessage("ippsDivC_32f_I", status);
if (status < 0)
    return;

//We need power ~ sqr(mag.)
status = ippsSqr_32f_I(power_spectrum, windowSize / 2);
IppErrorMessage("ippsSqr_32f_I", status);
if (status < 0)
    return;

We were clueless after this. Should we go for a sophisticated peak-detection algorithm, or was there something simpler in the spirit of KISS? IPP's high-level peak-detection function was tempting enough to try:

IppStatus ippsFindPeaks_32f8u(const Ipp32f* pSrc,
                              Ipp8u* pDstPeaks,
                              int len,
                              int searchSize,
                              int movingAvgSize);

However, looking at the per-window power spectra revealed a much simpler pattern:

a. in a voice window, power is spread over the whole FFT spectrum of the window;

b. in a tone window, power is concentrated in one or two frequency bins (indices into the window's FFT output array), and for our Singular Frequency Tone we can simply concentrate on the bin where the window's maximum power occurs.

Armed with this simple observation, we decided to do the following for each window:

1. Normalize the power spectrum by dividing the power vector by the window's total power (only if the total power is non-zero).

2. Take the maximum normalized power of the window.

3. If this value exceeds a pre-determined threshold (found by trial) for a number of consecutive windows (also found by trial), conclude that a single-frequency (SF) tone is present with a given degree of probability.
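The three steps translate almost directly into code. Below is a minimal sketch in plain C, assuming one power spectrum per window as computed in step 3; the struct, the function names, and the example threshold values in the usage are hypothetical and not part of our original code. Dividing every bin by the total and then taking the maximum gives the same number as dividing only the maximum by the total, so the sketch does the latter.

```c
/* Hypothetical SF tone detector state; both limits are found by trial. */
typedef struct {
    float threshold;     /* minimum normalized peak power, e.g. 0.5 */
    int   required_hits; /* consecutive windows needed above it     */
    int   hits;          /* length of the current run               */
} sf_detector;

/* Return the index of the strongest bin and store its share of the
 * window's total power in *peak_ratio; return -1 for a silent window. */
int normalized_peak(const float *power, int bins, float *peak_ratio)
{
    double total = 0.0;
    int best = 0;
    for (int k = 0; k < bins; k++) {
        total += power[k];
        if (power[k] > power[best])
            best = k;
    }
    if (total <= 0.0) {
        *peak_ratio = 0.0f;
        return -1;
    }
    *peak_ratio = (float)(power[best] / total);
    return best;
}

/* Feed one window's power spectrum. Returns 1 (and the peak bin in
 * *tone_bin) once enough consecutive windows exceed the threshold. */
int sf_detector_feed(sf_detector *d, const float *power, int bins,
                     int *tone_bin)
{
    float ratio;
    int bin = normalized_peak(power, bins, &ratio);

    if (bin >= 0 && ratio >= d->threshold)
        d->hits++;
    else
        d->hits = 0; /* voice or silence breaks the run */

    if (d->hits >= d->required_hits) {
        *tone_bin = bin;
        return 1;
    }
    return 0;
}
```

With a threshold of 0.5 and three required hits, a window whose peak bin holds about 98% of the power counts as a hit and the third consecutive hit declares a tone, while a flat, voice-like spectrum resets the run.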

And you know what? The strategy paid off beautifully. The idea worked consistently over many recorded samples, including single-frequency tones. We can also print or log the tone's frequency using the following formula:

frq := max_power_bin * F.R
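In code the formula is a one-liner once the strongest bin is known. A sketch assuming the same hypothetical 8 kHz sampling rate; `max_power_bin` and `bin_to_hz` are illustrative names, and the power array holds the fft_size / 2 bins from step 3. Because the peak bin only says which F.R-wide slot the tone falls into, the estimate is good to roughly half a bin (about 3.9 Hz here).

```c
/* Index of the bin holding the most power ("max_power_bin"). */
int max_power_bin(const float *power, int bins)
{
    int best = 0;
    for (int k = 1; k < bins; k++)
        if (power[k] > power[best])
            best = k;
    return best;
}

/* frq := max_power_bin * F.R, where F.R = fs / fft_size. */
double bin_to_hz(int bin, double fs, int fft_size)
{
    return bin * (fs / (double)fft_size);
}
```

For example, a 1000 Hz tone sampled at 8 kHz with a 1024-point FFT peaks in bin 128, and `bin_to_hz(128, 8000.0, 1024)` returns exactly 1000.0.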

But all our samples were offline recordings, and we needed to integrate this little piece of DSP logic into our VoIP/media stack, where frames (RTP, 10 or 20 ms of audio each) arrive sequentially and our chosen window size (1024 samples) is not a multiple of the frame length. So some residual-sample handling had to be done. After these integration efforts and a few rounds of testing, we were good to go. I hate working on Sunday nights.
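The residual handling can be as simple as an accumulator that copies each decoded frame into a 1024-sample buffer, hands over a full window whenever it fills up, and carries the leftover samples into the next window. This sketch assumes decoded PCM as float, non-overlapping windows, and hypothetical names (`window_acc`, `window_acc_push`); it is not our production integration code.

```c
#include <string.h>

#define WIN_SIZE 1024 /* analysis window, in samples */

/* Hypothetical accumulator turning RTP-sized frames into windows. */
typedef struct {
    float buf[WIN_SIZE];
    int   fill; /* samples currently buffered */
} window_acc;

typedef void (*window_cb)(const float *window, int n, void *user);

/* Push one decoded frame (e.g. 160 samples = 20 ms at 8 kHz). Calls cb
 * (if not NULL) for every completed window and returns how many windows
 * were completed. Samples beyond a window boundary stay buffered. */
int window_acc_push(window_acc *a, const float *frame, int n,
                    window_cb cb, void *user)
{
    int emitted = 0;
    while (n > 0) {
        int room = WIN_SIZE - a->fill;
        int take = n < room ? n : room;

        memcpy(a->buf + a->fill, frame, (size_t)take * sizeof *frame);
        a->fill += take;
        frame += take;
        n -= take;

        if (a->fill == WIN_SIZE) {
            if (cb)
                cb(a->buf, WIN_SIZE, user);
            a->fill = 0; /* non-overlapping windows */
            emitted++;
        }
    }
    return emitted;
}
```

With 20 ms frames at 8 kHz, the first window completes on the 7th frame, and the remaining 96 samples of that frame are already buffered for the next window. Since the callback receives the accumulator's internal buffer, it should copy the samples before applying the Hamming window in place.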

Happy programming and have some fun.

-Thanks

Deb.

Posted on Friday, November 7, 2008 8:49 PM | .NET Core, Scientific Geekology, lab.geek.com


I am currently working on the same thing: trying to get the frequency from an FFT result, but I am new to DSP.

I already have FFT code that processes each block of 1024 WAV samples.

Can you tell me how to use the FFT result to get the frequency of those 1024 samples?

What does "frq := max_power_bin * F.R" mean?

Thanks,

ZWW