The following article has been re-posted, as is, from the now-defunct SubjectiveMachine blog. First published Oct 9, 2011.
Here are the beginnings of a (slightly) more principled look at how my prototype implementation of a scale-independent “frequency” transform, called PWT, differs from the standard windowed approach, the short-time Fourier transform (STFT).
Below are frequency analyses of a synthesized signal using both techniques. The signal consists of three successive waves: a simple sine wave, a sawtooth wave, and a square wave. Each wave descends logarithmically in pitch from 10,000 Hz to 10 Hz over the course of half a second or so. The sample rate is 44,100 Hz. For STFT, I’ve generated output using a range of window (block) sizes: 128, 512, and 2048 samples. For PWT there is no window parameter, so only one output is given.
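For anyone wanting to reproduce a comparable test signal, here is a minimal sketch in Python (standard library only). It is not the original generator. The exact sweep length (0.5 s), the exponential sweep law (a straight line on a log-frequency axis), and the naive, non-band-limited sawtooth and square shapes are all assumptions, as are the names `sweep_phase` and `make_test_signal`.

```python
import math

SAMPLE_RATE = 44_100   # Hz, as stated in the text
F_START = 10_000.0     # Hz, start of each sweep
F_END = 10.0           # Hz, end of each sweep
SWEEP_SECONDS = 0.5    # assumption: the text only says "half a second or so"


def sweep_phase(n_samples, sample_rate=SAMPLE_RATE):
    """Phase, in cycles, of an exponential sweep from F_START to F_END."""
    duration = n_samples / sample_rate
    k = math.log(F_END / F_START)  # negative, because the pitch descends
    phases = []
    for n in range(n_samples):
        t = n / sample_rate
        # Closed-form integral of f(t) = F_START * exp(k * t / duration).
        phases.append(F_START * duration / k * (math.exp(k * t / duration) - 1.0))
    return phases


def make_test_signal():
    """Sine, then sawtooth, then square, each following the same sweep."""
    n = int(SWEEP_SECONDS * SAMPLE_RATE)
    phases = sweep_phase(n)
    sine = [math.sin(2.0 * math.pi * p) for p in phases]
    # Naive shapes: these will alias near the top of the sweep.
    saw = [2.0 * (p % 1.0) - 1.0 for p in phases]
    square = [1.0 if (p % 1.0) < 0.5 else -1.0 for p in phases]
    return sine + saw + square


signal = make_test_signal()
print(len(signal), "samples,", len(signal) / SAMPLE_RATE, "seconds")
```

The phase is accumulated analytically rather than by summing per-sample frequencies, which keeps the three segments exactly aligned in pitch.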
Below are the same outputs on a logarithmic scale, to give a slightly different view.
One of the most notable features in these plots is the lack of harmonics in the PWT analysis of the sawtooth wave. This is because PWT does not analyse the signal in terms of constituent sine waves. Rather, it looks for periodic signals in a more general sense, and in that sense there is little difference between the sine and sawtooth waveforms. (The square wave is slightly harder to deal with in this respect, and this is one of the things I am looking at addressing.)
This property of PWT might make it particularly useful for identifying tonal content in music, or formants in human speech (see the plot in the previous post), in a way that is robust to the more subtle timbral variations between different speakers and instruments. Maybe 🙂
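To illustrate the general idea of looking for periodicity rather than sine components, here is a small Python sketch. To be clear, this is not PWT: it swaps in a plain autocorrelation period estimator as a stand-in, and the function name `autocorr_period`, the lag search range, and the fixed 100-sample period are all hypothetical choices made for the example. The point is only that a detector of this kind sees a sine and a sawtooth of the same period as the same thing.

```python
import math


def autocorr_period(samples, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation.

    A stand-in periodicity detector, NOT the PWT algorithm.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Sum of x[n] * x[n + lag] over all n where both samples exist.
        score = sum(a * b for a, b in zip(samples, samples[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


PERIOD = 100   # samples per cycle (441 Hz at a 44,100 Hz sample rate)
N = 1000       # ten full cycles

sine = [math.sin(2.0 * math.pi * n / PERIOD) for n in range(N)]
saw = [2.0 * ((n % PERIOD) / PERIOD) - 1.0 for n in range(N)]

# min_lag skips the trivial peak around lag 0.
print("sine period:", autocorr_period(sine, 20, 300))
print("saw period: ", autocorr_period(saw, 20, 300))
```

Both calls report a period of 100 samples, with no separate entries for the sawtooth's harmonics. A Fourier-based analysis of the same sawtooth would instead show energy at every multiple of the fundamental.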
Notice also that the frequency appears to be identified much more precisely and consistently with PWT (with the debatable exceptions of the square wave and the very high frequencies). In contrast, the precision of STFT is at its best where the window size is matched to the wavelength (i.e., large window sizes give more precise results for low frequencies and vice versa), although moderate “bleeding” is present in all of the examples given here.
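The window-size trade-off can be put into rough numbers. The sketch below (Python, standard library only; the helper names `bin_spacing_hz`, `window_ms`, and `cycles_in_window` are mine) computes, for each window size used above, the spacing between STFT frequency bins, the time span of one window, and how many cycles of the sweep's two extremes fit inside it. It assumes a plain DFT per window with no zero-padding.

```python
SAMPLE_RATE = 44_100  # Hz, as stated in the text


def bin_spacing_hz(window_size, sample_rate=SAMPLE_RATE):
    """Distance between adjacent DFT bins for a window of this length."""
    return sample_rate / window_size


def window_ms(window_size, sample_rate=SAMPLE_RATE):
    """Time span covered by one window, in milliseconds."""
    return 1000.0 * window_size / sample_rate


def cycles_in_window(freq_hz, window_size, sample_rate=SAMPLE_RATE):
    """How many cycles of a tone at freq_hz fit inside one window."""
    return freq_hz * window_size / sample_rate


for size in (128, 512, 2048):
    print(
        f"window {size:4d}: "
        f"bin spacing {bin_spacing_hz(size):6.1f} Hz, "
        f"span {window_ms(size):5.2f} ms, "
        f"cycles at 10 Hz {cycles_in_window(10.0, size):.3f}, "
        f"cycles at 10 kHz {cycles_in_window(10_000.0, size):.1f}"
    )
```

The bin spacing works out to roughly 344.5 Hz, 86.1 Hz, and 21.5 Hz for the three window sizes. Even the 2048-sample window holds less than half a cycle of a 10 Hz tone, while the 128-sample window cannot separate frequencies closer than a few hundred hertz, which is consistent with each window size looking best in a different part of the sweep.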