An Efficient Time–Frequency Method for Synthesizing Noisy Sounds With Short Transients and Narrow Spectral Components


Authors: Marelli D., Aramaki M., Kronland-Martinet R., Verron C.
Publication Date: May 2012
Journal: IEEE Transactions on Audio, Speech and Language Processing (vol. 20(4), pp. 1400-1408, 2012)



The inverse fast Fourier transform (IFFT) method is a time–frequency technique that was proposed to reduce the complexity of additive sound synthesis in real-time applications. However, its application is limited by an inherent tradeoff between time and frequency resolution, both of which are determined by the number of frequencies used for time–frequency processing. In previous work, the authors proposed a frequency-refining technique that overcomes the frequency resolution limitation, so that any time and frequency resolution can be achieved using a small number of frequencies. In this correspondence we extend that work by proposing a time-refining technique that overcomes the time resolution limitation for a given number of frequencies. Additionally, we propose an alternative to the frequency-refining technique of our previous work, which requires about half the computations. Combining these two results makes it possible to achieve any time and frequency resolution for any given number of frequencies. Using this property, we find the number of frequencies that minimizes the overall complexity, considering two application scenarios: offline sound design and online real-time synthesis. This results in a major complexity reduction compared with the design proposed in our previous work.
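The tradeoff mentioned in the abstract can be illustrated with a few lines of arithmetic. The sketch below is not taken from the paper: it assumes a sampling rate of 44.1 kHz and a frame whose length equals the number of frequencies N, and the function name `resolution` is hypothetical. Under those assumptions the spacing between frequencies is fs/N and one frame lasts N/fs, so refining one resolution coarsens the other.

```python
def resolution(num_freqs, sample_rate=44100.0):
    """Return (frequency spacing in Hz, frame duration in ms).

    Assumes the frame length equals num_freqs and a 44.1 kHz
    sampling rate; both are illustrative assumptions.
    """
    freq_spacing_hz = sample_rate / num_freqs
    frame_duration_ms = 1000.0 * num_freqs / sample_rate
    return freq_spacing_hz, frame_duration_ms


# Illustrative sizes, matching the subband counts used in the sound examples.
table = {n: resolution(n) for n in (32, 64, 128, 256, 512, 1024)}

for n, (spacing, duration) in table.items():
    print(f"N={n:5d}  spacing={spacing:8.2f} Hz  frame={duration:6.2f} ms")
```

The product of the two quantities is constant, which is one way to see why a single choice of N cannot by itself give both short transients and narrow spectral components.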

Glass impact sound

Synthesis using:

Other sound examples

Synthesis using time-frequency domain (TFD) and time-domain (TD) methods

Aerodynamic sounds


Solid sounds

Liquid sounds

The synthesis parameters have been obtained with different methods (spectral analysis of real sounds, physical models, heuristics, …).

Sound examples corresponding to Section V.A

Synthesis with time-varying amplitudes applied in the time domain (TD) and in the subband domain (SD), using 32, 64, 128, 256, 512, and 1024 subbands.
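To give a rough feel for why the number of subbands matters for short transients, the sketch below compares a sample-rate amplitude envelope with one that is updated only once per block of M samples. This is a crude stand-in for subband-domain processing, not the paper's filterbank: the block-hold model, the 44.1 kHz sampling rate, the 2 ms decay, and the helper `block_held_envelope` are all assumptions made for illustration.

```python
import numpy as np


def block_held_envelope(envelope, block_size):
    """Hold the envelope value at the start of each block for the whole block."""
    held = np.repeat(envelope[::block_size], block_size)
    return held[: len(envelope)]


sample_rate = 44100  # assumed sampling rate in Hz
t = np.arange(4096) / sample_rate
envelope = np.exp(-t / 0.002)  # hypothetical transient with a 2 ms decay

# Time-domain reference: the envelope is applied to white noise sample by sample.
rng = np.random.default_rng(0)
noise = rng.standard_normal(len(t))
td_signal = envelope * noise

# Largest deviation of the block-held envelope from the sample-rate envelope.
errors = {}
for m in (32, 64, 128, 256, 512, 1024):
    held = block_held_envelope(envelope, m)
    errors[m] = float(np.max(np.abs(held - envelope)))

for m, err in errors.items():
    print(f"M={m:5d}  max envelope error={err:.3f}")
```

In this simplified model the deviation grows with M, since a longer block spans a larger part of the decay.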

Glass sound

Fire sound