Application of time-frequency and time-scale methods to the analysis, synthesis and transformation of natural sounds.
Authors: Kronland-Martinet R., Grossmann A.
Publication Date: October 1991
Journal: Representations of Musical Signals (Invited paper, C. Roads, G. De Poli, and A. Piccialli, eds., pp. 45-85, MIT Press)
One of the critical problems in the use of digital synthesis of sound, whether in real time or deferred time, is the establishment of a correspondence between synthetic and natural sounds. Since 1957 the digital synthesis of sound has been successfully used in the domains of speech and music; it became clear, however, that one of the main problems would be to relate it to natural sound. The first synthetic sounds were obtained with calculations of samples given by simple mathematical models without direct reference to real sounds. However, it is musically interesting to be able to synthesize sounds that imitate or refer to natural sounds. These sounds would have the richness and distinctiveness of natural sounds and could still be manipulated as is done in all systems of synthesis.

To approach such a problem, it is necessary to bring two complementary aspects into play: the analysis and the resynthesis of signals (natural sound waves). This leads to the setting up of relationships between the physical (or psychoacoustic) parameters extracted from the analysis and the parameters of synthesis corresponding to a mathematical algorithm. The analysis aspect should take into account significant parameters such as frequency, time envelopes, microvariations (accidental noise, random or regular modulations), and the distribution of partials. But it should also encompass data reduction resulting from the characteristics of auditory perception. For instance, the subjective notion of timbre cannot always be modeled with the help of the Fourier transform of the signal alone. In the case of instruments with formants, it is useful to separate the contributions of the resonances of the physical system (the modes of the piano, resonant modes of the voice, and so on) from the effects of the system of excitation (the struck or plucked string, vibrating vocal cords, and the like). The synthesis aspect consists of the creation of digitally and musically efficient algorithms.
However, the parameters that determine the production of sound only rarely come from existing natural sounds. There are nevertheless cases where synthesis gives convincing results (for example, with trumpet or voice). Some methods, such as additive or subtractive synthesis, frequency modulation, or waveshaping, can give adequate results, particularly if they are completed by adding microvariations or combined with other methods.
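As an illustration of the simplest of these methods, additive synthesis sums sinusoidal partials under time envelopes, and adding microvariations (here a small periodic frequency modulation) is what keeps the result from sounding static. The sketch below is our own minimal example; the envelope shape, partial amplitudes, and vibrato parameters are illustrative choices, not values from the text.

```python
import numpy as np

def additive_synth(freqs, amps, duration, sr=44100,
                   vibrato_cents=5.0, vibrato_hz=5.0):
    """Sum sinusoidal partials; each partial gets a shared
    attack/decay time envelope and a small periodic frequency
    microvariation (vibrato), a few cents deep."""
    t = np.arange(int(duration * sr)) / sr
    env = np.minimum(t / 0.05, 1.0) * np.exp(-2.0 * t)  # fast attack, slow decay
    out = np.zeros_like(t)
    for f, a in zip(freqs, amps):
        # periodic deviation of the instantaneous frequency, in cents
        dev = 2.0 ** (vibrato_cents * np.sin(2 * np.pi * vibrato_hz * t) / 1200.0)
        phase = 2 * np.pi * np.cumsum(f * dev) / sr
        out += a * env * np.sin(phase)
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out

# three harmonic partials of a 220 Hz tone, one second long
y = additive_synth([220.0, 440.0, 660.0], [1.0, 0.5, 0.25], duration=1.0)
```

In a real analysis-synthesis setting the frequencies, amplitudes, and microvariations would of course come from the analysis stage rather than being fixed by hand.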
The work we present is an approach to the problem of extracting parameters for synthesis and sound modifications based on the use of analysis and synthesis methods that combine time and frequency information. In this framework we deal with digital synthesis and analysis of signals by parametric and nonparametric methods. We stress in particular the exploration of acoustic applications of a new method of signal decomposition, the wavelet transform. This method, which is strictly speaking “time-scale” rather than “time-frequency,” has turned out to be very fruitful, especially in the area of analysis-synthesis.

When one speaks of analysis, one thinks in general about a mathematical representation, as faithful as possible, of a physical phenomenon described in mathematical form. The parameters appearing in the representation must consequently be related in a straightforward way to physical parameters that represent the real world. In the case of audible signals, the “real world” is not limited to the phenomena of sound production and propagation but also includes a biological receptor of the greatest importance: our ear. Although many studies in psychoacoustics have contributed to our understanding of the auditory system, it is nevertheless true that the only criterion for deciding on the auditory relevance of a physical parameter is still the ear. Using this criterion, Jean-Claude Risset has developed a powerful technique, analysis by synthesis, which consists of refining and characterizing the parameters of a method of synthesis on the basis of its psychoacoustical effect. These results have been quite conclusive, especially in the analysis of the timbre of the trumpet, pointing out the importance of the temporal aspect associated with the evolution of the spectral components. Methods of digital synthesis attempt to simulate the sometimes rapid evolutions of sound through the manipulation of parameters that should, if possible, have psychoacoustical relevance.
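The wavelet transform mentioned above correlates the signal with translated and dilated copies of a single analyzing wavelet, so that each scale probes a different frequency band with a window whose width follows the scale. A minimal numerical sketch, using a Morlet-type wavelet and direct convolution (the wavelet parameter, scale grid, and test signal are our own illustrative assumptions):

```python
import numpy as np

def morlet(t, scale, omega0=6.0):
    """Morlet-type analyzing wavelet dilated to `scale`
    (the small admissibility correction term is omitted)."""
    x = t / scale
    return np.exp(1j * omega0 * x - x**2 / 2.0) / np.sqrt(scale)

def cwt(signal, scales, sr):
    """Wavelet transform: correlate the signal with translated
    and dilated copies of the analyzing wavelet."""
    n = len(signal)
    t = (np.arange(n) - n // 2) / sr
    coeffs = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        w = morlet(t, s)
        # correlation = convolution with the time-reversed conjugate
        coeffs[i] = np.convolve(signal, np.conj(w[::-1]), mode="same") / sr
    return coeffs

sr = 1000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 50.0 * t)        # 50 Hz test tone
freqs = np.array([25.0, 50.0, 100.0])     # frequencies to probe
scales = 6.0 / (2 * np.pi * freqs)        # Morlet scale <-> frequency relation
coeffs = cwt(sig, scales, sr)
energy = np.mean(np.abs(coeffs), axis=1)  # largest at the matching 50 Hz scale
```

The modulus of the coefficients concentrates on the scale tuned to the tone's frequency, which is what makes the transform usable for the "blind" analysis discussed below.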
Relating those parameters to parameters coming out of analysis requires time-frequency analysis methods, so that the time evolutions of spectral properties of the analyzed signal can be described. Time-frequency methods can be divided into two types: parametric methods and nonparametric methods. Methods of the first kind consist of the determination of parameters in a specific model of sound production. Therefore they require some a priori knowledge about the signal being analyzed. In this chapter we are mostly concerned with “blind” analysis and so with nonparametric methods, supplemented when necessary by parametric methods after a “precharacterization” phase. However, in some situations (signals with formants) parametric methods are very useful. This will be illustrated by the example of cross-synthesis of two natural sounds. In cross-synthesis the characteristics of one source sound are used to drive a system whose response is based on another source sound.
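Cross-synthesis as just described, using one sound to drive a system whose response comes from another, can be sketched with a simple frame-by-frame source-filter scheme: whiten the driver's spectrum, then recolour it with the other sound's spectral envelope. The smoothed-magnitude envelope estimate, frame length, and test signals below are our own illustrative assumptions, not the parametric method the chapter goes on to use.

```python
import numpy as np

def spectral_envelope(frame, smooth_bins=8):
    """Crude spectral envelope: magnitude spectrum smoothed
    by a moving average across frequency bins."""
    mag = np.abs(np.fft.rfft(frame))
    kernel = np.ones(smooth_bins) / smooth_bins
    return np.convolve(mag, kernel, mode="same")

def cross_synthesize(driver, resonator, frame_len=1024):
    """Frame by frame, flatten the driver's spectrum and
    recolour it with the resonator's spectral envelope."""
    n = min(len(driver), len(resonator)) // frame_len * frame_len
    win = np.hanning(frame_len)
    out = np.zeros(n)
    for start in range(0, n, frame_len):
        d = driver[start:start + frame_len] * win
        r = resonator[start:start + frame_len] * win
        spec = np.fft.rfft(d)
        env_d = np.maximum(spectral_envelope(d), 1e-9)  # avoid division by zero
        env_r = spectral_envelope(r)
        out[start:start + frame_len] = np.fft.irfft(spec / env_d * env_r)
    return out

sr = 8000
t = np.arange(sr) / sr
driver = np.sin(2 * np.pi * 110.0 * t)                    # harmonic "excitation"
resonator = np.random.default_rng(0).standard_normal(sr)  # source of the "response"
hybrid = cross_synthesize(driver, resonator)
```

A production version would use overlapping windows and a proper envelope model (for instance a parametric, formant-based one, in the spirit of the chapter); the point here is only the driver/response factorization.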