Masquage Auditif Temps-Fréquence: Mesures psychoacoustiques et application à l’analyse-synthèse des sons

Author: Necciari T.
Publication Date: October 2010 (PhD thesis, University Aix-Marseille I, 2010)

Abstract

Auditory Time-Frequency Masking: Psychoacoustical measures and application to the analysis-synthesis of sound signals

Many audio applications, such as sound analysis-synthesis tools or audio codecs, call for specific signal representations that enable the analysis, processing, and synthesis of non-stationary signals. Most of them rely on time-frequency (TF) representations, such as the Gabor and wavelet transforms, which decompose any real-world sound into a set of elementary functions (or “atoms”) that are well localized in the TF domain. To adapt these representations to human auditory perception, the present study investigated auditory masking in the TF domain.
Masking has been investigated extensively with simultaneous (frequency masking) and non-simultaneous (temporal masking) presentation of masker and target. Only a few studies have examined masking when masker and target are separated in both time and frequency. Because those studies used stimuli that are not maximally compact in the TF plane (i.e., they were long and/or broadband), their results are not suitable for predicting masking effects between TF atoms. In this study, we measured auditory TF masking with masker and target signals having minimum spread in the TF plane, namely sinusoids with a Gaussian envelope (referred to as Gaussians), with an equivalent rectangular duration (ERD) of 1.7 ms and an equivalent rectangular bandwidth (ERB) of 600 Hz. The masker had a carrier frequency of 4 kHz and a level of 60 dB SL. Masker and target were separated in frequency, in time, or in both. The results of the TF conditions provide the TF spread of masking for stimuli that are maximally concentrated in the TF domain. The results of the simultaneous and non-simultaneous conditions showed that a simple superposition of the frequency and temporal masking functions does not accurately represent the measured TF masking function for Gaussian maskers. Two additional experiments examined the effects of masker level and masker frequency in simultaneous conditions. Decreasing the masker level from 60 to 30 dB SL reversed the asymmetry of the masking patterns and narrowed the frequency spread of masking. When expressed on an ERB scale, the frequency spread of masking at 0.75 kHz was similar to that obtained at 4 kHz, which is consistent with the constant-Q frequency analysis performed by the human auditory system.
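A minimal Python sketch of the Gaussian stimuli, assuming the envelope exp(-π(t/Γ)²) with Γ = 1.7 ms, and assuming that ERD and ERB are measured on the amplitude envelope and on the magnitude spectrum (area divided by peak value). The sampling rate, the 20 ms support and the function names are illustrative choices, not taken from the thesis. Under these assumptions the time spread and the frequency spread of the atom are reciprocal (ERD × ERB = 1), which reproduces the pairing of 1.7 ms with roughly 600 Hz quoted in the French abstract.

```python
import numpy as np

FS = 44100.0    # sampling rate in Hz (assumed; not given in the abstract)
F0 = 4000.0     # carrier frequency of the masker, 4 kHz
GAMMA = 1.7e-3  # Gaussian time constant in s (assumed equal to the 1.7 ms ERD)


def gaussian_atom(fs, f0, gamma, duration=0.02):
    """Return (t, envelope, atom): a sinusoid at f0 with envelope exp(-pi (t/gamma)^2)."""
    n = int(round(duration * fs))
    t = (np.arange(n) - n // 2) / fs          # time axis centred on t = 0
    env = np.exp(-np.pi * (t / gamma) ** 2)  # Gaussian amplitude envelope, peak 1 at t = 0
    return t, env, env * np.sin(2 * np.pi * f0 * t)


def equivalent_rectangular_duration(env, fs):
    """Area under the amplitude envelope divided by its peak value (s)."""
    a = np.abs(env)
    return np.sum(a) / fs / np.max(a)


def equivalent_rectangular_bandwidth(x, fs, nfft=2 ** 16):
    """Area under the one-sided magnitude spectrum divided by its peak value (Hz)."""
    spec = np.abs(np.fft.rfft(x, nfft))  # zero-padded FFT for a finely sampled spectrum
    return np.sum(spec) * (fs / nfft) / np.max(spec)


if __name__ == "__main__":
    t, env, atom = gaussian_atom(FS, F0, GAMMA)
    erd = equivalent_rectangular_duration(env, FS)
    erb = equivalent_rectangular_bandwidth(atom, FS)
    print(f"ERD = {erd * 1e3:.2f} ms, ERB = {erb:.0f} Hz, ERD x ERB = {erd * erb:.3f}")
```

Run as a script, it should report an ERD of about 1.70 ms and an ERB of about 588 Hz (that is, 1/Γ). If ERD and ERB were instead defined on the squared envelope and the power spectrum, the same Gaussian would give a product of 1/2, so the definition should be checked against the thesis before reusing these numbers.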
Finally, a first attempt was made to implement the collected masking data in a signal-processing algorithm that removes perceptually irrelevant atoms from the TF representations of audio signals. Potential applications of this approach include audio codecs and sound analysis-synthesis tools.
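One way to apply masking data to a TF representation is sketched below: each coefficient is compared with a masked threshold derived from its neighbours, and coefficients falling below that threshold are set to zero. This is an illustrative sketch, not the algorithm of the thesis. The kernel format, the max rule for combining several maskers, the absolute threshold and all function names are assumptions, and any kernel values used with it are hypothetical placeholders rather than the measured masking data.

```python
import numpy as np


def masking_threshold(levels_db, kernel_db):
    """Masked threshold (dB) at every atom of a (frequency x time) grid.

    kernel_db maps a (dk, dn) offset in frequency bins / time frames to the
    threshold, relative to the masker level, that a masker atom produces at
    that offset. Contributions of several maskers are combined by taking the
    maximum (a simplifying assumption; masking additivity is not modelled).
    """
    K, N = levels_db.shape
    thr = np.full((K, N), -np.inf)
    for (dk, dn), rel_db in kernel_db.items():
        if (dk, dn) == (0, 0) or abs(dk) >= K or abs(dn) >= N:
            continue  # an atom does not mask itself; skip offsets outside the grid
        # A masker at (k, n) raises the threshold at (k + dk, n + dn).
        src = (slice(max(0, -dk), K - max(0, dk)), slice(max(0, -dn), N - max(0, dn)))
        dst = (slice(max(0, dk), K - max(0, -dk)), slice(max(0, dn), N - max(0, -dn)))
        thr[dst] = np.maximum(thr[dst], levels_db[src] + rel_db)
    return thr


def prune_masked_atoms(coeffs, kernel_db, abs_threshold_db=0.0):
    """Zero the TF coefficients judged inaudible; return (pruned coeffs, keep mask).

    abs_threshold_db is a hypothetical stand-in for the absolute threshold of
    hearing, expressed in the same arbitrary dB reference as 20*log10|coeffs|.
    """
    mag = np.abs(coeffs)
    with np.errstate(divide="ignore"):
        levels_db = 20.0 * np.log10(mag)  # -inf for zero coefficients
    thr = masking_threshold(levels_db, kernel_db)
    keep = (levels_db >= thr) & (levels_db >= abs_threshold_db)
    return np.where(keep, coeffs, 0), keep
```

In practice `coeffs` would be Gabor (STFT) coefficients of a sound, and `kernel_db` would be filled from measured TF masking data converted to the bin and frame spacing of the transform; resynthesis from the pruned coefficients then yields the processed signal.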

Abstract (French version, translated)

Many audio applications, such as analysis-synthesis tools and audio coders, require linear signal representations suited to non-stationary signals. Typically, these representations are of the "Gabor" or "wavelet" type. They make it possible to decompose any signal into a sum of elementary functions (or "atoms") that are well localized in the time-frequency (TF) plane. With the aim of adapting these representations to human auditory perception, this work studies auditory masking in the TF plane.
In the literature, masking has been studied extensively in the frequency and time domains. Few studies have addressed masking in the TF plane. Moreover, all of these studies used long-duration and/or broadband stimuli, whose energy is therefore not maximally concentrated in the TF plane. Consequently, their results do not allow masking effects between TF atoms to be predicted. In this thesis, masking was therefore measured in the TF plane with stimuli (masker and target) having maximal TF localization: sinusoids modulated by a short Gaussian window (ERD = 1.7 ms) with compact frequency support (ERB = 600 Hz). The masker frequency was fixed at 4 kHz and its level at 60 dB SL. Masker and target were separated in frequency, in time, or in both time and frequency. The results for the TF conditions provide an estimate of the TF spread of masking for a single atom. The results for the frequency and time conditions showed that a linear combination of the frequency and temporal masking functions does not accurately represent TF masking for a single atom. Two additional experiments investigated the effects of the level and frequency of the Gaussian masker on the frequency masking pattern. Decreasing the masker level from 60 to 30 dB SL reversed the asymmetry of the masking patterns and narrowed the spectral spread of masking, in agreement with the literature. Comparing, on an ERB scale, the patterns measured at 0.75 and 4 kHz revealed a similar spectral spread of masking at both frequencies. This result is consistent with the constant-Q frequency analysis performed by the auditory system.
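The comparison of masking patterns at 0.75 and 4 kHz on an ERB scale can be sketched by mapping frequencies onto the ERB-number scale. The code below assumes the Glasberg and Moore (1990) formulas, ERB_N = 24.7(4.37F + 1) and ERB-number = 21.4 log10(4.37F + 1) with F in kHz; the abstract does not say which ERB formula was used, and the helper names are hypothetical.

```python
import math


def erb_hz(f_hz: float) -> float:
    """ERB_N (Hz) of the auditory filter centred at f_hz (Glasberg & Moore, 1990; assumed)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_number(f_hz: float) -> float:
    """Position of f_hz on the ERB-number scale (in Cams)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(e: float) -> float:
    """Inverse of erb_number(): frequency (Hz) at ERB-number e."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37 * 1000.0


def separation_in_erbs(f_masker: float, f_target: float) -> float:
    """Signed masker-target separation expressed in ERB-number units."""
    return erb_number(f_target) - erb_number(f_masker)


if __name__ == "__main__":
    for fm in (750.0, 4000.0):
        # Frequency of a target lying one ERB-number unit above the masker
        f_up = erb_number_to_hz(erb_number(fm) + 1.0)
        print(f"masker {fm:.0f} Hz: ERB_N = {erb_hz(fm):.1f} Hz, "
              f"Q = f/ERB_N = {fm / erb_hz(fm):.2f}, +1 ERB -> {f_up:.0f} Hz")
```

With this formula, f/ERB_N is about 7.1 at 0.75 kHz and about 8.8 at 4 kHz, so the filters are only approximately constant-Q over this range. Expressing masker-target separations as ERB-number differences (`separation_in_erbs`) rather than in hertz is what allows the patterns measured at the two masker frequencies to be overlaid.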
The thesis concludes with an attempt to implement the psychoacoustic data in a signal-processing tool that removes inaudible atoms from the TF representations of sound signals. Potential applications of such an approach include analysis-synthesis tools and audio coders.