Sparse Decomposition of Audio Signals Using a Perceptual Measure of Distortion. Application to Lossy Audio Coding

Authors: Toumi, I., Derrien, O.
Publication Date: December 2015
Journal: International Conference on Digital Audio Effects (DAFx-15) (Trondheim, Norway, Nov 30 - Dec 3, 2015)

Tags: Analysis Synthesis, Audio Coding, Time-Frequency Masking

Abstract

State-of-the-art audio codecs use time-frequency transforms derived from cosine bases, followed by a quantization stage. The quantization steps are set according to perceptual considerations. In the last decade, several studies have applied adaptive sparse time-frequency transforms to audio coding, e.g. on unions of cosine bases using a Matching-Pursuit-derived algorithm. This was shown to significantly improve coding efficiency. We propose another approach based on a variational algorithm, i.e. the optimization of a cost function that takes into account both a perceptual distortion measure derived from a hearing model and a sparsity constraint, which favors coding efficiency. In this early version, we show that, using a coding scheme without perceptual control of quantization, our method outperforms a codec from the literature with the same quantization scheme. In future work, a more sophisticated quantization scheme would probably allow our method to challenge standard codecs such as AAC.
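
Illustrative sketch (not taken from the paper): the abstract does not state the cost function or the solver, so the following only shows the general shape of a variational sparse decomposition. It assumes a weighted squared error as a stand-in for the perceptual distortion measure, an l1 penalty as the sparsity term, and plain iterative soft thresholding (ISTA) as the optimizer. The names `dct_basis`, `sparse_decompose`, `cost`, the weight vector `w` and the parameter `lam` are all hypothetical.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II basis; each column is one cosine atom (assumed dictionary)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0] /= np.sqrt(2.0)
    return c.T


def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def cost(x, D, w, lam, a):
    """0.5 * ||w * (x - D a)||^2 + lam * ||a||_1 (assumed form of the cost)."""
    r = w * (x - D @ a)
    return 0.5 * float(r @ r) + lam * float(np.abs(a).sum())


def sparse_decompose(x, D, w, lam, n_iter=200):
    """Minimize the assumed cost with ISTA.

    w is a hypothetical per-sample weight vector standing in for a
    hearing-model-based distortion measure; it is not the paper's measure.
    """
    A = w[:, None] * D                 # weighted dictionary
    b = w * x                          # weighted signal
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ a - b)       # gradient of the distortion term
        a = soft_threshold(a - grad / L, lam / L)
    return a


if __name__ == "__main__":
    n = 64
    D = dct_basis(n)
    x = 2.0 * D[:, 3] + 0.5 * D[:, 10]      # synthetic two-atom signal
    w = np.ones(n)                          # flat weights, for illustration only
    a = sparse_decompose(x, D, w, lam=0.1)
    print("nonzero coefficients:", np.flatnonzero(np.abs(a) > 1e-8))
```

Raising `lam` in this sketch trades distortion for fewer nonzero coefficients, which is the sense in which a sparsity term favors coding efficiency.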