This paper received the 2nd Best Paper Award.

Decorrelation of audio signals is a critical step for spatial sound reproduction on multichannel configurations. Correlated signals yield a focused phantom source between the reproduction loudspeakers and may produce undesirable comb-filtering artifacts when the signal reaches the listener with small phase differences. Decorrelation techniques reduce such artifacts and extend the spatial auditory image by randomizing the phase of a signal while minimizing the spectral coloration. This paper proposes a method to optimize the decorrelation properties of a sparse noise sequence, called velvet noise, to generate short sparse FIR decorrelation filters. The sparsity allows a highly efficient time-domain convolution. The listening test results demonstrate that the proposed optimization method can yield effective and colorless decorrelation filters. In comparison to a white noise sequence, the filters obtained using the proposed method better preserve the spectrum of a signal and produce good-quality broadband decorrelation while using 76% fewer operations for the convolution. Satisfactory results can be achieved with an even lower impulse density, which decreases the computational cost by 88%.
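To illustrate why sparsity makes the time-domain convolution cheap, the following is a minimal Python sketch, not the authors' implementation. The impulse positions, gains, and test signal are hypothetical values chosen for illustration; the function names `sparse_convolve` and `dense_convolve` are likewise assumptions. The sparse version performs one multiply-add per impulse and input sample, whereas the dense version performs one per filter tap and input sample.

```python
def sparse_convolve(signal, positions, gains):
    """Convolve `signal` with a sparse FIR filter given as (position, gain) pairs."""
    out = [0.0] * (len(signal) + max(positions))
    for pos, gain in zip(positions, gains):
        # Each impulse adds one delayed, scaled copy of the input.
        for n, x in enumerate(signal):
            out[n + pos] += gain * x
    return out


def dense_convolve(signal, taps):
    """Reference direct-form convolution with a dense FIR filter."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for k, h in enumerate(taps):
        for n, x in enumerate(signal):
            out[n + k] += h * x
    return out


# Hypothetical example: three impulses at integer sample positions.
positions = [0, 3, 7]
gains = [1.0, -0.5, 0.25]
signal = [1.0, 2.0, 3.0]

# Dense equivalent of the same filter (zeros between the impulses).
taps = [0.0] * (max(positions) + 1)
for pos, gain in zip(positions, gains):
    taps[pos] = gain

sparse_out = sparse_convolve(signal, positions, gains)
dense_out = dense_convolve(signal, taps)
```

Both functions return the same output; in this toy case the sparse filter needs 3 multiplications per input sample instead of 8.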
Figure 1: Decorrelator sequences in the time domain: white noise $\mathrm{WN}$, exponential velvet noise $\mathrm{EVN}30$, and two optimized velvet-noise sequences $\mathrm{OVN}30$ and $\mathrm{OVN}15$. Positive impulses are indicated by $\bullet$ and negative gains by $\circ$ (except for $\mathrm{WN}$).
A central challenge in decorrelation is the coloration caused by a non-flat magnitude response of the decorrelator. The continuous formulation plays a critical role in the optimization process as it allows continuous modification of both impulse location and impulse gain.
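The idea of a continuous formulation can be sketched as follows, under stated assumptions. If each impulse has a real-valued location $\tau_k$ (in samples) and gain $g_k$, the frequency response $H(\omega) = \sum_k g_k e^{-j\omega\tau_k}$ is a smooth function of both, so it can be evaluated for fractional locations. The flatness measure below (RMS deviation of the magnitude in dB) is an illustrative assumption and not necessarily the objective used in the paper; all function names are hypothetical.

```python
import cmath
import math


def frequency_response(locations, gains, omega):
    """H(omega) = sum_k g_k * exp(-j * omega * tau_k), tau_k in (fractional) samples."""
    return sum(g * cmath.exp(-1j * omega * tau) for tau, g in zip(locations, gains))


def magnitude_deviation_db(locations, gains, num_points=256):
    """RMS deviation of the magnitude response (in dB) from its mean level.

    Illustrative flatness measure (an assumption, not the paper's objective).
    """
    levels = []
    for i in range(1, num_points + 1):
        omega = math.pi * i / num_points  # frequency grid over (0, pi]
        mag = abs(frequency_response(locations, gains, omega))
        levels.append(20.0 * math.log10(max(mag, 1e-12)))
    mean = sum(levels) / len(levels)
    return math.sqrt(sum((v - mean) ** 2 for v in levels) / len(levels))


# A single impulse at a fractional location has a flat magnitude response
# and a linear phase whose slope is set by the location.
single = magnitude_deviation_db([4.3], [1.0])
phase_at_0p1 = cmath.phase(frequency_response([4.3], [1.0], 0.1))

# Adding a second impulse introduces comb-like ripple, i.e. coloration.
pair = magnitude_deviation_db([0.0, 4.3], [1.0, 0.6])
```

In this sketch `single` is numerically zero, while `pair` is clearly positive, which is the kind of magnitude ripple an optimizer would try to reduce by moving impulses and adjusting their gains.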
**Figure 2:** Single pulse optimization with corresponding phase slope of the frequency response.

The optimization problem is constrained, non-linear, and non-convex, so the optimal solution, i.e., the global minimum, is generally difficult to find. However, local minima can be attained by various gradient descent algorithms. Here we employ a variant of the interior-point method. The initial point is given by a randomly generated EVN.
**Figure 3:** Complete optimization procedure of a velvet noise sequence and the corresponding magnitude response.

We provide 32 velvet noise sequences optimized for a sampling frequency of 48 kHz, both as MATLAB MAT files and as WAV files.
32 Decorrelators with 30 & 15 pulses as .mat
32 Decorrelators with 30 pulses as .wav
32 Decorrelators with 15 pulses as .wav
The first listening test evaluated how much the decorrelation filters color the input signal. The input signal was convolved with a single decorrelation filter, and the participants rated its difference from the unprocessed signal. In MUSHRA terminology, the unprocessed mono signal was the reference, and the input signal processed with a lowpass filter having a 3.5 kHz cutoff frequency was the anchor. The resulting mono signals were reproduced on both headphone channels. The main coloration was expected to be caused by the change in timbre and smearing of transients.
First set of decorrelators with drum signal
First set of decorrelators with guitar signal
First set of decorrelators with female vocalist
First set of decorrelators with speech signal
The second listening test evaluated the effectiveness of the decorrelators in extending the auditory source width and the overall spatial quality. The input signal was convolved with a decorrelation filter for each channel (left and right) and the participants were asked to rate the perceived width, localization at the center, and overall quality. In this test, no ideal reference could be defined, so the unprocessed mono signal was provided only for guidance. The lowpass filtered mono signal was given as the anchor. The resulting stereo signal was reproduced on the left and right headphone channels.
Stereo decorrelators with drum signal
Stereo decorrelators with guitar signal
Stereo decorrelators with female vocalist
Stereo decorrelators with speech signal
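The stereo processing described for the second test, one decorrelation filter per channel, can be sketched in Python as follows. This is a toy illustration under assumptions: the two three-impulse filters and the white-noise input are hypothetical and are not taken from the provided filter sets, and the peak normalized cross-correlation used here is only one possible objective indicator. The listening test itself relied on participant ratings, not on this number.

```python
import math
import random


def sparse_filter(signal, impulses, length):
    """Apply a sparse FIR filter given as {sample position: gain}."""
    out = [0.0] * length
    for pos, gain in impulses.items():
        for n, x in enumerate(signal):
            if n + pos < length:
                out[n + pos] += gain * x
    return out


def max_cross_correlation(a, b, max_lag):
    """Largest |normalized cross-correlation| of equal-length a, b over +/- max_lag."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            acc = sum(a[n] * b[n + lag] for n in range(len(a) - lag))
        else:
            acc = sum(a[n - lag] * b[n] for n in range(len(b) + lag))
        best = max(best, abs(acc) / norm)
    return best


rng = random.Random(0)  # fixed seed for a repeatable toy input
mono = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

# Hypothetical sparse filters, one per channel.
left_filter = {0: 0.8, 11: -0.5, 37: 0.3}
right_filter = {3: -0.7, 17: 0.6, 29: 0.4}

n_out = len(mono) + 37  # input length plus the latest impulse position
left = sparse_filter(mono, left_filter, n_out)
right = sparse_filter(mono, right_filter, n_out)

rho_mono = max_cross_correlation(mono, mono, 40)     # identical channels
rho_stereo = max_cross_correlation(left, right, 40)  # filtered channels
```

Identical channels give a peak correlation of 1, while the two filtered channels give a noticeably lower value. With only three impulses per filter the reduction is modest; the filters provided above use 15 or 30 impulses.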
Trackswitch.js was developed by Nils Werner, Stefan Balke, Fabian-Robert Stöter, Meinard Müller and Bernd Edler.