The acoustic front-end of hands-free communication devices introduces a variety of distortions to the linear echo path between the loudspeaker and the microphone. While the amplifiers may introduce a memory-less non-linearity, mechanical vibrations transmitted from the loudspeaker to the microphone via the housing of the device introduce non-linearities with memory, which are much harder to compensate. These distortions significantly limit the performance of linear Acoustic Echo Cancellation (AEC) algorithms. While a wide range of Residual Echo Suppressor (RES) techniques already exists for individual use cases, our contribution specifically aims at a low-resource implementation that is also real-time capable. The proposed approach is based on a small Recurrent Neural Network (RNN) which adds memory to the residual echo suppressor, enabling it to compensate for both types of non-linear distortion. We evaluate the performance of our system in terms of Echo Return Loss Enhancement (ERLE), Signal-to-Distortion Ratio (SDR) and Word Error Rate (WER), obtained in realistic double-talk situations. Further, we compare the postfilter against a state-of-the-art implementation. Finally, we analyze the numerical complexity of the overall system.
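For orientation, ERLE is commonly defined as the ratio, in dB, of the echo power at the microphone to the power of the residual echo after cancellation and suppression. The following minimal Python sketch illustrates that common definition only; it is not taken from the evaluation code of this work. The function name `erle_db`, the regularization constant `eps`, and the toy signals are assumptions made for illustration.

```python
import math

def erle_db(mic_echo, residual, eps=1e-12):
    """Echo Return Loss Enhancement in dB (common textbook definition).

    mic_echo : echo-only microphone samples (no near-end speech)
    residual : samples of the remaining echo after AEC/RES processing
    eps      : small constant (assumption) to avoid division by zero
    """
    if len(mic_echo) == 0 or len(mic_echo) != len(residual):
        raise ValueError("signals must be non-empty and of equal length")
    # Mean power of the echo before and after processing
    p_mic = sum(x * x for x in mic_echo) / len(mic_echo)
    p_res = sum(x * x for x in residual) / len(residual)
    return 10.0 * math.log10((p_mic + eps) / (p_res + eps))

# Toy signals (hypothetical): the residual is the echo scaled by 0.1 in
# amplitude, i.e. a 100-fold power reduction, which corresponds to 20 dB.
mic_echo = [math.sin(0.1 * n) for n in range(1000)]
residual = [0.1 * x for x in mic_echo]
print(round(erle_db(mic_echo, residual), 2))
```

In practice ERLE is usually tracked over short frames rather than over a whole recording, so that convergence and double-talk behavior become visible; the single global value above is a simplification.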