3260 papers • 126 benchmarks • 313 datasets
Target speech extraction (TSE) isolates the speech of a desired speaker from a mixture of overlapping speakers and noise, typically using an auxiliary cue such as an enrollment utterance of the target speaker.
These leaderboards are used to track progress in speech-extraction-1
Use these libraries to find speech-extraction-1 models and implementations
Strategies for improving the speaker discrimination capability of SpeakerBeam are investigated and it is shown experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures, and outperform TasNet in terms of target speech extraction.
This work proposes two methods for exploiting multi-channel spatial information to extract the target speech: using a target speech adaptation layer in a parallel encoder architecture, and designing a channel decorrelation mechanism to extract the inter-channel differential information and enhance the multi-channel encoder representation.
This paper explores a sequential approach for target speech extraction by combining blind source separation (BSS) with the x-vector based speaker recognition (SR) module, and extends the training of MVAE to evaluate its generalization to unseen speakers.
A densely-connected pyramid complex convolutional network, termed DPCCN, is proposed to improve the robustness of speech separation under complicated conditions and is generalized to target speech extraction (TSE) by integrating a new specially designed speaker encoder.
The experiments in a difficult speech extraction scenario confirm the importance of non-linear spatial filtering, which outperforms an oracle linear spatial filter by 0.24 POLQA score, and demonstrate that joint processing results in a large performance gap of 0.4 POLQA score between network architectures exploiting spectral versus temporal information in addition to spatial information.
It is observed that BSS is relatively robust to emotions, while TSE, which requires identifying and extracting the speech of a target speaker, is much more sensitive to emotions.
This article focuses on recent neural-based approaches in target speech/speaker extraction and presents an in-depth overview of TSE, guiding readers through the different major approaches, emphasizing the similarities among frameworks and discussing potential future directions.
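The frameworks surveyed above share a common pattern: a speaker embedding derived from an enrollment utterance conditions a mask estimator applied to the mixture. The sketch below illustrates that conditioning pattern only; the shapes, random projections, and function names are illustrative placeholders, not any specific published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, D = 64, 100, 16  # freq bins, mixture frames, embedding dim

def speaker_embedding(enroll_spec):
    """Pool a (hypothetical) frame-wise projection into one speaker vector."""
    W = rng.standard_normal((enroll_spec.shape[0], D)) * 0.1
    frames = enroll_spec.T @ W            # (T_enroll, D) frame embeddings
    e = frames.mean(axis=0)               # temporal average pooling
    return e / (np.linalg.norm(e) + 1e-8) # unit-normalized embedding

def extraction_mask(mix_spec, spk_emb):
    """Toy mask estimator conditioned on the speaker embedding."""
    W = rng.standard_normal((mix_spec.shape[0] + D, mix_spec.shape[0])) * 0.1
    # Concatenate the embedding to every mixture frame (a common
    # conditioning choice; adaptation layers are an alternative).
    cond = np.concatenate(
        [mix_spec, np.tile(spk_emb[:, None], (1, mix_spec.shape[1]))], axis=0
    )
    logits = W.T @ cond                   # (F, T)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid mask in [0, 1]

enroll = np.abs(rng.standard_normal((F, 50)))   # enrollment magnitude spec
mixture = np.abs(rng.standard_normal((F, T)))   # mixture magnitude spec

emb = speaker_embedding(enroll)
mask = extraction_mask(mixture, emb)
target_est = mask * mixture                     # masked magnitude estimate

assert target_est.shape == (F, T)
```

With trained networks in place of the random projections, the same structure covers most of the single-channel approaches listed on this page; multi-channel variants additionally feed spatial features into the mask estimator.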
This paper presents the first enrollment interface in which the wearer looks at the target speaker for a few seconds to capture a single short, highly noisy, binaural example of that speaker, enabling target speech hearing that suppresses all interfering speech and noise except the target speaker.