Extract the speech of a specified target speaker from a multi-person conversation.
A unified speaker verification framework for both single- and multi-talker speech is proposed, which pays selective auditory attention to the target speaker and jointly optimizes a speaker attention module and a speaker representation module via multi-task learning.
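As a rough illustration of such joint optimization, the sketch below (PyTorch) attends over mixture frames with an enrollment-embedding query and trains the resulting embedding with a combined classification and similarity objective. The module names, dimensions, and loss weighting are hypothetical placeholders, not the paper's architecture.

```python
# A minimal sketch of joint attention + representation learning via multi-task
# training; module names and the weight `alpha` are assumptions, not the paper's API.
import torch
import torch.nn as nn

class JointVerificationModel(nn.Module):
    def __init__(self, feat_dim=80, emb_dim=256, num_speakers=1000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, emb_dim)
        # Speaker attention module: the enrollment embedding queries the mixture
        # frames, selecting the target speaker's regions ("selective attention").
        self.attention = nn.MultiheadAttention(emb_dim, num_heads=4, batch_first=True)
        # Speaker representation module: maps attended features to an embedding.
        self.encoder = nn.Sequential(
            nn.Linear(emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
        self.classifier = nn.Linear(emb_dim, num_speakers)

    def forward(self, mixture_feats, enroll_emb):
        x = self.proj(mixture_feats)               # (B, T, emb_dim)
        query = enroll_emb.unsqueeze(1)            # (B, 1, emb_dim)
        attended, _ = self.attention(query, x, x)  # attend to target frames
        emb = self.encoder(attended.squeeze(1))    # utterance-level embedding
        return emb, self.classifier(emb)

def multitask_loss(logits, labels, emb, enroll_emb, alpha=0.5):
    # Joint objective: speaker classification plus embedding similarity.
    ce = nn.functional.cross_entropy(logits, labels)
    sim = 1 - nn.functional.cosine_similarity(emb, enroll_emb).mean()
    return ce + alpha * sim
```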
This work proposes a self-supervised pre-training strategy that exploits the speech-lip synchronization cue for target speaker extraction, allowing abundant unlabeled in-domain data to be leveraged.
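The general flavor of a speech-lip synchronization pretext task can be sketched as a contrastive objective over unlabeled video: time-aligned audio/lip windows should score higher than temporally shifted ones. The margin and shift below are assumptions, not the paper's recipe.

```python
# SyncNet-style synchronization pretext, sketched as a hinge loss between
# aligned and shifted audio/lip embedding pairs; margin and shift are assumed.
import torch
import torch.nn.functional as F

def sync_contrastive_loss(audio_emb, lip_emb, shift=5, margin=0.2):
    """audio_emb, lip_emb: (B, T, D) frame-level embeddings from unlabeled video."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(lip_emb, dim=-1)
    pos = (a * v).sum(-1)                          # aligned similarity, (B, T)
    neg = (a[:, shift:] * v[:, :-shift]).sum(-1)   # misaligned similarity
    # Encourage aligned pairs to score higher than shifted ones.
    return F.relu(margin + neg.mean() - pos.mean())
```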
This paper designs a speaker localizer driven by the target speaker's embedding to extract spatial features, including the direction-of-arrival (DOA) of the target speaker and the beamforming output, and proposes an end-to-end localized target speaker extraction framework based purely on speech cues, called L-SpEx.
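For intuition on how a DOA estimate feeds a beamforming output, here is a classical far-field delay-and-sum sketch over a linear array. L-SpEx learns its localizer and extractor end-to-end, so this closed-form fragment, with assumed array geometry and sample rate, is only illustrative.

```python
# Simplified far-field delay-and-sum beamformer driven by a DOA estimate.
# Geometry, sample rate, and the DOA value itself are assumptions.
import numpy as np

def delay_and_sum(stft, doa_deg, mic_positions, fs=16000, c=343.0, n_fft=512):
    """stft: (M, F, T) multi-channel STFT; mic_positions: (M,) positions in meters."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)                  # (F,)
    # Per-microphone delays for a plane wave arriving from the given direction.
    delays = mic_positions * np.cos(np.deg2rad(doa_deg)) / c    # (M,)
    # The steering vector aligns phases so the target direction adds coherently.
    steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])  # (M, F)
    return (steering[:, :, None] * stft).mean(axis=0)           # (F, T) output
```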
Experimental results show that the proposed loss function reduces over-suppression and improves the word error rate of speech recognition on both clean and noisy two-speaker mixtures, without harming the reconstructed speech quality.
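One way to make the over-suppression idea concrete is to pair a standard SI-SDR term with an asymmetric frame-energy penalty that fires only when the estimate removes target energy. The formulation below is an illustrative stand-in, not the loss proposed in the paper.

```python
# SI-SDR reconstruction loss plus an assumed asymmetric over-suppression
# penalty on frames where the estimate carries less energy than the target.
import torch

def si_sdr(est, ref, eps=1e-8):
    ref = ref - ref.mean(dim=-1, keepdim=True)
    est = est - est.mean(dim=-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    return 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps))

def loss_with_over_suppression(est, ref, beta=0.1):
    # Frame energies via sliding windows (512 samples, hop 256; assumed framing).
    frame_est = est.unfold(-1, 512, 256).pow(2).sum(-1)
    frame_ref = ref.unfold(-1, 512, 256).pow(2).sum(-1)
    # Penalize only the direction where target speech was suppressed.
    over_supp = torch.relu(frame_ref - frame_est).mean()
    return -si_sdr(est, ref).mean() + beta * over_supp
```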
A joint speaker extraction and visual embedding inpainting framework is proposed to explore their mutual benefits; experimental results show that the proposed method outperforms the baseline in terms of signal quality, perceptual quality, and intelligibility.
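A minimal sketch of such joint training: randomly mask visual embedding frames to mimic occluded lips, reconstruct them with an inpainting head, and optimize extraction and inpainting losses together. The `extractor` and `inpainter` modules and the loss weight are hypothetical placeholders.

```python
# Joint extraction + visual inpainting training step; module names and the
# 0.5 loss weight are assumptions for illustration, not the paper's setup.
import torch
import torch.nn as nn

def joint_step(extractor, inpainter, mixture, visual_emb, target, mask_p=0.3):
    # Randomly drop visual frames to mimic missing/occluded lip information.
    mask = (torch.rand(visual_emb.shape[:2], device=visual_emb.device) > mask_p).float()
    masked = visual_emb * mask.unsqueeze(-1)
    recovered = inpainter(masked)              # inpaint the visual stream
    est = extractor(mixture, recovered)        # extract with the repaired cues
    extraction_loss = nn.functional.l1_loss(est, target)
    inpaint_loss = nn.functional.mse_loss(recovered, visual_emb)
    return extraction_loss + 0.5 * inpaint_loss
```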
An improved implementation of guided source separation (GSS) is described that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide a 300x speed-up over CPU-based inference.
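The core of the speed-up is batching work that a CPU implementation would loop over. For example, mask-weighted spatial covariance matrices for every frequency bin can be computed in one GPU einsum, as in the simplified fragment below, which is not the actual GSS code.

```python
# Batched covariance estimation over all frequency bins at once, replacing a
# per-bin CPU loop with a single GPU einsum; a simplified fragment only.
import torch

def batched_covariances(stft, mask, eps=1e-8):
    """stft: (F, M, T) complex multi-channel STFT; mask: (F, T) per-bin source mask."""
    weighted = stft * mask[:, None, :]                    # apply mask per bin
    # One batched matmul across all F bins: (F, M, T) x (F, T, M) -> (F, M, M)
    cov = torch.einsum("fmt,fnt->fmn", weighted, stft.conj())
    return cov / mask.sum(-1)[:, None, None].clamp(min=eps)
```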