Categorical speech emotion recognition. Emotion categories: Happy (including Excitement), Sad, Neutral, Angry. Modality: speech only. For multimodal emotion recognition, please upload your result to Multimodal Emotion Recognition on IEMOCAP.
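As a minimal sketch of the label scheme above, the snippet below collapses raw IEMOCAP annotations onto these four categories, folding excitement into happy; the raw label strings and helper name are illustrative assumptions, not part of this page.

```python
# Minimal sketch: collapse raw IEMOCAP emotion labels onto the four
# categories used by this benchmark (excitement is merged into happy).
# The raw label strings ("exc", "hap", ...) follow common IEMOCAP
# conventions but are an assumption here.

LABEL_MAP = {
    "hap": "happy",
    "exc": "happy",   # excitement is counted as happy
    "sad": "sad",
    "neu": "neutral",
    "ang": "angry",
}

def map_label(raw_label: str):
    """Return one of the four benchmark categories, or None to discard."""
    return LABEL_MAP.get(raw_label)

if __name__ == "__main__":
    print(map_label("exc"))   # -> "happy"
    print(map_label("fru"))   # -> None (frustration is not used here)
```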
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
The proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
It is shown that lighter machine-learning models trained on a few hand-crafted features can achieve performance comparable to the current deep-learning-based state-of-the-art method for emotion recognition.
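As a hedged illustration of that lightweight recipe, the sketch below computes utterance-level MFCC statistics with librosa and trains a linear SVM with scikit-learn; the specific features and classifier are assumptions, not the paper's exact setup.

```python
# Hedged sketch of the "light model over hand-crafted features" idea:
# utterance-level MFCC statistics fed to a linear SVM. The feature set
# and classifier are illustrative assumptions.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(wav_path: str) -> np.ndarray:
    """Mean and standard deviation of 13 MFCCs over the whole utterance."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # (13, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train(wav_paths, labels):
    """Fit a standardized linear SVM on utterance-level feature vectors."""
    X = np.stack([utterance_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X, labels)
    return clf
```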
A framework that exploits acoustic information in tandem with lexical data, using two bi-directional long short-term memory (BLSTM) networks to obtain hidden representations of the utterance and an attention mechanism, referred to as multi-hop, that is trained to automatically infer the correlation between the modalities.
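A hedged PyTorch sketch of this bimodal setup follows: one BLSTM per modality and a single dot-product attention hop in which the acoustic summary attends over the lexical hidden states (the paper's multi-hop mechanism repeats such hops); all dimensions and the classifier head are illustrative assumptions.

```python
# Hedged sketch: two BLSTMs (acoustic, lexical) plus one attention hop.
# Sizes and the pooling/classifier choices are assumptions for illustration.
import torch
import torch.nn as nn

class BimodalAttentionSER(nn.Module):
    def __init__(self, acoustic_dim=40, lexical_dim=300, hidden=128, n_classes=4):
        super().__init__()
        self.acoustic_blstm = nn.LSTM(acoustic_dim, hidden, batch_first=True,
                                      bidirectional=True)
        self.lexical_blstm = nn.LSTM(lexical_dim, hidden, batch_first=True,
                                     bidirectional=True)
        self.classifier = nn.Linear(4 * hidden, n_classes)

    def forward(self, acoustic, lexical):
        a_states, _ = self.acoustic_blstm(acoustic)   # (B, Ta, 2H)
        l_states, _ = self.lexical_blstm(lexical)     # (B, Tl, 2H)
        a_summary = a_states.mean(dim=1)              # (B, 2H)
        # One attention hop: the acoustic summary scores each lexical state.
        scores = torch.bmm(l_states, a_summary.unsqueeze(2))   # (B, Tl, 1)
        weights = torch.softmax(scores, dim=1)
        l_context = (weights * l_states).sum(dim=1)   # (B, 2H)
        return self.classifier(torch.cat([a_summary, l_context], dim=1))

model = BimodalAttentionSER()
logits = model(torch.randn(2, 100, 40), torch.randn(2, 20, 300))  # (2, 4)
```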
The combined MFCC-Text Convolutional Neural Network model proved to be the most accurate in recognizing emotions in IEMOCAP data.
The proposed deep graph approach to speech emotion recognition achieves state-of-the-art performance with significantly fewer learnable parameters and outperforms standard GCNs and other relevant deep graph architectures, indicating the effectiveness of the approach.
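For orientation, the sketch below frames an utterance as a simple line graph over frame-level features and applies a plain GCN-style propagation step before pooling; the graph construction and layer sizes are assumptions, and the paper's architecture differs from this standard-GCN baseline.

```python
# Hedged sketch: treat each frame-level feature vector as a graph node,
# connect consecutive frames (a line graph), and run one standard GCN
# propagation step before graph-level pooling. All sizes are assumptions.
import torch
import torch.nn as nn

class FrameGraphGCN(nn.Module):
    def __init__(self, feat_dim=40, hidden=64, n_classes=4):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden)
        self.w2 = nn.Linear(hidden, n_classes)

    @staticmethod
    def line_graph_adjacency(n_frames: int) -> torch.Tensor:
        """Symmetrically normalized adjacency (with self-loops) for a frame chain."""
        adj = torch.eye(n_frames)
        idx = torch.arange(n_frames - 1)
        adj[idx, idx + 1] = 1.0
        adj[idx + 1, idx] = 1.0
        d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

    def forward(self, frames):                      # frames: (T, feat_dim)
        a_hat = self.line_graph_adjacency(frames.size(0))
        h = torch.relu(a_hat @ self.w1(frames))     # one GCN propagation step
        h = a_hat @ self.w2(h)
        return h.mean(dim=0)                        # graph-level logits (n_classes,)

logits = FrameGraphGCN()(torch.randn(120, 40))      # utterance with 120 frames
```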
A novel framework based on a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) uses a pre-trained speech emotion recognition model to transfer emotional style during training and at run-time inference, consistently outperforming the baseline framework.
The Audio Spectrogram Transformer is introduced, the first convolution-free, purely attention-based model for audio classification, which achieves new state-of-the-art results on various audio classification benchmarks.
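A hedged usage sketch with the Hugging Face transformers AST classes follows; the AudioSet-finetuned checkpoint name is an assumption, and for SER the classification head would be re-initialized and fine-tuned on the four emotion classes.

```python
# Hedged sketch: run the Audio Spectrogram Transformer via the
# transformers AST classes and attach a fresh 4-class head for SER.
# The checkpoint name is an assumed public AudioSet-finetuned model.
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

ckpt = "MIT/ast-finetuned-audioset-10-10-0.4593"   # assumed checkpoint
extractor = ASTFeatureExtractor.from_pretrained(ckpt)
model = ASTForAudioClassification.from_pretrained(
    ckpt, num_labels=4, ignore_mismatched_sizes=True  # new 4-class head
)

waveform = torch.zeros(16000)                      # stand-in for 1 s of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                # (1, 4) emotion logits
```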
The proposed Speech Emotion Recognition Adaptation Benchmark (SERAB) is a framework for evaluating the performance and generalization capacity of different approaches to utterance-level SER, on which a selection of standard hand-crafted feature sets and state-of-the-art DNN representations is evaluated.
This work proposes a transfer learning method for speech emotion recognition where features extracted from pre-trained wav2vec 2.0 models are modeled using simple neural networks, showing superior performance compared to results in the literature.
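A minimal sketch of that transfer-learning recipe, assuming a frozen facebook/wav2vec2-base encoder, mean pooling over hidden states, and a small MLP head (the paper's exact pooling and downstream network may differ):

```python
# Hedged sketch: frozen wav2vec 2.0 features + a simple neural classifier.
# Checkpoint name, pooling, and head size are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

ckpt = "facebook/wav2vec2-base"                    # assumed checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
encoder = Wav2Vec2Model.from_pretrained(ckpt).eval()
for p in encoder.parameters():
    p.requires_grad = False                        # keep wav2vec 2.0 frozen

head = nn.Sequential(nn.Linear(encoder.config.hidden_size, 128),
                     nn.ReLU(), nn.Linear(128, 4))  # simple downstream NN

def utterance_logits(waveform: torch.Tensor) -> torch.Tensor:
    """Mean-pool frozen wav2vec 2.0 hidden states, then classify."""
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, frames, hidden)
    return head(hidden.mean(dim=1))                    # (1, 4) emotion logits

print(utterance_logits(torch.zeros(16000)).shape)
```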