Categorical speech emotion recognition.
Emotion categories: Happy (+ excitement), Sad, Neutral, Angry.
Modality: Speech only.
For multimodal emotion recognition, please upload your result to Multimodal Emotion Recognition on IEMOCAP.
(Image credit: Papersgraph)
These leaderboards are used to track progress in Speech Emotion Recognition.
Use these libraries to find Speech Emotion Recognition models and implementations.
No subtasks available.
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
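As a rough illustration of that actor-critic setup, here is a minimal PyTorch sketch of a deterministic actor and a Q-value critic over a continuous action space; the layer sizes and names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in [-1, 1]."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded continuous action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

# The deterministic policy gradient updates the actor by ascending the critic's value:
# actor_loss = -critic(state, actor(state)).mean()
```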
It is shown that lighter machine-learning models trained on a few hand-crafted features can achieve performance comparable to the current deep-learning-based state-of-the-art method for emotion recognition.
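A minimal sketch of this recipe, assuming librosa for feature extraction and scikit-learn for the lightweight classifier; the specific descriptors and classifier here are illustrative, not the paper's exact feature set.

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def handcrafted_features(path: str) -> np.ndarray:
    """Summary statistics of a few classic descriptors for one utterance."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, T)
    zcr = librosa.feature.zero_crossing_rate(y)          # (1, T)
    rms = librosa.feature.rms(y=y)                       # (1, T)
    feats = np.concatenate([mfcc, zcr, rms], axis=0)
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])

def train(wav_paths, labels):
    """wav_paths: list of audio files, labels: Happy/Sad/Neutral/Angry."""
    X = np.stack([handcrafted_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(X, labels)
    return clf
```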
The proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
A framework that exploits acoustic information in tandem with lexical data, using two bi-directional long short-term memory (BLSTM) networks to obtain hidden representations of the utterance and an attention mechanism, referred to as multi-hop attention, which is trained to automatically infer the correlation between the modalities.
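A minimal PyTorch sketch of such a bimodal encoder with a single attention hop (the cited work stacks several hops); the feature dimensions and pooling choices are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BimodalAttention(nn.Module):
    """Two BLSTM encoders (acoustic + lexical) with one dot-product attention hop:
    a lexical summary attends over acoustic time steps before classification."""
    def __init__(self, acoustic_dim=40, word_dim=300, hidden=128, n_classes=4):
        super().__init__()
        self.audio_lstm = nn.LSTM(acoustic_dim, hidden, batch_first=True, bidirectional=True)
        self.text_lstm = nn.LSTM(word_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(4 * hidden, n_classes)

    def forward(self, audio, text):
        # audio: (B, Ta, acoustic_dim), text: (B, Tt, word_dim)
        h_a, _ = self.audio_lstm(audio)            # (B, Ta, 2H)
        h_t, _ = self.text_lstm(text)              # (B, Tt, 2H)
        query = h_t.mean(dim=1, keepdim=True)      # (B, 1, 2H) lexical summary
        scores = torch.bmm(query, h_a.transpose(1, 2))   # (B, 1, Ta)
        attn = F.softmax(scores, dim=-1)
        context = torch.bmm(attn, h_a).squeeze(1)        # (B, 2H) attended acoustic
        fused = torch.cat([context, query.squeeze(1)], dim=-1)
        return self.classifier(fused)
```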
The proposed deep graph approach to speech emotion recognition achieves state-of-the-art performance with significantly fewer learnable parameters, outperforming a standard GCN and other relevant deep graph architectures and indicating the effectiveness of the approach.
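For reference, a generic graph-convolution layer of the kind such approaches build on, written in plain PyTorch; this is a standard GCN layer, not the paper's specific architecture, and the frame-level graph construction is an assumption.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution layer, H' = ReLU(A_hat H W), where A_hat is the
    symmetrically normalized adjacency with self-loops."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features (e.g. frame-level acoustic features),
        # adj: (N, N) adjacency over the frames of one utterance.
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)   # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-12).pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
        return torch.relu(a_norm @ self.linear(x))
```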
The combined MFCC-Text Convolutional Neural Network model proved to be the most accurate in recognizing emotions in IEMOCAP data.
A novel framework based on a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) makes use of a pre-trained speech emotion recognition model to transfer emotional style during training and at run-time inference; it achieves remarkable performance, consistently outperforming the baseline framework.
The Audio Spectrogram Transformer is introduced, the first convolution-free, purely attention-based model for audio classification, which achieves new state-of-the-art results on various audio classification benchmarks.
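A much smaller stand-in for that idea, assuming a log-mel spectrogram input: patchify the spectrogram, embed the patches linearly, and classify from a [CLS] token with a standard Transformer encoder. All sizes here are illustrative, not the actual Audio Spectrogram Transformer configuration.

```python
import torch
import torch.nn as nn

class MiniSpectrogramTransformer(nn.Module):
    """Convolution-free classifier over a log-mel spectrogram."""
    def __init__(self, n_mels=128, n_frames=1024, patch=16, dim=192, depth=4, n_classes=4):
        super().__init__()
        self.patch = patch
        n_patches = (n_mels // patch) * (n_frames // patch)
        self.embed = nn.Linear(patch * patch, dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (B, n_mels, n_frames) log-mel spectrogram
        b, m, t = spec.shape
        p = self.patch
        patches = spec.reshape(b, m // p, p, t // p, p).permute(0, 1, 3, 2, 4)
        patches = patches.reshape(b, -1, p * p)        # (B, n_patches, p*p)
        x = torch.cat([self.cls.expand(b, -1, -1), self.embed(patches)], dim=1) + self.pos
        x = self.encoder(x)
        return self.head(x[:, 0])                      # classify from the [CLS] token
```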
This work proposes a transfer learning method for speech emotion recognition where features extracted from pre-trained wav2vec 2.0 models are modeled using simple neural networks, showing superior performance compared to results in the literature.
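A hedged sketch of this transfer-learning recipe, assuming the Hugging Face transformers implementation of wav2vec 2.0; the checkpoint name, mean pooling, and classifier head are illustrative choices rather than the paper's exact setup.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class Wav2Vec2EmotionClassifier(nn.Module):
    """Frozen pre-trained wav2vec 2.0 encoder + small trainable head."""
    def __init__(self, checkpoint: str = "facebook/wav2vec2-base", n_classes: int = 4):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(checkpoint)
        for p in self.encoder.parameters():      # use the encoder as a fixed feature extractor
            p.requires_grad = False
        hidden = self.encoder.config.hidden_size
        self.head = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (B, num_samples) raw 16 kHz audio
        with torch.no_grad():
            hidden_states = self.encoder(waveform).last_hidden_state   # (B, T', hidden)
        pooled = hidden_states.mean(dim=1)                             # utterance-level embedding
        return self.head(pooled)
```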
The proposed Speech Emotion Recognition Adaptation Benchmark (SERAB) is a framework for evaluating the performance and generalization capacity of different approaches for utterance-level SER, and a selection of standard hand-crafted feature sets and state-of-the-art DNN representations are evaluated.
Adding a benchmark result helps the community track progress.