Categorical speech emotion recognition. Emotion categories: Happy (including Excitement), Sad, Neutral, Angry. Modality: speech only. For multimodal emotion recognition, please upload your result to Multimodal Emotion Recognition on IEMOCAP.
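As a minimal sketch of the label scheme above, the snippet below collapses raw IEMOCAP annotations onto these four categories, folding excitement into happy; the raw label strings and helper name are illustrative assumptions, not part of this page.

```python
# Minimal sketch: collapse raw IEMOCAP emotion labels onto the four
# categories used by this benchmark (excitement is merged into happy).
# The raw label strings ("exc", "hap", ...) follow common IEMOCAP
# conventions but are an assumption here.

LABEL_MAP = {
    "hap": "happy",
    "exc": "happy",   # excitement is counted as happy
    "sad": "sad",
    "neu": "neutral",
    "ang": "angry",
}

def map_label(raw_label: str):
    """Return one of the four benchmark categories, or None to discard."""
    return LABEL_MAP.get(raw_label)

if __name__ == "__main__":
    print(map_label("exc"))   # -> "happy"
    print(map_label("fru"))   # -> None (frustration is not used here)
```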
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
The proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
It is shown that lighter machine-learning models trained on a few hand-crafted features can achieve performance comparable to the current deep-learning-based state-of-the-art method for emotion recognition.
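As a hedged illustration of that lightweight recipe, the sketch below computes utterance-level MFCC statistics with librosa and trains a linear SVM with scikit-learn; the specific features and classifier are assumptions, not the paper's exact setup.

```python
# Hedged sketch of the "light model over hand-crafted features" idea:
# utterance-level MFCC statistics fed to a linear SVM. The feature set
# and classifier are illustrative assumptions.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(wav_path: str) -> np.ndarray:
    """Mean and standard deviation of 13 MFCCs over the whole utterance."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # (13, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train(wav_paths, labels):
    """Fit a standardized linear SVM on utterance-level feature vectors."""
    X = np.stack([utterance_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X, labels)
    return clf
```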
A framework that exploits acoustic information in tandem with lexical data, using two bi-directional long short-term memory (BLSTM) networks to obtain hidden representations of the utterance and an attention mechanism, referred to as multi-hop, that is trained to automatically infer the correlation between the modalities.
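A hedged PyTorch sketch of this bimodal setup follows: one BLSTM per modality and a single dot-product attention hop in which the acoustic summary attends over the lexical hidden states (the paper's multi-hop mechanism repeats such hops); all dimensions and the classifier head are illustrative assumptions.

```python
# Hedged sketch: two BLSTMs (acoustic, lexical) plus one attention hop.
# Sizes and the pooling/classifier choices are assumptions for illustration.
import torch
import torch.nn as nn

class BimodalAttentionSER(nn.Module):
    def __init__(self, acoustic_dim=40, lexical_dim=300, hidden=128, n_classes=4):
        super().__init__()
        self.acoustic_blstm = nn.LSTM(acoustic_dim, hidden, batch_first=True,
                                      bidirectional=True)
        self.lexical_blstm = nn.LSTM(lexical_dim, hidden, batch_first=True,
                                     bidirectional=True)
        self.classifier = nn.Linear(4 * hidden, n_classes)

    def forward(self, acoustic, lexical):
        a_states, _ = self.acoustic_blstm(acoustic)   # (B, Ta, 2H)
        l_states, _ = self.lexical_blstm(lexical)     # (B, Tl, 2H)
        a_summary = a_states.mean(dim=1)              # (B, 2H)
        # One attention hop: the acoustic summary scores each lexical state.
        scores = torch.bmm(l_states, a_summary.unsqueeze(2))   # (B, Tl, 1)
        weights = torch.softmax(scores, dim=1)
        l_context = (weights * l_states).sum(dim=1)   # (B, 2H)
        return self.classifier(torch.cat([a_summary, l_context], dim=1))

model = BimodalAttentionSER()
logits = model(torch.randn(2, 100, 40), torch.randn(2, 20, 300))  # (2, 4)
```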
The combined MFCC-Text Convolutional Neural Network model proved to be the most accurate in recognizing emotions in IEMOCAP data.
The proposed deep graph approach to speech emotion recognition achieves state-of-the-art performance with significantly fewer learnable parameters and outperforms standard GCNs and other relevant deep graph architectures, indicating the effectiveness of the approach.
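For orientation, the sketch below frames an utterance as a simple line graph over frame-level features and applies a plain GCN-style propagation step before pooling; the graph construction and layer sizes are assumptions, and the paper's architecture differs from this standard-GCN baseline.

```python
# Hedged sketch: treat each frame-level feature vector as a graph node,
# connect consecutive frames (a line graph), and run one standard GCN
# propagation step before graph-level pooling. All sizes are assumptions.
import torch
import torch.nn as nn

class FrameGraphGCN(nn.Module):
    def __init__(self, feat_dim=40, hidden=64, n_classes=4):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden)
        self.w2 = nn.Linear(hidden, n_classes)

    @staticmethod
    def line_graph_adjacency(n_frames: int) -> torch.Tensor:
        """Symmetrically normalized adjacency (with self-loops) for a frame chain."""
        adj = torch.eye(n_frames)
        idx = torch.arange(n_frames - 1)
        adj[idx, idx + 1] = 1.0
        adj[idx + 1, idx] = 1.0
        d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

    def forward(self, frames):                      # frames: (T, feat_dim)
        a_hat = self.line_graph_adjacency(frames.size(0))
        h = torch.relu(a_hat @ self.w1(frames))     # one GCN propagation step
        h = a_hat @ self.w2(h)
        return h.mean(dim=0)                        # graph-level logits (n_classes,)

logits = FrameGraphGCN()(torch.randn(120, 40))      # utterance with 120 frames
```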
A novel framework based on a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) uses a pre-trained speech emotion recognition model to transfer emotional style during training and at run-time inference, consistently outperforming the baseline framework.
The Audio Spectrogram Transformer is introduced, the first convolution-free, purely attention-based model for audio classification, which achieves new state-of-the-art results on various audio classification benchmarks.
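A hedged usage sketch with the Hugging Face transformers AST classes follows; the AudioSet-finetuned checkpoint name is an assumption, and for SER the classification head would be re-initialized and fine-tuned on the four emotion classes.

```python
# Hedged sketch: run the Audio Spectrogram Transformer via the
# transformers AST classes and attach a fresh 4-class head for SER.
# The checkpoint name is an assumed public AudioSet-finetuned model.
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

ckpt = "MIT/ast-finetuned-audioset-10-10-0.4593"   # assumed checkpoint
extractor = ASTFeatureExtractor.from_pretrained(ckpt)
model = ASTForAudioClassification.from_pretrained(
    ckpt, num_labels=4, ignore_mismatched_sizes=True  # new 4-class head
)

waveform = torch.zeros(16000)                      # stand-in for 1 s of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                # (1, 4) emotion logits
```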
The proposed Speech Emotion Recognition Adaptation Benchmark (SERAB) is a framework for evaluating the performance and generalization capacity of different approaches to utterance-level SER, on which a selection of standard hand-crafted feature sets and state-of-the-art DNN representations is evaluated.
This work proposes a transfer learning method for speech emotion recognition where features extracted from pre-trained wav2vec 2.0 models are modeled using simple neural networks, showing superior performance compared to results in the literature.
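A minimal sketch of that transfer-learning recipe, assuming a frozen facebook/wav2vec2-base encoder, mean pooling over hidden states, and a small MLP head (the paper's exact pooling and downstream network may differ):

```python
# Hedged sketch: frozen wav2vec 2.0 features + a simple neural classifier.
# Checkpoint name, pooling, and head size are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

ckpt = "facebook/wav2vec2-base"                    # assumed checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
encoder = Wav2Vec2Model.from_pretrained(ckpt).eval()
for p in encoder.parameters():
    p.requires_grad = False                        # keep wav2vec 2.0 frozen

head = nn.Sequential(nn.Linear(encoder.config.hidden_size, 128),
                     nn.ReLU(), nn.Linear(128, 4))  # simple downstream NN

def utterance_logits(waveform: torch.Tensor) -> torch.Tensor:
    """Mean-pool frozen wav2vec 2.0 hidden states, then classify."""
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, frames, hidden)
    return head(hidden.mean(dim=1))                    # (1, 4) emotion logits

print(utterance_logits(torch.zeros(16000)).shape)
```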