This is a leaderboard for multimodal emotion recognition on the IEMOCAP dataset. The modality abbreviations are A: Acoustic, T: Text, V: Visual; please include the modality in brackets after the model name. All models must use the standard five emotion categories and are evaluated with the standard leave-one-session-out (LOSO) protocol. See the papers for references.
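As a rough illustration of that protocol: IEMOCAP contains five recorded sessions, and leave-one-session-out evaluation trains on four sessions, tests on the held-out one, and averages over the five folds. The sketch below assumes hypothetical `load_session` and `train_and_evaluate` helpers and is not tied to any particular model.

```python
# Minimal LOSO sketch over IEMOCAP's 5 sessions.
SESSIONS = [1, 2, 3, 4, 5]

def loso_evaluation(load_session, train_and_evaluate):
    """load_session(i) -> iterable of utterances; train_and_evaluate -> fold accuracy."""
    fold_accuracies = []
    for test_session in SESSIONS:
        # Train on the other four sessions, test on the held-out one.
        train_data = [u for s in SESSIONS if s != test_session for u in load_session(s)]
        test_data = list(load_session(test_session))
        fold_accuracies.append(train_and_evaluate(train_data, test_data))
    # Report the mean accuracy across the five folds.
    return sum(fold_accuracies) / len(fold_accuracies)
```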
These leaderboards are used to track progress in multimodal emotion recognition.
The proposed model outperforms previous state-of-the-art methods at assigning samples to one of four emotion categories on the IEMOCAP dataset, with accuracies ranging from 68.8% to 71.8%.
It is shown that lighter machine-learning models trained on a few hand-crafted features can achieve performance comparable to the current deep-learning-based state-of-the-art method for emotion recognition.
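As a loose sketch of what such a lighter model could look like (the specific features and classifier below are assumptions, not the paper's exact recipe), a linear SVM over standardized utterance-level hand-crafted features is enough to convey the idea:

```python
# Hypothetical lightweight baseline: linear SVM over hand-crafted features
# (e.g. pitch, energy, MFCC statistics) already extracted into a matrix X_train.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_light_classifier(X_train, y_train):
    # Standardize the features, then fit a linear-kernel SVM.
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
    model.fit(X_train, y_train)
    return model
```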
Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy generalizes well to multimodal emotion recognition.
This work proposes an emotion recognition system based on the auditory and visual modalities: a convolutional neural network extracts features from the speech, while a 50-layer deep residual network is used for the visual modality.
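A minimal sketch of that audio-visual setup, assuming spectrogram input for the audio branch, face frames for the visual branch, and fusion by concatenation (layer sizes are assumptions):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class AudioVisualNet(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        # Audio branch: small 2D CNN over (1, freq, time) speech spectrograms.
        self.audio_cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Visual branch: 50-layer residual network with its final FC removed.
        self.visual_cnn = resnet50()
        self.visual_cnn.fc = nn.Identity()
        self.classifier = nn.Linear(64 + 2048, num_classes)

    def forward(self, spectrogram, frame):
        a = self.audio_cnn(spectrogram)   # (B, 64)
        v = self.visual_cnn(frame)        # (B, 2048)
        return self.classifier(torch.cat([a, v], dim=1))
```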
An LSTM-based model is proposed that enables utterances to capture contextual information from surrounding utterances in the same video, aiding classification and yielding a 5-10% performance improvement over the state of the art together with strong generalizability.
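A minimal sketch of that idea: utterance-level feature vectors from one video pass through a bidirectional LSTM so each utterance is classified with its surrounding context (dimensions are assumptions):

```python
import torch.nn as nn

class ContextualLSTM(nn.Module):
    def __init__(self, feat_dim=100, hidden=64, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, utterances):           # (B, num_utterances, feat_dim)
        context, _ = self.lstm(utterances)   # (B, num_utterances, 2 * hidden)
        return self.classifier(context)      # one emotion logit vector per utterance
```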
This approach is the first to use the multiple data modalities offered by IEMOCAP for more robust and accurate emotion detection, and it is hoped that it will help improve the quality of emotion detection systems in the future.
A new method based on recurrent neural networks keeps track of the individual party states throughout the conversation and uses this information for emotion classification, outperforming the state of the art by a significant margin on two different datasets.
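A simplified sketch of the party-state idea: one GRU cell updates a per-speaker state with each new utterance, and the emotion is predicted from the speaking party's updated state. This is a loose illustration of the mechanism, not the paper's full model:

```python
import torch
import torch.nn as nn

class PartyStateTracker(nn.Module):
    def __init__(self, feat_dim=100, hidden=64, num_classes=4):
        super().__init__()
        self.hidden = hidden
        self.cell = nn.GRUCell(feat_dim, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, utterances, speakers, num_speakers=2):
        # utterances: (T, feat_dim); speakers: list of speaker indices, one per utterance.
        states = [torch.zeros(1, self.hidden) for _ in range(num_speakers)]
        logits = []
        for t, spk in enumerate(speakers):
            # Update only the current speaker's state, then classify from it.
            states[spk] = self.cell(utterances[t:t + 1], states[spk])
            logits.append(self.classifier(states[spk]))
        return torch.cat(logits, dim=0)       # (T, num_classes)
```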
This work explores different neural networks to improve the accuracy of emotion recognition and finds that a (CNN+RNN) + 3D-CNN multi-model architecture, which processes audio spectrograms and the corresponding video frames, gives an emotion prediction accuracy of 54.0% among 4 emotions and 71.75% among 3 emotions on the IEMOCAP dataset.
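A minimal sketch of that combination, assuming a 2D CNN + GRU over the audio spectrogram and a small 3D CNN over the stacked video frames, fused by concatenation (all sizes are assumptions):

```python
import torch
import torch.nn as nn

class CnnRnn3DCnn(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        # Audio: 2D CNN over the spectrogram, then a GRU over the time axis.
        self.audio_cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.audio_rnn = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
        # Video: 3D CNN over stacked RGB frames.
        self.video_cnn = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64 + 16, num_classes)

    def forward(self, spectrogram, frames):
        # spectrogram: (B, 1, freq, time); frames: (B, 3, T, H, W)
        a = self.audio_cnn(spectrogram)        # (B, 32, freq', time)
        a = a.mean(dim=2).transpose(1, 2)      # (B, time, 32) after pooling over frequency
        _, h = self.audio_rnn(a)               # h: (1, B, 64)
        v = self.video_cnn(frames)             # (B, 16)
        return self.classifier(torch.cat([h[-1], v], dim=1))
```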
A novel feature fusion strategy proceeds in a hierarchical fashion, first fusing the modalities two by two and only then fusing all three modalities; it outperforms conventional feature concatenation by 1%, which amounts to a 5% reduction in error rate.
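A minimal sketch of that hierarchical scheme, assuming one acoustic, one textual, and one visual feature vector per utterance (dimensions and the fusion layers are assumptions): the three bimodal pairs are fused first and only then combined into a trimodal representation.

```python
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    def __init__(self, dim=64, num_classes=4):
        super().__init__()
        # One small fusion layer per modality pair: (A, T), (A, V), (T, V).
        self.pair_fusion = nn.ModuleList(
            [nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU()) for _ in range(3)]
        )
        # Final fusion of the three bimodal representations.
        self.trimodal = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU())
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, a, t, v):                 # each: (B, dim)
        at = self.pair_fusion[0](torch.cat([a, t], dim=1))
        av = self.pair_fusion[1](torch.cat([a, v], dim=1))
        tv = self.pair_fusion[2](torch.cat([t, v], dim=1))
        fused = self.trimodal(torch.cat([at, av, tv], dim=1))
        return self.classifier(fused)
```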
This paper presents the effort for the audio-video based sub-challenge of the Emotion Recognition in the Wild (EmotiW) 2018 challenge, which requires participants to assign each video clip a single emotion label from the six universal emotions.