3260 papers • 126 benchmarks • 313 datasets
Emotion Recognition is an important area of research for enabling effective human-computer interaction. Human emotions can be detected from speech signals, facial expressions, body language, and electroencephalography (EEG).
Source: Using Deep Autoencoders for Facial Expression Recognition
These leaderboards are used to track progress in Emotion Recognition
Use these libraries to find Emotion Recognition models and implementations
The Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines, contains about 13,000 utterances from 1,433 dialogues from the TV series Friends and demonstrates the importance of contextual and multimodal information for emotion recognition in conversations.
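As a minimal sketch of working with MELD at the dialogue level, the snippet below groups utterances by conversation so a model can exploit context. The file name and column names follow the CSVs distributed in the MELD repository (e.g. train_sent_emo.csv); if your copy differs, adjust them accordingly.

```python
import pandas as pd

# Group MELD utterances by dialogue so a downstream model can use
# conversational context. Column names assume the MELD repository CSVs.
df = pd.read_csv("train_sent_emo.csv")

dialogues = (
    df.sort_values(["Dialogue_ID", "Utterance_ID"])
      .groupby("Dialogue_ID")[["Speaker", "Utterance", "Emotion"]]
      .apply(lambda d: list(d.itertuples(index=False, name=None)))
)

# Each entry is one conversation: a list of (speaker, utterance, emotion).
print(dialogues.iloc[0][:3])
```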
The Recurrent Attended Variation Embedding Network (RAVEN) models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues, capturing the dynamic nature of nonverbal intents.
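A minimal sketch of the core shifting idea (not the official RAVEN code): a gate computed from the word and its aligned nonverbal features decides how strongly a learned nonverbal displacement nudges the word embedding. All dimensions and layer choices here are illustrative.

```python
import torch
import torch.nn as nn

class NonverbalShift(nn.Module):
    """Illustrative sketch: shift a word embedding by a gated function
    of its aligned visual and acoustic features."""
    def __init__(self, d_word, d_visual, d_acoustic):
        super().__init__()
        self.gate = nn.Linear(d_word + d_visual + d_acoustic, d_word)
        self.shift = nn.Linear(d_visual + d_acoustic, d_word)

    def forward(self, w, v, a):
        g = torch.sigmoid(self.gate(torch.cat([w, v, a], dim=-1)))
        return w + g * self.shift(torch.cat([v, a], dim=-1))

m = NonverbalShift(300, 47, 74)   # placeholder dimensions
w, v, a = torch.randn(8, 300), torch.randn(8, 47), torch.randn(8, 74)
shifted = m(w, v, a)              # word vectors nudged by nonverbal cues
```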
The proposed model outperforms previous state-of-the-art methods at assigning utterances to one of four emotion categories on the IEMOCAP dataset, with accuracies ranging from 68.8% to 71.8%.
It is shown that lighter machine-learning models trained on a few hand-crafted features can achieve performance comparable to the current deep-learning-based state of the art for emotion recognition.
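To make the hand-crafted-features pipeline concrete, here is a small sketch: MFCC statistics per clip feeding a random forest. The specific features, file names, and classifier are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def handcrafted_features(path):
    """Classic hand-crafted speech features: per-clip MFCC mean and std."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

wav_paths = ["clip_000.wav", "clip_001.wav"]   # hypothetical audio files
labels = ["happy", "angry"]                    # matching emotion labels

X = np.stack([handcrafted_features(p) for p in wav_paths])
clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
```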
A 2-step approach is proposed to address this new emotion-cause pair extraction (ECPE) task, which first performs individual emotion extraction and cause extraction via multi-task learning, and then conducts emotion-cause pairing and filtering.
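The second step can be sketched as follows: take the clauses step 1 flagged as emotions and causes, form all candidate pairs, and keep those a pair scorer accepts. The threshold and the distance-based stand-in scorer below are hypothetical; in the paper this role is played by a trained classifier.

```python
from itertools import product

def pair_and_filter(emotion_clauses, cause_clauses, score, threshold=0.5):
    """Step 2 of a 2-step ECPE pipeline (illustrative): enumerate
    candidate emotion-cause pairs, then filter by a scorer."""
    candidates = product(emotion_clauses, cause_clauses)
    return [(e, c) for e, c in candidates if score(e, c) >= threshold]

emotions = [3]          # clause indices flagged as emotion clauses
causes = [2, 3, 7]      # clause indices flagged as cause clauses
score = lambda e, c: 1.0 / (1 + abs(e - c))    # hypothetical scorer
print(pair_and_filter(emotions, causes, score))  # [(3, 2), (3, 3)]
```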
The proposed architecture is independent of any hand-crafted feature extraction, performs better than earlier convolutional neural network based approaches, and visualizes the automatically extracted features learned by the network in order to provide a better understanding of what it has learned.
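One common way to do such visualization is to capture intermediate activations with a forward hook and plot the feature maps. The tiny network and input size below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Inspect what an end-to-end CNN has learned by plotting the feature
# maps of its first conv layer (placeholder network, not the paper's).
net = nn.Sequential(nn.Conv2d(1, 8, 5), nn.ReLU(), nn.MaxPool2d(2))
acts = {}
net[0].register_forward_hook(lambda m, i, o: acts.update(conv1=o.detach()))

x = torch.randn(1, 1, 48, 48)   # e.g. one spectrogram excerpt or face crop
net(x)
fmap = acts["conv1"][0]         # (8, 44, 44) feature maps
for k in range(8):
    plt.subplot(2, 4, k + 1)
    plt.imshow(fmap[k], cmap="viridis")
    plt.axis("off")
plt.show()
```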
The main strength of the model comes from discovering interactions between modalities through time using a neural component called the Multi-attention Block (MAB) and storing them in the hybrid memory of a recurrent part called the Long-short Term Hybrid Memory (LSTHM).
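A rough sketch of the MAB idea, under the assumption that the per-modality recurrent states are concatenated at each time step: several attention maps are applied to that concatenation so that each one can surface a different cross-modal interaction. Layer shapes and the number of attentions are illustrative.

```python
import torch
import torch.nn as nn

class MultiAttentionBlock(nn.Module):
    """Illustrative sketch: k attention maps over the concatenated
    per-modality states, yielding k attended cross-modal views."""
    def __init__(self, d_total, k):
        super().__init__()
        self.attn = nn.Linear(d_total, k * d_total)
        self.k = k

    def forward(self, h):                          # h: (batch, d_total)
        a = torch.softmax(
            self.attn(h).view(-1, self.k, h.size(-1)), dim=-1)
        return a * h.unsqueeze(1)                  # (batch, k, d_total)

# Concatenated language/visual/acoustic states for a batch of 4:
h = torch.cat([torch.randn(4, 32) for _ in range(3)], dim=-1)
out = MultiAttentionBlock(96, k=4)(h)              # 4 attended views per step
```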
The Low-rank Multimodal Fusion method performs multimodal fusion using low-rank tensors, making it much more efficient in both training and inference than other methods that rely on tensor representations.
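The key computation can be written down compactly: instead of materializing the full outer-product tensor of all modalities, each modality (with a constant 1 appended so lower-order interactions survive) is projected by rank-many factors, the projections are multiplied elementwise across modalities, and the result is summed over the rank dimension. The sketch below follows that decomposition; initialization and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """Sketch of low-rank multimodal fusion: per-modality rank factors,
    elementwise product across modalities, sum over rank."""
    def __init__(self, dims, d_out, rank=4):
        super().__init__()
        # One (rank, d_in + 1, d_out) factor per modality; the +1 appends
        # a constant 1 so unimodal and bimodal terms are retained.
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d + 1, d_out) * 0.1) for d in dims])

    def forward(self, xs):                         # xs: list of (batch, d_m)
        fused = 1.0
        for x, f in zip(xs, self.factors):
            z = torch.cat([x, torch.ones(x.size(0), 1)], dim=-1)
            fused = fused * torch.einsum("bd,rdo->bro", z, f)
        return fused.sum(dim=1)                    # (batch, d_out)

lmf = LowRankFusion([32, 16, 8], d_out=64)
h = lmf([torch.randn(4, 32), torch.randn(4, 16), torch.randn(4, 8)])
```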
Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy also generalizes well to multimodal emotion recognition.
This work trains a unified model to perform three tasks: facial action unit detection, expression classification, and valence-arousal estimation, and proposes an algorithm for the multitask model to learn from missing (incomplete) labels.
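A common way to learn from incomplete labels is to mask each task's loss to the samples that actually carry that task's label. The sketch below assumes -1 marks missing classification labels and NaN marks missing valence-arousal targets; these conventions, and the loss choices, are assumptions for illustration rather than the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def multitask_loss(au_logits, expr_logits, va_pred, au_y, expr_y, va_y):
    """Sketch: per-task losses computed only where labels exist
    (-1 = missing class label, NaN = missing regression target)."""
    loss = 0.0
    m = (au_y != -1).all(dim=-1)            # facial action unit detection
    if m.any():
        loss += F.binary_cross_entropy_with_logits(
            au_logits[m], au_y[m].float())
    m = expr_y != -1                        # expression classification
    if m.any():
        loss += F.cross_entropy(expr_logits[m], expr_y[m])
    m = ~torch.isnan(va_y).any(dim=-1)      # valence-arousal estimation
    if m.any():
        loss += F.mse_loss(va_pred[m], va_y[m])
    return loss
```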