Predict the degrees of valence and arousal for the given vocal bursts.
3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in speech emotion recognition.
Use these libraries to find speech emotion recognition models and implementations.
No subtasks available.
This work presents IBN-Net, a novel convolutional architecture that markedly enhances a CNN's modeling ability on one domain as well as its generalization capacity on another domain, without fine-tuning.
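Below is a minimal PyTorch sketch of the channel-split normalization block that the IBN-Net idea builds on: part of the channels is normalized with Instance Normalization (more invariant to appearance/style shifts) and the rest with Batch Normalization. The class name, split ratio, and demo shapes are illustrative assumptions, not the authors' released code.

```python
# Sketch of an IN/BN channel-split block (assumed names and ratio, not the paper's code).
import torch
import torch.nn as nn

class IBN(nn.Module):
    """Normalize the first `ratio` of the channels with IN and the rest with BN."""
    def __init__(self, channels: int, ratio: float = 0.5):
        super().__init__()
        self.in_channels = int(channels * ratio)
        self.instance_norm = nn.InstanceNorm2d(self.in_channels, affine=True)
        self.batch_norm = nn.BatchNorm2d(channels - self.in_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel dimension, normalize each part separately, re-join.
        a, b = torch.split(x, [self.in_channels, x.size(1) - self.in_channels], dim=1)
        return torch.cat([self.instance_norm(a), self.batch_norm(b)], dim=1)

if __name__ == "__main__":
    layer = IBN(64)
    out = layer(torch.randn(8, 64, 32, 32))
    print(out.shape)  # torch.Size([8, 64, 32, 32])
```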
This work proposes a two-stream ConvNet architecture that incorporates spatial and temporal networks, and demonstrates that a ConvNet trained on multi-frame dense optical flow achieves strong performance despite limited training data.
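A minimal sketch of the two-stream idea, under stated assumptions: one small ConvNet scores an RGB frame (spatial stream), another scores a stack of optical-flow fields (temporal stream), and the class probabilities are averaged. The tiny backbone and the flow-stack depth are placeholders, not the paper's VGG-style networks.

```python
# Sketch of late-fusion two-stream action recognition (backbone and shapes are assumptions).
import torch
import torch.nn as nn

def small_backbone(in_channels: int, num_classes: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=7, stride=2, padding=3),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(32, num_classes),
    )

class TwoStreamNet(nn.Module):
    def __init__(self, num_classes: int, flow_stack: int = 10):
        super().__init__()
        self.spatial = small_backbone(3, num_classes)                 # single RGB frame
        self.temporal = small_backbone(2 * flow_stack, num_classes)   # x/y flow for L frames

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Late fusion: average the per-stream class probabilities.
        return (self.spatial(rgb).softmax(-1) + self.temporal(flow).softmax(-1)) / 2

if __name__ == "__main__":
    net = TwoStreamNet(num_classes=101)
    rgb = torch.randn(4, 3, 224, 224)
    flow = torch.randn(4, 20, 224, 224)
    print(net(rgb, flow).shape)  # torch.Size([4, 101])
```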
It is shown that packing naturally penalizes generators with mode collapse, thereby favoring generator distributions with less mode collapse during training, and numerical experiments suggest that packing provides significant improvements in practice as well.
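A minimal sketch of the packing idea, assuming a PyTorch discriminator: m samples are concatenated along the channel axis and scored jointly, so a generator producing low-diversity packs is easier to reject. The network, pack size, and shapes below are illustrative placeholders.

```python
# Sketch of a "packed" discriminator that judges groups of samples jointly (assumed shapes).
import torch
import torch.nn as nn

class PackedDiscriminator(nn.Module):
    def __init__(self, channels: int = 1, pack_size: int = 4):
        super().__init__()
        self.pack_size = pack_size
        self.net = nn.Sequential(
            nn.Conv2d(channels * pack_size, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, 1),  # one real/fake score for the whole pack
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * pack_size, C, H, W) -> regroup into packs along the channel axis.
        packs = x.size(0) // self.pack_size
        packed = x.view(packs, -1, x.size(2), x.size(3))
        return self.net(packed)

if __name__ == "__main__":
    disc = PackedDiscriminator(channels=1, pack_size=4)
    samples = torch.randn(32, 1, 28, 28)  # 32 samples -> 8 packs of 4
    print(disc(samples).shape)  # torch.Size([8, 1])
```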
This paper introduces a novel concept for augmenting such generative architectures with semantic annotations, obtained either by manually authoring pixel labels or by using existing semantic segmentation solutions, resulting in a content-aware generative algorithm that offers meaningful control over the outcome.
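A hedged sketch of the general idea of conditioning a generator on semantic annotations: a one-hot label map is concatenated with noise channels and decoded into an image. This is not the paper's architecture; the class name, layer sizes, and shapes are hypothetical.

```python
# Sketch of a label-map-conditioned generator (hypothetical architecture, for illustration only).
import torch
import torch.nn as nn

class LabelConditionedGenerator(nn.Module):
    def __init__(self, num_classes: int, noise_channels: int = 16, out_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes + noise_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, out_channels, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, label_map: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        # label_map: one-hot semantic map (B, num_classes, H, W); noise: (B, noise_channels, H, W).
        return self.net(torch.cat([label_map, noise], dim=1))

if __name__ == "__main__":
    gen = LabelConditionedGenerator(num_classes=5)
    labels = torch.zeros(2, 5, 64, 64)
    labels[:, 0] = 1.0  # demo: every pixel labeled as class 0
    image = gen(labels, torch.randn(2, 16, 64, 64))
    print(image.shape)  # torch.Size([2, 3, 64, 64])
```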
This report presents very deep two-stream ConvNets for action recognition by adapting recent very deep architectures to the video domain, and extends the Caffe toolbox with a multi-GPU implementation that offers high computational efficiency and low memory consumption.
It is shown that both the pairing and global features are useful on their own, and that their combination achieves an F1 of 92.6% for identifying in-sentence discourse boundaries, a 17.8% error-rate reduction over the previous state-of-the-art performance.
The authors' ResNet-101-based light-head R-CNN outperforms state-of-the-art object detectors on COCO while remaining time-efficient, and significantly outperforms fast single-stage detectors such as YOLO and SSD in both speed and accuracy.
In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeletons as spatiotemporal graphs, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually, and it is fixed over all layers and input samples. This may not be optimal for the hierarchical GCN and the diverse samples in action recognition tasks. In addition, the second-order information of the skeleton data (the lengths and directions of bones), which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned by the BP algorithm in an end-to-end manner. This data-driven method increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Moreover, a two-stream framework is proposed to model both the first-order and the second-order information simultaneously, which notably improves recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state of the art by a significant margin.
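A minimal sketch of the adaptive-topology idea, assuming PyTorch: the effective adjacency is a fixed skeleton graph plus a fully learnable matrix optimized by backpropagation. The paper additionally uses a data-dependent adjacency term and a spatiotemporal formulation, both omitted here; names and shapes are assumptions.

```python
# Sketch of a graph-conv layer with a learnable additive adjacency (simplified 2s-AGCN-style idea).
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, skeleton_adj: torch.Tensor):
        super().__init__()
        self.register_buffer("A", skeleton_adj)                  # fixed bone connectivity
        self.B = nn.Parameter(torch.zeros_like(skeleton_adj))    # topology learned by backprop
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, in_channels); aggregate neighbours with A + B, then project.
        adj = self.A + self.B
        return torch.relu(self.proj(torch.matmul(adj, x)))

if __name__ == "__main__":
    num_joints = 25                      # e.g. the joint count of an NTU-style skeleton
    adj = torch.eye(num_joints)          # placeholder skeleton graph
    layer = AdaptiveGraphConv(3, 64, adj)
    joints = torch.randn(8, num_joints, 3)  # (x, y, z) per joint
    print(layer(joints).shape)           # torch.Size([8, 25, 64])
```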
MagNet, a framework for defending neural network classifiers against adversarial examples, is proposed, and it is shown empirically that MagNet is effective against the most advanced state-of-the-art attacks in black-box and gray-box scenarios without sacrificing the false positive rate on normal examples.
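A minimal sketch of MagNet's two roles, under assumed shapes: a detector that rejects inputs whose autoencoder reconstruction error exceeds a threshold, and a reformer that replaces accepted inputs with their reconstructions before classification. The tiny autoencoder and the threshold value are placeholders, not the paper's configuration.

```python
# Sketch of detector + reformer built on a (to-be-trained) autoencoder; sizes are assumptions.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, dim: int = 784, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

class MagNetDefense:
    def __init__(self, autoencoder: Autoencoder, threshold: float = 0.05):
        self.ae = autoencoder
        self.threshold = threshold  # in practice tuned on clean validation data

    def detect(self, x: torch.Tensor) -> torch.Tensor:
        # Flag samples whose per-example reconstruction error exceeds the threshold.
        error = ((self.ae(x) - x) ** 2).mean(dim=1)
        return error > self.threshold

    def reform(self, x: torch.Tensor) -> torch.Tensor:
        # Replace the input with its reconstruction before feeding the classifier.
        return self.ae(x)

if __name__ == "__main__":
    defense = MagNetDefense(Autoencoder())
    batch = torch.rand(16, 784)
    print(defense.detect(batch).sum().item(), defense.reform(batch).shape)
```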
This paper presents a simple two-stream feature interaction model, FinalMLP, which employs only MLPs in both streams yet achieves surprisingly strong performance and can serve as a new strong baseline for the future development of two-stream CTR models.
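A minimal sketch of a two-stream MLP CTR model in the spirit of FinalMLP, with assumed layer sizes: two independent MLPs over the same features, fused through a bilinear interaction term at the output. The actual model additionally uses stream-specific feature gating and multi-head bilinear fusion, which are omitted here.

```python
# Sketch of a two-stream MLP with bilinear output fusion for CTR prediction (sizes are assumptions).
import torch
import torch.nn as nn

def mlp(in_dim: int, hidden: int, out_dim: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

class TwoStreamMLP(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 128, stream_dim: int = 64):
        super().__init__()
        self.stream1 = mlp(feat_dim, hidden, stream_dim)
        self.stream2 = mlp(feat_dim, hidden, stream_dim)
        # Bilinear fusion: a scalar logit from the interaction of the two stream outputs.
        self.fusion = nn.Bilinear(stream_dim, stream_dim, 1)
        self.w1 = nn.Linear(stream_dim, 1, bias=False)
        self.w2 = nn.Linear(stream_dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        o1, o2 = self.stream1(x), self.stream2(x)
        logit = self.w1(o1) + self.w2(o2) + self.fusion(o1, o2)
        return torch.sigmoid(logit).squeeze(-1)  # predicted click probability

if __name__ == "__main__":
    model = TwoStreamMLP(feat_dim=32)
    print(model(torch.randn(8, 32)).shape)  # torch.Size([8])
```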