3260 papers • 126 benchmarks • 313 datasets
CSRNet, a network for Congested Scene Recognition, is proposed as a data-driven deep learning method that understands highly congested scenes, performs accurate count estimation, and produces high-quality density maps.
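The core idea behind density-map crowd counting is that the count equals the integral of the map. A minimal sketch, assuming the standard ground-truth construction (the function name, `sigma`, and shapes here are illustrative, not CSRNet's actual code):

```python
import numpy as np

def gaussian_density_map(points, shape, sigma=4.0):
    """Ground-truth density map for crowd counting: place a
    normalised Gaussian at each annotated head location, so the
    whole map integrates to the person count.
    points: list of (row, col) head annotations; shape: (H, W)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dm = np.zeros(shape, dtype=np.float64)
    for (r, c) in points:
        g = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
        dm += g / g.sum()  # each blob integrates to exactly 1
    return dm

# The estimated count is simply the sum (discrete integral) of the map.
dm = gaussian_density_map([(10, 10), (30, 40)], (64, 64))
count = dm.sum()  # ≈ 2.0 for two annotated heads
```

A model like CSRNet is trained to regress such maps from images; at test time its predicted map is summed the same way to obtain the count.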
DeCAF, an open-source implementation of deep convolutional activation features, along with all associated network parameters, is released to enable vision researchers to experiment with deep representations across a range of visual concept learning paradigms.
A series of experiments on different recognition tasks, using the publicly available code and model of the OverFeat network trained for object classification on ILSVRC13, suggests that features obtained from deep learning with convolutional nets should be the primary candidate for most visual recognition tasks.
A novel translation-invariant visual memory is proposed for recalling and identifying interesting scenes; a three-stage architecture of long-term, short-term, and online learning is then designed that achieves much higher accuracy than state-of-the-art algorithms on challenging robotic interestingness datasets.
These networks represent an image as a pooled outer product of features derived from two CNNs, capturing localized feature interactions in a translationally invariant manner, and can be trained from scratch on the ImageNet dataset, offering consistent improvements over the baseline architecture.
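The pooled outer product described above (bilinear pooling) can be sketched in a few lines. This is an illustrative NumPy version, not the paper's implementation; the signed-square-root and L2 normalisation steps are a common post-processing choice for bilinear descriptors:

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Bilinear image descriptor: outer product of two CNN feature
    streams at each spatial location, sum-pooled over locations
    (sum-pooling discards position, giving translation invariance).
    feat_a: (H*W, C1), feat_b: (H*W, C2) -- per-location features.
    Returns a (C1*C2,) vector."""
    pooled = feat_a.T @ feat_b          # (C1, C2): sum of outer products
    vec = pooled.reshape(-1)
    # Signed square-root, then L2 normalisation
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + 1e-12)

# Example: 7x7 feature maps with 8 and 16 channels -> 128-d descriptor
desc = bilinear_pool(np.random.rand(49, 8), np.random.rand(49, 16))
```

In practice `feat_a` and `feat_b` come from the last convolutional layers of the two CNN streams, and the descriptor feeds a linear classifier.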
This report describes training VGGNets on the large-scale Places205 dataset using a multi-GPU extension of the Caffe toolbox with high computational efficiency; the trained models achieve state-of-the-art performance on three datasets.
A multi-resolution CNN architecture that captures visual content and structure at multiple levels is proposed, along with two knowledge-guided disambiguation techniques designed to deal with the problem of label ambiguity.
Thorough experimental evaluation shows that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition, and can enable deployment in resource-constrained scenarios such as limited computing power and/or lower bandwidth.
This work studies scene recognition from 3D point cloud (or voxel) data, shows that it greatly outperforms methods based on 2D bird's-eye views, and advocates multi-task learning as a way to improve scene recognition.
This paper proposes a novel pretext task to address the self-supervised video representation learning problem, inspired by the observation that the human visual system is sensitive to rapidly changing contents in the visual field and needs only impressions of rough spatial locations to understand visual content.