3260 papers • 126 benchmarks • 313 datasets
CSRNet, a network for Congested Scene Recognition, is proposed as a data-driven deep learning method that understands highly congested scenes, performs accurate count estimation, and produces high-quality density maps.
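The core idea behind density-map crowd counting is that the count equals the integral of the map. A minimal sketch, assuming the standard ground-truth construction (the function name, `sigma`, and shapes here are illustrative, not CSRNet's actual code):

```python
import numpy as np

def gaussian_density_map(points, shape, sigma=4.0):
    """Ground-truth density map for crowd counting: place a
    normalised Gaussian at each annotated head location, so the
    whole map integrates to the person count.
    points: list of (row, col) head annotations; shape: (H, W)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dm = np.zeros(shape, dtype=np.float64)
    for (r, c) in points:
        g = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
        dm += g / g.sum()  # each blob integrates to exactly 1
    return dm

# The estimated count is simply the sum (discrete integral) of the map.
dm = gaussian_density_map([(10, 10), (30, 40)], (64, 64))
count = dm.sum()  # ≈ 2.0 for two annotated heads
```

A model like CSRNet is trained to regress such maps from images; at test time its predicted map is summed the same way to obtain the count.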
DeCAF, an open-source implementation of deep convolutional activation features, along with all associated network parameters, is released to enable vision researchers to experiment with deep representations across a range of visual concept learning paradigms.
A series of experiments on different recognition tasks, using the publicly available code and model of the OverFeat network trained for object classification on ILSVRC13, suggests that features obtained from deep learning with convolutional nets should be the primary candidate for most visual recognition tasks.
A novel translation-invariant visual memory is proposed for recalling and identifying interesting scenes; a three-stage architecture of long-term, short-term, and online learning is then designed that achieves much higher accuracy than state-of-the-art algorithms on challenging robotic interestingness datasets.
These networks represent an image as a pooled outer product of features derived from two CNNs, capturing localized feature interactions in a translationally invariant manner, and can be trained from scratch on the ImageNet dataset, offering consistent improvements over the baseline architecture.
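The pooled outer product described above (bilinear pooling) can be sketched in a few lines. This is an illustrative NumPy version, not the paper's implementation; the signed-square-root and L2 normalisation steps are a common post-processing choice for bilinear descriptors:

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Bilinear image descriptor: outer product of two CNN feature
    streams at each spatial location, sum-pooled over locations
    (sum-pooling discards position, giving translation invariance).
    feat_a: (H*W, C1), feat_b: (H*W, C2) -- per-location features.
    Returns a (C1*C2,) vector."""
    pooled = feat_a.T @ feat_b          # (C1, C2): sum of outer products
    vec = pooled.reshape(-1)
    # Signed square-root, then L2 normalisation
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + 1e-12)

# Example: 7x7 feature maps with 8 and 16 channels -> 128-d descriptor
desc = bilinear_pool(np.random.rand(49, 8), np.random.rand(49, 16))
```

In practice `feat_a` and `feat_b` come from the last convolutional layers of the two CNN streams, and the descriptor feeds a linear classifier.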
This report describes training VGGNets on the large-scale Places205 dataset using a multi-GPU extension of the Caffe toolbox with high computational efficiency; the trained models achieve state-of-the-art performance on three datasets.
A multi-resolution CNN architecture that captures visual content and structure at multiple levels is proposed, along with two knowledge-guided disambiguation techniques designed to deal with the problem of label ambiguity.
Thorough experimental evaluation shows that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition, and can enable deployment in resource-constrained scenarios such as limited computing power and/or lower bandwidth.
This work studies scene recognition from 3D point cloud (or voxel) data, shows that it greatly outperforms methods based on 2D bird's-eye views, and advocates multi-task learning as a way to improve scene recognition.
This paper proposes a novel pretext task to address the self-supervised video representation learning problem, inspired by the observation that the human visual system is sensitive to rapidly changing contents in the visual field and needs only impressions of rough spatial locations to understand visual content.