3260 papers • 126 benchmarks • 313 datasets
Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, state-of-the-art results are tracked separately under each of those two component tasks. (Image credit: TensorFlow Object Detection API)
These leaderboards are used to track progress in object recognition.
Use these libraries to find object recognition models and implementations.
The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
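A minimal NumPy sketch of dense connectivity, under simplifying assumptions: each "layer" is a random 1x1 convolution followed by ReLU (real DenseNet layers use batch norm, ReLU and 3x3 convolutions), and `dense_block` and `growth_rate` are illustrative names. The point it shows is that every layer receives the concatenation of the block input and all earlier outputs, and each layer adds only `growth_rate` new channels.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Dense connectivity sketch.

    x: feature map of shape (channels, height, width).
    Each layer sees the channel-wise concatenation of x and all previous
    layer outputs, and emits `growth_rate` new channels. The layer itself is
    a hypothetical stand-in: a random 1x1 convolution followed by ReLU.
    """
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)  # (C_in, H, W), grows each step
        w = rng.standard_normal((growth_rate, inp.shape[0])) * 0.1
        # A 1x1 convolution is a matrix multiply over the channel axis.
        out = np.maximum(np.einsum("oc,chw->ohw", w, inp), 0.0)
        features.append(out)
    # The block output keeps the input and every intermediate feature map.
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
y = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(y.shape)  # (64, 8, 8): 16 input channels + 4 * 12 new channels
```

Because earlier feature maps are passed forward unchanged rather than recomputed, each layer can stay narrow, which is consistent with the reduced parameter count claimed above.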
This work proposes a deep convolutional neural network architecture, codenamed Inception, that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
The proposed Residual Attention Network is a convolutional neural network that uses an attention mechanism, can be combined with state-of-the-art feed-forward network architectures and trained end to end, and can easily be scaled up to hundreds of layers.
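A minimal sketch of the "attention residual learning" combination used in Residual Attention Networks, H(x) = (1 + M(x)) · T(x), where T(x) is the trunk branch output and M(x) is a soft mask in [0, 1]. This formula is recalled from the paper rather than stated above, so treat it as an assumption; the mask branch itself is not modeled here, and `mask_logits` is a hypothetical stand-in for its pre-sigmoid output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_residual(trunk_features, mask_logits):
    """Combine trunk features T(x) with a soft mask M(x) = sigmoid(logits).

    Returns (1 + M(x)) * T(x): the mask can only amplify trunk features,
    and a mask near zero falls back to the plain trunk output, so stacking
    many attention modules does not suppress the features it passes on.
    """
    mask = sigmoid(mask_logits)
    return (1.0 + mask) * trunk_features

trunk = np.ones((4, 2, 2))
off = attention_residual(trunk, np.full((4, 2, 2), -50.0))  # mask ~ 0
half = attention_residual(trunk, np.zeros((4, 2, 2)))       # mask = 0.5
print(off.max(), half.max())
```

The `1 +` term acts like an identity shortcut, which is what lets such modules be stacked deeply, as the summary above claims.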
The results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
The role of scale in pre-trained deep networks is explored, providing ways to extrapolate networks tuned for limited scales to rather extreme ranges and demonstrating state-of-the-art results on massively-benchmarked face datasets.
This work identifies a vocabulary of forty-seven texture terms, uses them to describe a large dataset of patterns collected "in the wild", and shows that the texture descriptors it studies outperform specialized texture descriptors, not only on this problem but also on established material recognition datasets.
This work equips convolutional networks with a different pooling strategy, “spatial pyramid pooling”, to eliminate the requirement of a fixed-size input image, and develops a new network structure, called SPP-net, which can generate a fixed-length representation regardless of image size/scale.
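A minimal NumPy sketch of spatial pyramid pooling: a (C, H, W) feature map is max-pooled over an n x n grid of bins for each pyramid level n, and the results are concatenated. The pyramid levels (1, 2, 4) and the bin-boundary rule below are assumptions chosen for illustration, not necessarily SPP-net's exact configuration. What matters is that the output length C · Σ n² does not depend on H or W.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    For each level n, the map is split into an n x n grid of (possibly
    overlapping) bins and each bin is max-pooled per channel. The output
    has length C * sum(n * n) for any H, W >= max(levels).
    """
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row bin i covers [floor(i*h/n), ceil((i+1)*h/n)).
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                pooled.append(fmap[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
small = rng.standard_normal((8, 7, 9))
large = rng.standard_normal((8, 13, 20))
a = spatial_pyramid_pool(small)
b = spatial_pyramid_pool(large)
print(a.shape, b.shape)  # (168,) (168,): 8 channels * (1 + 4 + 16) bins
```

Both inputs map to vectors of the same length, so a fully connected layer placed after this pooling step can accept images of any size.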
A new dataset is presented with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. It gathers images of complex everyday scenes containing common objects in their natural context.
A new model of natural textures based on the feature spaces of convolutional neural networks optimised for object recognition is introduced, showing that across layers the texture representations increasingly capture the statistical properties of natural images while making object information more and more explicit.
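This texture model summarizes each CNN layer by the Gram matrix of its feature maps, i.e. the correlations between channels summed over all spatial positions. A minimal NumPy sketch follows. The random array stands in for real CNN activations. The per-layer loss normalization 1 / (4 N² M²) is recalled from the texture-synthesis formulation and should be treated as an assumption. The function names are illustrative.

```python
import numpy as np

def gram_matrix(fmap):
    """Feature correlations for one layer of shape (C, H, W):
    G[i, j] = sum over positions k of F[i, k] * F[j, k], with F of shape (C, H*W)."""
    f = fmap.reshape(fmap.shape[0], -1)
    return f @ f.T

def layer_texture_loss(gen_fmap, ref_fmap):
    """Squared Gram-matrix difference, normalized by 4 * N^2 * M^2
    (N = number of channels, M = number of spatial positions)."""
    n = gen_fmap.shape[0]
    m = gen_fmap.shape[1] * gen_fmap.shape[2]
    diff = gram_matrix(gen_fmap) - gram_matrix(ref_fmap)
    return np.sum(diff ** 2) / (4.0 * n**2 * m**2)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 5, 5))  # stand-in for real CNN activations
# Shuffling spatial positions leaves the Gram matrix unchanged: it records
# which features co-occur, not where they occur.
perm = rng.permutation(25)
shuffled = fmap.reshape(8, -1)[:, perm].reshape(8, 5, 5)
print(np.allclose(gram_matrix(fmap), gram_matrix(shuffled)))  # True
print(layer_texture_loss(fmap, shuffled))
```

Discarding spatial layout in this way is what makes the representation a texture descriptor rather than a description of one particular image.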
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
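A minimal NumPy sketch contrasting the two downsampling options: a 2x2 max pool with stride 2 versus a learnable convolution with the same kernel size and stride. Both halve the spatial resolution. The pooling window applies a fixed max, while the convolution applies weights that a real network would learn; here those weights are random placeholders. Both functions are naive loop implementations for illustration, with no padding.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over (C, H, W) without padding."""
    c, h, w = x.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.empty((c, oh, ow))
    for i in range(oh):
        for j in range(ow):
            win = x[:, i * stride:i * stride + size, j * stride:j * stride + size]
            out[:, i, j] = win.max(axis=(1, 2))
    return out

def conv2d(x, weight, stride=1):
    """Strided convolution (cross-correlation, as in most DL frameworks)
    without padding. x: (C_in, H, W); weight: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = weight.shape
    _, h, w = x.shape
    oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    out = np.empty((c_out, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract over (C_in, k, k) to get one value per output channel.
            out[:, i, j] = np.tensordot(weight, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32, 32))
pooled = max_pool2d(x)                          # fixed max over each 2x2 window
w = rng.standard_normal((16, 16, 2, 2)) * 0.1   # would be learned in practice
strided = conv2d(x, w, stride=2)
print(pooled.shape, strided.shape)  # (16, 16, 16) (16, 16, 16)
```

The output shapes match, so a strided convolution can be dropped in wherever a pooling layer sat. The trade-off is extra parameters in exchange for a downsampling step the network learns.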