Fine-Grained Image Recognition

3260 papers • 126 benchmarks • 313 datasets

Benchmarks

These leaderboards are used to track progress in fine-grained image recognition.

Dataset

OVEN

CUB-200-2011

CUB Birds

Libraries

Use these libraries to find fine-grained image recognition models and implementations.

Datasets

Subtasks

No subtasks available.

Most implemented papers

Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization

Fan Zhang, Meng Li, G. Zhai, Yizhao Liu•Thu Mar 19 2020

Comprehensive experiments demonstrate that the multi-branch and multi-scale learning network, MMAL-Net, has good classification ability and robustness for images of different scales, and achieves state-of-the-art results on the CUB-200-2011, FGVC-Aircraft and Stanford Cars datasets.

136

CNFOOD-241-Chen

Crowd Activity Dataset

WikiChurches

0

Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition

Jianlong Fu, Jiebo Luo, Tao Mei, Heliang Zheng•Sat Sep 30 2017

Recognizing fine-grained categories (e.g., bird species) relies heavily on discriminative part localization and part-based fine-grained feature learning. Existing approaches predominantly solve these challenges independently, neglecting the fact that part localization (e.g., head of a bird) and fine-grained feature learning (e.g., head shape) are mutually correlated. In this paper, we propose a novel part learning approach based on a multi-attention convolutional neural network (MA-CNN), where part generation and feature learning can reinforce each other. MA-CNN consists of convolution, channel grouping and part classification sub-networks. The channel grouping network takes as input feature channels from convolutional layers, and generates multiple parts by clustering, weighting and pooling from spatially-correlated channels. The part classification network further classifies an image by each individual part, through which more discriminative fine-grained features can be learned. Two losses are proposed to guide the multi-task learning of channel grouping and part classification, which encourage MA-CNN to generate more discriminative parts from feature channels and learn better fine-grained features from parts in a mutually reinforcing way. MA-CNN does not need bounding box/part annotation and can be trained end-to-end. We incorporate the learned parts from MA-CNN with part-CNN for recognition, and show the best performances on three challenging published fine-grained datasets: CUB-Birds, FGVC-Aircraft and Stanford-Cars.

905 0

Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization

Qilong Wang, P. Li, Jiangtao Xie, Zilin Gao•Sun Dec 03 2017

This work proposes an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. It is much faster than EIG- or SVD-based methods because it involves only matrix multiplications, which are well suited to parallel implementation on GPUs.
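As an illustration of why such a method needs only matrix multiplications, the sketch below approximates the square root of a covariance matrix with a coupled Newton-Schulz iteration, using trace pre-normalization and post-compensation. It is a minimal NumPy sketch, not the authors' implementation: the function name `isqrt_normalize`, the default `num_iters=5`, and the choice of Newton-Schulz itself are assumptions made for this example.

```python
import numpy as np


def isqrt_normalize(cov, num_iters=5):
    """Approximate the matrix square root of a symmetric positive definite matrix.

    Hypothetical helper for illustration; uses only matrix multiplications.
    """
    d = cov.shape[0]
    eye = np.eye(d)

    # Pre-normalization: dividing by the trace puts all eigenvalues in (0, 1],
    # which is inside the convergence region of the iteration.
    trace = np.trace(cov)
    y = cov / trace
    z = eye

    # Coupled Newton-Schulz iteration: y tends to sqrt(cov / trace),
    # z tends to the inverse of that square root.
    for _ in range(num_iters):
        t = 0.5 * (3.0 * eye - z @ y)
        y = y @ t
        z = t @ z

    # Post-compensation: undo the trace scaling.
    return np.sqrt(trace) * y
```

Multiplying the returned matrix by itself should approximately recover `cov` for a well-conditioned input; poorly conditioned matrices may need more iterations.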

289 0

Destruction and Construction Learning for Fine-Grained Image Recognition

Tao Mei, Yalong Bai, Yue Chen, Wei Zhang•Fri May 31 2019

Delicate feature representation of object parts plays a critical role in fine-grained recognition. For example, experts can even distinguish fine-grained objects relying only on object parts according to professional knowledge. In this paper, we propose a novel "Destruction and Construction Learning" (DCL) method to increase the difficulty of fine-grained recognition and exercise the classification model to acquire expert knowledge. Besides the standard classification backbone network, another "destruction and construction" stream is introduced to carefully "destruct" and then "reconstruct" the input image, for learning discriminative regions and features. More specifically, for "destruction", we first partition the input image into local regions and then shuffle them by a Region Confusion Mechanism (RCM). To correctly recognize these destructed images, the classification network has to pay more attention to discriminative regions for spotting the differences. To compensate for the noise introduced by RCM, an adversarial loss, which distinguishes original images from destructed ones, is applied to reject noisy patterns. For "construction", a region alignment network, which tries to restore the original spatial layout of local regions, is then used to model the semantic correlation among local regions. By jointly training with parameter sharing, our proposed DCL injects more discriminative local details into the classification network. Experimental results show that our proposed framework achieves state-of-the-art performance on three standard benchmarks. Moreover, our proposed method does not need any external knowledge during training, and there is no computation overhead at inference time except the standard classification network feed-forwarding. Source code: https://github.com/JDAI-CV/DCL.
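The "destruction" step described above (partition into local regions, then shuffle) can be sketched roughly as follows. This is a simplified NumPy illustration rather than the code from the linked repository: the function name `region_confusion`, the grid size `n`, and the jitter range `k` are assumptions for this example, and the shuffle is done by adding bounded random offsets to region indices and sorting, so that regions tend to stay near their original positions.

```python
import numpy as np


def region_confusion(image, n, k, rng):
    """Shuffle an H x W x C image on an n x n grid of local regions.

    Hypothetical helper for illustration. `k` bounds the random offset added
    to each region index before sorting; k = 0 leaves the image unchanged.
    The image is cropped to a multiple of the grid size if needed.
    """
    h, w, c = image.shape
    ph, pw = h // n, w // n

    # View the (cropped) image as an (n, n, ph, pw, c) grid of patches.
    grid = image[: ph * n, : pw * n].reshape(n, ph, n, pw, c)
    grid = grid.transpose(0, 2, 1, 3, 4).copy()

    # Shuffle regions within each row of the grid.
    for j in range(n):
        order = np.argsort(np.arange(n) + rng.uniform(-k, k, size=n))
        grid[j] = grid[j, order]

    # Shuffle regions within each column of the grid.
    for i in range(n):
        order = np.argsort(np.arange(n) + rng.uniform(-k, k, size=n))
        grid[:, i] = grid[order, i]

    # Stitch the patches back into a single image.
    return grid.transpose(0, 2, 1, 3, 4).reshape(ph * n, pw * n, c)
```

A call such as `region_confusion(img, n=7, k=2.0, rng=np.random.default_rng(0))` returns an image with the same pixels rearranged region by region; the adversarial loss and region alignment network from the abstract are not part of this sketch.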

458 0

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Kristina Toutanova, Hexiang Hu, Yi Luan, Yang Chen, Urvashi Khandelwal, Mandar Joshi, Kenton Lee, Ming-Wei Chang•Tue Feb 21 2023

This study of state-of-the-art pre-trained models reveals large headroom in generalizing to the massive-scale label space and shows that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that have never been seen during fine-tuning.

92 0

PaLI-X: On Scaling up a Multilingual Vision and Language Model

Basil Mustafa, N. Houlsby, Mario Lucic, Xiaohua Zhai, Yuanzhong Xu, Alexander Kolesnikov, Lucas Beyer, Mostafa Dehghani, Anurag Arnab, M. Minderer, Arsha Nagrani, Radu Soricut, Sebastian Goodman, Yang Li, Siamak Shakeri, A. Piergiovanni, J. Djolonga, Michael Tschannen, Bo Pang, Soravit Changpinyo, Austin Waters, Gang Li, Hexiang Hu, Mandar Joshi, Kenton Lee, Yi Tay, Ceslee Montgomery, Piotr Padlewski, Xi Chen, A. Angelova, Jialin Wu, Carlos Riquelme Ruiz, Xiao Wang, Daniel M. Salz, Paulina Pietrzyk, Marvin Ritter, Filip Pavetic, Ibrahim M. Alabdulmohsin, J. Amelot, A. Steiner, Daniel Keysers, Keran Rong, Mojtaba Seyedhosseini•Sun May 28 2023

PaLI-X, a multilingual vision and language model, advances the state of the art on most vision-and-language benchmarks considered and exhibits emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.

255 0

Local Patch AutoAugment With Multi-Agent Collaboration

Xin Jin, Shiqi Lin, Tao Yu, Ruoyu Feng, Xin Li, Zhibo Chen•Fri Mar 19 2021

This paper proposes a more fine-grained automated data augmentation (DA) approach, dubbed Patch AutoAugment, which divides an image into a grid of patches and searches for the joint optimal augmentation policies for the patches, formulated as a multi-agent reinforcement learning (MARL) problem.
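To make the patch-level idea concrete, the sketch below applies a different augmentation to each cell of a grid. It covers only the "divide into patches and augment each one" part: the policy is supplied by hand here, whereas the paper searches for it with MARL, which is not implemented. The function name `apply_patch_policy`, the `OPS` table, and the operation names are assumptions made for this example.

```python
import numpy as np

# Hypothetical per-patch operations for uint8 images (illustration only).
OPS = {
    "identity": lambda p: p,
    "hflip": lambda p: p[:, ::-1],
    "vflip": lambda p: p[::-1, :],
    "invert": lambda p: 255 - p,
}


def apply_patch_policy(image, policy):
    """Apply one named operation per grid cell of an H x W x C uint8 image.

    `policy` is a g x g nested list of keys into OPS. Pixels that fall
    outside the g x g grid (when H or W is not divisible by g) are copied
    through unchanged.
    """
    g = len(policy)
    h, w = image.shape[:2]
    ph, pw = h // g, w // g

    out = image.copy()
    for r in range(g):
        for c in range(g):
            ys = slice(r * ph, (r + 1) * ph)
            xs = slice(c * pw, (c + 1) * pw)
            out[ys, xs] = OPS[policy[r][c]](image[ys, xs])
    return out
```

In a learned setting, each grid cell's entry in `policy` would be chosen by its own agent rather than written by hand.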

16 0

Hawkeye: A PyTorch-based Library for Fine-Grained Image Recognition with Deep Learning

Jiabei He, Yang Shen, Xiu-Shen Wei, Ye Wu•Fri Oct 13 2023

Hawkeye is designed with a modular architecture that emphasizes high-quality code and human-readable configuration, providing a comprehensive solution for fine-grained image recognition (FGIR) tasks. It is presented as the first open-source PyTorch-based library dedicated to FGIR.

0 0

DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment

Chang Liu, Yu Cao, Yan Luo, Guanling Chen, V. Vokkarane, Yunsheng Ma•Tue May 24 2016

A new Convolutional Neural Network (CNN)-based food image recognition algorithm is proposed to improve the accuracy of dietary assessment by analyzing food images captured by mobile devices (e.g., smartphones).

274 0

Piecewise Classifier Mappings: Learning Fine-Grained Learners for Novel Categories With Few Examples

Chunhua Shen, Jianxin Wu, Xiu-Shen Wei, Lingqiao Liu, Peng Wang•Thu May 10 2018

An end-to-end trainable deep network, inspired by state-of-the-art fine-grained recognition models and tailored for the few-shot fine-grained (FSFG) task, is proposed. It generates the decision boundary by learning a set of more attainable sub-classifiers in a more parameter-economic way.

136 0
