3260 papers • 126 benchmarks • 313 datasets
This task has no description! Would you like to contribute one?
(Image credit: Papersgraph)
These leaderboards are used to track progress in fine-grained-image-recognition-2
Use these libraries to find fine-grained-image-recognition-2 models and implementations
No subtasks available.
Through the comprehensive experiments demonstrate that the multi-branch and multi-scale learning network, MMAL-Net, has good classification ability and robustness for images of different scales and can achieves state-of-the-art results on CUB-200-2011, FGVC-Aircraft and Stanford Cars datasets.
Recognizing fine-grained categories (e.g., bird species) highly relies on discriminative part localization and part-based fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that part localization (e.g., head of a bird) and fine-grained feature learning (e.g., head shape) are mutually correlated. In this paper, we propose a novel part learning approach by a multi-attention convolutional neural network (MA-CNN), where part generation and feature learning can reinforce each other. MA-CNN consists of convolution, channel grouping and part classification sub-networks. The channel grouping network takes as input feature channels from convolutional layers, and generates multiple parts by clustering, weighting and pooling from spatially-correlated channels. The part classification network further classifies an image by each individual part, through which more discriminative fine-grained features can be learned. Two losses are proposed to guide the multi-task learning of channel grouping and part classification, which encourages MA-CNN to generate more discriminative parts from feature channels and learn better fine-grained features from parts in a mutual reinforced way. MA-CNN does not need bounding box/part annotation and can be trained end-to-end. We incorporate the learned parts from MA-CNN with part-CNN for recognition, and show the best performances on three challenging published fine-grained datasets, e.g., CUB-Birds, FGVC-Aircraft and Stanford-Cars.
This work proposes an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks, which is much faster than EIG or SVD based methods, since it involves only matrix multiplications, suitable for parallel implementation on GPU.
Delicate feature representation about object parts plays a critical role in fine-grained recognition. For example, experts can even distinguish fine-grained objects relying only on object parts according to professional knowledge. In this paper, we propose a novel "Destruction and Construction Learning" (DCL) method to enhance the difficulty of fine-grained recognition and exercise the classification model to acquire expert knowledge. Besides the standard classification backbone network, another "destruction and construction" stream is introduced to carefully "destruct" and then "reconstruct" the input image, for learning discriminative regions and features. More specifically, for "destruction", we first partition the input image into local regions and then shuffle them by a Region Confusion Mechanism (RCM). To correctly recognize these destructed images, the classification network has to pay more attention to discriminative regions for spotting the differences. To compensate the noises introduced by RCM, an adversarial loss, which distinguishes original images from destructed ones, is applied to reject noisy patterns introduced by RCM. For "construction", a region alignment network, which tries to restore the original spatial layout of local regions, is followed to model the semantic correlation among local regions. By jointly training with parameter sharing, our proposed DCL injects more discriminative local details to the classification network. Experimental results show that our proposed framework achieves state-of-the-art performance on three standard benchmarks. Moreover, our proposed method does not need any external knowledge during training, and there is no computation overhead at inference time except the standard classification network feed-forwarding. Source code: https://github.com/JDAI-CV/DCL.
This study on state-ofthe-art pre-trained models reveals large headroom in generalizing to the massive-scale label space and shows that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that have never been seen during fine-tuning.
PaLI-X, a multilingual vision and language model, advances the state-of-the-art on most vision-and-language benchmarks considered and observes emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.
This paper proposes a more fine-grained automated DA approach, dubbed Patch AutoAugment, to divide an image into a grid of patches and search for the joint optimal augmentation policies for the patches as a multi-agent reinforcement learning (MARL) problem.
Hawkeye is designed with a modular architecture, emphasizing high-quality code and human-readable configuration, providing a comprehensive solution for FGIR tasks, and represents the first open-source PyTorch-based library dedicated to FGIR.
A new Convolutional Neural Network CNN-based food image recognition algorithm is proposed to improve the accuracy of dietary assessment by analyzing the food images captured by mobile devices e.g., smartphone.
An end-to-end trainable deep network inspired by the state-of-the-art fine-grained recognition model and is tailored for the FSFG task is proposed, which generates the decision boundary via learning a set of more attainable sub-classifiers in a more parameter-economic way.
Adding a benchmark result helps the community track progress.