3260 papers • 126 benchmarks • 313 datasets
This task does not yet have a description.
These leaderboards are used to track progress in fine-grained-visual-recognition-7.
No benchmarks are currently available.
Use these libraries to find fine-grained-visual-recognition-7 models and implementations.
No subtasks available.
This work proposes a novel Attribute-Mask RCNN model to jointly perform instance segmentation and localized attribute recognition, and provides a new evaluation metric for the task.
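The paper's own code is not shown here; as a loose illustrative sketch of sharing ROI features between a mask head and an attribute head (module names, channel sizes, and the attribute count are assumptions, not the authors' Attribute-Mask RCNN), a minimal PyTorch example might look like this:

```python
import torch
import torch.nn as nn

class MaskAndAttributeHeads(nn.Module):
    """Toy sketch: parallel heads over shared ROI-pooled features.

    Not the paper's Attribute-Mask RCNN; just an illustration of predicting
    an instance mask and localized attributes from the same ROI features.
    """
    def __init__(self, in_channels=256, num_classes=10, num_attributes=20):
        super().__init__()
        # Mask head: small FCN that upsamples ROI features to per-class masks.
        self.mask_head = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(),
            nn.Conv2d(256, num_classes, 1),
        )
        # Attribute head: global pooling + linear layer, multi-label logits.
        self.attr_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_channels, num_attributes),
        )

    def forward(self, roi_features):          # (N_rois, C, H, W)
        masks = self.mask_head(roi_features)  # (N_rois, num_classes, 2H, 2W)
        attrs = self.attr_head(roi_features)  # (N_rois, num_attributes)
        return masks, attrs

rois = torch.randn(8, 256, 14, 14)            # e.g. ROI-aligned features
masks, attrs = MaskAndAttributeHeads()(rois)
```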
These networks represent an image as a pooled outer product of features derived from two CNNs, capturing localized feature interactions in a translationally invariant manner; they can be trained from scratch on the ImageNet dataset, offering consistent improvements over the baseline architecture.
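A minimal sketch of that pooled outer product, assuming PyTorch and two already-computed feature maps (the signed square-root and L2 normalization are the usual post-processing for bilinear descriptors; the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat_a, feat_b):
    """Pooled outer product of two CNN feature maps (bilinear-CNN-style sketch).

    feat_a: (B, Ca, H, W) and feat_b: (B, Cb, H, W) from two (or the same) CNNs.
    Returns a (B, Ca*Cb) descriptor: outer products averaged over locations,
    followed by signed square-root and L2 normalization.
    """
    B, Ca, H, W = feat_a.shape
    Cb = feat_b.shape[1]
    a = feat_a.reshape(B, Ca, H * W)
    b = feat_b.reshape(B, Cb, H * W)
    x = torch.bmm(a, b.transpose(1, 2)) / (H * W)          # (B, Ca, Cb)
    x = x.reshape(B, Ca * Cb)
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-12)   # signed sqrt
    return F.normalize(x, dim=1)                           # L2 normalize

# Example with random tensors standing in for two CNN streams.
fa, fb = torch.randn(2, 64, 7, 7), torch.randn(2, 64, 7, 7)
desc = bilinear_pool(fa, fb)   # (2, 4096)
```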
A deep siamese architecture is presented that, when trained on positive and negative pairs of images, learns an embedding that accurately approximates the ranking of images in order of visual similarity.
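A minimal sketch of such a siamese setup, assuming PyTorch, a toy embedding CNN, and a contrastive loss over positive/negative pairs (the architecture and margin are placeholders, not the paper's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Tiny stand-in for a CNN branch; both siamese branches share weights."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)

def contrastive_loss(za, zb, same, margin=0.5):
    """Pull positive pairs together, push negatives beyond a margin."""
    d = (za - zb).pow(2).sum(dim=1).sqrt()
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

net = EmbeddingNet()
xa, xb = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
same = torch.tensor([1., 0., 1., 0.])       # 1 = visually similar pair
loss = contrastive_loss(net(xa), net(xb), same)
```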
The proposed MPN-COV amounts to a robust covariance estimator, very suitable for scenarios of high dimension and small sample size, and can be regarded as a Power-Euclidean metric between covariances, effectively exploiting their geometry.
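A rough sketch of the covariance-plus-matrix-power idea, assuming PyTorch feature maps; the actual MPN-COV implementations use more careful normalization (and later work replaces the eigendecomposition with faster square-root iterations):

```python
import torch

def mpn_cov(features, power=0.5, eps=1e-5):
    """Matrix-power-normalized covariance pooling, simplified sketch.

    features: (B, C, H, W) CNN feature maps.
    Returns (B, C, C): per-image covariance of channel features across
    spatial positions, raised to a fractional matrix power.
    """
    B, C, H, W = features.shape
    x = features.reshape(B, C, H * W)
    x = x - x.mean(dim=2, keepdim=True)
    cov = torch.bmm(x, x.transpose(1, 2)) / (H * W - 1)       # (B, C, C)
    cov = cov + eps * torch.eye(C, device=features.device)    # regularize
    evals, evecs = torch.linalg.eigh(cov)                     # symmetric PSD
    evals = evals.clamp_min(eps).pow(power)                   # matrix power
    return evecs @ torch.diag_embed(evals) @ evecs.transpose(1, 2)

pooled = mpn_cov(torch.randn(2, 64, 7, 7))    # (2, 64, 64) descriptors
```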
This work proposes a novel approach explicitly designed to address a number of subtle yet important issues that have stymied earlier DML algorithms: it maintains an explicit model of the distributions of the different classes in representation space, employs this knowledge to adaptively assess similarity, and achieves local discrimination by penalizing class distribution overlap.
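The following is a heavily simplified, hypothetical sketch in the spirit of distribution-aware discrimination (not the paper's exact loss): each class in the batch is summarized by its mean embedding and a shared variance, and overlap between class distributions is penalized via a softmax over Gaussian-style likelihoods:

```python
import torch
import torch.nn.functional as F

def class_overlap_loss(embeddings, labels, alpha=1.0):
    """Simplified, illustrative distribution-overlap penalty.

    Each class present in the batch is modeled by its mean embedding and a
    shared variance; every sample is pushed to be more likely under its own
    class model than under the others, with margin alpha.
    """
    classes = labels.unique()
    means = torch.stack([embeddings[labels == c].mean(dim=0) for c in classes])
    d2 = torch.cdist(embeddings, means).pow(2)                        # (N, K)
    target = (labels.unsqueeze(1) == classes.unsqueeze(0)).long().argmax(dim=1)
    var = d2.gather(1, target.unsqueeze(1)).mean()                    # shared variance
    margin = alpha * F.one_hot(target, num_classes=len(classes)).float()
    logits = -d2 / (2 * var + 1e-8) - margin                          # penalize own-class logit
    return F.cross_entropy(logits, target)

emb = torch.randn(16, 32, requires_grad=True)
labels = torch.randint(0, 4, (16,))
loss = class_overlap_loss(emb, labels)
```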
A cross-layer bilinear pooling approach is proposed to capture the inter-layer part feature relations, which results in superior performance compared with other bilinear pooling based approaches.
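A short sketch of pooling the outer product across two different layers of one network, assuming PyTorch and that the deeper map is simply upsampled to match the shallower one (projection layers and normalization choices in the actual method may differ):

```python
import torch
import torch.nn.functional as F

def cross_layer_bilinear(feat_low, feat_high):
    """Pooled outer product between feature maps from different layers of one
    CNN, capturing inter-layer part relations (shapes/upsampling are assumptions).
    """
    B, C1, H, W = feat_low.shape
    # Bring the deeper (coarser) map to the shallower map's resolution.
    feat_high = F.interpolate(feat_high, size=(H, W), mode="bilinear",
                              align_corners=False)
    C2 = feat_high.shape[1]
    x = torch.bmm(feat_low.reshape(B, C1, H * W),
                  feat_high.reshape(B, C2, H * W).transpose(1, 2)) / (H * W)
    x = x.reshape(B, C1 * C2)
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-12)
    return F.normalize(x, dim=1)

desc = cross_layer_bilinear(torch.randn(2, 64, 14, 14), torch.randn(2, 128, 7, 7))
```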
This paper proposes a competing novel CNN architecture, called MILDNet, whose merit is being vastly more compact (about 3 times smaller); inspired by the fact that successive CNN layers represent the image with increasing levels of abstraction, the authors compress their deep ranking model to a single CNN by coupling activations from multiple intermediate layers along with the last layer.
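An illustrative sketch of coupling globally pooled activations from several intermediate layers with the last layer inside a single compact CNN, assuming PyTorch; the layer sizes and embedding dimension are made up, and this is not the released MILDNet:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerEmbedding(nn.Module):
    """Single CNN whose embedding couples pooled activations from several
    intermediate layers together with the last layer (illustrative only).
    """
    def __init__(self, dim=128):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
        self.fc = nn.Linear(32 + 64 + 128, dim)

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        f3 = self.block3(f2)
        gap = lambda f: F.adaptive_avg_pool2d(f, 1).flatten(1)
        z = torch.cat([gap(f1), gap(f2), gap(f3)], dim=1)   # couple all levels
        return F.normalize(self.fc(z), dim=1)

emb = MultiLayerEmbedding()(torch.randn(2, 3, 224, 224))    # (2, 128)
```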
A unified attention block, the X-Linear attention block, is introduced; it fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning.
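A loose, simplified sketch of a bilinear-pooling-driven attention block, assuming PyTorch; the element-wise (bilinear) query-key interaction drives both spatial and channel attention, but the projections and dimensions here are assumptions rather than the authors' X-Linear design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedBilinearAttention(nn.Module):
    """Bilinear-interaction attention sketch (not the authors' X-Linear code):
    the element-wise product of projected query and keys produces both spatial
    attention over the values and a channel-wise gate on the attended result.
    """
    def __init__(self, d_model=256, d_hidden=128):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_hidden)
        self.k_proj = nn.Linear(d_model, d_hidden)
        self.v_proj = nn.Linear(d_model, d_hidden)
        self.spatial = nn.Linear(d_hidden, 1)         # spatial attention scores
        self.channel = nn.Linear(d_hidden, d_hidden)  # channel-wise gate
        self.out = nn.Linear(d_hidden, d_model)

    def forward(self, query, keys):                   # query: (B, D), keys: (B, N, D)
        b = self.q_proj(query).unsqueeze(1) * self.k_proj(keys)   # bilinear interaction (B, N, H)
        attn = F.softmax(self.spatial(b), dim=1)                  # (B, N, 1)
        attended = (attn * self.v_proj(keys)).sum(dim=1)          # (B, H)
        gate = torch.sigmoid(self.channel(b.mean(dim=1)))         # (B, H) channel attention
        return self.out(gate * attended)                          # (B, D)

out = SimplifiedBilinearAttention()(torch.randn(2, 256), torch.randn(2, 36, 256))
```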
A novel dataset, FeatherV1, containing 28,272 images of feathers categorized by 595 bird species, was created to perform taxonomic identification of bird species from a single feather, which can be applied in amateur and professional ornithology.
This work uses CLIP (Contrastive Language-Image Pre-Training) to train a neural network on a variety of art image and text pairs, enabling it to learn directly from raw descriptions of images or, if available, curated labels, with zero-shot capability.
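A minimal zero-shot sketch using the open-source openai/CLIP package (the image path and candidate text labels are hypothetical):

```python
import torch
import clip                     # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical artwork image and candidate labels (raw text descriptions).
image = preprocess(Image.open("artwork.jpg")).unsqueeze(0).to(device)
prompts = ["an impressionist painting", "a baroque painting",
           "a cubist painting", "a photograph of a sculpture"]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)          # image-text similarity
    probs = logits_per_image.softmax(dim=-1).cpu()

print(dict(zip(prompts, probs[0].tolist())))          # zero-shot label scores
```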