3260 papers • 126 benchmarks • 313 datasets
The problem of retrieving images from a database based on a multi-modal (image-text) query. Specifically, the query text prompts some modification to the query image, and the task is to retrieve images with the desired modifications.
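At retrieval time, the task reduces to nearest-neighbor search over database image embeddings using a query embedding composed from the image and the modification text. A minimal sketch, assuming precomputed embeddings and using a simple (unlearned) sum as a stand-in for the learned composition functions the papers below propose:

```python
import numpy as np

def compose_query(image_emb, text_emb):
    # Stand-in fusion: the normalized sum of the two embeddings.
    # Real systems learn this composition (e.g. TIRG, ComposeAE).
    q = image_emb + text_emb
    return q / np.linalg.norm(q)

def retrieve(query_emb, database_embs, k=3):
    # Rank database images by cosine similarity to the composed query.
    db = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    scores = db @ query_emb
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
img, txt = rng.normal(size=64), rng.normal(size=64)
database = rng.normal(size=(100, 64))
top_k = retrieve(compose_query(img, txt), database)
print(top_k)  # indices of the 3 most similar database images
```

Benchmarks for this task typically score such rankings with Recall@k over the returned index list.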
These leaderboards are used to track progress in Image Retrieval with Multi-Modal Query
Use these libraries to find Image Retrieval with Multi-Modal Query models and implementations
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
This work shows how a deep learning architecture equipped with a Relation Network (RN) module can implicitly discover and learn to reason about entities and their relations.
It is shown that FiLM layers are highly effective for visual reasoning (answering image-related questions that require a multi-step, high-level process), a task that has proven difficult for standard deep learning methods that do not explicitly model reasoning.
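FiLM's core operation is a feature-wise affine transform whose scale and shift parameters are predicted from the conditioning input (here, the question). A minimal sketch, with all shapes illustrative and random values standing in for predicted parameters:

```python
import numpy as np

def film(features, gamma, beta):
    # Feature-wise Linear Modulation: scale and shift each channel of the
    # CNN feature maps with parameters predicted from the conditioning text.
    # features: (channels, height, width); gamma, beta: (channels,)
    return gamma[:, None, None] * features + beta[:, None, None]

rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 4, 4))   # CNN feature maps
gamma = rng.normal(size=8)           # would be predicted from the question
beta = rng.normal(size=8)
modulated = film(feats, gamma, beta)
print(modulated.shape)  # (8, 4, 4)
```

Because the modulation is per-channel, the conditioning input can amplify or suppress entire feature maps, which is what supports the multi-step reasoning behavior described above.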
This paper proposes a new way to combine image and text through a residual connection that outperforms existing approaches on three different datasets: Fashion-200k, MIT-States, and a new synthetic dataset the authors create based on CLEVR.
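The combination described above (TIRG) gates the image features and adds a text-conditioned residual. A rough sketch of that idea, with randomly initialized weights standing in for the learned gating and residual branches:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32
# Hypothetical learned weights for the gating and residual branches.
W_gate = rng.normal(size=(d, 2 * d)) * 0.1
W_res = rng.normal(size=(d, 2 * d)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tirg_compose(image_feat, text_feat):
    # Concatenate image and text features, then combine:
    #   gated image features + text-conditioned residual.
    joint = np.concatenate([image_feat, text_feat])
    gate = sigmoid(W_gate @ joint)
    residual = W_res @ joint
    return gate * image_feat + residual

img, txt = rng.normal(size=d), rng.normal(size=d)
composed = tirg_compose(img, txt)
print(composed.shape)  # (32,)
```

The residual structure keeps the composed representation close to the original image embedding when the text asks for only a small modification.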
The proposed network, a joint network combining the CNN for ImageQA and the parameter prediction network, is trained end-to-end through back-propagation, with its weights initialized using a pre-trained CNN and GRU.
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites, and decomposes the visual-semantic embedding space into multiple concept-specific subspaces to facilitate structured browsing and attribute-feedback product retrieval.
This paper proposes an autoencoder based model, ComposeAE, to learn the composition of image and text query for retrieving images, which is able to outperform the state-of-the-art method TIRG on three benchmark datasets, namely: MIT-States, Fashion200k and Fashion IQ.
A unified learning approach that simultaneously models coarse- and fine-grained retrieval by considering multi-grained uncertainty is introduced; it prevents the model from pushing away potential candidates in the early stage and thus improves the recall rate.