Zero-Shot Cross-Modal Retrieval is the task of finding relevant items across different modalities without having seen any training examples for the task: given an image, for example, retrieve the matching text, or vice versa. The main challenge is known as the heterogeneity gap: because items from different modalities have different data types, the similarity between them cannot be measured directly. The majority of methods published to date therefore bridge this gap by learning a shared latent representation space in which the similarity between items from different modalities can be measured.

Source: Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
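Once both modalities live in a shared latent space, retrieval reduces to nearest-neighbour search over embeddings. Below is a minimal sketch of that inference step, assuming image and text embeddings have already been produced by a pair of jointly trained encoders; the 512-dimensional space and the random vectors standing in for real encoder outputs are purely illustrative:

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Project embeddings onto the unit sphere so the dot product
    of two vectors equals their cosine similarity."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def retrieve(query_embs: np.ndarray, gallery_embs: np.ndarray, k: int = 5) -> np.ndarray:
    """For each query embedding, return the indices of the k most
    similar gallery embeddings in the shared latent space."""
    sims = l2_normalize(query_embs) @ l2_normalize(gallery_embs).T  # (n_query, n_gallery)
    return np.argsort(-sims, axis=1)[:, :k]

# Toy example: 3 image queries against 10 candidate captions, both
# already mapped into a hypothetical 512-dim shared space.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(3, 512))
text_embs = rng.normal(size=(10, 512))
print(retrieve(image_embs, text_embs, k=3))  # top-3 caption indices per image
```

Text-to-image retrieval is the same call with the arguments swapped. Dual-encoder models such as CLIP perform exactly this cosine-similarity ranking at inference time; methods differ mainly in how the two encoders are trained to align the shared space.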
These leaderboards are used to track progress in Zero-Shot Cross-Modal Retrieval.
Use these libraries to find Zero-Shot Cross-Modal Retrieval models and implementations.