3260 papers • 126 benchmarks • 313 datasets
Cross-Modal Retrieval is the task of retrieving relevant items across different modalities, such as image-text, video-text, and audio-text retrieval. The main challenge is the modality gap: representations from different modalities are not directly comparable. The key solution is to project the different modalities into a shared subspace, so that the newly generated features can be compared with standard distance metrics such as cosine distance and Euclidean distance.

References:
[1] Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
[2] Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-modal Retrieval
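The retrieval step described above can be sketched as follows. This is a minimal illustration, not any specific paper's method: it assumes the embeddings have already been projected into a shared subspace by some learned encoders, and simply ranks gallery items by cosine similarity to a query. The names `retrieve` and `cosine_similarity`, and the toy vectors, are made up for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors in the shared subspace."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_emb, gallery_embs):
    """Rank gallery items (e.g. images) by similarity to a query (e.g. text).

    Returns gallery indices sorted from best match to worst.
    """
    scores = [cosine_similarity(query_emb, g) for g in gallery_embs]
    return np.argsort(scores)[::-1]

# Toy example: 4-d embeddings assumed to lie in a shared text-image space.
text_query = np.array([0.1, 0.9, 0.0, 0.2])
image_gallery = np.array([
    [0.0, 1.0, 0.1, 0.1],  # semantically close to the query
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.1, 0.9, 0.3],
])
ranking = retrieve(text_query, image_gallery)  # best-matching index first
```

In practice the shared subspace is produced by training modality-specific encoders (e.g. with a triplet or contrastive loss) so that matching pairs end up close under exactly this kind of similarity ranking.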
(Image credit: Papersgraph)
These leaderboards are used to track progress in Cross-Modal Retrieval.
Use these libraries to find Cross-Modal Retrieval models and implementations.