
Flickr30k

Introduced in: From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions (2013)

About this Dataset

The Flickr30k dataset contains roughly 31,000 images collected from Flickr. Each image is paired with 5 reference sentences (captions) written by human annotators.
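Because every image carries several captions, a common first step is to regroup the caption annotations by image. The snippet below is a minimal sketch under one assumption: each annotation line has the tab-separated form `<image>.jpg#<k>` followed by the caption, where `k` indexes that image's captions (the layout torchvision's `Flickr30k` class parses). The function name `group_captions` and the sample lines are hypothetical and made up for illustration. They are not real annotations.

```python
from collections import defaultdict


def group_captions(lines):
    """Group token-file lines into {image_filename: [caption, ...]}.

    Assumes each line looks like "<image>.jpg#<k>\t<caption>", where k is
    the caption index for that image (0-4 when there are 5 captions).
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, caption = line.split("\t", 1)
        image_name, _, _ = key.rpartition("#")  # drop the "#k" suffix
        captions[image_name].append(caption)
    return dict(captions)


# Hypothetical sample lines (made up for illustration, not real annotations).
sample = [f"img_a.jpg#{k}\tcaption {k} for image a" for k in range(5)]
sample += [f"img_b.jpg#{k}\tcaption {k} for image b" for k in range(5)]

grouped = group_captions(sample)
for name, caps in grouped.items():
    print(name, len(caps))  # each image should end up with 5 captions
```

Splitting on the first tab only, and on the last `#`, keeps captions that themselves contain tabs or `#` characters intact.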

Source: Guiding Long-Short Term Memory for Image Caption Generation

Image Source: Dual-Path Convolutional Image-Text Embedding with Instance Loss

Source: From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

Dataset Variants

Flickr30k, Flickr, Flickr30k Captions test, Flickr30K 1K test

Papers (1)

From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

This work proposes to use the visual denotations of linguistic expressions to define novel denotational similarity metrics, which are shown to be at least as beneficial as distributional similarities for two tasks that require semantic inference.

Dataset Loaders


pytorch/vision (PyTorch)

facebookresearch/ParlAI (PyTorch)

activeloopai/Hub (TensorFlow, PyTorch)

Tasks


Node Classification, Image Retrieval, Image Captioning, Cross-Modal Retrieval, Image-to-Text Retrieval, Phrase Grounding, Zero-shot Text-to-Image Retrieval, Zero-Shot Cross-Modal Retrieval, Semi Supervised Learning for Image Captioning, Video Description

Similar Datasets

MNIST

CelebA

JFT-300M

Statistics

Papers: 1

Tasks: 0

Introduced: 2013

License: Custom (research-only, non-commercial)

Modalities: Images, Texts

Languages: English