Microsoft COCO Captions: Data Collection and Evaluation Server

Published in

arXiv.org (2015)

TL;DR

This paper describes the Microsoft COCO Caption dataset and its evaluation server, which scores candidate captions using several popular metrics, including BLEU, METEOR, ROUGE, and CIDEr.

Abstract

In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When completed, the dataset will contain over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human-generated captions will be provided. To ensure consistency in evaluation of automatic caption generation algorithms, an evaluation server is used. The evaluation server receives candidate captions and scores them using several popular metrics, including BLEU, METEOR, ROUGE and CIDEr. Instructions for using the evaluation server are provided.
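
As a rough illustration of how a candidate caption can be scored against several human references, the sketch below computes clipped (modified) n-gram precision, the core quantity behind BLEU. It is not the evaluation server's code: the function names `ngrams` and `clipped_precision` are hypothetical, tokenization is a plain lowercase whitespace split (an assumption), and the brevity penalty and the averaging over n-gram orders that full BLEU applies are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision of one candidate against several references.

    Each candidate n-gram count is clipped to the maximum number of times
    that n-gram occurs in any single reference caption.
    """
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    if not cand_counts:
        return 0.0

    # For every n-gram, record its highest count across the references.
    max_ref = Counter()
    for ref in references:
        ref_counts = Counter(ngrams(ref.lower().split(), n))
        for gram, count in ref_counts.items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


if __name__ == "__main__":
    refs = ["a man rides a horse on a beach", "a person on a horse"]
    print(clipped_precision("a man riding a horse", refs, n=1))
    print(clipped_precision("a man riding a horse", refs, n=2))
```

Clipping is what stops a degenerate caption that repeats one common word from scoring well: each repeated word is credited only as often as it appears in the best-matching reference.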

Authors

C. L. Zitnick

19 papers

Xinlei Chen

9 papers

Piotr Dollár

12 papers

Ramakrishna Vedantam

3 papers

Tsung-Yi Lin

8 papers

Saurabh Gupta

5 papers

References (46 items)

Phrase-based Image Captioning

Combining Language and Vision with a Multimodal Skip-gram Model

Deep visual-semantic alignments for generating image descriptions

Simple Image Description Generator via a Linear Phrase-Based Approach

CIDEr: Consensus-based image description evaluation

Learning a Recurrent Visual Representation for Image Caption Generation

From captions to visual concepts and back

Long-term recurrent convolutional networks for visual recognition and description

Show and tell: A neural image caption generator

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

TreeTalk: Composition and Compression of Trees for Image Descriptions

Explain Images with Multimodal Recurrent Neural Networks

Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections

Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

Multimodal Neural Language Models

The Stanford CoreNLP Natural Language Processing Toolkit

Meteor Universal: Language Specific Translation Evaluation for Any Target Language

Comparing Automatic Evaluation Measures for Image Description

Nonparametric Method for Data-driven Image Captioning

Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world

Microsoft COCO: Common Objects in Context

AutoCaption: Automatic caption generation for personal photos

From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

BabyTalk: Understanding and Generating Simple Image Descriptions

Image Description using Visual Dependency Representations

Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics

Automatic Caption Generation for News Images

ImageNet classification with deep convolutional neural networks

Choosing Linguistics over Vision to Describe Images

Collective Generation of Natural Image Descriptions

Distributional Semantics in Technicolor

Midge: Generating Image Descriptions From Computer Vision Detections

Im2Text: Describing Images Using 1 Million Captioned Photographs

Corpus-Guided Sentence Generation of Natural Images

Every Picture Tells a Story: Generating Sentences from Images

ImageNet: A large-scale hierarchical image database

Re-evaluating the Role of Bleu in Machine Translation Research

ROUGE: A Package for Automatic Evaluation of Summaries

A Model for Learning the Semantics of Pictures

Bleu: a Method for Automatic Evaluation of Machine Translation

Learning the semantics of words and pictures

Long Short-Term Memory

WordNet: A Lexical Database for English

Déjà Image-Captions: A Corpus of Expressive Descriptions in Repetition

The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems

Matching Words and Pictures

Field of Study

Computer Science

Journal Information

Name

ArXiv

Volume

abs/1504.00325

Venue Information

Name

arXiv.org

Type

URL

https://arxiv.org

Alternate Names

ArXiv