Multimodal machine translation is the task of translating from multiple input modalities at once - for example, translating the sentence "a bird is flying over water" together with an image of a bird over water into German text. (Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
These leaderboards are used to track progress in Multimodal Machine Translation
Use these libraries to find Multimodal Machine Translation models and implementations
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by its successful application to English constituency parsing with both large and limited training data.
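At the heart of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Below is a minimal single-head NumPy sketch of that operation; the shapes and toy inputs are illustrative, not the paper's reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (len_q, len_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # weighted sum of values

# Toy usage: 5 query positions attending over 7 key/value positions.
Q, K, V = np.random.randn(5, 64), np.random.randn(7, 64), np.random.randn(7, 32)
out = scaled_dot_product_attention(Q, K, V)        # shape (5, 32)
```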
This dataset extends the Flickr30K dataset with i) German translations created by professional translators for a subset of the English descriptions, and ii) German descriptions crowdsourced independently of the original English descriptions.
The systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge are presented, namely phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data.
In this paper, we present nmtpy, a flexible Python toolkit based on Theano for training Neural Machine Translation and other neural sequence-to-sequence architectures. nmtpy decouples the specification of a network from the training and inference utilities to simplify the addition of a new architecture and reduce the amount of boilerplate code to be written. nmtpy has been used for LIUM's top-ranked submissions to the WMT Multimodal Machine Translation and News Translation tasks in 2016 and 2017.
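The decoupling the abstract describes can be illustrated with a toy skeleton: a model class carries only the architecture-specific loss, while a generic trainer is reused across architectures. All class and function names below are hypothetical, invented for illustration; they are not nmtpy's actual API.

```python
# Hypothetical sketch of decoupling a network specification from generic
# training utilities. These names are NOT nmtpy's real API.

class AttentiveNMT:
    """Network specification: defines only how a batch maps to a loss."""
    def __init__(self, vocab_size, hidden_size):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

    def loss(self, src_batch, tgt_batch):
        # A real model would build and evaluate a computation graph here;
        # a dummy value keeps the sketch self-contained and runnable.
        return float(len(src_batch) + len(tgt_batch))

def train(model, batches, epochs=1):
    """Generic training utility: reusable with any model exposing .loss()."""
    for epoch in range(epochs):
        total = sum(model.loss(src, tgt) for src, tgt in batches)
        print(f"epoch {epoch}: loss={total:.4f}")

# Adding a new architecture means writing a new model class only;
# the training loop is shared boilerplate, written once.
train(AttentiveNMT(vocab_size=30000, hidden_size=512), [([1, 2, 3], [4, 5])])
```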
A novel multimodal machine translation model is proposed that utilizes parallel visual and textual information, jointly optimizing a shared visual-language embedding and a translator; it achieves competitive state-of-the-art results on the Multi30K and Ambiguous COCO datasets.
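One common way to set up such joint optimization is a weighted sum of a translation loss and an embedding-alignment loss. The function below is a PyTorch sketch under that assumption; the cosine-based alignment term and the weight `alpha` are illustrative choices, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def joint_loss(sent_emb, img_emb, logits, targets, alpha=0.5):
    """Translation cross-entropy plus an alignment term that pulls matching
    sentence/image embeddings together.

    sent_emb, img_emb: (batch, dim) embeddings of sentence and image.
    logits: (n_tokens, vocab) decoder outputs; targets: (n_tokens,) token ids.
    """
    translation = F.cross_entropy(logits, targets)
    alignment = (1.0 - F.cosine_similarity(sent_emb, img_emb)).mean()
    return translation + alpha * alignment
```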
Compared to last year, the performance of the multimodal submissions improved, but text-only systems remain competitive.
A novel architecture called deepGRU, based on recent findings in the related task of neural image captioning, is explored; it leads to the best METEOR translation score for both the constrained (English, image) -> German and (English) -> French sub-tasks.
This work proposes to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model, and shows that the latent variable MMT formulation improves considerably over strong baselines.
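The latent-variable formulation can be sketched generically: a VAE-style module infers a Gaussian latent from concatenated text and image features and contributes a KL term to the training loss. The module below is an illustrative PyTorch sketch under those assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LatentFusion(nn.Module):
    """VAE-style latent variable over joint text/image features (generic sketch)."""
    def __init__(self, txt_dim, img_dim, z_dim):
        super().__init__()
        self.to_stats = nn.Linear(txt_dim + img_dim, 2 * z_dim)

    def forward(self, txt_feats, img_feats):
        stats = self.to_stats(torch.cat([txt_feats, img_feats], dim=-1))
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return z, kl  # z conditions the decoder; kl is added to the loss
```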
This study effectively combines two approaches to improve NMT for low-resource domains in the context of multimodal NMT, and explores how to take full advantage of pretrained word embeddings to better translate rare words.
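On the pretrained-embedding side, the basic mechanism is to initialize the model's embedding matrix from externally trained vectors, so rare words start from informative representations. The helper below is a hypothetical PyTorch sketch of that idea; `vocab` and `pretrained` are assumed inputs, and the procedure is not claimed to match the study's exact recipe.

```python
import numpy as np
import torch
import torch.nn as nn

def init_embeddings(vocab, pretrained, dim):
    """Build an embedding layer seeded from pretrained vectors where available.

    vocab: token -> row index; pretrained: token -> np.ndarray of shape (dim,).
    Rare words seen during embedding pretraining but infrequent in the
    parallel data still start from an informative vector.
    """
    weight = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
    for token, idx in vocab.items():
        if token in pretrained:
            weight[idx] = pretrained[token]  # copy the pretrained vector
    return nn.Embedding.from_pretrained(torch.from_numpy(weight), freeze=False)
```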