
SubEdits

Introduced in Can Automatic Post-Editing Improve NMT?

About this Dataset

SubEdits is a human-annotated post-editing dataset of neural machine translation (NMT) outputs. It was compiled from in-house NMT outputs and human post-edits of subtitles from Rakuten Viki. It covers English-to-German translation and contains 160k triplets, each consisting of an English source sentence, its German NMT output, and the human post-edit of that output.

Source: https://github.com/shamilcm/pedra

Image Source: Chollampatt et al.

Source: Can Automatic Post-Editing Improve NMT?

Dataset Variants

SubEdits

Papers (1)

Can Automatic Post-Editing Improve NMT?

The authors compile a larger corpus of human post-edits of English-to-German NMT output and show empirically that a state-of-the-art neural APE model trained on this corpus can significantly improve a strong in-domain NMT system, challenging the current understanding in the field.

Dataset Loaders


shamilcm/pedra (framework: none)

Tasks


Machine Translation

Automatic Post-Editing

Similar Datasets

GLUE

MultiNLI

Penn Treebank

Statistics

Papers: 1

Tasks: 64

Modalities: Texts

Languages: English, German