PARANMT-50M

Introduced in ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

About this Dataset

PARANMT-50M is a dataset for training paraphrastic sentence embeddings. It consists of more than 50 million English-English sentential paraphrase pairs.

Source: ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

Dataset Variants

PARANMT-50M

Papers (1)

ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

This work uses ParaNMT-50M, a dataset of more than 50 million English-English sentential paraphrase pairs, to train paraphrastic sentence embeddings that outperform all supervised systems on every SemEval semantic textual similarity competition. It also shows how the dataset can be used for paraphrase generation.

Tasks

Machine Translation

Semantic Textual Similarity

Paraphrase Generation

Similar Datasets

GLUE

MultiNLI

Penn Treebank

Statistics

Papers

1

Tasks

32

License

Unknown

Modalities

Texts

Languages

English