Cross-lingual document classification is the task of using data and models from a language with ample resources (e.g., English) to solve classification tasks in another, typically low-resource, language.
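A common way to operationalise this transfer is to embed documents from all languages into a shared multilingual space, train a classifier on the high-resource language only, and apply it unchanged to the low-resource language. Below is a minimal zero-shot sketch of that pattern; the encoder checkpoint, toy documents and labels are illustrative assumptions, not part of the page above.

```python
# Minimal zero-shot cross-lingual classification sketch (toy data).
# Assumes the `sentence-transformers` and `scikit-learn` packages; the model
# name below is one publicly available multilingual encoder, not the only choice.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Labelled documents exist only in the high-resource language (English here).
en_docs = ["The central bank raised interest rates.",
           "The striker scored twice in the final."]
en_labels = ["economy", "sports"]

# Unlabelled documents in the low-resource target language (German here).
de_docs = ["Der Stürmer erzielte zwei Tore im Finale."]

# 1. Embed both languages into the same multilingual space.
X_en = encoder.encode(en_docs)
X_de = encoder.encode(de_docs)

# 2. Train the classifier on the source language only.
clf = LogisticRegression(max_iter=1000).fit(X_en, en_labels)

# 3. Apply it unchanged to the target language (zero-shot transfer).
print(clf.predict(X_de))  # expected: ['sports']
```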
An architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts, using a single BiLSTM encoder with a shared byte-pair-encoding vocabulary for all languages, coupled with an auxiliary decoder and trained on publicly available parallel corpora.
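The encoder side of such an architecture is small enough to sketch: one embedding table over a joint BPE vocabulary shared by every language, a BiLSTM, and pooling to a fixed-size, language-agnostic sentence vector. The dimensions, layer count and max-pooling below are illustrative assumptions, and the auxiliary decoder and parallel-corpus training loop are omitted.

```python
# Sketch of a language-agnostic BiLSTM sentence encoder in the spirit of the
# architecture described above. Sizes and max-pooling are assumptions; the
# auxiliary decoder used during training is left out.
import torch
import torch.nn as nn

class MultilingualSentenceEncoder(nn.Module):
    def __init__(self, vocab_size=50_000, emb_dim=320, hidden_dim=512, num_layers=1):
        super().__init__()
        # One embedding table shared by all languages via a joint BPE vocabulary.
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                              batch_first=True, bidirectional=True)

    def forward(self, token_ids):                         # (batch, seq_len)
        states, _ = self.bilstm(self.embed(token_ids))    # (batch, seq_len, 2*hidden)
        # Pool the BiLSTM states into one fixed-size, language-agnostic vector.
        sentence_vec, _ = states.max(dim=1)
        return sentence_vec                               # (batch, 2*hidden)

encoder = MultilingualSentenceEncoder()
dummy_batch = torch.randint(1, 50_000, (2, 12))           # two sentences of 12 BPE tokens
print(encoder(dummy_batch).shape)                         # torch.Size([2, 1024])
```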
A novel solution, the Zero Redundancy Optimizer (ZeRO), to optimize memory, vastly improving training speed while increasing the model size that can be efficiently trained, allowing the model size to scale in proportion to the number of devices with sustained high efficiency.
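The memory saving behind this idea can be seen with simple arithmetic: under plain data parallelism every device replicates parameters, gradients and optimizer states, whereas partitioning those states spreads them across devices, so the per-device footprint shrinks with the device count. A rough sketch, with illustrative byte counts for mixed-precision Adam rather than exact figures for any particular setup:

```python
# Back-of-the-envelope sketch of why partitioning model states across devices
# lets the trainable model size grow with the device count. Byte counts follow
# the usual mixed-precision Adam accounting and are illustrative assumptions.
def per_device_gb(num_params, num_devices, partition_states=True):
    param_bytes = 2           # fp16 parameters
    grad_bytes = 2            # fp16 gradients
    optim_bytes = 12          # fp32 master params + Adam momentum + variance
    total = num_params * (param_bytes + grad_bytes + optim_bytes)
    if partition_states:
        total /= num_devices  # each device holds only its shard of the states
    return total / 1024**3

billion = 1_000_000_000
for devices in (1, 8, 64):
    plain = per_device_gb(7 * billion, devices, partition_states=False)
    sharded = per_device_gb(7 * billion, devices, partition_states=True)
    print(f"{devices:>3} devices: {plain:6.1f} GB/device replicated "
          f"vs {sharded:6.1f} GB/device sharded")
```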
Multi-lingual language model Fine-Tuning (MultiFiT) is proposed to enable practitioners to train and fine-tune language models efficiently in their own language, along with a zero-shot method that uses an existing pretrained cross-lingual model.
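One reading of the zero-shot setup is a teacher-student bootstrap: a cross-lingual model trained on source-language labels pseudo-labels target-language text, and an efficient monolingual model is then fitted on those pseudo-labels. The sketch below illustrates that pattern; the encoder checkpoint, toy documents and the choice of student model are assumptions, not the paper's exact recipe.

```python
# Hypothetical teacher-student sketch of the zero-shot path described above.
# Data, model choices and the pseudo-labelling step are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Source-language (English) supervision only.
en_docs = ["Stocks fell sharply on Monday.", "The team won the championship."]
en_labels = ["economy", "sports"]

# Unlabelled target-language (German) corpus.
de_docs = ["Die Aktienkurse fielen deutlich.", "Die Mannschaft gewann den Titel."]

# 1. Cross-lingual teacher: train on English embeddings.
teacher = LogisticRegression(max_iter=1000).fit(encoder.encode(en_docs), en_labels)

# 2. Pseudo-label the target-language documents with the teacher.
pseudo_labels = teacher.predict(encoder.encode(de_docs))

# 3. Fit an efficient monolingual student directly on target-language text.
#    (Guard: a classifier needs at least two distinct pseudo-labels to fit.)
if len(set(pseudo_labels)) > 1:
    student = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    student.fit(de_docs, pseudo_labels)
    print(student.predict(["Der Verein feierte den Sieg."]))
```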
An Adversarial Deep Averaging Network (ADAN) is proposed to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist.
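The name implies three components: a deep-averaging feature extractor shared across languages, a task classifier, and a language discriminator trained adversarially so that the shared features become language-invariant. A minimal PyTorch sketch follows; gradient reversal is one common way to implement the adversarial signal and, like the layer sizes, is an assumption rather than necessarily the paper's exact choice.

```python
# Minimal sketch of an adversarial deep averaging network: shared deep-averaging
# feature extractor, task classifier, and a language discriminator that receives
# reversed gradients. Sizes and details are illustrative assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class ADAN(nn.Module):
    def __init__(self, vocab_size=30_000, emb_dim=300, hidden=300,
                 num_classes=2, num_langs=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Deep averaging feature extractor: mean of word embeddings -> MLP.
        self.extractor = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, hidden), nn.ReLU())
        self.task_classifier = nn.Linear(hidden, num_classes)
        self.lang_discriminator = nn.Linear(hidden, num_langs)

    def forward(self, token_ids, lam=1.0):
        feats = self.extractor(self.embed(token_ids).mean(dim=1))
        task_logits = self.task_classifier(feats)
        # The discriminator sees the features through a gradient-reversal layer,
        # pushing the extractor toward language-invariant representations.
        lang_logits = self.lang_discriminator(GradReverse.apply(feats, lam))
        return task_logits, lang_logits

model = ADAN()
batch = torch.randint(1, 30_000, (4, 20))      # four documents of 20 token ids
task_logits, lang_logits = model(batch)
print(task_logits.shape, lang_logits.shape)    # torch.Size([4, 2]) torch.Size([4, 2])
```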
It is shown that bilingual embeddings learned using the proposed BilBOWA model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on WMT11 data.
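As commonly described, the cross-lingual part of this objective pulls together the bag-of-words (mean) embeddings of a sentence and of its translation, while separate monolingual skip-gram objectives keep each language's embeddings informative. The sketch below shows only that cross-lingual alignment term; the vocabulary sizes, dimensions and toy sentence pair are assumptions, and the monolingual terms are omitted.

```python
# Sketch of a cross-lingual bag-of-words alignment loss: the mean word vector
# of a sentence and of its translation are pulled together with a squared-error
# loss. Monolingual skip-gram objectives are omitted; all values are toy data.
import torch
import torch.nn as nn

emb_dim = 100
emb_en = nn.Embedding(20_000, emb_dim)   # English embedding table
emb_de = nn.Embedding(20_000, emb_dim)   # German embedding table

def crosslingual_loss(en_ids, de_ids):
    # Mean of word vectors = bag-of-words sentence representation.
    en_bow = emb_en(en_ids).mean(dim=0)
    de_bow = emb_de(de_ids).mean(dim=0)
    return ((en_bow - de_bow) ** 2).sum()

# One aligned sentence pair given as (arbitrary) token ids.
en_sentence = torch.tensor([12, 431, 7, 990])
de_sentence = torch.tensor([55, 2301, 18])

loss = crosslingual_loss(en_sentence, de_sentence)
loss.backward()   # gradients flow into both embedding tables
print(float(loss))
```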
A new subset of the Reuters corpus with balanced class priors for eight languages is proposed, adding Italian, Russian, Japanese and Chinese, providing strong baselines for all language transfer directions using multilingual word and sentence embeddings.
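The "all language transfer directions" protocol simply means training on each source language and testing on every other target language. A small sketch of how those directions can be enumerated is below; the language codes follow the sentence above plus the four languages of the earlier Reuters benchmark, and `train_and_evaluate` is a hypothetical placeholder rather than anything shipped with the dataset.

```python
# Sketch of enumerating all cross-lingual transfer directions for eight languages.
# `train_and_evaluate` is a hypothetical placeholder, not part of the dataset.
from itertools import product

LANGUAGES = ["en", "de", "es", "fr", "it", "ru", "ja", "zh"]

def train_and_evaluate(source, target):
    # Placeholder: train on `source` training data, report accuracy on `target` test data.
    return float("nan")

results = {(src, tgt): train_and_evaluate(src, tgt)
           for src, tgt in product(LANGUAGES, repeat=2) if src != tgt}
print(f"{len(results)} transfer directions evaluated")   # 56 directions
```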
This work proposes a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word and sentence representations, significantly improving cross-lingual sentence retrieval performance over all other approaches while maintaining parity with current state-of-the-art methods on word translation.
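One common formulation of a bilingual CBOW objective predicts a word in one language from the bag of words of the aligned sentence in the other language, which ties the two embedding spaces together. The sketch below shows that formulation; whether it matches the paper's exact objective is an assumption, and the vocabulary sizes and toy sentence pair are illustrative.

```python
# Sketch of one bilingual-CBOW formulation: an English word is predicted from
# the bag of words of its aligned German sentence. Sizes and data are toy values,
# and this may differ from the paper's exact objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_en, vocab_de, dim = 20_000, 20_000, 100
emb_de = nn.Embedding(vocab_de, dim)        # context embeddings (German side)
out_en = nn.Linear(dim, vocab_en)           # output layer over the English vocabulary

def crosslingual_cbow_loss(de_sentence_ids, en_target_id):
    context = emb_de(de_sentence_ids).mean(dim=0)        # bag of aligned-sentence words
    logits = out_en(context)                             # score every English word
    return F.cross_entropy(logits.unsqueeze(0), en_target_id.unsqueeze(0))

de_sentence = torch.tensor([55, 2301, 18, 7])   # aligned German sentence (token ids)
en_target = torch.tensor(431)                   # one English word from its translation
loss = crosslingual_cbow_loss(de_sentence, en_target)
loss.backward()
print(float(loss))
```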
This work proposes a method for learning distributed representations in a multilingual setup, shows that these representations are semantically informative, and applies them to a cross-lingual document classification task where they outperform the previous state of the art.
A novel technique for learning semantic representations that extends the distributional hypothesis to multilingual data and joint-space embeddings, demonstrating that these representations are semantically plausible and can capture semantic relationships across languages without parallel data.
This method takes advantage of a high-coverage dictionary in an EM-style training algorithm over monolingual corpora in two languages to achieve state-of-the-art performance on the bilingual lexicon induction task, exceeding models that use large bilingual corpora, and competitive results on monolingual word similarity and cross-lingual document classification tasks.
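One plausible reading of the EM-style loop is: given the current embeddings, each occurrence of a word is assigned the dictionary translation whose vector best fits the surrounding context (E-step), and the embeddings are then updated in light of those assignments (M-step). The schematic sketch below illustrates that reading with toy data; it is an assumption about the procedure, not the paper's exact algorithm.

```python
# Schematic EM-style loop using a bilingual dictionary over monolingual text.
# Illustrative reading of the idea with toy placeholders, not the exact procedure.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
vocab = ["bank", "river", "money", "ufer", "geld", "fluss"]
emb = {w: rng.normal(size=dim) for w in vocab}
# High-coverage dictionary: each source word lists its candidate translations.
dictionary = {"bank": ["ufer", "geld"], "river": ["fluss"], "money": ["geld"]}
corpus = [["money", "bank"], ["river", "bank"]]   # toy monolingual sentences

for _ in range(5):                                # a few EM iterations
    # E-step: pick, for each occurrence, the translation closest to its context.
    assignments = []
    for sent in corpus:
        for i, word in enumerate(sent):
            if word not in dictionary:
                continue
            context = np.mean([emb[w] for j, w in enumerate(sent) if j != i], axis=0)
            best = max(dictionary[word], key=lambda t: emb[t] @ context)
            assignments.append((word, best))
    # M-step: nudge each word's vector toward its chosen translation's vector.
    for word, trans in assignments:
        emb[word] += 0.1 * (emb[trans] - emb[word])

print(assignments)   # final translation choices, e.g. [('money', 'geld'), ...]
```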