AdapterHub: A Framework for Adapting Transformers (2020-07-15)

TL;DR

AdapterHub is a framework for dynamically “stitching in” pre-trained adapters for different tasks and languages. It makes sharing task-specific models scalable and easy, particularly in low-resource scenarios.

Abstract

The current modus operandi in NLP involves downloading and fine-tuning pre-trained models consisting of millions or billions of parameters. Storing and sharing such large trained models is expensive, slow, and time-consuming, which impedes progress towards more general and versatile NLP methods that learn from and for many tasks. Adapters (small learnt bottleneck layers inserted within each layer of a pre-trained model) ameliorate this issue by avoiding full fine-tuning of the entire model. However, sharing and integrating adapter layers is not straightforward. We propose AdapterHub, a framework that allows dynamic “stitching-in” of pre-trained adapters for different tasks and languages. The framework, built on top of the popular HuggingFace Transformers library, enables extremely easy and quick adaptations of state-of-the-art pre-trained models (e.g., BERT, RoBERTa, XLM-R) across tasks and languages. Downloading, sharing, and training adapters is as seamless as possible using minimal changes to the training scripts and a specialized infrastructure. Our framework enables scalable and easy access to sharing of task-specific models, particularly in low-resource scenarios. AdapterHub includes all recent adapter architectures and can be found at AdapterHub.ml.
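The abstract describes adapters as small learnt bottleneck layers added inside each layer of a pre-trained model. A minimal sketch of that idea in plain Python follows. It is not the AdapterHub API: the class name `BottleneckAdapter`, the ReLU nonlinearity, the residual connection, the zero-initialised up-projection, and the toy sizes are all illustrative assumptions.

```python
import math
import random


def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in vector]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class BottleneckAdapter:
    """Hypothetical adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(hidden_size)
        # Down-projection: hidden_size -> bottleneck_size (random init).
        self.down = [[rng.gauss(0.0, scale) for _ in range(hidden_size)]
                     for _ in range(bottleneck_size)]
        # Up-projection: bottleneck_size -> hidden_size.
        # Assumption: zero init, so the untrained adapter is an identity map.
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def num_parameters(self):
        """Count the weights in both projections (no biases in this sketch)."""
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)

    def forward(self, hidden):
        bottleneck = relu(matvec(self.down, hidden))
        delta = matvec(self.up, bottleneck)
        # Residual connection: the frozen model's output passes through.
        return [h + d for h, d in zip(hidden, delta)]


adapter = BottleneckAdapter(hidden_size=8, bottleneck_size=2)
hidden_state = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 2.0, 0.1]
output = adapter.forward(hidden_state)
```

In this toy setting the adapter holds 2 × 8 × 2 = 32 weights, against 8 × 8 = 64 for a single dense hidden-to-hidden layer. Only those adapter weights would be trained and shared, which is the storage argument the abstract makes.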

Authors

Sebastian Ruder


Kyunghyun Cho


Ivan Vulic


