Fine-tuning is the de facto way of leveraging large pretrained language models for downstream tasks. However, fine-tuning modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which we call the prefix. Prefix-tuning draws inspiration from prompting for language models, allowing subsequent tokens to attend to this prefix as if it were “virtual tokens”. We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We show that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics that are unseen during training.
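To make the idea concrete, below is a minimal sketch of prefix-tuning for a frozen GPT-2, assuming the Hugging Face Transformers API. The class name PrefixTunedGPT2 and the prefix_len parameter are illustrative, and the paper's MLP reparameterization of the prefix is omitted: the sketch simply learns per-layer key and value vectors ("virtual tokens") that real tokens attend to via past key-values, while all GPT-2 weights stay frozen.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer


class PrefixTunedGPT2(nn.Module):
    """Simplified prefix-tuning: only the prefix key/value vectors are trained."""

    def __init__(self, model_name="gpt2", prefix_len=10):
        super().__init__()
        self.gpt2 = GPT2LMHeadModel.from_pretrained(model_name)
        for p in self.gpt2.parameters():      # keep all LM parameters frozen
            p.requires_grad = False

        cfg = self.gpt2.config
        self.prefix_len = prefix_len
        self.n_layer, self.n_head = cfg.n_layer, cfg.n_head
        self.head_dim = cfg.n_embd // cfg.n_head
        # Continuous task-specific vectors: one key and one value per
        # layer / head / prefix position. These are the only trainable weights.
        self.prefix_kv = nn.Parameter(
            0.02 * torch.randn(self.n_layer, 2, self.n_head, prefix_len, self.head_dim)
        )

    def forward(self, input_ids, labels=None):
        batch = input_ids.size(0)
        # Expand the prefix over the batch and pass it as past key-values,
        # so subsequent tokens attend to the prefix as if it were virtual tokens.
        kv = self.prefix_kv.unsqueeze(1).expand(-1, batch, -1, -1, -1, -1)
        past = tuple((layer[:, 0], layer[:, 1]) for layer in kv)
        attention_mask = torch.ones(
            batch, self.prefix_len + input_ids.size(1), device=input_ids.device
        )
        return self.gpt2(
            input_ids=input_ids,
            past_key_values=past,
            attention_mask=attention_mask,
            labels=labels,
        )


if __name__ == "__main__":
    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = PrefixTunedGPT2()
    enc = tok(["name : Starbucks | type : coffee shop"], return_tensors="pt")
    out = model(enc["input_ids"], labels=enc["input_ids"])
    out.loss.backward()   # gradients flow only into model.prefix_kv
```

With this setup, saving a task amounts to storing the small prefix tensor rather than a full copy of the language model, which is the storage saving the abstract refers to; the input text above is just a hypothetical table-to-text style example.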