1. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
2. Can language models learn from explanations in context?
3. One-Shot Learning from a Demonstration with Hierarchical Latent Language
4. Training language models to follow instructions with human feedback
5. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
6. ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization
7. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
8. ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
9. MetaICL: Learning to Learn In Context
10. Multitask Prompted Training Enables Zero-Shot Task Generalization
11. FILM: Following Instructions in Language with Modular Methods
12. Reframing Instructional Prompts to GPTk's Language
13. Finetuned Language Models Are Zero-Shot Learners
14. FLEX: Unifying Evaluation for Few-Shot NLP
15. Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering
16. Cross-Task Generalization via Natural Language Crowdsourcing Instructions
17. CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
18. The Power of Scale for Parameter-Efficient Prompt Tuning
19. Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
20. CaSiNo: A Corpus of Campsite Negotiation Dialogues for Automatic Negotiation Systems
21. Learning to Generate Task-Specific Adapters from Task Description
22. Author's Sentiment Prediction
23. Learning from Task Descriptions
24. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
25. Language-Conditioned Imitation Learning for Robot Manipulation Tasks
26. The Turking Test: Can Language Models Understand Instructions?
27. Transformers: State-of-the-Art Natural Language Processing
28. Understanding Points of Correspondence between Sentences for Abstractive Summarization
29. Natural language to SQL: Where are we today?
30. Language Models are Few-Shot Learners
31. UnifiedQA: Crossing Format Boundaries With a Single QA System
32. Fast Domain Adaptation for Goal-Oriented Dialogue Using a Hybrid Generative-Retrieval Transformer
33. Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension
34. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
35. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
36. Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
37. The Natural Language Decathlon: Multitask Learning as Question Answering
38. Deep Learning Scaling is Predictable, Empirically
39. Harvesting Common-sense Navigational Knowledge for Robotics from Uncurated Text Corpora
40. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
41. The E2E Dataset: New Challenges For End-to-End Generation
42. JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction
43. A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
44. The Language Demographics of Amazon Mechanical Turk
45. Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning
46. A unified architecture for natural language processing: deep neural networks with multitask learning
47. ROUGE: A Package for Automatic Evaluation of Summaries
48. Scaling to Very Very Large Corpora for Natural Language Disambiguation
50. OHSUMED: an interactive retrieval evaluation and new large test collection for research
51. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
52. An Adversarial Winograd Schema Challenge at Scale
53. The Sixth PASCAL Recognizing Textual Entailment Challenge
54. The PASCAL Recognising Textual Entailment Challenge