We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Authors: Aurélien Rodriguez, Guillaume Lample, Thibaut Lavril, Marie-Anne Lachaux, Naman Goyal, Eric Hambro, Gautier Izacard, Xavier Martinet, Timothée Lacroix, Baptiste Rozière, Faisal Azhar