Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and the models' behaviour, covering the intersection of model scale with bias and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.
Authors: A. Mensch, Katie Millican, Roman Ring, Eliza Rutherford, Jacob Menick, Sebastian Borgeaud, Aida Nematzadeh, O. Vinyals, K. Simonyan, Yujia Li, Igor Babuschkin, Po-Sen Huang, Johannes Welbl, K. Kavukcuoglu, D. Hassabis, James Bradbury, Chris Dyer, L. Sifre, George van den Driessche, Erich Elsen, D. Budden, G. Irving, Laura Weidinger, Maribeth Rauh, Iason Gabriel, William S. Isaac, Saffron Huang, Lisa Anne Hendricks, I. Higgins, Diego de Las Casas, Jack W. Rae, Siddhant M. Jayakumar, Xiang Lorraine Li, Francis Song, Cyprien de Masson d'Autume, Amy Wu, Angeliki Lazaridou, A. Kuncoro, Michela Paganini, Trevor Cai, Jordan Hoffmann, John Aslanides, Sarah Henderson, Susannah Young, Tom Hennigan, Albin Cassirer, Richard Powell, Amelia Glaese, Sumanth Dathathri, Jonathan Uesato, John F. J. Mellor, Antonia Creswell, Nat McAleese, Elena Buchatskaya, Esme Sutherland, Lena Martens, E. Gribovskaya, Domenic Donato, Jean-Baptiste Lespiau, M. Tsimpoukelli, N. Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Tobias Pohlen, Z. Gong, Daniel Toyama, Tayfun Terzi, Vladimir Mikulik, Aidan Clark, Aurelia Guy, Chris Jones, Matthew G. Johnson, Blake A. Hechtman, Edward Lockhart, Simon Osindero, Laura Rimell, Kareem W. Ayoub, J. Stanway, L. Bennett