We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Authors: Aurélien Rodriguez, Guillaume Lample, Thibaut Lavril, Marie-Anne Lachaux, Naman Goyal, Eric Hambro, Gautier Izacard, Xavier Martinet, Timothée Lacroix, Baptiste Rozière, Faisal Azhar