We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.
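To make the two attention techniques named above concrete, here is a minimal NumPy sketch, not the paper's implementation: the function names, head counts, and the toy window size are illustrative assumptions. It shows how sliding window attention restricts each position to a fixed-size causal window (so per-token cost no longer grows with sequence length), and how grouped-query attention lets several query heads share one key/value head (shrinking the KV cache at inference).

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: position i may attend to positions max(0, i-window+1)..i.

    Causal attention restricted to a fixed-size window, so each token
    attends to at most `window` positions regardless of seq_len.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def grouped_query_attention(q, k, v, window: int):
    """Illustrative sketch, not Mistral's code.

    q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d) with n_kv_heads < n_heads.
    Each group of n_heads // n_kv_heads query heads shares one KV head.
    """
    n_heads, seq_len, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_heads // n_kv_heads
    k = np.repeat(k, group, axis=0)  # broadcast shared KV heads to all query heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    mask = sliding_window_mask(seq_len, window)
    scores = np.where(mask[None], scores, -1e9)  # block attention outside the window
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v  # (n_heads, seq, d)

# Toy usage: 8 query heads sharing 2 KV heads, window of 4 tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 32))
k = rng.standard_normal((2, 16, 32))
v = rng.standard_normal((2, 16, 32))
out = grouped_query_attention(q, k, v, window=4)
```

Because the window is fixed, stacking layers still propagates information across the full sequence (each layer extends the effective receptive field by one window), which is how a windowed model can handle sequences longer than its window.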
Diego de Las Casas, Marie-Anne Lachaux, Thomas Wang, Teven Le Scao, Timothée Lacroix, Albert Qiaochu Jiang, Alexandre Sablayrolles, Chris Bamford, Devendra Singh Chaplot, Florian Bressand, Gianna Lengyel, Lucile Saulnier, Lélio Renard Lavaud, Pierre Stock, William El Sayed