LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models (2023-08-20)

TL;DR

To enable cross-disciplinary conversations about LLMs in the law, the paper shows how popular legal frameworks for describing legal reasoning map onto LegalBench tasks. This gives lawyers and LLM developers a common vocabulary.

Abstract

The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning, which distinguish between its many forms, correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.


1 Models

HuggingFace links for the open-source models studied in Section 5.2 can be found below.