3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in jurisprudence
Use these libraries to find jurisprudence models and implementations
This paper presents an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher.
Evaluation of 47 cutting-edge LLMs on Xiezhi indicates that LLMs exceed the average performance of humans in science, engineering, agronomy, medicine, and art, but fall short in economics, jurisprudence, pedagogy, literature, history, and management.
A highly complex task that is challenging even for humans is considered: the classification of legal reasoning according to jurisprudential philosophy. Generative models are found to perform poorly when given the same instructions presented to human annotators through the codebook.
It is demonstrated that large language models can produce reasonable numerical ratings of the logical consistency of claims, and a mathematical approach based on sheaf theory is outlined for lifting such ratings to hypertexts, such as laws, jurisprudence, and social media, and evaluating their consistency globally.