3260 papers • 126 benchmarks • 313 datasets
Quantitative and qualitative results on DISC-Law-Eval demonstrate that the system serves various users effectively across diverse legal scenarios and enhances models' ability to access and utilize external legal knowledge.
The construction of a new data set is described, along with some preliminary experiments on it that treat the problem of finding the justification for the answers to questions as a baseline performance measure against which to evaluate future improvements.
Off-the-shelf theorem provers and model finders for HOL assist the LogiKEy designer of ethical intelligent agents in flexibly experimenting with underlying logics and their combinations, with ethico-legal domain theories, and with concrete examples, all at the same time.
How IRAC, a framework legal scholars use to distinguish different types of legal reasoning, can guide the construction of a foundation-model-oriented benchmark is described, and a seed set of 44 tasks built according to this framework is presented.
To enable cross-disciplinary conversations about LLMs in the law, it is shown how popular legal frameworks for describing legal reasoning correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary.
Practical baseline solutions based on LLMs are designed, and an intriguing paradox is presented: an IR system alone surpasses the performance of LLM+IR, because weaker LLMs gain little from powerful IR systems.
A highly complex task that is challenging even for humans, the classification of legal reasoning according to jurisprudential philosophy, is considered, and generative models are found to perform poorly when given the same instructions that were presented to human annotators through the codebook.
A novel corpus consisting of scenarios pertaining to Contract Acts Malaysia and Australian Social Act for Dependent Child is constructed, and the first empirical assessment of ChatGPT for IRAC analysis is conducted to understand how well it aligns with the analysis of legal professionals.
This study introduces TMID, a novel dataset to detect trademark infringement in merchant registrations sourced directly from Alipay, one of the world's largest e-commerce and digital payment platforms, and offers a thorough collection of legal rules and merchant and trademark-related contextual information with annotations from legal experts.