These leaderboards are used to track progress in moral-scenarios-3
This paper presents an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher.
This paper proposes ETHICSSUITE, a test suite that presents complex, contextualized, and realistic moral scenarios to test LLMs, and proposes a novel suggest-critic-reflect (SCR) process, serving as an automated test oracle to detect unethical suggestions.
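A suggest-critic-reflect loop of this general shape can be sketched as follows; this is only an illustration of the idea, not the ETHICSSUITE implementation, and `query_model`, the prompt wording, and the stopping rule are all assumptions.

```python
# Minimal sketch of a suggest-critic-reflect (SCR) style loop.
# `query_model` is a hypothetical stand-in for any chat-LLM API call;
# swap in a real client to actually run it.

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError("plug in an LLM client here")

def scr_check(scenario: str, max_rounds: int = 3) -> dict:
    """Ask for a suggestion, have a critic flag ethical issues,
    and let the model reflect and revise until the critic is satisfied."""
    suggestion = query_model(f"Scenario: {scenario}\nGive a suggestion.")
    critique = ""
    for _ in range(max_rounds):
        critique = query_model(
            f"Scenario: {scenario}\nSuggestion: {suggestion}\n"
            "Does this suggestion raise ethical concerns? Answer YES or NO, then explain."
        )
        if critique.strip().upper().startswith("NO"):
            return {"suggestion": suggestion, "flagged": False, "critique": critique}
        suggestion = query_model(
            f"Scenario: {scenario}\nPrevious suggestion: {suggestion}\n"
            f"Critique: {critique}\nReflect on the critique and give a revised suggestion."
        )
    return {"suggestion": suggestion, "flagged": True, "critique": critique}
```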
This paper introduces statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice, and applies this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious.
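As a rough illustration of such measures (not the paper's exact estimators), one can sample a model repeatedly on the same two-option scenario and compute an empirical choice probability, an entropy-based uncertainty, and a consistency score across paraphrased prompts; the function names and the "A"/"B" answer encoding below are assumptions.

```python
from collections import Counter
import math

def choice_statistics(answers: list[str]) -> dict:
    """Empirical choice probability and entropy-based uncertainty
    from repeated samples of a model's answer ('A' or 'B')."""
    counts = Counter(answers)
    n = len(answers)
    p_a = counts.get("A", 0) / n
    # Shannon entropy in bits: 0 = fully decided, 1 = maximally uncertain.
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return {"p_choose_A": p_a, "uncertainty": entropy}

def consistency(answer_sets: list[list[str]]) -> float:
    """Fraction of paraphrased prompt variants whose majority answer
    matches the overall majority answer across all variants."""
    majorities = [Counter(a).most_common(1)[0][0] for a in answer_sets]
    overall = Counter(majorities).most_common(1)[0][0]
    return sum(m == overall for m in majorities) / len(majorities)

# Example with hypothetical sampled answers:
print(choice_statistics(["A", "A", "B", "A", "A"]))  # p_choose_A=0.8, uncertainty≈0.72
print(consistency([["A", "A", "B"], ["A", "B", "B"], ["A", "A", "A"]]))  # 2/3
```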
This work annotates a new dataset, **MORAL EVENTS**, and proposes **MOKA**, a moral event extraction framework with **MO**ral **K**nowledge **A**ugmentation, which leverages knowledge derived from moral words and moral scenarios to produce structural representations of morality-bearing events.
This paper proposes Semantic Graph Entropy (SaGE), an information-theoretic measure grounded in the concept of “Rules of Thumb” (RoTs), to measure a model’s moral consistency; its analysis reveals that task accuracy and consistency are independent problems and that there is a dire need to investigate these issues further.
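As a loose illustration of the entropy intuition behind SaGE (not the authors' RoT extraction or graph construction), one could cluster a model's answers to paraphrases of the same moral question by semantic similarity and take the entropy of the cluster-size distribution: low entropy means the model answers consistently. The greedy clustering, the similarity threshold, and the embedding model name below are assumptions.

```python
import math
from sentence_transformers import SentenceTransformer, util

def semantic_entropy(answers: list[str], threshold: float = 0.8) -> float:
    """Greedily cluster answers by cosine similarity of sentence embeddings,
    then return the Shannon entropy of the cluster-size distribution (bits).
    0 bits = all answers semantically agree; higher = less consistent."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embs = model.encode(answers, convert_to_tensor=True)
    clusters: list[list[int]] = []
    for i in range(len(answers)):
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            if util.cos_sim(embs[i], embs[cluster[0]]).item() >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    n = len(answers)
    return -sum((len(c) / n) * math.log2(len(c) / n) for c in clusters)
```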