Steerbench

Introduced in A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs (2025)

About this Dataset

Steerability probe example for text-rewriting.

Source: A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs

Dataset Variants

Steerbench

Papers (1)

A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs

The paper introduces a framework based on a multi-dimensional goal space, in which user goals and LLM outputs are modeled as vectors whose dimensions correspond to text attributes. Its findings suggest that even strong LLMs struggle with steerability and that existing alignment strategies may be insufficient.
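To make the goal-space idea concrete, here is a minimal sketch in Python. It is not the paper's implementation. The attribute names, the function `decompose_movement`, and the particular decomposition (projecting the model's movement onto the requested direction) are assumptions chosen for illustration. The summary above only states that goals and outputs are vectors over text attributes.

```python
import math

# Hypothetical attribute dimensions; the source only says that each
# dimension corresponds to a text attribute.
ATTRIBUTES = ("formality", "reading_difficulty", "verbosity")


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def decompose_movement(source, goal, output):
    """Compare the requested change (source -> goal) with the actual
    change (source -> output) in an assumed attribute space.

    Illustrative decomposition, not taken from the paper:
      - distance_to_goal: Euclidean distance between output and goal
      - calibration_gap: movement along the requested direction minus
        the requested amount (negative means undershoot)
      - side_effect: size of the movement orthogonal to the request
    """
    requested = _sub(goal, source)
    actual = _sub(output, source)
    requested_size = _norm(requested)
    if requested_size == 0:
        raise ValueError("goal must differ from source")
    unit = [x / requested_size for x in requested]
    along = _dot(actual, unit)  # signed movement along the request
    off_axis = _sub(actual, [along * u for u in unit])
    return {
        "distance_to_goal": _norm(_sub(goal, output)),
        "calibration_gap": along - requested_size,
        "side_effect": _norm(off_axis),
    }


# Made-up values: the user asks only for more formality.
source = [0.2, 0.5, 0.5]
goal = [0.8, 0.5, 0.5]
output = [0.5, 0.5, 0.9]
print(decompose_movement(source, goal, output))
```

In this made-up example the output moves only part of the way toward the requested formality (a negative `calibration_gap`) and also shifts verbosity, which was not requested (a positive `side_effect`).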

Similar Datasets

MNIST

CelebA

GLUE

Statistics

Papers: 1
Tasks: 0
Introduced: 2025
License: MIT