Introduced in A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs (2025)
Steerability probe example for text-rewriting.
The paper introduces a framework based on a multi-dimensional goal space, in which user goals and LLM outputs are modeled as vectors whose dimensions correspond to text attributes. The findings suggest that even strong LLMs struggle with steerability and that existing alignment strategies may be insufficient.
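As a rough illustration of the goal-space idea, the sketch below treats a source text, a user goal, and an LLM output as points in a small attribute space and splits the model's movement into a part along the requested direction and a part off it. The function name `steering_components`, the two example attributes, and the specific decomposition are assumptions made for illustration; they are not taken from the paper and may differ from the metrics it defines.

```python
import math


def steering_components(source, goal, output):
    """Split the model's movement into on-target and off-target parts.

    Hypothetical helper: each argument is a list of attribute scores
    (one entry per text attribute), all of the same length.
    """
    requested = [g - s for g, s in zip(goal, source)]   # movement the user asked for
    actual = [o - s for o, s in zip(output, source)]    # movement the model produced

    requested_norm = math.sqrt(sum(r * r for r in requested))
    if requested_norm == 0:
        raise ValueError("goal equals source; no direction was requested")

    # Unit vector pointing from the source toward the goal.
    unit = [r / requested_norm for r in requested]

    # Scalar projection of the actual movement onto the requested direction.
    along = sum(a * u for a, u in zip(actual, unit))

    # Whatever is left after removing the on-target part is off-target movement.
    off = [a - along * u for a, u in zip(actual, unit)]
    off_norm = math.sqrt(sum(x * x for x in off))

    # Plain Euclidean distance still separating the output from the goal.
    residual = math.sqrt(sum((g - o) ** 2 for g, o in zip(goal, output)))

    return {
        "requested": requested_norm,
        "along": along,
        "off_target": off_norm,
        "residual": residual,
    }


# Assumed attribute order for this example: [formality, reading difficulty].
result = steering_components(
    source=[0.2, 0.5],
    goal=[0.6, 0.5],     # user asks for more formality, same difficulty
    output=[0.4, 0.8],   # model moves halfway on formality and drifts on difficulty
)
print(result)
```

In this example the model covers about half of the requested change in the first attribute while also shifting the second attribute, which the user did not ask to change. Under these assumptions, a shortfall or overshoot in `along` relative to `requested` corresponds loosely to miscalibration, and a nonzero `off_target` corresponds loosely to a side effect.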
MIT