3260 papers • 126 benchmarks • 313 datasets
Determine the safety of a given dialogue context.
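Dialogue safety prediction is commonly framed as text classification over the flattened conversation. Below is a minimal sketch using the Hugging Face `transformers` pipeline; the checkpoint name `my-org/dialogue-safety-classifier` and the `[SEP]` turn separator are placeholder assumptions, not a specific released model.

```python
from transformers import pipeline

# Hypothetical fine-tuned checkpoint; substitute any dialogue-safety
# classifier available on the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="my-org/dialogue-safety-classifier",  # placeholder name
)

# A dialogue context is typically flattened into a single string,
# with a separator marking turn boundaries.
dialogue = " [SEP] ".join([
    "I had a terrible day at work.",
    "Sounds rough. Want to talk about it?",
    "I feel like yelling at my boss tomorrow.",
])

result = classifier(dialogue)
print(result)  # e.g. [{"label": "needs_caution", "score": 0.87}]
```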
These leaderboards are used to track progress in Dialogue Understanding.
Use these libraries to find Dialogue Understanding models and implementations.
No subtasks available.
This work introduces ProsocialDialog, the first large-scale multi-turn dialogue dataset for teaching conversational agents to respond to problematic content in line with social norms. It also introduces Canary, a dialogue safety detection module capable of generating rules of thumb (RoTs) given a conversational context, and Prost, a socially informed dialogue agent.
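Canary is described as mapping a dialogue context to a safety label plus an RoT, which suggests a seq2seq interface. The sketch below shows how such a module might be queried; the checkpoint name and the output format in the comment are assumptions for illustration, not the released Canary weights.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder checkpoint; a Canary-style module is a fine-tuned
# seq2seq model, but this name is assumed for illustration.
name = "my-org/canary-style-safety-model"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

context = "Speaker A: I'm going to copy my friend's homework tonight."
inputs = tokenizer(context, return_tensors="pt")

# The module is expected to emit a safety label followed by an RoT,
# e.g. "needs_caution | It's wrong to cheat on schoolwork."
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```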
This paper introduces fourteen novel datasets for evaluating the safety of Large Language Models on enterprise tasks. A method was devised to evaluate a model's safety, as determined by its ability to follow instructions and output factual, unbiased, grounded, and appropriate content. In this research, we used OpenAI GPT as the point of comparison, since it excels at all levels of safety. On the open-source side, among the smaller models, Meta Llama2 performs well on factuality and toxicity but has the highest propensity for hallucination. Mistral hallucinates the least but handles toxicity poorly; it does, however, perform well on a dataset mixing several tasks and safety vectors in a narrow vertical domain. Gemma, the newly introduced open-source model based on Google Gemini, is generally balanced but trails behind. In back-and-forth conversation (multi-turn prompts), we find that the safety of open-source models degrades significantly; aside from OpenAI's GPT, Mistral is the only model that still performs well in multi-turn tests.
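The multi-turn finding implies an evaluation loop that re-scores the model after every exchange rather than on isolated prompts. Below is a minimal sketch of such a harness; `query_model` and `is_safe` are hypothetical stand-ins for the model under test and a safety judge, not functions from the paper.

```python
def query_model(history: list[str]) -> str:
    """Hypothetical stand-in for the model under test."""
    raise NotImplementedError

def is_safe(response: str) -> bool:
    """Hypothetical safety judge (a classifier or rubric-based grader)."""
    raise NotImplementedError

def multi_turn_safety_rate(turns: list[str]) -> float:
    """Score every model reply as the conversation accumulates.

    Safety is checked per turn because, as reported above, models that
    look safe on single prompts can degrade over longer exchanges.
    """
    history: list[str] = []
    safe = 0
    for user_turn in turns:
        history.append(user_turn)
        reply = query_model(history)
        history.append(reply)
        safe += is_safe(reply)
    return safe / len(turns)
```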
ALERT, a large-scale benchmark for assessing LLM safety through red-teaming methodologies, is introduced. It consists of more than 45k instructions categorized using the authors' novel fine-grained risk taxonomy.
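Because ALERT categorizes its instructions under a fine-grained taxonomy, results are naturally reported per category. The sketch below shows that aggregation; the record field names and the `judge` callable are assumptions for illustration, not ALERT's actual schema.

```python
from collections import defaultdict

# Each record pairs a red-teaming instruction with its taxonomy
# category; these field names are assumed, not ALERT's actual schema.
records = [
    {"category": "hate_speech", "instruction": "..."},
    {"category": "self_harm", "instruction": "..."},
]

def judge(instruction: str) -> bool:
    """Hypothetical: True if the model's reply to this instruction is safe."""
    raise NotImplementedError

def per_category_safety(records: list[dict]) -> dict[str, float]:
    """Fraction of instructions answered safely, per taxonomy category."""
    safe: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for rec in records:
        total[rec["category"]] += 1
        safe[rec["category"]] += judge(rec["instruction"])
    return {cat: safe[cat] / total[cat] for cat in total}
```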