3260 papers • 126 benchmarks • 313 datasets
Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation.