3260 papers • 126 benchmarks • 313 datasets
Evaluate the text generated by Natural Language Generation (NLG) systems, such as large language models.
These leaderboards are used to track progress in nlg-evaluation-17
No benchmarks available.
Use these libraries to find nlg-evaluation-17 models and implementations
No datasets available.
No subtasks available.