Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details.
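As a rough illustration of the protocol described above, the sketch below draws 1000 evaluation seeds that are disjoint from a set of training seeds and averages the per-episode score. It is a minimal sketch, not the benchmark's official evaluation code: `evaluation_seeds`, `mean_score`, and `dummy_episode` are hypothetical names introduced here, and `dummy_episode` is a stand-in for whatever function actually plays one seeded NetHack episode with your agent and returns its in-game score.

```python
import random
from statistics import mean
from typing import Callable, Iterable, List, Set


def evaluation_seeds(n_episodes: int, training_seeds: Set[int], rng_seed: int = 0) -> List[int]:
    """Draw n_episodes distinct seeds, none of which appear in training_seeds."""
    rng = random.Random(rng_seed)
    used = set(training_seeds)  # seeds we must avoid (training seeds + already drawn)
    seeds: List[int] = []
    while len(seeds) < n_episodes:
        candidate = rng.randrange(2**31)
        if candidate not in used:
            used.add(candidate)
            seeds.append(candidate)
    return seeds


def mean_score(run_episode: Callable[[int], float], seeds: Iterable[int]) -> float:
    """Average the in-game score returned by run_episode over the given seeds."""
    return mean(run_episode(seed) for seed in seeds)


def dummy_episode(seed: int) -> float:
    """Hypothetical placeholder: replace with a real seeded episode rollout."""
    return float(random.Random(seed).randint(0, 1000))


training_seeds = set(range(100))  # assumed: the seeds used during training
eval_seeds = evaluation_seeds(1000, training_seeds)
print(mean_score(dummy_episode, eval_seeds))
```

Keeping seed selection separate from scoring makes it easy to log the exact evaluation seeds alongside a reported result.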
These leaderboards are used to track progress in nethack-score-12