Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details.
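The metric above can be sketched in a few lines of Python. This is only an illustration of the averaging step under stated assumptions, not the evaluation code from the paper: `run_episode` is a hypothetical callable that plays one full episode for a given seed and returns its final in-game score, and `sample_eval_seeds` is a hypothetical helper that draws seeds disjoint from those used in training.

```python
import random
from statistics import mean
from typing import Callable, Iterable, List


def sample_eval_seeds(n: int, training_seeds: Iterable[int],
                      rng: random.Random) -> List[int]:
    """Draw n distinct seeds, none of which appear in training_seeds."""
    seen = set(training_seeds)
    seeds: List[int] = []
    while len(seeds) < n:
        candidate = rng.randrange(2**31)
        if candidate not in seen:
            seen.add(candidate)  # also prevents duplicate evaluation seeds
            seeds.append(candidate)
    return seeds


def mean_score(run_episode: Callable[[int], float],
               seeds: Iterable[int]) -> float:
    """Average the final in-game score over one episode per seed."""
    return mean([run_episode(seed) for seed in seeds])


if __name__ == "__main__":
    # Hypothetical stand-in for a real agent/environment rollout.
    def toy_episode(seed: int) -> float:
        return float(random.Random(seed).randint(0, 1000))

    training_seeds = range(10_000)  # assumed set of seeds seen in training
    eval_seeds = sample_eval_seeds(1000, training_seeds, random.Random(0))
    print(mean_score(toy_episode, eval_seeds))
```

In a real evaluation, `run_episode` would seed the environment, roll out the trained agent until the episode ends, and return the score reported by the game.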
These leaderboards are used to track progress in nethack-score-9
It is argued that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience.