Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details.
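The protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's evaluation code: `held_out_seeds`, `mean_score`, and the `run_episode` callable are hypothetical names, and `run_episode(seed)` is assumed to play one full episode with the given seed and return its final in-game score.

```python
import random
from statistics import mean
from typing import Callable, Iterable, List, Sequence


def held_out_seeds(train_seeds: Iterable[int], n: int = 1000, rng_seed: int = 0) -> List[int]:
    """Draw n distinct evaluation seeds that do not appear in train_seeds."""
    seen = set(train_seeds)
    rng = random.Random(rng_seed)
    seeds: List[int] = []
    while len(seeds) < n:
        candidate = rng.randrange(2**31)
        if candidate not in seen:
            seen.add(candidate)  # also prevents duplicates among evaluation seeds
            seeds.append(candidate)
    return seeds


def mean_score(run_episode: Callable[[int], float], seeds: Sequence[int]) -> float:
    """Average the final in-game score over one episode per seed."""
    return mean(run_episode(seed) for seed in seeds)


if __name__ == "__main__":
    # Hypothetical stand-in for a real agent rollout: returns a pseudo-random score.
    def dummy_episode(seed: int) -> float:
        return float(random.Random(seed).randint(0, 500))

    eval_seeds = held_out_seeds(train_seeds=range(10_000), n=1000)
    print(f"mean score over {len(eval_seeds)} episodes: {mean_score(dummy_episode, eval_seeds):.1f}")
```

In practice `dummy_episode` would be replaced by a function that resets the environment with the given seed and runs the trained agent until the episode ends; how the seed is passed to the environment depends on the library version and is not specified here.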
It is argued that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience.