An open-source toolkit from OpenAI that implements several reinforcement learning benchmarks, including classic control, Atari, robotics, and MuJoCo tasks. (Description from "Evolutionary learning of interpretable decision trees".)
This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws a connection between target networks and overestimation bias.
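The "minimum between a pair of critics" idea can be sketched as a target computation. This is a minimal illustration under assumed inputs (scalar reward and two next-state critic estimates), not the paper's full algorithm:

```python
import numpy as np

def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Bellman target using the minimum of two critic estimates.

    Taking the element-wise minimum of the two target critics limits the
    overestimation bias that a single learned critic tends to accumulate.
    """
    q_min = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - float(done)) * q_min
```

Both critics are then regressed toward this shared target, while only one is used for the policy update.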
A suite of challenging continuous control tasks (integrated with OpenAI Gym), based on existing robotics hardware and following a multi-goal reinforcement learning (RL) framework, is introduced.
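In the multi-goal setting, the reward is typically defined by the distance between an achieved goal and a desired goal. A minimal sketch of such a sparse goal-conditioned reward (the threshold value here is illustrative, not taken from the paper):

```python
import numpy as np

def sparse_goal_reward(achieved_goal, desired_goal, threshold=0.05):
    """Sparse multi-goal reward: -1 until the goal is reached.

    The agent receives 0 only when the achieved goal lies within
    `threshold` (Euclidean distance) of the desired goal, and -1 otherwise.
    """
    d = np.linalg.norm(np.asarray(achieved_goal, dtype=float)
                       - np.asarray(desired_goal, dtype=float))
    return 0.0 if d < threshold else -1.0
```

Exposing the reward as a function of (achieved, desired) goal pairs is what enables techniques such as hindsight goal relabeling.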
Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
The effects of adding recurrence to a Deep Q-Network are investigated by replacing the first post-convolutional fully-connected layer with a recurrent LSTM, which successfully integrates information through time and replicates DQN's performance on standard Atari games and partially observed equivalents featuring flickering game screens.
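The architectural change described above can be sketched in PyTorch. The convolutional stack and layer sizes here follow the standard DQN convention and are assumptions for illustration; the paper's exact hyperparameters may differ:

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """DQN variant where the first post-conv fully-connected layer
    is replaced by an LSTM that integrates information across time."""

    def __init__(self, n_actions, hidden=256):
        super().__init__()
        # Standard DQN-style convolutional feature extractor (assumed).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
        )
        # LSTM in place of the first fully-connected layer.
        self.lstm = nn.LSTM(64 * 7 * 7, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, frames, state=None):
        # frames: (batch, time, channels=1, 84, 84)
        b, t = frames.shape[:2]
        feats = self.conv(frames.reshape(b * t, *frames.shape[2:]))
        feats = feats.reshape(b, t, -1)
        out, state = self.lstm(feats, state)
        return self.head(out), state
```

Because the LSTM carries hidden state across frames, the network can recover information that a single (possibly flickering) frame does not contain.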
A simple and scalable reinforcement learning algorithm that uses standard supervised learning methods as subroutines and is able to acquire more effective policies than most off-policy algorithms when learning from purely static datasets with no additional environmental interactions is developed.
An OpenAI-Gym-like gaming environment is created with the game Little Fighter 2 (LF2), and a novel A3C+ network is presented for learning RL agents; it includes a Recurrent Info network that uses game-related info features with recurrent layers to observe combo skills for fighting.
The TorchBeast design principles and implementation are described and it is demonstrated that it performs on-par with IMPALA on Atari.
A novel multi-goal RL objective based on weighted entropy is proposed, which encourages the agent to maximize the expected return, as well as to achieve more diverse goals and a maximum entropy-based prioritization framework is developed to optimize the proposed objective.
An implicit distributional actor critic that consists of a distributional critic, built on two deep generator networks, and a semi-implicit actor (SIA), powered by a flexible policy distribution to improve the sample efficiency of policy-gradient based reinforcement learning algorithms.
COOL-MC is presented, a tool that integrates state-of-the-art reinforcement learning (RL) with model checking to obtain bounds on the performance of so-called permissive policies.